FORECASTING ECONOMIC TIME SERIES DATA USING ARMA
Nagendra Belvadi V
Black Belt
and
Dr. Tansen Chaudhari
Head of Process Asia Pacific
Xchanging, Xchanging Towers, SJR IPark,
EPIP Area, Whitefield, Bangalore - 560 066. India.
Why is forecasting necessary?
The subject matter of forecasting is uncertainty, that is, a lack of clarity about the future. In a state of
uncertainty, organizations often make decisions based on historical experience or even gut feeling, and
decisions taken on gut feeling alone can be detrimental to organizations. This necessitates a scientific
approach to decision making. Forecasting is one such scientific technique that helps
organizations and processes make decisions under uncertainty. In this article I make an
earnest effort to take you through one of the more sophisticated forecasting techniques, called ARMA
(Autoregressive Moving Average).
Introduction: This article covers the application of ARMA models in forecasting economic variables,
their merits and demerits, and their advantages in comparison with conventional time series models.
ARMA is the acronym for "Autoregressive Moving Average"; the methodology for building these models
was developed by two great statisticians, George Box and Gwilym Jenkins, and hence ARMA models are
also known as Box-Jenkins models. ARMA models are suitable for high-frequency data.
Since most economic time series are non-stationary, a method called differencing is
employed to convert the non-stationary data into stationary data. The model is then fitted to the
differenced series rather than the original one; this "integration" step is what turns ARMA into
ARIMA (Autoregressive Integrated Moving Average).
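The effect of differencing can be sketched in a few lines of Python. This is a toy illustration with hypothetical data: a series with a pure linear trend (non-stationary in mean) becomes constant, and hence stationary, after one difference.

```python
def difference(series, lag=1):
    """Return the series of changes y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A hypothetical trending series: 5, 8, 11, ... (slope 3).
trend = [5 + 3 * t for t in range(8)]
diffed = difference(trend)

print(trend)   # [5, 8, 11, 14, 17, 20, 23, 26]
print(diffed)  # [3, 3, 3, 3, 3, 3, 3] - the trend is gone
```

The differenced series varies about a fixed value (here, exactly the slope), which is the working definition of stationarity used in the rest of the article.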
Box-Jenkins methodology: ARIMA models produce accurate forecasts based on the historical patterns in
the time series data. ARIMA belongs to the class of linear models and can represent both stationary and
non-stationary data. ARIMA models do not require explanatory variables; instead they use the
information in the past values of the series to forecast the series itself. A stationary series is one that
varies about a fixed value; a non-stationary series does not. The seasonal ARIMA model is
represented as below:
ARIMA (p, d, q) (P, D, Q)
where (p - autoregressive order, d - order of differencing, q - moving average order) is the
regular model and (P, D, Q) are the corresponding seasonal elements.
The model building process involves the following steps:
Model identification
The first step in model identification is to determine whether the time series is
stationary or non-stationary. Stationarity can be assessed either with the Dickey-Fuller test or
with run-sequence plots. If the series has a growing or declining trend, the data are said to be
"non-stationary"; a series with no trend is termed "stationary". If the original series has no
trend, it is an ideal candidate for ARIMA as it stands. If the original series has a trend, it
can be converted to stationary by differencing. For a non-stationary series the
autocorrelations fail to die out rapidly, whereas for a stationary series they die out rapidly.
The order of differencing is zero for a stationary series and greater than zero for a non-stationary
series. The regular and seasonal parameters are then determined from the sample
autocorrelations and partial autocorrelations.
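The sample autocorrelations used in this step are simple to compute directly. A minimal sketch on hypothetical data (a toy check, not a substitute for a statistics package): for a trending series, the autocorrelations stay high and die out slowly, signalling that differencing is needed.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]

# A hypothetical trending (non-stationary) series, 0, 1, ..., 39:
# r_1 is close to 1 and the decay across lags is slow.
trend = [t for t in range(40)]
r = acf(trend, 3)
print([round(x, 3) for x in r])
```

The partial autocorrelations require an extra regression step at each lag, which is why in practice both functions are read off statistical software rather than computed by hand.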
Model parameter estimation
The estimation of parameters is of paramount importance in the model-building exercise. The
identified parameters are estimated statistically by the method of least squares, and a t-statistic
is employed to test each parameter's significance.
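For the simplest case, an AR(1) model y[t] = phi * y[t-1] + e[t] on a mean-zero series, the least-squares estimate and its t-statistic can be sketched as follows. This is a hypothetical illustration on simulated data, not the general ARIMA estimation procedure, which statistical packages carry out iteratively.

```python
import math
import random

def fit_ar1(y):
    """Least-squares estimate of phi in y[t] = phi*y[t-1] + e[t],
    with an approximate t-statistic for H0: phi = 0."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    phi = num / den
    resid = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)
    t_stat = phi / math.sqrt(sigma2 / den)
    return phi, t_stat

# Simulate an AR(1) series with true phi = 0.7 and recover it.
random.seed(1)
y = [0.0]
for _ in range(400):
    y.append(0.7 * y[-1] + random.gauss(0, 1))

phi, t = fit_ar1(y)
print(round(phi, 2), round(t, 1))  # phi near 0.7, |t| well above 2
```

A |t| value above roughly 2 is the usual rule of thumb for keeping a parameter in the model.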
Model Diagnostics:
Once the parameters are statistically estimated, and before forecasting the series, it is necessary to
check the adequacy of the tentatively identified model. The model is declared adequate if the
residuals carry no further information that could improve the forecast; in other words, if the
residuals are random. To check overall model adequacy, the Ljung-Box statistic is employed,
which follows a chi-square distribution. The null hypothesis (that the residuals are random) is
rejected for a low p-value and not rejected for a high one.
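The Ljung-Box statistic itself is straightforward to compute: Q = n(n+2) * sum over k of r_k^2 / (n-k), where r_k are the residual autocorrelations. A minimal sketch on hypothetical residuals (in practice the chi-square comparison, with degrees of freedom reduced by the number of fitted parameters, comes from a statistics package):

```python
import random

def ljung_box_q(residuals, m):
    """Ljung-Box Q = n(n+2) * sum_{k=1..m} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, m + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# Random residuals stand in for an adequate model's residuals:
# Q should fall below the 5% chi-square critical value for
# m = 10 degrees of freedom (about 18.3), so we fail to reject.
random.seed(2)
resid = [random.gauss(0, 1) for _ in range(200)]
q = ljung_box_q(resid, 10)
print(round(q, 2))
```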
Forecasting:
Once model adequacy is established, the series in question is forecast for the specified
period. It is always advisable to keep track of the forecast errors and, depending on their
magnitude, re-evaluate the model.
Compared with other conventional models and methods, ARIMA is more robust in terms of
forecast accuracy because it takes seasonality into consideration. If the original series does not
exhibit seasonality, a non-seasonal ARIMA is fitted instead. The disadvantage of ARIMA building is
that the model is tedious to build manually without the aid of statistical software. There are also
situations where the final model does not fit the requirement because the error terms have
non-constant variance, known as heteroskedasticity; such series are treated separately using
ARCH (Autoregressive Conditional Heteroskedasticity) and GARCH (Generalized Autoregressive
Conditional Heteroskedasticity) techniques.
Demonstration Problem: Suppose an analyst in a service industry is interested in fitting an ARIMA
model to daily work-volume data, and he collects daily volumes for 103 days. (Forecasts from an
ARIMA fitted to a series of fewer than 50 data points are unreliable.) The very first step is to
understand whether the data are stationary. He studies the time series plot and applies the
augmented Dickey-Fuller test, as shown below:
[Figure: Time Series Plot of Volume, daily volume plotted against day index]
Augmented Dickey-Fuller test output:
Test statistic: -4.62799
Lag order: 4
p-value: 0.01
The data in the figure show no trend-like behaviour, and the Dickey-Fuller test agrees: the large
negative statistic and low p-value point to a stationary series.
[Figure: Partial Autocorrelation Function for Volume, with 5% significance limits for the partial autocorrelations]
[Figure: Autocorrelation Function for Volume, with 5% significance limits for the autocorrelations]
The ACF and PACF shown above are not significant at any lag. The analyst nevertheless went ahead with
first differencing to study the regular parameters.
[Figure: Autocorrelation Function for the first-differenced series, with 5% significance limits for the autocorrelations]
[Figure: Partial Autocorrelation Function for the first-differenced series, with 5% significance limits for the partial autocorrelations]
After first differencing, the analyst found the PACF significant through lag 2 and the ACF significant
at lag 1, identifying the regular model as (2, 1, 1).
Next the analyst moves on to identify the parameters of the seasonal model. Since the data are daily,
he suspects a season of five working days and takes a seasonal difference at lag 5 of the
first-differenced series. The ACF and PACF of this series help him identify the seasonal elements (P, D, Q).
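Seasonal differencing at lag 5 simply subtracts the value five working days earlier; applied on top of the first difference, it can be sketched as below. The data here are hypothetical, built so that a five-day weekly pattern plus a trend is removed exactly by the two differencing steps.

```python
def difference(series, lag=1):
    """Subtract the value `lag` positions earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Toy daily volumes: a repeating five-day pattern plus a trend.
week = [120, 150, 140, 130, 90]
volume = [week[t % 5] + 2 * t for t in range(20)]

once = difference(volume, lag=1)   # removes the trend (d = 1)
twice = difference(once, lag=5)    # removes the weekly season (D = 1, s = 5)
print(twice)  # all zeros: trend and season fully removed
```

On real data the doubly differenced series would not be exactly zero, but its ACF and PACF are what the analyst reads to pick (P, D, Q).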
[Figure: Autocorrelation Function for the seasonally differenced series (lag 5), with 5% significance limits for the autocorrelations]
[Figure: Partial Autocorrelation Function for the seasonally differenced series (lag 5), with 5% significance limits for the partial autocorrelations]
From the PACF it is evident that the partial autocorrelations are significant at lags 5 and 10, indicating
two seasons (one season equals five days), so the seasonal autoregressive parameter is 2. Likewise,
the seasonal moving average parameter is 1 from the ACF. The seasonal part of the model therefore has
parameters (2, 1, 1).
Now, with the seasonal ARIMA (2,1,1)(2,1,1), the analyst forecasts the series under study for the
specified periods; the Minitab output is given below:
Observations: The ACF of the residuals is not significant, and the p-values greater than 0.05 for the
Q-statistics indicate model adequacy. If adequacy is not established, the model needs to be diagnosed
for heteroskedasticity using ARCH and GARCH techniques, which is beyond the scope of this article.
Summary: Though ARIMA models are robust enough for accurate forecasts, historically they have not
enjoyed wide usage in the corporate world because of the complexities involved in model identification
and the considerable statistical knowledge required. With the advent of sophisticated statistical
software such as SAS, SPSS, R and Minitab, which automates the heavy computation, ARIMA is gaining
prominence in service industries such as BPOs and call centers for capacity planning and scheduling.
ARIMA models outperform conventional time series models such as moving averages and other
smoothing models in forecast accuracy, treatment of seasonality and long-term forecasts. The
disadvantages of ARIMA compared with conventional models are that forecasts are unreliable for
series with fewer than 50 data points and that the models are more sensitive to outliers in the
original series. However, if a high degree of accuracy is not of great concern, conventional time
series models may be employed, as they are simple and less time-consuming.
(Note: The authors have not directly referred to any existing papers or articles while writing this
paper. Any match with the views expressed in existing articles is purely coincidental.)