2. Frequently there is a time lag between awareness of an impending event or need and the occurrence of that event.
This lead-time is the main reason for planning and forecasting. If the lead-time is zero or very small, there is no need for planning.
If the lead-time is long, and the outcome of the final event is conditional on identifiable factors, planning can perform an important role.
In management and administrative situations the need for planning is great, because the lead-time for decision making ranges from several years to a few days, hours, or even seconds.
Therefore, forecasting is an important aid in effective and efficient planning.
Introduction
3. Forecasting is a prediction of what will occur in the future, and it is an uncertain process.
One of the most powerful methodologies for generating forecasts is time series analysis.
A data set containing observations on a single phenomenon observed over multiple time periods is called a time series. In time series data, both the values and the ordering of the data points have meaning. For many agricultural products, data are usually collected over time.
Introduction…
4. Time series analysis and its applications have become increasingly important in various fields of research, such as business, economics, agriculture, engineering, medicine, social sciences, politics, etc.
Given that "Time is Money" in business activities, the time series analysis techniques presented here are a necessary tool for supporting a wide range of managerial decisions in which time and money are directly related.
Introduction
5. On the time scale we stand at a certain point, called the point of reference (Yt), and we look backward over past observations (Yt-1, Yt-2, …, Yt-n+1) and forward into the future (Ft+1, Ft+2, …, Ft+m).
Once a forecasting model has been selected, we fit the model to the known data and obtain the fitted values. For the known observations this allows calculation of the fitted errors (Yt - Ft), a measure of the goodness-of-fit of the model; as new observations become available we can examine the forecasting errors (Yt+1 - Ft+1).
Forecasting scenario
6. Measuring forecast accuracy
Mean error: ME = (1/n) Σ(t=1..n) et
Mean absolute error: MAE = (1/n) Σ(t=1..n) |et|
Mean squared error: MSE = (1/n) Σ(t=1..n) et²
Percentage error: PEt = 100*(Yt - Ft)/Yt
Mean percentage error: MPE = (1/n) Σ(t=1..n) PEt
Mean absolute percentage error: MAPE = (1/n) Σ(t=1..n) |PEt|
where et = Yt - Ft is the forecast error for period t
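As an illustration, here is a minimal plain-Python sketch of these measures (the function name and the example data are mine, not from the slides):

```python
def accuracy_measures(actual, forecast):
    """Compute ME, MAE, MSE, MPE and MAPE for paired actuals and forecasts."""
    errors = [y - f for y, f in zip(actual, forecast)]          # e_t = Y_t - F_t
    pct_errors = [100 * (y - f) / y for y, f in zip(actual, forecast)]  # PE_t
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        "MPE": sum(pct_errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
    }

# Hypothetical actuals vs. forecasts, just to exercise the function
print(accuracy_measures([100, 110, 120], [95, 115, 118]))
```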
7. Main components of time series data
There are four types of components in time series analysis:
Seasonal component (S)
Trend component (T)
Cyclical component (C)
Irregular component (I)
Yt = f(St, Tt, Ct, It)
Decomposition removes each component in turn: the seasonal component by smoothing, the trend by regression, and the cyclical component by percentage ratios, leaving the irregular component I.
8. Moving averages
Smoothing techniques are used to reduce irregularities (random fluctuations) in time series data.
Moving averages rank among the most popular techniques for the preprocessing of time series. They are used to filter random "white noise" from the data and to make the time series smoother.
There are several methods of moving averages:
Simple moving averages
Double moving averages
Centered moving averages
Weighted moving averages
9. Simple Moving Averages
The moving average (MA) is an effective and efficient approach, provided the time series is stationary in both mean and variance.
The simple moving average uses an odd number of observations so that each average is centered at the middle of the data values being averaged.
Take a certain number of past periods and add them together; dividing by the number of periods gives the simple moving average.
The following formula is used to find the moving average of order n, MA(n), for period t+1 (see the sketch below):
MAt+1 = [Yt + Yt-1 + ... + Yt-n+1] / n
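A minimal plain-Python sketch of this formula, checked against the first rows of the example that follows (the function name is mine):

```python
def moving_average_forecasts(y, n):
    """MA(n) forecast for period t+1 = mean of the n most recent observations."""
    return [sum(y[t - n + 1 : t + 1]) / n for t in range(n - 1, len(y))]

# First five months of the example below (Jan to May)
y = [266.0, 145.9, 183.1, 119.3, 180.3]
print(moving_average_forecasts(y, 3))  # 198.33... is the 3 MA forecast for April
```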
10. Example
Month | Period | Observed values | Three-month MA (3 MA) | Five-month MA (5 MA)
Jan | 1 | 266.0 | - | -
Feb | 2 | 145.9 | - | -
Mar | 3 | 183.1 | - | -
Apr | 4 | 119.3 | 198.333 | -
May | 5 | 180.3 | 149.433 | -
Jun | 6 | 168.5 | 160.900 | 178.92
Jul | 7 | 231.8 | 156.033 | 159.42
Aug | 8 | 224.5 | 193.533 | 176.60
Sep | 9 | 192.8 | 208.267 | 184.88
Oct | 10 | 122.9 | 216.367 | 199.58
Nov | 11 | 336.5 | 180.067 | 188.10
Dec | 12 | 185.9 | 217.400 | 221.70
Jan | 13 | 194.3 | 215.100 | 212.52
Feb | 14 | 149.5 | 238.900 | 206.48
13. Centered Moving Average
Suppose we wish to calculate a moving average with an even number of observations, for example a 4-term moving average (4 MA).
The center of the first moving average is at 2.5, while the center of the second moving average is at 3.5, so neither is aligned with an actual period.
The average of these two moving averages, however, is centered at 3.
Therefore, this problem can be overcome by taking an additional 2-period moving average of the 4-period moving average.
This centered moving average is denoted 2 × 4 MA.
14. Example
Month | Period | Observed values | 4 MA | 2 × 4 MA
Jan | 1 | 266.0 | - | -
Feb | 2 | 145.9 | 178.6 | -
Mar | 3 | 183.1 | 157.6 | 167.863
Apr | 4 | 119.3 | 162.8 | 159.975
May | 5 | 180.3 | 174.9 | 168.887
Jun | 6 | 168.5 | 201.3 | 188.125
Jul | 7 | 231.8 | 204.4 | 202.837
Aug | 8 | 224.5 | 193.0 | 198.700
Sep | 9 | 192.8 | 219.2 | 206.088
Oct | 10 | 122.9 | 209.5 | 214.350
Nov | 11 | 336.5 | 209.9 | 209.712
Dec | 12 | 185.9 | 216.6 | -
Jan | 13 | 194.3 | - | -
Feb | 14 | 149.5 | - | -
(Each 4 MA is centered halfway between periods, the first at 2.5; averaging two adjacent 4 MA values centers the 2 × 4 MA on an actual period.)
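A short sketch of the 2 × 4 MA calculation in plain Python (the helper name is mine):

```python
def centered_ma_2x4(y):
    """2 x 4 MA: average adjacent 4-term moving averages so the result
    is centered on an actual period (t = 3, 4, ...)."""
    ma4 = [sum(y[i : i + 4]) / 4 for i in range(len(y) - 3)]  # centers 2.5, 3.5, ...
    return [(a + b) / 2 for a, b in zip(ma4, ma4[1:])]        # centers 3, 4, ...

y = [266.0, 145.9, 183.1, 119.3, 180.3, 168.5]
print(centered_ma_2x4(y))  # first value ~167.86, centered on March (t = 3)
```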
15. Weighted Moving Average
This method is very powerful compared with simple moving averages.
The weighted MA(3) can be expressed as
Weighted MA(3) = w1·Yt + w2·Yt-1 + w3·Yt-2
where w1, w2, and w3 are weights.
There are many schemes for selecting appropriate weights (Kendall, Stuart, and Ord, 1983).
The weights are any positive numbers such that
w1 + w2 + w3 = 1
One method of calculating the weights is
w1 = 3/(1 + 2 + 3) = 3/6, w2 = 2/6, and w3 = 1/6
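A quick sketch of the weighted MA(3) with the weights above (the function name is mine):

```python
def weighted_ma3(y, weights=(3/6, 2/6, 1/6)):
    """Weighted MA(3) = w1*Y_t + w2*Y_(t-1) + w3*Y_(t-2); weights sum to 1."""
    w1, w2, w3 = weights
    return [w1 * y[t] + w2 * y[t - 1] + w3 * y[t - 2] for t in range(2, len(y))]

print(weighted_ma3([266.0, 145.9, 183.1, 119.3]))  # values for t = 3 and t = 4
```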
16. Exponential Smoothing Techniques
One of the most successful forecasting methods is exponential smoothing (ES).
ES is an averaging technique that uses unequal weights, assigning exponentially decreasing weights as the observations get older.
There are several exponential smoothing techniques:
Single exponential smoothing
Holt's linear method
Holt-Winters' trend and seasonality method
17. Single Exponential Smoothing
Single exponential smoothing takes the forecast for the previous period and adjusts it using the forecast error, where forecast error = (Yt - Ft):
Ft+1 = Ft + a(Yt - Ft)
Ft+1 = aYt + (1 - a)Ft
where:
Yt is the actual value
Ft is the forecast value
a is the weighting factor (smoothing constant), which ranges from 0 to 1
t is the current time period
18. Choosing the Best Value for Parameter a (alpha)
In practice, the smoothing parameter is often chosen by a grid search of the parameter space.
That is, values of a from 0.1 to 0.9 are tried, in increments of 0.1.
The value of a producing the smallest sum of squares (or mean square) of the residuals is then chosen, as in the sketch below.
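A compact sketch of the recursion and the grid search in plain Python, initialising F2 = Y1 as in the example that follows (the function names are mine):

```python
def ses_forecasts(y, alpha):
    """Single exponential smoothing: F_(t+1) = a*Y_t + (1 - a)*F_t, F_2 = Y_1."""
    f = [y[0]]                                  # forecast for period 2
    for t in range(1, len(y)):
        f.append(alpha * y[t] + (1 - alpha) * f[-1])
    return f                                    # f[i] forecasts period i + 2

def best_alpha(y, grid=[g / 10 for g in range(1, 10)]):
    """Grid search: pick the alpha with the smallest mean squared error."""
    def mse(alpha):
        f = ses_forecasts(y, alpha)
        return sum((y[t] - f[t - 1]) ** 2 for t in range(1, len(y))) / (len(y) - 1)
    return min(grid, key=mse)

# Observed values from the example below
y = [200.0, 135.0, 195.0, 197.5, 310.0, 175.0, 155.0, 130.0, 220.0, 277.5, 235.0]
print(best_alpha(y))
```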
19. Example: exponentially smoothed values (forecasts)
Month | Period | Observed values | a = 0.1 | a = 0.5 | a = 0.9
Jan | 1 | 200.0 | - | - | -
Feb | 2 | 135.0 | 200.0 | 200.0 | 200.0
Mar | 3 | 195.0 | 193.5 | 167.5 | 141.5
Apr | 4 | 197.5 | 193.7 | 181.3 | 189.7
May | 5 | 310.0 | 194.0 | 189.4 | 196.7
Jun | 6 | 175.0 | 205.6 | 249.7 | 298.7
Jul | 7 | 155.0 | 202.6 | 212.3 | 187.4
Aug | 8 | 130.0 | 197.8 | 183.7 | 158.2
Sep | 9 | 220.0 | 191.0 | 156.8 | 132.8
Oct | 10 | 277.5 | 193.9 | 188.4 | 211.3
Nov | 11 | 235.0 | 202.3 | 233.0 | 270.9
Dec | 12 | - | 205.6 | 234.0 | 238.6
20. Analysis of Errors (Test period: 2 – 11)
Measure | a = 0.1 | a = 0.5 | a = 0.9
Mean Error | 5.56 | 6.80 | 4.29
Mean Absolute Error | 47.76 | 56.94 | 61.32
Mean Absolute Percentage Error (MAPE) | 24.58 | 29.20 | 30.81
Mean Square Error (MSE) | 3438.33 | 4347.24 | 5039.37
Theil's U-statistic | 0.81 | 0.92 | 0.98
22. Holt's linear method
Holt (1957) extended single exponential smoothing to linear exponential smoothing to allow forecasting of data with trends.
The forecast for Holt's linear exponential smoothing is found using two smoothing constants, a and β (values between 0 and 1), and three equations:
Smoothing of data (level): Lt = aYt + (1 - a)(Lt-1 + bt-1)
Smoothing of trend: bt = β(Lt - Lt-1) + (1 - β)bt-1
Forecast for m periods ahead: Ft+m = Lt + bt·m
The initialization process: L1 = Y1 and
b1 = Y2 - Y1 or b1 = (Y4 - Y1)/3
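A minimal sketch of these three equations in plain Python (the function name and toy series are mine):

```python
def holt_linear(y, alpha, beta, m=1):
    """Holt's linear method: level and trend recursions, then the
    m-step-ahead forecasts F_(t+h) = L_t + h * b_t."""
    level, trend = y[0], y[1] - y[0]          # L1 = Y1, b1 = Y2 - Y1
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(m)]

y = [10.0, 12.0, 13.5, 15.2, 17.1, 18.9]      # made-up trending series
print(holt_linear(y, alpha=0.5, beta=0.3, m=3))  # forecasts 1 to 3 steps ahead
```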
23. Holt-Winters' trend and seasonality method
Holt's method was extended by Winters (1960) to capture seasonality.
The Holt-Winters method is based on three smoothing equations, one for the level, one for the trend, and one for seasonality (multiplicative form, season length s):
Level: Lt = a(Yt / St-s) + (1 - a)(Lt-1 + bt-1)
Trend: bt = β(Lt - Lt-1) + (1 - β)bt-1
Seasonal: St = γ(Yt / Lt) + (1 - γ)St-s
Forecast: Ft+m = (Lt + bt·m)St-s+m
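For practical use, here is a sketch with statsmodels' Holt-Winters implementation, assuming the library is installed; the simulated monthly series and all settings are illustrative, with the multiplicative seasonal option mirroring the Yt/St-s form above:

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Simulated positive monthly series with trend and a 12-month season
rng = np.random.default_rng(0)
t = np.arange(48)
y = 100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 48)

fit = ExponentialSmoothing(y, trend="add", seasonal="mul",
                           seasonal_periods=12).fit()
print(fit.forecast(12))  # forecasts for the next seasonal cycle
```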
25. Seasonal Factor (Ratio-to-moving-average)
Season | Sales | Average Sales | Seasonal Factor
Spring | 200 | 250 | 200/250 = 0.8
Summer | 350 | 250 | 350/250 = 1.4
Fall | 300 | 250 | 300/250 = 1.2
Winter | 150 | 250 | 150/250 = 0.6
Total | 1000 | 1000 |
26. If next year's expected sales increase by 10% (total 1100):
Season | Average Sales (1100/4) | Calculation | Forecast
Spring | 275 | 275 × 0.8 | 220
Summer | 275 | 275 × 1.4 | 385
Fall | 275 | 275 × 1.2 | 330
Winter | 275 | 275 × 0.6 | 165
Total | 1100 | | 1100
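The arithmetic of the two tables above in a few lines of Python:

```python
sales = {"Spring": 200, "Summer": 350, "Fall": 300, "Winter": 150}
average = sum(sales.values()) / len(sales)             # 250
factors = {s: v / average for s, v in sales.items()}   # 0.8, 1.4, 1.2, 0.6

next_total = sum(sales.values()) * 1.10                # 10% growth -> 1100
next_avg = next_total / len(sales)                     # 275
forecast = {s: next_avg * f for s, f in factors.items()}
print(forecast)  # Spring 220, Summer 385, Fall 330, Winter 165
```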
27. The table below represents the quarterly sales figures:
Year | Q1 | Q2 | Q3 | Q4
2008 | 20 | 30 | 39 | 60
2009 | 40 | 51 | 62 | 81
2010 | 50 | 64 | 74 | 85
32. General overview of forecasting techniques
Classification of the widely used forecasting techniques:
Causal models: regression analysis
Time series models:
Smoothing techniques (moving averages and exponential smoothing)
Box-Jenkins processes
33. Box-Jenkins Modeling Approach to Forecasting (ARIMA models, Box and Jenkins 1970)
1. Plot the series.
2. Is the variance stable? If not, apply a transformation.
3. Obtain the ACFs and PACFs. Is the mean stable? If not, apply regular and seasonal differencing.
4. Select a model.
5. Estimate the parameter values.
6. Are the residuals uncorrelated? If not, modify the model and return to estimation.
7. Are the parameters significant? If not, modify the model; if yes, proceed to forecasting.
34. Autocorrelation function
The key statistic in time series analysis is the autocorrelation coefficient (the correlation of the time series with itself, lagged by 1, 2, or more periods), which is given by the following formula:
rk = Σ(t=k+1..n) (Yt - Ȳ)(Yt-k - Ȳ) / Σ(t=1..n) (Yt - Ȳ)²
Then r1 indicates how successive values of Y relate to each other, r2 indicates how Y values two periods apart relate to each other, and so on.
The autocorrelations at lags 1, 2, …, make up the autocorrelation function or ACF.
35. Partial autocorrelation function
Partial autocorrelations are used to measure the degree of association between Yt and Yt-k when the effects of the other time lags 1, 2, 3, …, k-1 are removed.
Suppose there is a significant autocorrelation between Yt and Yt-1. Then there will also be a significant correlation between Yt-1 and Yt-2, since they are also one time unit apart.
Consequently, there will be a correlation between Yt and Yt-2, because both are related to Yt-1.
So, to measure the real correlation between Yt and Yt-2, we need to take out the effect of the intervening value Yt-1. This is what partial autocorrelation does.
36. Autocorrelation function: worked example (Ȳ = 6)
Yt | Yt-1 | Yt - Ȳ | Yt-1 - Ȳ | (Yt - Ȳ)(Yt-1 - Ȳ) | (Yt - Ȳ)²
2 | - | -4 | - | - | 16
3 | 2 | -3 | -4 | 12 | 9
5 | 3 | -1 | -3 | 3 | 1
7 | 5 | 1 | -1 | -1 | 1
9 | 7 | 3 | 1 | 3 | 9
10 | 9 | 4 | 3 | 12 | 16
Total | | | | 29 | 52
r1 = Σ(Yt - Ȳ)(Yt-1 - Ȳ) / Σ(Yt - Ȳ)² = 29/52 = 0.5577
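A sketch that reproduces r1 = 29/52 from the table above (the function name is mine):

```python
def acf(y, k):
    """Lag-k autocorrelation r_k as defined above."""
    n, ybar = len(y), sum(y) / len(y)
    num = sum((y[t] - ybar) * (y[t - k] - ybar) for t in range(k, n))
    den = sum((v - ybar) ** 2 for v in y)
    return num / den

print(acf([2, 3, 5, 7, 9, 10], 1))  # 29/52 = 0.5577...
```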
37. Sampling distribution of autocorrelations
Portmanteau Tests
An alternative is to examine a whole set of rk values, say the first 10 of them (r1 to r10), all at once, and then test whether the set is significantly different from a zero set. Such a test is known as a portmanteau test, and the two most common are the Box-Pierce test and the Ljung-Box Q* statistic.
The Box-Pierce Test
The Box-Pierce statistic is Q = n Σ(k=1..h) rk², which for a white-noise series is approximately chi-square distributed with h degrees of freedom.
38. Checking for Error Autocorrelation
• Test: Durbin-Watson statistic, with critical values tabulated for n and k, where k is the number of parameters in the model minus one:
d = Σ(i=2..n) (ei - ei-1)² / Σ(i=1..n) ei²
The value of d is read against the following zones:
0 to d-lower: positive autocorrelation (autocorrelation is clearly evident)
d-lower to d-upper: zone of indecision (ambiguous; cannot rule out autocorrelation)
d-upper to 4 - d-upper: no autocorrelation (autocorrelation is not evident)
4 - d-upper to 4 - d-lower: zone of indecision
4 - d-lower to 4: negative autocorrelation
39. • Value near 2 indicates non-autocorrelation
• Value toward 0 indicates positive autocorrelation
• Value toward 4 indicates negative autocorrelation
40. To test for positive autocorrelation at significance α,
the test statistic d is compared to lower and upper
critical values (dL,α and dU,α):
•If d < dL,α, there is statistical evidence that the error
terms are positively autocorrelated
•If d > dU,α, there is no statistical evidence that the
error terms are positively autocorrelated
•If dL,α < d < dU,α, the test is inconclusive
Positive serial correlation is serial correlation in which
a positive error for one observation increases the
chances of a positive error for another observation
41. To test for negative autocorrelation at significance α, the
test statistic (4 − d) is compared to lower and upper
critical values (dL,α and dU,α):
•If (4 − d) < dL,α, there is statistical evidence that the
error terms are negatively autocorrelated.
•If (4 − d) > dU,α, there is no statistical evidence that the
error terms are negatively autocorrelated.
•If dL,α < (4 − d) < dU,α, the test is inconclusive.
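A small sketch of the Durbin-Watson statistic itself (plain Python; the residuals are made up):

```python
def durbin_watson(residuals):
    """d = sum((e_i - e_(i-1))^2) / sum(e_i^2); near 2 means no autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

print(durbin_watson([0.5, -0.3, 0.8, -0.6, 0.2, -0.4]))  # alternating signs -> d > 2
```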
43. Stationarity of time series data
There is no growth or decline in the data; the data must be roughly horizontal along the time axis. In other words, the data fluctuate around a constant mean, independent of time, and the variance of the fluctuations remains essentially constant over time.
[Figure: a series stationary in mean and variance]
[Figure: ACF for non-stationary time series data (ACF vs. lag number, lags 1 to 16)]
44. Non-stationarity of time series data
[Figure: a series non-stationary in mean and variance]
[Figure: a series non-stationary in mean]
46. Unit Roots
• H0: δ = 0 (there is a unit root) not stationary
• HA: δ ≠ 0 (there is not a unit root) stationary
• If δ = 0, then we can rewrite equation 2 as
Δyt = εt
Thus first differences of a random walk time series are
stationary, because by assumption, εt is purely
random.
In general, a time series must be differenced d times to
become stationary; it is integrated of order d or I(d).
A stationary series is I(0). A random walk series is
I(1).
47. Tests for Unit Roots
• Dickey-Fuller test
– Estimates a regression equation
– The usual t-statistic is not valid, thus D-F
developed appropriate critical values.
– You can include a constant, trend, or both in the
test.
– If you accept the null hypothesis, you conclude
that the time series has a unit root.
– In that case, you should first difference the
series before proceeding with analysis.
48. Tests for Unit Roots
• Augmented Dickey-Fuller test
– We can use this version if we suspect there is autocorrelation in
the residuals.
– This model is the same as the DF test, but includes lags of the
residuals too.
• Phillips-Perron test
– Makes milder assumptions concerning the error term, allowing
for the εt to be weakly dependent and heterogenously
distributed.
• Other tests include Variance Ratio test, Modified
Rescaled Range test, & KPSS test.
• There are also unit root tests for panel data (Levin et al
2002).
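A sketch of the (augmented) Dickey-Fuller test via statsmodels, assuming it is installed; the random-walk series is simulated for illustration:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=200))   # I(1) by construction

stat, pvalue = adfuller(random_walk)[:2]
print(pvalue)                    # large p-value: cannot reject the unit root

stat, pvalue = adfuller(np.diff(random_walk))[:2]
print(pvalue)                    # small p-value: the first difference is stationary
```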
49. Removing non-stationarity in a time series
One way of removing non-stationarity is through the method of differencing.
The differenced series can be expressed as
Y't = Yt - Yt-1
The differenced series will have only n - 1 values, since it is not possible to calculate a difference for the first observation.
With non-stationary seasonal data it may be appropriate to take seasonal differences. A seasonal difference is the difference between an observation and the corresponding observation from the previous year. So for monthly data having an annual 12-month pattern:
Y't = Yt - Yt-12
50. Backshift Notation
• A very useful notational device is the backward shift operator B, which is used as follows:
BYt = Yt-1
• B operating on Yt has the effect of shifting the data back one period.
51. Repeated applications of B shift the data back further:
B(BYt) = B²Yt = Yt-2
B³Yt = Yt-3
Therefore, B^d Yt = Yt-d
55. Second Order Difference
Y''t = Y't - Y't-1
= (Yt - Yt-1) - (Yt-1 - Yt-2)
= Yt - 2Yt-1 + Yt-2
= Yt - 2BYt + B²Yt
= (1 - 2B + B²)Yt
= (1 - B)²Yt
56. Linear time series models
AR(p), ARIMA(p,0,0): Yt = C + φ1Yt-1 + φ2Yt-2 + φ3Yt-3 + ... + φpYt-p + et
MA(q), ARIMA(0,0,q): Yt = C + et - θ1et-1 - θ2et-2 - θ3et-3 - ... - θqet-q
ARMA(p,q): Yt = C + φ1Yt-1 + φ2Yt-2 + ... + φpYt-p + et - θ1et-1 - θ2et-2 - ... - θqet-q
ARIMA(p,d,q): the series is differenced d times, Y't = (1 - B)^d Yt, where B is the backshift operator (BYt = Yt-1)
ARIMA(p,1,q): (1 - φ1B)(1 - B)Yt = C + (1 - θ1B)et
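A sketch of fitting one of these models with statsmodels, assuming it is installed; the order (1,1,1) and the simulated series are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=200)) + 50   # non-stationary, so d = 1

model = ARIMA(y, order=(1, 1, 1)).fit()    # (p, d, q)
print(model.summary())
print(model.forecast(5))                   # forecasts 5 steps ahead
```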
57. ARIMA models for time series data
The general model introduced by Box and Jenkins (1970) includes
autoregressive as well as moving average parameters, and explicitly
includes differencing in the formulation of the model
In the notation introduced by Box and Jenkins, models are
summarized as ARIMA (p, d, q)
Three types of parameters in the model are:
autoregressive parameters (p), number of differencing passes (d), and
moving average parameters (q)
For example, a model described as (0, 1, 2) contains 0 (zero) autoregressive parameters and 2 moving average parameters, computed for the series after it was differenced once
58. Estimation of parameters
It is necessary to decide on (identify) the specific number and type of ARIMA parameters to be estimated.
The major tools used in the identification phase are plots of the series and correlograms of the autocorrelation (ACF) and partial autocorrelation (PACF) functions.
A majority of empirical time series patterns can be sufficiently approximated using one of the 5 basic models that can be identified based on the shape of the autocorrelogram (ACF) and partial autocorrelogram (PACF).
59. Estimation of parameters…
AR(1): ACF - exponential decay; PACF - spike at lag 1, no correlation for other lags
AR(2): ACF - a sine-wave pattern or a set of exponential decays; PACF - spikes at lags 1 and 2, no correlation for other lags
MA(1): ACF - spike at lag 1, no correlation for other lags; PACF - damps out exponentially
MA(2): ACF - spikes at lags 1 and 2, no correlation for other lags; PACF - a sine-wave pattern or a set of exponential decays
ARMA(1,1): ACF - exponential decay starting at lag 1; PACF - exponential decay starting at lag 1
60. ACF and PACF functions for AR(1) models
[Figures: ACF and PACF for an AR(1) model with φ1 > 0, and for an AR(1) model with φ1 < 0]
61. ACF and PACF functions for MA(1) models
[Figures: ACF and PACF for an MA(1) model with θ1 > 0, and for an MA(1) model with θ1 < 0]
62. Seasonal models
In addition to the non-seasonal parameters, seasonal parameters for a specified lag need to be estimated.
Analogous to the simple ARIMA parameters, these are: seasonal autoregressive (Ps), seasonal differencing (Ds), and seasonal moving average (Qs) parameters; seasonal models are summarized as ARIMA (p, d, q)(P, D, Q).
For example, the model (0,1,2)(0,1,1) describes a model that includes no autoregressive parameters, 2 regular moving average parameters, and 1 seasonal moving average parameter, computed for the series after it was differenced once at lag 1 and seasonally differenced once.
63. Seasonal models…
ARIMA(p,d,q)(P,D,Q)S:
The general recommendations concerning the selection of parameters to be estimated (based on the ACF and PACF) also apply to seasonal models.
With one parameter of each type, the model can be written as
(1 - φ1B)(1 - Φ1B^S)(1 - B)(1 - B^S)Yt = (1 - θ1B)(1 - Θ1B^S)et
where (1 - φ1B) is the non-seasonal AR(1) term, (1 - Φ1B^S) the seasonal AR(1) term, (1 - B) the non-seasonal difference, (1 - B^S) the seasonal difference, (1 - θ1B) the non-seasonal MA(1) term, and (1 - Θ1B^S) the seasonal MA(1) term.
64. ACF/PACF
• The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF respectively.
• For example, an ARIMA(0,0,0)(0,0,1)12 model will show a spike at lag 12 in the ACF but no other significant spikes.
• The PACF will show exponential decay in the seasonal lags; that is, at lags 12, 24, 36, ….
• Similarly, an ARIMA(0,0,0)(1,0,0)12 model will show exponential decay in the seasonal lags of the ACF and a single significant spike at lag 12 in the PACF.
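A sketch of a seasonal fit with statsmodels' SARIMAX, assuming it is installed; the order and the simulated monthly series are illustrative:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(3)
t = np.arange(120)
y = 100 + t + 15 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 120)

# ARIMA(0,1,1)(0,1,1)12: regular and seasonal differencing plus MA terms
model = SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(disp=False)
print(model.forecast(12))  # forecasts for the next 12 months
```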
72. Example: fitting a time series model to describe the average monthly Sri Lankan spot price of black pepper, 1949 – 1960, in Sri Lankan Rupees per kg
Month | '49 | '50 | '51 | '52 | '53 | '54 | '55 | '56 | '57 | '58 | '59 | '60
Jan | 112 | 115 | 145 | 171 | 196 | 204 | 242 | 284 | 315 | 340 | 360 | 417
Feb | 118 | 126 | 150 | 180 | 196 | 188 | 233 | 277 | 301 | 318 | 342 | 391
Mar | 132 | 141 | 178 | 193 | 236 | 135 | 267 | 317 | 356 | 362 | 406 | 419
Apr | 129 | 135 | 163 | 181 | 235 | 227 | 269 | 313 | 348 | 348 | 396 | 461
May | 121 | 125 | 172 | 183 | 229 | 234 | 270 | 318 | 355 | 363 | 420 | 472
Jun | 135 | 149 | 178 | 218 | 243 | 264 | 315 | 374 | 422 | 435 | 472 | 535
Jul | 148 | 170 | 199 | 230 | 264 | 302 | 364 | 413 | 465 | 491 | 548 | 622
Aug | 148 | 170 | 199 | 242 | 272 | 293 | 347 | 405 | 467 | 505 | 559 | 606
Sep | 136 | 158 | 184 | 209 | 237 | 259 | 312 | 355 | 404 | 404 | 463 | 508
Oct | 119 | 133 | 162 | 191 | 211 | 229 | 274 | 306 | 347 | 359 | 407 | 461
Nov | 104 | 114 | 146 | 172 | 180 | 203 | 237 | 271 | 205 | 310 | 362 | 390
Dec | 118 | 140 | 166 | 194 | 201 | 229 | 278 | 306 | 336 | 337 | 405 | 432
78. Comparison of ARIMA models with AIC
Akaike's Information Criterion (AIC) = -2 log L + 2m
where L = likelihood of the fitted model (which depends on the residual variance σ²)
m = p + q + P + Q, the number of estimated parameters
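A sketch of AIC-based comparison with statsmodels, assuming it is installed; the candidate orders are arbitrary examples:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = np.cumsum(rng.normal(size=150))            # simulated I(1) series

candidates = [(1, 1, 0), (0, 1, 1), (1, 1, 1), (2, 1, 1)]
aics = {order: ARIMA(y, order=order).fit().aic for order in candidates}
best = min(aics, key=aics.get)                 # smallest AIC wins
print(aics)
print("best order:", best)
```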
80. Conclusions
Moving average methods and single exponential smoothing emphasize the short-range perspective, on the condition that there is no trend and no seasonality.
Holt's linear exponential smoothing captures information about recent trend.
Holt's method was extended by Winters (1960) to capture seasonality.
The class of ARIMA models is useful for both stationary and non-stationary time series.
There are numerical indicators for assessing the accuracy of a forecasting technique; the most widely used approach is to use several indicators together.
81. References
Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control, San Francisco: Holden-Day.
Box, G. E. P. and Pierce, D. A. (1970). Distribution of residual autocorrelations in autoregressive-integrated moving average time series models, Journal of the American Statistical Association, 65, 1509-1526.
Gardner, E. S. (1985). Exponential smoothing: the state of the art, Journal of Forecasting, 4, 1-28.
Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages, Office of Naval Research, Research Memorandum No. 52.
Makridakis, S. and Hibon, M. (1979). Accuracy of forecasting: an empirical investigation, Journal of the Royal Statistical Society A, 142, 97-145.
Makridakis, S., Wheelwright, S. C. and Hyndman, R. J. (1998). Forecasting: Methods and Applications, 3rd edition, New York: John Wiley & Sons.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages, Management Science, 6, 324-342.
Yar, M. and Chatfield, C. (1990). Prediction intervals for the Holt-Winters forecasting procedure, International Journal of Forecasting, 6, 127-137.
82. ARCH Model
• At any point in a series, the error terms will
have a characteristic size or variance.
• In particular, ARCH models assume the variance of the current error term, or innovation, to be a function of the actual sizes of the previous time periods' error terms: often the variance is related to the squares of the previous innovations.