Time Series Forecasting
Time series forecasting involves making projections about future values on the basis of
historical and current data. A time series is a collection of observations of well-defined data
items collected at successive points in time; data collected irregularly or only once do not form
a time series. Forecasting is helpful in situations where statistical models can account for the
patterns underlying the data. Time series forecasting thus supports the process of making
statements about events whose actual outcomes have not yet been observed.
There are two main goals of time series analysis: (a) identifying the nature of the phenomenon
represented by the sequence of observations, and (b) forecasting (predicting future values of
the time series variable). Both of these goals require that the pattern of observed time series
data is identified and more or less formally described. Once the pattern is established, we can
interpret and integrate it with other data (i.e., use it in our theory of the investigated
phenomenon, e.g., seasonal commodity prices). Regardless of the depth of our understanding
and the validity of our interpretation (theory) of the phenomenon, we can extrapolate the
identified pattern to predict future events.
Smoothing
Smoothing is a very common statistical process. Smoothing techniques are used to remove
random variation (noise) from historical time series data. This allows us to better identify the
underlying patterns, primarily trend and seasonality, and to use those estimates to forecast
future data points. Moving
averages rank among the most popular techniques for the preprocessing of time series. They
are used to filter random "white noise" from the data, to make the time series smoother or
even to emphasize certain informational components contained in the time series. A moving
average just uses a predefined number of periods to calculate the average, and those periods
move as time passes.
The basic assumption behind averaging and smoothing models is that the time series is locally
stationary with a slowly varying mean. Hence, we take a moving (local) average to estimate the
current value of the mean and then use that as the forecast for the near future. This can be
considered as a compromise between the mean model and the random-walk-without-drift
model. The same strategy can be used to estimate and extrapolate a local trend. A moving
average is often called a "smoothed" version of the original series because short-term
averaging has the effect of smoothing out the bumps in the original series. By adjusting the
degree of smoothing (the width of the moving average), we can hope to strike some kind of
optimal balance between the performance of the mean and random walk models.
Simple Moving Average
The method of simple moving averages smoothes out random fluctuations of data. This method
is best used for short-term forecasts in the absence of seasonal or cyclical variations. On the
other hand, this method is not particularly good in situations where the series has a trend. The
forecast for the value of Y at time t+1 that is made at time t equals the simple average of the
most recent m observations:
Ŷt+1 = (Yt + Yt-1 + … + Yt-m+1) / m

where Ŷt+1 ("Y-hat") is the forecast of the time series at time t+1.
This average is centered at period t-(m+1)/2, which implies that the estimate of the local mean
will tend to lag behind the true value of the local mean by about (m+1)/2 periods. Thus, we say
the average age of the data in the simple moving average is (m+1)/2 relative to the period for
which the forecast is computed: this is the amount of time by which forecasts will tend to lag
behind turning points in the data.
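As a sketch, the forecast above can be computed in a few lines of plain Python (the function name and the demand figures are made up for illustration):

```python
def sma_forecast(y, m):
    """Forecast the next value as the simple average of the last m observations."""
    if len(y) < m:
        raise ValueError("need at least m observations")
    return sum(y[-m:]) / m

# Hypothetical demand series
demand = [42, 40, 43, 41, 45, 44]
print(sma_forecast(demand, 3))  # average of the last 3 observations
```

Note that a larger m smooths more aggressively but also increases the lag behind turning points, exactly as described above.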
Weighted Moving Averages (WMA)
The method of weighted moving averages is another averaging time series forecasting method
that smoothes out random fluctuations of data. This method is also best used for short-term
forecasts in the absence of seasonal or cyclical variations. The weighted moving average
forecast is computed as the weighted average of the most recent m observations, where the
most recent observation has the highest weight.
Simple moving average method assigns equal weights (1/m) to all m data points. Arguably,
recent observations provide more relevant information than do observations in the past. So we
want a weighting scheme that assigns decreasing weights to the more distant observations.
Weighted MA(m) = w1·Yt + w2·Yt-1 + w3·Yt-2 + … + wm·Yt-m+1

where:
Yt is the actual value of the dependent variable for period t
m is the number of time periods included in the average
w1, w2, …, wm are the weights

As weights are used to vary the effect of past data, on the basis that more recent data is
more important, the weights should increase toward the most recent observations and
always add up to 1.
As this method uses weights, it is more suitable in situations where the series has a trend. The
stronger the trend the more heavily recent data needs to be weighted. The forecaster should
remember, however, that if recent data is weighted too heavily, the resulting forecast might be
an overreaction to what is simply a random fluctuation. On the other hand, weighting too
lightly might result in an underreaction (lagging) to an actual change in the pattern.
Like the case of simple moving averages, the forecaster should experiment with different sets
of weights until a model which seems to be producing satisfactory results has been found.
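A minimal sketch of a weighted moving average forecast in plain Python (the function name, the weights, and the data are illustrative assumptions, not prescribed by the text):

```python
def wma_forecast(y, weights):
    """Weighted moving average forecast. weights[0] applies to the most
    recent observation; the weights are required to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    recent = y[-len(weights):][::-1]  # most recent observation first
    return sum(w * v for w, v in zip(weights, recent))

# Hypothetical demand series; heavier weight on the most recent value
demand = [42, 40, 43, 41, 45, 44]
print(wma_forecast(demand, [0.5, 0.3, 0.2]))
```

Shifting more weight onto the first element makes the forecast track recent changes more closely, at the risk of overreacting to noise, as the paragraph above warns.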
Advantages
It is easy to learn and apply.
It has a relatively low computational cost.
It can produce accurate forecasts.
It produces forecasts quickly.
It responds more rapidly to changes in the pattern.
It can produce more accurate forecasts than a SMA model if applied to a trended series.
Disadvantages
It fails to produce accurate forecasts if the data has cyclical or seasonal variations.
The actual data values have to be multiplied by some weights and this makes calculations
more difficult.
Components of Time Series Data
A variety of factors are likely influencing data. It is very important that these different
influences or components be separated or decomposed out of the 'raw' data levels. In general,
there are four types of components in time series analysis: Seasonality, Trend, Cycling
and Irregularity.
Xt = St · Tt · Ct · It
The first three components are deterministic which are called "Signals", while the last
component is a random variable, which is called "Noise". To be able to make a proper forecast,
we must know to what extent each component is present in the data. Hence, to understand
and measure these components, the forecast procedure involves initially removing the
component effects from the data (decomposition). After the effects are measured, making a
forecast involves putting back the components on forecast estimates (recomposition).
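The multiplicative relationship Xt = St · Tt · Ct · It can be illustrated with a toy recomposition in Python (all component values below are invented for illustration):

```python
# Toy illustration of Xt = St * Tt * Ct * It (all values hypothetical)
trend = [100, 102, 104, 106]      # Tt: slowly rising level
seasonal = [1.2, 0.8, 1.1, 0.9]   # St: repeating seasonal factors
cycle = [1.00, 1.00, 1.05, 1.05]  # Ct: slow business-cycle swing
noise = [1.01, 0.99, 1.00, 1.02]  # It: irregular component

# Recomposition: multiply the components back together
series = [s * t * c * i for s, t, c, i in zip(seasonal, trend, cycle, noise)]
print(series)

# Decomposition works in reverse: dividing the observed series by an
# estimated component removes that component's effect
deseasonalised = [x / s for x, s in zip(series, seasonal)]
print(deseasonalised)
```

In practice the components are estimated from the data rather than known; the point of the sketch is only the multiply-to-recompose, divide-to-decompose symmetry.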
Trend component
The trend is the long term pattern of a time series. A trend can be positive or negative
depending on whether the time series exhibits an increasing long term pattern or a decreasing
long term pattern. If a time series does not show an increasing or decreasing pattern then the
series is stationary in the mean. Examples of trend include an increase in population or a
large-scale shift in consumer demand.
Seasonal component
Seasonality occurs when the time series exhibits regular fluctuations during the same month (or
months) every year, or during the same quarter every year. For instance, retail sales peak
during the month of December.
Cyclical component
Any pattern showing an up and down movement around a given trend is identified as a cyclical
pattern. The duration of a cycle depends on the type of business or industry being analyzed.
Irregular component
This component is unpredictable. Every time series has some unpredictable component that
makes it a random variable. In prediction, the objective is to model all the components to the
point that the only component that remains unexplained is the random component.
Data vs Methods
The first step is therefore to identify the time series pattern and then fit the appropriate
smoothing technique to produce the forecast.
Exponential Smoothing Method
Forecasts produced using exponential smoothing methods are weighted averages of past
observations, with the weights decaying exponentially as the observations get older. In other
words, the more recent the observation the higher the associated weight. This framework
generates reliable forecasts quickly and for a wide spectrum of time series which is a great
advantage and of major importance to applications in industry.
Exponential smoothing methods are averaging methods (in fact, exponential smoothing is a
short name for an exponentially weighted moving average) that require only three pieces of
data: the forecast for the most recent time period (Ft), the actual value for that time period (Yt)
and the value of the smoothing constant (denoted by α).
Simple Exponential Smoothing
Simple exponential smoothing (usually referred to as exponential smoothing) is a time series
forecasting method that smoothes out random fluctuations of data. It is best used for short-
term forecasts in the absence of seasonal or cyclical variations. Similarly, the method does not
work very well if the series has a trend.
Exponential smoothing weights past data with weights that decrease exponentially with time,
thus adjusting for previous inaccuracies in forecasts. To do that, the method uses a weighting
factor (known as the smoothing constant), which reflects the weight given to the most recent
data values.
Ft = α·Yt-1 + (1 - α)·Ft-1

Where:
Ft = forecast value for period t
Yt-1 = actual value for period t-1
Ft-1 = forecast value for period t-1
α = alpha (smoothing constant)

The value of the smoothing constant α lies between 0 and 1. Its value determines the degree of
smoothing and how responsive the model is to fluctuations in the data. The larger the value
given to α, the more strongly the model reacts to the most recent data.

When the value of α is close to 1, the new forecast will include a substantial adjustment for any
error that occurred in the preceding forecast. On the other hand, when the value of α is close
to 0, the new forecast will be very similar to the old one.

If a time series is fluctuating erratically, as a result of random variability, the forecaster should
choose a small value of α. On the other hand, the forecaster should choose a larger value of α
if the series is more stable and shows little random fluctuation.

If it is desired that predictions be stable and random variations smoothed, then a small value of
α is required. If a rapid response to a real change in the pattern of observations is desired, then
a larger value of α is appropriate.
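A minimal sketch of simple exponential smoothing in plain Python, following the recursion Ft = α·Yt-1 + (1 - α)·Ft-1 (the demand series and the choice of initial forecast are assumptions):

```python
def ses(y, alpha, f0=None):
    """Simple exponential smoothing: F[t] = alpha*Y[t-1] + (1-alpha)*F[t-1].
    Returns the one-step-ahead forecast for the period after the series ends.
    The initial forecast f0 defaults to the first observation (one common
    convention; other initialisations are possible)."""
    f = y[0] if f0 is None else f0
    for obs in y:
        f = alpha * obs + (1 - alpha) * f
    return f

demand = [42, 40, 43, 41, 45, 44]
print(ses(demand, alpha=0.3))  # small alpha: smooth, stable forecast
print(ses(demand, alpha=0.9))  # large alpha: reacts strongly to latest value
```

With alpha=0.9 the forecast sits close to the last observed value (44), while alpha=0.3 keeps it nearer the long-run average, matching the discussion of α above.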
Like moving averages, simple exponential smoothing fails to produce accurate forecasts if the
series has a significant trend or a seasonal variation. There are other versions of exponential
smoothing which can
handle strong trend patterns (Holt's method) or strong trend and seasonal variation patterns
(Winter's method).
The advantages of exponential smoothing are as follows:
It is easy to learn and apply.
It has a relatively low computational cost.
It can produce accurate forecasts.
It can produce forecasts quickly.
It gives greater weight to more recent observations.
It requires a significantly smaller amount of data to be stored compared to the methods of
moving averages.
It considers the data as a whole and does not require cut-off points as is the case with the
methods of moving averages.
The value of the smoothing constant can be altered to fit the model to different
circumstances.
The disadvantages of exponential smoothing are as follows:
It has a tendency to produce forecasts that lag behind the actual trend.
It fails to produce accurate forecasts if the data has cyclical or seasonal variations.
It does not handle trend very well.
The forecasts generated by an exponential smoothing model are sensitive to the
specification of the smoothing constant.
Holt’s Trend Exponential Smoothing
Holt (1957) extended simple exponential smoothing to allow forecasting of data with a trend.
This method involves a forecast equation and two smoothing equations (one for the level and
one for the trend):
The model: separate smoothing equations for level and trend

Level equation: Lt = α·Yt + (1 - α)(Lt-1 + bt-1)
Trend equation: bt = β(Lt - Lt-1) + (1 - β)bt-1
Forecast equation: Ft+h = Lt + h·bt

Where:
Lt denotes an estimate of the level of the series at time t
bt denotes an estimate of the trend (slope) of the series at time t
α is the smoothing parameter for the level, 0 < α < 1
β is the smoothing parameter for the trend, 0 < β < 1
As with simple exponential smoothing, the level equation here shows that Lt is a weighted
average of observation yt and the within-sample one-step-ahead forecast for time t, here given
by Lt−1+bt−1. The trend equation shows that bt is a weighted average of the estimated trend at
time t based on Lt −Lt−1 and bt−1, the previous estimate of the trend.
The forecast function is no longer flat but trending. The h-step-ahead forecast is equal to the
last estimated level plus h times the last estimated trend value. Hence the forecasts are a linear
function of h.
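The three equations above can be sketched in plain Python (the initialisation choices and the sales data are assumptions, not prescribed by the text):

```python
def holt_forecast(y, alpha, beta, h=1):
    """Holt's linear trend method following the level/trend equations above.
    Initialises the level to the first value and the trend to the first
    difference (one simple convention among several)."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)   # level equation
        trend = beta * (level - prev_level) + (1 - beta) * trend  # trend equation
    return level + h * trend  # forecast equation: F[t+h] = L[t] + h*b[t]

# Hypothetical upward-trending sales series
sales = [10, 12, 13, 15, 16, 18]
print(holt_forecast(sales, alpha=0.8, beta=0.2, h=1))
```

Because the h-step forecast is the last level plus h times the last trend, forecasts further ahead lie on a straight line, as noted above.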
Holt-Winters Methods
Holt-Winters method is an exponential smoothing approach for handling SEASONAL data. In
some time series, seasonal variation is so strong it obscures any trends or cycles, which are very
important for the understanding of the process being observed. Winter's smoothing method
can remove seasonality and make long-term fluctuations in the series stand out more clearly.
Two Holt-Winters methods are designed for time series that exhibit a linear trend:
Additive Holt-Winters method: used for time series with constant (additive) seasonal
variations.
Multiplicative Holt-Winters method: used for time series with increasing (multiplicative)
seasonal variations.
The multiplicative Holt-Winters method is the better known of the two methods.
With the additive method, the seasonal component is expressed in absolute terms in the scale
of the observed series, and in the level equation the series is seasonally adjusted by subtracting
the seasonal component. Within each year the seasonal component will add up to
approximately zero. With the multiplicative method, the seasonal component is expressed in
relative terms (percentages) and the series is seasonally adjusted by dividing through by the
seasonal component. Within each year, the seasonal component will sum up to
approximately s (the period of the seasonality).
The basic equations for the Holt-Winters additive method are:

Level: Lt = α(Yt - St-s) + (1 - α)(Lt-1 + bt-1)
Trend: bt = β(Lt - Lt-1) + (1 - β)bt-1
Seasonal: St = γ(Yt - Lt) + (1 - γ)St-s
Forecast: Ft+m = Lt + m·bt + St-s+m

The basic equations for the Holt-Winters multiplicative method are:

Level: Lt = α(Yt / St-s) + (1 - α)(Lt-1 + bt-1)
Trend: bt = β(Lt - Lt-1) + (1 - β)bt-1
Seasonal: St = γ(Yt / Lt) + (1 - γ)St-s
Forecast: Ft+m = (Lt + m·bt) · St-s+m

Where s is the period of the seasonality and:
Lt denotes an estimate of the level of the series at time t
bt denotes an estimate of the trend (slope) of the series at time t
St denotes an estimate of the seasonality of the series at time t
α is the smoothing parameter for the level, 0 < α < 1
β is the smoothing parameter for the trend, 0 < β < 1
γ is the smoothing parameter for the seasonality, 0 < γ < 1
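A minimal sketch of the additive method in plain Python, following the equations above (the initialisation scheme and the quarterly data are simplifying assumptions; production implementations fit the parameters and initial states more carefully):

```python
def holt_winters_additive(y, s, alpha, beta, gamma, m=1):
    """Additive Holt-Winters per the equations above. s is the season length;
    returns the m-step-ahead forecast F[t+m] made at the end of the series."""
    level = sum(y[:s]) / s                        # mean of the first season
    trend = (sum(y[s:2 * s]) - sum(y[:s])) / s**2  # average per-period change
    season = [y[i] - level for i in range(s)]      # initial seasonal offsets
    for t in range(s, len(y)):
        prev_level = level
        level = alpha * (y[t] - season[t - s]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season.append(gamma * (y[t] - level) + (1 - gamma) * season[t - s])
    # F[t+m] = L[t] + m*b[t] + S[t-s+m]
    return level + m * trend + season[len(y) - s + m - 1]

# Hypothetical quarterly series with seasonality and a mild upward trend
q = [30, 40, 50, 36, 33, 44, 55, 40, 37, 48, 60, 44]
print(holt_winters_additive(q, s=4, alpha=0.3, beta=0.1, gamma=0.2))
```

The forecast re-attaches the most recent estimate of the relevant season's offset to the trended level, which is exactly the recomposition step described earlier in the decomposition discussion.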
Advantages
Used for short term forecast
Gives interpretable results
Easy to implement
Being able to adapt to changes in trends and seasonal patterns
Disadvantages
Presence of outliers distorts the results
Can’t generalize to multivariate approach
Accounts for only single seasonal pattern
Forecasts are based only on past values of the forecast variable itself; no explanatory variables can be included
Holt-Winters vs ARIMA
Simple exponential smoothing is equivalent to an ARIMA model with the hyperparameters
(p,d,q) set to (0,1,1). These models have a difficult time adapting to time series data
exhibiting a trend or drift.
Simple Exponential Smoothing – ARIMA(0,1,1)
Double exponential smoothing is designed to account for data that is trending by
incorporating a second recursive term and hyper-parameter that can be tuned to
account for this drift. However, double exponential smoothing models have a difficult
time capturing cyclicality in the data.
Double Exponential Smoothing – ARIMA(0,2,2)
Triple exponential smoothing adds yet another recursive term and two hyper-parameters (a
third “exponential parameter” and a season-period parameter) to capture this seasonality.
Triple Exponential Smoothing – S-ARIMA(0,1,M+1)(0,1,0)m (Additive)
Multiplicative models are not derived from ARIMA
Measuring forecast accuracy
Part of the decision to use a particular forecasting model must be based upon the forecaster’s
belief that, when implemented, the model will work reasonably well. Since modelling involves
simplification, it would be unrealistic to expect a forecasting model to predict perfectly all the
time. On the other hand, it would be realistic to expect to find a model that produces relatively
small forecast errors.
The forecast error (et) for a time period is the difference between the actual value (Yt) and the
forecast value (Ft) for that period: et = Yt - Ft.
The purpose of measuring forecast accuracy is to:
Produce a single measure of a model’s usefulness or reliability.
Compare the accuracy of two forecasting models.
Search for an optimal model.
By measuring forecast accuracy, the forecaster can carry out a validation study. In other words,
the forecaster can try out a number of different forecasting models on some historical data, in
order to see how each of these models would have worked had it been used in the past.
This part of the section introduces four simple tests that can be used to measure forecast
accuracy: the mean absolute deviation, the mean square error, the root mean square error, and
the mean absolute percentage error. All these tests measure the average forecast error of the
forecasts produced by the various forecasting models and are commonly used in time series
forecasting in order to assess the accuracy of the various forecasting models.
Mean Absolute Deviation (MAD)
The mean absolute deviation measures forecast accuracy by averaging the magnitudes of the
forecast errors. The test is based on the following relation:
MAD = Σ|et| / n

where:
et is the forecast error for period t
n is the number of forecast errors
The test uses the absolute values of the forecast errors in order to avoid positive and negative
values cancelling out when added up together.
Mean Square Error (MSE)
The mean square error measures forecast accuracy by averaging the squares of the forecast
errors. The test is based on the following relation:
MSE = Σet² / n

where:
et is the forecast error for period t
n is the number of forecast errors
The reason why the forecast errors are squared is to remove all negative terms before the
values are added up. Using the squares of the errors achieves the same outcome as using
the absolute values of the errors, as the square of a number is always non-negative. The root
mean square error (RMSE) is simply the square root of the MSE, which expresses the error in
the same units as the data.
Mean Absolute Percentage Error (MAPE)
Errors in measurement are often expressed as a percentage of relative error. The mean
absolute percentage error expresses each et value as a percentage of the corresponding Yt value
using the following relation:
MAPE = (Σ|et / Yt| / n) × 100

where:
Yt is the actual value of the dependent variable for period t
et is the forecast error for period t
n is the number of forecast errors
The advantages of the mean absolute percentage error over other tests of forecast accuracy
are as follows:
It relates each forecast error to its actual data value.
It is easier to interpret as it expresses the forecast error as a percentage of the actual
data.
It is very simple to use.
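The four accuracy measures can be sketched in a few lines of plain Python (the actual and forecast values below are invented for illustration):

```python
def mad(actual, forecast):
    """Mean absolute deviation: average magnitude of the forecast errors."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(abs(e) for e in errors) / len(errors)

def mse(actual, forecast):
    """Mean square error: average of the squared forecast errors."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return sum(e * e for e in errors) / len(errors)

def rmse(actual, forecast):
    """Root mean square error: MSE expressed in the data's own units."""
    return mse(actual, forecast) ** 0.5

def mape(actual, forecast):
    """Mean absolute percentage error: each error relative to its actual value."""
    n = len(actual)
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n

# Hypothetical actuals vs forecasts from some model
actual = [100, 110, 120, 130]
forecast = [102, 108, 123, 126]
print(mad(actual, forecast), mse(actual, forecast),
      rmse(actual, forecast), mape(actual, forecast))
```

Running all four on the same validation data, as suggested below, gives a quick cross-check of whether one model is consistently better than another.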
Which forecast accuracy measure to use is a matter of personal preference, as they all have
their own advantages and limitations. Different measures of forecast error will not necessarily
produce the same results, as no single error measure has been shown to give an unambiguous
indication of forecast accuracy. However, if a forecasting model is by far superior to the others,
then all tests should agree. If the various tests do not agree, then that would be an indication
that no single forecasting model is far better than the others. Since tools such as Excel make
the calculations easy, it is a good idea to use more than one measure of forecast accuracy
when assessing different forecasting models.