TIME SERIES ANALYSIS
Modeling and Forecasting
Presented by
Vaibhav Jain (A13021)
Maruthi Nataraj (A13009)
Sunil Kumar (A13020)
Punit Kishore (A13011)
Arbind Kumar (A13003)
AGENDA
 Introduction
 Objective
 Data Preparation
 Check for Volatility
 Check for Non-Stationarity
 Check for Seasonality
 Model Identification and Estimation
 Forecasting
 Graphical Forecast
INTRODUCTION
 A time series is a sequence of values taken by a variable over time (such as daily sales
revenue, weekly orders, monthly overheads, yearly income), recorded in chronological
order so that valid statistical inferences can be drawn from it.
 Trend: A time series may show a steady upward or downward movement, with little
fluctuation, over a period of years. This may be due to factors such as population
growth, technological progress, or a large-scale shift in consumer demand.
 Seasonal variations: Short-term fluctuations in a time series that recur periodically
within a year. The major factors responsible for the repetitive pattern are weather
conditions and the customs of people. For example, more ice cream is sold each year in
summer than in winter.
INTRODUCTION
 Cyclical variations: Recurrent upward or downward movements in a time
series whose period is longer than a year.
 Irregular variations: Fluctuations in a time series that are short in
duration, erratic in nature, and follow no regular pattern of occurrence (random).
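One common way to separate the trend from the seasonal component is a centered moving average over one full seasonal period. The sketch below is an illustration on a synthetic monthly series, not the method used in these slides; the function name `centered_moving_average` and the data are assumptions made for the example.

```python
def centered_moving_average(series, period=12):
    """Estimate the trend with a centered moving average of an even period.

    Returns a list the same length as `series`; positions too close to either
    end, where the window does not fit, are None.
    """
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        # Half weight on the two end points, full weight on the points between.
        window = series[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


# Synthetic monthly series: linear trend plus a seasonal pattern that sums to zero.
seasonal = [-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2]
series = [100 + 2 * t + seasonal[t % 12] for t in range(48)]

trend = centered_moving_average(series, period=12)
# Subtracting the trend leaves the seasonal component (plus noise, if any).
detrended = [y - m for y, m in zip(series, trend) if m is not None]
```

Because the window covers exactly one year, each month receives the same total weight, so a seasonal pattern that sums to zero cancels out of the trend estimate.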
OBJECTIVE
 The two main objectives of time series analysis are:
 To understand the underlying structure of the time series, represented by a sequence
of observations, by breaking it down into its components.
 To fit a mathematical model and use it to forecast the future.
 In this study, we project international airline travel for the next 12 months.
 The dataset used here is SASHELP.AIR, which is airline data and contains
two variables: DATE and AIR (labeled International Airline Travel).
It contains monthly data from JAN 1949 to DEC 1960.
DATA PREPARATION
Check for Volatility
 A plot of the data, with time on the horizontal axis and the time series on the
vertical axis, gives an indication of volatility (non-constant variance).
 A fan-shaped or inverted-fan-shaped plot indicates high volatility.
 For a fan-shaped plot, a ‘log’ or ‘square root’ transformation is used to
reduce volatility, while for an inverted-fan-shaped plot an ‘exponential’ or
‘square’ transformation is used.
(AIR data has been copied to the MAIR dataset.)
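The effect of these transformations on a fan-shaped series can be checked numerically. The following is a minimal sketch on synthetic data (not SASHELP.AIR); the helper `spread_ratio` is a hypothetical name, and it simply compares the overall spread of the second half of the series with that of the first half.

```python
import math
import statistics


def spread_ratio(series):
    """Ratio of the standard deviation of the second half to that of the first half."""
    mid = len(series) // 2
    return statistics.pstdev(series[mid:]) / statistics.pstdev(series[:mid])


# Synthetic fan-shaped series: the swings grow in proportion to the level.
level = [100 * 1.02 ** t for t in range(96)]
series = [lv * (1 + 0.2 * math.sin(2 * math.pi * t / 12)) for t, lv in enumerate(level)]

raw_ratio = spread_ratio(series)                            # well above 1: fan shape
sqrt_ratio = spread_ratio([math.sqrt(y) for y in series])   # reduced, still above 1
log_ratio = spread_ratio([math.log(y) for y in series])     # close to 1: constant spread
```

In this example the swings are proportional to the level, which is the case where the log transformation removes the fan shape completely; the square root only reduces it.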
DATA PREPARATION
Check for Volatility
Fan shaped plot indicating the presence of high volatility
DATA PREPARATION
Check for Volatility
After square root transformation
DATA PREPARATION
Check for Volatility
After log transformation, with reduced volatility (constant variance)
DATA PREPARATION
Check for Non-Stationarity
 If the statistical properties of the data (mean, variance, autocorrelation) change
over time, the data is called non-stationary, and an ARIMA model cannot be fitted to it
directly for forecasting. This is checked by the ‘Augmented Dickey-Fuller Unit Root
Test’ (ADF). Here,
 H0 : Data is non-stationary (has a unit root)
 If p < alpha, we reject H0 and conclude that the data is stationary and hence
can be used for forecasting.
 If p > alpha, we fail to reject H0 and treat the data as non-stationary; it can
usually be converted to stationary data by successive differencing.
 We can start with the first difference (y[t]-y[t-1]), which can be obtained using
DIF(L_AIR) in a DATA step or L_AIR(1) in PROC ARIMA. If a second difference is needed,
it is DIF(DIF(L_AIR)) or L_AIR(1,1). Note that DIF2(L_AIR) returns y[t]-y[t-2], the
lag-2 difference, not the second difference.
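For reference, the regression behind the ADF test is commonly written as follows. The symbols (the intercept, trend coefficient, lag count k, and error term) are notation introduced here for illustration and do not appear in the slides; the intercept and trend terms are included or dropped depending on the variant of the test.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1}
           + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad
H_0 : \gamma = 0 \ \text{(unit root, non-stationary)}, \quad
H_1 : \gamma < 0 \ \text{(stationary)}
```

The test statistic is the ratio of the estimate of gamma to its standard error, compared against Dickey-Fuller critical values rather than the usual t distribution.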
DATA PREPARATION
Check for Non-Stationarity
DATA PREPARATION
Check for Non-Stationarity
Non-stationary data is converted into stationary data by first differencing.
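Differencing itself is simple arithmetic, and a second difference is easy to confuse with a lag-2 difference. The sketch below uses a hypothetical helper `difference` and a small made-up series to show both.

```python
def difference(series, lag=1):
    """Return the lag-`lag` difference: y[t] - y[t-lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


y = [1, 4, 9, 16, 25, 36]       # series with a quadratic trend

first = difference(y)            # y[t] - y[t-1]  -> [3, 5, 7, 9, 11]
second = difference(first)       # difference of the first difference -> [2, 2, 2, 2]
lag2 = difference(y, lag=2)      # y[t] - y[t-2]  -> [8, 12, 16, 20]
```

The second difference is constant here, so it has removed the quadratic trend, while the lag-2 difference still trends upward. The same helper with `lag=12` gives the seasonal difference for monthly data.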
DATA PREPARATION
Check for Seasonality
 The autocorrelation function (ACF) gives the correlation between y[t] and y[t-s],
where ‘s’ is the lag.
 If the ACF shows high values at a fixed interval, that interval can be considered the
period of seasonality. Differencing at that lag will deseasonalize the data.
 In the previous ACF output, we can see that the period of seasonality is
12 months.
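To make the idea concrete, the sample ACF can be computed directly and scanned for its largest value at a non-zero lag. This is a minimal sketch on a synthetic series with a 12-month cycle, not the PROC ARIMA output; the function name `acf` is an assumption for the example.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation of `series` at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - s] for t in range(s, n)) / denom
        for s in range(max_lag + 1)
    ]


# Synthetic monthly series with a 12-month cycle (10 years of data).
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

r = acf(series, max_lag=24)
# Lag between 1 and 24 with the largest autocorrelation.
peak_lag = max(range(1, 25), key=lambda s: r[s])
```

For this series the largest autocorrelation beyond lag 0 falls at lag 12, which is the interval a seasonal difference would use.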
Here, we have deseasonalized the data by differencing at lag 12, as
shown above.
DATA PREPARATION
Check for Seasonality
MODEL IDENTIFICATION
AND ESTIMATION
 Depending on the number of future time points to be forecast, we set
aside a few of the most recent data points as the validation sample (V). The
rest of the data, the development sample (D), is used to generate forecasts
from different models.
 The MINIC (Minimum Information Criterion) option of the IDENTIFY statement in
PROC ARIMA reports the minimum-BIC (Bayesian Information Criterion) model after
exploring all the possible combinations of ‘p’ (autoregressive) and ‘q’ (moving
average) lags from 0 to 5 (default).
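The development/validation split described above amounts to holding out the last few observations. A minimal sketch, assuming a 12-point horizon and using index numbers in place of the actual 144 monthly values (the helper `split_holdout` is a hypothetical name):

```python
def split_holdout(series, horizon):
    """Split into a development sample and a validation sample of the last `horizon` points."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


# 144 monthly observations (Jan 1949 to Dec 1960), represented here by their index only.
observations = list(range(144))
development, validation = split_holdout(observations, horizon=12)
```

With a 12-month horizon, the development sample covers 1949 to 1959 (132 points) and the validation sample covers 1960 (12 points).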
MODEL IDENTIFICATION
AND ESTIMATION
 By observation, the minimum of the matrix is the value -6.3503,
at the AR 3 and MA 0 location (i.e. p=3 & q=0).
 We consider all the models in the neighborhood of this model and, for each of
them, generate the AIC (Akaike Information Criterion) and SBC (Schwarz Bayesian
Criterion) and calculate their average.
 We select the top 6-7 models with the lowest average
and generate forecasts for each of them.
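The ranking step can be sketched as follows. The formulas are the standard definitions of AIC and SBC; the log-likelihood values, the sample size, and the parameter count (AR terms plus MA terms plus an intercept) are assumptions invented for illustration, not figures from this study.

```python
import math


def aic(log_likelihood, k):
    """Akaike criterion: -2 ln(L) + 2k, with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def sbc(log_likelihood, k, n):
    """Schwarz Bayesian criterion: -2 ln(L) + k ln(n), with n observations."""
    return -2.0 * log_likelihood + k * math.log(n)


def rank_by_average(candidates, n, top=3):
    """Rank (p, q) candidates by the mean of AIC and SBC, lowest first."""
    scored = []
    for (p, q), loglik in candidates.items():
        k = p + q + 1  # AR terms + MA terms + intercept (an assumption of this sketch)
        scored.append(((aic(loglik, k) + sbc(loglik, k, n)) / 2.0, (p, q)))
    scored.sort()
    return [pq for _, pq in scored[:top]]


# Hypothetical log-likelihoods, invented for illustration only.
candidates = {(3, 0): 210.0, (2, 1): 209.5, (3, 1): 210.4, (0, 1): 195.0}
best = rank_by_average(candidates, n=119)
```

SBC penalizes each extra parameter more heavily than AIC whenever n exceeds about 7, so averaging the two is a compromise between the two penalties.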
p=0 , q=1
MODEL IDENTIFICATION
AND ESTIMATION
AIC & SBC for all the neighborhood
models [ (0,1) to (3,3)]
Top 7 models based on lower
average value
FORECASTING
 Forecasts are generated using the FORECAST statement in PROC ARIMA.
 Here,
LEAD = Number of future time points to forecast
ID = Name of the time variable
INTERVAL = Unit of the time variable
OUT = Name of the output dataset in which the forecasts are saved
 The forecasts generated (for 1960 in this case) by each (p, q) combination selected
using AIC & SBC are compared separately with the actual values for the same time
points stored in the validation dataset (V), and the ‘MAPE’ (Mean Absolute Percentage
Error) is calculated.
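MAPE is the average of the absolute errors expressed as a percentage of the actual values. The sketch below computes it for two candidate models; the actual values and forecasts are made-up numbers, not results from this study.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical validation values and forecasts, invented for illustration.
actual = [100.0, 200.0, 400.0]
forecasts = {
    (2, 3): [110.0, 190.0, 400.0],
    (3, 0): [120.0, 220.0, 360.0],
}

scores = {pq: mape(actual, f) for pq, f in forecasts.items()}
best = min(scores, key=scores.get)  # (p, q) with the lowest MAPE
```

If the model was fitted to log-transformed data, the forecasts should be converted back to the original scale before the MAPE is computed, so that the percentage errors refer to the original units.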
FORECASTING
p=2 , q=3
FORECASTING
We select the (p, q) combination with the minimum MAPE, and that model is
applied to the entire data to generate the final forecast (for 1961).
Here, we need to apply the antilog (exp) to return to the original scale of
the data for convenience in comparison.
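The back-transformation is a single exponentiation per forecast. A minimal sketch, assuming the model was fitted to natural logs; the forecast values are invented for illustration.

```python
import math


def back_transform(log_forecasts):
    """Undo a natural-log transform by exponentiating each forecast."""
    return [math.exp(v) for v in log_forecasts]


# Hypothetical forecasts on the log scale, invented for illustration.
log_forecasts = [math.log(420.0), math.log(450.0)]
original_scale = back_transform(log_forecasts)
```

Exponentiating a log-scale point forecast gives a value on the original scale that corresponds to the median rather than the mean of the forecast distribution; for a comparison such as the one here, this simple antilog is usually adequate.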
FORECASTING
Forecasted Data
GRAPHICAL FORECAST
[Figure: Actual Vs Forecasted. Vertical axis: International Airline Travel (Thousands), 0 to 800; horizontal axis: months from Jan/49 to Dec/61; series: AIR and AIR_M.]
Thank You
