Implementation of Time Series with R
Before implementing Time Series, let’s quickly brush up on what we have discussed in Part 1 so far.
What is Time Series?
A Time Series is a sequence of data points recorded at specific time intervals
It is time-dependent
These data points (past values) are analyzed to forecast future values
Time Series is affected by four main components: Trend, Seasonality, Cyclicity and Irregularity
How do you differentiate between a stationary and a non-stationary time series?
Stationarity of Time Series depends on:
Mean
Variance
Co-Variance
Stationarity of Time Series
The mean of the series should not be a function of time; rather, it should be a constant
Stationary: here, the mean is constant with time
Non-Stationary: here, the mean is increasing with time
What is that?
We will give a time code (t) variable to each row, indicating each time period. Let’s discuss this variable later:
t    Year     Quarter   Sales(1000s)   MA(4)   CMA
1    Year 1   1         2.8            -       -
2             2         2.1            -       -
3             3         4              3.4     3.5
4             4         4.5            3.6     3.7
5    Year 2   1         3.8            3.9     4.0
6             2         3.2            4.1     4.2
7             3         4.8            4.3     4.3
8             4         5.4            4.4     4.4
9    Year 3   1         4              4.5     4.5
10            2         3.6            4.6     4.7
11            3         5.5            4.7     4.8
12            4         5.8            4.8     4.8
13   Year 4   1         4.3            4.9     4.9
14            2         3.9            5.0     5.1
15            3         6              5.2     -
16            4         6.4            -       -
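The MA(4) and CMA columns can be reproduced with a few lines of R. This is a minimal sketch, assuming that MA(4) is the plain mean of four consecutive quarters and that CMA is the average of two neighbouring MA(4) values; the variable names (sales, ma4, cma, result) are illustrative and do not come from the slides.

```r
# Quarterly sales (in 1000s) copied from the table
sales <- c(2.8, 2.1, 4.0, 4.5, 3.8, 3.2, 4.8, 5.4,
           4.0, 3.6, 5.5, 5.8, 4.3, 3.9, 6.0, 6.4)

# MA(4): mean of every run of four consecutive quarters.
# embed() puts each run of 4 values in one row (13 rows for 16 values).
ma4 <- rowMeans(embed(sales, 4))

# CMA: average of two neighbouring MA(4) values, which centres the
# window on an actual quarter (12 values, for t = 3 to 14).
cma <- (head(ma4, -1) + tail(ma4, -1)) / 2

# Line the results up with the time code t used in the table
result <- data.frame(t = 1:16, sales = sales, ma4 = NA_real_, cma = NA_real_)
result$ma4[3:15] <- ma4
result$cma[3:14] <- cma
print(round(result, 2))
```

The table shows these values rounded to one decimal place, so 3.35 appears there as 3.4 and 3.475 as 3.5.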
Forecast Time Series
Here we can see that the predicted values overlap with the actual values and continue till year 5, which shows the accuracy of our forecasted values
Now, let’s move on with our implementation of Time Series using R
What’s in it for you?
Introduction to ARIMA Model
Auto-Correlation & Partial Auto-Correlation
Use Case: Forecast the sales of air-tickets using ARIMA
Model Validation using the Ljung-Box Test
We will use the ARIMA model to forecast the Time Series, so let’s have a short introduction to the ARIMA model
ARIMA stands for Auto Regressive Integrated Moving Average
It is specified by three order parameters: p,d,q
ARIMA models are classified by three factors:
p = number of autoregressive terms (AR),
d = how many non-seasonal differences are needed to
achieve stationarity (I),
q = number of lagged forecast errors in the prediction
equation (MA)
Auto-Regressive Parameter (AR), p
AR(p): the number of autoregressive terms
Example: ARIMA(2,0,0) has a value of p as 2
But what are ‘AR terms’?
In terms of a regression model, autoregressive components are prior values of the series, used as predictors of the current value
If x(t) = current value,
then the first AR component = x(t-1) * a,
where a = fitted coefficient
The second AR component would be based on x(t-2), and so on
These are often referred to as lagged terms. So the prior value is called the first lag, the one prior to that the second lag, and so on
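Written out in full, the lagged terms add up to the usual AR(p) equation. The form below is the standard textbook one, not a formula taken from the slides; the constant c and the error term e_t are assumptions added for completeness, and a_1, ..., a_p are the fitted coefficients.

```latex
x_t = c + a_1 x_{t-1} + a_2 x_{t-2} + \dots + a_p x_{t-p} + e_t
```

For ARIMA(2,0,0), p = 2, so only the first two lags x_{t-1} and x_{t-2} appear.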
Degree of Differencing (I), d
It is equal to the number of non-seasonal differences needed to achieve stationarity
One level of differencing means you take the current value and subtract the prior value from it
Differencing: subtracting prior values from the current values:
Values   1st Order Differencing   Result
5        NA                       NA
4        4-5                      -1
6        6-4                      2
7        7-6                      1
9        9-7                      2
12       12-9                     3
12       12-12                    0
If the differenced series still shows a trend, then you can do another level of differencing on the first-level differenced series
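In R, this table corresponds to the base function diff(). The sketch below reuses the seven values from the table; the names values, d1 and d2 are illustrative.

```r
# The seven values from the differencing table
values <- c(5, 4, 6, 7, 9, 12, 12)

# First-order differencing: current value minus the prior value.
# The result has one element fewer, because the first value has no
# prior value (the NA row in the table).
d1 <- diff(values)
print(d1)

# Second-order differencing: difference the differenced series again
d2 <- diff(values, differences = 2)
print(d2)
```

The first call returns -1, 2, 1, 2, 3, 0, which matches the Result column.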
Moving Average (MA), q
It represents the error of the model as a combination of previous error terms e(t)
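As an equation, an MA(q) component is usually written as follows. This is the standard textbook form rather than a formula from the slides; the mean \mu and the coefficients \theta_1, ..., \theta_q are assumed symbols, and e_t is the error term at time t.

```latex
x_t = \mu + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}
```

With q = 1, only the previous error e_{t-1} enters the equation.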
But, ARIMA models work on the
assumption of stationarity
Oh you mean, removing trend
and seasonality to make the
data stationary, right?
Yes, like we saw in the previous
part
ACF/PACF
In order to test whether or not the series and its error terms are autocorrelated, we usually use:
Auto-correlation function
(ACF)
Partial Auto-correlation function
(PACF)
What is auto-correlation?
Let’s plot these functions to
understand better!
ACF
Autocorrelation is the similarity between values of the same variable across observations
• The Auto-Correlation Function (ACF) tells you how correlated points are with each other, based on how many time steps they are separated by.
• It is used to determine how past and future data points are related in a time series. Its value can range from -1 to 1
When we plot the ACF for our dataset, it crosses the blue dashed line, which indicates that the values are significantly correlated and suggests that the series is non-stationary.
R plots the 95% significance boundaries as blue dotted lines
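A plot like this can be produced with the base acf() function. The sketch below assumes that the dataset is R's built-in AirPassengers series, which matches the 1949-1960 airline data used in the use case but is not named in the slides. The bound uses the usual large-sample approximation of about 1.96 / sqrt(n) for the blue lines.

```r
# Assumption: the use-case data is R's built-in AirPassengers series
x <- AirPassengers

# Compute the ACF without plotting so that the values can be inspected
a <- acf(x, lag.max = 24, plot = FALSE)

# Approximate 95% significance bound (the blue lines in the plot)
bound <- qnorm(0.975) / sqrt(length(x))

# a$acf[1] is lag 0 and is always 1, so it is skipped here
lags_outside <- sum(abs(a$acf[-1]) > bound)
cat("95% bound: +/-", round(bound, 3), "\n")
cat("lags 1 to 24 outside the bound:", lags_outside, "\n")

# Draw the plot, including the significance lines
plot(a, main = "ACF of AirPassengers")
```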
PACF
What is Partial Auto-correlation?
Partial autocorrelation is the degree of association between two variables while adjusting for the effect of one or more additional variables
• The PACF (Partial Auto-Correlation Function) gives the partial correlation of a time series with its own lagged values
• Its value can range from -1 to 1
The PACF shows a definite pattern that does not repeat, so from this plot we can conclude that the data does not show any seasonality
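The PACF can be computed the same way with the base pacf() function. As before, this is a sketch that assumes the data is the built-in AirPassengers series; the names p, bound and sig_lags are illustrative.

```r
# Assumption: the use-case data is R's built-in AirPassengers series
x <- AirPassengers

# pacf() starts at lag 1 (there is no lag 0 entry)
p <- pacf(x, lag.max = 24, plot = FALSE)

# Approximate 95% significance bound, as drawn in the plot
bound <- qnorm(0.975) / sqrt(length(x))

# For a monthly series the lags are stored in years, so multiply by
# the frequency to express them in months
sig_lags <- round(p$lag[abs(p$acf) > bound] * frequency(x))
print(sig_lags)

plot(p, main = "PACF of AirPassengers")
```

The partial autocorrelation at lag 1 is the same as the ordinary autocorrelation at lag 1, because there are no intermediate lags to adjust for.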
USE CASE
Use Case: Time Series Forecasting
Data Description: 12 years of monthly air-ticket sales data of the airline industry, from 1949 to 1960
Objective: To predict the airline ticket sales of 1961 using Time Series Analysis
Goal of Time Series
Time Series Behavior: identification of the important parameters and characteristics which adequately describe the time series behavior, that is, identification of the Time Series components like trend and seasonality
Time Series Forecasting: forecasting the values of the Time Series, depending on its actual and past values
Use Case: Exploratory Data Analysis
Load the data
Look at the data
It is a Time Series dataset
Starting point
End point
Frequency of the data
Check for the missing values
View the summary
Plot the raw data using the ‘plot’ function
Cycle of the data
Use the boxplot function to see any seasonal effects
Thus, we can make some initial inferences:
TREND: The passenger numbers increase over time, indicating an increasing linear trend
SEASONALITY: In the boxplot there are more passengers travelling in months 6 to 9, indicating seasonality with an apparent cycle of 12 months
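The exploratory steps can be sketched in base R as follows. The slides do not name the dataset in the text, so this assumes it is the built-in AirPassengers series (monthly airline passenger totals, 1949 to 1960); the axis labels are illustrative.

```r
# Assumption: the dataset is R's built-in AirPassengers series
data(AirPassengers)
x <- AirPassengers

print(class(x))        # "ts": it is a Time Series dataset
print(start(x))        # starting point: year and period
print(end(x))          # end point: year and period
print(frequency(x))    # observations per year
print(sum(is.na(x)))   # number of missing values
print(summary(x))      # five-number summary and mean

# Raw data: an upward trend with a repeating yearly pattern
plot(x, ylab = "Passengers (1000s)", main = "AirPassengers")

# cycle() gives the position of each observation within the year
print(cycle(x))

# One box per calendar month to reveal seasonal effects
boxplot(x ~ cycle(x), xlab = "Month", ylab = "Passengers (1000s)")
```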
Use Case: Time Series Decomposition
We will decompose the Time Series
Decomposing means separating the original Time Series into its components (trend, seasonality, irregularity)
Using the decompose function in R, applying the multiplicative model
ddata = decomposed data
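A minimal sketch of this step, again assuming the built-in AirPassengers series. The name ddata follows the slides; recon and ok are illustrative names used only to check the result.

```r
# Assumption: the dataset is R's built-in AirPassengers series
x <- AirPassengers

# Multiplicative model: x = trend * seasonal * random
ddata <- decompose(x, type = "multiplicative")

# Four panels: observed, trend, seasonal and random components
plot(ddata)

# The 12 seasonal indices, one per calendar month
print(round(ddata$figure, 3))

# Multiplying the components back together recovers the original
# series wherever the trend is defined (it is NA at both ends)
recon <- ddata$trend * ddata$seasonal * ddata$random
ok <- !is.na(recon)
print(all.equal(as.numeric(recon[ok]), as.numeric(x[ok])))
```

A multiplicative model is a reasonable choice when the seasonal swings grow along with the level of the series, as they do in this data.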
The data must have a
constant variance and mean,
right?
Yes, you can easily model your data
if it is Stationary
Let’s fit the model!
Use Case: Fit A Time Series Model
ARIMA Model
The ARIMA(2,1,1)(0,1,0)[12] model parameters are:
An autoregressive part of order 2, using the first and second lags (p = 2),
Lag 1 differencing (d = 1),
A moving average part of order 1 (q = 1)
Then the seasonal part (0,1,0) has one seasonal difference (D = 1) and no seasonal AR or MA terms, at a model period of 12 units, in this case months
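A model with exactly these orders can be fitted with the base arima() function. The slides do not show the fitting call in text form; an automatic search such as auto.arima() from the forecast package may have been used there, which is an assumption. The sketch below fixes the orders by hand and assumes the built-in AirPassengers series.

```r
# Assumption: the dataset is R's built-in AirPassengers series
x <- AirPassengers

# Non-seasonal order (p, d, q) = (2, 1, 1)
# Seasonal order (P, D, Q) = (0, 1, 0) with a period of 12 months
fit <- arima(x,
             order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Coefficients, standard errors, log-likelihood and AIC
print(fit)

# Fitted AR and MA coefficients: ar1, ar2, ma1
print(round(coef(fit), 4))
```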
The ARIMA fitted model is:
$$\hat{Y}_t = 0.5960\,Y_{t-2} + 0.2143\,Y_{t-12} - 0.9819\,e_{t-1} + E$$
where E is the error
Use Case: Diagnostics
ARIMA Model
Check the plot for the residuals, which shows stationarity
Let’s plot the ACF for the residuals:
The residual plots are centered around 0 as noise, with no pattern. Hence, the ARIMA model is a fairly good fit
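Both diagnostic plots can be drawn from the residuals of the fitted model. This sketch refits the ARIMA(2,1,1)(0,1,0)[12] model with base arima() on the built-in AirPassengers series, which is an assumption about the data used in the slides.

```r
# Assumption: the dataset is R's built-in AirPassengers series
fit <- arima(AirPassengers,
             order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Residuals: what is left after the model has been fitted
res <- residuals(fit)

# Residuals over time: they should scatter around 0 with no pattern
plot(res, ylab = "Residual", main = "Residuals of the ARIMA model")
abline(h = 0, lty = 2)

# ACF of the residuals: spikes inside the blue lines suggest that
# little autocorrelation is left in the residuals
acf(res, main = "ACF of residuals")
```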
Use Case: Calculate Forecast
You can plot a forecast of the Time Series using the forecast function, with a 95% confidence interval, where h is the forecast horizon in months
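The slides use the forecast function, which comes from the forecast package. To keep the example free of extra packages, the sketch below uses base predict() instead and builds the 95% interval from the standard errors by hand; this is a substitute for, not a copy of, the call in the slides. The horizon h = 12 covers the year 1961, and the data is assumed to be the built-in AirPassengers series.

```r
# Assumption: the dataset is R's built-in AirPassengers series
fit <- arima(AirPassengers,
             order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12))

# Forecast horizon in months (the 12 months of 1961)
h <- 12
fc <- predict(fit, n.ahead = h)

# Approximate 95% interval: point forecast +/- 1.96 standard errors
z <- qnorm(0.975)
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

print(round(cbind(forecast = fc$pred, lower = lower, upper = upper), 1))

# Observed series followed by the forecast and its interval
ts.plot(AirPassengers, fc$pred, lower, upper,
        lty = c(1, 1, 2, 2),
        col = c("black", "blue", "red", "red"))
```

With the forecast package installed, forecast(fit, level = 95, h = 12) followed by plot() gives a similar picture in fewer lines.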
Can we validate this model?
Validation
To validate the findings of the ARIMA model, we can use the Ljung-Box test
Use Case: Validation
We can arbitrarily select the lag value. In this case, we take the lag values as 5, 10 and 15
The p-values are quite insignificant, so we cannot reject the hypothesis that the residuals are uncorrelated
Right, it indicates that the residuals of our model are free of auto-correlation
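The test is available in base R as Box.test() with type = "Ljung-Box". The sketch below runs it on the model residuals for the three lag values mentioned above; it assumes the built-in AirPassengers series, and the names lags and pvals are illustrative.

```r
# Assumption: the dataset is R's built-in AirPassengers series
fit <- arima(AirPassengers,
             order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12))
res <- residuals(fit)

# Lag values used in the slides
lags <- c(5, 10, 15)

# Null hypothesis of the Ljung-Box test: the residuals show no
# autocorrelation up to the given lag
pvals <- sapply(lags, function(k) {
  Box.test(res, lag = k, type = "Ljung-Box")$p.value
})

print(data.frame(lag = lags, p.value = round(pvals, 4)))
```

A p-value above the chosen significance level (commonly 0.05) means that the null hypothesis of uncorrelated residuals is not rejected at that lag.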
Conclusion
We can conclude from the ARIMA output that our model, using parameters (2, 1, 1), has been shown to adequately fit the data
Summary
ARIMA model
ACF and PACF
Exploratory data analysis
Time series decomposition
Forecast and validation
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data Science | Simplilearn

Editor's Notes

  • #8 Now the moving average is centered, and is proper for 3rd quarter onwards
  • #48 Decomposing a time series means separating it into its constituent components, which are usually a trend component and an irregular component, and if it is a seasonal time series, a seasonal component