The document discusses steps for identifying and building ARIMA models for time series data. It describes ARIMA model building as a three-stage process: identification, estimation, and diagnostic checking. For identification, it explains how to determine the p, d, and q values by examining the autocorrelation and partial autocorrelation functions of the stationary, differenced time series. It then discusses using the method of moments to estimate ARIMA model parameters by equating sample statistics to population parameters.
ARIMA models provide another approach to time series forecasting. Exponential smoothing and ARIMA models are the two most widely-used approaches to time series forecasting, and provide complementary approaches to the problem. While exponential smoothing models were based on a description of trend and seasonality in the data, ARIMA models aim to describe the autocorrelations in the data.
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana | Amrinder Arora
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana. Presentation for the CS 6212 final project at GWU during Fall 2015 (Prof. Arora's class).
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ... | Simplilearn
This Time Series Analysis (Part-2) in R presentation will help you understand what an ARIMA model is, what correlation and auto-correlation are, and you will also see a use case implementation in which we forecast sales of air tickets using ARIMA; at the end, we will also see how to validate a model using the Ljung-Box test. A time series is a sequence of data recorded at specific time intervals. The past values are analyzed to forecast a future which is time-dependent. Compared to other forecast algorithms, with time series we deal with a single variable which is dependent on time. So, let's dive into this presentation and understand what time series is and how to implement time series using R.
Below topics are explained in this " Time Series in R presentation " -
1. Introduction to ARIMA model
2. Auto-correlation & partial auto-correlation
3. Use case - Forecast the sales of air-tickets using ARIMA
4. Model validating using Ljung-Box test
Become an expert in data analytics using the R programming language in this data science certification training course. You’ll master data exploration, data visualization, predictive analytics and descriptive analytics techniques with the R language. With this data science course, you’ll get hands-on practice on R CloudLab by implementing various real-life, industry-based projects in the domains of healthcare, retail, insurance, finance, airlines, music industry, and unemployment.
Why learn Data Science with R?
1. This course forms an ideal package for aspiring data analysts looking to build a successful career in analytics/data science. By the end of this training, participants will acquire a 360-degree overview of business analytics and R by mastering concepts like data exploration, data visualization, predictive analytics, etc.
2. According to marketsandmarkets.com, the advanced analytics market will be worth $29.53 Billion by 2019
3. Wired.com points to a report by Glassdoor that the average salary of a data scientist is $118,709
4. Randstad reports that pay hikes in the analytics industry are 50% higher than IT
The Data Science with R course is recommended for:
1. IT professionals looking for a career switch into data science and analytics
2. Software developers looking for a career switch into data science and analytics
3. Professionals working in data and business analytics
4. Graduates looking to build a career in analytics and data science
5. Anyone with a genuine interest in the data science field
6. Experienced professionals who would like to harness data science in their fields
Learn more at: https://www.simplilearn.com/
Data Science - Part X - Time Series Forecasting | Derek Kane
This lecture provides an overview of time series forecasting techniques and the process of creating effective forecasts. We will go through some of the popular statistical methods including time series decomposition, exponential smoothing, Holt-Winters, ARIMA, and GLM models. These topics will be discussed in detail, and we will go through the calibration and diagnostics of time series models on a number of diverse datasets.
The ARIMA analytical method predicts future values of a time series using a linear combination of past values and a series of errors. It is suitable for univariate data, stationary or non-stationary, with any type of data pattern. It produces accurate, dependable forecasts for short-term planning, and provides forecasted values of target variables for user-specified periods to illustrate results for planning, production, sales and other factors.
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress... | Databricks
Given the resurgence of neural network-based techniques in recent years, it is important for data science practitioners to understand how to apply these techniques and the tradeoffs between neural network-based and traditional statistical methods.
This lecture discusses two specific techniques: Vector Autoregressive (VAR) models and Recurrent Neural Networks (RNN). The former is one of the most important classes of multivariate time series statistical models applied in finance, while the latter is a neural network architecture that is suitable for time series forecasting. I'll demonstrate how they are implemented in practice and compare their advantages and disadvantages. Real-world applications, demonstrated using Python and Spark, are used to illustrate these techniques. While not the focus of this lecture, exploratory time series data analysis using time-series plots, plots of autocorrelation (i.e. correlograms), plots of partial autocorrelation, plots of cross-correlations, histograms, and kernel density plots will also be included in the demo.
The attendees will learn: the formulation of a time series forecasting problem statement in the context of VAR and RNN; the application of recurrent neural network-based techniques in time series forecasting; the application of vector autoregressive models in multivariate time series forecasting; the pros and cons of using VAR and RNN-based techniques in the context of financial time series forecasting; and when to use VAR versus RNN-based techniques.
Why should you care about Markov Chain Monte Carlo methods?
→ They are in the list of "Top 10 Algorithms of 20th Century"
→ They allow you to make inference with Bayesian Networks
→ They are used everywhere in Machine Learning and Statistics
Markov Chain Monte Carlo methods are a class of algorithms used to sample from complicated distributions. Typically, this is the case of posterior distributions in Bayesian Networks (Belief Networks).
These slides cover the following topics.
→ Motivation and Practical Examples (Bayesian Networks)
→ Basic Principles of MCMC
→ Gibbs Sampling
→ Metropolis–Hastings
→ Hamiltonian Monte Carlo
→ Reversible-Jump Markov Chain Monte Carlo
Time Series Analysis - 1 | Time Series in R | Time Series Forecasting | Data ... | Simplilearn
This Time Series Analysis (Part-1) in R presentation will help you understand what time series is, why time series, the components of a time series, when not to use time series, why a time series has to be stationary, and how to make a time series stationary; at the end, you will also see a use case where we forecast car sales for the 5th year using the given data. A time series is a sequence of data recorded at specific time intervals. The past values are analyzed to forecast a future which is time-dependent. Compared to other forecast algorithms, with time series we deal with a single variable which is dependent on time. So, let's dive into this presentation and understand what time series is and how to implement time series using R.
Below topics are explained in this "Time Series in R Tutorial" -
1. Why time series?
2. What is time series?
3. Components of a time series
4. When not to use time series?
5. Why does a time series have to be stationary?
6. How to make a time series stationary?
7. Example: Forecast car sales for the 5th year
Different kind of distance and Statistical Distance | Khulna University
A short brief on distance and statistical distance, which is the core of multivariate analysis. You will find here some simple concepts about distances and statistical distance.
5. Linear Algebra for Machine Learning: Singular Value Decomposition and Prin... | Ceni Babaoglu, PhD
The seminar series will focus on the mathematical background needed for machine learning. The first set of the seminars will be on "Linear Algebra for Machine Learning". Here are the slides of the fifth part which is discussing singular value decomposition and principal component analysis.
Here are the slides of the first part which was discussing linear systems: https://www.slideshare.net/CeniBabaogluPhDinMat/linear-algebra-for-machine-learning-linear-systems/1
Here are the slides of the second part which was discussing basis and dimension:
https://www.slideshare.net/CeniBabaogluPhDinMat/2-linear-algebra-for-machine-learning-basis-and-dimension
Here are the slides of the third part which is discussing factorization and linear transformations.
https://www.slideshare.net/CeniBabaogluPhDinMat/3-linear-algebra-for-machine-learning-factorization-and-linear-transformations-130813437
Here are the slides of the fourth part which is discussing eigenvalues and eigenvectors.
https://www.slideshare.net/CeniBabaogluPhDinMat/4-linear-algebra-for-machine-learning-eigenvalues-eigenvectors-and-diagonalization
Financial forecastings using neural networks ppt | Puneet Gupta
The aim of the project is to predict interest rates, bond yield variation and stock market prices using neural networks, and to make a comparative study of different pre-processing techniques, viz. Fast Fourier Transform and Hilbert-Huang Transform.
This ppt needs the other two as well.
This presentation describes two major papers on multivariate time series using deep neural networks. The first paper, DeepAR, was developed at Amazon to deal with forecasting at scale, where the same model can be applied to millions of products. DeepAR is implemented as a built-in algorithm of Amazon SageMaker. A code example is provided.
The second paper, Long- and Short-Term Temporal Patterns with Deep Neural Networks, was developed at CMU and introduces a novel way to detect both short-term and long-term seasonality in data through the introduction of skip-RNN.
A Gluon implementation of the paper is provided in the presentation.
This presentation gives, in short, an introduction to time series and the overall procedure required for time series modelling, including general terminologies and algorithms. Although the detailed mathematics is excluded from the slides, this ppt is meant as a starting point for understanding time series modelling before going into detailed statistics.
Business Analytics Foundation with R tool - Part 5 | Beamsync
This presentation is published by Beamsync.
If you are looking for analytics training in Bangalore, consult Beamsync Training Centre.
For upcoming schedules please visit: http://beamsync.com/business-analytics-training-bangalore/
The "Great Lakes" data set is an example of a non-seasonal, non-stationary time series that experiences a slight upward linear trend. The series is differenced and Box-Cox transformed in order to stabilize the mean and variance and achieve stationarity. The best model fitted to the data was an ARIMA(4,1,0), found by examining the autocorrelation and partial autocorrelation functions; the fit suggested the best estimates for the coefficients via the AIC. The residuals of the fitted model were tested for independence and normality using the McLeod-Li, Ljung-Box, and Shapiro-Wilk tests. The model proved to be an adequate representation of the data, providing reasonable predictions for precipitation.
Recent developments in the field of reduced order modeling - and in particular, active subspace construction - have made it possible to efficiently approximate complex models by constructing low-order response surfaces based upon a small subspace of the original high dimensional parameter space. These methods rely upon the fact that the response tends to vary more prominently in a few dominant directions defined by linear combinations of the original inputs, allowing for a rotation of the coordinate axis and a consequent transformation of the parameters. In this talk, we discuss a gradient free active subspace algorithm that is feasible for high dimensional parameter spaces where finite-difference techniques are impractical. We illustrate an initialized gradient-free active subspace algorithm for a neutronics example implemented with SCALE6.1.
"Detection & Estimation Theory" graduate course.
Lecture notes of Prof. H. Amindavar, Professor of Electrical Engineering at Amirkabir University of Technology.
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf | Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf | GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
The Building Blocks of QuestDB, a Time Series Database | javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open-source time-series database designed for speed. We will also review some of the changes we have made over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder.
Analysis insight about a Flyball dog competition team's performance | roli9797
Insights from my analysis of a Flyball dog competition team's last-year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
State of Artificial Intelligence Report 2023 | kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake | Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today's world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
4. A non-seasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
•p is the number of autoregressive terms,
•d is the number of non-seasonal differences needed for stationarity, and
•q is the number of lagged forecast errors in the prediction equation.
•Stationary series: a stationary series has no trend, and its variations around its mean have a constant amplitude. A non-stationary series is made stationary by differencing.
ARIMA MODEL
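The differencing step mentioned above can be sketched in a few lines. This is an illustrative Python stand-in (the deck's own examples use R); the series with a linear trend is made up for the demonstration:

```python
# Illustrative sketch: first differencing removes a linear trend,
# a common source of non-stationarity in the "d" step of ARIMA(p,d,q).
def difference(series, d=1):
    """Apply d rounds of first differencing: y_t = x_t - x_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# A hypothetical series with a linear trend (slope 2) plus a constant:
x = [2 * t + 5 for t in range(10)]
dx = difference(x, d=1)   # the trend is gone: every difference equals 2
```

Each round of differencing shortens the series by one observation, which is why d is kept as small as possible in practice.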
5. To identify an ARIMA(p,d,q) model we make extensive use of
the autocorrelation function
{ρh : -∞ < h < ∞}
and
the partial autocorrelation function
{Φkk : 0 ≤ k < ∞}.
6. The definitions of the sample covariance function
{Cx(h) : -∞ < h < ∞}
and the sample autocorrelation function
{rh : -∞ < h < ∞}
are given below:

Cx(h) = (1/T) Σt=1…T−h (xt − x̄)(xt+h − x̄)

and

rh = Cx(h) / Cx(0)

(The divisor is T; some statisticians use T − h. If T is large, both give approximately the same results.)
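The two definitions above translate directly into code. This is a small Python sketch (the deck's examples are in R, where acf() does this; the data vector here is made up for illustration):

```python
# Sketch of the sample autocovariance Cx(h) and autocorrelation rh
# defined above, using the divisor T as on the slide.
def sample_acf(x, max_lag):
    T = len(x)
    xbar = sum(x) / T
    def C(h):
        # Cx(h) = (1/T) * sum_{t=1..T-h} (x_t - xbar)(x_{t+h} - xbar)
        return sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(T - h)) / T
    c0 = C(0)
    return [C(h) / c0 for h in range(max_lag + 1)]

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # hypothetical data
r = sample_acf(x, 3)   # r[0] is always 1 by construction
```

With the divisor T the estimator is biased but guarantees |rh| ≤ 1, which is one reason this form is preferred for identification plots.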
7. It can be shown that:

Cov(rh, rh+k) ≈ (1/T) Σt=−∞…∞ ρt ρt+k

Thus

Var(rh) ≈ (1/T) Σt=−∞…∞ ρt² ≈ (1/T) [1 + 2 Σt=1…q ρt²]

assuming ρk = 0 for k > q. Let

s_rh = sqrt( (1/T) [1 + 2 Σt=1…q rt²] )
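The large-lag standard error s_rh above is what the dashed significance bands on an ACF plot are built from. A Python sketch (hedged; R's acf() computes an equivalent band):

```python
import math

# Sketch of the large-lag standard error of rh from the slide:
#   s_rh = sqrt( (1/T) * (1 + 2 * sum_{t=1..q} rt^2) ),
# used to judge whether rh is significantly non-zero for h > q.
def se_rh(r, q, T):
    """r: list with r[h] = sample autocorrelation at lag h (r[0] = 1),
    q: assumed cut-off lag, T: series length."""
    return math.sqrt((1 + 2 * sum(r[t] ** 2 for t in range(1, q + 1))) / T)

# With q = 0 (white noise) this reduces to 1/sqrt(T), the familiar
# +/- 1.96/sqrt(T) band plotted on ACF charts.
band = se_rh([1.0], 0, 100)
```

For q > 0 the band widens, reflecting the extra sampling variability induced by the non-zero low-order autocorrelations.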
8. The sample partial autocorrelation function is defined
by the ratio of determinants:

Φ̂kk = det(Rk*) / det(Rk)

where Rk is the k × k matrix of sample autocorrelations whose (i, j) entry is r|i−j| (its first row is 1, r1, …, rk−1), and Rk* is Rk with its last column replaced by (r1, r2, …, rk)′.
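The determinant ratio above can be evaluated directly for small k. A Python sketch (the autocorrelation values are made up; R's pacf() is the practical tool):

```python
# Sketch of the sample PACF as a ratio of determinants:
#   Phi_kk = det(Rk*) / det(Rk),
# where Rk = [ r_{|i-j|} ] is the k x k autocorrelation matrix and
# Rk* replaces its last column with (r1, ..., rk).
def pacf_kk(r, k):
    """r[h] = sample autocorrelation at lag h (r[0] = 1)."""
    def det(m):  # Laplace expansion; fine for the small k used here
        if len(m) == 1:
            return m[0][0]
        return sum((-1) ** j * m[0][j] *
                   det([row[:j] + row[j + 1:] for row in m[1:]])
                   for j in range(len(m)))
    R = [[r[abs(i - j)] for j in range(k)] for i in range(k)]
    Rstar = [row[:k - 1] + [r[i + 1]] for i, row in enumerate(R)]
    return det(Rstar) / det(R)

r = [1.0, 0.6, 0.3]        # hypothetical r0, r1, r2
phi11 = pacf_kk(r, 1)      # equals r1
phi22 = pacf_kk(r, 2)      # equals (r2 - r1^2) / (1 - r1^2)
```

For k = 1 the ratio collapses to r1, and for k = 2 it gives the familiar closed form (r2 − r1²)/(1 − r1²), which is a useful sanity check on the general definition.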
9. It can be shown that:

Var(Φ̂kk) ≈ 1/T

Let s_Φ̂kk = 1/√T.
10. Identification of an ARIMA process
Determining the values of p, d, q
Steps for building an ARIMA model:
• Visualization
• ACF and PACF plots
• Seasonal variation modelling
• Stationarity check
• Identifying p, d, q for the non-seasonal series
• Model development
• Validating accuracy
• Selecting the best model
11. • Recall that if a process is non-stationary, one of the
roots of the autoregressive operator is equal to
one.
• This will cause the limiting value of the
autocorrelation function to be non-zero.
• Thus a non-stationary process is identified by
an autocorrelation function that does not tail
away to zero quickly or cut off after a finite
number of steps.
12. To determine the value of d
Note: the autocorrelation function of a stationary ARMA
time series satisfies the following difference equation

ρh = β1 ρh-1 + β2 ρh-2 + … + βp ρh-p

The solution to this equation has the general form

ρh = c1/r1^h + c2/r2^h + … + cp/rp^h

where r1, r2, …, rp are the roots of the polynomial

β(x) = 1 − β1 x − β2 x² − … − βp x^p
13. For a stationary ARMA time series
the roots r1, r2, …, rp have absolute value greater than 1.
Therefore

ρh = c1/r1^h + c2/r2^h + … + cp/rp^h → 0 as h → ∞

If the ARMA time series is non-stationary,
some of the roots r1, r2, …, rp have absolute value
equal to 1, and

ρh = c1/r1^h + c2/r2^h + … + cp/rp^h → a ≠ 0 as h → ∞
15. • If the process is non-stationary then first
differences of the series are computed to
determine if that operation results in a
stationary series.
• The process is continued until a stationary
time series is found.
• This then determines the value of d.
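The repeated-differencing loop described above can be sketched in code. The slides judge stationarity by eye (or with R's ADF test); here a deliberately naive stand-in flags a series whose lag-1 sample autocorrelation stays near 1, the signature of an ACF that fails to tail off. The threshold and the test data are assumptions for illustration only:

```python
# Sketch of "difference until stationary" to choose d. The stationarity
# check is a naive heuristic (lag-1 autocorrelation near 1), standing in
# for visual ACF inspection or a formal unit-root test.
def choose_d(x, max_d=2, threshold=0.9):
    def r1(s):  # lag-1 sample autocorrelation (divisor-T form)
        m = sum(s) / len(s)
        num = sum((s[t] - m) * (s[t + 1] - m) for t in range(len(s) - 1))
        den = sum((v - m) ** 2 for v in s)
        return num / den
    d = 0
    while d < max_d and r1(x) > threshold:
        x = [b - a for a, b in zip(x, x[1:])]
        d += 1
    return d

# A linear trend plus a small alternating wiggle: one difference suffices.
d_hat = choose_d([0.5 * t + (0.3 if t % 2 else -0.3) for t in range(200)])
```

A constant series would need a divide-by-zero guard in r1; it is omitted to keep the sketch short.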
17. To determine the values of p and q we use the
graphical properties of the autocorrelation
function and the partial autocorrelation function.
Again recall the following:

Properties of the ACF and PACF of MA, AR and ARMA series

Process: MA(q)
Autocorrelation function: Cuts off after q.
Partial autocorrelation function: Infinite. Tails off. Dominated by damped exponentials & cosine waves.

Process: AR(p)
Autocorrelation function: Infinite. Tails off. Damped exponentials and/or cosine waves.
Partial autocorrelation function: Cuts off after p.

Process: ARMA(p,q)
Autocorrelation function: Infinite. Tails off. Damped exponentials and/or cosine waves after q − p.
Partial autocorrelation function: Infinite. Tails off. Dominated by damped exponentials & cosine waves after p − q.
18. Summary: To determine p and q,
use the following table.

MA(q) AR(p) ARMA(p,q)
ACF Cuts off after q Tails off Tails off
PACF Tails off Cuts off after p Tails off

Note: usually p + q ≤ 4. There is no harm in over-identifying
the time series (allowing more parameters in the model than
necessary); we can always test to determine whether the extra
parameters are zero.
19. Examples Using R
Important packages: forecast, tseries, TTR, fpp
Reference link:
https://www.otexts.org/fpp
20. DATA
Time Series:
Start = 1
End = 72
Frequency = 1
USAccDeaths:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1973 9007 8106 8928 9137 10017 10826 11317 10744 9713 9938 9161 8927
1974 7750 6981 8038 8422 8714 9512 10120 9823 8743 9129 8710 8680
1975 8162 7306 8124 7870 9387 9556 10093 9620 8285 8466 8160 8034
1976 7717 7461 7767 7925 8623 8945 10078 9179 8037 8488 7874 8647
1977 7792 6957 7726 8106 8890 9299 10625 9302 8314 8850 8265 8796
1978 7836 6892 7791 8192 9115 9434 10484 9827 9110 9070 8633 9240
24. Exponential smoothing modelling using the HoltWinters method
R code:
USAccforecasts <- HoltWinters(USAccDeaths$USAccDeaths, beta = FALSE,
gamma = FALSE)
print(USAccforecasts)
plot(USAccforecasts)
Holt-Winters exponential smoothing without trend and without seasonal component.
Call:
HoltWinters(x = USAccDeaths$USAccDeaths, beta = FALSE, gamma = FALSE)
Smoothing parameters:
alpha: 0.9999339
beta : FALSE
gamma: FALSE
Coefficients:
[,1]
a 9239.96
25. ACF and PACF plot
Test of stationarity: Augmented Dickey-Fuller test
(ADF test)
R code: adf.test(USAccDeaths)
R output:
Augmented Dickey-Fuller Test
data: USAccDeaths
Dickey-Fuller = -3.8221, Lag order = 4, p-value =
0.02268
alternative hypothesis: stationary
* Since p-value = 0.02268 < 0.05, the series is
stationary.
26. ACF and PACF plot
After taking the first difference to remove seasonality
32. Estimation of parameters of an MA(q) series
The theoretical autocorrelation function in terms of the
parameters of an MA(q) process is given by:

ρh = (αh + α1 αh+1 + … + αq-h αq) / (1 + α1² + α2² + … + αq²)  for 1 ≤ h ≤ q
ρh = 0  for h > q

To estimate α1, α2, …, αq we solve the system of
equations:

rh = (α̂h + α̂1 α̂h+1 + … + α̂q-h α̂q) / (1 + α̂1² + α̂2² + … + α̂q²),  1 ≤ h ≤ q
33. This set of equations is non-linear and generally very
difficult to solve.
For q = 1 the equation becomes:

r1 = α̂1 / (1 + α̂1²)

Thus

r1 (1 + α̂1²) − α̂1 = 0, or r1 α̂1² − α̂1 + r1 = 0

This equation has the two solutions

α̂1 = 1/(2 r1) ± sqrt( 1/(4 r1²) − 1 )

One solution will result in the MA(1) time series being invertible.
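The two roots of the quadratic above multiply to 1, so (away from the boundary) exactly one of them is invertible. A Python sketch that picks it (the slides later apply this with r1 = −0.413):

```python
import math

# Sketch of the MA(1) method-of-moments estimate: r1 = alpha/(1 + alpha^2)
# gives a quadratic with two roots; the invertible one has |alpha| < 1.
def ma1_estimate(r1):
    if abs(r1) >= 0.5:
        raise ValueError("no real solution: an MA(1) requires |r1| < 0.5")
    disc = math.sqrt(1.0 / (4.0 * r1 * r1) - 1.0)
    roots = (1.0 / (2.0 * r1) + disc, 1.0 / (2.0 * r1) - disc)
    return next(a for a in roots if abs(a) < 1)   # invertible root

alpha_hat = ma1_estimate(-0.413)   # the deck's example value of r1
```

The |r1| < 0.5 guard reflects the fact that an MA(1) process cannot have a lag-1 autocorrelation of magnitude 0.5 or more.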
35. Estimation of parameters of an
ARMA(p,q) series
We use a similar technique.
Namely: obtain an expression for ρh in terms of β1,
β2, …, βp; α1, α2, …, αq and set up p + q
equations for the estimates of β1, β2, …, βp; α1,
α2, …, αq by replacing ρh by rh.
36. Estimation of parameters of an ARMA(p,q) series
Example: the ARMA(1,1) process
The expressions for ρ1 and ρ2 in terms of β1 and α1
are:

ρ1 = (1 + α1 β1)(α1 + β1) / (1 + 2 α1 β1 + α1²)

ρ2 = β1 ρ1

Further

σu² = Var(ut) = [ (1 − β1²) / (1 + 2 α1 β1 + α1²) ] σx(0)
38. Hence

β̂1 = r2 / r1

and

r1 (1 + 2 α̂1 β̂1 + α̂1²) = (1 + α̂1 β̂1)(α̂1 + β̂1)

or

r1 (1 + 2 α̂1 r2/r1 + α̂1²) = (1 + α̂1 r2/r1)(α̂1 + r2/r1)

This is a quadratic equation in α̂1 which can be solved:

(r1 − r2/r1) α̂1² + (2 r2 − 1 − r2²/r1²) α̂1 + (r1 − r2/r1) = 0
39. Example
The time series was identified as either an
ARIMA(1,0,1) time series or an ARIMA(0,1,1)
series.
If we use the first identification then the series xt is an
ARMA(1,1) series.
40. Identifying the series xt as an ARMA(1,1) series:
The autocorrelation at lag 1 is r1 = 0.570 and the
autocorrelation at lag 2 is r2 = 0.495.
Thus the estimate of β1 is 0.495/0.570 = 0.87.
Also the quadratic equation

(r1 − r2/r1) α̂1² + (2 r2 − 1 − r2²/r1²) α̂1 + (r1 − r2/r1) = 0

becomes

0.2984 α̂1² + 0.7642 α̂1 + 0.2984 = 0

which has the two solutions -0.48 and -2.08. Again we select
as our estimate of α1 the solution -0.48, resulting in an
invertible estimated series.
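The moment equations for the ARMA(1,1) case can be checked numerically. This Python sketch reproduces the deck's worked numbers from r1 = 0.570 and r2 = 0.495:

```python
import math

# Numeric check of the ARMA(1,1) moment equations: beta1 = r2/r1, then
#   (r1 - r2/r1) a^2 + (2 r2 - 1 - r2^2/r1^2) a + (r1 - r2/r1) = 0
# is solved for a = alpha1, keeping the invertible root (|a| < 1).
def arma11_estimates(r1, r2):
    beta = r2 / r1
    A = r1 - beta                       # leading and constant coefficient
    B = 2.0 * r2 - 1.0 - beta * beta    # middle coefficient
    disc = math.sqrt(B * B - 4.0 * A * A)
    roots = ((-B + disc) / (2 * A), (-B - disc) / (2 * A))
    alpha = next(a for a in roots if abs(a) < 1)
    return beta, alpha

beta_hat, alpha_hat = arma11_estimates(0.570, 0.495)   # deck's example
```

Because the quadratic's leading and constant coefficients are equal, its two roots again multiply to 1, so exactly one of them (here the −0.48 root) is invertible.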
41. Since δ = µ(1 - β1), the estimate of δ can be computed as
follows:

δ̂ = x̄(1 − β̂1) = 17.062(1 − 0.87) = 2.25

Thus the identified model in this case is

xt = 0.87 xt-1 + ut - 0.48 ut-1 + 2.25
42. If we use the second identification then the series
∆xt = xt – xt-1 is an MA(1) series.
Thus the estimate of α1 is:

α̂1 = 1/(2 r1) ± sqrt( 1/(4 r1²) − 1 )

The value of r1 = -0.413.
Thus the two solutions are:

α̂1 = 1/(2(−0.413)) ± sqrt( 1/(4(−0.413)²) − 1 ) = −0.53 or −1.89

The estimate α̂1 = -0.53 corresponds to an invertible
time series. This is the solution that we will choose.
43. The estimate of the parameter µ is the sample mean.
Thus the identified model in this case is:
∆xt = ut - 0.53 ut-1 + 0.002, or
xt = xt-1 + ut - 0.53 ut-1 + 0.002
(An ARIMA(0,1,1) model)
This compares with the other identification:
xt = 0.87 xt-1 + ut - 0.48 ut-1 + 2.25
(An ARIMA(1,0,1) model)
45. The regression coefficients β1, β2, …, βp and the
autocorrelation function ρh satisfy the Yule-Walker equations:

ρ1 = β1 + β2 ρ1 + … + βp ρp-1
ρ2 = β1 ρ1 + β2 + … + βp ρp-2
…
ρp = β1 ρp-1 + β2 ρp-2 + … + βp

and

σx(0) = σu² / (1 − β1 ρ1 − β2 ρ2 − … − βp ρp)
46. The Yule-Walker equations can be used to estimate the
regression coefficients β1, β2, …, βp from the sample
autocorrelation function rh by replacing ρh with rh:

r1 = β̂1 + β̂2 r1 + … + β̂p rp-1
r2 = β̂1 r1 + β̂2 + … + β̂p rp-2
…
rp = β̂1 rp-1 + β̂2 rp-2 + … + β̂p

and

σ̂u² = Cx(0) × (1 − β̂1 r1 − … − β̂p rp)
47. Example
Considering the data in example 1 (sunspot data), the time series
was identified as an AR(2) time series.
The autocorrelation at lag 1 is r1 = 0.807 and the autocorrelation
at lag 2 is r2 = 0.429.
The equations for the estimators of the parameters of this series
are

1.000 β̂1 + 0.807 β̂2 = 0.807
0.807 β̂1 + 1.000 β̂2 = 0.429

which have the solution

β̂1 = 1.321
β̂2 = -0.637

Since δ = µ(1 - β1 - β2), it can be estimated as follows:

δ̂ = x̄(1 − β̂1 − β̂2) = (1 − 1.321 + 0.637) x̄ = 14.9

48. Thus the identified model in this case is

xt = 1.321 xt-1 - 0.637 xt-2 + ut + 14.9
50. The method of Maximum Likelihood
Estimation selects as estimators of a set of
parameters θ1,θ2, ... , θk , the values that
maximize
L(θ1,θ2, ... , θk) = f(x1,x2, ... , xN;θ1,θ2, ... , θk)
where f(x1,x2, ... , xN;θ1,θ2, ... , θk) is the joint
density function of the observations x1,x2, ... , xN.
L(θ1,θ2, ... , θk) is called the Likelihood function.
51. It is important to note that
finding the values θ1, θ2, ..., θk that maximize
L(θ1, θ2, ..., θk) is equivalent to finding the
values that maximize l(θ1, θ2, ..., θk) = ln L(θ1, θ2, ..., θk).
l(θ1, θ2, ..., θk) is called the log-likelihood
function.
52. Again let {ut : t ∈ T} be identically distributed
and uncorrelated with mean zero. In addition
assume that each is normally distributed.
Consider the time series {xt : t ∈ T} defined by
the equation:
(*) xt = β1xt-1 + β2xt-2 + ... + βpxt-p + δ + ut
+ α1ut-1 + α2ut-2 + ... + αqut-q
53. Assume that x1, x2, ..., xN are observations on the
time series up to time t = N.
To estimate the p + q + 2 parameters β1, β2, ...,
βp; α1, α2, ..., αq; δ, σ² by the method of
Maximum Likelihood estimation we need to find
the joint density function of x1, x2, ..., xN:
f(x1, x2, ..., xN | β1, β2, ..., βp; α1, α2, ..., αq; δ, σ²)
= f(x | β, α, δ, σ²).
54. We know that u1, u2, ..., uN are independent
normal with mean zero and variance σ².
Thus the joint density function of u1, u2, ..., uN is
g(u1, u2, ..., uN; σ²) = g(u; σ²), given by:

g(u; σ²) = [1/(2πσ²)]^(N/2) exp( −(1/(2σ²)) Σt=1…N ut² )
55. It is difficult to determine the exact density
function of x1, x2, ..., xN from this information.
However, if we assume that p starting values of
the x-process, x* = (x1-p, x2-p, ..., x0), and q starting
values of the u-process, u* = (u1-q, u2-q, ..., u0), have
been observed, then the conditional distribution
of x = (x1, x2, ..., xN) given x* and u* can easily be determined.
57. can be solved for:
u1 = u1 (x, x*, u*; β, α, δ)
u2 = u2 (x, x*, u*; β, α, δ)
...
uN = uN (x, x*, u*; β, α, δ)
(The jacobian of the transformation is 1)
58. Then the joint density of x given x* and u* is
given by:

f(x | x*, u*, β, α, δ, σ²)
= [1/(2πσ²)]^(N/2) exp( −(1/(2σ²)) Σt=1…N ut²(x, x*, u*, β, α, δ) )
= [1/(2πσ²)]^(N/2) exp( −(1/(2σ²)) S(β, α, δ) )

where S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ)
59. Let:

L(β, α, δ, σ² | x, x*, u*)
= [1/(2πσ²)]^(N/2) exp( −(1/(2σ²)) Σt=1…N ut²(x, x*, u*, β, α, δ) )
= [1/(2πσ²)]^(N/2) exp( −(1/(2σ²)) S(β, α, δ) )

where again S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ).
L is the "conditional likelihood function".
61. The values β̂, α̂, δ̂ that maximize

l(β, α, δ, σ² | x, x*, u*) and L(β, α, δ, σ² | x, x*, u*)

are the values that minimize

S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ)

with

σ̂² = (1/N) Σt=1…N ut²(x, x*, u*, β̂, α̂, δ̂) = (1/N) S(β̂, α̂, δ̂)
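The conditional-sum-of-squares idea above can be illustrated in the simplest case, an AR(1) with δ = 0, where ut = xt − β xt-1 and S(β) = Σ ut². A Python sketch (the data vector is made up; a coarse grid search stands in for the iterative minimizer, and for AR(1) the exact minimizer has a closed form to compare against):

```python
# Sketch of conditional-sum-of-squares estimation for an AR(1):
#   u_t = x_t - beta * x_{t-1},  S(beta) = sum of u_t^2.
def css_ar1(x, beta):
    return sum((x[t] - beta * x[t - 1]) ** 2 for t in range(1, len(x)))

x = [1.0, 0.9, 0.7, 0.8, 0.5, 0.6, 0.4, 0.3, 0.4, 0.2]   # hypothetical data

# Grid search over beta (stand-in for steepest descent, etc.):
grid = [i / 1000.0 for i in range(-1000, 1001)]
beta_grid = min(grid, key=lambda b: css_ar1(x, b))

# Closed-form least-squares minimizer for comparison:
beta_exact = (sum(x[t] * x[t - 1] for t in range(1, len(x)))
              / sum(x[t - 1] ** 2 for t in range(1, len(x))))

sigma2_hat = css_ar1(x, beta_grid) / (len(x) - 1)   # (1/N) S(beta_hat)
```

For general ARMA(p,q) models S has no closed-form minimizer, which is exactly why the slides turn to iterative numerical procedures next.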
62. Comment:
The minimization of

S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ)

requires an iterative numerical minimization
procedure to find β̂, α̂, δ̂:
• Steepest descent
• Simulated annealing
• etc.
63. Comment:
The computation of

S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ)

for specific values of β, α, δ
can be achieved by using the forecast equations:

ut = xt − x̂t-1(1)
64. Comment:
The minimization of

S(β, α, δ) = Σt=1…N ut²(x, x*, u*, β, α, δ)

assumes we know the starting values of the
time series {xt : t ∈ T} and {ut : t ∈ T},
namely x* and u*.
66. Backcasting:
If the time series {xt : t ∈ T} satisfies the equation

xt = β1xt+1 + β2xt+2 + ... + βpxt+p + δ + ut + α1ut+1 + α2ut+2 + ... + αqut+q

it can also be shown to satisfy the equation

xt = β1xt-1 + β2xt-2 + ... + βpxt-p + δ + ut + α1ut-1 + α2ut-2 + ... + αqut-q

Both equations result in a time series with the same
mean, variance and autocorrelation function.
In the same way that the second equation can be used to
forecast into the future, the first equation can be used
to backcast into the past.
67. Approaches to handling the starting values of the series {xt : t ∈ T} and {ut : t ∈ T}
1. Initially start with the values:
xt = x̄ for the components of x*,
ut = 0 for the components of u*.
2. Estimate the parameters of the model using
Maximum Likelihood estimation and the
conditional likelihood function.
3. Use the estimated parameters to backcast the
components of x*. The backcasted components of
u* will still be zero.
68. 4. Repeat steps 2 and 3 until the estimates stabilize.
This algorithm is an application of the E-M algorithm.
This general algorithm is frequently used when there
are missing values.
The E stands for Expectation (using a model to estimate
the missing values).
The M stands for Maximum Likelihood estimation, the
process used to estimate the parameters of the model.
69. ARIMA + X = ARIMAX
ARIMA with external (exogenous) variables is very important in the
case when external variables start impacting the series.
Example: flight delay prediction depends not only on the historical time
series data but also on external variables like weather conditions
(temperature, pressure, humidity, visibility), the arrival of other
flights, waiting time, etc.
70. ARIMA + X = ARIMAX
An ARMAX model simply adds the covariate on the right hand side:
yt = βxt + ϕ1yt−1 + ⋯ + ϕpyt−p – θ1zt−1 – … – θqzt−q + zt
Covariate: xt
R function:
arima(x, order = c(0L, 0L, 0L),
      seasonal = list(order = c(0L, 0L, 0L), period = NA),
      xreg = xt)