State space methods for signal extraction,
  parameter estimation and forecasting

             Siem Jan Koopman
    http://personal.vu.nl/s.j.koopman

            Department of Econometrics
             VU University Amsterdam
                Tinbergen Institute
                       2012
Regression Model in time series

For a time series of a dependent variable yt and a set of regressors
Xt , we can consider the regression model

    yt = µ + Xt β + εt ,     εt ∼ NID(0, σ²),     t = 1, . . . , n.

For large n, is it reasonable to assume constant coefficients µ, β
and σ²?
If we have strong reasons to believe that µ and β are
time-varying, the model may be written as

    yt = µt + Xt βt + εt ,    µt+1 = µt + ηt ,    βt+1 = βt + ξt .

This is an example of a linear state space model.


State Space Model
Linear Gaussian state space model is defined in three parts:

→ State equation:

              αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),

→ Observation equation:

                 yt = Zt αt + εt ,     εt ∼ NID(0, Ht ),

→ Initial state distribution α1 ∼ N (a1 , P1 ).

Notice that
  • ζt and εs are independent for all t, s, and independent of α1 ;
  • observation yt can be multivariate;
  • state vector αt is unobserved;
  • matrices Tt , Zt , Rt , Qt , Ht determine structure of model.
State Space Model
• state space model is linear and Gaussian: therefore properties
  and results of multivariate normal distribution apply;
• state vector αt evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation has therefore two aspects:
     • measuring the unobservable state (prediction, filtering and
       smoothing);
     • estimation of unknown parameters (maximum likelihood
       estimation);
• state space methods offer a unified approach to a wide range
  of models and techniques: dynamic regression, ARIMA, UC
  models, latent variable models, spline-fitting and many ad-hoc
  filters;
• next, some well-known model specifications in state space
  form ...
Regression with time-varying coefficients
General state space model:

            αt+1 = Tt αt + Rt ζt ,     ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

Put regressors in Zt (that is Zt = Xt ),

                        Tt = I ,     Rt = I ,

Result is regression model yt = Xt βt + εt with time-varying
coefficient βt = αt following a random walk.
In case Qt = 0, the time-varying regression model reduces to the
standard linear regression model yt = Xt β + εt with β = α1 .
Many dynamic specifications for the regression coefficient can be
considered; see below.
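As a concrete illustration, here is a minimal Python/numpy sketch of these system matrices; the function name and arguments are ours, not from the slides:

```python
import numpy as np

def tvp_regression_ss(X, var_eps, var_xi):
    """Hypothetical helper: system matrices for y_t = X_t beta_t + eps_t
    with random-walk coefficients beta_{t+1} = beta_t + xi_t."""
    n, k = X.shape
    Z = X.reshape(n, 1, k)      # time-varying Z_t = X_t (a 1 x k row)
    T = np.eye(k)               # T_t = I
    R = np.eye(k)               # R_t = I
    Q = var_xi * np.eye(k)      # Q_t; var_xi = 0 recovers constant beta
    H = np.array([[var_eps]])   # H_t
    return Z, T, R, Q, H
```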
AR(1) process in State Space Form
Consider AR(1) model yt+1 = φyt + ζt , in state space form:

           αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

with state vector αt and system matrices

                Zt = 1,      Ht = 0,
                Tt = φ,      Rt = 1,       Qt = σ 2

we have
  • Zt = 1 and Ht = 0 imply that αt = yt ;
  • State equation implies yt+1 = φyt + ζt with ζt ∼ NID(0, σ²);
  • This is the AR(1) model !

AR(2) in State Space Form
Consider AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt , in state space
form:

            αt+1 = Tt αt + Rt ζt ,       ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,      εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

           Zt = (1  0),       Ht = 0,

                ( φ1  1 )          ( 1 )
           Tt = ( φ2  0 ),    Rt = ( 0 ),      Qt = σ²
we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = φ1 yt + α2t + ζt with
    ζt ∼ NID(0, σ²);
  • Second state equation implies α2,t+1 = φ2 yt ;
MA(1) in State Space Form
Consider MA(1) model yt+1 = ζt + θζt−1 , in state space form:

            αt+1 = Tt αt + Rt ζt ,          ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,       εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

            Zt = (1  0),       Ht = 0,

                 ( 0  1 )          ( 1 )
            Tt = ( 0  0 ),    Rt = ( θ ),      Qt = σ²

we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = α2t + ζt with
    ζt ∼ NID(0, σ²);
  • Second state equation implies α2,t+1 = θζt ;
General ARMA in State Space Form

Consider ARMA(2,1) model

               yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1

in state space form

                 (       yt       )
            αt = ( φ2 yt−1 + θζt ),

            Zt = (1  0),       Ht = 0,

                 ( φ1  1 )          ( 1 )
            Tt = ( φ2  0 ),    Rt = ( θ ),     Qt = σ²

All ARIMA(p, d, q) models have a (non-unique) state space
representation.
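As an illustration, a minimal Python/numpy sketch of the usual companion-form construction for an ARMA(p, q) model; the function name and the example coefficients are ours, and since the representation is non-unique, other layouts are equally valid:

```python
import numpy as np

def arma_ss(phi, theta, sigma2):
    """Illustrative companion-form system matrices for ARMA(p, q),
    following the ARMA(2,1) layout above."""
    p, q = len(phi), len(theta)
    m = max(p, q + 1)                                # state dimension
    phi_ = np.r_[phi, np.zeros(m - p)]               # padded AR coefficients
    theta_ = np.r_[1.0, theta, np.zeros(m - 1 - q)]  # (1, theta_1, ...)
    T = np.zeros((m, m))
    T[:, 0] = phi_                                   # first column: phi_1..phi_m
    T[:-1, 1:] = np.eye(m - 1)                       # shifted identity block
    R = theta_.reshape(m, 1)                         # R_t = (1, theta_1, ...)'
    Z = np.zeros((1, m)); Z[0, 0] = 1.0              # y_t is the first state
    return Z, np.zeros((1, 1)), T, R, np.array([[sigma2]])

# The ARMA(2,1) matrices of this slide (coefficient values are made up)
Z, H, T, R, Q = arma_ss(phi=[0.5, 0.2], theta=[0.3], sigma2=1.0)
```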


UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,    yt = Zt αt + εt .


LL model µt+1 = µt + ηt and yt = µt + εt :

        αt = µt ,     Tt = 1,     Rt = 1,     Qt = ση² ,
                      Zt = 1,                 Ht = σε² .


LLT model µt+1 = µt + βt + ηt ,      βt+1 = βt + ξt and
yt = µt + εt :

            ( µt )        ( 1  1 )        ( 1  0 )        ( ση²   0  )
       αt = ( βt ),  Tt = ( 0  1 ),  Rt = ( 0  1 ),  Qt = (  0   σξ² ),

       Zt = (1  0),       Ht = σε² .

UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,   yt = Zt αt + εt .


LLT model with season: µt+1 = µt + βt + ηt ,     βt+1 = βt + ξt ,
S(L)γt+1 = ωt and yt = µt + γt + εt :
αt = ( µt   βt   γt   γt−1   γt−2 )′ ,

          ( 1  1   0   0   0 )                              ( 1  0  0 )
          ( 0  1   0   0   0 )        ( ση²   0    0  )     ( 0  1  0 )
     Tt = ( 0  0  −1  −1  −1 ),  Qt = (  0   σξ²   0  ), Rt = ( 0  0  1 ),
          ( 0  0   1   0   0 )        (  0    0   σω² )     ( 0  0  0 )
          ( 0  0   0   1   0 )                              ( 0  0  0 )

     Zt = (1  0  1  0  0),     Ht = σε² .
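A minimal Python/numpy sketch of these LLT-plus-seasonal matrices for a general seasonal period s (s = 4 reproduces the 5 × 5 system above); the helper name and variance values are illustrative:

```python
import numpy as np

def llt_seasonal_ss(s, var_eps, var_eta, var_xi, var_omega):
    """Illustrative system matrices: local linear trend plus a
    dummy seasonal of period s (state: mu, beta, s-1 seasonal lags)."""
    m = 2 + (s - 1)
    T = np.zeros((m, m))
    T[0, :2] = [1, 1]               # mu_{t+1} = mu_t + beta_t
    T[1, 1] = 1                     # beta_{t+1} = beta_t
    T[2, 2:] = -1                   # S(L) gamma_{t+1} = omega_t
    T[3:, 2:-1] = np.eye(s - 2)     # shift the stored seasonal lags
    R = np.zeros((m, 3))
    R[0, 0] = R[1, 1] = R[2, 2] = 1 # eta, xi, omega enter mu, beta, gamma
    Q = np.diag([var_eta, var_xi, var_omega])
    Z = np.zeros((1, m)); Z[0, 0] = Z[0, 2] = 1   # y_t = mu_t + gamma_t + eps_t
    H = np.array([[var_eps]])
    return Z, H, T, R, Q

Z, H, T, R, Q = llt_seasonal_ss(4, var_eps=1.0, var_eta=0.1,
                                var_xi=0.01, var_omega=0.05)
```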


Kalman Filter


• The Kalman filter calculates the mean and variance of the
  unobserved state, given the observations.
• The state is Gaussian: the complete distribution is
  characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is
  updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1 , which we assumed
  given.
• There are various ways to initialize when a1 and P1 are
  unknown, which we will not discuss here.




Kalman Filter
The unobserved state αt can be estimated from the observations
with the Kalman filter:

                     vt = yt − Zt at ,
                     Ft = Zt Pt Zt′ + Ht ,
                     Kt = Tt Pt Zt′ Ft⁻¹ ,
                   at+1 = Tt at + Kt vt ,
                   Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ ,

for t = 1, . . . , n and starting with given values for a1 and P1 .
  • Writing Yt = {y1 , . . . , yt },

             at+1 = E(αt+1 |Yt ),          Pt+1 = Var(αt+1 |Yt ).
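A minimal Python/numpy sketch of these recursions for a univariate, time-invariant model; the local level example at the end uses made-up variances and a few illustrative observations:

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Sketch of the filter: y_t = Z alpha_t + eps_t,
    alpha_{t+1} = T alpha_t + R zeta_t, univariate y_t."""
    n, m = len(y), len(a1)
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    a_pred = np.zeros((n, m)); P_pred = np.zeros((n, m, m))
    v = np.zeros(n); F = np.zeros(n)
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        v[t] = y[t] - (Z @ a)[0]                   # v_t = y_t - Z a_t
        F[t] = (Z @ P @ Z.T)[0, 0] + H             # F_t = Z P_t Z' + H
        K = (T @ P @ Z.T) / F[t]                   # K_t = T P_t Z' F_t^{-1}
        a = T @ a + (K * v[t]).ravel()             # a_{t+1} = T a_t + K_t v_t
        P = T @ P @ T.T + R @ Q @ R.T - (K @ K.T) * F[t]   # P_{t+1}
    return a_pred, P_pred, v, F

# Local level model: Z = T = R = 1, with illustrative values
y = np.array([1120., 1160., 963., 1210., 1160.])
a_pred, P_pred, v, F = kalman_filter(
    y, Z=np.array([[1.]]), H=1.0, T=np.eye(1), R=np.eye(1),
    Q=np.array([[0.5]]), a1=np.zeros(1), P1=np.array([[1e6]]))
```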


Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with
means µx = E(x) and µy = E(y), variance matrices Σxx and Σyy ,
respectively, and covariance matrix Σxy . Then

                E(x|y) = µx + Σxy Σyy⁻¹ (y − µy ),
              Var(x|y) = Σxx − Σxy Σyy⁻¹ Σxy′ .

Define e = x − E(x|y). Then

           Var(e) = Var([x − µx ] − Σxy Σyy⁻¹ [y − µy ])
                  = Σxx − Σxy Σyy⁻¹ Σxy′
                  = Var(x|y),

and

         Cov(e, y) = Cov([x − µx ] − Σxy Σyy⁻¹ [y − µy ], y)
                   = Σxy − Σxy = 0.
Lemma II : an extension

The proof of the Kalman filter uses a slight extension of the lemma
for multivariate Normal regression theory.

Suppose x, y and z are jointly Normally distributed vectors with
E(z) = 0 and Σyz = 0. Then

                E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z,
              Var(x|y, z) = Var(x|y) − Σxz Σzz⁻¹ Σxz′ .




Can you give a proof of Lemma II ?
Hint : apply Lemma I for x and y ∗ = (y ′ , z ′ )′ .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,          yt = Zt αt + εt .
  • Writing Yt = {y1 , . . . , yt }, define

             at+1 = E(αt+1 |Yt ),            Pt+1 = Var(αt+1 |Yt );

  • The prediction error is

                        vt = yt − E(yt |Yt−1 )
                           = yt − E(Zt αt + εt |Yt−1 )
                           = yt − Zt E(αt |Yt−1 )
                           = yt − Zt at ;

  • It follows that vt = Zt (αt − at ) + εt and E(vt ) = 0;
  • The prediction error variance is Ft = Var(vt ) = Zt Pt Zt′ + Ht .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,            yt = Zt αt + εt .
  • We have Yt = {Yt−1 , yt } = {Yt−1 , vt } and E(vt yt−j ) = 0 for
    j = 1, . . . , t − 1;
  • Apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z with x = αt+1 ,
    y = Yt−1 and z = vt = Zt (αt − at ) + εt ;
  • It follows that E(αt+1 |Yt−1 ) = Tt at ;
  • Further, E(αt+1 vt′ ) = Tt E(αt vt′ ) + Rt E(ζt vt′ ) = Tt Pt Zt′ ;
  • Applying the lemma, we obtain the state update

                            at+1 = E(αt+1 |Yt−1 , yt )
                                 = Tt at + Tt Pt Zt′ Ft⁻¹ vt
                                 = Tt at + Kt vt ,

    with Kt = Tt Pt Zt′ Ft⁻¹ .
Kalman Filter : derivation
• Our best prediction of yt is Zt at . When the actual
  observation arrives, calculate the prediction error
  vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht . The new
  best estimate of the state mean is based on both the old
  estimate at and the new information vt :
                        at+1 = Tt at + Kt vt ,
  similarly for the variance:
               Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ .
  Can you derive the updating equation for Pt+1 ?
• The Kalman gain
                           Kt = Tt Pt Zt′ Ft⁻¹
  is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter
  for the Local Level Model (DK, Chapter 2).
Kalman Filter Illustration
[Figure: Kalman filter output for the Nile data (1880–1960), four panels: observations with filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]
Prediction and Filtering
This version of the Kalman filter computes the predictions of the
states directly:

             at+1 = E(αt+1 |Yt ),         Pt+1 = Var(αt+1 |Yt ).

The filtered estimates are defined by

                 at|t = E(αt |Yt ),       Pt|t = Var(αt |Yt ),

for t = 1, . . . , n.
The Kalman filter can be “re-ordered” to compute both:

           at|t = at + Mt vt ,         Pt|t = Pt − Mt Ft Mt′ ,
          at+1 = Tt at|t ,             Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′ ,

with Mt = Pt Zt′ Ft⁻¹ for t = 1, . . . , n and starting with a1 and P1 .
Compare Mt with Kt . Verify this version of KF using earlier
derivations.
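A sketch of the re-ordered pass in the same Python/numpy style; note that Kt = Tt Mt , so the prediction step Tt at|t = Tt at + Kt vt reproduces the direct recursion:

```python
import numpy as np

def kalman_filter_reordered(y, Z, H, T, R, Q, a1, P1):
    """Sketch of the re-ordered filter: filtered moments (a_t|t, P_t|t)
    followed by the prediction step to (a_{t+1}, P_{t+1})."""
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    filtered = []
    for t in range(len(y)):
        v = y[t] - (Z @ a)[0]
        F = (Z @ P @ Z.T)[0, 0] + H
        M = (P @ Z.T) / F                 # M_t = P_t Z_t' F_t^{-1}
        a_f = a + (M * v).ravel()         # a_{t|t} = a_t + M_t v_t
        P_f = P - (M @ M.T) * F           # P_{t|t} = P_t - M_t F_t M_t'
        filtered.append((a_f, P_f))
        a = T @ a_f                       # a_{t+1} = T_t a_{t|t}; K_t = T_t M_t
        P = T @ P_f @ T.T + R @ Q @ R.T   # P_{t+1}
    return filtered
```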
Smoothing
  • The filter calculates the mean and variance conditional on Yt ;
  • The Kalman smoother calculates the mean and variance
    conditional on the full set of observations Yn ;
  • After the filtered estimates are calculated, the smoothing
    recursion starts at the last observation and runs back to the
    first.

    α̂t = E(αt |Yn ),                           Vt = Var(αt |Yn ),
    rt = weighted sum of future innovations,    Nt = Var(rt ),
    Lt = Tt − Kt Zt .

Starting with rn = 0, Nn = 0, the smoothing recursions are given
by

       rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt ,     Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt ,
        α̂t = at + Pt rt−1 ,             Vt = Pt − Pt Nt−1 Pt .
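A minimal Python/numpy sketch of this backward pass; it consumes the stored output (a_pred, P_pred, v, F) of the filter sketch earlier:

```python
import numpy as np

def kalman_smoother(Z, T, a_pred, P_pred, v, F):
    """Sketch of the backward smoothing recursions, univariate y_t."""
    n, m = len(v), a_pred.shape[1]
    r, N = np.zeros(m), np.zeros((m, m))            # r_n = 0, N_n = 0
    alpha_hat = np.zeros((n, m)); V = np.zeros((n, m, m))
    for t in range(n - 1, -1, -1):
        K = (T @ P_pred[t] @ Z.T) / F[t]            # K_t
        L = T - K @ Z                               # L_t = T_t - K_t Z_t
        r = (Z.T / F[t]).ravel() * v[t] + L.T @ r   # r_{t-1}
        N = (Z.T @ Z) / F[t] + L.T @ N @ L          # N_{t-1}
        alpha_hat[t] = a_pred[t] + P_pred[t] @ r    # smoothed mean
        V[t] = P_pred[t] - P_pred[t] @ N @ P_pred[t]  # smoothed variance
    return alpha_hat, V
```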
Smoothing Illustration

[Figure: Smoothing output for the Nile data (1880–1960), four panels: observations with smoothed state; smoothed state variance V_t; r_t; N_t.]
Filtering and Smoothing
[Figure: Nile data (1870–1970) with the filtered level and the smoothed level overlaid on the observations.]
Missing Observations



Missing observations are very easy to handle in Kalman filtering:
  • suppose yj is missing
  • put vj = 0, Kj = 0 and Fj = ∞ in the algorithm
  • proceed further calculations as normal

The filter algorithm extrapolates according to the state equation
until a new observation arrives. The smoother interpolates
between observations.
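A sketch in the same Python/numpy style, where np.nan marks a missing observation and the update step is simply skipped:

```python
import numpy as np

def kalman_filter_missing(y, Z, H, T, R, Q, a1, P1):
    """Sketch: filter that skips the update when y_t is np.nan,
    i.e. v_t = 0, K_t = 0, so a_{t+1} = T_t a_t."""
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    a_pred, P_pred = [], []
    for t in range(len(y)):
        a_pred.append(a); P_pred.append(P)
        if np.isnan(y[t]):                # missing: prediction step only
            a = T @ a
            P = T @ P @ T.T + R @ Q @ R.T
            continue
        v = y[t] - (Z @ a)[0]
        F = (Z @ P @ Z.T)[0, 0] + H
        K = (T @ P @ Z.T) / F
        a = T @ a + (K * v).ravel()
        P = T @ P @ T.T + R @ Q @ R.T - (K @ K.T) * F
    return np.array(a_pred), np.array(P_pred)
```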




Missing Observations

[Figure: Filtering with missing observations, Nile data (1870–1970), two panels: observations with a_t; state variance P_t.]
Missing Observations, Filter and Smoother

[Figure: Nile data (1880–1960) with missing observations, four panels: filtered state; P_t; smoothed state; V_t.]
Forecasting




Forecasting requires no extra theory: just treat future observations
as missing:
  • put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k
  • proceed further calculations as normal
  • forecast for yj is Zj aj
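A minimal sketch of this in the same Python/numpy style, running the pure prediction step k times from the last filter output (a_{n+1}, P_{n+1}):

```python
import numpy as np

def forecast(a_next, P_next, Z, H, T, R, Q, k):
    """Sketch: k-step-ahead forecasts by treating y_{n+1..n+k} as
    missing (v_j = 0, K_j = 0)."""
    a, P = np.asarray(a_next, float), np.asarray(P_next, float)
    yhat = np.zeros(k); Fvar = np.zeros(k)
    for j in range(k):
        yhat[j] = (Z @ a)[0]               # forecast Z_j a_j
        Fvar[j] = (Z @ P @ Z.T)[0, 0] + H  # forecast error variance F_j
        a = T @ a                          # no update: a_{j+1} = T a_j
        P = T @ P @ T.T + R @ Q @ R.T
    return yhat, Fvar
```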




Forecasting

[Figure: Forecasting the Nile data to 2000, two panels: observations with a_t; state variance P_t, shown over 1870–2000.]
Parameter Estimation
The system matrices in a state space model typically depend on a
parameter vector ψ. The model is completely Gaussian; we
estimate ψ by Maximum Likelihood.
The loglikelihood of a time series is

                log L = ∑_{t=1}^{n} log p(yt |Yt−1 ).

In the state space model, p(yt |Yt−1 ) is a Gaussian density with
mean Zt at and variance Ft :

     log L = − (nN/2) log 2π − (1/2) ∑_{t=1}^{n} ( log |Ft | + vt′ Ft⁻¹ vt ),

with vt and Ft from the Kalman filter. This is called the prediction
error decomposition of the likelihood. Estimation proceeds by
numerically maximising log L.
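A minimal sketch for the local level model, maximising the prediction error decomposition with scipy; the log-variance parametrisation, initialisation and simulated data are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(params, y):
    """Negative loglikelihood of the local level model via the
    prediction error decomposition (params are log-variances)."""
    var_eps, var_eta = np.exp(params)
    a, P = 0.0, 1e7                       # crude diffuse-style start
    ll = 0.0
    for t in range(len(y)):
        v = y[t] - a                      # v_t (Z_t = 1)
        F = P + var_eps                   # F_t = P_t + sigma_eps^2
        ll -= 0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = P / F                         # K_t (T_t = 1)
        a += K * v
        P = P * (1 - K) + var_eta         # P_{t+1}
    return -ll

y = np.random.default_rng(0).standard_normal(100).cumsum()  # toy data
res = minimize(negloglik, x0=np.log([1.0, 0.1]), args=(y,))
sigma_eps2, sigma_eta2 = np.exp(res.x)
```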
ML Estimate of Nile Data
[Figure: Nile data (1870–1970) with level estimates for q = 0, q = 1000 and the ML estimate q = 0.0973.]
Diagnostics


• Null hypothesis: the standardised residuals satisfy

                         vt / √Ft ∼ NID(0, 1)

• Apply standard tests for Normality, heteroskedasticity, serial
  correlation;
• A recursive algorithm is available to calculate smoothed
  disturbances (auxiliary residuals), which can be used to detect
  breaks and outliers;
• Model comparison and parameter restrictions: use likelihood
  based procedures (LR test, AIC, BIC).
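A sketch of such checks in Python on the standardised residuals; the Ljung-Box statistic is computed by hand from sample autocorrelations, and in practice its degrees of freedom would be corrected for estimated parameters:

```python
import numpy as np
from scipy import stats

def residual_diagnostics(v, F, lags=10):
    """Sketch: basic checks on e_t = v_t / sqrt(F_t)."""
    e = v / np.sqrt(F)
    jb_stat, jb_pval = stats.jarque_bera(e)     # Normality test
    n = len(e)
    acf = np.array([np.corrcoef(e[:-k], e[k:])[0, 1]
                    for k in range(1, lags + 1)])
    lb_stat = n * (n + 2) * np.sum(acf**2 / (n - np.arange(1, lags + 1)))
    lb_pval = stats.chi2.sf(lb_stat, df=lags)   # serial correlation test
    return {"jarque_bera": (jb_stat, jb_pval), "ljung_box": (lb_stat, lb_pval)}
```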



Nile Data Residuals Diagnostics
[Figure: Nile residual diagnostics, four panels: standardised residuals over time; correlogram; histogram with N(s = 0.996) density; normal QQ plot.]
Assignment 1 - Exercises 1 and 2

1. Formulate the following models in state space form:
     •   ARMA(3,1) process;
     •   ARMA(1,3) process;
     •   ARIMA(3,1,1) process;
     •   AR(3) plus noise model;
     •   Random Walk plus AR(3) process.
   Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model

      yt = µt + ξt ,    ξt ∼ N(0, σξ²),    ∆µt+1 = ηt ∼ N(0, ση²),

   with E(ξt ηt ) = σξη ≠ 0 and E(ξt ηs ) = 0 for all t, s with t ≠ s.
   Also discuss the problem of missing observations in this case.


Assignment 1 - Exercise 3

3.

Consider a time series yt that is modelled by a state space
model, together with a time series vector of k exogenous variables
Xt . The dependent time series yt depends on the explanatory
variables Xt in a linear way.
     • How would you introduce Xt as regression effects in the state
       space model ?
     • Can you allow the transition matrix Tt to depend on Xt too ?
     • Can you allow the transition matrix Tt to depend on yt ?
For the last two questions, please also consider maximum
likelihood estimation of coefficients within the system matrices.


Assignment 1 - Computing work




• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in
  Chapter 2, except Fig. 2.4.
• Write short documentation for the Ox program.




