State space methods for signal extraction,
  parameter estimation and forecasting

             Siem Jan Koopman
    http://personal.vu.nl/s.j.koopman

            Department of Econometrics
             VU University Amsterdam
                Tinbergen Institute
                       2012
Regression Model in time series

For a time series of a dependent variable yt and a set of regressors
Xt , we can consider the regression model

    yt = µ + Xt β + εt ,     εt ∼ NID(0, σ²),     t = 1, . . . , n.

For large n, is it reasonable to assume constant coefficients µ, β
and σ²?
If we have strong reasons to believe that µ and β are
time-varying, the model may be written as

    yt = µt + Xt βt + εt ,    µt+1 = µt + ηt ,    βt+1 = βt + ξt .

This is an example of a linear state space model.


State Space Model
Linear Gaussian state space model is defined in three parts:

→ State equation:

              αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),

→ Observation equation:

                 yt = Zt αt + εt ,     εt ∼ NID(0, Ht ),

→ Initial state distribution α1 ∼ N (a1 , P1 ).

Notice that
  • ζt and εs are independent for all t, s, and independent of α1 ;
  • observation yt can be multivariate;
  • state vector αt is unobserved;
  • matrices Tt , Zt , Rt , Qt , Ht determine structure of model.
State Space Model
• state space model is linear and Gaussian: therefore properties
  and results of multivariate normal distribution apply;
• state vector αt evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation has therefore two aspects:
     • measuring the unobservable state (prediction, filtering and
       smoothing);
     • estimation of unknown parameters (maximum likelihood
       estimation);
• state space methods offer a unified approach to a wide range
  of models and techniques: dynamic regression, ARIMA, UC
  models, latent variable models, spline-fitting and many ad-hoc
  filters;
• next, some well-known model specifications in state space
  form ...
Regression with time-varying coefficients
General state space model:

            αt+1 = Tt αt + Rt ζt ,     ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

Put regressors in Zt (that is Zt = Xt ),

                        Tt = I ,     Rt = I ,

Result is regression model yt = Xt βt + εt with time-varying
coefficient βt = αt following a random walk.
In case Qt = 0, the time-varying regression model reduces to the
standard linear regression model yt = Xt β + εt with β = α1 .
Many dynamic specifications for the regression coefficient can be
considered; see below.
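As a concrete illustration, here is a minimal Python/numpy sketch of these system matrices; the function name and arguments are ours, not from the slides:

```python
import numpy as np

def tvp_regression_ss(X, var_eps, var_xi):
    """Hypothetical helper: system matrices for y_t = X_t beta_t + eps_t
    with random-walk coefficients beta_{t+1} = beta_t + xi_t."""
    n, k = X.shape
    Z = X.reshape(n, 1, k)      # time-varying Z_t = X_t (a 1 x k row)
    T = np.eye(k)               # T_t = I
    R = np.eye(k)               # R_t = I
    Q = var_xi * np.eye(k)      # Q_t; var_xi = 0 recovers constant beta
    H = np.array([[var_eps]])   # H_t
    return Z, T, R, Q, H
```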
AR(1) process in State Space Form
Consider AR(1) model yt+1 = φyt + ζt , in state space form:

           αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

with state vector αt and system matrices

                Zt = 1,      Ht = 0,
                Tt = φ,      Rt = 1,       Qt = σ 2

we have
  • Zt = 1 and Ht = 0 imply that αt = yt ;
  • State equation implies yt+1 = φyt + ζt with ζt ∼ NID(0, σ²);
  • This is the AR(1) model !

AR(2) in State Space Form
Consider AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt , in state space
form:

            αt+1 = Tt αt + Rt ζt ,       ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,      εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

           Zt = (1  0),       Ht = 0,

                ( φ1  1 )          ( 1 )
           Tt = ( φ2  0 ),    Rt = ( 0 ),      Qt = σ²
we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = φ1 yt + α2t + ζt with
    ζt ∼ NID(0, σ²);
  • Second state equation implies α2,t+1 = φ2 yt ;
MA(1) in State Space Form
Consider MA(1) model yt+1 = ζt + θζt−1 , in state space form:

            αt+1 = Tt αt + Rt ζt ,          ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,       εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

            Zt = (1  0),       Ht = 0,

                 ( 0  1 )          ( 1 )
            Tt = ( 0  0 ),    Rt = ( θ ),      Qt = σ²

we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = α2t + ζt with
    ζt ∼ NID(0, σ²);
  • Second state equation implies α2,t+1 = θζt ;
General ARMA in State Space Form

Consider ARMA(2,1) model

               yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1

in state space form

                 (       yt       )
            αt = ( φ2 yt−1 + θζt ),

            Zt = (1  0),       Ht = 0,

                 ( φ1  1 )          ( 1 )
            Tt = ( φ2  0 ),    Rt = ( θ ),     Qt = σ²

All ARIMA(p, d, q) models have a (non-unique) state space
representation.
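As an illustration, a minimal Python/numpy sketch of the usual companion-form construction for an ARMA(p, q) model; the function name and the example coefficients are ours, and since the representation is non-unique, other layouts are equally valid:

```python
import numpy as np

def arma_ss(phi, theta, sigma2):
    """Illustrative companion-form system matrices for ARMA(p, q),
    following the ARMA(2,1) layout above."""
    p, q = len(phi), len(theta)
    m = max(p, q + 1)                                # state dimension
    phi_ = np.r_[phi, np.zeros(m - p)]               # padded AR coefficients
    theta_ = np.r_[1.0, theta, np.zeros(m - 1 - q)]  # (1, theta_1, ...)
    T = np.zeros((m, m))
    T[:, 0] = phi_                                   # first column: phi_1..phi_m
    T[:-1, 1:] = np.eye(m - 1)                       # shifted identity block
    R = theta_.reshape(m, 1)                         # R_t = (1, theta_1, ...)'
    Z = np.zeros((1, m)); Z[0, 0] = 1.0              # y_t is the first state
    return Z, np.zeros((1, 1)), T, R, np.array([[sigma2]])

# The ARMA(2,1) matrices of this slide (coefficient values are made up)
Z, H, T, R, Q = arma_ss(phi=[0.5, 0.2], theta=[0.3], sigma2=1.0)
```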


UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,    yt = Zt αt + εt .


LL model µt+1 = µt + ηt and yt = µt + εt :

        αt = µt ,     Tt = 1,     Rt = 1,     Qt = ση² ,
                      Zt = 1,                 Ht = σε² .


LLT model µt+1 = µt + βt + ηt ,      βt+1 = βt + ξt and
yt = µt + εt :

            ( µt )        ( 1  1 )        ( 1  0 )        ( ση²   0  )
       αt = ( βt ),  Tt = ( 0  1 ),  Rt = ( 0  1 ),  Qt = (  0   σξ² ),

       Zt = (1  0),       Ht = σε² .

UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,   yt = Zt αt + εt .


LLT model with season: µt+1 = µt + βt + ηt ,     βt+1 = βt + ξt ,
S(L)γt+1 = ωt and yt = µt + γt + εt :
αt = ( µt   βt   γt   γt−1   γt−2 )′ ,

          ( 1  1   0   0   0 )                              ( 1  0  0 )
          ( 0  1   0   0   0 )        ( ση²   0    0  )     ( 0  1  0 )
     Tt = ( 0  0  −1  −1  −1 ),  Qt = (  0   σξ²   0  ), Rt = ( 0  0  1 ),
          ( 0  0   1   0   0 )        (  0    0   σω² )     ( 0  0  0 )
          ( 0  0   0   1   0 )                              ( 0  0  0 )

     Zt = (1  0  1  0  0),     Ht = σε² .
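A minimal Python/numpy sketch of these LLT-plus-seasonal matrices for a general seasonal period s (s = 4 reproduces the 5 × 5 system above); the helper name and variance values are illustrative:

```python
import numpy as np

def llt_seasonal_ss(s, var_eps, var_eta, var_xi, var_omega):
    """Illustrative system matrices: local linear trend plus a
    dummy seasonal of period s (state: mu, beta, s-1 seasonal lags)."""
    m = 2 + (s - 1)
    T = np.zeros((m, m))
    T[0, :2] = [1, 1]               # mu_{t+1} = mu_t + beta_t
    T[1, 1] = 1                     # beta_{t+1} = beta_t
    T[2, 2:] = -1                   # S(L) gamma_{t+1} = omega_t
    T[3:, 2:-1] = np.eye(s - 2)     # shift the stored seasonal lags
    R = np.zeros((m, 3))
    R[0, 0] = R[1, 1] = R[2, 2] = 1 # eta, xi, omega enter mu, beta, gamma
    Q = np.diag([var_eta, var_xi, var_omega])
    Z = np.zeros((1, m)); Z[0, 0] = Z[0, 2] = 1   # y_t = mu_t + gamma_t + eps_t
    H = np.array([[var_eps]])
    return Z, H, T, R, Q

Z, H, T, R, Q = llt_seasonal_ss(4, var_eps=1.0, var_eta=0.1,
                                var_xi=0.01, var_omega=0.05)
```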


Kalman Filter


• The Kalman filter calculates the mean and variance of the
  unobserved state, given the observations.
• The state is Gaussian: the complete distribution is
  characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is
  updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1 , which we assumed
  given.
• There are various ways to initialize when a1 and P1 are
  unknown, which we will not discuss here.




Kalman Filter
The unobserved state αt can be estimated from the observations
with the Kalman filter:

                     vt = yt − Zt at ,
                     Ft = Zt Pt Zt′ + Ht ,
                     Kt = Tt Pt Zt′ Ft⁻¹ ,
                   at+1 = Tt at + Kt vt ,
                   Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ ,

for t = 1, . . . , n and starting with given values for a1 and P1 .
  • Writing Yt = {y1 , . . . , yt },

             at+1 = E(αt+1 |Yt ),          Pt+1 = Var(αt+1 |Yt ).
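A minimal Python/numpy sketch of these recursions for a univariate, time-invariant model; the local level example at the end uses made-up variances and a few illustrative observations:

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Sketch of the filter: y_t = Z alpha_t + eps_t,
    alpha_{t+1} = T alpha_t + R zeta_t, univariate y_t."""
    n, m = len(y), len(a1)
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    a_pred = np.zeros((n, m)); P_pred = np.zeros((n, m, m))
    v = np.zeros(n); F = np.zeros(n)
    for t in range(n):
        a_pred[t], P_pred[t] = a, P
        v[t] = y[t] - (Z @ a)[0]                   # v_t = y_t - Z a_t
        F[t] = (Z @ P @ Z.T)[0, 0] + H             # F_t = Z P_t Z' + H
        K = (T @ P @ Z.T) / F[t]                   # K_t = T P_t Z' F_t^{-1}
        a = T @ a + (K * v[t]).ravel()             # a_{t+1} = T a_t + K_t v_t
        P = T @ P @ T.T + R @ Q @ R.T - (K @ K.T) * F[t]   # P_{t+1}
    return a_pred, P_pred, v, F

# Local level model: Z = T = R = 1, with illustrative values
y = np.array([1120., 1160., 963., 1210., 1160.])
a_pred, P_pred, v, F = kalman_filter(
    y, Z=np.array([[1.]]), H=1.0, T=np.eye(1), R=np.eye(1),
    Q=np.array([[0.5]]), a1=np.zeros(1), P1=np.array([[1e6]]))
```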


Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with
means µx = E(x) and µy = E(y), variance matrices Σxx and Σyy ,
respectively, and covariance matrix Σxy . Then

                E(x|y) = µx + Σxy Σyy⁻¹ (y − µy ),
              Var(x|y) = Σxx − Σxy Σyy⁻¹ Σxy′ .

Define e = x − E(x|y). Then

           Var(e) = Var([x − µx ] − Σxy Σyy⁻¹ [y − µy ])
                  = Σxx − Σxy Σyy⁻¹ Σxy′
                  = Var(x|y),

and

         Cov(e, y) = Cov([x − µx ] − Σxy Σyy⁻¹ [y − µy ], y)
                   = Σxy − Σxy = 0.
Lemma II : an extension

The proof of the Kalman filter uses a slight extension of the lemma
for multivariate Normal regression theory.

Suppose x, y and z are jointly Normally distributed vectors with
E(z) = 0 and Σyz = 0. Then

                E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z,
              Var(x|y, z) = Var(x|y) − Σxz Σzz⁻¹ Σxz′ .




Can you give a proof of Lemma II ?
Hint : apply Lemma I for x and y ∗ = (y ′ , z ′ )′ .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,          yt = Zt αt + εt .
  • Writing Yt = {y1 , . . . , yt }, define

             at+1 = E(αt+1 |Yt ),            Pt+1 = Var(αt+1 |Yt );

  • The prediction error is

                        vt = yt − E(yt |Yt−1 )
                           = yt − E(Zt αt + εt |Yt−1 )
                           = yt − Zt E(αt |Yt−1 )
                           = yt − Zt at ;

  • It follows that vt = Zt (αt − at ) + εt and E(vt ) = 0;
  • The prediction error variance is Ft = Var(vt ) = Zt Pt Zt′ + Ht .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,            yt = Zt αt + εt .
  • We have Yt = {Yt−1 , yt } = {Yt−1 , vt } and E(vt yt−j ) = 0 for
    j = 1, . . . , t − 1;
  • Apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z with x = αt+1 ,
    y = Yt−1 and z = vt = Zt (αt − at ) + εt ;
  • It follows that E(αt+1 |Yt−1 ) = Tt at ;
  • Further, E(αt+1 vt′ ) = Tt E(αt vt′ ) + Rt E(ζt vt′ ) = Tt Pt Zt′ ;
  • Applying the lemma, we obtain the state update

                            at+1 = E(αt+1 |Yt−1 , yt )
                                 = Tt at + Tt Pt Zt′ Ft⁻¹ vt
                                 = Tt at + Kt vt ,

    with Kt = Tt Pt Zt′ Ft⁻¹ .
Kalman Filter : derivation
• Our best prediction of yt is Zt at . When the actual
  observation arrives, calculate the prediction error
  vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht . The new
  best estimate of the state mean is based on both the old
  estimate at and the new information vt :
                        at+1 = Tt at + Kt vt ,
  similarly for the variance:
               Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ .
  Can you derive the updating equation for Pt+1 ?
• The Kalman gain
                           Kt = Tt Pt Zt′ Ft⁻¹
  is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter
  for the Local Level Model (DK, Chapter 2).
Kalman Filter Illustration
[Figure: Kalman filter output for the Nile data (1880–1960), four panels: observations with filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]
Prediction and Filtering
This version of the Kalman filter computes the predictions of the
states directly:

             at+1 = E(αt+1 |Yt ),         Pt+1 = Var(αt+1 |Yt ).

The filtered estimates are defined by

                 at|t = E(αt |Yt ),       Pt|t = Var(αt |Yt ),

for t = 1, . . . , n.
The Kalman filter can be “re-ordered” to compute both:

           at|t = at + Mt vt ,         Pt|t = Pt − Mt Ft Mt′ ,
          at+1 = Tt at|t ,             Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′ ,

with Mt = Pt Zt′ Ft⁻¹ for t = 1, . . . , n and starting with a1 and P1 .
Compare Mt with Kt . Verify this version of KF using earlier
derivations.
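A sketch of the re-ordered pass in the same Python/numpy style; note that Kt = Tt Mt , so the prediction step Tt at|t = Tt at + Kt vt reproduces the direct recursion:

```python
import numpy as np

def kalman_filter_reordered(y, Z, H, T, R, Q, a1, P1):
    """Sketch of the re-ordered filter: filtered moments (a_t|t, P_t|t)
    followed by the prediction step to (a_{t+1}, P_{t+1})."""
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    filtered = []
    for t in range(len(y)):
        v = y[t] - (Z @ a)[0]
        F = (Z @ P @ Z.T)[0, 0] + H
        M = (P @ Z.T) / F                 # M_t = P_t Z_t' F_t^{-1}
        a_f = a + (M * v).ravel()         # a_{t|t} = a_t + M_t v_t
        P_f = P - (M @ M.T) * F           # P_{t|t} = P_t - M_t F_t M_t'
        filtered.append((a_f, P_f))
        a = T @ a_f                       # a_{t+1} = T_t a_{t|t}; K_t = T_t M_t
        P = T @ P_f @ T.T + R @ Q @ R.T   # P_{t+1}
    return filtered
```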
Smoothing
  • The filter calculates the mean and variance conditional on Yt ;
  • The Kalman smoother calculates the mean and variance
    conditional on the full set of observations Yn ;
  • After the filtered estimates are calculated, the smoothing
    recursion starts at the last observation and runs back to the
    first.

    α̂t = E(αt |Yn ),                           Vt = Var(αt |Yn ),
    rt = weighted sum of future innovations,    Nt = Var(rt ),
    Lt = Tt − Kt Zt .

Starting with rn = 0, Nn = 0, the smoothing recursions are given
by

       rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt ,     Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt ,
        α̂t = at + Pt rt−1 ,             Vt = Pt − Pt Nt−1 Pt .
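A minimal Python/numpy sketch of this backward pass; it consumes the stored output (a_pred, P_pred, v, F) of the filter sketch earlier:

```python
import numpy as np

def kalman_smoother(Z, T, a_pred, P_pred, v, F):
    """Sketch of the backward smoothing recursions, univariate y_t."""
    n, m = len(v), a_pred.shape[1]
    r, N = np.zeros(m), np.zeros((m, m))            # r_n = 0, N_n = 0
    alpha_hat = np.zeros((n, m)); V = np.zeros((n, m, m))
    for t in range(n - 1, -1, -1):
        K = (T @ P_pred[t] @ Z.T) / F[t]            # K_t
        L = T - K @ Z                               # L_t = T_t - K_t Z_t
        r = (Z.T / F[t]).ravel() * v[t] + L.T @ r   # r_{t-1}
        N = (Z.T @ Z) / F[t] + L.T @ N @ L          # N_{t-1}
        alpha_hat[t] = a_pred[t] + P_pred[t] @ r    # smoothed mean
        V[t] = P_pred[t] - P_pred[t] @ N @ P_pred[t]  # smoothed variance
    return alpha_hat, V
```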
Smoothing Illustration

[Figure: Smoothing output for the Nile data (1880–1960), four panels: observations with smoothed state; smoothed state variance V_t; r_t; N_t.]
Filtering and Smoothing
[Figure: Nile data (1870–1970) with the filtered level and the smoothed level overlaid on the observations.]
Missing Observations



Missing observations are very easy to handle in Kalman filtering:
  • suppose yj is missing
  • put vj = 0, Kj = 0 and Fj = ∞ in the algorithm
  • proceed further calculations as normal

The filter algorithm extrapolates according to the state equation
until a new observation arrives. The smoother interpolates
between observations.
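A sketch in the same Python/numpy style, where np.nan marks a missing observation and the update step is simply skipped:

```python
import numpy as np

def kalman_filter_missing(y, Z, H, T, R, Q, a1, P1):
    """Sketch: filter that skips the update when y_t is np.nan,
    i.e. v_t = 0, K_t = 0, so a_{t+1} = T_t a_t."""
    a, P = np.asarray(a1, float), np.asarray(P1, float)
    a_pred, P_pred = [], []
    for t in range(len(y)):
        a_pred.append(a); P_pred.append(P)
        if np.isnan(y[t]):                # missing: prediction step only
            a = T @ a
            P = T @ P @ T.T + R @ Q @ R.T
            continue
        v = y[t] - (Z @ a)[0]
        F = (Z @ P @ Z.T)[0, 0] + H
        K = (T @ P @ Z.T) / F
        a = T @ a + (K * v).ravel()
        P = T @ P @ T.T + R @ Q @ R.T - (K @ K.T) * F
    return np.array(a_pred), np.array(P_pred)
```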




Missing Observations

[Figure: Filtering with missing observations, Nile data (1870–1970), two panels: observations with a_t; state variance P_t.]
Missing Observations, Filter and Smoother

[Figure: Nile data (1880–1960) with missing observations, four panels: filtered state; P_t; smoothed state; V_t.]
Forecasting




Forecasting requires no extra theory: just treat future observations
as missing:
  • put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k
  • proceed further calculations as normal
  • forecast for yj is Zj aj
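A minimal sketch of this in the same Python/numpy style, running the pure prediction step k times from the last filter output (a_{n+1}, P_{n+1}):

```python
import numpy as np

def forecast(a_next, P_next, Z, H, T, R, Q, k):
    """Sketch: k-step-ahead forecasts by treating y_{n+1..n+k} as
    missing (v_j = 0, K_j = 0)."""
    a, P = np.asarray(a_next, float), np.asarray(P_next, float)
    yhat = np.zeros(k); Fvar = np.zeros(k)
    for j in range(k):
        yhat[j] = (Z @ a)[0]               # forecast Z_j a_j
        Fvar[j] = (Z @ P @ Z.T)[0, 0] + H  # forecast error variance F_j
        a = T @ a                          # no update: a_{j+1} = T a_j
        P = T @ P @ T.T + R @ Q @ R.T
    return yhat, Fvar
```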




Forecasting

[Figure: Forecasting the Nile data to 2000, two panels: observations with a_t; state variance P_t, shown over 1870–2000.]
Parameter Estimation
The system matrices in a state space model typically depend on a
parameter vector ψ. The model is completely Gaussian; we
estimate ψ by Maximum Likelihood.
The loglikelihood of a time series is

                log L = ∑_{t=1}^{n} log p(yt |Yt−1 ).

In the state space model, p(yt |Yt−1 ) is a Gaussian density with
mean Zt at and variance Ft :

     log L = − (nN/2) log 2π − (1/2) ∑_{t=1}^{n} ( log |Ft | + vt′ Ft⁻¹ vt ),

with vt and Ft from the Kalman filter. This is called the prediction
error decomposition of the likelihood. Estimation proceeds by
numerically maximising log L.
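A minimal sketch for the local level model, maximising the prediction error decomposition with scipy; the log-variance parametrisation, initialisation and simulated data are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def negloglik(params, y):
    """Negative loglikelihood of the local level model via the
    prediction error decomposition (params are log-variances)."""
    var_eps, var_eta = np.exp(params)
    a, P = 0.0, 1e7                       # crude diffuse-style start
    ll = 0.0
    for t in range(len(y)):
        v = y[t] - a                      # v_t (Z_t = 1)
        F = P + var_eps                   # F_t = P_t + sigma_eps^2
        ll -= 0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = P / F                         # K_t (T_t = 1)
        a += K * v
        P = P * (1 - K) + var_eta         # P_{t+1}
    return -ll

y = np.random.default_rng(0).standard_normal(100).cumsum()  # toy data
res = minimize(negloglik, x0=np.log([1.0, 0.1]), args=(y,))
sigma_eps2, sigma_eta2 = np.exp(res.x)
```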
ML Estimate of Nile Data
[Figure: Nile data (1870–1970) with level estimates for q = 0, q = 1000 and the ML estimate q = 0.0973.]
Diagnostics


• Null hypothesis: the standardised residuals satisfy

                         vt / √Ft ∼ NID(0, 1)

• Apply standard tests for Normality, heteroskedasticity, serial
  correlation;
• A recursive algorithm is available to calculate smoothed
  disturbances (auxiliary residuals), which can be used to detect
  breaks and outliers;
• Model comparison and parameter restrictions: use likelihood
  based procedures (LR test, AIC, BIC).
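A sketch of such checks in Python on the standardised residuals; the Ljung-Box statistic is computed by hand from sample autocorrelations, and in practice its degrees of freedom would be corrected for estimated parameters:

```python
import numpy as np
from scipy import stats

def residual_diagnostics(v, F, lags=10):
    """Sketch: basic checks on e_t = v_t / sqrt(F_t)."""
    e = v / np.sqrt(F)
    jb_stat, jb_pval = stats.jarque_bera(e)     # Normality test
    n = len(e)
    acf = np.array([np.corrcoef(e[:-k], e[k:])[0, 1]
                    for k in range(1, lags + 1)])
    lb_stat = n * (n + 2) * np.sum(acf**2 / (n - np.arange(1, lags + 1)))
    lb_pval = stats.chi2.sf(lb_stat, df=lags)   # serial correlation test
    return {"jarque_bera": (jb_stat, jb_pval), "ljung_box": (lb_stat, lb_pval)}
```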



Nile Data Residuals Diagnostics
[Figure: Nile residual diagnostics, four panels: standardised residuals over time; correlogram; histogram with N(s = 0.996) density; normal QQ plot.]
Assignment 1 - Exercises 1 and 2

1. Formulate the following models in state space form:
     •   ARMA(3,1) process;
     •   ARMA(1,3) process;
     •   ARIMA(3,1,1) process;
     •   AR(3) plus noise model;
     •   Random Walk plus AR(3) process.
   Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model

      yt = µt + ξt ,    ξt ∼ N(0, σξ²),    ∆µt+1 = ηt ∼ N(0, ση²),

   with E(ξt ηt ) = σξη ≠ 0 and E(ξt ηs ) = 0 for all t, s with t ≠ s.
   Also discuss the problem of missing observations in this case.


Assignment 1 - Exercise 3

3.

Consider a time series yt that is modelled by a state space
model, together with a time series vector of k exogenous variables
Xt . The dependent time series yt depends on the explanatory
variables Xt in a linear way.
     • How would you introduce Xt as regression effects in the state
       space model ?
     • Can you allow the transition matrix Tt to depend on Xt too ?
     • Can you allow the transition matrix Tt to depend on yt ?
For the last two questions, please also consider maximum
likelihood estimation of coefficients within the system matrices.


Assignment 1 - Computing work




• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in
  Chapter 2, except Fig. 2.4.
• Write short documentation for the Ox program.




