State space methods for signal extraction,
  parameter estimation and forecasting

             Siem Jan Koopman
    http://personal.vu.nl/s.j.koopman

            Department of Econometrics
             VU University Amsterdam
                Tinbergen Institute
                       2012
Regression Model in time series

For a time series of a dependent variable yt and a set of regressors
Xt , we can consider the regression model

    yt = µ + Xt β + εt ,     εt ∼ NID(0, σ²),     t = 1, . . . , n.

For a large n, is it reasonable to assume constant coefficients µ, β
and σ²?
If we have strong reasons to believe that µ and β are
time-varying, the model may be written as

  yt = µt + Xt βt + εt ,    µt+1 = µt + ηt ,        βt+1 = βt + ξt .

This is an example of a linear state space model.


State Space Model
Linear Gaussian state space model is defined in three parts:

→ State equation:

              αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),

→ Observation equation:

                 yt = Zt αt + εt ,     εt ∼ NID(0, Ht ),

→ Initial state distribution α1 ∼ N (a1 , P1 ).

Notice that
  • ζt and εs are independent for all t, s, and independent of α1 ;
  • observation yt can be multivariate;
  • state vector αt is unobserved;
  • matrices Tt , Zt , Rt , Qt , Ht determine structure of model.
State Space Model
• state space model is linear and Gaussian: therefore the properties
  and results of the multivariate normal distribution apply;
• state vector αt evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation therefore has two aspects:
     • measuring the unobservable state (prediction, filtering and
       smoothing);
     • estimation of unknown parameters (maximum likelihood
       estimation);
• state space methods offer a unified approach to a wide range
  of models and techniques: dynamic regression, ARIMA, UC
  models, latent variable models, spline-fitting and many ad-hoc
  filters;
• next, some well-known model specifications in state space
  form ...
Regression with time-varying coefficients
General state space model:

            αt+1 = Tt αt + Rt ζt ,     ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

Put regressors in Zt (that is, Zt = Xt ) and set

                         Tt = I ,     Rt = I .

Result is regression model yt = Xt βt + εt with time-varying
coefficient βt = αt following a random walk.
In case Qt = 0, the time-varying regression model reduces to the
standard linear regression model yt = Xt β + εt with β = α1 .
Many dynamic specifications for the regression coefficient can be
considered, see below.
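To make the mapping concrete, here is a minimal Python/numpy sketch (function and argument names are our own, not from the slides) that assembles these system matrices for the time-varying regression; passing Q = 0 recovers the constant-coefficient model.

import numpy as np

def tvp_regression_system(X, Q, H):
    # Time-varying regression in state space form: Z_t = X_t (row t of X),
    # T = R = I, and Q is the variance of the random-walk increments.
    # A sketch; Q = 0 gives back the standard regression model.
    n, k = X.shape
    Z = X                  # Z_t varies over time: Z_t = X_t
    T = np.eye(k)          # beta_{t+1} = beta_t + xi_t
    R = np.eye(k)
    return Z, H, T, R, Q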
AR(1) process in State Space Form
Consider AR(1) model yt+1 = φyt + ζt , in state space form:

           αt+1 = Tt αt + Rt ζt ,      ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,     εt ∼ NID(0, Ht ).

with state vector αt and system matrices

                Zt = 1,      Ht = 0,
                Tt = φ,      Rt = 1,       Qt = σ²

we have
  • Zt = 1 and Ht = 0 imply that αt = yt ;
  • State equation implies yt+1 = φ yt + ζt with ζt ∼ NID(0, σ²);
  • This is the AR(1) model!
AR(2) in State Space Form
Consider AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt , in state space
form:

            αt+1 = Tt αt + Rt ζt ,       ζt ∼ NID(0, Qt ),
              yt = Zt αt + εt ,      εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

            Zt = (1 0),       Ht = 0,
            Tt = [φ1 1; φ2 0],        Rt = [1; 0],       Qt = σ²
we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = φ1 yt + α2t + ζt with
    ζt ∼ NID(0, σ 2 );
  • Second state equation implies α2,t+1 = φ2 yt ;
MA(1) in State Space Form
Consider MA(1) model yt+1 = ζt + θζt−1 , in state space form:

            αt+1 = Tt αt + Rt ζt ,          ζt ∼ NID(0, Qt ),
               yt = Zt αt + εt ,       εt ∼ NID(0, Ht ).

with 2 × 1 state vector αt and system matrices:

            Zt = (1 0),       Ht = 0,
            Tt = [0 1; 0 0],        Rt = [1; θ],       Qt = σ²

we have
  • Zt and Ht = 0 imply that α1t = yt ;
  • First state equation implies yt+1 = α2t + ζt with
    ζt ∼ NID(0, σ 2 );
  • Second state equation implies α2,t+1 = θζt ;
General ARMA in State Space Form

Consider ARMA(2,1) model

               yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1

in state space form

           αt = [yt ; φ2 yt−1 + θζt ],
           Zt = (1 0),       Ht = 0,
           Tt = [φ1 1; φ2 0],        Rt = [1; θ],       Qt = σ²

All ARIMA(p, d, q) models have a (non-unique) state space
representation.


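To make the correspondence concrete, a small Python sketch (our own naming, not from the slides) that assembles these ARMA(2,1) system matrices; the result can be passed to any Kalman filter routine, such as the one sketched later.

import numpy as np

def arma21_system(phi1, phi2, theta, sigma2):
    # ARMA(2,1) in state space form, following the matrices above.
    Z = np.array([1.0, 0.0])           # y_t is the first state element
    H = 0.0                            # no observation noise
    T = np.array([[phi1, 1.0],
                  [phi2, 0.0]])
    R = np.array([[1.0],
                  [theta]])
    Q = np.array([[sigma2]])
    return Z, H, T, R, Q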
UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,    yt = Zt αt + εt .


LL model µt+1 = µt + ηt and yt = µt + εt :
       αt = µt ,     Tt = 1,      Rt = 1,      Qt = ση² ,
                     Zt = 1,                   Ht = σε² .


LLT model µt+1 = µt + βt + ηt ,      βt+1 = βt + ξt and
yt = µt + εt :

       αt = [µt ; βt ],     Tt = [1 1; 0 1],     Rt = [1 0; 0 1],
       Qt = [ση² 0; 0 σξ²],     Zt = (1 0),     Ht = σε² .

UC models in State Space Form
State space model: αt+1 = Tt αt + Rt ζt ,   yt = Zt αt + εt .


LLT model with season: µt+1 = µt + βt + ηt ,     βt+1 = βt + ξt ,
S(L)γt+1 = ωt and yt = µt + γt + εt :
αt = (µt , βt , γt , γt−1 , γt−2 )′ ,

Tt = [1 1  0  0  0 ;
      0 1  0  0  0 ;
      0 0 −1 −1 −1 ;
      0 0  1  0  0 ;
      0 0  0  1  0 ],

Qt = diag(ση² , σξ² , σω² ),     Rt = [1 0 0; 0 1 0; 0 0 1; 0 0 0; 0 0 0],

Zt = (1 0 1 0 0),     Ht = σε² .


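The block structure generalises to any seasonal period s. A Python sketch (our own naming; the quarterly case s = 4 reproduces the 5 × 5 matrices above):

import numpy as np

def llt_seasonal_system(var_eta, var_xi, var_omega, var_eps, s=4):
    # Local linear trend plus dummy seasonal in state space form.
    m = 2 + (s - 1)                    # level, slope and s-1 seasonal states
    T = np.zeros((m, m))
    T[0, 0] = T[0, 1] = T[1, 1] = 1.0  # trend block
    T[2, 2:] = -1.0                    # S(L) gamma_{t+1} = omega_t
    for i in range(3, m):
        T[i, i - 1] = 1.0              # shift the lagged seasonal states
    R = np.zeros((m, 3))
    R[0, 0] = R[1, 1] = R[2, 2] = 1.0  # three disturbances enter the state
    Q = np.diag([var_eta, var_xi, var_omega])
    Z = np.zeros(m)
    Z[0] = Z[2] = 1.0                  # y_t = mu_t + gamma_t + eps_t
    H = var_eps
    return Z, H, T, R, Q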
Kalman Filter


• The Kalman filter calculates the mean and variance of the
  unobserved state, given the observations.
• The state is Gaussian: the complete distribution is
  characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is
  updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1 , which we assumed
  given.
• There are various ways to initialize when a1 and P1 are
  unknown, which we will not discuss here.




Kalman Filter
The unobserved state αt can be estimated from the observations
with the Kalman filter:

                    vt = yt − Zt at ,
                    Ft = Zt Pt Zt′ + Ht ,
                    Kt = Tt Pt Zt′ Ft⁻¹ ,
                  at+1 = Tt at + Kt vt ,
                  Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ ,

for t = 1, . . . , n and starting with given values for a1 and P1 .
  • Writing Yt = {y1 , . . . , yt },

             at+1 = E(αt+1 |Yt ),          Pt+1 = Var(αt+1 |Yt ).


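A minimal numpy implementation of these recursions, for a univariate yt and an m-dimensional state (a sketch; the array layout and names are our own):

import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    # Z: (m,) vector, H: scalar, T: (m,m), R: (m,r), Q: (r,r).
    n, m = len(y), len(a1)
    a = np.zeros((n + 1, m))       # a[t] = E(alpha_t | Y_{t-1}), a[0] = a1
    P = np.zeros((n + 1, m, m))    # P[t] = Var(alpha_t | Y_{t-1})
    v = np.zeros(n)                # prediction errors
    F = np.zeros(n)                # prediction error variances
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - Z @ a[t]
        F[t] = Z @ P[t] @ Z + H
        K = T @ P[t] @ Z / F[t]    # Kalman gain K_t
        a[t + 1] = T @ a[t] + K * v[t]
        P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T - np.outer(K, K) * F[t]
    return a, P, v, F

For the local level model of the Nile illustrations, take Z = [1], H = σε², T = R = [[1]] and Q = [[ση²]].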
Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with
means µx = E(x) and µy = E(y ), variance matrices Σxx and
Σyy , and covariance matrix Σxy . Then

                  E(x|y ) = µx + Σxy Σyy⁻¹ (y − µy ),
                Var(x|y ) = Σxx − Σxy Σyy⁻¹ Σxy′ .

Define e = x − E(x|y ). Then

            Var(e) = Var([x − µx ] − Σxy Σyy⁻¹ [y − µy ])
                   = Σxx − Σxy Σyy⁻¹ Σxy′
                   = Var(x|y ),

and

          Cov(e, y ) = Cov([x − µx ] − Σxy Σyy⁻¹ [y − µy ], y )
                     = Σxy − Σxy = 0.
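A quick numerical check of the lemma with arbitrarily chosen moments (our own illustration, not part of the slides):

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])                 # (mu_x, mu_y)
Sigma = np.array([[2.0, 0.8],              # [Sxx Sxy; Sxy Syy]
                  [0.8, 1.0]])
x, y = rng.multivariate_normal(mu, Sigma, size=200_000).T

# e = x - E(x|y) with E(x|y) from Lemma I
e = (x - mu[0]) - Sigma[0, 1] / Sigma[1, 1] * (y - mu[1])

print(np.cov(e, y)[0, 1])   # ~0, as the lemma claims
print(np.var(e))            # ~ Sxx - Sxy^2/Syy = 2 - 0.64 = 1.36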
Lemma II : an extension

The proof of the Kalman filter uses a slight extension of the lemma
for multivariate Normal regression theory.

Suppose x, y and z are jointly Normally distributed vectors with
E(z) = 0 and Σyz = 0. Then
                  E(x|y , z) = E(x|y ) + Σxz Σzz⁻¹ z,
                Var(x|y , z) = Var(x|y ) − Σxz Σzz⁻¹ Σxz′ .




Can you give a proof of Lemma II ?
Hint : apply Lemma I for x and y ∗ = (y ′ , z ′ )′ .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,          yt = Zt αt + εt .
  • Writing Yt = {y1 , . . . , yt }, define

             at+1 = E(αt+1 |Yt ),            Pt+1 = Var(αt+1 |Yt );

  • The prediction error is

                        vt = yt − E(yt |Yt−1 )
                           = yt − E(Zt αt + εt |Yt−1 )
                           = yt − Zt E(αt |Yt−1 )
                           = yt − Zt at ;

  • It follows that vt = Zt (αt − at ) + εt and E(vt ) = 0;
  • The prediction error variance is Ft = Var(vt ) = Zt Pt Zt′ + Ht .


Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt ,            yt = Zt αt + εt .
  • We have Yt = {Yt−1 , yt } = {Yt−1 , vt } and E(vt yt−j ) = 0 for
    j = 1, . . . , t − 1;
  • Apply the lemma E(x|y , z) = E(x|y ) + Σxz Σzz⁻¹ z with x = αt+1 ,
    y = Yt−1 and z = vt = Zt (αt − at ) + εt ;
  • It follows that E(αt+1 |Yt−1 ) = Tt at ;
  • Further, E(αt+1 vt′ ) = Tt E(αt vt′ ) + Rt E(ζt vt′ ) = Tt Pt Zt′ ;
  • Carrying out the lemma, we obtain the state update

                            at+1 = E(αt+1 |Yt−1 , yt )
                                 = Tt at + Tt Pt Zt′ Ft⁻¹ vt
                                 = Tt at + Kt vt ,

    with Kt = Tt Pt Zt′ Ft⁻¹ .
Kalman Filter : derivation
• Our best prediction of yt is Zt at . When the actual
  observation arrives, we calculate the prediction error
  vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht . The new
  best estimate of the state mean is based on both the old
  estimate at and the new information vt :

                        at+1 = Tt at + Kt vt ,

  and similarly for the variance:

                Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′ .

  Can you derive the updating equation for Pt+1 ?
• The Kalman gain

                          Kt = Tt Pt Zt′ Ft⁻¹
  is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter
  for the Local Level Model (DK, Chapter 2).
Kalman Filter Illustration
[Figure: Kalman filter output for the Nile data, 1880–1960. Panels: observations with filtered level a_t; state variance P_t; prediction errors v_t; prediction error variances F_t.]
Prediction and Filtering
This version of the Kalman filter computes the predictions of the
states directly:

             at+1 = E(αt+1 |Yt ),         Pt+1 = Var(αt+1 |Yt ).

The filtered estimates are defined by

                 at|t = E(αt |Yt ),       Pt|t = Var(αt |Yt ),

for t = 1, . . . , n.
The Kalman filter can be “re-ordered” to compute both:
           at|t = at + Mt vt ,          Pt|t = Pt − Mt Ft Mt′ ,
          at+1 = Tt at|t ,              Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′ ,

with Mt = Pt Zt′ Ft⁻¹ for t = 1, . . . , n and starting with a1 and P1 .
Compare Mt with Kt . Verify this version of KF using earlier
derivations.
Smoothing
  • The filter calculates the mean and variance conditional on Yt ;
  • The Kalman smoother calculates the mean and variance
    conditional on the full set of observations Yn ;
  • After the filtered estimates are calculated, the smoothing
    recursion starts at the last observation and runs back to the
    first.

     α̂t = E(αt |Yn ),                    Vt = Var(αt |Yn ),
     rt = weighted sum of future innovations,      Nt = Var(rt ),
     Lt = Tt − Kt Zt .

Starting with rn = 0, Nn = 0, the smoothing recursions are given
by

        rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt ,      Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt ,
         α̂t = at + Pt rt−1 ,               Vt = Pt − Pt Nt−1 Pt .
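A numpy sketch of the backward pass (our own naming), consuming the output a, P, v, F of the filter sketch given earlier, for a univariate yt:

import numpy as np

def kalman_smoother(Z, T, a, P, v, F):
    n, m = len(v), a.shape[1]
    r, N = np.zeros(m), np.zeros((m, m))
    alpha_hat = np.zeros((n, m))
    V = np.zeros((n, m, m))
    for t in range(n - 1, -1, -1):
        K = T @ P[t] @ Z / F[t]            # gain, as in the filter
        L = T - np.outer(K, Z)             # L_t = T_t - K_t Z_t
        r = Z * v[t] / F[t] + L.T @ r      # r_{t-1}
        N = np.outer(Z, Z) / F[t] + L.T @ N @ L
        alpha_hat[t] = a[t] + P[t] @ r     # E(alpha_t | Y_n)
        V[t] = P[t] - P[t] @ N @ P[t]      # Var(alpha_t | Y_n)
    return alpha_hat, V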
Smoothing Illustration

[Figure: Smoothing output for the Nile data, 1880–1960. Panels: observations with smoothed state; smoothed state variance V_t; smoothing quantities r_t and N_t.]
Filtering and Smoothing
[Figure: Nile data, 1870–1970, observations with filtered level and smoothed level.]
Missing Observations



Missing observations are very easy to handle in Kalman filtering:
  • suppose yj is missing;
  • put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
  • proceed with the remaining calculations as normal.

The filter algorithm extrapolates according to the state equation
until a new observation arrives. The smoother interpolates
between observations.




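In code, the rule vj = 0, Kj = 0, Fj = ∞ simply means skipping the update and keeping the prediction step. A sketch (our own naming; missing values coded as NaN):

import numpy as np

def kalman_filter_missing(y, Z, H, T, R, Q, a1, P1):
    n, m = len(y), len(a1)
    a = np.zeros((n + 1, m))
    P = np.zeros((n + 1, m, m))
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):                 # missing: prediction step only
            a[t + 1] = T @ a[t]
            P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T
            continue
        v = y[t] - Z @ a[t]
        F = Z @ P[t] @ Z + H
        K = T @ P[t] @ Z / F
        a[t + 1] = T @ a[t] + K * v
        P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T - np.outer(K, K) * F
    return a, P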
Missing Observations

[Figure: Nile data with missing observations, 1870–1970. Panels: observations with filtered state a_t; state variance P_t.]
Missing Observations, Filter and Smoother

[Figure: Nile data with missing observations, 1880–1960. Panels: filtered state with variance P_t; smoothed state with variance V_t.]
Forecasting




Forecasting requires no extra theory: just treat future observations
as missing:
  • put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k;
  • proceed with the remaining calculations as normal;
  • the forecast for yj is Zj aj .




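Equivalently, run only the prediction step for k periods, starting from the final filter output an+1 , Pn+1 . A sketch with our own naming:

import numpy as np

def forecast(Z, H, T, R, Q, a_last, P_last, k):
    # k-step-ahead forecasts by treating y_{n+1},...,y_{n+k} as missing.
    a, P = a_last.copy(), P_last.copy()
    yhat, Fvar = np.zeros(k), np.zeros(k)
    for j in range(k):
        yhat[j] = Z @ a                 # forecast Z_j a_j
        Fvar[j] = Z @ P @ Z + H         # forecast error variance F_j
        a = T @ a                       # extrapolate the state
        P = T @ P @ T.T + R @ Q @ R.T
    return yhat, Fvar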
Forecasting

[Figure: Forecasting the Nile data, 1870–2000. Panels: observations with state predictions a_t extended beyond 1970; state variance P_t.]
Parameter Estimation
The system matrices in a state space model typically depend on a
parameter vector ψ. The model is completely Gaussian; we
estimate by Maximum Likelihood.
The loglikelihood of a time series is

                      log L = ∑_{t=1}^{n} log p(yt |Yt−1 ).

In the state space model, p(yt |Yt−1 ) is a Gaussian density with
mean Zt at and variance Ft :

        log L = −(nN/2) log 2π − (1/2) ∑_{t=1}^{n} ( log |Ft | + vt′ Ft⁻¹ vt ),

with vt and Ft from the Kalman filter. This is called the prediction
error decomposition of the likelihood. Estimation proceeds by
numerically maximising log L.
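A sketch of this in code for the local level model, with ψ = (log σε², log ση²) and reusing the kalman_filter routine sketched earlier; the large-P1 initialisation is our own crude stand-in for a proper diffuse prior:

import numpy as np
from scipy.optimize import minimize

def neg_loglik(psi, y):
    # Negative loglikelihood via the prediction error decomposition.
    H = np.exp(psi[0])                   # sigma2_eps
    Q = np.array([[np.exp(psi[1])]])     # sigma2_eta
    Z, T, R = np.array([1.0]), np.array([[1.0]]), np.array([[1.0]])
    a1, P1 = np.array([0.0]), np.array([[1e7]])
    _, _, v, F = kalman_filter(y, Z, H, T, R, Q, a1, P1)
    return 0.5 * np.sum(np.log(2.0 * np.pi * F) + v**2 / F)

# e.g. res = minimize(neg_loglik, np.log([1000.0, 100.0]), args=(y,))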
ML Estimate of Nile Data
[Figure: Nile data, 1870–1970, observations with the estimated level for q = 0, q = 1000 and q = 0.0973 (the ML estimate).]
Diagnostics


• Null hypothesis: the standardised residuals satisfy

                         vt / √Ft ∼ NID(0, 1)

• Apply standard tests for Normality, heteroskedasticity and serial
  correlation;
• A recursive algorithm is available to calculate smoothed
  disturbances (auxiliary residuals), which can be used to detect
  breaks and outliers;
• Model comparison and parameter restrictions: use likelihood
  based procedures (LR test, AIC, BIC).



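A sketch of the basic residual checks (our own helper; Jarque-Bera for Normality and a Ljung-Box statistic for serial correlation):

import numpy as np
from scipy import stats

def residual_diagnostics(v, F, lags=10):
    e = v / np.sqrt(F)                   # standardised residuals ~ NID(0,1)
    jb_stat, jb_p = stats.jarque_bera(e)
    n = len(e)
    acf = np.array([np.corrcoef(e[:-k], e[k:])[0, 1]
                    for k in range(1, lags + 1)])
    lb = n * (n + 2) * np.sum(acf**2 / (n - np.arange(1, lags + 1)))
    lb_p = stats.chi2.sf(lb, df=lags)    # df ignores estimated parameters
    return {"JB": (jb_stat, jb_p), "LB": (lb, lb_p)}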
Nile Data Residuals Diagnostics
[Figure: Diagnostics for the standardised Nile residuals. Panels: residual time plot; correlogram; density estimate with N(s = 0.996); normal QQ plot.]
Assignment 1 - Exercises 1 and 2

1. Formulate the following models in state space form:
     •   ARMA(3,1) process;
     •   ARMA(1,3) process;
     •   ARIMA(3,1,1) process;
     •   AR(3) plus noise model;
     •   Random Walk plus AR(3) process.
   Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model

        yt = µt + ξt ,    ξt ∼ N(0, σξ² ),    ∆µt+1 = ηt ∼ N(0, ση² ),

   with E(ξt ηt ) = σξη ≠ 0 and E(ξt ηs ) = 0 for all t ≠ s.
   Also discuss the problem of missing observations in this case.


Assignment 1 - Exercise 3

3.

Consider a time series for yt that is modelled by a state space
model and consider a time series vector of k exogenous variables
Xt . The dependent time series yt depend on the explanatory
variables Xt in a linear way.
     • How would you introduce Xt as regression effects in the state
       space model ?
     • Can you allow the transition matrix Tt to depend on Xt too ?
     • Can you allow the transition matrix Tt to depend on yt ?
For the last two questions, please also consider maximum
likelihood estimation of coefficients within the system matrices.


Assignment 1 - Computing work




• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in
  Chapter 2, except Fig. 2.4.
• Write short documentation for your Ox program.



