Paris2012 session2
Transcript

  • 1. State space methods for signal extraction, parameter estimation and forecasting
    Siem Jan Koopman, http://personal.vu.nl/s.j.koopman
    Department of Econometrics, VU University Amsterdam and Tinbergen Institute, 2012
  • 2. Regression Model in time series
    For a time series of a dependent variable yt and a set of regressors Xt, we can consider the regression model
      yt = µ + Xt β + εt,  εt ∼ NID(0, σ²),  t = 1, . . . , n.
    For a large n, is it reasonable to assume constant coefficients µ, β and σ²?
    If we have strong reasons to believe that µ and β are time-varying, the model may be written as
      yt = µt + Xt βt + εt,  µt+1 = µt + ηt,  βt+1 = βt + ξt.
    This is an example of a linear state space model.
  • 3. State Space Model
    The linear Gaussian state space model is defined in three parts:
    → state equation: αt+1 = Tt αt + Rt ζt,  ζt ∼ NID(0, Qt);
    → observation equation: yt = Zt αt + εt,  εt ∼ NID(0, Ht);
    → initial state distribution: α1 ∼ N(a1, P1).
    Notice that
      • ζt and εs are independent for all t, s, and independent of α1;
      • the observation yt can be multivariate;
      • the state vector αt is unobserved;
      • the matrices Tt, Zt, Rt, Qt and Ht determine the structure of the model.
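As a concrete illustration (not part of the original slides), here is a minimal Python sketch of a container for the time-invariant system matrices of this model; the class name and field names are my own choices.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class LinearGaussianSSM:
    """Time-invariant linear Gaussian state space model (a sketch).

    State:       alpha_{t+1} = T alpha_t + R zeta_t,  zeta_t ~ N(0, Q)
    Observation: y_t = Z alpha_t + eps_t,             eps_t ~ N(0, H)
    Initial:     alpha_1 ~ N(a1, P1)
    """
    T: np.ndarray   # transition matrix
    Z: np.ndarray   # observation matrix
    R: np.ndarray   # state disturbance selection matrix
    Q: np.ndarray   # state disturbance variance
    H: np.ndarray   # observation noise variance
    a1: np.ndarray  # initial state mean
    P1: np.ndarray  # initial state variance
```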
  • 4. State Space Model
      • The state space model is linear and Gaussian, so the properties and results of the multivariate normal distribution apply;
      • the state vector αt evolves as a VAR(1) process;
      • the system matrices usually contain unknown parameters;
      • estimation therefore has two aspects: measuring the unobservable state (prediction, filtering and smoothing) and estimating the unknown parameters (maximum likelihood estimation);
      • state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline fitting and many ad hoc filters;
      • next, some well-known model specifications in state space form ...
  • 5. Regression with time-varying coefficients
    General state space model: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt); yt = Zt αt + εt, εt ∼ NID(0, Ht).
    Put the regressors in Zt (that is, Zt = Xt) and set Tt = I, Rt = I.
    The result is the regression model yt = Xt βt + εt with the time-varying coefficient βt = αt following a random walk.
    In case Qt = 0, the time-varying regression model reduces to the standard linear regression model yt = Xt β + εt with β = α1.
    Many dynamic specifications for the regression coefficient can be considered, see below.
  • 6. AR(1) process in State Space Form
    Consider the AR(1) model yt+1 = φ yt + ζt in state space form:
      αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
      yt = Zt αt + εt, εt ∼ NID(0, Ht),
    with state vector αt and system matrices
      Zt = 1, Ht = 0, Tt = φ, Rt = 1, Qt = σ².
    We have:
      • Zt = 1 and Ht = 0 imply that αt = yt;
      • the state equation then implies yt+1 = φ yt + ζt with ζt ∼ NID(0, σ²);
      • this is the AR(1) model!
  • 7. AR(2) in State Space Form
    Consider the AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt in state space form:
      αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
      yt = Zt αt + εt, εt ∼ NID(0, Ht),
    with 2 × 1 state vector αt and system matrices
      Zt = (1, 0), Ht = 0, Tt = [φ1, 1; φ2, 0], Rt = [1; 0], Qt = σ².
    We have:
      • Zt and Ht = 0 imply that α1t = yt;
      • the first state equation implies yt+1 = φ1 yt + α2t + ζt with ζt ∼ NID(0, σ²);
      • the second state equation implies α2,t+1 = φ2 yt.
  • 8. MA(1) in State Space Form
    Consider the MA(1) model yt+1 = ζt + θζt−1 in state space form:
      αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
      yt = Zt αt + εt, εt ∼ NID(0, Ht),
    with 2 × 1 state vector αt and system matrices
      Zt = (1, 0), Ht = 0, Tt = [0, 1; 0, 0], Rt = [1; θ], Qt = σ².
    We have:
      • Zt and Ht = 0 imply that α1t = yt;
      • the first state equation implies yt+1 = α2t + ζt with ζt ∼ NID(0, σ²);
      • the second state equation implies α2,t+1 = θζt.
  • 9. General ARMA in State Space Form
    Consider the ARMA(2,1) model yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1 in state space form:
      αt = [yt; φ2 yt−1 + θζt],
      Zt = (1, 0), Ht = 0, Tt = [φ1, 1; φ2, 0], Rt = [1; θ], Qt = σ².
    All ARIMA(p, d, q) models have a (non-unique) state space representation.
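For illustration, a small Python sketch of the ARMA(2,1) mapping above; the helper name arma21_ssm is hypothetical and the matrices are exactly those on the slide.

```python
import numpy as np

def arma21_ssm(phi1, phi2, theta, sigma2):
    """System matrices for the ARMA(2,1) state space form on this slide."""
    Z = np.array([[1.0, 0.0]])        # observation matrix: y_t = alpha_{1t}
    H = np.array([[0.0]])             # no observation noise
    T = np.array([[phi1, 1.0],
                  [phi2, 0.0]])       # transition matrix
    R = np.array([[1.0],
                  [theta]])           # disturbance selection
    Q = np.array([[sigma2]])          # disturbance variance
    return Z, H, T, R, Q
```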
  • 10. UC models in State Space Form
    State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
    LL model µt+1 = µt + ηt and yt = µt + εt:
      αt = µt, Tt = 1, Rt = 1, Qt = ση², Zt = 1, Ht = σε².
    LLT model µt+1 = µt + βt + ηt, βt+1 = βt + ξt and yt = µt + εt:
      αt = [µt; βt], Tt = [1, 1; 0, 1], Rt = [1, 0; 0, 1], Qt = [ση², 0; 0, σξ²], Zt = (1, 0), Ht = σε².
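A hedged Python sketch that simulates the LL (local level) model above; the function name and defaults are my assumptions, not from the slides.

```python
import numpy as np

def simulate_local_level(n, sigma2_eta, sigma2_eps, mu1=0.0, seed=0):
    """Simulate the local level model:
    mu_{t+1} = mu_t + eta_t,  y_t = mu_t + eps_t."""
    rng = np.random.default_rng(seed)
    mu = np.empty(n)
    mu[0] = mu1
    for t in range(n - 1):
        mu[t + 1] = mu[t] + rng.normal(0.0, np.sqrt(sigma2_eta))
    y = mu + rng.normal(0.0, np.sqrt(sigma2_eps), size=n)
    return y, mu
```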
  • 11. UC models in State Space Form
    State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
    LLT model with season (period s = 4): µt+1 = µt + βt + ηt, βt+1 = βt + ξt, S(L)γt+1 = ωt and yt = µt + γt + εt:
      αt = (µt, βt, γt, γt−1, γt−2)′,
      Tt = [1, 1, 0, 0, 0; 0, 1, 0, 0, 0; 0, 0, −1, −1, −1; 0, 0, 1, 0, 0; 0, 0, 0, 1, 0],
      Rt = [1, 0, 0; 0, 1, 0; 0, 0, 1; 0, 0, 0; 0, 0, 0],
      Qt = [ση², 0, 0; 0, σξ², 0; 0, 0, σω²],
      Zt = (1, 0, 1, 0, 0), Ht = σε².
  • 12. Kalman Filter
      • The Kalman filter calculates the mean and variance of the unobserved state, given the observations;
      • the state is Gaussian, so its complete distribution is characterized by the mean and variance;
      • the filter is a recursive algorithm: the current best estimate is updated whenever a new observation is obtained;
      • to start the recursion, we need a1 and P1, which we have assumed given;
      • there are various ways to initialize when a1 and P1 are unknown, which we will not discuss here.
  • 13. Kalman Filter
    The unobserved state αt can be estimated from the observations with the Kalman filter:
      vt = yt − Zt at,
      Ft = Zt Pt Zt′ + Ht,
      Kt = Tt Pt Zt′ Ft⁻¹,
      at+1 = Tt at + Kt vt,
      Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′,
    for t = 1, . . . , n, starting with the given values a1 and P1.
    Writing Yt = {y1, . . . , yt}, we have at+1 = E(αt+1 | Yt) and Pt+1 = Var(αt+1 | Yt).
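The recursions translate almost line by line into code. Below is a minimal Python sketch for a univariate yt with time-invariant system matrices; the function name and array layout are my own conventions, not from the slides.

```python
import numpy as np

def kalman_filter(y, Z, H, T, R, Q, a1, P1):
    """Kalman filter for a time-invariant linear Gaussian state space model
    with univariate y_t (a sketch). In the slides' notation, row i of the
    output holds a_{i+1} = E(alpha_{i+1} | Y_i), so a[0] = a1."""
    n = len(y)
    m = len(a1)
    a = np.zeros((n + 1, m)); a[0] = a1
    P = np.zeros((n + 1, m, m)); P[0] = P1
    v = np.zeros(n)
    F = np.zeros(n)
    for t in range(n):
        v[t] = y[t] - (Z @ a[t]).item()           # prediction error v_t
        F[t] = (Z @ P[t] @ Z.T + H).item()        # prediction error variance F_t
        K = T @ P[t] @ Z.T / F[t]                 # Kalman gain K_t (m x 1)
        a[t + 1] = T @ a[t] + (K * v[t]).ravel()  # state prediction update
        P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T - K @ K.T * F[t]
    return a, P, v, F
```

For the local level model, for instance, one would call it with Z = [[1.0]], H = [[σε²]], T = [[1.0]], R = [[1.0]], Q = [[ση²]].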
  • 14. Lemma I : multivariate Normal regression
    Suppose x and y are jointly normally distributed vectors with means µx = E(x) and µy = E(y), variance matrices Σxx and Σyy, and covariance matrix Σxy. Then
      E(x|y) = µx + Σxy Σyy⁻¹ (y − µy),
      Var(x|y) = Σxx − Σxy Σyy⁻¹ Σxy′.
    Define e = x − E(x|y). Then
      Var(e) = Var([x − µx] − Σxy Σyy⁻¹ [y − µy]) = Σxx − Σxy Σyy⁻¹ Σxy′ = Var(x|y),
    and
      Cov(e, y) = Cov([x − µx] − Σxy Σyy⁻¹ [y − µy], y) = Σxy − Σxy = 0.
  • 15. Lemma II : an extension
    The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression theory.
    Suppose x, y and z are jointly normally distributed vectors with E(z) = 0 and Σyz = 0. Then
      E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z,
      Var(x|y, z) = Var(x|y) − Σxz Σzz⁻¹ Σxz′.
    Can you give a proof of Lemma II?
    Hint: apply Lemma I to x and y∗ = (y′, z′)′.
  • 16. Kalman Filter : derivation
    State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
      • Writing Yt = {y1, . . . , yt}, define at+1 = E(αt+1 | Yt) and Pt+1 = Var(αt+1 | Yt);
      • the prediction error is vt = yt − E(yt | Yt−1) = yt − E(Zt αt + εt | Yt−1) = yt − Zt E(αt | Yt−1) = yt − Zt at;
      • it follows that vt = Zt(αt − at) + εt and E(vt) = 0;
      • the prediction error variance is Ft = Var(vt) = Zt Pt Zt′ + Ht.
  • 17. Kalman Filter : derivation
    State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
      • We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, . . . , t − 1;
      • apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z with x = αt+1, y = Yt−1 and z = vt = Zt(αt − at) + εt;
      • it follows that E(αt+1 | Yt−1) = Tt at;
      • further, E(αt+1 vt′) = Tt E(αt vt′) + Rt E(ζt vt′) = Tt Pt Zt′;
      • carrying out the lemma, we obtain the state update
          at+1 = E(αt+1 | Yt−1, yt) = Tt at + Tt Pt Zt′ Ft⁻¹ vt = Tt at + Kt vt,  with Kt = Tt Pt Zt′ Ft⁻¹.
  • 18. Kalman Filter : derivation
      • Our best prediction of yt is Zt at. When the actual observation arrives, we calculate the prediction error vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht. The new best estimate of the state mean is based on both the old estimate at and the new information vt: at+1 = Tt at + Kt vt; similarly for the variance: Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′. Can you derive the updating equation for Pt+1?
      • The Kalman gain Kt = Tt Pt Zt′ Ft⁻¹ is the optimal weighting matrix for the new evidence.
      • You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).
  • 19. Kalman Filter Illustration
    [Figure: four panels for the Nile data, c. 1870–1970 — observations with the filtered level at; the state variance Pt; the prediction errors vt; the prediction error variances Ft.]
  • 20. Prediction and Filtering
    This version of the Kalman filter computes the predictions of the states directly: at+1 = E(αt+1 | Yt), Pt+1 = Var(αt+1 | Yt).
    The filtered estimates are defined by at|t = E(αt | Yt) and Pt|t = Var(αt | Yt), for t = 1, . . . , n.
    The Kalman filter can be "re-ordered" to compute both:
      at|t = at + Mt vt,  Pt|t = Pt − Mt Ft Mt′,
      at+1 = Tt at|t,  Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′,
    with Mt = Pt Zt′ Ft⁻¹, for t = 1, . . . , n, starting with a1 and P1.
    Compare Mt with Kt (note that Kt = Tt Mt). Verify this version of the KF using the earlier derivations.
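A sketch of this re-ordered filter in the same style as the earlier one; again the naming conventions are mine.

```python
import numpy as np

def kalman_filter_filtered(y, Z, H, T, R, Q, a1, P1):
    """Re-ordered Kalman filter (a sketch) returning the filtered
    estimates a_{t|t} and P_{t|t} for univariate y_t."""
    n = len(y); m = len(a1)
    a_pred = np.asarray(a1, dtype=float)
    P_pred = np.asarray(P1, dtype=float)
    a_filt = np.zeros((n, m)); P_filt = np.zeros((n, m, m))
    for t in range(n):
        v = y[t] - (Z @ a_pred).item()              # prediction error
        F = (Z @ P_pred @ Z.T + H).item()           # its variance
        M = P_pred @ Z.T / F                        # M_t = P_t Z_t' F_t^{-1}
        a_filt[t] = a_pred + (M * v).ravel()        # a_{t|t}
        P_filt[t] = P_pred - M @ M.T * F            # P_{t|t} = P_t - M_t F_t M_t'
        a_pred = T @ a_filt[t]                      # a_{t+1}
        P_pred = T @ P_filt[t] @ T.T + R @ Q @ R.T  # P_{t+1}
    return a_filt, P_filt
```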
  • 21. Smoothing
      • The filter calculates the mean and variance conditional on Yt;
      • the Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
      • after the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.
    Define
      α̂t = E(αt | Yn),  Vt = Var(αt | Yn),
      rt = a weighted sum of future innovations,  Nt = Var(rt),  Lt = Tt − Kt Zt.
    Starting with rn = 0 and Nn = 0, the smoothing recursions are
      rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt,  Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt,
      α̂t = at + Pt rt−1,  Vt = Pt − Pt Nt−1 Pt.
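A Python sketch of the backward recursions, consuming the output of the kalman_filter sketch given after slide 13; it assumes the Durbin-Koopman form of the recursions with the Zt′ Ft⁻¹ factors written out.

```python
import numpy as np

def kalman_smoother(Z, T, a, P, v, F):
    """Backward smoothing recursions (a sketch) using Kalman filter output
    a, P, v, F from the earlier kalman_filter sketch. Returns
    alpha_hat_t = E(alpha_t | Y_n) and V_t = Var(alpha_t | Y_n)."""
    n = len(v); m = a.shape[1]
    r = np.zeros((m, 1)); N = np.zeros((m, m))   # r_n = 0, N_n = 0
    alpha_hat = np.zeros((n, m)); V = np.zeros((n, m, m))
    for t in range(n - 1, -1, -1):
        K = T @ P[t] @ Z.T / F[t]                # Kalman gain K_t
        L = T - K @ Z                            # L_t = T_t - K_t Z_t
        r = Z.T * (v[t] / F[t]) + L.T @ r        # r_{t-1}
        N = Z.T @ Z / F[t] + L.T @ N @ L         # N_{t-1}
        alpha_hat[t] = a[t] + (P[t] @ r).ravel() # smoothed state
        V[t] = P[t] - P[t] @ N @ P[t]            # smoothed state variance
    return alpha_hat, V
```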
  • 22. Smoothing Illustration
    [Figure: four panels, c. 1870–1970 — observations with the smoothed state; the smoothed state variance Vt; the smoothing cumulant rt; its variance Nt.]
  • 23. Filtering and Smoothing
    [Figure: observations with the filtered level and the smoothed level, 1870–1970.]
  • 24. Missing Observations
    Missing observations are very easy to handle in Kalman filtering:
      • suppose yj is missing;
      • put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
      • proceed with the remaining calculations as normal.
    The filter then extrapolates according to the state equation until a new observation arrives; the smoother interpolates between observations.
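A sketch of the missing-data variant of the earlier filter: observations coded as NaN trigger pure extrapolation, which is equivalent to setting vj = 0 and Kj = 0 (and effectively Fj = ∞) as on the slide.

```python
import numpy as np

def kalman_filter_missing(y, Z, H, T, R, Q, a1, P1):
    """Kalman filter (a sketch) where missing observations are coded as NaN:
    the update step is skipped, so the recursion extrapolates via the
    state equation, following the slide's recipe."""
    n = len(y); m = len(a1)
    a = np.zeros((n + 1, m)); a[0] = a1
    P = np.zeros((n + 1, m, m)); P[0] = P1
    for t in range(n):
        if np.isnan(y[t]):                        # y_t missing: no update
            a[t + 1] = T @ a[t]
            P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T
            continue
        v = y[t] - (Z @ a[t]).item()
        F = (Z @ P[t] @ Z.T + H).item()
        K = T @ P[t] @ Z.T / F
        a[t + 1] = T @ a[t] + (K * v).ravel()
        P[t + 1] = T @ P[t] @ T.T + R @ Q @ R.T - K @ K.T * F
    return a, P
```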
  • 25. Missing Observations
    [Figure: Nile observations with the filtered state at extrapolated over the gaps (top) and the state variance Pt (bottom), 1870–1970.]
  • 26. Missing Observations, Filter and Smoother
    [Figure: filtered state with Pt (top) and smoothed state with Vt (bottom), c. 1870–1970.]
  • 27. Forecasting
    Forecasting requires no extra theory: just treat the future observations as missing:
      • put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k;
      • proceed with the calculations as normal;
      • the forecast for yj is Zj aj.
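Forecasting can then reuse the missing-data filter sketched after slide 24: append k NaN "observations" and read off Zj aj. The helper below assumes kalman_filter_missing from that sketch; the name forecast is my own.

```python
import numpy as np

def forecast(y, k, Z, H, T, R, Q, a1, P1):
    """k-step-ahead forecasts (a sketch): treat y_{n+1}, ..., y_{n+k} as
    missing and return the point forecasts Z a_j for j = n+1, ..., n+k."""
    y_ext = np.concatenate([y, np.full(k, np.nan)])
    a, P = kalman_filter_missing(y_ext, Z, H, T, R, Q, a1, P1)
    # a[j] holds a_{j+1} in the slides' notation, so rows n .. n+k-1
    # contain a_{n+1}, ..., a_{n+k}.
    return np.array([(Z @ a[j]).item() for j in range(len(y), len(y) + k)])
```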
  • 28. Forecasting
    [Figure: Nile observations with forecasts at extending to 2000 (top) and the forecast error variance Pt (bottom), 1870–2000.]
  • 29. Parameter Estimation
    The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate ψ by maximum likelihood.
    The log-likelihood of a time series is
      log L = Σᵗ log p(yt | Yt−1),  t = 1, . . . , n.
    In the state space model, p(yt | Yt−1) is a Gaussian density with mean Zt at and variance Ft:
      log L = −(nN/2) log 2π − (1/2) Σᵗ ( log |Ft| + vt′ Ft⁻¹ vt ),
    with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
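A sketch of maximum likelihood estimation for the local level model via the prediction error decomposition; the log-variance parameterization and the large-P1 ("diffuse-ish") initialization are my assumptions, not prescriptions from the slides.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    """Negative Gaussian log-likelihood of the local level model via the
    prediction error decomposition (a sketch); params are log-variances."""
    sigma2_eta, sigma2_eps = np.exp(params)
    a, P = 0.0, 1e7                  # crude diffuse-like initialization (assumption)
    logL = 0.0
    for t in range(len(y)):
        v = y[t] - a                 # prediction error (Z_t = 1)
        F = P + sigma2_eps           # prediction error variance
        K = P / F                    # Kalman gain (T_t = 1)
        a = a + K * v                # a_{t+1}
        P = P * (1.0 - K) + sigma2_eta
        logL += -0.5 * (np.log(2 * np.pi) + np.log(F) + v**2 / F)
    return -logL

# Numerical maximisation of log L, e.g. with Nelder-Mead:
# res = minimize(neg_loglik, x0=np.log([1.0, 1.0]), args=(y,), method="Nelder-Mead")
# sigma2_eta_hat, sigma2_eps_hat = np.exp(res.x)
```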
  • 30. ML Estimate of Nile Data
    [Figure: Nile observations with the estimated level for q = 0, q = 1000 and q = 0.0973 (the ML estimate), 1870–1970.]
  • 31. Diagnostics
      • Null hypothesis: the standardised residuals satisfy vt / √Ft ∼ NID(0, 1);
      • apply standard tests for normality, heteroskedasticity and serial correlation;
      • a recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
      • model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
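A small sketch of the residual checks; it assumes vt and Ft from a Kalman filter run, and uses SciPy's Jarque-Bera normality test plus a crude lag-1 sample autocorrelation as stand-ins for the fuller battery of tests the slide has in mind.

```python
import numpy as np
from scipy import stats

def residual_diagnostics(v, F):
    """Standardised prediction errors e_t = v_t / sqrt(F_t) with two simple
    checks (a sketch): Jarque-Bera normality test and the lag-1 sample
    autocorrelation as a rough serial-correlation diagnostic."""
    e = v / np.sqrt(F)
    jb_stat, jb_pval = stats.jarque_bera(e)
    acf1 = np.corrcoef(e[:-1], e[1:])[0, 1]
    return {"jarque_bera": (jb_stat, jb_pval), "acf_lag1": acf1}
```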
  • 32. Nile Data Residuals Diagnostics
    [Figure: standardised residuals; correlogram; residual density with N(s = 0.996) overlay; QQ plot against the normal distribution.]
  • 33. Assignment 1 - Exercises 1 and 2
    1. Formulate the following models in state space form:
       • ARMA(3,1) process;
       • ARMA(1,3) process;
       • ARIMA(3,1,1) process;
       • AR(3) plus noise model;
       • random walk plus AR(3) process.
       Please discuss the initial conditions for αt in all cases.
    2. Derive a Kalman filter for the local level model
         yt = µt + ξt,  ξt ∼ N(0, σξ²),  ∆µt+1 = ηt ∼ N(0, ση²),
       with E(ξt ηt) = σξη = 0 and E(ξt ηs) = 0 for all t ≠ s. Also discuss the problem of missing observations in this case.
  • 34. Assignment 1 - Exercise 3
    3. Consider a time series yt that is modelled by a state space model, together with a time series vector of k exogenous variables Xt. The dependent time series yt depends on the explanatory variables Xt in a linear way.
       • How would you introduce Xt as regression effects in the state space model?
       • Can you allow the transition matrix Tt to depend on Xt too?
       • Can you allow the transition matrix Tt to depend on yt?
    For the last two questions, please also consider maximum likelihood estimation of the coefficients within the system matrices.
  • 35. Assignment 1 - Computing work
      • Consider Chapter 2 of the Durbin and Koopman book;
      • there are 8 figures in Chapter 2;
      • write computer code that can reproduce all these figures, except Fig. 2.4;
      • write short documentation for the Ox program.