- 1. State space methods for signal extraction, parameter estimation and forecasting Siem Jan Koopman http://personal.vu.nl/s.j.koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2012
- 2. Regression Model in time series For a time series of a dependent variable yt and a set of regressors Xt, we can consider the regression model yt = µ + Xt β + εt, εt ∼ NID(0, σ²), t = 1, . . . , n. For a large n, is it reasonable to assume constant coefficients µ, β and σ²? If we have strong reasons to believe that µ and β are time-varying, the model may be written as yt = µt + Xt βt + εt, µt+1 = µt + ηt, βt+1 = βt + ξt. This is an example of a linear state space model. 2 / 35
- 3. State Space Model The linear Gaussian state space model is defined in three parts: → State equation: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt), → Observation equation: yt = Zt αt + εt, εt ∼ NID(0, Ht), → Initial state distribution: α1 ∼ N(a1, P1). Notice that • ζt and εs are independent for all t, s, and independent of α1; • the observation yt can be multivariate; • the state vector αt is unobserved; • the matrices Tt, Zt, Rt, Qt, Ht determine the structure of the model. 3 / 35
- 4. State Space Model • the state space model is linear and Gaussian: therefore the properties and results of the multivariate normal distribution apply; • the state vector αt evolves as a VAR(1) process; • the system matrices usually contain unknown parameters; • estimation therefore has two aspects: • measuring the unobservable state (prediction, filtering and smoothing); • estimation of unknown parameters (maximum likelihood estimation); • state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad-hoc filters; • next, some well-known model specifications in state space form ... 4 / 35
- 5. Regression with time-varying coefficients General state space model: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt), yt = Zt αt + εt, εt ∼ NID(0, Ht). Put the regressors in Zt (that is, Zt = Xt) and set Tt = I, Rt = I. The result is the regression model yt = Xt βt + εt with time-varying coefficient βt = αt following a random walk. In case Qt = 0, the time-varying regression model reduces to the standard linear regression model yt = Xt β + εt with β = α1. Many dynamic specifications for the regression coefficient can be considered, see below. 5 / 35
- 6. AR(1) process in State Space Form Consider the AR(1) model yt+1 = φyt + ζt, in state space form: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt), yt = Zt αt + εt, εt ∼ NID(0, Ht), with scalar state αt and system matrices Zt = 1, Ht = 0, Tt = φ, Rt = 1, Qt = σ². We have • Zt = 1 and Ht = 0 imply that αt = yt; • the state equation implies yt+1 = φ yt + ζt with ζt ∼ NID(0, σ²); • this is the AR(1) model! 6 / 35
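The AR(1) state space form can be checked by simulation: with Zt = 1 and Ht = 0 the observation equals the state, so iterating the state equation reproduces the AR(1) recursion exactly. A minimal sketch in Python/NumPy (the values of φ and σ² are illustrative, not from the slides):

```python
import numpy as np

# AR(1) y_{t+1} = phi y_t + zeta_t in state space form.
phi, sigma2 = 0.8, 1.0
Z, H = 1.0, 0.0             # observation equation: y_t = alpha_t, no noise
T, R, Q = phi, 1.0, sigma2  # state equation: alpha_{t+1} = phi alpha_t + zeta_t

rng = np.random.default_rng(0)
n = 5
alpha = np.zeros(n + 1)
y = np.zeros(n)
for t in range(n):
    y[t] = Z * alpha[t]                                   # H = 0: y_t = alpha_t
    alpha[t + 1] = T * alpha[t] + R * rng.normal(0.0, np.sqrt(Q))
```

Because Ht = 0, the simulated observations coincide with the state path, which is exactly the AR(1) series.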
- 7. AR(2) in State Space Form Consider the AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt, in state space form: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt), yt = Zt αt + εt, εt ∼ NID(0, Ht), with 2 × 1 state vector αt and system matrices Zt = (1 0), Ht = 0, Tt = [φ1 1; φ2 0], Rt = [1; 0], Qt = σ². We have • Zt and Ht = 0 imply that α1t = yt; • the first state equation implies yt+1 = φ1 yt + α2t + ζt with ζt ∼ NID(0, σ²); • the second state equation implies α2,t+1 = φ2 yt. 7 / 35
- 8. MA(1) in State Space Form Consider the MA(1) model yt+1 = ζt + θζt−1, in state space form: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt), yt = Zt αt + εt, εt ∼ NID(0, Ht), with 2 × 1 state vector αt and system matrices Zt = (1 0), Ht = 0, Tt = [0 1; 0 0], Rt = [1; θ], Qt = σ². We have • Zt and Ht = 0 imply that α1t = yt; • the first state equation implies yt+1 = α2t + ζt with ζt ∼ NID(0, σ²); • the second state equation implies α2,t+1 = θζt. 8 / 35
- 9. General ARMA in State Space Form Consider the ARMA(2,1) model yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1 in state space form with αt = (yt, φ2 yt−1 + θζt)′, Zt = (1 0), Ht = 0, Tt = [φ1 1; φ2 0], Rt = [1; θ], Qt = σ². All ARIMA(p, d, q) models have a (non-unique) state space representation. 9 / 35
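The ARMA(2,1) representation can be verified numerically: iterating the state recursion with the matrices above, and running the ARMA difference equation directly on the same disturbance draws, must produce identical series. A sketch assuming illustrative parameter values and a zero initial state (so it checks the recursion, not a stationary initialisation):

```python
import numpy as np

# ARMA(2,1) in state space form; parameter values are illustrative.
phi1, phi2, theta, sigma = 0.5, 0.3, 0.4, 1.0
Z = np.array([1.0, 0.0])
T = np.array([[phi1, 1.0],
              [phi2, 0.0]])
R = np.array([1.0, theta])

rng = np.random.default_rng(1)
n = 50
zeta = rng.normal(0.0, sigma, n)

# State recursion: alpha_{t+1} = T alpha_t + R zeta_t, y_t = Z alpha_t.
alpha = np.zeros(2)
y_ss = np.zeros(n)
for t in range(n):
    y_ss[t] = Z @ alpha
    alpha = T @ alpha + R * zeta[t]

# Direct ARMA recursion; the transition uses zeta_t to build y_{t+1},
# so the disturbance index is shifted by one relative to y_t.
y_arma = np.zeros(n)
for t in range(1, n):
    y_arma[t] = (phi1 * y_arma[t - 1]
                 + (phi2 * y_arma[t - 2] if t >= 2 else 0.0)
                 + zeta[t - 1]
                 + (theta * zeta[t - 2] if t >= 2 else 0.0))
```

Both loops satisfy the same linear recursion with the same inputs and zero initial conditions, so the two series agree exactly.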
- 10. UC models in State Space Form State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt. LL model µt+1 = µt + ηt and yt = µt + εt: αt = µt, Tt = 1, Rt = 1, Qt = ση², Zt = 1, Ht = σε². LLT model µt+1 = µt + βt + ηt, βt+1 = βt + ξt and yt = µt + εt: αt = (µt, βt)′, Tt = [1 1; 0 1], Rt = [1 0; 0 1], Qt = diag(ση², σξ²), Zt = (1 0), Ht = σε². 10 / 35
- 11. UC models in State Space Form State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt. LLT model with season: µt+1 = µt + βt + ηt, βt+1 = βt + ξt, S(L)γt+1 = ωt and yt = µt + γt + εt: αt = (µt, βt, γt, γt−1, γt−2)′, Tt = [1 1 0 0 0; 0 1 0 0 0; 0 0 −1 −1 −1; 0 0 1 0 0; 0 0 0 1 0], Rt = [1 0 0; 0 1 0; 0 0 1; 0 0 0; 0 0 0], Qt = diag(ση², σξ², σω²), Zt = (1 0 1 0 0), Ht = σε². 11 / 35
- 12. Kalman Filter • The Kalman ﬁlter calculates the mean and variance of the unobserved state, given the observations. • The state is Gaussian: the complete distribution is characterized by the mean and variance. • The ﬁlter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained. • To start the recursion, we need a1 and P1 , which we assumed given. • There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here. 12 / 35
- 13. Kalman Filter The unobserved state αt can be estimated from the observations with the Kalman filter: vt = yt − Zt at, Ft = Zt Pt Zt′ + Ht, Kt = Tt Pt Zt′ Ft⁻¹, at+1 = Tt at + Kt vt, Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′, for t = 1, . . . , n, starting from the given values a1 and P1. • Writing Yt = {y1, . . . , yt}, we have at+1 = E(αt+1 |Yt), Pt+1 = Var(αt+1 |Yt). 13 / 35
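These recursions can be sketched in a few lines, specialised to the scalar local level model (Tt = Zt = Rt = 1, Qt = q, Ht = h). This is a Python/NumPy sketch, not the code used to produce the slides, and q, h are illustrative inputs:

```python
import numpy as np

# Kalman filter for the local level model: the comments map each line
# back to the general recursions on the slide.
def kalman_filter(y, a1, P1, q, h):
    n = len(y)
    v = np.zeros(n)              # prediction errors v_t
    F = np.zeros(n)              # prediction error variances F_t
    a_pred = np.zeros(n + 1)     # a_t = E(alpha_t | Y_{t-1})
    P_pred = np.zeros(n + 1)     # P_t = Var(alpha_t | Y_{t-1})
    a_pred[0], P_pred[0] = a1, P1
    a, P = a1, P1
    for t in range(n):
        v[t] = y[t] - a               # v_t = y_t - Z_t a_t
        F[t] = P + h                  # F_t = Z_t P_t Z_t' + H_t
        K = P / F[t]                  # K_t = T_t P_t Z_t' F_t^{-1}
        a = a + K * v[t]              # a_{t+1} = T_t a_t + K_t v_t
        P = P + q - K * F[t] * K      # P_{t+1} = T P T' + R Q R' - K F K'
        a_pred[t + 1], P_pred[t + 1] = a, P
    return v, F, a_pred, P_pred
```

For the local level model the predicted state variance converges to the steady state P solving P² = q(P + h), whatever P1 is.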
- 14. Lemma I : multivariate Normal regression Suppose x and y are jointly Normally distributed vectors with means µx = E(x) and µy = E(y), variance matrices Σxx and Σyy, and covariance matrix Σxy. Then E(x|y) = µx + Σxy Σyy⁻¹ (y − µy), Var(x|y) = Σxx − Σxy Σyy⁻¹ Σxy′. Define e = x − E(x|y). Then Var(e) = Var([x − µx] − Σxy Σyy⁻¹ [y − µy]) = Σxx − Σxy Σyy⁻¹ Σxy′ = Var(x|y), and Cov(e, y) = Cov([x − µx] − Σxy Σyy⁻¹ [y − µy], y) = Σxy − Σxy = 0. 14 / 35
- 15. Lemma II : an extension The proof of the Kalman filter uses a slight extension of the lemma from multivariate Normal regression theory. Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z, Var(x|y, z) = Var(x|y) − Σxz Σzz⁻¹ Σxz′. Can you give a proof of Lemma II? Hint: apply Lemma I for x and y∗ = (y′, z′)′. 15 / 35
- 16. Kalman Filter : derivation State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt. • Writing Yt = {y1, . . . , yt}, define at+1 = E(αt+1 |Yt), Pt+1 = Var(αt+1 |Yt); • The prediction error is vt = yt − E(yt |Yt−1) = yt − E(Zt αt + εt |Yt−1) = yt − Zt E(αt |Yt−1) = yt − Zt at; • It follows that vt = Zt (αt − at) + εt and E(vt) = 0; • The prediction error variance is Ft = Var(vt) = Zt Pt Zt′ + Ht. 16 / 35
- 17. Kalman Filter : derivation State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt. • We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, . . . , t − 1; • Apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z with x = αt+1, y = Yt−1 and z = vt = Zt (αt − at) + εt; • It follows that E(αt+1 |Yt−1) = Tt at; • Further, E(αt+1 vt′) = Tt E(αt vt′) + Rt E(ζt vt′) = Tt Pt Zt′; • Applying the lemma, we obtain the state update at+1 = E(αt+1 |Yt−1, yt) = Tt at + Tt Pt Zt′ Ft⁻¹ vt = Tt at + Kt vt, with Kt = Tt Pt Zt′ Ft⁻¹. 17 / 35
- 18. Kalman Filter : derivation • Our best prediction of yt is Zt at. When the actual observation arrives, we calculate the prediction error vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht. The new best estimate of the state mean is based on both the old estimate at and the new information vt: at+1 = Tt at + Kt vt, and similarly for the variance: Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′. Can you derive the updating equation for Pt+1? • The Kalman gain Kt = Tt Pt Zt′ Ft⁻¹ is the optimal weighting matrix for the new evidence. • You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2). 18 / 35
- 19. Kalman Filter Illustration [Figure: Nile data, 1870–1970. Panels: observation with filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.] 19 / 35
- 20. Prediction and Filtering This version of the Kalman filter computes the predictions of the states directly: at+1 = E(αt+1 |Yt), Pt+1 = Var(αt+1 |Yt). The filtered estimates are defined by at|t = E(αt |Yt), Pt|t = Var(αt |Yt), for t = 1, . . . , n. The Kalman filter can be “re-ordered” to compute both: at|t = at + Mt vt, Pt|t = Pt − Mt Ft Mt′, at+1 = Tt at|t, Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′, with Mt = Pt Zt′ Ft⁻¹ for t = 1, . . . , n, starting with a1 and P1. Compare Mt with Kt. Verify this version of the KF using the earlier derivations. 20 / 35
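The re-ordered filter can be sketched for the local level model (T = Z = R = 1), where Kt = Tt Mt = Mt, so the two versions coincide. A sketch with illustrative variance inputs q and h:

```python
import numpy as np

# Re-ordered Kalman filter: first the filtered a_{t|t}, P_{t|t},
# then the prediction step to a_{t+1}, P_{t+1}.
def filter_and_predict(y, a1, P1, q, h):
    a, P = a1, P1
    att, Ptt = [], []
    for yt in y:
        v = yt - a
        F = P + h
        M = P / F                # M_t = P_t Z_t' F_t^{-1}
        a_f = a + M * v          # a_{t|t} = a_t + M_t v_t
        P_f = P - M * F * M      # P_{t|t} = P_t - M_t F_t M_t'
        att.append(a_f)
        Ptt.append(P_f)
        a = a_f                  # a_{t+1} = T_t a_{t|t} with T = 1
        P = P_f + q              # P_{t+1} = T P_{t|t} T' + R Q R'
    return np.array(att), np.array(Ptt)
```

Substituting the filtered quantities into the prediction step recovers the one-step recursions of the previous slides.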
- 21. Smoothing • The filter calculates the mean and variance conditional on Yt; • The Kalman smoother calculates the mean and variance conditional on the full set of observations Yn; • After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first. Define α̂t = E(αt |Yn), Vt = Var(αt |Yn), rt = weighted sum of future innovations, Nt = Var(rt), Lt = Tt − Kt Zt. Starting with rn = 0, Nn = 0, the smoothing recursions are given by rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt, Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt, α̂t = at + Pt rt−1, Vt = Pt − Pt Nt−1 Pt. 21 / 35
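The backward recursions can be sketched for the local level model: a forward Kalman filter pass stores vt, Ft and Kt, after which rt and Nt run from the last observation back to the first. A sketch with illustrative variance inputs q and h:

```python
import numpy as np

# Kalman smoother for the local level model (T = Z = R = 1).
def kalman_smoother(y, a1, P1, q, h):
    n = len(y)
    a = np.zeros(n); P = np.zeros(n)
    v = np.zeros(n); F = np.zeros(n); K = np.zeros(n)
    a[0], P[0] = a1, P1
    # Forward pass: store predictions and innovations.
    for t in range(n):
        v[t] = y[t] - a[t]
        F[t] = P[t] + h
        K[t] = P[t] / F[t]
        if t < n - 1:
            a[t + 1] = a[t] + K[t] * v[t]
            P[t + 1] = P[t] + q - K[t] * F[t] * K[t]
    # Backward pass: r_n = 0, N_n = 0, L_t = T - K_t Z = 1 - K_t.
    r, N = 0.0, 0.0
    alpha_hat = np.zeros(n); V = np.zeros(n)
    for t in range(n - 1, -1, -1):
        L = 1.0 - K[t]
        r = v[t] / F[t] + L * r         # r_{t-1}
        N = 1.0 / F[t] + L * N * L      # N_{t-1}
        alpha_hat[t] = a[t] + P[t] * r  # smoothed mean
        V[t] = P[t] - P[t] * N * P[t]   # smoothed variance
    return alpha_hat, V
```

As a sanity check, when the measurement noise h is taken to zero the smoothed level reproduces the observations and the smoothed variance collapses to (nearly) zero.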
- 22. Smoothing Illustration [Figure: Nile data, 1870–1970. Panels: observations with smoothed state; smoothed state variance V_t; smoothing quantity r_t; its variance N_t.] 22 / 35
- 23. Filtering and Smoothing [Figure: Nile observations, 1870–1970, with filtered level and smoothed level.] 23 / 35
- 24. Missing Observations Missing observations are very easy to handle in Kalman filtering: • suppose yj is missing; • put vj = 0, Kj = 0 and Fj = ∞ in the algorithm; • proceed with the remaining calculations as normal. The filter extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations. 24 / 35
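The rule vj = 0, Kj = 0 amounts to skipping the update step, so the filter simply extrapolates through the gap. A sketch for the local level model in which missing values are coded as NaN (a coding convention of this sketch, not the slides; variances are illustrative):

```python
import numpy as np

# Local level Kalman filter with missing observations: when y_t is
# missing (NaN), set v_t = 0, K_t = 0, so the recursion reduces to the
# pure extrapolation a_{t+1} = a_t, P_{t+1} = P_t + q.
def filter_with_missing(y, a1, P1, q, h):
    a, P = a1, P1
    a_out, P_out = [], []
    for yt in y:
        a_out.append(a)
        P_out.append(P)
        if np.isnan(yt):        # missing: skip the update step
            P = P + q
            continue
        F = P + h
        K = P / F
        a = a + K * (yt - a)
        P = P + q - K * F * K
    return np.array(a_out), np.array(P_out)
```

During a run of missing values the predicted mean stays flat while the prediction variance grows by q each period, which matches the flat a_t and rising P_t in the illustration on the next slide.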
- 25. Missing Observations [Figure: Nile observations with filtered state a_t; state variance P_t, which grows over the stretches of missing data.] 25 / 35
- 26. Missing Observations, Filter and Smoother [Figure: filtered state with variance P_t; smoothed state with variance V_t.] 26 / 35
- 27. Forecasting Forecasting requires no extra theory: just treat the future observations as missing: • put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k; • proceed with the calculations as normal; • the forecast for yj is Zj aj. 27 / 35
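For the local level model this recipe gives a flat forecast path with linearly growing variance: with T = Z = 1 the state forecast stays at a_{n+1} while P grows by q each step, and the forecast variance of y_{n+j} is F_{n+j} = P_{n+j} + h. A sketch (the closed form below is my specialisation of the missing-data filter to this model; inputs are illustrative):

```python
import numpy as np

# k-step forecasts for the local level model, given the filter output
# a_{n+1}, P_{n+1} (here a_next, P_next).
def forecast(a_next, P_next, q, h, k):
    f = np.full(k, a_next)                  # flat forecast path Z_j a_j
    var = P_next + np.arange(k) * q + h     # F_{n+j} = P_{n+1} + (j-1) q + h
    return f, var
```

The growing forecast variance is what produces the widening uncertainty band in the illustration on the next slide.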
- 28. Forecasting [Figure: Nile observations, 1870–1970, with forecasts a_t extended to 2000; forecast variance P_t increasing beyond the sample.] 28 / 35
- 29. Parameter Estimation The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate ψ by maximum likelihood. The loglikelihood of a time series is log L = Σ_{t=1..n} log p(yt |Yt−1). In the state space model, p(yt |Yt−1) is a Gaussian density with mean Zt at and variance Ft: log L = −(nN/2) log 2π − (1/2) Σ_{t=1..n} (log |Ft| + vt′ Ft⁻¹ vt), with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L. 29 / 35
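The prediction error decomposition can be sketched for the local level model with ψ = (q, h) = (ση², σε²); the initialisation a1, P1 is assumed given here (diffuse initialisation is not handled in this sketch):

```python
import numpy as np

# Log-likelihood of the local level model via the prediction error
# decomposition: one Kalman filter pass accumulates the Gaussian
# log-density of each innovation v_t with variance F_t.
def loglik(psi, y, a1=0.0, P1=1e7):
    q, h = psi
    a, P = a1, P1
    ll = 0.0
    for yt in y:
        v = yt - a
        F = P + h
        ll += -0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
        K = P / F
        a = a + K * v
        P = P + q - K * F * K
    return ll
```

Numerical maximisation can then be done with any optimiser, for example scipy.optimize.minimize applied to ψ ↦ −loglik(ψ, y).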
- 30. ML Estimate of Nile Data [Figure: Nile observations with the filtered level for q = 0, q = 1000 and q = 0.0973 (the ML estimate).] 30 / 35
- 31. Diagnostics • Null hypothesis: the standardised residuals vt/√Ft ∼ NID(0, 1); • Apply standard tests for Normality, heteroskedasticity and serial correlation; • A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers; • Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC). 31 / 35
- 32. Nile Data Residuals Diagnostics [Figure: standardised residuals; correlogram; residual density against N(0,1); QQ plot.] 32 / 35
- 33. Assignment 1 - Exercises 1 and 2 1. Formulate the following models in state space form: • ARMA(3,1) process; • ARMA(1,3) process; • ARIMA(3,1,1) process; • AR(3) plus noise model; • Random Walk plus AR(3) process. Please discuss the initial conditions for αt in all cases. 2. Derive a Kalman filter for the local level model yt = µt + ξt, ξt ∼ N(0, σξ²), ∆µt+1 = ηt ∼ N(0, ση²), with E(ξt ηt) = σξη = 0 and E(ξt ηs) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case. 33 / 35
- 34. Assignment 1 - Exercise 3 3. Consider a time series yt that is modelled by a state space model, together with a time series vector of k exogenous variables Xt. The dependent time series yt depends on the explanatory variables Xt in a linear way. • How would you introduce Xt as regression effects in the state space model? • Can you allow the transition matrix Tt to depend on Xt too? • Can you allow the transition matrix Tt to depend on yt? For the last two questions, please also consider maximum likelihood estimation of the coefficients within the system matrices. 34 / 35
- 35. Assignment 1 - Computing work • Consider Chapter 2 of the Durbin and Koopman book. • There are 8 figures in Chapter 2. • Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4. • Write a short documentation for the Ox program. 35 / 35