1.
State space methods for signal extraction, parameter estimation and forecasting
Siem Jan Koopman
http://personal.vu.nl/s.j.koopman
Department of Econometrics, VU University Amsterdam
Tinbergen Institute
2012
2.
Regression Model in time series
For a time series of a dependent variable yt and a set of regressors Xt, we can consider the regression model

yt = µ + Xt β + εt,  εt ∼ NID(0, σ²),  t = 1, . . . , n.

For a large n, is it reasonable to assume constant coefficients µ, β and σ²? If we have strong reasons to believe that µ and β are time-varying, the model may be written as

yt = µt + Xt βt + εt,  µt+1 = µt + ηt,  βt+1 = βt + ξt.

This is an example of a linear state space model.
3.
State Space Model
The linear Gaussian state space model is defined in three parts:
→ State equation: αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
→ Observation equation: yt = Zt αt + εt, εt ∼ NID(0, Ht),
→ Initial state distribution: α1 ∼ N(a1, P1).
Notice that
• ζt and εs are independent for all t, s, and independent from α1;
• the observation yt can be multivariate;
• the state vector αt is unobserved;
• the matrices Tt, Zt, Rt, Qt, Ht determine the structure of the model.
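The three parts of the model can be simulated directly. Below is a minimal NumPy sketch; the function name simulate_lgss and the local level parameter values in the example are our own illustrative choices, not from the slides:

```python
import numpy as np

def simulate_lgss(T, Z, R, Q, H, a1, P1, n, rng):
    """Simulate the linear Gaussian state space model:
       alpha[t+1] = T alpha[t] + R zeta[t],  zeta[t] ~ NID(0, Q)
       y[t]       = Z alpha[t] + eps[t],     eps[t] ~ NID(0, H)
       alpha[1]   ~ N(a1, P1)   (univariate y for simplicity)"""
    alpha = rng.multivariate_normal(a1, P1)  # draw the initial state
    y = np.empty(n)
    for t in range(n):
        y[t] = (Z @ alpha)[0] + rng.normal(0.0, np.sqrt(H))
        alpha = T @ alpha + R @ rng.multivariate_normal(
            np.zeros(Q.shape[0]), Q)
    return y

# Example: local level model (scalar random walk observed with noise)
rng = np.random.default_rng(0)
y = simulate_lgss(T=np.eye(1), Z=np.ones((1, 1)), R=np.eye(1),
                  Q=np.array([[0.5]]), H=1.0,
                  a1=np.zeros(1), P1=np.eye(1), n=200, rng=rng)
```

Any of the specifications on the following slides can be simulated this way by plugging in the appropriate system matrices.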
4.
State Space Model
• The state space model is linear and Gaussian: properties and results of the multivariate normal distribution apply;
• the state vector αt evolves as a VAR(1) process;
• the system matrices usually contain unknown parameters;
• estimation therefore has two aspects: measuring the unobservable state (prediction, filtering and smoothing) and estimating the unknown parameters (maximum likelihood estimation);
• state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline fitting and many ad-hoc filters;
• next, some well-known model specifications in state space form ...
5.
Regression with time-varying coefficients
General state space model:

αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
yt = Zt αt + εt, εt ∼ NID(0, Ht).

Put the regressors in Zt (that is, Zt = Xt) and set Tt = I, Rt = I. The result is the regression model yt = Xt βt + εt with time-varying coefficient βt = αt following a random walk.
In case Qt = 0, the time-varying regression model reduces to the standard linear regression model yt = Xt β + εt with β = α1.
Many dynamic specifications for the regression coefficient can be considered, see below.
6.
AR(1) process in State Space Form
Consider the AR(1) model yt+1 = φ yt + ζt, in state space form:

αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
yt = Zt αt + εt, εt ∼ NID(0, Ht),

with scalar state αt and system matrices

Zt = 1, Ht = 0, Tt = φ, Rt = 1, Qt = σ².

We have:
• Zt = 1 and Ht = 0 imply that αt = yt;
• the state equation implies yt+1 = φ yt + ζt with ζt ∼ NID(0, σ²);
• this is the AR(1) model!
7.
AR(2) in State Space Form
Consider the AR(2) model yt+1 = φ1 yt + φ2 yt−1 + ζt, in state space form:

αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
yt = Zt αt + εt, εt ∼ NID(0, Ht),

with 2 × 1 state vector αt and system matrices:

Zt = ( 1  0 ),  Ht = 0,

Tt = [ φ1  1 ]        [ 1 ]
     [ φ2  0 ] ,  Rt = [ 0 ] ,  Qt = σ².

We have:
• Zt and Ht = 0 imply that α1t = yt;
• the first state equation implies yt+1 = φ1 yt + α2t + ζt with ζt ∼ NID(0, σ²);
• the second state equation implies α2,t+1 = φ2 yt.
8.
MA(1) in State Space Form
Consider the MA(1) model yt+1 = ζt + θζt−1, in state space form:

αt+1 = Tt αt + Rt ζt, ζt ∼ NID(0, Qt),
yt = Zt αt + εt, εt ∼ NID(0, Ht),

with 2 × 1 state vector αt and system matrices:

Zt = ( 1  0 ),  Ht = 0,

Tt = [ 0  1 ]        [ 1 ]
     [ 0  0 ] ,  Rt = [ θ ] ,  Qt = σ².

We have:
• Zt and Ht = 0 imply that α1t = yt;
• the first state equation implies yt+1 = α2t + ζt with ζt ∼ NID(0, σ²);
• the second state equation implies α2,t+1 = θζt.
9.
General ARMA in State Space Form
Consider the ARMA(2,1) model yt = φ1 yt−1 + φ2 yt−2 + ζt + θζt−1, in state space form:

αt = [ yt              ]
     [ φ2 yt−1 + θζt ]

Zt = ( 1  0 ),  Ht = 0,

Tt = [ φ1  1 ]        [ 1 ]
     [ φ2  0 ] ,  Rt = [ θ ] ,  Qt = σ².

All ARIMA(p, d, q) models have a (non-unique) state space representation.
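The representation is easy to check numerically: simulate the state recursion with these matrices and verify that the output obeys the ARMA(2,1) difference equation. A sketch (the parameter values φ1 = 0.6, φ2 = 0.2, θ = 0.3 and σ² = 1 are arbitrary):

```python
import numpy as np

# System matrices for ARMA(2,1) as on the slide (sigma^2 = 1)
phi1, phi2, theta = 0.6, 0.2, 0.3
T = np.array([[phi1, 1.0],
              [phi2, 0.0]])
R = np.array([1.0, theta])
Z = np.array([1.0, 0.0])

rng = np.random.default_rng(1)
n = 500
zeta = rng.standard_normal(n)
alpha = np.zeros(2)      # start at zero for simplicity
y = np.empty(n)
for t in range(n):
    y[t] = Z @ alpha     # H_t = 0: y_t is the first state element
    alpha = T @ alpha + R * zeta[t]

# With this timing, the simulated series satisfies
# y[t] = phi1 y[t-1] + phi2 y[t-2] + zeta[t-1] + theta zeta[t-2]:
check = phi1 * y[1:-1] + phi2 * y[:-2] + zeta[1:-1] + theta * zeta[:-2]
```

Here `check` coincides with `y[2:]` up to floating point error, confirming the state space representation.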
12.
Kalman Filter
• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
• The state is Gaussian: the complete distribution is characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1, which we assumed given.
• There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here.
13.
Kalman Filter
The unobserved state αt can be estimated from the observations with the Kalman filter:

vt = yt − Zt at,
Ft = Zt Pt Zt′ + Ht,
Kt = Tt Pt Zt′ Ft⁻¹,
at+1 = Tt at + Kt vt,
Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′,

for t = 1, . . . , n, starting with given values for a1 and P1.
• Writing Yt = {y1, . . . , yt}, at+1 = E(αt+1 | Yt), Pt+1 = Var(αt+1 | Yt).
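The five recursions above translate almost line by line into code. A sketch for a univariate series and time-invariant system matrices (the function name kalman_filter and the local level example values are our own):

```python
import numpy as np

def kalman_filter(y, T, Z, R, Q, H, a1, P1):
    """Kalman filter recursions from this slide, for a univariate
    series y and time-invariant system matrices (a sketch)."""
    n, m = len(y), len(a1)
    a, P = a1.astype(float), P1.astype(float)
    v = np.empty(n)        # prediction errors v_t
    F = np.empty(n)        # prediction error variances F_t
    A = np.empty((n, m))   # a_t = E(alpha_t | Y_{t-1})
    for t in range(n):
        A[t] = a
        v[t] = y[t] - (Z @ a).item()
        F[t] = (Z @ P @ Z.T + H).item()
        K = (T @ P @ Z.T / F[t]).ravel()   # Kalman gain K_t
        a = T @ a + K * v[t]
        P = T @ P @ T.T + R @ Q @ R.T - np.outer(K, K) * F[t]
    return v, F, A

# Local level model: T = Z = R = 1, Q = q = 0.5, H = 1
rng = np.random.default_rng(2)
yy = np.cumsum(rng.normal(0, np.sqrt(0.5), 100)) + rng.standard_normal(100)
v, F, A = kalman_filter(yy, np.eye(1), np.ones((1, 1)), np.eye(1),
                        np.array([[0.5]]), np.array([[1.0]]),
                        np.zeros(1), np.array([[10.0]]))
```

For the local level example the recursion for Pt has a fixed point, so Ft converges to a steady-state value (here 2.0 for q = 0.5, H = 1) regardless of the data.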
14.
Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with means µx = E(x) and µy = E(y), respectively, with variance matrices Σxx and Σyy, respectively, and covariance matrix Σxy. Then

E(x|y) = µx + Σxy Σyy⁻¹ (y − µy),
Var(x|y) = Σxx − Σxy Σyy⁻¹ Σxy′.

Define e = x − E(x|y). Then

Var(e) = Var([x − µx] − Σxy Σyy⁻¹ [y − µy]) = Σxx − Σxy Σyy⁻¹ Σxy′ = Var(x|y),

and

Cov(e, y) = Cov([x − µx] − Σxy Σyy⁻¹ [y − µy], y) = Σxy − Σxy = 0.
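The lemma is easy to verify by simulation. A sketch with scalar x and y (the means and covariance matrix below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, mu_y = 1.0, -2.0
S = np.array([[2.0, 0.8],    # [[Sigma_xx, Sigma_xy],
              [0.8, 1.5]])   #  [Sigma_yx, Sigma_yy]]
L = np.linalg.cholesky(S)
xy = rng.standard_normal((200_000, 2)) @ L.T + np.array([mu_x, mu_y])
x, y = xy[:, 0], xy[:, 1]

# e = x - E(x|y) with E(x|y) = mu_x + Sxy Syy^{-1} (y - mu_y)
e = x - (mu_x + S[0, 1] / S[1, 1] * (y - mu_y))

cov_ey = np.cov(e, y)[0, 1]                    # ~ 0: e orthogonal to y
var_e = e.var()                                # ~ Var(x|y)
var_theory = S[0, 0] - S[0, 1] ** 2 / S[1, 1]  # Sxx - Sxy^2 / Syy
```

The sample moments `cov_ey` and `var_e` match the lemma's zero covariance and conditional variance up to Monte Carlo error.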
15.
Lemma II : an extension
The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression theory.
Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then

E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z,
Var(x|y, z) = Var(x|y) − Σxz Σzz⁻¹ Σxz′.

Can you give a proof of Lemma II?
Hint: apply Lemma I to x and y∗ = (y′, z′)′.
16.
Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
• Writing Yt = {y1, . . . , yt}, define at+1 = E(αt+1 | Yt), Pt+1 = Var(αt+1 | Yt);
• the prediction error is
vt = yt − E(yt | Yt−1) = yt − E(Zt αt + εt | Yt−1) = yt − Zt E(αt | Yt−1) = yt − Zt at;
• it follows that vt = Zt(αt − at) + εt and E(vt) = 0;
• the prediction error variance is Ft = Var(vt) = Zt Pt Zt′ + Ht.
17.
Kalman Filter : derivation
State space model: αt+1 = Tt αt + Rt ζt, yt = Zt αt + εt.
• We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, . . . , t − 1;
• apply the lemma E(x|y, z) = E(x|y) + Σxz Σzz⁻¹ z with x = αt+1, y = Yt−1 and z = vt = Zt(αt − at) + εt;
• it follows that E(αt+1 | Yt−1) = Tt at;
• further, E(αt+1 vt′) = Tt E(αt vt′) + Rt E(ζt vt′) = Tt Pt Zt′;
• carrying out the lemma, we obtain the state update
at+1 = E(αt+1 | Yt−1, yt) = Tt at + Tt Pt Zt′ Ft⁻¹ vt = Tt at + Kt vt,
with Kt = Tt Pt Zt′ Ft⁻¹.
18.
Kalman Filter : derivation
• Our best prediction of yt is Zt at. When the actual observation arrives, we calculate the prediction error vt = yt − Zt at and its variance Ft = Zt Pt Zt′ + Ht. The new best estimate of the state mean is based on both the old estimate at and the new information vt:
at+1 = Tt at + Kt vt,
and similarly for the variance:
Pt+1 = Tt Pt Tt′ + Rt Qt Rt′ − Kt Ft Kt′.
Can you derive the updating equation for Pt+1?
• The Kalman gain Kt = Tt Pt Zt′ Ft⁻¹ is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).
20.
Prediction and Filtering
This version of the Kalman filter computes the predictions of the states directly:

at+1 = E(αt+1 | Yt), Pt+1 = Var(αt+1 | Yt).

The filtered estimates are defined by

at|t = E(αt | Yt), Pt|t = Var(αt | Yt),

for t = 1, . . . , n.
The Kalman filter can be "re-ordered" to compute both:

at|t = at + Mt vt, Pt|t = Pt − Mt Ft Mt′,
at+1 = Tt at|t, Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′,

with Mt = Pt Zt′ Ft⁻¹, for t = 1, . . . , n, starting with a1 and P1.
Compare Mt with Kt. Verify this version of the KF using the earlier derivations.
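The two orderings are algebraically identical (note Kt = Tt Mt), which can be verified numerically for a scalar state. A sketch with Tt = 0.9, Zt = Rt = 1, Qt = 0.5, Ht = 1 (arbitrary values, AR(1) plus noise):

```python
import numpy as np

rng = np.random.default_rng(4)
n, T, q, h = 100, 0.9, 0.5, 1.0
y = rng.standard_normal(n)   # any data will do: the identity is algebraic

a, P = 0.0, 5.0              # one-step (prediction) form
b, S = 0.0, 5.0              # re-ordered (filter, then predict) form
pred1, pred2, filt = np.empty(n), np.empty(n), np.empty(n)
for t in range(n):
    # one-step form: a_{t+1} = T a_t + K_t v_t with K_t = T P_t / F_t
    v, F = y[t] - a, P + h
    K = T * P / F
    a, P = T * a + K * v, T**2 * P + q - K**2 * F
    pred1[t] = a
    # re-ordered form: a_{t|t} = a_t + M_t v_t with M_t = P_t / F_t,
    # then a_{t+1} = T a_{t|t}
    v2, F2 = y[t] - b, S + h
    M = S / F2
    btt, Stt = b + M * v2, S - M**2 * F2
    filt[t] = btt
    b, S = T * btt, T**2 * Stt + q
    pred2[t] = b
```

Both loops produce the same one-step-ahead predictions; the re-ordered loop additionally delivers the filtered estimates at|t.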
21.
Smoothing
• The filter calculates the mean and variance conditional on Yt;
• the Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
• after the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.

α̂t = E(αt | Yn), Vt = Var(αt | Yn),
rt = weighted sum of future innovations, Nt = Var(rt), Lt = Tt − Kt Zt.

Starting with rn = 0, Nn = 0, the smoothing recursions are given by

rt−1 = Zt′ Ft⁻¹ vt + Lt′ rt,
Nt−1 = Zt′ Ft⁻¹ Zt + Lt′ Nt Lt,
α̂t = at + Pt rt−1,
Vt = Pt − Pt Nt−1 Pt.
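For the local level model (Tt = Zt = Rt = 1, so Lt = 1 − Kt) the recursions become scalar. A sketch of filter plus smoother (the function name and default diffuse-ish P1 are our own choices). A handy check: with q = 0 the level is constant, and under a near-diffuse prior the smoothed level at every t reproduces the sample mean of y, with variance h/n:

```python
import numpy as np

def smooth_local_level(y, q, h, a1=0.0, P1=1e7):
    """Kalman filter and smoother for the local level model
    y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t (a sketch)."""
    n = len(y)
    a = np.empty(n); P = np.empty(n)
    v = np.empty(n); F = np.empty(n); K = np.empty(n)
    at, Pt = a1, P1
    for t in range(n):                     # forward: Kalman filter
        a[t], P[t] = at, Pt
        v[t], F[t] = y[t] - at, Pt + h
        K[t] = Pt / F[t]
        at, Pt = at + K[t] * v[t], Pt * (1 - K[t]) + q
    alpha = np.empty(n); V = np.empty(n)
    r, N = 0.0, 0.0                        # r_n = 0, N_n = 0
    for t in range(n - 1, -1, -1):         # backward: smoother
        L = 1.0 - K[t]
        r = v[t] / F[t] + L * r            # r_{t-1}
        N = 1.0 / F[t] + L * L * N         # N_{t-1}
        alpha[t] = a[t] + P[t] * r
        V[t] = P[t] - P[t] ** 2 * N
    return alpha, V

rng = np.random.default_rng(5)
yobs = rng.standard_normal(50)
alpha, V = smooth_local_level(yobs, q=0.0, h=1.0)
```

With q = 0 the model degenerates to a constant mean, so `alpha` is flat at ȳ and `V` is roughly 1/50 everywhere.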
24.
Missing Observations
Missing observations are very easy to handle in Kalman filtering:
• suppose yj is missing;
• put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
• proceed with the further calculations as normal.
The filter then extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
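In code the rule amounts to skipping the update step. A local level sketch where np.nan marks a missing observation (the function name and parameter values are illustrative):

```python
import numpy as np

def filter_with_missing(y, q, h, a1=0.0, P1=1e7):
    """Local level Kalman filter where np.nan marks a missing y_t:
    the update is skipped (v_t = 0, K_t = 0) and the state is
    extrapolated through the state equation (a sketch)."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):                 # missing: pure extrapolation
            a[t + 1], P[t + 1] = a[t], P[t] + q
        else:
            v, F = y[t] - a[t], P[t] + h
            K = P[t] / F
            a[t + 1], P[t + 1] = a[t] + K * v, P[t] * (1 - K) + q
    return a[1:], P[1:]

rng = np.random.default_rng(6)
y = np.cumsum(rng.normal(0, 1, 40)) + rng.normal(0, 1, 40)
y[15:25] = np.nan                          # a stretch of missing data
a_pred, P_pred = filter_with_missing(y, q=1.0, h=1.0)
```

Over the missing stretch the state prediction stays constant while its variance grows by q each period; treating future periods as missing in exactly this way yields forecasts.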
27.
Forecasting
Forecasting requires no extra theory: just treat the future observations as missing:
• put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k;
• proceed with the further calculations as normal;
• the forecast for yj is Zj aj.
29.
Parameter Estimation
The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate ψ by maximum likelihood.
The loglikelihood of a time series is

log L = Σ_{t=1}^{n} log p(yt | Yt−1).

In the state space model, p(yt | Yt−1) is a Gaussian density with mean Zt at and variance Ft, so that (with N the dimension of yt)

log L = − (nN/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |Ft| + vt′ Ft⁻¹ vt ),

with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
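A sketch of the prediction error decomposition for the local level model, with ψ = q = Var(ηt) and Var(εt) = h fixed at 1; for transparency the maximisation is a crude grid search rather than a numerical optimiser, and the function name and data-generating values are our own:

```python
import numpy as np

def loglik_local_level(q, y, h=1.0, a1=0.0, P1=1e7):
    """Prediction error decomposition log-likelihood for the local
    level model; psi = q = Var(eta), with Var(eps) = h (a sketch)."""
    a, P, ll = a1, P1, 0.0
    for yt in y:
        v, F = yt - a, P + h
        ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = P / F                      # Kalman filter step
        a, P = a + K * v, P * (1 - K) + q
    return ll

# Simulate data with q = 0.5 and maximise log L over a grid of q values
rng = np.random.default_rng(7)
n = 300
mu = np.cumsum(rng.normal(0, np.sqrt(0.5), n))
y = mu + rng.standard_normal(n)
qs = np.linspace(0.01, 2.0, 200)
q_hat = qs[np.argmax([loglik_local_level(q, y) for q in qs])]
```

In practice one would replace the grid by a quasi-Newton optimiser over log q, but the likelihood evaluation itself is exactly the Kalman filter pass shown above.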
30.
ML Estimate of Nile Data
[Figure: annual Nile data, 1870–1970 (observations), with the estimated level for q = 0, q = 1000 and q = 0.0973 (the ML estimate).]
31.
Diagnostics
• Null hypothesis: the standardised residuals satisfy vt / √Ft ∼ NID(0, 1);
• apply standard tests for Normality, heteroskedasticity, serial correlation;
• a recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
• model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
32.
Nile Data Residuals Diagnostics
[Figure: standardised residuals of the Nile data, their correlogram, a density estimate against the N(0,1) density, and a QQ plot.]
33.
Assignment 1 - Exercises 1 and 2
1. Formulate the following models in state space form:
 • ARMA(3,1) process;
 • ARMA(1,3) process;
 • ARIMA(3,1,1) process;
 • AR(3) plus noise model;
 • Random Walk plus AR(3) process.
 Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model
 yt = µt + ξt, ξt ∼ N(0, σξ²), ∆µt+1 = ηt, ηt ∼ N(0, ση²),
 with E(ξt ηt) = σξη = 0 and E(ξt ηs) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.
34.
Assignment 1 - Exercise 3
3. Consider a time series yt that is modelled by a state space model, together with a time series vector of k exogenous variables Xt. The dependent time series yt depends on the explanatory variables Xt in a linear way.
• How would you introduce Xt as regression effects in the state space model?
• Can you allow the transition matrix Tt to depend on Xt too?
• Can you allow the transition matrix Tt to depend on yt?
For the last two questions, please also consider maximum likelihood estimation of the coefficients within the system matrices.
35.
Assignment 1 - Computing work
• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4.
• Write a short documentation for the Ox program.