Upcoming SlideShare
×

# Bayesian Core: Chapter 7

1,447 views

Published on

These are the slides of Chapter 7 of Bayesian Core (2007) on Bayesian time-series.

Published in: Education, Technology
1 Like
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
1,447
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
68
0
Likes
1
Embeds 0
No embeds

No notes for slide

### Bayesian Core: Chapter 7

1. 1. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dynamic models 6 Dynamic models Dependent data The AR(p) model The MA(q) model Hidden Markov models
2. 2. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Dependent data Huge portion of real-life data involving dependent datapoints Example (Capture-recapture) capture histories capture sizes
3. 3. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Eurostoxx 50 First four stock indices of of the ﬁnancial index Eurostoxx 50 ABN AMRO HOLDING AEGON 30 50 25 40 30 20 20 15 10 10 0 500 1000 1500 0 500 1000 1500 t t AHOLD KON. AIR LIQUIDE 160 150 30 140 130 20 120 110 10 100 0 500 1000 1500 0 500 1000 1500 t t
4. 4. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Markov chain Stochastic process (xt )t∈T where distribution of xt given the past values x0:(t−1) only depends on xt−1 . Homogeneity: distribution of xt given the past constant in t ∈ T . Corresponding likelihood T ℓ(θ|x0:T ) = f0 (x0 |θ) f (xt |xt−1 , θ) t=1 [Homogeneity means f independent of t]
5. 5. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Stationarity constraints Diﬀerence with the independent case: stationarity and causality constraints put restrictions on the parameter space
6. 6. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Stationarity processes Deﬁnition (Stationary stochastic process) (xt )t∈T is stationary if the joint distributions of (x1 , . . . , xk ) and (x1+h , . . . , xk+h ) are the same for all h, k’s. It is second-order stationary if, given the autocovariance function γx (r, s) = E[{xr − E(xr )}{xs − E(xs )}], r, s ∈ T , then E(xt ) = µ and γx (r, s) = γx (r + t, s + t) ≡ γx (r − s) for all r, s, t ∈ T .
7. 7. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Imposing or not imposing stationarity Bayesian inference on a non-stationary process can be [formaly] conducted Debate From a Bayesian point of view, to impose the stationarity condition is objectionable:stationarity requirement on ﬁnite datasets artiﬁcial and/or datasets themselves should indicate whether the model is stationary Reasons for imposing stationarity:asymptotics (Bayes estimators are not necessarily convergent in non-stationary settings) causality, identiﬁability and ... common practice.
8. 8. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Dependent data Unknown stationarity constraints Practical diﬃculty: for complex models, stationarity constraints get quite involved to the point of being unknown in some cases
9. 9. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model The AR(1) model Case of linear Markovian dependence on the last value i.i.d. xt = µ + ̺(xt−1 − µ) + ǫt , ǫt ∼ N (0, σ 2 ) If |̺| < 1, (xt )t∈Z can be written as ∞ xt = µ + ̺j ǫt−j j=0 and this is a stationary representation.
10. 10. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Stationary but... If |̺| > 1, alternative stationary representation ∞ xt = µ − ̺−j ǫt+j . j=1 This stationary solution is criticized as artiﬁcial because xt is correlated with future white noises (ǫt )s>t , unlike the case when |̺| < 1. Non-causal representation...
11. 11. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Standard constraint c Customary to restrict AR(1) processes to the case |̺| < 1 Thus use of a uniform prior on [−1, 1] for ̺ Exclusion of the case |̺| = 1 that leads to a random walk because the process is then a random walk [no stationary solution]
12. 12. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model The AR(p) model Conditional model p xt |xt−1 , . . . ∼ N µ+ ̺i (xt−i − µ), σ 2 i=1 Generalisation of AR(1) Among the most commonly used models in dynamic settings More challenging than the static models (stationarity constraints) Diﬀerent models depending on the processing of the starting value x0
13. 13. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Stationarity+causality Stationarity constraints in the prior as a restriction on the values of θ. Theorem AR(p) model second-order stationary and causal iﬀ the roots of the polynomial p P(x) = 1 − ̺i xi i=1 are all outside the unit circle
14. 14. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Initial conditions Unobserved initial values can be processed in various ways 1 All x−i ’s (i > 0) set equal to µ, for computational convenience 2 Under stationarity and causality constraints, (xt )t∈Z has a stationary distribution: Assume x−p:−1 distributed from stationary Np (µ1p , A) distribution Corresponding marginal likelihood Z T ( p !2 ) −T Y −1 X σ exp xt − µ − ̺i (xt−i − µ) t=0 2σ 2 i=1 f (x−p:−1 |µ, A) dx−p:−1 ,
15. 15. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Initial conditions (cont’d) 3 Condition instead on the initial observed values x0:(p−1) ℓc (µ, ̺1 , . . . , ̺p , σ|xp:T , x0:(p−1) ) ∝   T  p 2  −T σ exp − xt − µ − ̺i (xt−i − µ) 2σ 2 .   t=p i=1
16. 16. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Prior selection For AR(1) model, Jeﬀreys’ prior associated with the stationary representation is 1 1 π1 (µ, σ 2 , ̺) ∝ J . σ2 1 − ̺2 Extension to higher orders quite complicated (̺ part)! Natural conjugate prior for θ = (µ, ̺1 , . . . , ̺p , σ 2 ) : normal distribution on (µ, ̺1 , . . . , ̺p ) and inverse gamma distribution on σ 2 ... and for constrained ̺’s?
17. 17. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Stationarity constraints Under stationarity constraints, complex parameter space: each value of ̺ needs to be checked for roots of corresponding polynomial with modulus less than 1 E.g., for an AR(2) process with autoregressive polynomial P(u) = 1 − ̺1 u − ̺2 u2 , constraint is ̺1 + ̺2 < 1, ̺1 − ̺2 < 1 and |̺2 | < 1 . Skip Durbin forward
18. 18. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model A ﬁrst useful reparameterisation Durbin–Levinson recursion proposes a reparametrisation from the parameters ̺i to the partial autocorrelations ψi ∈ [−1, 1] which allow for a uniform prior on the hypercube. Partial autocorrelation deﬁned as ψi = corr (xt − E[xt |xt+1 , . . . , xt+i−1 ], xt+i − E[xt+1 |xt+1 , . . . , xt+i−1 ]) [see also Yule-Walker equations]
19. 19. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Durbin–Levinson recursion Transform 1 Deﬁne ϕii = ψi and ϕij = ϕ(i−1)j − ψi ϕ(i−1)(i−j) , for i > 1 and j = 1, · · · , i − 1 . 2 Take ̺i = ϕpi for i = 1, · · · , p.
20. 20. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Stationarity & priors For AR(1) model, Jeﬀreys’ prior associated with the stationary representation is 1 1 π1 (µ, σ 2 , ̺) ∝ J . σ2 1 − ̺2 Within the non-stationary region |̺| > 1, Jeﬀreys’ prior is 1 1 1 − ̺2T π2 (µ, σ 2 , ̺) ∝ J 1− . σ2 |1 − ̺2 | T (1 − ̺2 ) The dominant part of the prior is the non-stationary region!
21. 21. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Alternative prior J The reference prior π1 is only deﬁned when the stationary constraint holds. Idea Symmetrise to the region |̺| > 1 1 1/ 1 − ̺2 if |̺| < 1, π B (µ, σ 2 , ̺) ∝ , σ2 1/|̺| ̺2 − 1 if |̺| > 1, 7 6 5 4 pi 3 2 1 0 −3 −2 −1 0 1 2 3 x
22. 22. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model MCMC consequences When devising an MCMC algorithm, use the Durbin-Levinson recursion to end up with single normal simulations of the ψi ’s since the ̺j ’s are linear functions of the ψi ’s
23. 23. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Root parameterisation Skip Durbin back Lag polynomial representation p Id − ̺i B i xt = ǫt i=1 with (inverse) roots p (Id − λi B) xt = ǫt i=1 Closed form expression of the likelihood as a function of the (inverse) roots
24. 24. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Uniform prior under stationarity Stationarity The λi ’s are within the unit circle if in C [complex numbers] and within [−1, 1] if in R [real numbers] Naturally associated with a ﬂat prior on either the unit circle or [−1, 1] 1 1 1 I|λi |<1 I ⌊k/2⌋ + 1 2 π |λi |<1 λi ∈R λi ∈R where ⌊k/2⌋ + 1 number of possible cases Term ⌊k/2⌋ + 1 is important for reversible jump applications
25. 25. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model MCMC consequences In a Gibbs sampler, each λi∗ can be simulated conditionaly on the others since p (Id − λi B) xt = yt − λi∗ yt−1 = ǫt i=1 where Yt = (Id − λi B) xt i=i∗
26. 26. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Metropolis-Hastings implementation 1 use the prior π itself as a proposal on the (inverse) roots of P, selecting one or several roots of P to be simulated from π; 2 acceptance ratio is likelihood ratio 3 need to watch out for real/complex dichotomy
27. 27. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model A [paradoxical] reversible jump implementation Deﬁne “model” M2k (0 ≤ k ≤ ⌊p/2⌋) as corresponding to a number 2k of complex roots o ≤ k ≤ ⌊p/2⌋) Moving from model M2k to model M2k+2 means that two real roots have been replaced by two conjugate complex roots. Propose jump from M2k to M2k+2 with probability 1/2 and from M2k to M2k−2 with probability 1/2 [boundary exceptions] accept move from M2k to M2k+ or −2 with probability ℓc (µ, ̺⋆ , . . . , ̺⋆ , σ|xp:T , x0:(p−1) ) 1 p ∧ 1, ℓc (µ, ̺1 , . . . , ̺p , σ|xp:T , x0:(p−1) )
28. 28. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Checking your code Try with no data and recover the prior 0.15 0.10 0.05 0.00 0 5 10 15 20 k
29. 29. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Checking your code Try with no data and recover the prior 1.0 0.5 0.0 λ2 −0.5 −1.0 −1.0 −0.5 0.0 0.5 1.0 λ1
30. 30. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Order estimation Typical setting for model choice: determine order p of AR(p) model Roots [may] change drastically from one p to the other. No diﬃculty from the previous perspective: recycle above reversible jump algorithm
31. 31. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model AR(?) reversible jump algorithm Use (purely birth-and-death) proposals based on the uniform prior k → k+1 [Creation of real root] k → k+2 [Creation of complex root] k → k-1 [Deletion of real root] k → k-2 [Deletion of complex root]
32. 32. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The AR(p) model Reversible jump output 0.0 0.1 0.2 0.3 0.4 0.5 4 5 4 2 3 xt 0 2 −2 1 0 0 100 200 300 400 500 0 5 10 15 20 −0.4 −0.2 0.0 0.2 0.4 t p αi p=3 p=4 p=5 0 1 2 3 4 5 6 7 5 0 2 4 6 8 10 4 3 2 1 0 −0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.2 0.4 αi αi αi p=3 means mean 0.997 −0.0817 0.256 −0.412 0.0 0.2 0.4 0.6 8 6 0.2 Im(λi) α 4 −0.2 2 −0.4 −0.6 0 0.9 1.0 1.1 1.2 0 1000 2000 3000 4000 5000 −0.1 0.0 0.1 0.2 0.3 0.4 0.5 σ2 Iterations Re(λi) AR(3) simulated dataset of 530 points (upper left) with true parameters αi (−0.1, 0.3, −0.4) and σ = 1. First histogram associated with p, following histograms with the αi ’s, for diﬀerent values of p, and of σ 2 . Final graph: scatterplot of the complex roots. One before last: evolution of α1 , α2 , α3 .
33. 33. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model The MA(q) model Alternative type of time series q xt = µ + ǫt − ϑj ǫt−j , ǫt ∼ N (0, σ 2 ) j=1 Stationary but, for identiﬁability considerations, the polynomial q Q(x) = 1 − ϑj xj j=1 must have all its roots outside the unit circle.
34. 34. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Identiﬁability Example For the MA(1) model, xt = µ + ǫt − ϑ1 ǫt−1 , var(xt ) = (1 + ϑ2 )σ 2 1 can also be written 1 xt = µ + ǫt−1 − ˜ ǫ ˜t , ˜ ∼ N (0, ϑ2 σ 2 ) , ǫ 1 ϑ1 Both pairs (ϑ1 , σ) & (1/ϑ1 , ϑ1 σ) lead to alternative representations of the same model.
35. 35. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Properties of MA models Non-Markovian model (but special case of hidden Markov) Autocovariance γx (s) is null for |s| > q
36. 36. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Representations x1:T is a normal random variable with constant mean µ and covariance matrix   σ2 γ1 γ2 ... γq 0 ... 0 0  γ1 σ2 γ1 . . . γq−1 γq ... 0 0   Σ= .. ,  .  2 0 0 0 ... 0 0 . . . γ1 σ with (|s| ≤ q) q−|s| 2 γs = σ ϑi ϑi+|s| i=0 Not manageable in practice [large T’s]
37. 37. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Representations (contd.) Conditional on past (ǫ0 , . . . , ǫ−q+1 ), L(µ, ϑ1 , . . . , ϑq , σ|x1:T , ǫ0 , . . . , ǫ−q+1 ) ∝   2  T   q   σ −T exp − xt − µ + ϑj ˆt−j ǫ  2σ 2 ,     t=1 j=1 where (t > 0) q ǫ t = xt − µ + ˆ ϑj ǫt−j , ˆ0 = ǫ0 , . . . , ǫ1−q = ǫ1−q ˆ ǫ ˆ j=1 Recursive deﬁnition of the likelihood, still costly O(T × q)
38. 38. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Recycling the AR algorithm Same algorithm as in the AR(p) case when modifying the likelihood Simulation of the past noises ǫ−i (i = 1, . . . , q) done via a Metropolis-Hastings step with target 0 T −ǫ2 /2σ2 2 2 f (ǫ0 , . . . , ǫ−q+1 |x1:T , µ, σ, ϑ) ∝ e i e−bt /2σ , ǫ i=−q+1 t=1
39. 39. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Representations (contd.) Encompassing approach for general time series models State-space representation xt = Gyt + εt , (1) yt+1 = F yt + ξt , (2) (1) is the observation equation and (2) is the state equation Note As seen below, this is a special case of hidden Markov model
40. 40. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model MA(q) state-space representation For the MA(q) model, take yt = (ǫt−q , . . . , ǫt−1 , ǫt )′ and then     0 1 0 ... 0 0 0  0  0 1 ... 0   yt+1 =  ...  yt + ǫt+1  .  .  0   . 0 0 ... 1 0 0 0 0 ... 0 1 xt = µ − ϑq ϑq−1 . . . ϑ1 −1 yt .
41. 41. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model MA(q) state-space representation (cont’d) Example For the MA(1) model, observation equation xt = (1 0)yt with yt = (y1t y2t )′ directed by the state equation 0 1 1 yt+1 = yt + ǫt+1 . 0 0 ϑ1
42. 42. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model ARMA extension ARMA(p, q) model p q xt − ̺i xt−1 = µ + ǫt − ϑj ǫt−j , ǫt ∼ N (0, σ 2 ) i=1 j=1 Identical stationarity and identiﬁability conditions for both groups (̺1 , . . . , ̺p ) and (ϑ1 , . . . , ϑq )
43. 43. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Reparameterisation Identical root representations p q (Id − λi B)xt = (Id − ηi B)ǫt i=1 i=1 State-space representation xt = xt = µ − ϑr−1 ϑr−2 . . . ϑ1 −1 yt and 0 1 0 1 0 0 1 0 ... 0 B0 B0C B 0 1 ... 0C C B C yt+1 =B ... C yt + ǫt+1 B . C , B.C B C B.C @0 0 0 ... 1A @0A ̺r ̺r−1 ̺r−2 ... ̺1 1 under the convention that ̺m = 0 if m > p and ϑm = 0 if m > q.
44. 44. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models The MA(q) model Bayesian approximation Quasi-identical MCMC implementation: 1 Simulate (̺1 , . . . , ̺p ) conditional on (ϑ1 , . . . , ϑq ) and µ 2 Simulate (ϑ1 , . . . , ϑq ) conditional on (̺1 , . . . , ̺p ) and µ 3 Simulate (µ, σ) conditional on (̺1 , . . . , ̺p ) and (ϑ1 , . . . , ϑq ) c Code can be recycled almost as is!
45. 45. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Hidden Markov models Generalisation both of a mixture and of a state space model. Example Extension of a mixture model with Markov dependence 2 xt |z, xj j = t ∼ N (µzt , σzt ), P (zt = u|zj , j < t) = pzt−1u , (u = 1, . . . , k) Label switching also strikes in this model!
46. 46. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Generic dependence graph xt xt+1 6 6 ··· - - ··· yt - yt+1 (xt , yt )|x0:(t−1) , y0:(t−1) ∼ f (yt |yt−1 ) f (xt |yt )
47. 47. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Deﬁnition Observable series {xt }t≥1 associated with a second process {yt }t≥1 , with a ﬁnite set of N possible values such that 1. indicators Yt have an homogeneous Markov dynamic p(yt |y1:t−1 ) = p(yt |yt−1 ) = Pyt−1 yt where y1:t−1 denotes the sequence {y1 , y2 , . . . , yt−1 }. 2. Observables xt are independent conditionally on the indicators yt T p(x1:T |y1:T ) = p(xt |yt ) t=1
48. 48. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Dnadataset DNA sequence [made of A, C, G, and T’s] corresponding to a complete HIV genome where A, C, G, and T have been recoded as 1, ..., 4. Possible modeling by a two-state hidden Markov model with Y = {1, 2} and X = {1, 2, 3, 4}
49. 49. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Parameterization For the Markov bit, transition matrix N P = [pij ] where pij = 1 j=1 and initial distribution ̺ = ̺P for the observables, fi (xt ) = p(xt |yt = i) = f (xt |θi ) usually within the same parametrized class of distributions.
50. 50. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Finite case When both hidden and observed chains are ﬁnite, with Y = {1, . . . , κ} and X = {1, . . . , k}, parameter θ made up of p probability vectors q1 = (q1 , . . . , qk ), . . . , qκ = (q1 , . . . , qk ) 1 1 κ κ Joint distribution of (xt , yt )0≤t≤T T y y ̺y0 qx00 pyt−1 yt qxt , t t=1
51. 51. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Bayesian inference in the ﬁnite case Posterior of (θ, P) given (xt , yt )t factorizes as κ k κ p m π(θ, P) ̺y0 (qj )nij i × pij ij , i=1 j=1 i=1 j=1 where nij # of visits to state j by the xt ’s when the corresponding yt ’s are equal to i and mij # of transitions from state i to state j on the hidden chain (yt )t∈N i Under a ﬂat prior on pij ’s and qj ’s, posterior distributions are [almost] Dirichlet [initial distribution side eﬀect]
52. 52. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models MCMC implementation Finite State HMM Gibbs Sampler Initialization: i 1 Generate random values of the pij ’s and of the qj ’s 2 Generate the hidden Markov chain (yt )0≤t≤T by (i = 1, 2) i pii qx0 if t = 0 , P(yt = i) ∝ i pyt−1 i qxt if t 0 , and compute the corresponding suﬃcient statistics
53. 53. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models MCMC implementation (cont’d) Finite State HMM Gibbs Sampler Iteration m (m ≥ 1): 1 Generate (pi1 , . . . , piκ ) ∼ D(1 + ni1 , . . . , 1 + niκ ) i i (q1 , . . . , qk ) ∼ D(1 + mi1 , . . . , 1 + mik ) and correct for missing initial probability by a MH step with acceptance probability ̺′ 0 /̺y0 y 2 Generate successively each yt (0 ≤ t ≤ T ) by i pii qx1 piy1 if t = 0 , P(yt = i|xt , yt−1 , yt+1 ) ∝ i pyt−1 i qxt piyt+1 if t 0 , and compute corresponding suﬃcient statistics
54. 54. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Dnadataset 0.9 0.85 0.8 0.80 0.7 0.75 0.70 0.6 0.65 0.5 0 200 400 600 800 1000 0 200 400 600 800 1000 p11 p22 0.00 0.05 0.10 0.15 0.20 0.25 0.40 0.35 0.30 0 200 400 600 800 1000 0 200 400 600 800 1000 1 q1 1 q2 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.40 0.35 0.30 0.25 0 200 400 600 800 1000 0 200 400 600 800 1000 q1 3 q2 1 0.5 0.25 0.20 0.4 0.15 0.3 0.10 0.2 0.05 0.00 0.1 0 200 400 600 800 1000 0 200 400 600 800 1000 2 2 q3 q2
55. 55. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Forward-Backward formulae Existence of a (magical) recurrence relation that provides the observed likelihood function in manageable computing time Called forward-backward or Baum–Welch formulas
56. 56. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Observed likelihood computation Likelihood of the complete model simple: T ℓc (θ|x, y) = pyt−1 yt f (xt |θyt ) t=2 but likelihood of the observed model is not: ℓ(θ|x) = ℓc (θ|x, y) y∈{1,...,κ}T c O(κT ) complexity
57. 57. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Forward-Backward paradox It is possible to express the (observed) likelihood LO (θ|x) in O(T 2 × κ) computations, based on the Markov property of the pair (xt , yt ). Direct to backward smoothing
58. 58. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Conditional distributions We have f (xt |yt ) p(y1:t |x1:(t−1) ) p(y1:t |x1:t ) = p(xt |x1:(t−1) ) [Smoothing/Bayes] and p(y1:t |x1:(t−1) ) = k(yt |yt−1 )p(y1:(t−1) |x1:(t−1) ) [Prediction] where k(yt |yt−1 ) = pyt−1 yt associated with the matrix P and f (xt |yt ) = f (xt |θyt )
59. 59. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Update of predictive Therefore p(yt |x1:(t−1) ) f (xt |yt ) p(y1:t |x1:t ) = p(xt |x1:(t−1) ) f (xt |yt ) k(yt |yt−1 ) = p(y1:(t−1) |x1:(t−1) ) p(xt |x1:(t−1) ) with the same order of complexity for p(y1:t |x1:t ) as for p(xt |x1:(t−1) )
60. 60. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Propagation and actualization equations p(yt |x1:(t−1) ) = p(y1:(t−1) |x1:(t−1) ) k(yt |yt−1 ) y1:(t−1) [Propagation] and p(yt |x1:(t−1) ) f (xt |yt ) p(yt |x1:t ) = . p(xt |x1:(t−1) ) [Actualization]
61. 61. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Forward–backward equations (1) Evaluation of p(yt |x1:T ) t ≤ T by forward-backward algorithm Denote t ≤ T γt (i) = P (yt = i|x1:T ) αt (i) = p(x1:t , yt = i) βt (i) = p(xt+1:T |yt = i)
62. 62. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Recurrence relations Then   α1 (i)  = f (x1 |yt = i)̺i κ  αt+1 (j)  = f (xt+1 |yt+1 = j) αt (i)pij i=1 [Forward]   βT (i) =  1 κ  βt (i)  = pij f (xt+1 |yt+1 = j)βt+1 (j) j=1 [Backward] and αt (i)βt (i) γt (i) = κ αt (j)βt (j) j=1
63. 63. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Extension of the recurrence relations For ξt (i, j) = P (yt = i, yt+1 = j|x1:T ) i, j = 1, . . . , κ, we also have αt (i)Pij f (xt+1 |yt = j)βt+1 (j) ξt (i, j) = κ κ αt (i)Pij f (xt+1 |yt+1 = j)βt+1 (j) i=1 j=1
64. 64. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Overﬂows and underﬂows On-line scalings of the αt (i)’s and βT (i)’s for each t by κ κ ct = 1 αt (i) and dt = 1 βt (i) i=1 i=1 avoid overﬂows or/and underﬂows for large datasets
65. 65. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Backward smoothing Recursive derivation of conditionals We have p(ys |ys−1 , x1:t ) = p(ys |ys−1 , xs:t ) [Markov property!] Therefore (s = T, T − 1, . . . , 1) p(ys |ys−1 , x1:T ) ∝ k(ys |ys−1 ) f (xs |ys ) p(ys+1 |ys , x1:T ) ys+1 [Backward equation] with p(yT |yT −1 , x1:T ) ∝ k(yT |yT −1 )f (xT |yt ) .
66. 66. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models End of the backward smoothing The ﬁrst term is p(y1 |x1:t ) ∝ π(y1 ) f (x1 |y1 ) p(y2 |y1 , x1:t ) , y2 with π stationary distribution of P The conditional for ys needs to be deﬁned for each of the κ values of ys−1 c O(t × κ2 ) operations
67. 67. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Details Need to introduce unnormalized version of the conditionals p(yt |yt−1 , x0:T ) such that p⋆ (yT |yT −1 , x0:T ) = pyT −1 yT f (xT |yT ) T κ p⋆ (yt |yt−1 , x1:T ) t = pyt−1 yt f (xt |yt ) p⋆ (i|yt , x1:T ) t+1 i=1 κ p⋆ (y0 |x0:T ) = ̺y0 f (x0 |y0 ) 0 p⋆ (i|y0 , x0:t ) 1 i=1
68. 68. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Likelihood computation Bayes formula p(x1:T |y1:T )p(y1:T ) p(x1:T ) = p(y1:T |x1:T ) gives a representation of the likelihood based on the forward–backward formulae and an arbitrary sequence xo (since 1:T the l.h.s. does not depend on x1:T ). Obtained through the p⋆ ’s as t κ p(x0:T ) = p⋆ (i|x0:T ) 1 i=1
69. 69. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Prediction ﬁlter If ϕt (i) = p(yt = i|x1:t−1 ) Forward equations ϕ1 (j) = p(y1 = j) κ 1 ϕt+1 (j) = f (xt |yt = i)ϕt (i)pij (t ≥ 1) ct i=1 where κ ct = f (xt |yt = k)ϕt (k) , k=1
70. 70. Bayesian Core:A Practical Approach to Computational Bayesian Statistics Dynamic models Hidden Markov models Likelihood computation (2) Follows the same principle as the backward equations The (log-)likelihood is thus t κ log p(x1:t ) = log p(xt , yt = i|x1:(r−1) ) r=1 i=1 t κ = log f (xt |yt = i)ϕt (i) r=1 i=1