# Monash University short course, part I

First part of the slides for a short course at Monash University, EBS, on July 19, 2012.


1. MCMC and likelihood-free methods. Part/day I: Markov chain methods. Christian P. Robert, Université Paris-Dauphine, IUF, & CREST. Monash University, EBS, July 18, 2012.
2. Motivations and leading example. Outline: Computational issues in Bayesian statistics; The Metropolis-Hastings Algorithm; The Gibbs Sampler; Population Monte Carlo.
4. What is Bayesian statistics? A statistical model is defined by a likelihood function $f(x_1, \dots, x_n\mid\theta) = L(\theta\mid x_1, \dots, x_n)$ [inversion of what varies]. The Bayesian approach turns the likelihood into a conditional density
   $$\pi(\theta\mid x_1, \dots, x_n) \propto \pi(\theta)\,L(\theta\mid x_1, \dots, x_n)$$
   using a reference measure (or a prior) $\pi(\theta)$ [Thomas Bayes, 1701–1761].
7. New perspective. Uncertainty on the parameters $\theta$ of a model is modeled through a probability distribution $\pi$ on $\Theta$, called the prior distribution. Inference is processed through the distribution of $\theta$ conditional on $x$, $\pi(\theta\mid x)$, called the posterior distribution:
   $$\pi(\theta\mid x) = \frac{f(x\mid\theta)\,\pi(\theta)}{\int f(x\mid\theta)\,\pi(\theta)\,\mathrm{d}\theta}.$$
8. Justifications:
   - semantic drift from unknown to random;
   - actualization (updating) of the information on $\theta$ by extracting the information on $\theta$ contained in the observation $x$;
   - allows incorporation of imperfect information in the decision process;
   - unique mathematical way to condition upon the observations (conditional perspective);
   - penalization factor.
13. Posterior distribution. $\pi(\theta\mid x)$ is central to Bayesian inference:
    - operates conditional upon the observations;
    - incorporates the requirement of the Likelihood Principle;
    - avoids averaging over the unobserved values of $x$;
    - coherent updating of the information available on $\theta$, independent of the order in which i.i.d. observations are collected;
    - provides a complete inferential scope.
16. Latent structures make life harder! Even simple models may lead to computational complications, as in latent variable models
    $$f(x\mid\theta) = \int f(x, x^\star\mid\theta)\,\mathrm{d}x^\star.$$
    If $(x, x^\star)$ is observed, fine! If only $x$ is observed, trouble!
19. Example: mixture models. Models of mixtures of distributions: $X \sim f_j$ with probability $p_j$, for $j = 1, 2, \dots, k$, with overall density
    $$X \sim p_1 f_1(x) + \cdots + p_k f_k(x).$$
    For a sample of independent random variables $(X_1, \dots, X_n)$, the sample density is
    $$\prod_{i=1}^n \{p_1 f_1(x_i) + \cdots + p_k f_k(x_i)\}.$$
    Expanding this product of sums into a sum of products involves $k^n$ elementary terms: too prohibitive to compute in large samples.
20. Simple mixture (1). [Figure: likelihood surface over $(\mu_1, \mu_2)$.] Case of the $0.3\,\mathcal{N}(\mu_1, 1) + 0.7\,\mathcal{N}(\mu_2, 1)$ likelihood.
21. Simple mixture (2). For the mixture of two normal distributions $0.3\,\mathcal{N}(\mu_1, 1) + 0.7\,\mathcal{N}(\mu_2, 1)$, the likelihood is proportional to
    $$\prod_{i=1}^n \left[0.3\,\varphi(x_i - \mu_1) + 0.7\,\varphi(x_i - \mu_2)\right],$$
    where $\varphi$ is the standard normal density, and contains $2^n$ terms.
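The $2^n$ count can be checked numerically. The sketch below, assuming unit-variance normal components, the 0.3/0.7 weights of the slide, hypothetical means and six hypothetical observations, evaluates the likelihood once as a product of $n$ two-term sums (cheap) and once as the expanded sum over all allocations of the observations to the components (exponential in $n$). The expansion is what posterior quantities such as $\int \theta\,\pi(\theta\mid x)\,\mathrm{d}\theta$ effectively require.

```python
import itertools
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def likelihood_product(xs, mu1, mu2, p=0.3):
    """prod_i [p phi(x_i - mu1) + (1 - p) phi(x_i - mu2)]: n factors with 2 terms each."""
    lik = 1.0
    for x in xs:
        lik *= p * phi(x - mu1) + (1.0 - p) * phi(x - mu2)
    return lik


def likelihood_expanded(xs, mu1, mu2, p=0.3):
    """Same likelihood written as a sum over all 2**n allocations of the x_i's to the components."""
    total, n_terms = 0.0, 0
    for alloc in itertools.product((0, 1), repeat=len(xs)):
        term = 1.0
        for x, z in zip(xs, alloc):
            term *= p * phi(x - mu1) if z == 0 else (1.0 - p) * phi(x - mu2)
        total += term
        n_terms += 1
    return total, n_terms


xs = [0.2, 1.5, -0.7, 2.1, 0.9, 1.1]        # hypothetical observations
direct = likelihood_product(xs, mu1=0.0, mu2=1.5)
expanded, n_terms = likelihood_expanded(xs, mu1=0.0, mu2=1.5)
print(direct, expanded, n_terms)             # values agree up to rounding; n_terms == 2**6
```

Doubling the sample size squares the number of terms, which is why the expanded form is hopeless beyond a few dozen observations.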
23. Complex maximisation. Standard maximization techniques often fail to find the global maximum because of multimodality or undesirable behavior (usually at the frontier of the domain) of the likelihood function. Example: in the special case
    $$f(x\mid\mu, \sigma) = (1 - \epsilon)\exp\{-x^2/2\} + \frac{\epsilon}{\sigma}\exp\{-(x - \mu)^2/(2\sigma^2)\}$$
    with $\epsilon > 0$ known, whatever $n$, the likelihood is unbounded:
    $$\lim_{\sigma\to 0} L(x_1, \dots, x_n\mid\mu = x_1, \sigma) = \infty.$$
24. Unbounded likelihood. [Figure: likelihood surfaces over $(\mu, \sigma)$ for $n = 3, 6, 12, 24, 48, 96$.] Case of the $0.3\,\mathcal{N}(0, 1) + 0.7\,\mathcal{N}(\mu, \sigma)$ likelihood.
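The unboundedness can be seen directly by pinning $\mu$ at the first observation and shrinking $\sigma$. In this sketch $\epsilon = 0.1$ and the five data points are hypothetical choices, and the density is taken exactly as written on the slide (no $\sqrt{2\pi}$ constant). The first factor grows like $\epsilon/\sigma$, while every other factor only loses its small second term, so the log-likelihood diverges.

```python
import math


def log_lik(xs, mu, sigma, eps=0.1):
    """Log-likelihood for (1 - eps) exp(-x^2/2) + (eps/sigma) exp(-(x - mu)^2 / (2 sigma^2))."""
    total = 0.0
    for x in xs:
        dens = (1.0 - eps) * math.exp(-0.5 * x * x) + (eps / sigma) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)
        total += math.log(dens)
    return total


xs = [0.3, -1.2, 2.5, 0.8, -0.4]             # hypothetical sample
sigmas = [1.0, 1e-1, 1e-2, 1e-4, 1e-8]
values = [log_lik(xs, mu=xs[0], sigma=s) for s in sigmas]   # mu pinned at the first observation
for s, v in zip(sigmas, values):
    print(f"sigma={s:g}  log-likelihood={v:.3f}")
```

A numerical optimizer started near such a point will happily chase $\sigma \to 0$, which is the "undesirable behavior at the frontier" mentioned above.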
27. Mixture once again. Observations from
    $$x_1, \dots, x_n \sim f(x\mid\theta) = p\,\varphi(x; \mu_1, \sigma_1) + (1 - p)\,\varphi(x; \mu_2, \sigma_2).$$
    Prior:
    $$\mu_i\mid\sigma_i \sim \mathcal{N}(\xi_i, \sigma_i^2/n_i), \qquad \sigma_i^2 \sim \mathcal{IG}(\nu_i/2, s_i^2/2), \qquad p \sim \mathcal{B}e(\alpha, \beta).$$
    Posterior:
    $$\pi(\theta\mid x_1, \dots, x_n) \propto \prod_{j=1}^n \{p\,\varphi(x_j; \mu_1, \sigma_1) + (1 - p)\,\varphi(x_j; \mu_2, \sigma_2)\}\,\pi(\theta) = \sum_{\ell=0}^n \sum_{(k_t)} \omega(k_t)\,\pi(\theta\mid(k_t)) \qquad [O(2^n)],$$
    where the inner sum runs over the allocations $(k_t)$ of $\ell$ observations to the first component.
28. Mixture once again (cont'd). For a given permutation $(k_t)$, the conditional posterior distribution is
    $$\pi(\theta\mid(k_t)) = \mathcal{N}\!\left(\xi_1(k_t), \frac{\sigma_1^2}{n_1 + \ell}\right) \times \mathcal{IG}\big((\nu_1 + \ell)/2, s_1(k_t)/2\big) \times \mathcal{N}\!\left(\xi_2(k_t), \frac{\sigma_2^2}{n_2 + n - \ell}\right) \times \mathcal{IG}\big((\nu_2 + n - \ell)/2, s_2(k_t)/2\big) \times \mathcal{B}e(\alpha + \ell, \beta + n - \ell).$$
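29. Mixture once again (cont'd), where, with $x_{k_1}, \dots, x_{k_\ell}$ allocated to the first component and $x_{k_{\ell+1}}, \dots, x_{k_n}$ to the second,
    $$\bar x_1(k_t) = \frac{1}{\ell}\sum_{t=1}^{\ell} x_{k_t}, \qquad \hat s_1^2(k_t) = \sum_{t=1}^{\ell} \big(x_{k_t} - \bar x_1(k_t)\big)^2,$$
    $$\bar x_2(k_t) = \frac{1}{n - \ell}\sum_{t=\ell+1}^{n} x_{k_t}, \qquad \hat s_2^2(k_t) = \sum_{t=\ell+1}^{n} \big(x_{k_t} - \bar x_2(k_t)\big)^2,$$
    and
    $$\xi_1(k_t) = \frac{n_1\xi_1 + \ell\,\bar x_1(k_t)}{n_1 + \ell}, \qquad \xi_2(k_t) = \frac{n_2\xi_2 + (n - \ell)\,\bar x_2(k_t)}{n_2 + n - \ell},$$
    $$s_1(k_t) = s_1^2 + \hat s_1^2(k_t) + \frac{n_1\ell}{n_1 + \ell}\big(\xi_1 - \bar x_1(k_t)\big)^2,$$
    $$s_2(k_t) = s_2^2 + \hat s_2^2(k_t) + \frac{n_2(n - \ell)}{n_2 + n - \ell}\big(\xi_2 - \bar x_2(k_t)\big)^2,$$
    the posterior updates of the hyperparameters.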
30. Mixture once again. Bayes estimator of $\theta$:
    $$\delta^\pi(x_1, \dots, x_n) = \sum_{\ell=0}^n \sum_{(k_t)} \omega(k_t)\,\mathbb{E}^\pi[\theta\mid x, (k_t)].$$
    Too costly: $2^n$ terms.
32. AR(p) model. Auto-regressive representation of a time series,
    $$x_t\mid x_{t-1}, \dots \sim \mathcal{N}\Big(\mu + \sum_{i=1}^p \varrho_i(x_{t-i} - \mu), \sigma^2\Big).$$
    - Generalisation of AR(1)
    - Among the most commonly used models in dynamic settings
    - More challenging than the static models (stationarity constraints)
    - Different models depending on the processing of the starting value $x_0$
33. Unwieldy stationarity constraints. Practical difficulty: for complex models, stationarity constraints get quite involved, to the point of being unknown in some cases. Example (AR(1)): case of linear Markovian dependence on the last value,
    $$x_t = \mu + \varrho(x_{t-1} - \mu) + \epsilon_t, \qquad \epsilon_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).$$
    If $|\varrho| < 1$, $(x_t)_{t\in\mathbb{Z}}$ can be written as
    $$x_t = \mu + \sum_{j=0}^\infty \varrho^j \epsilon_{t-j},$$
    and this is a stationary representation.
35. Stationary but... If $|\varrho| > 1$, there is an alternative stationary representation
    $$x_t = \mu - \sum_{j=1}^\infty \varrho^{-j}\epsilon_{t+j}.$$
    This stationary solution is criticized as artificial because $x_t$ is correlated with the future white noises $(\epsilon_s)_{s>t}$, unlike the case $|\varrho| < 1$. Non-causal representation...
36. Stationarity + causality. Stationarity constraints enter the prior as a restriction on the values of $\theta$. Theorem: the AR(p) model is second-order stationary and causal iff the roots of the polynomial
    $$\mathcal{P}(x) = 1 - \sum_{i=1}^p \varrho_i x^i$$
    are all outside the unit circle.
38. Stationarity constraints. Under stationarity constraints, the parameter space is complex: each value of $\varrho$ needs to be checked for roots of the corresponding polynomial with modulus less than 1. E.g., for an AR(2) process with autoregressive polynomial $\mathcal{P}(u) = 1 - \varrho_1 u - \varrho_2 u^2$, the constraint is
    $$\varrho_1 + \varrho_2 < 1, \qquad \varrho_2 - \varrho_1 < 1, \qquad |\varrho_2| < 1.$$
    [Figure: the resulting triangular stationarity region, axes labelled $\theta_1 \in (-2, 2)$ and $\theta_2 \in (-1, 1)$.]
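The root condition of the theorem is easy to test numerically, and for $p = 2$ it can be cross-checked against the closed-form triangle. This is a sketch assuming NumPy is available; `np.roots` takes polynomial coefficients from the highest power down, hence the reversal. The random points are a hypothetical test set covering the plotted rectangle.

```python
import numpy as np


def is_causal_stationary(rho):
    """AR(p) check: all roots of P(x) = 1 - rho_1 x - ... - rho_p x^p lie outside the unit circle."""
    rho = np.asarray(rho, dtype=float)
    # np.roots expects coefficients ordered from the highest power down to the constant term
    coeffs = np.concatenate((-rho[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))


def ar2_triangle(rho1, rho2):
    """Closed-form AR(2) stationarity region."""
    return rho1 + rho2 < 1 and rho2 - rho1 < 1 and abs(rho2) < 1


rng = np.random.default_rng(0)
points = np.column_stack((rng.uniform(-2, 2, 5000), rng.uniform(-1, 1, 5000)))
agree = sum(is_causal_stationary(pt) == ar2_triangle(*pt) for pt in points)
print(agree, "of", len(points), "points agree")
```

For larger $p$ no such simple region is available, so a root check of this kind (or a reparametrisation) has to be run for every proposed value of $\varrho$.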
39. The MA(q) model. Alternative type of time series:
    $$x_t = \mu + \epsilon_t - \sum_{j=1}^q \vartheta_j \epsilon_{t-j}, \qquad \epsilon_t \sim \mathcal{N}(0, \sigma^2).$$
    Stationary but, for identifiability considerations, the polynomial
    $$\mathcal{Q}(x) = 1 - \sum_{j=1}^q \vartheta_j x^j$$
    must have all its roots outside the unit circle.
40. Identifiability. Example: the MA(1) model
    $$x_t = \mu + \epsilon_t - \vartheta_1\epsilon_{t-1}, \qquad \mathrm{var}(x_t) = (1 + \vartheta_1^2)\sigma^2,$$
    can also be written
    $$x_t = \mu + \tilde\epsilon_{t-1} - \frac{1}{\vartheta_1}\tilde\epsilon_t, \qquad \tilde\epsilon \sim \mathcal{N}(0, \vartheta_1^2\sigma^2).$$
    Both pairs $(\vartheta_1, \sigma)$ and $(1/\vartheta_1, \vartheta_1\sigma)$ lead to alternative representations of the same model.
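Since a stationary Gaussian process is determined by its mean and autocovariances, it is enough to check that both parametrisations produce the same $\gamma_0$ and $\gamma_1$ (all other lags are zero for an MA(1)). The parameter values below are hypothetical.

```python
def ma1_autocovariances(theta1, sigma):
    """(gamma_0, gamma_1) of x_t = mu + eps_t - theta1 * eps_{t-1}, eps_t ~ N(0, sigma^2)."""
    return (1.0 + theta1 ** 2) * sigma ** 2, -theta1 * sigma ** 2


theta1, sigma = 0.4, 1.3                                   # hypothetical parameter values
original = ma1_autocovariances(theta1, sigma)
flipped = ma1_autocovariances(1.0 / theta1, theta1 * sigma)
print(original, flipped)                                   # equal up to rounding
```

Only the version with $|\vartheta_1| < 1$ satisfies the root condition on $\mathcal{Q}$, which is how the constraint removes the ambiguity.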
41. Properties of MA models: non-Markovian model (but a special case of hidden Markov model); the autocovariance $\gamma_x(s)$ is null for $|s| > q$.
42. Representations. $x_{1:T}$ is a normal random variable with constant mean $\mu$ and banded covariance matrix
    $$\Sigma = \begin{pmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_q & 0 & \cdots & 0 & 0\\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{q-1} & \gamma_q & \cdots & 0 & 0\\ \vdots & & & \ddots & & & & & \vdots\\ 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \gamma_1 & \gamma_0 \end{pmatrix},$$
    with, for $|s| \le q$ and the convention $\vartheta_0 = -1$,
    $$\gamma_s = \sigma^2 \sum_{i=0}^{q-|s|} \vartheta_i\vartheta_{i+|s|}.$$
    Not manageable in practice [large $T$'s].
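A sketch of the exact Gaussian likelihood built from this covariance matrix, assuming NumPy and the convention $\vartheta_0 = -1$ (so that $x_t - \mu = -\sum_{j=0}^q \vartheta_j\epsilon_{t-j}$). The dense $T \times T$ solve and log-determinant cost $O(T^3)$, which is what makes this representation impractical for long series. Parameter values and data are hypothetical.

```python
import numpy as np


def ma_autocov(theta, sigma):
    """gamma_s for s = 0..q, with the convention theta_0 = -1."""
    th = np.concatenate(([-1.0], np.asarray(theta, dtype=float)))
    q = len(th) - 1
    return np.array([sigma ** 2 * np.dot(th[: q + 1 - s], th[s:]) for s in range(q + 1)])


def ma_covariance(theta, sigma, T):
    """Banded T x T Toeplitz covariance matrix of x_{1:T}."""
    gam = ma_autocov(theta, sigma)
    q = len(gam) - 1
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    cov = np.zeros((T, T))
    band = lags <= q
    cov[band] = gam[lags[band]]
    return cov


def ma_exact_loglik(x, mu, theta, sigma):
    """Exact Gaussian log-likelihood; O(T^3) work through the dense solve and determinant."""
    x = np.asarray(x, dtype=float)
    cov = ma_covariance(theta, sigma, len(x))
    _, logdet = np.linalg.slogdet(cov)
    r = x - mu
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(cov, r))


cov = ma_covariance([0.5, -0.2], sigma=1.0, T=6)
print(np.round(cov, 3))
print(ma_exact_loglik([0.1, -0.3, 0.8, 0.2, -0.5, 0.4], mu=0.0, theta=[0.5, -0.2], sigma=1.0))
```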
43. Representations (cont'd). Conditional on the past $(\epsilon_0, \dots, \epsilon_{-q+1})$,
    $$L(\mu, \vartheta_1, \dots, \vartheta_q, \sigma\mid x_{1:T}, \epsilon_0, \dots, \epsilon_{-q+1}) \propto \sigma^{-T}\exp\left\{-\sum_{t=1}^T \Big(x_t - \mu + \sum_{j=1}^q \vartheta_j\hat\epsilon_{t-j}\Big)^2 \Big/ 2\sigma^2\right\},$$
    where ($t > 0$)
    $$\hat\epsilon_t = x_t - \mu + \sum_{j=1}^q \vartheta_j\hat\epsilon_{t-j}, \qquad \hat\epsilon_0 = \epsilon_0, \dots, \hat\epsilon_{1-q} = \epsilon_{1-q}.$$
    Recursive definition of the likelihood, still costly: $O(T \times q)$.
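The recursion translates into a single pass over the data with $O(q)$ work per step. In this sketch (assuming NumPy; the MA(2) parameters and simulated series are hypothetical) the conditioning values are the true pre-sample innovations, so the reconstructed $\hat\epsilon_t$ should coincide with the simulated $\epsilon_t$, which gives a convenient correctness check.

```python
import numpy as np


def ma_conditional_loglik(x, mu, theta, sigma, eps_init):
    """Conditional log-likelihood (up to an additive constant) via the eps_hat recursion.

    eps_init = (eps_0, eps_{-1}, ..., eps_{1-q}) are the conditioning values."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    past = list(eps_init)                 # past[j - 1] holds eps_hat_{t-j}
    resid = np.empty(len(x))
    for t, xt in enumerate(x):            # O(q) work per step, O(T q) overall
        e = xt - mu + np.dot(theta, past)
        resid[t] = e
        past = [e] + past[: q - 1]
    loglik = -len(x) * np.log(sigma) - 0.5 * np.sum(resid ** 2) / sigma ** 2
    return loglik, resid


# Simulate an MA(2) series with hypothetical parameters, keeping the true innovations.
rng = np.random.default_rng(0)
mu, theta, sigma, T = 1.0, np.array([0.6, -0.3]), 0.5, 200
q = len(theta)
eps = rng.normal(0.0, sigma, size=T + q)      # eps[q - 1 + s] stores eps_s, for s = 1 - q, ..., T
x = np.array([mu + eps[q + t] - theta @ eps[q + t - np.arange(1, q + 1)] for t in range(T)])
eps_init = eps[q - np.arange(1, q + 1)]       # (eps_0, eps_{-1}, ..., eps_{1-q})
loglik, resid = ma_conditional_loglik(x, mu, theta, sigma, eps_init)
print(loglik, np.max(np.abs(resid - eps[q:])))   # residuals recover the simulated innovations
```

In practice the pre-sample innovations are unknown and are either set to zero or treated as extra parameters (e.g. simulated within an MCMC scheme).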
45. Representations (cont'd). Encompassing approach for general time series models: the state-space representation
    $$x_t = G y_t + \varepsilon_t, \qquad (1)$$
    $$y_{t+1} = F y_t + \xi_t, \qquad (2)$$
    where (1) is the observation equation and (2) is the state equation. Note: this is a special case of hidden Markov model.
46. MA(q) state-space representation. For the MA(q) model, take $y_t = (\epsilon_{t-q}, \dots, \epsilon_{t-1}, \epsilon_t)^\top$ and then
    $$y_{t+1} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix} y_t + \epsilon_{t+1}\begin{pmatrix} 0\\ 0\\ \vdots\\ 0\\ 1 \end{pmatrix},$$
    $$x_t = \mu - \begin{pmatrix} \vartheta_q & \vartheta_{q-1} & \cdots & \vartheta_1 & -1 \end{pmatrix} y_t.$$
47. MA(q) state-space representation (cont'd). Example: for the MA(1) model, the observation equation is
    $$x_t = \mu + (1\;\;0)\,y_t,$$
    with $y_t = (y_{1t}, y_{2t})^\top$ directed by the state equation
    $$y_{t+1} = \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix} y_t + \epsilon_{t+1}\begin{pmatrix} 1\\ -\vartheta_1 \end{pmatrix}.$$
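A numerical check of the general MA(q) state-space form, assuming NumPy: iterating the shift-matrix state equation and applying the observation row should reproduce the series computed directly from $x_t = \mu + \epsilon_t - \sum_j \vartheta_j\epsilon_{t-j}$. The MA(3) parameters and innovations are hypothetical.

```python
import numpy as np


def ma_state_space(theta):
    """State y_t = (eps_{t-q}, ..., eps_{t-1}, eps_t): returns F, innovation loading e, observation row g
    such that y_{t+1} = F y_t + e * eps_{t+1} and x_t = mu + g @ y_t."""
    theta = np.asarray(theta, dtype=float)
    q = len(theta)
    F = np.eye(q + 1, k=1)                         # ones on the superdiagonal: shift the state up one slot
    e = np.zeros(q + 1)
    e[-1] = 1.0                                     # the new innovation enters the last slot
    g = -np.concatenate((theta[::-1], [-1.0]))      # x_t = mu - (theta_q, ..., theta_1, -1) y_t
    return F, e, g


rng = np.random.default_rng(1)
mu, theta, T = 0.5, np.array([0.7, -0.2, 0.1]), 50    # hypothetical MA(3) parameters
q = len(theta)
eps = rng.standard_normal(T + q + 1)                # eps[q + s] stores eps_s, for s = -q, ..., T
F, e, g = ma_state_space(theta)

y = eps[: q + 1].copy()                             # y_0 = (eps_{-q}, ..., eps_0)
x_state = np.empty(T)
for t in range(1, T + 1):
    y = F @ y + e * eps[q + t]                      # state equation
    x_state[t - 1] = mu + g @ y                     # observation equation

x_direct = np.array([mu + eps[q + t] - theta @ eps[q + t - np.arange(1, q + 1)] for t in range(1, T + 1)])
print(np.max(np.abs(x_state - x_direct)))
```

Once in this form, the likelihood can be evaluated with a Kalman filter in $O(T)$ steps instead of the dense $O(T^3)$ computation.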
53. A typology of Bayes computational problems:
    - (i) latent variable models in general;
    - (ii) use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models;
    - (iii) use of a complex sampling model with an intractable likelihood, as for instance in some graphical models;
    - (iv) use of a huge dataset;
    - (v) use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample);
    - (vi) use of a particular inferential procedure as for instance Bayes factors
      $$B^\pi_{01}(x) = \frac{P(\theta\in\Theta_0\mid x)}{P(\theta\in\Theta_1\mid x)} \Big/ \frac{\pi(\theta\in\Theta_0)}{\pi(\theta\in\Theta_1)}.$$
54. The Metropolis-Hastings Algorithm. Outline: Computational issues in Bayesian statistics; The Metropolis-Hastings Algorithm (this section); The Gibbs Sampler; Population Monte Carlo.
55. General purpose. A major computational issue in Bayesian statistics: given a density $\pi$ known up to a normalizing constant, $\pi \propto \tilde\pi$, and an integrable function $h$, compute
    $$\Pi(h) = \int h(x)\,\pi(x)\,\mu(\mathrm{d}x) = \frac{\int h(x)\,\tilde\pi(x)\,\mu(\mathrm{d}x)}{\int \tilde\pi(x)\,\mu(\mathrm{d}x)}$$
    when $\int h(x)\,\tilde\pi(x)\,\mu(\mathrm{d}x)$ is intractable.
57. Monte Carlo 101. Generate an i.i.d. sample $x_1, \dots, x_N$ from $\pi$ and estimate $\Pi(h)$ by
    $$\hat\Pi^{MC}_N(h) = N^{-1}\sum_{i=1}^N h(x_i).$$
    LLN: $\hat\Pi^{MC}_N(h) \xrightarrow{\text{a.s.}} \Pi(h)$. If $\Pi(h^2) = \int h^2(x)\,\pi(x)\,\mu(\mathrm{d}x) < \infty$,
    CLT: $\sqrt{N}\,\big(\hat\Pi^{MC}_N(h) - \Pi(h)\big) \xrightarrow{\mathcal{L}} \mathcal{N}\big(0, \Pi\{[h - \Pi(h)]^2\}\big)$.
    Caveat leading to MCMC: it is often impossible or inefficient to simulate directly from $\Pi$.
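A minimal sketch of the plain Monte Carlo estimator together with the CLT-based standard error, assuming NumPy's `Generator` API. The target $\Pi = \mathcal{N}(0, 1)$ and $h(x) = x^2$ (so $\Pi(h) = 1$) are a hypothetical test case chosen because the answer is known.

```python
import numpy as np


def monte_carlo(h, sample, n, rng):
    """Plain Monte Carlo estimate of Pi(h) with a CLT-based standard error."""
    values = h(sample(rng, n))
    estimate = values.mean()
    std_error = values.std(ddof=1) / np.sqrt(n)   # estimates sqrt(Pi{[h - Pi(h)]^2} / N)
    return estimate, std_error


rng = np.random.default_rng(42)
# Hypothetical example: Pi = N(0, 1) and h(x) = x^2, so that Pi(h) = 1.
est, se = monte_carlo(lambda x: x ** 2, lambda r, n: r.standard_normal(n), 100_000, rng)
print(f"estimate {est:.4f}, approximate 95% interval [{est - 1.96 * se:.4f}, {est + 1.96 * se:.4f}]")
```

The error shrinks like $N^{-1/2}$ whatever the dimension, but only if one can draw from $\pi$ in the first place.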
59. Importance Sampling. For a proposal distribution $Q$ such that $Q(\mathrm{d}x) = q(x)\,\mu(\mathrm{d}x)$, an alternative representation is
    $$\Pi(h) = \int h(x)\,\{\pi/q\}(x)\,q(x)\,\mu(\mathrm{d}x).$$
    Principle of importance (!): generate an i.i.d. sample $x_1, \dots, x_N \sim Q$ and estimate $\Pi(h)$ by
    $$\hat\Pi^{IS}_{Q,N}(h) = N^{-1}\sum_{i=1}^N h(x_i)\,\{\pi/q\}(x_i).$$
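A sketch of the importance sampling estimator on a tail probability, assuming NumPy. The setting is hypothetical: $\Pi = \mathcal{N}(0, 1)$, $h(x) = \mathbb{1}\{x > 3\}$ and proposal $Q = \mathcal{N}(3, 1)$, which puts about half of its draws in the region of interest, whereas direct simulation from $\Pi$ would hit it only about 0.13% of the time.

```python
import math

import numpy as np


def normal_logpdf(x, mean=0.0, sd=1.0):
    """Log-density of N(mean, sd^2)."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


rng = np.random.default_rng(1)
N = 100_000
# Hypothetical example: Pi = N(0, 1), h(x) = 1{x > 3}, proposal Q = N(3, 1).
x = rng.normal(3.0, 1.0, size=N)                                      # x_1, ..., x_N ~ Q
weights = np.exp(normal_logpdf(x) - normal_logpdf(x, mean=3.0))       # {pi/q}(x_i)
is_estimate = np.mean((x > 3.0) * weights)
exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))                         # P(X > 3) for X ~ N(0, 1)
print(is_estimate, exact)
```

Computing the weights on the log scale and exponentiating the difference avoids underflow when the densities are small.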
61. Properties of importance sampling. LLN: $\hat\Pi^{IS}_{Q,N}(h) \xrightarrow{\text{a.s.}} \Pi(h)$, and if $Q((h\pi/q)^2) < \infty$,
    CLT: $\sqrt{N}\,\big(\hat\Pi^{IS}_{Q,N}(h) - \Pi(h)\big) \xrightarrow{\mathcal{L}} \mathcal{N}\big(0, Q\{(h\pi/q - \Pi(h))^2\}\big)$.
    Caveat: if the normalizing constant of $\pi$ is unknown, it is impossible to use $\hat\Pi^{IS}_{Q,N}$. Generic problem in Bayesian statistics: $\pi(\theta\mid x) \propto f(x\mid\theta)\,\pi(\theta)$.
64. Self-Normalised Importance Sampling. Self-normalized version:
    $$\hat\Pi^{SNIS}_{Q,N}(h) = \Big(\sum_{i=1}^N \{\pi/q\}(x_i)\Big)^{-1}\sum_{i=1}^N h(x_i)\,\{\pi/q\}(x_i),$$
    in which $\pi$ may be replaced by the unnormalized $\tilde\pi$, since the unknown constant cancels between numerator and denominator.
    LLN: $\hat\Pi^{SNIS}_{Q,N}(h) \xrightarrow{\text{a.s.}} \Pi(h)$, and if $\Pi((1 + h^2)(\pi/q)) < \infty$,
    CLT: $\sqrt{N}\,\big(\hat\Pi^{SNIS}_{Q,N}(h) - \Pi(h)\big) \xrightarrow{\mathcal{L}} \mathcal{N}\big(0, \Pi\{(\pi/q)(h - \Pi(h))^2\}\big)$.
    The quality of the SNIS approximation depends on the choice of $Q$.
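A sketch of self-normalised importance sampling on a posterior known only up to its constant, assuming NumPy. The conjugate normal model (prior $\mathcal{N}(0, 1)$, observations $\mathcal{N}(\theta, 1)$), the five data points and the $\mathcal{N}(0, 2^2)$ proposal are hypothetical choices made so that the exact posterior mean $\sum_i x_i/(n+1)$ is available for comparison.

```python
import numpy as np

data = np.array([1.2, 0.7, 2.1, 1.5, 0.9])   # hypothetical observations


def log_post_unnorm(theta):
    """log of N(0, 1) prior times N(x_i | theta, 1) likelihood, normalizing constant dropped."""
    return -0.5 * theta ** 2 - 0.5 * np.sum((data[:, None] - theta[None, :]) ** 2, axis=0)


rng = np.random.default_rng(2)
N, prop_sd = 50_000, 2.0
theta = rng.normal(0.0, prop_sd, size=N)              # proposal Q = N(0, prop_sd^2)
log_q = -0.5 * (theta / prop_sd) ** 2                 # log q up to a constant
log_w = log_post_unnorm(theta) - log_q
w = np.exp(log_w - log_w.max())                       # any constant rescaling cancels in the ratio
snis_mean = np.sum(w * theta) / np.sum(w)
ess = w.sum() ** 2 / np.sum(w ** 2)                   # effective sample size diagnostic
exact_mean = data.sum() / (len(data) + 1)             # conjugate posterior N(sum(x)/(n+1), 1/(n+1))
print(snis_mean, exact_mean, ess)
```

The effective sample size is a common (if rough) way of judging the choice of $Q$: it drops sharply when the proposal misses the bulk of the posterior.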
67. Running Monte Carlo via Markov Chains (MCMC). It is not necessary to use a sample from the distribution $f$ to approximate the integral
    $$I = \int h(x)\,f(x)\,\mathrm{d}x$$
    [notation warning: $\pi$ turned to $f$!]. We can obtain $X_1, \dots, X_n \sim f$ (approximately) without directly simulating from $f$, using an ergodic Markov chain with stationary distribution $f$.
70. Running Monte Carlo via Markov Chains (2). Idea: for an arbitrary starting value $x^{(0)}$, an ergodic chain $(X^{(t)})$ is generated using a transition kernel with stationary distribution $f$.
    - An irreducible Markov chain with stationary distribution $f$ is ergodic with limiting distribution $f$ under weak conditions,
    - hence convergence in distribution of $(X^{(t)})$ to a random variable from $f$;
    - for $T_0$ "large enough", $X^{(T_0)}$ is (approximately) distributed from $f$;
    - the Markov sequence $X^{(T_0)}, X^{(T_0+1)}, \dots$ is a dependent sample generated from $f$;
    - Birkhoff's ergodic theorem extends the LLN, sufficient for most approximation purposes.

    Problem: how can one build a Markov chain with a given stationary distribution?
71. The Metropolis–Hastings algorithm: basics. The algorithm uses the objective (target) density $f$ and a conditional density $q(y\mid x)$ called the instrumental (or proposal) distribution.
72. The MH algorithm. Algorithm (Metropolis–Hastings): given $x^{(t)}$,
    1. generate $Y_t \sim q(y\mid x^{(t)})$;
    2. take
       $$X^{(t+1)} = \begin{cases} Y_t & \text{with prob. } \rho(x^{(t)}, Y_t),\\ x^{(t)} & \text{with prob. } 1 - \rho(x^{(t)}, Y_t),\end{cases}$$
       where
       $$\rho(x, y) = \min\left\{\frac{f(y)}{f(x)}\,\frac{q(x\mid y)}{q(y\mid x)}, 1\right\}.$$
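A minimal sketch of the general algorithm, working on the log scale and assuming NumPy. To exercise the $q(x\mid y)/q(y\mid x)$ factor, the example uses a proposal that is not symmetric: the target is a hypothetical Gamma(3, 1), $f(x) \propto x^2 e^{-x}$ on $x > 0$, explored with the multiplicative move $y = x\,e^{sZ}$, $Z \sim \mathcal{N}(0, 1)$, whose log-normal density gives $q(x\mid y)/q(y\mid x) = y/x$. Both $f$ and $q$ are only needed up to constants, as stated in the next slide.

```python
import numpy as np


def metropolis_hastings(log_f, propose, log_q, x0, n_iter, rng):
    """Generic Metropolis-Hastings; log_q(y, x) = log q(y | x), both log_f and log_q up to additive constants."""
    x = x0
    chain = np.empty(n_iter)
    n_accept = 0
    for t in range(n_iter):
        y = propose(x, rng)                                          # Y_t ~ q(. | x^(t))
        log_rho = log_f(y) + log_q(x, y) - log_f(x) - log_q(y, x)    # log of f(y) q(x|y) / (f(x) q(y|x))
        if np.log(rng.uniform()) < min(0.0, log_rho):                # accept with probability rho(x, y)
            x = y
            n_accept += 1
        chain[t] = x
    return chain, n_accept / n_iter


# Hypothetical target: Gamma(3, 1), f(x) ∝ x^2 exp(-x) on x > 0, with the
# non-symmetric multiplicative proposal y = x exp(s Z).
s = 0.5
log_f = lambda x: 2.0 * np.log(x) - x
propose = lambda x, rng: x * np.exp(s * rng.standard_normal())
log_q = lambda y, x: -np.log(y) - 0.5 * ((np.log(y) - np.log(x)) / s) ** 2
rng = np.random.default_rng(3)
chain, acceptance = metropolis_hastings(log_f, propose, log_q, x0=1.0, n_iter=50_000, rng=rng)
print(chain[5_000:].mean(), acceptance)    # the Gamma(3, 1) mean is 3
```

Dropping the `log_q` terms here would silently target a different distribution, which is a classic MH bug with asymmetric proposals.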
73. Features:
    - independent of normalizing constants for both $f$ and $q(\cdot\mid x)$ (i.e., those constants independent of $x$);
    - never moves to values with $f(y) = 0$;
    - the chain $(x^{(t)})_t$ may take the same value several times in a row, even though $f$ is a density w.r.t. Lebesgue measure;
    - the sequence $(y_t)_t$ is usually not a Markov chain.
76. Convergence properties.
    1. The M-H Markov chain is reversible, with invariant/stationary density $f$, since it satisfies the detailed balance condition $f(y)\,K(y, x) = f(x)\,K(x, y)$.
    2. As $f$ is a probability measure, the chain is positive recurrent.
    3. If (condition (1))
       $$\Pr\left[\frac{f(Y_t)\,q(X^{(t)}\mid Y_t)}{f(X^{(t)})\,q(Y_t\mid X^{(t)})} \ge 1\right] < 1,$$
       that is, the event $\{X^{(t+1)} = X^{(t)}\}$ is possible, then the chain is aperiodic.
79. Convergence properties (2).
    4. If $q(y\mid x) > 0$ for every $(x, y)$ (condition (2)), the chain is irreducible.
    5. For M-H, $f$-irreducibility implies Harris recurrence.
    6. Thus, for M-H satisfying (1) and (2):
       (i) for $h$ with $\mathbb{E}_f|h(X)| < \infty$,
       $$\lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^T h(X^{(t)}) = \int h(x)\,\mathrm{d}f(x) \quad \text{a.e. } f;$$
       (ii) and
       $$\lim_{n\to\infty}\left\|\int K^n(x, \cdot)\,\mu(\mathrm{d}x) - f\right\|_{TV} = 0$$
       for every initial distribution $\mu$, where $K^n(x, \cdot)$ denotes the kernel for $n$ transitions.
80. Random walk Metropolis–Hastings. Use of a local perturbation as proposal:
    $$Y_t = X^{(t)} + \varepsilon_t,$$
    where $\varepsilon_t \sim g$, independent of $X^{(t)}$. The instrumental density is of the form $g(y - x)$, and the Markov chain is a random walk if we take $g$ to be symmetric, $g(x) = g(-x)$; then $q(x\mid y) = q(y\mid x)$ and the proposal densities cancel in $\rho$.
81. Random walk Metropolis–Hastings. Algorithm (Random walk Metropolis): given $x^{(t)}$,
    1. generate $Y_t \sim g(y - x^{(t)})$;
    2. take
       $$X^{(t+1)} = \begin{cases} Y_t & \text{with prob. } \min\left\{1, \dfrac{f(Y_t)}{f(x^{(t)})}\right\},\\ x^{(t)} & \text{otherwise.}\end{cases}$$
82. The original example. Example (random walk and normal target): generate $\mathcal{N}(0, 1)$ based on the uniform proposal on $[-\delta, \delta]$. The probability of acceptance is then
    $$\rho(x^{(t)}, y_t) = \exp\{({x^{(t)}}^2 - y_t^2)/2\} \wedge 1.$$
    [Hastings (1970)]
83. The original example
Example (Random walk and normal target, continued). Sample statistics:

    δ          0.1     0.5     1.0
    mean       0.399  -0.111   0.10
    variance   0.698   1.11    1.06

As δ increases, we get better histograms and a faster exploration of the support of $f$.
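The example can be re-run with the R sketch below (hypothetical function `run_uniform_rw`; 15,000 iterations per value of δ as in the figure; starting value 0 and seed chosen arbitrarily). It reports the sample mean, variance and acceptance rate, so the figures will differ from the table above, but the effect of δ can be checked.

```r
# Hastings' example: target N(0, 1), proposal y = x + U[-delta, delta],
# accepted with probability min(1, exp((x^2 - y^2) / 2)).
run_uniform_rw <- function(delta, n_iter = 15000, x0 = 0) {
  x <- numeric(n_iter)
  x[1] <- x0
  acc <- 0
  for (t in 2:n_iter) {
    y <- x[t - 1] + runif(1, -delta, delta)
    if (runif(1) < min(1, exp((x[t - 1]^2 - y^2) / 2))) {
      x[t] <- y
      acc <- acc + 1
    } else {
      x[t] <- x[t - 1]
    }
  }
  c(delta = delta, mean = mean(x), variance = var(x),
    acc_rate = acc / (n_iter - 1))
}

set.seed(1)
res <- t(sapply(c(0.1, 0.5, 1.0), run_uniform_rw))
round(res, 3)
```

Larger δ lowers the acceptance rate but lets the chain cover the support of $f$ faster, which is the trade-off behind the table.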
84. The original example
[Figure: histograms of samples based on U[−δ, δ] with (a) δ = 0.1, (b) δ = 0.5 and (c) δ = 1.0, superimposed with the convergence of the means (15,000 simulations).]
86. Mixtures by random walk MH
Example (Mixture models):
$$\pi(\theta|x) \propto \prod_{j=1}^{n}\left(\sum_{\ell=1}^{k} p_\ell\, f(x_j|\mu_\ell, \sigma_\ell)\right)\pi(\theta)$$
Metropolis–Hastings proposal:
$$\theta^{(t+1)} = \begin{cases}\theta^{(t)} + \omega\varepsilon^{(t)} & \text{if } u^{(t)} < \rho^{(t)},\\ \theta^{(t)} & \text{otherwise,}\end{cases}$$
where
$$\rho^{(t)} = \frac{\pi(\theta^{(t)} + \omega\varepsilon^{(t)}|x)}{\pi(\theta^{(t)}|x)} \wedge 1,$$
and $\omega$ is scaled for a good acceptance rate.
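A minimal R sketch of this random walk proposal on the two means of the model $.7\mathcal{N}(\mu_1,1)+.3\mathcal{N}(\mu_2,1)$ used in slide 88. The simulated data, the independent $\mathcal{N}(0,10^2)$ priors on $\mu_1$ and $\mu_2$, the starting point and $\omega = 0.1$ are all assumptions for illustration; in practice $\omega$ is adjusted until the acceptance rate is reasonable.

```r
# Random walk MH on (mu1, mu2) for the mixture 0.7 N(mu1, 1) + 0.3 N(mu2, 1).
# Assumptions (illustration only): simulated data with mu = (0, 2.5), n = 500,
# and independent N(0, 10^2) priors on mu1 and mu2.
set.seed(3)
n <- 500
z <- runif(n) < 0.7
x <- ifelse(z, rnorm(n, 0, 1), rnorm(n, 2.5, 1))

# log pi(theta | x) up to a constant: mixture log-likelihood + log-prior
log_post <- function(mu) {
  sum(log(0.7 * dnorm(x, mu[1], 1) + 0.3 * dnorm(x, mu[2], 1))) +
    sum(dnorm(mu, 0, 10, log = TRUE))
}

omega <- 0.1                      # random walk scale, to be tuned
n_iter <- 10000
theta <- matrix(NA_real_, n_iter, 2)
theta[1, ] <- c(1, 1)
lp <- log_post(theta[1, ])
acc <- 0
for (t in 2:n_iter) {
  prop <- theta[t - 1, ] + omega * rnorm(2)   # theta^(t) + omega * eps^(t)
  lp_prop <- log_post(prop)
  if (log(runif(1)) < lp_prop - lp) {         # u^(t) < rho^(t), on the log scale
    theta[t, ] <- prop
    lp <- lp_prop
    acc <- acc + 1
  } else {
    theta[t, ] <- theta[t - 1, ]
  }
}
acc / (n_iter - 1)                # acceptance rate
colMeans(theta[-(1:1000), ])      # posterior means after burn-in
```

Depending on the starting point, such a chain may settle in the mode matching the simulated means or in the secondary mode with the labels swapped, as in the next figure.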
87. Mixtures by random walk MH
[Figure: random walk sampling (50,000 iterations) for the general case of a 3-component normal mixture; panels plot θ, p and τ.]
[Celeux & al., 2000]
88. Mixtures by random walk MH
[Figure: random walk MCMC output in the $(\mu_1, \mu_2)$ plane for $.7\mathcal{N}(\mu_1, 1) + .3\mathcal{N}(\mu_2, 1)$.]
90. Convergence properties
Uniform ergodicity is prohibited by the random walk structure. At best, geometric ergodicity:
Theorem (Sufficient ergodicity). For a symmetric density $f$, log-concave in the tails, and a positive and symmetric density $g$, the chain $(X^{(t)})$ is geometrically ergodic.
[Mengersen & Tweedie, 1996]
91. Illustration of the tail effect
Example (Comparison of tails). Random walk Metropolis–Hastings algorithms based on a $\mathcal{N}(0,1)$ instrumental for the generation of (left) a $\mathcal{N}(0,1)$ distribution and (right) a distribution with density $\psi(x) \propto (1 + |x|)^{-3}$.
[Figure: (a), (b) 90% confidence envelopes of the means over 200 iterations, derived from 500 parallel independent chains.]
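The comparison can be reproduced in spirit with the vectorised R sketch below: 500 independent random walk chains with $\mathcal{N}(0,1)$ increments are run for 200 iterations on each target, and the 5% and 95% quantiles of the running means across chains form the 90% envelopes. The starting value 0, the seed and the function name `envelope` are assumptions; both targets are used up to normalising constants.

```r
# 500 parallel random walk MH chains with N(0, 1) increments, run for 200
# iterations on a log target known up to a constant; returns, at each
# iteration, the 5% and 95% quantiles of the running means across chains.
envelope <- function(log_target, n_chains = 500, n_iter = 200, x0 = 0) {
  x <- rep(x0, n_chains)
  sums <- numeric(n_chains)
  env <- matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("q05", "q95")))
  for (t in 1:n_iter) {
    y <- x + rnorm(n_chains)
    accept <- log(runif(n_chains)) < log_target(y) - log_target(x)
    x[accept] <- y[accept]
    sums <- sums + x
    env[t, ] <- quantile(sums / t, c(0.05, 0.95))
  }
  env
}

set.seed(7)
env_norm <- envelope(function(x) -x^2 / 2)                # (a) N(0, 1)
env_psi  <- envelope(function(x) -3 * log(1 + abs(x)))    # (b) psi
# widths of the 90% envelopes after 50, 100 and 200 iterations
steps <- c(50, 100, 200)
rbind(normal = env_norm[steps, "q95"] - env_norm[steps, "q05"],
      psi    = env_psi[steps, "q95"] - env_psi[steps, "q05"])
```

Plotting both columns of `env_norm` and `env_psi` against the iteration index gives the two panels of envelopes.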
92. Further convergence properties
Under the assumptions
(A1) $f$ is super-exponential, i.e. it is positive with continuous first derivative such that $\lim_{|x|\to\infty} n(x)\cdot\nabla\log f(x) = -\infty$, where $n(x) := x/|x|$. In words: exponential decay of $f$ in every direction, with rate tending to $\infty$;
(A2) $\limsup_{|x|\to\infty} n(x)\cdot m(x) < 0$, where $m(x) = \nabla f(x)/|\nabla f(x)|$. In words: non-degeneracy of the contour manifold $C_{f(x)} = \{y : f(y) = f(x)\}$;
the random walk kernel $Q$ is geometrically ergodic, and $V(x) \propto f(x)^{-1/2}$ verifies the drift condition.
[Jarner & Hansen, 2000]
93. Further [further] convergence properties
If $P$ is $\psi$-irreducible and aperiodic, let $r = (r(n))_{n\in\mathbb{N}}$ be a real-valued non-decreasing sequence such that, for all $n, m \in \mathbb{N}$, $r(n+m) \le r(n)r(m)$ and $r(0) = 1$; let $C$ be a small set, $\tau_C = \inf\{n \ge 1, X_n \in C\}$, and $h \ge 1$. Assume
$$\sup_{x\in C} \mathbb{E}_x\left[\sum_{k=0}^{\tau_C - 1} r(k)\,h(X_k)\right] < \infty,$$
94. Further [further] convergence properties
then
$$S(f, C, r) := \left\{x \in \mathsf{X} : \mathbb{E}_x\left[\sum_{k=0}^{\tau_C-1} r(k)\,h(X_k)\right] < \infty\right\}$$
is full and absorbing, and for $x \in S(f, C, r)$,
$$\lim_{n\to\infty} r(n)\,\|P^n(x, \cdot) - f\|_h = 0.$$
[Tuominen & Tweedie, 1994]
95. Comments
[CLT, Rosenthal's inequality...] $h$-ergodicity implies a CLT for additive (possibly unbounded) functionals of the chain, Rosenthal's inequality, and so on.
[Control of the moments of the return time] The condition implies (because $h \ge 1$) that
$$\sup_{x\in C}\mathbb{E}_x[r_0(\tau_C)] \le \sup_{x\in C}\mathbb{E}_x\left[\sum_{k=0}^{\tau_C-1} r(k)\,h(X_k)\right] < \infty,$$
where $r_0(n) = \sum_{l=0}^{n} r(l)$. This can be used to derive bounds for the coupling time, an essential step to determine computable bounds using coupling inequalities.
[Roberts & Tweedie, 98; Fort & Moulines, 00; Jones et al., 02]
96. Alternative conditions
The condition is not really easy to work with...
[Possible alternative conditions]
(a) [Tuominen & Tweedie, 1994] There exists a sequence $(V_n)_{n\in\mathbb{N}}$, $V_n \ge r(n)h$, such that
(i) $\sup_C V_0 < \infty$,
(ii) $\{V_0 = \infty\} \subset \{V_1 = \infty\}$, and
(iii) $PV_{n+1} \le V_n - r(n)h + b\,r(n)\,\mathbb{I}_C$.
97. Alternative conditions
(b) [Fort, 2000] There exist $V \ge f \ge 1$ and $b < \infty$ such that $\sup_C V < \infty$ and
$$PV(x) + \mathbb{E}_x\left[\sum_{k=0}^{\sigma_C} \Delta r(k)\, f(X_k)\right] \le V(x) + b\,\mathbb{I}_C(x),$$
where $\sigma_C$ is the hitting time on $C$, $\Delta r(k) = r(k) - r(k-1)$ for $k \ge 1$, and $\Delta r(0) = r(0)$.
Result: (a) ⇔ (b) ⇔ $\sup_{x\in C}\mathbb{E}_x\left[\sum_{k=0}^{\tau_C-1} r(k)\,f(X_k)\right] < \infty$.
98. Langevin algorithms
Proposal based on the Langevin diffusion $L_t$, defined by the stochastic differential equation
$$\mathrm{d}L_t = \mathrm{d}B_t + \frac{1}{2}\nabla\log f(L_t)\,\mathrm{d}t,$$
where $B_t$ is the standard Brownian motion.
Theorem. The Langevin diffusion is the only non-explosive diffusion which is reversible with respect to $f$.
100. Discretization
Instead, consider the sequence
$$x^{(t+1)} = x^{(t)} + \frac{\sigma^2}{2}\nabla\log f(x^{(t)}) + \sigma\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}_p(0, I_p),$$
where $\sigma^2$ corresponds to the discretization step.
Unfortunately, the discretized chain may be transient, for instance when
$$\lim_{x\to\pm\infty} \sigma^2\,|\nabla\log f(x)|\,|x|^{-1} > 1.$$
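To see the transience issue concretely, the R sketch below runs the unadjusted recursion on two illustrative targets (assumptions, not from the slides): $\mathcal{N}(0,1)$, for which $\sigma^2|\nabla\log f(x)|\,|x|^{-1} = \sigma^2$, and $f(x) \propto \exp(-x^4)$, for which this ratio equals $4\sigma^2x^2 \to \infty$. With $\sigma = 0.5$ and a start at $x = 5$ (also assumptions), the first chain settles down while the second overshoots further at each step until it overflows.

```r
# Unadjusted discretised Langevin recursion
#   x^(t+1) = x^(t) + (sigma^2 / 2) * grad log f(x^(t)) + sigma * eps_t
ula <- function(grad_log_f, x0, sigma, n_iter) {
  x <- numeric(n_iter)
  x[1] <- x0
  for (t in 2:n_iter) {
    x[t] <- x[t - 1] + sigma^2 / 2 * grad_log_f(x[t - 1]) + sigma * rnorm(1)
  }
  x
}

set.seed(11)
# f propto exp(-x^2 / 2): grad log f(x) = -x
x_gauss <- ula(function(x) -x, x0 = 5, sigma = 0.5, n_iter = 1000)
# f propto exp(-x^4): grad log f(x) = -4 x^3
x_quart <- ula(function(x) -4 * x^3, x0 = 5, sigma = 0.5, n_iter = 20)
summary(x_gauss)
head(x_quart, 6)   # the iterates alternate in sign and explode
```

Once the iterates exceed the double-precision range they become `Inf` and then `NaN`, which is the numerical face of the transience described above.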
101. MH correction
Accept the new value $Y_t$ with probability
$$\frac{f(Y_t)}{f(x^{(t)})}\cdot\frac{\exp\left\{-\left\|x^{(t)} - Y_t - \frac{\sigma^2}{2}\nabla\log f(Y_t)\right\|^2\big/2\sigma^2\right\}}{\exp\left\{-\left\|Y_t - x^{(t)} - \frac{\sigma^2}{2}\nabla\log f(x^{(t)})\right\|^2\big/2\sigma^2\right\}} \wedge 1,$$
that is, $f(Y_t)\,q(x^{(t)}|Y_t)\big/f(x^{(t)})\,q(Y_t|x^{(t)}) \wedge 1$ for the Langevin proposal density $q$.
Choice of the scaling factor $\sigma$: it should lead to an acceptance rate of 0.574 to achieve optimal convergence rates (when the components of $x$ are uncorrelated).
[Roberts & Rosenthal, 1998; Girolami & Calderhead, 2011]
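A minimal R sketch of the Langevin proposal with the Metropolis–Hastings correction above (often called MALA). The names `mala`, `log_f` and `grad_log_f` are hypothetical; the user supplies the log target and its gradient. The ratio is computed on the log scale, with $\log q(\text{to}\mid\text{from})$ known up to a constant that cancels.

```r
# Metropolis-adjusted Langevin sketch for a p-dimensional target.
# log_f: log target (up to a constant); grad_log_f: its gradient.
mala <- function(log_f, grad_log_f, x0, sigma, n_iter) {
  p <- length(x0)
  chain <- matrix(NA_real_, n_iter, p)
  chain[1, ] <- x0
  # log q(to | from) for the Langevin proposal, up to an additive constant
  log_q <- function(to, from) {
    -sum((to - from - sigma^2 / 2 * grad_log_f(from))^2) / (2 * sigma^2)
  }
  acc <- 0
  for (t in 2:n_iter) {
    x <- chain[t - 1, ]
    y <- x + sigma^2 / 2 * grad_log_f(x) + sigma * rnorm(p)
    # log of f(y) q(x | y) / (f(x) q(y | x))
    log_rho <- log_f(y) - log_f(x) + log_q(x, y) - log_q(y, x)
    if (log(runif(1)) < log_rho) {
      chain[t, ] <- y
      acc <- acc + 1
    } else {
      chain[t, ] <- x
    }
  }
  list(chain = chain, acc_rate = acc / (n_iter - 1))
}

set.seed(5)
out_mala <- mala(function(x) -sum(x^2) / 2, function(x) -x,
                 x0 = rep(0, 10), sigma = 0.8, n_iter = 5000)
out_mala$acc_rate
```

The call uses a 10-dimensional standard normal target and $\sigma = 0.8$, both assumptions; in practice $\sigma$ would be adjusted until the acceptance rate is near 0.574.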
102. Optimizing the acceptance rate
Problem of the choice of the transition kernel from a practical point of view. Most common alternatives:
(a) a fully automated algorithm like ARMS;
(b) an instrumental density $g$ which approximates $f$, such that $f/g$ is bounded for uniform ergodicity to apply;
(c) a random walk.
In both cases (b) and (c), the choice of $g$ is critical.
104. Case of the random walk
Different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it may indicate that the random walk is moving too slowly on the surface of $f$.
If $x^{(t)}$ and $y_t$ are close, i.e. $f(x^{(t)}) \simeq f(y_t)$, then $y_t$ is accepted with probability
$$\min\left(\frac{f(y_t)}{f(x^{(t)})}, 1\right) \simeq 1.$$
For multimodal densities with well-separated modes, the negative effect of limited moves on the surface of $f$ clearly shows.
105. Case of the random walk (2)
If the average acceptance rate is low, the successive values of $f(y_t)$ tend to be small compared with $f(x^{(t)})$, which means that the random walk moves quickly on the surface of $f$, since it often reaches the "borders" of the support of $f$.
107. Rule of thumb
In small dimensions, aim at an average acceptance rate of 50%. In large dimensions, aim at an average acceptance rate of 25%.
[Gelman, Gilks and Roberts, 1995]
This rule is to be taken with a pinch of salt!
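How the acceptance rate reacts to the scale and to the dimension can be checked with the R sketch below, on a standard normal target in dimensions 1 and 20 (an assumption; the rule of thumb is meant for general targets). The scales used for $d = 20$ are divided by $\sqrt{d}$, and the helper `rw_acc_rate` is hypothetical.

```r
# Acceptance rate of a Gaussian random walk Metropolis sampler with scale omega
# on a d-dimensional standard normal target (illustrative target).
rw_acc_rate <- function(omega, d, n_iter = 20000) {
  x <- rep(0, d)
  lf <- -sum(x^2) / 2
  acc <- 0
  for (t in 1:n_iter) {
    y <- x + omega * rnorm(d)
    ly <- -sum(y^2) / 2
    if (log(runif(1)) < ly - lf) {
      x <- y
      lf <- ly
      acc <- acc + 1
    }
  }
  acc / n_iter
}

set.seed(2)
omegas <- c(0.5, 1, 2.4, 5)
acc_table <- rbind(d1  = sapply(omegas, rw_acc_rate, d = 1),
                   d20 = sapply(omegas / sqrt(20), rw_acc_rate, d = 20))
colnames(acc_table) <- paste0("scale*sqrt(d)=", omegas)
round(acc_table, 2)
```

Reading across each row shows which scale brings the rate near 50% for $d = 1$ and near 25% for $d = 20$, for this particular target only.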
109. Role of scale
Example (Noisy AR(1)). Hidden Markov chain from a regular AR(1) model,
$$x_{t+1} = \varphi x_t + \epsilon_{t+1}, \qquad \epsilon_t \sim \mathcal{N}(0, \tau^2),$$
and observables
$$y_t|x_t \sim \mathcal{N}(x_t^2, \sigma^2).$$
The distribution of $x_t$ given $x_{t-1}$, $x_{t+1}$ and $y_t$ is proportional to
$$\exp\left\{-\frac{1}{2\tau^2}\left[(x_t - \varphi x_{t-1})^2 + (x_{t+1} - \varphi x_t)^2 + \frac{\tau^2}{\sigma^2}(y_t - x_t^2)^2\right]\right\}.$$
110. Role of scale
Example (Noisy AR(1) continued). Since $y_t$ only depends on $x_t^2$, the values $x_t$ and $-x_t$ fit $y_t$ equally well, and the conditional distribution can have two modes. For a Gaussian random walk with scale $\omega$ small enough, the random walk never jumps to the other mode. But if the scale $\omega$ is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
111. Role of scale
[Figure: Markov chain based on a random walk with scale ω = .1.]
112. Role of scale
[Figure: Markov chain based on a random walk with scale ω = .5.]
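The effect of the scale can be explored with the R sketch below, which targets the full conditional of $x_t$ in the noisy AR(1) example. The parameter values ($\varphi = 0.5$, $\tau = 1$, $\sigma = 0.5$, $y_t = 1$, $x_{t-1} = x_{t+1} = 0$) are assumptions chosen so that the conditional has two symmetric modes near $\pm 0.92$, and both chains start in the positive mode. Since the target is symmetric, a chain that moves freely between the modes should spend about half of its time on each side of 0; with a smaller $\sigma$ the barrier between the modes deepens and the $\omega = 0.1$ chain may stay in one mode for the whole run.

```r
# Log full conditional of x_t in the noisy AR(1) model, up to a constant.
# Default arguments are illustrative assumptions giving two symmetric modes.
log_cond <- function(x, x_prev = 0, x_next = 0, y = 1,
                     phi = 0.5, tau = 1, sigma = 0.5) {
  -((x - phi * x_prev)^2 + (x_next - phi * x)^2 +
      (tau^2 / sigma^2) * (y - x^2)^2) / (2 * tau^2)
}

# Gaussian random walk Metropolis with scale omega, started in the positive mode
rw_chain <- function(omega, n_iter = 20000, x0 = 1) {
  x <- numeric(n_iter)
  x[1] <- x0
  for (t in 2:n_iter) {
    prop <- x[t - 1] + omega * rnorm(1)
    accept <- log(runif(1)) < log_cond(prop) - log_cond(x[t - 1])
    x[t] <- if (accept) prop else x[t - 1]
  }
  x
}

set.seed(9)
chains <- list(omega_0.1 = rw_chain(0.1), omega_0.5 = rw_chain(0.5))
# fraction of iterations spent on the positive side (1/2 under the target)
sapply(chains, function(x) mean(x > 0))
```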
113. MA(2)
Since the constraints on $(\vartheta_1, \vartheta_2)$ are well defined, a flat prior over the triangle is used as prior. Simple representation of the likelihood:

    library(mnormt)
    # Gaussian log-likelihood of an MA(2) series with unit noise variance;
    # the data vector y is taken from the calling environment
    ma2like = function(theta){
      n = length(y)
      # autocovariances: 1+theta1^2+theta2^2, theta1+theta1*theta2, theta2, then 0
      sigma = toeplitz(c(1 + theta[1]^2 + theta[2]^2,
        theta[1] + theta[1]*theta[2], theta[2], rep(0, n-3)))
      dmnorm(y, rep(0, n), sigma, log = TRUE)
    }
114. Basic RWHM for MA(2)
Algorithm 1 (RW-HM-MA(2) sampler)

    set ω and ϑ^(1)
    for i = 2 to T do
      generate ϑ̃_j ∼ U(ϑ_j^(i−1) − ω, ϑ_j^(i−1) + ω), j = 1, 2
      set p = 0 and ϑ^(i) = ϑ^(i−1)
      if ϑ̃ within the triangle then
        p = exp(ma2like(ϑ̃) − ma2like(ϑ^(i−1)))
      end if
      generate U ∼ U(0, 1)
      if U < p then
        ϑ^(i) = ϑ̃
      end if
    end for
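The RW-HM-MA(2) pseudo-code translates into the R sketch below. To keep it self-contained, it computes the Gaussian log-likelihood in base R through a Cholesky factor instead of `mnormt::dmnorm`, takes `y` as an argument rather than a global, and simulates an MA(2) series with $(\vartheta_1, \vartheta_2) = (0.6, 0.2)$; the data, $\omega = 0.2$, $T = 5000$ and the helper names are assumptions. The triangle is taken to be the invertibility region $-2 < \vartheta_1 < 2$, $\vartheta_1 + \vartheta_2 > -1$, $\vartheta_1 - \vartheta_2 < 1$.

```r
# Gaussian MA(2) log-likelihood (unit noise variance) via a Cholesky factor.
ma2loglik <- function(theta, y) {
  n <- length(y)
  sigma <- toeplitz(c(1 + theta[1]^2 + theta[2]^2,
                      theta[1] + theta[1] * theta[2], theta[2], rep(0, n - 3)))
  R <- chol(sigma)                           # sigma = t(R) %*% R
  z <- backsolve(R, y, transpose = TRUE)     # solves t(R) z = y
  -sum(log(diag(R))) - sum(z^2) / 2 - n / 2 * log(2 * pi)
}

# flat prior support: the MA(2) invertibility triangle
in_triangle <- function(theta) {
  theta[1] > -2 && theta[1] < 2 &&
    theta[1] + theta[2] > -1 && theta[1] - theta[2] < 1
}

rwhm_ma2 <- function(y, T_iter = 5000, omega = 0.2, theta0 = c(0, 0)) {
  theta <- matrix(NA_real_, T_iter, 2)
  theta[1, ] <- theta0
  ll <- ma2loglik(theta0, y)
  for (i in 2:T_iter) {
    # componentwise uniform random walk proposal
    prop <- runif(2, theta[i - 1, ] - omega, theta[i - 1, ] + omega)
    theta[i, ] <- theta[i - 1, ]               # p = 0 outside the triangle
    if (in_triangle(prop)) {
      ll_prop <- ma2loglik(prop, y)
      if (runif(1) < exp(ll_prop - ll)) {
        theta[i, ] <- prop
        ll <- ll_prop
      }
    }
  }
  theta
}

set.seed(4)
y <- as.numeric(arima.sim(list(ma = c(0.6, 0.2)), n = 100))
draws <- rwhm_ma2(y)
colMeans(draws[-(1:1000), ])   # posterior means after burn-in
```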