Introduction to advanced Monte Carlo methods
 

Presentation Transcript

    • An introduction to advanced (?) MCMC methods. Christian P. Robert, Université Paris-Dauphine and CREST-INSEE, http://www.ceremade.dauphine.fr/~xian. Royal Statistical Society, October 13, 2010.
    • Outline: 1 Motivating example; 2 The Metropolis-Hastings Algorithm.
    • Motivating example. Latent structures make life harder! Even simple models may lead to computational complications, as in latent variable models $f(x\mid\theta) = \int f^\star(x, x^\star\mid\theta)\,dx^\star$. If $(x, x^\star)$ is observed, fine! If only $x$ is observed, trouble!
    • Example (Mixture models). Models of mixtures of distributions: $X \sim f_j$ with probability $p_j$, for $j = 1, 2, \ldots, k$, with overall density $X \sim p_1 f_1(x) + \cdots + p_k f_k(x)$. For a sample of independent random variables $(X_1, \ldots, X_n)$, the sample density is $\prod_{i=1}^{n} \{p_1 f_1(x_i) + \cdots + p_k f_k(x_i)\}$. Expanding this product involves $k^n$ elementary terms: prohibitive to compute in large samples.
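The $k^n$ explosion only concerns the expansion: the sample density itself is computable directly, as a product of $n$ sums of $k$ terms, i.e. in $O(nk)$ operations. A minimal sketch, where the two-component Gaussian values are illustrative, not from the slides:

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_log_density(xs, weights, mus):
    """Log of prod_i sum_j p_j f_j(x_i): O(n*k) work, not O(k^n)."""
    return sum(
        math.log(sum(p * normal_pdf(x, mu) for p, mu in zip(weights, mus)))
        for x in xs
    )

# Illustrative mixture: 0.3 N(0,1) + 0.7 N(2.5,1)
print(mixture_log_density([0.1, 2.3, -0.5], [0.3, 0.7], [0.0, 2.5]))
```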
    • [Figure: log-likelihood surface of $0.3\,\mathcal{N}(\mu_1, 1) + 0.7\,\mathcal{N}(\mu_2, 1)$ over the $(\mu_1, \mu_2)$ plane.]
    • A typology of Bayes computational problems: (i) use of a complex parameter space, as for instance in constrained parameter sets like those resulting from imposing stationarity constraints in dynamic models; (ii) use of a complex sampling model with an intractable likelihood, as for instance in missing data and graphical models; (iii) use of a huge dataset; (iv) use of a complex prior distribution (which may be the posterior distribution associated with an earlier sample); (v) use of a complex inferential procedure, as for instance Bayes factors $B_{01}^{\pi}(x) = \dfrac{P(\theta \in \Theta_0 \mid x)\,/\,P(\theta \in \Theta_1 \mid x)}{\pi(\theta \in \Theta_0)\,/\,\pi(\theta \in \Theta_1)}$.
    • The Metropolis-Hastings Algorithm. 1 Motivating example; 2 The Metropolis-Hastings Algorithm: Monte Carlo Methods based on Markov Chains; The Metropolis–Hastings algorithm; A collection of Metropolis-Hastings algorithms; Extensions; Convergence assessment.
    • Monte Carlo Methods based on Markov Chains. Running Monte Carlo via Markov Chains. Fact: it is not necessary to use a sample from the distribution $f$ to approximate the integral $\mathfrak{I} = \int h(x)\, f(x)\,dx$. We can obtain $X_1, \ldots, X_n \sim f$ (approximately) without directly simulating from $f$, using an ergodic Markov chain with stationary distribution $f$.
    • Running Monte Carlo via Markov Chains (2). Idea: for an arbitrary starting value $x^{(0)}$, an ergodic chain $(X^{(t)})$ is generated using a transition kernel with stationary distribution $f$. This ensures the convergence in distribution of $(X^{(t)})$ to a random variable from $f$: for a "large enough" $T_0$, $X^{(T_0)}$ can be considered as distributed from $f$. It produces a dependent sample $X^{(T_0)}, X^{(T_0+1)}, \ldots$, which is generated from $f$, sufficient for most approximation purposes.
    • The Metropolis–Hastings algorithm. Problem: how can one build a Markov chain with a given stationary distribution? MH basics: an algorithm that converges to the objective (target) density $f$ using an arbitrary transition kernel density $q(x, y)$, called the instrumental (or proposal) distribution.
    • The MH algorithm. Algorithm (Metropolis–Hastings). Given $x^{(t)}$: 1 Generate $Y_t \sim q(x^{(t)}, y)$. 2 Take $X^{(t+1)} = Y_t$ with probability $\rho(x^{(t)}, Y_t)$ and $X^{(t+1)} = x^{(t)}$ with probability $1 - \rho(x^{(t)}, Y_t)$, where $\rho(x, y) = \min\left\{ \dfrac{f(y)\, q(y, x)}{f(x)\, q(x, y)},\; 1 \right\}$.
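The two steps of the algorithm translate almost line for line into code. A generic sketch under illustrative choices (standard normal target, Gaussian proposal); the names `f`, `q_sample` and `q_density` are introduced here, not from the slides:

```python
import math
import random

def metropolis_hastings(f, q_sample, q_density, x0, n_iter=10_000):
    """Generic Metropolis-Hastings: f is the (unnormalised) target density,
    q_sample(x) draws Y ~ q(x, .), q_density(x, y) evaluates q(x, y)."""
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = q_sample(x)
        # rho(x, y) = min{ f(y) q(y, x) / [f(x) q(x, y)], 1 }
        rho = min(1.0, f(y) * q_density(y, x) / (f(x) * q_density(x, y)))
        if random.random() < rho:
            x = y
        chain.append(x)
    return chain

# Illustrative run: standard normal target, N(x, 1) proposal.
# Normalising constants cancel in rho, so unnormalised densities suffice.
f = lambda x: math.exp(-0.5 * x * x)
q_sample = lambda x: random.gauss(x, 1.0)
q_density = lambda x, y: math.exp(-0.5 * (y - x) ** 2)
chain = metropolis_hastings(f, q_sample, q_density, x0=0.0)
```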
    • Features. Independent of normalizing constants for both $f$ and $q(x, \cdot)$ (i.e., of those constants independent of $x$). Never moves to values with $f(y) = 0$. The chain $(x^{(t)})_t$ may take the same value several times in a row, even though $f$ is a density wrt Lebesgue measure. The sequence $(y_t)_t$ is usually not a Markov chain. Satisfies the detailed balance condition $f(x)\, K(x, y) = f(y)\, K(y, x)$. [Green, 1995]
    • Convergence properties. 1 The M-H Markov chain is reversible, with invariant/stationary density $f$. 2 As $f$ is a probability measure, the chain is positive recurrent. 3 If $\Pr\left[ \dfrac{f(Y_t)\, q(Y_t, X^{(t)})}{f(X^{(t)})\, q(X^{(t)}, Y_t)} \ge 1 \right] < 1 \quad (1)$, i.e., if the event $\{X^{(t+1)} = X^{(t)}\}$ occurs with positive probability, then the chain is aperiodic.
    • Convergence properties (2). 4 If $q(x, y) > 0$ for every $(x, y)$, (2) the chain is irreducible. 5 For M-H, $f$-irreducibility implies Harris recurrence. 6 Thus, under conditions (1) and (2): (i) for $h$ with $\mathbb{E}_f|h(X)| < \infty$, $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} h(X^{(t)}) = \int h(x)\, f(x)\,dx$ a.e. $f$; (ii) $\lim_{n \to \infty} \left\| \int K^n(x, \cdot)\,\mu(dx) - f \right\|_{TV} = 0$ for every initial distribution $\mu$, where $K^n(x, \cdot)$ denotes the kernel for $n$ transitions.
    • A collection of Metropolis-Hastings algorithms. The Independent Case: the instrumental distribution $q(x, \cdot)$ is independent of $x$ and is denoted $g$. Algorithm (Independent Metropolis-Hastings). Given $x^{(t)}$: 1 Generate $Y_t \sim g(y)$. 2 Take $X^{(t+1)} = Y_t$ with probability $\min\left\{ \dfrac{f(Y_t)\, g(x^{(t)})}{f(x^{(t)})\, g(Y_t)},\; 1 \right\}$, and $X^{(t+1)} = x^{(t)}$ otherwise.
    • Properties. The resulting sample is not iid, but there exist strong convergence properties. Theorem (Ergodicity). The algorithm produces a uniformly ergodic chain if there exists a constant $M$ such that $f(x) \le M g(x)$, $x \in \operatorname{supp} f$. In this case, $\| K^n(x, \cdot) - f \|_{TV} \le \left( 1 - \frac{1}{M} \right)^n$. [Mengersen & Tweedie, 1996]
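The boundedness condition $f \le M g$ is easy to arrange with a proposal whose tails dominate the target's. A sketch of the independent sampler under illustrative choices (Gaussian target, Cauchy proposal, so $f/g$ is bounded and the theorem applies):

```python
import math
import random

def independent_mh(f, g_sample, g_density, x0, n_iter=10_000):
    """Independent Metropolis-Hastings: the proposal g does not depend on
    the current state; uniformly ergodic when f <= M g on supp f."""
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = g_sample()
        # rho = min{ f(y) g(x) / [f(x) g(y)], 1 }
        rho = min(1.0, f(y) * g_density(x) / (f(x) * g_density(y)))
        if random.random() < rho:
            x = y
        chain.append(x)
    return chain

# Illustrative: N(0,1) target with a heavier-tailed Cauchy proposal.
f = lambda x: math.exp(-0.5 * x * x)
g_density = lambda x: 1.0 / (math.pi * (1.0 + x * x))
g_sample = lambda: math.tan(math.pi * (random.random() - 0.5))  # inverse-CDF Cauchy draw
chain = independent_mh(f, g_sample, g_density, x0=0.0)
```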
    • Example (Noisy AR(1)). Hidden Markov chain from a regular AR(1) model, $x_{t+1} = \varphi x_t + \epsilon_{t+1}$, $\epsilon_t \sim \mathcal{N}(0, \tau^2)$, and observables $y_t \mid x_t \sim \mathcal{N}(x_t^2, \sigma^2)$. The distribution of $x_t$ given $x_{t-1}$, $x_{t+1}$ and $y_t$ is proportional to $\exp\left\{ -\frac{1}{2\tau^2} \left[ (x_t - \varphi x_{t-1})^2 + (x_{t+1} - \varphi x_t)^2 + \frac{\tau^2}{\sigma^2}\,(y_t - x_t^2)^2 \right] \right\}$.
    • Example (Noisy AR(1) too). Use for proposal the $\mathcal{N}(\mu_t, \omega_t^2)$ distribution, with $\mu_t = \varphi\,\dfrac{x_{t-1} + x_{t+1}}{1 + \varphi^2}$ and $\omega_t^2 = \dfrac{\tau^2}{1 + \varphi^2}$. The ratio $\pi(x_t)/q_{\text{ind}}(x_t) = \exp\left\{ -(y_t - x_t^2)^2 / 2\sigma^2 \right\}$ is bounded.
    • [Figure: (top) last 500 realisations of the chain $\{X_k\}_k$ out of 10,000 iterations; (bottom) histogram of the chain, compared with the target distribution.]
    • Random walk Metropolis–Hastings. Instead, use a local perturbation as proposal: $Y_t = X^{(t)} + \varepsilon_t$, where $\varepsilon_t \sim g$, independent of $X^{(t)}$. The instrumental density is now of the form $g(y - x)$, and the Markov chain is a random walk if $g$ is symmetric: $g(x) = g(-x)$.
    • Algorithm (Random walk Metropolis). Given $x^{(t)}$: 1 Generate $Y_t \sim g(y - x^{(t)})$. 2 Take $X^{(t+1)} = Y_t$ with probability $\min\left\{ 1,\; \dfrac{f(Y_t)}{f(x^{(t)})} \right\}$, and $X^{(t+1)} = x^{(t)}$ otherwise.
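Since $g$ is symmetric, the $q$-ratio cancels and only $f(Y_t)/f(x^{(t)})$ is needed, so it is convenient to work with $\log f$. A minimal sketch under illustrative choices (standard normal target, Gaussian steps):

```python
import math
import random

def random_walk_metropolis(log_f, x0, scale, n_iter=10_000):
    """Random walk Metropolis: symmetric proposal Y = x + eps with
    eps ~ N(0, scale^2), so acceptance reduces to min{1, f(Y)/f(x)}."""
    x = x0
    chain = [x]
    for _ in range(n_iter):
        y = x + random.gauss(0.0, scale)
        # accept with probability exp(min(0, log f(y) - log f(x)))
        if random.random() < math.exp(min(0.0, log_f(y) - log_f(x))):
            x = y
        chain.append(x)
    return chain

# Illustrative run: standard normal target, scale 1
chain = random_walk_metropolis(lambda x: -0.5 * x * x, x0=0.0, scale=1.0)
```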
    • Probit illustration. Likelihood and posterior given by $\pi(\beta \mid y, X) \propto \ell(\beta \mid y, X) = \prod_{i=1}^{n} \Phi(x_i^T \beta)^{y_i} \left( 1 - \Phi(x_i^T \beta) \right)^{n_i - y_i}$ under the flat prior. A random walk proposal works well for a small number of predictors. Use the maximum likelihood estimate $\hat\beta$ as starting value and the asymptotic (Fisher) covariance matrix of the MLE, $\hat\Sigma$, as scale.
    • MCMC algorithm: probit random-walk Metropolis-Hastings. Initialization: set $\beta^{(0)} = \hat\beta$ and compute $\hat\Sigma$. Iteration $t$: 1 Generate $\tilde\beta \sim \mathcal{N}_{k+1}(\beta^{(t-1)}, \tau \hat\Sigma)$. 2 Compute $\rho(\beta^{(t-1)}, \tilde\beta) = \min\left\{ 1,\; \dfrac{\pi(\tilde\beta \mid y)}{\pi(\beta^{(t-1)} \mid y)} \right\}$. 3 With probability $\rho(\beta^{(t-1)}, \tilde\beta)$ set $\beta^{(t)} = \tilde\beta$; otherwise set $\beta^{(t)} = \beta^{(t-1)}$.
    • R bank benchmark. Probit modelling with no intercept over the four measurements. Three different scales $\tau = 1, 0.1, 10$: the best mixing behavior is associated with $\tau = 1$. The average of the parameters over 9,000 MCMC iterations gives the plug-in estimate $\hat p_i = \Phi(-1.2193\,x_{i1} + 0.9540\,x_{i2} + 0.9795\,x_{i3} + 1.1481\,x_{i4})$. [Figure: traces, histograms and autocorrelations of the four parameters for the three scales.]
    • Example (Mixture models). $\pi(\theta \mid x) \propto \prod_{j=1}^{n} \left\{ \sum_{\ell=1}^{k} p_\ell\, f(x_j \mid \mu_\ell, \sigma_\ell) \right\} \pi(\theta)$. Metropolis-Hastings proposal: $\theta^{(t+1)} = \theta^{(t)} + \omega \varepsilon^{(t)}$ if $u^{(t)} < \rho^{(t)}$, $\theta^{(t+1)} = \theta^{(t)}$ otherwise, where $\rho^{(t)} = \dfrac{\pi(\theta^{(t)} + \omega \varepsilon^{(t)} \mid x)}{\pi(\theta^{(t)} \mid x)} \wedge 1$ and $\omega$ is scaled for a good acceptance rate.
    • [Figures: random walk MCMC output for $.7\,\mathcal{N}(\mu_1, 1) + .3\,\mathcal{N}(\mu_2, 1)$ with scale 1, at iterations 1, 10, 100, 500 and 1000, in the $(\mu_1, \mu_2)$ plane.]
    • [Figures: random walk MCMC output for $.7\,\mathcal{N}(\mu_1, 1) + .3\,\mathcal{N}(\mu_2, 1)$ with scale $\sqrt{.1}$, at iterations 10, 100, 500, 1000, 5000 and 10,000, in the $(\mu_1, \mu_2)$ plane.]
    • Convergence properties. Uniform ergodicity is prohibited by the random walk structure. At best, geometric ergodicity: Theorem (Sufficient ergodicity). For a symmetric density $f$, log-concave in the tails, and a positive and symmetric density $g$, the chain $(X^{(t)})$ is geometrically ergodic. [Mengersen & Tweedie, 1996] No tail effect.
    • Example (Comparison of tail effects). Random-walk Metropolis–Hastings algorithms based on a $\mathcal{N}(0, 1)$ instrumental for the generation of (a) a $\mathcal{N}(0, 1)$ distribution and (b) a distribution with density $\psi(x) \propto (1 + |x|)^{-3}$. [Figure: 90% confidence envelopes of the means, derived from 500 parallel independent chains.]
    • Extensions. There are many other families of MH algorithms: Adaptive Rejection Metropolis Sampling, Reversible Jump, Langevin algorithms, to name just a few...
    • Langevin Algorithms. The proposal is based on the Langevin diffusion $L_t$, defined by the stochastic differential equation $dL_t = dB_t + \frac{1}{2} \nabla \log f(L_t)\,dt$, where $B_t$ is the standard Brownian motion. Theorem. The Langevin diffusion is the only non-explosive diffusion which is reversible with respect to $f$.
    • Discretization. Because continuous time cannot be simulated, consider the discretised sequence $x^{(t+1)} = x^{(t)} + \frac{\sigma^2}{2} \nabla \log f(x^{(t)}) + \sigma \varepsilon_t$, $\varepsilon_t \sim \mathcal{N}_p(0, I_p)$, where $\sigma^2$ corresponds to the discretisation step. [Figures: histograms of the discretised chain against the density $f(x) = \exp(-x^4)$, for $\sigma^2 = .1, .01, .001, .0001$.]
    • Discretization. Unfortunately, the discretized chain may be transient, for instance when $\lim_{x \to \pm\infty} \sigma^2\, |\nabla \log f(x)|\, |x|^{-1} > 1$. [Figure: transient behaviour for $f(x) = \exp(-x^4)$ when $\sigma^2 = .2$.]
    • MH correction. Accept the new value $Y_t$ with probability $\dfrac{f(Y_t)}{f(x^{(t)})} \cdot \dfrac{\exp\left\{ -\left\| x^{(t)} - Y_t - \frac{\sigma^2}{2} \nabla \log f(Y_t) \right\|^2 / 2\sigma^2 \right\}}{\exp\left\{ -\left\| Y_t - x^{(t)} - \frac{\sigma^2}{2} \nabla \log f(x^{(t)}) \right\|^2 / 2\sigma^2 \right\}} \wedge 1$. Choice of the scaling factor $\sigma$: it should lead to an acceptance rate of 0.574 to achieve optimal convergence rates (when the components of $x$ are uncorrelated). [Roberts & Rosenthal, 1998]
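Combining the discretised Langevin proposal with this correction gives the Metropolis-adjusted Langevin algorithm (MALA). A one-dimensional sketch for the slides' example $f(x) \propto \exp(-x^4)$; the step size `sigma` is an illustrative choice:

```python
import math
import random

def mala_step(x, log_f, grad_log_f, sigma):
    """One Metropolis-adjusted Langevin step: propose
    y = x + (sigma^2/2) grad log f(x) + sigma * eps, then apply the
    MH correction with the two Gaussian proposal densities."""
    mean_x = x + 0.5 * sigma**2 * grad_log_f(x)
    y = random.gauss(mean_x, sigma)
    mean_y = y + 0.5 * sigma**2 * grad_log_f(y)
    # log of the (unnormalised) forward and backward proposal densities
    log_q_xy = -((y - mean_x) ** 2) / (2 * sigma**2)
    log_q_yx = -((x - mean_y) ** 2) / (2 * sigma**2)
    log_rho = min(0.0, log_f(y) - log_f(x) + log_q_yx - log_q_xy)
    return y if random.random() < math.exp(log_rho) else x

# Illustrative target from the slides: f(x) proportional to exp(-x^4)
log_f = lambda x: -x**4
grad_log_f = lambda x: -4.0 * x**3
x, sigma = 0.0, 0.3
chain = [x]
for _ in range(5000):
    x = mala_step(x, log_f, grad_log_f, sigma)
    chain.append(x)
```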
    • Optimizing the Acceptance Rate. Problem of the choice of the transition kernel from a practical point of view. Most common alternatives: (a) a fully automated algorithm like ARMS; (b) an instrumental density $g$ which approximates $f$, such that $f/g$ is bounded, for uniform ergodicity to apply; (c) a random walk. In both cases (b) and (c), the choice of $g$ is critical.
    • Case of the random walk. A different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it indicates that the random walk is moving too slowly on the surface of $f$. If $x^{(t)}$ and $y_t$ are close, i.e. $f(x^{(t)}) \simeq f(y_t)$, then $y_t$ is accepted with probability $\min\left\{ \dfrac{f(y_t)}{f(x^{(t)})},\; 1 \right\} \simeq 1$. For multimodal densities with well separated modes, the negative effect of limited moves on the surface of $f$ clearly shows.
    • Case of the random walk (2). If the average acceptance rate is low, the successive values of $f(y_t)$ tend to be small compared with $f(x^{(t)})$, which means that the random walk moves quickly on the surface of $f$, since it often reaches the "borders" of the support of $f$.
    • Rule of thumb. In small dimensions, aim at an average acceptance rate of 50%; in large dimensions, at an average acceptance rate of 25%. [Gelman, Gilks and Roberts, 1995] This rule is to be taken with a pinch of salt!
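One way to apply the rule of thumb is a pilot run that nudges the random-walk scale toward a target acceptance rate before the main run. A sketch; the batch sizes, adjustment factor and the 25% target are illustrative choices, and `tune_scale` is a name introduced here:

```python
import math
import random

def tune_scale(log_f, x0, target_rate=0.25, n_batches=50, batch=100):
    """Pilot-run calibration: after each batch of random-walk Metropolis
    steps, inflate the scale if the empirical acceptance rate exceeds the
    target, shrink it otherwise."""
    x, scale = x0, 1.0
    for _ in range(n_batches):
        accepts = 0
        for _ in range(batch):
            y = x + random.gauss(0.0, scale)
            if random.random() < math.exp(min(0.0, log_f(y) - log_f(x))):
                x, accepts = y, accepts + 1
        rate = accepts / batch
        scale *= math.exp(0.1 if rate > target_rate else -0.1)
    return scale

# Illustrative: standard normal target
scale = tune_scale(lambda x: -0.5 * x * x, x0=0.0)
```

Fixed-size adjustments like this are only for the pilot phase: keeping the adaptation running during the main chain would break the Markov property unless the adjustments are made to vanish.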
    • Example (Noisy AR(1) continued). For a Gaussian random walk with scale $\omega$ small enough, the random walk never jumps to the other mode. But if the scale $\omega$ is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
    • [Figure: Markov chain based on a random walk with scale $\omega = .1$.]
    • [Figure: Markov chain based on a random walk with scale $\omega = .5$.]
    • Where do we stand? MCMC in a nutshell: running a sequence $X_{t+1} = \Psi(X_t, Y_t)$ provides an approximation to the target density $f$ when the detailed balance condition holds, $f(x)\, K(x, y) = f(y)\, K(y, x)$. The easiest implementation of the principle is random walk Metropolis-Hastings, $Y_t = X^{(t)} + \varepsilon_t$. Practical convergence requires sufficient energy from the proposal, which is calibrated by trial and error.
    • Convergence diagnostics. How many iterations? Rule #1: There is no absolute number of simulations, i.e. 1,000 is neither large nor small. Rule #2: It takes [much] longer to check for convergence than for the chain itself to converge. Rule #3: MCMC is a "what-you-get-is-what-you-see" algorithm: it fails to tell about unexplored parts of the space. Rule #4: When in doubt, run MCMC chains in parallel and check for consistency. Many "quick-&-dirty" solutions in the literature, but not necessarily 100% trustworthy.
    • Example (Bimodal target). Density $f(x) = \dfrac{\left( 4(x - .3)^2 + .01 \right) \exp(-x^2/2)}{\left( 4(1 + (.3)^2) + .01 \right) \sqrt{2\pi}}$, and use of a random walk Metropolis–Hastings algorithm with variance .04. Evaluation of the missing mass by $\sum_{t=1}^{T-1} \left[ \theta_{(t+1)} - \theta_{(t)} \right] f(\theta_{(t)})$, a Riemann sum over the ordered chain values.
    • [Figure: sequence (in blue) and mass evaluation (in brown).] [Philippe & Robert, 2001]
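The mass evaluation above is a Riemann sum over the ordered chain values; values near 1 suggest the support of $f$ has been covered. A sketch for the bimodal density of the example, where the normalising constant $4(1 + .09) + .01 = 4.37$ follows from $E[4(X - .3)^2 + .01]$ under $\mathcal{N}(0, 1)$:

```python
import math
import random

def f(x):
    """Normalised bimodal target from the example."""
    return ((4 * (x - 0.3) ** 2 + 0.01) * math.exp(-0.5 * x * x)
            / (4.37 * math.sqrt(2 * math.pi)))

def riemann_mass(chain):
    """Riemann-sum mass estimate over the ordered states:
    sum of [theta_(t+1) - theta_(t)] f(theta_(t))."""
    theta = sorted(chain)
    return sum((b - a) * f(a) for a, b in zip(theta, theta[1:]))

# Random walk MH with variance .04 (scale .2), as in the example
x, chain = 0.0, [0.0]
for _ in range(2000):
    y = x + random.gauss(0.0, 0.2)
    if random.random() < min(1.0, f(y) / f(x)):
        x = y
    chain.append(x)
print(riemann_mass(chain))
```

When the chain stays stuck in one mode, the estimate plateaus well below 1, which is exactly the diagnostic use made of it in the example.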
    • Effective sample size. How many iid simulations from $\pi$ are equivalent to $N$ simulations from the MCMC algorithm? Based on the estimated $k$-th order auto-correlation, $\rho_k = \operatorname{cov}\left( x^{(t)}, x^{(t+k)} \right)$, the effective sample size is $N^{\text{ess}} = n \left( 1 + 2 \sum_{k=1}^{T_0} \hat\rho_k \right)^{-1/2}$. Only a partial indicator, which fails to signal chains stuck in one mode of the target.
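The autocorrelations are estimated empirically and the sum truncated once they die out. The sketch below uses the common convention $n / (1 + 2\sum_k \hat\rho_k)$; the slide's variant instead applies a square root to the correction factor. All names are illustrative:

```python
import random

def effective_sample_size(chain, max_lag=100):
    """ESS from empirical autocorrelations rho_hat_k, truncating the sum
    at the first negative estimate: n / (1 + 2 * sum_k rho_hat_k)."""
    n = len(chain)
    mean = sum(chain) / n
    c0 = sum((x - mean) ** 2 for x in chain) / n
    denom = 1.0
    for k in range(1, max_lag + 1):
        rho_k = sum((chain[t] - mean) * (chain[t + k] - mean)
                    for t in range(n - k)) / (n * c0)
        if rho_k < 0:  # autocorrelation has died out
            break
        denom += 2.0 * rho_k
    return n / denom

random.seed(1)
iid = [random.gauss(0.0, 1.0) for _ in range(2000)]
print(effective_sample_size(iid))  # roughly n for an iid sequence
```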
    • Tempering. Facilitate exploration of $\pi$ by flattening the target: simulate from $\pi_\alpha(x) \propto \pi(x)^\alpha$ for $\alpha > 0$ small enough. Determine where the modal regions of $\pi$ are (possibly with parallel versions using different $\alpha$'s). Recycle simulations from $\pi(x)^\alpha$ into simulations from $\pi$ by importance sampling. Simple modification of the Metropolis–Hastings algorithm, with new acceptance $\left( \dfrac{\pi(\theta' \mid x)}{\pi(\theta \mid x)} \right)^{\alpha} \dfrac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \wedge 1$.
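A sketch of the modified acceptance for a symmetric random walk, where the $q$-ratio cancels and only the ratio of targets raised to the power $\alpha$ remains; the bimodal target and all parameter values are illustrative:

```python
import math
import random

def tempered_rw_step(x, log_pi, alpha, scale):
    """Random-walk MH step targeting pi(x)^alpha: flattening the target
    with alpha < 1 makes mode-to-mode moves easier."""
    y = x + random.gauss(0.0, scale)
    # symmetric proposal: acceptance is (pi(y)/pi(x))^alpha, capped at 1
    if random.random() < math.exp(min(0.0, alpha * (log_pi(y) - log_pi(x)))):
        return y
    return x

# Illustrative bimodal target: equal-weight N(-3,1) and N(3,1) mixture
log_pi = lambda x: math.log(math.exp(-0.5 * (x - 3) ** 2)
                            + math.exp(-0.5 * (x + 3) ** 2))
x = 3.0
chain = [x]
for _ in range(20_000):
    x = tempered_rw_step(x, log_pi, alpha=0.2, scale=1.0)
    chain.append(x)
```

With $\alpha = 0.2$ the chain hops between the two modes; the resulting draws target $\pi^\alpha$, not $\pi$, so they must be reweighted by importance sampling as described above before being used for $\pi$.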
    • [Figure: tempering with the mean mixture, for $\alpha = 1, 0.5, 0.2$, in the $(\mu_1, \mu_2)$ plane.]