Séminaire de Physique à Besançon, Nov. 22, 2012
A pot-pourri of different sets of slides on ABC, MCMC, and their applications to cosmology and model choice.
Presentation Transcript

    • MCMC and likelihood-free methods. Christian P. Robert, Université Paris-Dauphine, IUF, & CREST. Université de Besançon, November 22, 2012
    • Outline: Computational issues in Bayesian cosmology; The Metropolis-Hastings Algorithm; The Gibbs Sampler; Approximate Bayesian computation
    • Statistical problems in cosmology. Potentially high-dimensional parameter space [not considered here]. Immensely slow computation of likelihoods, e.g. WMAP, CMB, because of numerically costly spectral transforms [the data is a Fortran program]. Nonlinear dependence and degeneracies between parameters introduced by physical constraints or theoretical assumptions.
    • Cosmological data. Posterior distribution of cosmological parameters for recent observational data of CMB anisotropies (differences in temperature from directions) [WMAP], SNIa, and cosmic shear. Combination of three likelihoods, some of which are available as public (Fortran) code, and of a uniform prior on a hypercube.
    • Cosmology parameters. Parameters for the cosmology likelihood (C = CMB, S = SNIa, L = lensing):
      Symbol | Description                  | Minimum | Maximum | Experiment
      Ωb     | Baryon density               | 0.01    | 0.1     | C L
      Ωm     | Total matter density         | 0.01    | 1.2     | C S L
      w      | Dark-energy eq. of state     | -3.0    | 0.5     | C S L
      ns     | Primordial spectral index    | 0.7     | 1.4     | C L
      ∆²R    | Normalization (large scales) |         |         | C
      σ8     | Normalization (small scales) |         |         | C L
      h      | Hubble constant              |         |         | C L
      τ      | Optical depth                |         |         | C
      M      | Absolute SNIa magnitude      |         |         | S
      α      | Colour response              |         |         | S
      β      | Stretch response             |         |         | S
      a      | Galaxy z-distribution fit    |         |         | L
      b      | Galaxy z-distribution fit    |         |         | L
      c      | Galaxy z-distribution fit    |         |         | L
      For WMAP5, σ8 is a deduced quantity that depends on the other parameters.
    • Adaptation of importance function. [Benabed et al., MNRAS, 2010]
    • Estimates. Means and 68% credible intervals using lensing, SNIa and CMB:
      Parameter | PMC                    | MCMC
      Ωb        | 0.0432 +0.0027/−0.0024 | 0.0432 +0.0026/−0.0023
      Ωm        | 0.254 +0.018/−0.017    | 0.253 +0.018/−0.016
      τ         | 0.088 +0.018/−0.016    | 0.088 +0.019/−0.015
      w         | −1.011 ± 0.060         | −1.010 +0.059/−0.060
      ns        | 0.963 +0.015/−0.014    | 0.963 +0.015/−0.014
      10⁹ ∆²R   | 2.413 +0.098/−0.093    | 2.414 +0.098/−0.092
      h         | 0.720 +0.022/−0.021    | 0.720 +0.023/−0.021
      a         | 0.648 +0.040/−0.041    | 0.649 +0.043/−0.042
      b         | 9.3 +1.4/−0.9          | 9.3 +1.7/−0.9
      c         | 0.639 +0.084/−0.070    | 0.639 +0.082/−0.070
      −M        | 19.331 ± 0.030         | 19.332 +0.029/−0.031
      α         | 1.61 +0.15/−0.14       | 1.62 +0.16/−0.14
      −β        | −1.82 +0.17/−0.16      | −1.82 ± 0.16
      σ8        | 0.795 +0.028/−0.030    | 0.795 +0.030/−0.027
    • Evidence/Marginal likelihood/Integrated likelihood. Central quantity of interest in (Bayesian) model choice: E = ∫ π(x) dx = ∫ [π(x)/q(x)] q(x) dx, i.e. an expectation under any density q with large enough support. Importance sampling provides a sample x1, …, xN ~ q and an approximation of the above integral, E ≈ (1/N) Σ_{n=1}^N wn, where the wn = π(xn)/q(xn) are the (unnormalised) importance weights.
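The importance-sampling identity for the evidence is easy to check numerically. Below is a minimal Python sketch (not from the slides; the unnormalised Gaussian target, the N(0, 4) proposal, and the function names are illustrative choices), estimating E = ∫ exp(−x²/2) dx, whose exact value is √(2π) ≈ 2.5066:

```python
import numpy as np

def evidence_is(log_tilde_pi, q_sample, q_logpdf, n=100_000, rng=None):
    """Importance-sampling estimate of E = int pi_tilde(x) dx:
    draw x_1..x_N ~ q and average the unnormalised weights
    w_n = pi_tilde(x_n) / q(x_n)."""
    rng = rng or np.random.default_rng(0)
    x = q_sample(rng, n)
    log_w = log_tilde_pi(x) - q_logpdf(x)
    return np.exp(log_w).mean()

# Toy check: pi_tilde(x) = exp(-x^2/2) has evidence sqrt(2*pi).
# The proposal q = N(0, 2^2) has heavier tails than the target.
est = evidence_is(
    log_tilde_pi=lambda x: -0.5 * x**2,
    q_sample=lambda rng, n: rng.normal(0.0, 2.0, n),
    q_logpdf=lambda x: -0.5 * (x / 2.0)**2 - np.log(2.0 * np.sqrt(2 * np.pi)),
)
```

The proposal must dominate the target in the tails for the weights to have finite variance, which is the "large enough support" requirement above.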
    • Back to cosmology questions. Standard cosmology is successful in explaining recent observations, such as CMB, SNIa, galaxy clustering, cosmic shear, galaxy cluster counts, and Lyα forest clustering: a flat ΛCDM model with only six free parameters (Ωm, Ωb, h, ns, τ, σ8). Extensions to ΛCDM may be based on independent evidence (massive neutrinos from oscillation experiments), predicted by compelling hypotheses (primordial gravitational waves from inflation), or reflect ignorance about fundamental physics (dynamical dark energy). Testing for dark energy, curvature, and inflationary models.
    • Extended models. Focus on the dark-energy equation-of-state parameter, modeled as w = −1 (ΛCDM), w = w0 (wCDM), or w = w0 + w1(1 − a) (w(z)CDM). In addition, the curvature parameter ΩK for each of the above is either ΩK = 0 (‘flat’) or ΩK ≠ 0 (‘curved’). This choice of models represents the simplest models beyond a “cosmological constant” model able to explain the observed, recent accelerated expansion of the Universe.
    • Cosmology priors. Prior ranges for dark energy and curvature models (in the case of w(a) models, the prior on w1 depends on w0):
      Parameter | Description               | Min.    | Max.
      Ωm        | Total matter density      | 0.15    | 0.45
      Ωb        | Baryon density            | 0.01    | 0.08
      h         | Hubble parameter          | 0.5     | 0.9
      ΩK        | Curvature                 | −1      | 1
      w0        | Constant dark-energy par. | −1      | −1/3
      w1        | Linear dark-energy par.   | −1 − w0 | (−1/3 − w0)/(1 − a_acc)
    • Results. In most cases, evidence in favour of the standard model, especially when more datasets/experiments are combined. The largest evidence is ln B12 = 1.8, for the w(z)CDM model and CMB alone: a case where a large part of the prior range is still allowed by the data, and a region of comparable size is excluded. Hence weak evidence that both w0 and w1 are required, but excluded when adding SNIa and BAO datasets. Results on the curvature are compatible with current findings: non-flat Universe(s) strongly disfavoured for the three dark-energy cases.
    • Evidence. [Figure.]
    • Posterior outcome. Posterior on dark-energy parameters w0 and w1 as 68%- and 95%-credible regions for WMAP (solid blue lines), WMAP+SNIa (dashed green) and WMAP+SNIa+BAO (dotted red curves). Allowed prior range shown as red straight lines.
    • The Metropolis-Hastings Algorithm
    • General purpose. A major computational issue in Bayesian statistics: given a density π known up to a normalizing constant, π ∝ π̃, and an integrable function h, compute Π(h) = ∫ h(x)π(x)µ(dx) = ∫ h(x)π̃(x)µ(dx) / ∫ π̃(x)µ(dx), when ∫ h(x)π̃(x)µ(dx) is intractable.
    • Monte Carlo 101. Generate an iid sample x1, …, xN from π and estimate Π(h) by Π̂N(h) = (1/N) Σ_{i=1}^N h(xi). LLN: Π̂N(h) → Π(h) almost surely. If Π(h²) = ∫ h²(x)π(x)µ(dx) < ∞, CLT: √N (Π̂N(h) − Π(h)) converges in law to N(0, Π([h − Π(h)]²)). Caveat leading to MCMC: it is often impossible or inefficient to simulate directly from Π.
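The LLN and CLT statements above can be checked in a few lines of Python (a toy illustration of my own, not from the slides; h(x) = x² under the N(0, 1) target, where Π(h) = 1 exactly):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200_000
x = rng.standard_normal(N)   # iid sample x_1..x_N from pi = N(0, 1)
h = x**2                     # h(x) = x^2, so Pi(h) = E[X^2] = 1
est = h.mean()               # Monte Carlo estimate (1/N) sum h(x_i)
# CLT-based 95% half-width: 1.96 * sd(h) / sqrt(N), shrinking as N^{-1/2}
half = 1.96 * h.std(ddof=1) / np.sqrt(N)
```

With this seed, `est` falls within `half` of the exact value 1, illustrating the N^{-1/2} convergence rate.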
    • Running Monte Carlo via Markov Chains (MCMC). It is not necessary to use a sample from the distribution f to approximate the integral I = ∫ h(x)f(x)dx [notation warning: π turned into f!]. We can obtain X1, …, Xn ~ f (approximately) without directly simulating from f, using an ergodic Markov chain with stationary distribution f. [Andreï Markov]
    • Running Monte Carlo via Markov Chains (2). Idea: for an arbitrary starting value x(0), an ergodic chain (X(t)) is generated using a transition kernel with stationary distribution f. An irreducible Markov chain with stationary distribution f is ergodic with limiting distribution f under weak conditions, hence convergence in distribution of (X(t)) to a random variable from f: for T0 “large enough”, X(T0) is distributed from f. The Markov sequence is a dependent sample X(T0), X(T0+1), … generated from f; Birkhoff’s ergodic theorem extends the LLN, sufficient for most approximation purposes. Problem: how can one build a Markov chain with a given stationary distribution?
    • The Metropolis–Hastings algorithm. Arguments: the algorithm uses the objective (target) density f and a conditional density q(y|x) called the instrumental (or proposal) distribution. [Nicholas Metropolis]
    • The MH algorithm. Algorithm (Metropolis–Hastings): given x(t),
      1. Generate Yt ~ q(y|x(t)).
      2. Take X(t+1) = Yt with probability ρ(x(t), Yt), and X(t+1) = x(t) with probability 1 − ρ(x(t), Yt),
      where ρ(x, y) = min{ [f(y)/f(x)] [q(x|y)/q(y|x)], 1 }.
    • Features. Independent of normalizing constants for both f and q(·|x) (i.e., those constants independent of x). Never moves to values with f(y) = 0. The chain (x(t))t may take the same value several times in a row, even though f is a density w.r.t. Lebesgue measure. The sequence (yt)t is usually not a Markov chain.
    • Convergence properties. 1. The M-H Markov chain is reversible, with invariant/stationary density f, since it satisfies the detailed balance condition f(y)K(y, x) = f(x)K(x, y). 2. As f is a probability measure, the chain is positive recurrent. 3. If Pr[ f(Yt)q(X(t)|Yt) ≥ f(X(t))q(Yt|X(t)) ] < 1, (1) that is, if the event {X(t+1) = X(t)} is possible, then the chain is aperiodic.
    • Random walk Metropolis–Hastings. Use of a local perturbation as proposal: Yt = X(t) + εt, where εt ~ g, independent of X(t). The instrumental density is of the form g(y − x), and the Markov chain is a random walk if we take g to be symmetric: g(x) = g(−x).
    • Random walk Metropolis–Hastings [code]. Algorithm (Random walk Metropolis): given x(t),
      1. Generate Yt ~ g(y − x(t)).
      2. Take X(t+1) = Yt with probability min{1, f(Yt)/f(x(t))}, and X(t+1) = x(t) otherwise.
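The two steps above can be sketched in a few lines of Python (a toy illustration, not the slides' code; the Gaussian proposal, the N(0, 1) target and the scale 2.4 are illustrative choices):

```python
import numpy as np

def rw_metropolis(log_f, x0, scale, n_iter, rng):
    """Random-walk Metropolis: propose Y_t = x^(t) + eps_t with
    eps_t ~ N(0, scale^2), accept with probability min(1, f(Y_t)/f(x^(t)))."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + scale * rng.standard_normal()
        # comparing logs avoids overflow for small densities
        if np.log(rng.uniform()) < log_f(y) - log_f(x):
            x = y
        chain[t] = x
    return chain

# Toy target: f = N(0, 1), with known mean 0 and variance 1.
rng = np.random.default_rng(1)
chain = rw_metropolis(lambda x: -0.5 * x**2, x0=0.0, scale=2.4,
                      n_iter=50_000, rng=rng)
```

Only the ratio f(Yt)/f(x(t)) is needed, so the normalizing constant of f never appears, as noted in the Features slide.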
    • Langevin Algorithms. Proposal based on the Langevin diffusion Lt, defined by the stochastic differential equation dLt = dBt + (1/2)∇log f(Lt)dt, where Bt is the standard Brownian motion. Theorem: the Langevin diffusion is the only non-explosive diffusion which is reversible with respect to f.
    • Discretization. Instead, consider the sequence x(t+1) = x(t) + (σ²/2)∇log f(x(t)) + σεt, εt ~ Np(0, Ip), where σ² corresponds to the discretization step. Unfortunately, the discretized chain may be transient, for instance when lim_{x→±∞} σ² |∇log f(x)| |x|⁻¹ > 1.
    • MH correction. Accept the new value Yt with probability
      [f(Yt)/f(x(t))] · exp(−‖x(t) − Yt − (σ²/2)∇log f(Yt)‖²/(2σ²)) / exp(−‖Yt − x(t) − (σ²/2)∇log f(x(t))‖²/(2σ²)) ∧ 1.
      Choice of the scaling factor σ: it should lead to an acceptance rate of 0.574 to achieve optimal convergence rates (when the components of x are uncorrelated). [Roberts & Rosenthal, 1998; Girolami & Calderhead, 2011]
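Combining the discretized Langevin proposal with this MH correction gives the Metropolis-adjusted Langevin algorithm; here is a hypothetical Python sketch on a standard normal target (the step size σ = 1.2 and all names are illustrative choices, not from the slides):

```python
import numpy as np

def mala(log_f, grad_log_f, x0, sigma, n_iter, rng):
    """Metropolis-adjusted Langevin: propose
    Y = x + (sigma^2/2) grad log f(x) + sigma * eps, eps ~ N(0, I),
    then accept with the MH ratio using the Gaussian proposal densities."""
    def log_q(y, x):  # log proposal density of y given x (up to a constant)
        m = x + 0.5 * sigma**2 * grad_log_f(x)
        return -np.sum((y - m)**2) / (2 * sigma**2)
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        y = x + 0.5 * sigma**2 * grad_log_f(x) + sigma * rng.standard_normal(x.size)
        log_alpha = log_f(y) - log_f(x) + log_q(x, y) - log_q(y, x)
        if np.log(rng.uniform()) < log_alpha:
            x = y
        chain[t] = x
    return chain

# Toy target: standard normal, log f(x) = -x'x/2, grad log f(x) = -x.
rng = np.random.default_rng(2)
chain = mala(lambda x: -0.5 * float(x @ x), lambda x: -x,
             x0=np.zeros(1), sigma=1.2, n_iter=30_000, rng=rng)
```

Because the drifted proposal is not symmetric, the forward and backward proposal densities do not cancel and both exponential terms above must be kept.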
    • Optimizing the Acceptance Rate. Problem of choosing the transition kernel q from a practical point of view. Most common solutions: (a) a fully automated algorithm like ARMS [Gilks & Wild, 1992]; (b) an instrumental density g which approximates f, such that f/g is bounded for uniform ergodicity to apply; (c) a random walk. In both cases (b) and (c), the choice of g is critical.
    • Case of the random walk. A different approach to acceptance rates: a high acceptance rate does not indicate that the algorithm is moving correctly, since it may indicate that the random walk is moving too slowly on the surface of f. If x(t) and yt are close, i.e. f(x(t)) ≃ f(yt), y is accepted with probability min{f(yt)/f(x(t)), 1} ≃ 1. For multimodal densities with well-separated modes, the negative effect of limited moves on the surface of f clearly shows.
    • Case of the random walk (2). If the average acceptance rate is low, the successive values of f(yt) tend to be small compared with f(x(t)), which means that the random walk moves quickly on the surface of f, since it often reaches the “borders” of the support of f.
    • Rule of thumb. In small dimensions, aim at an average acceptance rate of 50%; in large dimensions, at an average acceptance rate of 25%. [Gelman, Gilks and Roberts, 1995] Warning: rule to be taken with a pinch of salt!
    • Role of scale. Example (Noisy AR(1)): hidden Markov chain from a regular AR(1) model, x_{t+1} = φx_t + ε_{t+1}, ε_t ~ N(0, τ²), and observables y_t | x_t ~ N(x_t², σ²). The distribution of x_t given x_{t−1}, x_{t+1} and y_t is proportional to exp{ −[(x_t − φx_{t−1})² + (x_{t+1} − φx_t)²]/(2τ²) − (y_t − x_t²)²/(2σ²) }.
    • Role of scale. Example (Noisy AR(1), continued): for a Gaussian random walk with scale ω small enough, the random walk never jumps to the other mode. But if the scale ω is sufficiently large, the Markov chain explores both modes and gives a satisfactory approximation of the target distribution.
    • Role of scale. [Figures: Markov chains based on random walks with scales ω = .1 and ω = .5.]
    • The Gibbs Sampler
    • General Principles. A very specific simulation algorithm based on the target distribution f:
      1. Uses the conditional densities f1, …, fp from f.
      2. Start with the random variable X = (X1, …, Xp).
      3. Simulate from the conditional densities, Xi | x1, x2, …, x_{i−1}, x_{i+1}, …, xp ~ fi(xi | x1, x2, …, x_{i−1}, x_{i+1}, …, xp) for i = 1, 2, …, p.
    • Gibbs code. Algorithm (Gibbs sampler): given x(t) = (x1(t), …, xp(t)), generate
      1. X1(t+1) ~ f1(x1 | x2(t), …, xp(t));
      2. X2(t+1) ~ f2(x2 | x1(t+1), x3(t), …, xp(t));
      …
      p. Xp(t+1) ~ fp(xp | x1(t+1), …, x(p−1)(t+1)).
      Then X(t+1) → X ~ f.
    • Properties. The full conditional densities f1, …, fp are the only densities used for simulation. Thus, even in a high-dimensional problem, all of the simulations may be univariate.
    • Toy example: iid N(µ, σ²) variates. When Y1, …, Yn ~ N(y|µ, σ²) iid with both µ and σ unknown, the posterior in (µ, σ²) is conjugate outside a standard family. But µ | Y1:n, σ² ~ N( (1/n) Σ_{i=1}^n Yi, σ²/n ) and σ² | Y1:n, µ ~ IG( n/2 − 1, (1/2) Σ_{i=1}^n (Yi − µ)² ), assuming constant (improper) priors on both µ and σ². Hence we may use the Gibbs sampler for simulating from the posterior of (µ, σ²).
    • Toy example: R code. Gibbs sampler for the Gaussian posterior:
      n = length(Y); S = sum(Y); mu = S/n
      for (i in 1:500) {
        S2 = sum((Y - mu)^2)
        sigma2 = 1/rgamma(1, n/2 - 1, rate = S2/2)
        mu = S/n + sqrt(sigma2/n) * rnorm(1)
      }
    • Example of results with n = 10 observations from the N(0, 1) distribution. [Figures shown after 1, 2, 3, 4, 5, 10, 25, 50, 100, and 500 iterations.]
    • Limitations of the Gibbs sampler. Formally, a special case of a sequence of 1-D M-H kernels, all with acceptance rate uniformly equal to 1. The Gibbs sampler
      1. limits the choice of instrumental distributions;
      2. requires some knowledge of f;
      3. is, by construction, multidimensional;
      4. does not apply to problems where the number of parameters varies, as the resulting chain is not irreducible.
    • A wee problem. [Figures: posterior surface in (µ1, µ2); Gibbs started at random; Gibbs stuck at the wrong mode.]
    • Slice sampler as generic Gibbs. If f(θ) can be written as a product ∏_{i=1}^k fi(θ), it can be completed as ∏_{i=1}^k I{0 ≤ ωi ≤ fi(θ)}, leading to the following Gibbs algorithm:
    • Slice sampler (code). Algorithm (Slice sampler): simulate
      1. ω1(t+1) ~ U[0, f1(θ(t))];
      …
      k. ωk(t+1) ~ U[0, fk(θ(t))];
      k+1. θ(t+1) ~ U_{A(t+1)}, with A(t+1) = {y : fi(y) ≥ ωi(t+1), i = 1, …, k}.
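For the simplest case of a single factor (k = 1), the slice sampler is a few lines of Python; this toy sketch (my own, not from the slides) targets f(x) ∝ exp(−x²/2), for which the slice {y : f(y) ≥ ω} is an explicit interval:

```python
import numpy as np

def slice_normal(n_iter, rng, x0=0.0):
    """Slice sampler for f(x) = exp(-x^2/2) (one factor, k = 1):
    1. omega ~ U[0, f(x)]
    2. x ~ U on the slice {y : f(y) >= omega} = [-r, r],
       with r = sqrt(-2 log omega)."""
    x = x0
    out = np.empty(n_iter)
    for t in range(n_iter):
        omega = rng.uniform(0.0, np.exp(-0.5 * x**2))
        r = np.sqrt(-2.0 * np.log(omega))
        x = rng.uniform(-r, r)
        out[t] = x
    return out

rng = np.random.default_rng(3)
sample = slice_normal(50_000, rng)
```

For a general target, the slice is rarely available in closed form and must itself be explored, e.g. by stepping-out procedures; the point of the example is only the two uniform conditional steps.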
    • Example of results with a truncated N(−3, 1) distribution. [Figures shown after 2, 3, 4, 5, 10, 50, and 100 iterations.]
    • Approximate Bayesian computation
    • Regular Bayesian computation issues. Recap: when faced with a non-standard posterior distribution π(θ|y) ∝ π(θ)L(θ|y), the standard solution is to use simulation (Monte Carlo) to produce a sample θ1, …, θT from π(θ|y) (or approximately, by Markov chain Monte Carlo methods). [Robert & Casella, 2004]
    • Intractable likelihoods. Cases when the likelihood function f(y|θ) is unavailable (in analytic and numerical senses) and when the completion step f(y|θ) = ∫_Z f(y, z|θ) dz is impossible or too costly because of the dimension of z: MCMC cannot be implemented!
    • Illustration. Phylogenetic tree: in population genetics, reconstitution of a common ancestor from a sample of genes via a phylogenetic tree that is close to impossible to integrate out [100 processor days with 4 parameters]. [Cornuet et al., 2009, Bioinformatics]
    • Illustration: demo-genetic inference. Different possible scenarios; scenario choice by ABC. Genetic model of evolution from a common ancestor (MRCA) characterized by a set of parameters that cover historical, demographic, and genetic factors. Dataset of polymorphism (DNA sample) observed at the present time. Scenario 1a is largely supported relative to the others: it argues for a common origin of the West African pygmy populations. [Verdu et al., 2009]
    • Illustration: pygmy population demo-genetics. Pygmy populations: do they have a common origin? When and how did they split from non-pygmy populations? Were there more recent interactions between pygmy and non-pygmy populations?
    • The ABC method. Bayesian setting: the target is π(θ)f(x|θ). When the likelihood f(x|θ) is not available in closed form, a likelihood-free rejection technique applies. ABC algorithm: for an observation y ~ f(y|θ), under the prior π(θ), keep jointly simulating θ' ~ π(θ), z ~ f(z|θ'), until the auxiliary variable z is equal to the observed value, z = y. [Tavaré et al., 1997]
    • Why does it work?! The proof is trivial: f(θi) ∝ Σ_{z∈D} π(θi)f(z|θi)Iy(z) ∝ π(θi)f(y|θi) = π(θi|y). [Accept–Reject 101]
    • A as in approximative. When y is a continuous random variable, the equality z = y is replaced with a tolerance condition ρ(y, z) ≤ ε, where ρ is a distance. The output is then distributed from π(θ)Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε).
    • ABC algorithm. Algorithm (likelihood-free rejection sampler):
      for i = 1 to N do
        repeat
          generate θ' from the prior distribution π(·)
          generate z from the likelihood f(·|θ')
        until ρ{η(z), η(y)} ≤ ε
        set θi = θ'
      end for
      where η(y) defines a (not necessarily sufficient) statistic.
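The rejection sampler above translates directly into Python; in this hypothetical sketch (function names and the conjugate toy model are illustrative choices, not from the slides), the exact posterior N(1/2, 1/2) is available for comparison:

```python
import numpy as np

def abc_rejection(y_obs, prior_sample, simulate, distance, eps, n_keep, rng):
    """Likelihood-free rejection sampler: keep theta' ~ prior whenever a
    pseudo-dataset z ~ f(.|theta') satisfies distance(z, y_obs) < eps."""
    kept = []
    while len(kept) < n_keep:
        theta = prior_sample(rng)
        z = simulate(theta, rng)
        if distance(z, y_obs) < eps:
            kept.append(theta)
    return np.array(kept)

# Toy conjugate check: y ~ N(theta, 1), theta ~ N(0, 1), observed y = 1.
# The exact posterior is N(1/2, 1/2); ABC with small eps should be close.
rng = np.random.default_rng(4)
post = abc_rejection(
    y_obs=1.0,
    prior_sample=lambda rng: rng.standard_normal(),
    simulate=lambda theta, rng: theta + rng.standard_normal(),
    distance=lambda z, y: abs(z - y),
    eps=0.05,
    n_keep=2_000,
    rng=rng,
)
```

Note the trade-off made explicit by the loop: a smaller ε improves the approximation of the posterior but lowers the acceptance rate, hence the advances discussed in the following slides.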
• Output
The likelihood-free algorithm samples from the marginal in z of
  π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,
where A_{ε,y} = {z ∈ D | ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
  π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) .
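One can see the approximation at work by comparing ABC output with the exact conjugate posterior in a toy Gaussian model where the latter is available in closed form (all model choices below are illustrative assumptions): shrinking ε brings the ABC sample closer to the true posterior, while a loose ε leaves it over-dispersed.

```python
import numpy as np

rng = np.random.default_rng(1)
n, prior_var = 50, 10.0
y = rng.normal(2.0, 1.0, size=n)

# Exact conjugate posterior for theta | y with unit-variance data and N(0, prior_var) prior
post_var = 1.0 / (n + 1.0 / prior_var)
post_mean = post_var * y.sum()

def abc_sample(eps, n_keep=300):
    """ABC rejection with tolerance eps on the sample-mean summary."""
    out = []
    while len(out) < n_keep:
        th = rng.normal(0.0, np.sqrt(prior_var))
        z = rng.normal(th, 1.0, size=n)
        if abs(z.mean() - y.mean()) < eps:
            out.append(th)
    return np.array(out)

loose, tight = abc_sample(1.0), abc_sample(0.05)
# tight concentrates near post_mean; loose is visibly over-dispersed
```
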
• Pima Indian benchmark
[Figure: comparison between density estimates of the marginals on β_1 (left), β_2 (center) and β_3 (right) from ABC rejection samples (red) and MCMC samples (black).]
• ABC advances
Simulating from the prior is often poor in efficiency.
Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y... [Marjoram et al., 2003; Bortot et al., 2007; Sisson et al., 2007]
...or view the problem as conditional density estimation and develop techniques that allow for a larger ε [Beaumont et al., 2002]
...or even include ε in the inferential framework [ABCμ] [Ratmann et al., 2009]
• ABC-MCMC
Markov chain (θ^(t)) created via the transition function
  θ^(t+1) = θ′  if θ′ ∼ K_ω(θ′|θ^(t)), x ∼ f(x|θ′) is such that x = y, and u ∼ U(0,1) satisfies u ≤ π(θ′) K_ω(θ^(t)|θ′) / [π(θ^(t)) K_ω(θ′|θ^(t))] ,
  θ^(t+1) = θ^(t)  otherwise;
it has the posterior π(θ|y) as stationary distribution. [Marjoram et al., 2003]
• ABC-MCMC (2)
Algorithm 2: Likelihood-free MCMC sampler
  Use Algorithm 1 to get (θ^(0), z^(0))
  for t = 1 to N do
    generate θ′ from K_ω(·|θ^(t−1))
    generate z′ from the likelihood f(·|θ′)
    generate u from U[0,1]
    if u ≤ π(θ′) K_ω(θ^(t−1)|θ′) / [π(θ^(t−1)) K_ω(θ′|θ^(t−1))] · I_{A_{ε,y}}(z′) then
      set (θ^(t), z^(t)) = (θ′, z′)
    else
      (θ^(t), z^(t)) = (θ^(t−1), z^(t−1))
    end if
  end for
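A minimal ABC-MCMC sketch in Python, again on an assumed toy Gaussian model (data N(θ, 1), prior θ ∼ N(0, 10), sample-mean summary). With a symmetric random-walk kernel K_ω, the Hastings ratio reduces to the prior ratio:

```python
import numpy as np

def abc_mcmc(y, n_iter, eps, step, rng):
    """ABC-MCMC sampler in the spirit of Marjoram et al. (2003) for the toy model."""
    n = len(y)
    s_obs = y.mean()
    # initialise with Algorithm 1: draw from the prior until the tolerance ball is hit
    while True:
        theta = rng.normal(0.0, np.sqrt(10.0))
        if abs(rng.normal(theta, 1.0, size=n).mean() - s_obs) < eps:
            break
    log_prior = lambda t: -t * t / 20.0            # N(0, 10) prior, up to a constant
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()         # theta' ~ K_omega(.|theta), symmetric
        z = rng.normal(prop, 1.0, size=n)          # z ~ f(.|theta')
        # accept iff the pseudo-data lands in A_eps,y AND u passes the prior ratio
        if abs(z.mean() - s_obs) < eps and \
           np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
            theta = prop
        chain[t] = theta
    return chain

rng = np.random.default_rng(3)
y = rng.normal(2.0, 1.0, size=50)
chain = abc_mcmc(y, n_iter=4000, eps=0.2, step=0.5, rng=rng)
```

Note that rejections due to the pseudo-data missing the tolerance ball repeat the current value, exactly as in the transition above.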
• Sequential Monte Carlo
SMC is a simulation technique to approximate a sequence of related probability distributions π_n, with π_0 "easy" and π_T the target.
Iterated IS as in PMC: particles moved from time n−1 to time n via a kernel K_n, with a sequence of extended targets π̃_n,
  π̃_n(z_{0:n}) = π_n(z_n) ∏_{j=0}^{n−1} L_j(z_{j+1}, z_j) ,
where the L_j's are backward Markov kernels [check that π_n(z_n) is a marginal]. [Del Moral, Doucet & Jasra, Series B, 2006]
• Sequential Monte Carlo (2)
Algorithm 3: SMC sampler [Del Moral, Doucet & Jasra, Series B, 2006]
  sample z_i^(0) ∼ γ_0(x) (i = 1, ..., N)
  compute weights w_i^(0) = π_0(z_i^(0)) / γ_0(z_i^(0))
  for t = 1 to N do
    if ESS(w^(t−1)) < N_T then
      resample N particles z^(t−1) and set weights to 1
    end if
    generate z_i^(t) ∼ K_t(z_i^(t−1), ·) and set weights to
      w_i^(t) = w_i^(t−1) π_t(z_i^(t)) L_{t−1}(z_i^(t), z_i^(t−1)) / [π_{t−1}(z_i^(t−1)) K_t(z_i^(t−1), z_i^(t))]
  end for
• ABC-SMC
[Del Moral, Doucet & Jasra, 2009]
True derivation of an SMC-ABC algorithm.
Use of a kernel K_n associated with target π_{ε_n} and derivation of the backward kernel
  L_{n−1}(z, z′) = π_{ε_n}(z′) K_n(z′, z) / π_{ε_n}(z) .
Update of the weights
  w_{in} ∝ w_{i(n−1)} [Σ_{m=1}^M I_{A_{ε_n}}(x_{in}^m)] / [Σ_{m=1}^M I_{A_{ε_{n−1}}}(x_{i(n−1)}^m)] ,
when x_{in}^m ∼ K(x_{i(n−1)}, ·).
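The weight update can be written as a standalone step; below is a minimal sketch, where the array layout distances[i, m] = ρ(η(x_i^m), η(y)) is an assumption for illustration.

```python
import numpy as np

def abc_smc_reweight(w_prev, distances, eps_new, eps_old):
    """One ABC-SMC reweighting step (Del Moral, Doucet & Jasra, 2009):
    w_i^n proportional to w_i^{n-1} * #{m : d_im < eps_n} / #{m : d_im < eps_{n-1}},
    where distances[i, m] is the discrepancy of the m-th pseudo-dataset of particle i."""
    hits_new = (distances < eps_new).sum(axis=1)
    hits_old = (distances < eps_old).sum(axis=1)
    # particles with no hit at the previous tolerance get weight zero
    w = np.where(hits_old > 0, w_prev * hits_new / np.maximum(hits_old, 1), 0.0)
    s = w.sum()
    return w / s if s > 0 else w

# Two particles, M = 2 pseudo-datasets each; shrink the tolerance from 1.0 to 0.4
w = abc_smc_reweight(np.array([0.5, 0.5]),
                     np.array([[0.1, 0.3], [0.5, 0.9]]),
                     eps_new=0.4, eps_old=1.0)
# particle 2 has no pseudo-dataset within the new tolerance, so its weight drops to 0
```
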