Yes III: Computational methods for model choice


These are the slides I gave during three lectures at the YES III meeting in Eindhoven, October 5-7, 2009. They include a presentation of the Savage–Dickey paradox and a comparison of the major importance sampling solutions, both obtained in collaboration with Jean-Michel Marin.


  1. On some computational methods for Bayesian model choice
     Christian P. Robert, CREST-INSEE and Université Paris Dauphine
     http://www.ceremade.dauphine.fr/~xian
     YES III workshop, Eindhoven, October 7, 2009
     Joint works with J.-M. Marin, M. Beaumont, N. Chopin, J.-M. Cornuet, and D. Wraith
  2. Outline
     1 Introduction
     2 Importance sampling solutions
     3 Cross-model solutions
     4 Nested sampling
     5 ABC model choice
  3. Construction of Bayes tests
     Definition (Test). Given an hypothesis H0 : θ ∈ Θ0 on the parameter θ of a statistical model, a test is a statistical procedure that takes its values in {0, 1}.
     Theorem (Bayes test). The Bayes estimator associated with π and with the 0-1 loss is
     $$ \delta^\pi(x) = \begin{cases} 1 & \text{if } \pi(\theta \in \Theta_0 \mid x) > \pi(\theta \notin \Theta_0 \mid x), \\ 0 & \text{otherwise.} \end{cases} $$
  5. Bayes factor
     Definition (Bayes factor). For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0, under the prior
     $$ \pi(\Theta_0)\,\pi_0(\theta) + \pi(\Theta_0^c)\,\pi_1(\theta)\,, $$
     the central quantity is
     $$ B_{01} = \frac{\pi(\Theta_0 \mid x)}{\pi(\Theta_0^c \mid x)} \Big/ \frac{\pi(\Theta_0)}{\pi(\Theta_0^c)} = \frac{\int_{\Theta_0} f(x\mid\theta)\,\pi_0(\theta)\,d\theta}{\int_{\Theta_0^c} f(x\mid\theta)\,\pi_1(\theta)\,d\theta} $$
     [Jeffreys, 1939]
  6. Self-contained concept
     - Outside the decision-theoretic environment: eliminates the impact of π(Θ0) but depends on the choice of (π0, π1)
     - Bayesian/marginal equivalent to the likelihood ratio
     - Jeffreys' scale of evidence:
       - if $\log_{10}(B_{10}^\pi)$ is between 0 and 0.5, the evidence against H0 is weak,
       - if $\log_{10}(B_{10}^\pi)$ is between 0.5 and 1, the evidence is substantial,
       - if $\log_{10}(B_{10}^\pi)$ is between 1 and 2, the evidence is strong, and
       - if $\log_{10}(B_{10}^\pi)$ is above 2, the evidence is decisive
     - Requires the computation of the marginal/evidence under both hypotheses/models
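This verbal scale is easy to encode; the following R helper is not part of the slides, just a minimal sketch of the cut-points listed above (the function name is ours).

    # Map log10 Bayes factors B10 to Jeffreys' verbal scale (illustrative helper)
    jeffreys_scale <- function(log10_B10) {
      cut(log10_B10, breaks = c(0, 0.5, 1, 2, Inf),
          labels = c("weak", "substantial", "strong", "decisive"),
          include.lowest = TRUE)
    }
    jeffreys_scale(c(0.3, 1.7))   # "weak"  "strong"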
  7. Model choice and model comparison
     Choice between models: several models are available for the same observation,
     $$ M_i : x \sim f_i(x \mid \theta_i), \qquad i \in I, $$
     where I can be finite or infinite.
     Replace hypotheses with models, but keep marginal likelihoods and Bayes factors.
  9. Bayesian model choice
     Probabilise the entire model/parameter space:
     - allocate probabilities pi to all models Mi,
     - define priors πi(θi) for each parameter space Θi,
     - compute
     $$ \pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x\mid\theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x\mid\theta_j)\,\pi_j(\theta_j)\,d\theta_j}\,, $$
     - take the largest π(Mi | x) to determine the "best" model, or use the averaged predictive
     $$ \sum_j \pi(M_j \mid x) \int_{\Theta_j} f_j(x' \mid \theta_j)\,\pi_j(\theta_j \mid x)\,d\theta_j\,. $$
 13. Evidence
     All these problems end up with a similar quantity, the evidence
     $$ Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,d\theta_k\,, $$
     aka the marginal likelihood.
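The crudest approximation of this integral, simulation from the prior, is the baseline ("Monte Carlo") entry in the comparisons below. A minimal sketch on a conjugate toy model (our assumption, not the slides' example) where Z is known exactly:

    # Prior-sampling estimate of the evidence Z = E_pi[L(theta)] for the toy
    # model x ~ N(theta, 1) with prior theta ~ N(0, 1): exact Z = N(x; 0, 2)
    x <- 1.3
    th <- rnorm(1e5)                  # simulations from the prior
    mean(dnorm(x, th, 1))             # compare with dnorm(x, 0, sqrt(2))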
 14. Bayes factor approximation
     When approximating the Bayes factor
     $$ B_{01} = \frac{\int_{\Theta_0} f_0(x\mid\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x\mid\theta_1)\,\pi_1(\theta_1)\,d\theta_1}\,, $$
     use of importance functions $\varpi_0$ and $\varpi_1$ and
     $$ \widehat B_{01} = \frac{n_0^{-1} \sum_{i=1}^{n_0} f_0(x\mid\theta_0^i)\,\pi_0(\theta_0^i)\big/\varpi_0(\theta_0^i)}{n_1^{-1} \sum_{i=1}^{n_1} f_1(x\mid\theta_1^i)\,\pi_1(\theta_1^i)\big/\varpi_1(\theta_1^i)}\,, \qquad \theta_j^i \sim \varpi_j\,. $$
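A self-contained toy version of this estimator (not the slides' probit example): two conjugate Gaussian models whose marginals, hence B01, are available in closed form, with the exact posteriors used as importance functions for simplicity.

    # M0: x ~ N(theta, 1), theta ~ N(0, 1); M1: x ~ N(theta, 1), theta ~ N(0, 4)
    x <- 1.3; n0 <- n1 <- 1e5
    th0 <- rnorm(n0, x/2, sqrt(1/2))       # varpi_0 = posterior under M0
    th1 <- rnorm(n1, 4*x/5, sqrt(4/5))     # varpi_1 = posterior under M1
    num <- mean(dnorm(x, th0, 1) * dnorm(th0, 0, 1) / dnorm(th0, x/2, sqrt(1/2)))
    den <- mean(dnorm(x, th1, 1) * dnorm(th1, 0, 2) / dnorm(th1, 4*x/5, sqrt(4/5)))
    num / den                              # compare with the exact Bayes factor:
    dnorm(x, 0, sqrt(2)) / dnorm(x, 0, sqrt(5))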
 15. Diabetes in Pima Indian women
     Example (R benchmark). "A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases." 200 Pima Indian women with observed variables:
     - plasma glucose concentration in an oral glucose tolerance test,
     - diastolic blood pressure,
     - diabetes pedigree function,
     - presence/absence of diabetes.
 16. Probit modelling on Pima Indian women
     Probability of diabetes as a function of the above variables,
     $$ P(y = 1 \mid x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3)\,. $$
     Test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on a g-prior modelling:
     $$ \beta \sim \mathcal N_3\big(0,\; n\,(X^{\mathrm T}X)^{-1}\big)\,. $$
 18. MCMC 101 for probit models
     Use of either a random walk proposal β' = β + ε in a Metropolis-Hastings algorithm (since the likelihood is available), or of a Gibbs sampler that takes advantage of the missing/latent variable
     $$ z \mid y, x, \beta \sim \mathcal N(x^{\mathrm T}\beta,\, 1)\; \mathbb I_{z \ge 0}^{\,y}\, \mathbb I_{z \le 0}^{\,1-y} $$
     (since, given z, the model reduces to a standard normal linear model and β | y, X, z is normal).
     [Gibbs three times faster]
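A minimal sketch of this latent-variable Gibbs sampler, in the spirit of Albert and Chib (1993); it is not the course's gibbsprobit function and it assumes a flat prior on β instead of the g-prior above (only the β update would change), with the truncated normals simulated by inversion.

    # Latent-variable Gibbs sampler for the probit model under a flat prior
    gibbsprobit_flat <- function(Niter, y, X) {
      n <- nrow(X); p <- ncol(X)
      XtXinv <- solve(crossprod(X))
      R <- t(chol(XtXinv))                 # for correlated normal draws
      beta <- rep(0, p)
      draws <- matrix(NA, Niter, p)
      for (t in 1:Niter) {
        m <- drop(X %*% beta)
        # z | y, x, beta: N(m, 1) truncated to z >= 0 (y = 1) or z <= 0 (y = 0)
        lo <- ifelse(y == 1, 0, -Inf); hi <- ifelse(y == 1, Inf, 0)
        u <- runif(n)
        z <- m + qnorm(pnorm(lo - m) + u * (pnorm(hi - m) - pnorm(lo - m)))
        # beta | y, X, z: N((X'X)^{-1} X'z, (X'X)^{-1}) under the flat prior
        beta <- drop(XtXinv %*% crossprod(X, z) + R %*% rnorm(p))
        draws[t, ] <- beta
      }
      draws
    }

A call like draws <- gibbsprobit_flat(1e4, y, X1) then provides the posterior sample feeding the Bayes factor approximations below.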
 21. Diabetes in Pima Indian women
     [Figure: MCMC output for $\hat\beta_{\max} = 1.5$, $\hat\beta = 1.15$, k = 40, and 20,000 simulations: trace plots and histograms of the simulated values.]
 22. Importance sampling for the Pima Indian dataset
     Use of the importance function inspired from the MLE estimate distribution,
     $$ \beta \sim \mathcal N(\hat\beta, \hat\Sigma)\,. $$
     R importance sampling code (probitlpost and dmvlnorm are course helper functions returning the log-posterior and the log multivariate normal density; rmvnorm is from library(mvtnorm)):

         library(mvtnorm)   # for rmvnorm
         model1=summary(glm(y~-1+X1,family=binomial(link="probit")))
         # model2 fitted symmetrically on the second design matrix
         model2=summary(glm(y~-1+X2,family=binomial(link="probit")))
         is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
         is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
         bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
             sigma=2*model1$cov.unscaled)))/
           mean(exp(probitlpost(is2,y,X2)-dmvlnorm(is2,mean=model2$coeff[,1],
             sigma=2*model2$cov.unscaled)))
 24. Diabetes in Pima Indian women
     [Figure: boxplots of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations each, for a simulation from the prior ("Monte Carlo") and for the above importance sampler.]
 25. Bridge sampling
     Special case: if
     $$ \pi_1(\theta_1 \mid x) \propto \tilde\pi_1(\theta_1 \mid x)\,, \qquad \pi_2(\theta_2 \mid x) \propto \tilde\pi_2(\theta_2 \mid x) $$
     live on the same space (Θ1 = Θ2), then
     $$ B_{12} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_i \mid x)}{\tilde\pi_2(\theta_i \mid x)}\,, \qquad \theta_i \sim \pi_2(\theta \mid x) $$
     [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
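A two-line sanity check of this identity (our toy, not the slides'): two unnormalised Gaussian kernels with equal normalising constants, so that B12 = 1.

    # tilde_pi1 = exp(-th^2/2) and tilde_pi2 = exp(-(th-0.5)^2/2) both
    # integrate to sqrt(2*pi), hence B12 = 1
    th <- rnorm(1e5, 0.5, 1)                      # sample from pi_2 = N(0.5, 1)
    mean(exp(-th^2/2) / exp(-(th - 0.5)^2/2))     # should be close to 1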
 26. Bridge sampling variance
     The bridge sampling estimator does poorly if
     $$ \frac{\mathrm{var}(\widehat B_{12})}{B_{12}^2} \approx \frac{1}{n}\, \mathbb E\!\left[\left(\frac{\pi_1(\theta) - \pi_2(\theta)}{\pi_2(\theta)}\right)^{\!2}\,\right] $$
     is large, i.e. if π1 and π2 have little overlap...
 28. (Further) bridge sampling
     In addition,
     $$ B_{12} = \frac{\int \tilde\pi_1(\theta\mid x)\,\alpha(\theta)\,\pi_2(\theta\mid x)\,d\theta}{\int \tilde\pi_2(\theta\mid x)\,\alpha(\theta)\,\pi_1(\theta\mid x)\,d\theta} \quad \forall\, \alpha(\cdot) $$
     $$ \approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2,i}\mid x)\,\alpha(\theta_{2,i})}{n_1^{-1} \sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1,i}\mid x)\,\alpha(\theta_{1,i})}\,, \qquad \theta_{j,i} \sim \pi_j(\theta\mid x) $$
 29. An infamous example
     When
     $$ \alpha(\theta) = \frac{1}{\tilde\pi_1(\theta\mid x)\,\tilde\pi_2(\theta\mid x)} $$
     the harmonic mean approximation to B12 follows, via
     $$ \widehat B_{21} = \frac{n_1^{-1} \sum_{i=1}^{n_1} 1\big/\tilde\pi_1(\theta_{1,i}\mid x)}{n_2^{-1} \sum_{i=1}^{n_2} 1\big/\tilde\pi_2(\theta_{2,i}\mid x)}\,, \qquad \theta_{j,i} \sim \pi_j(\theta\mid x) $$
 30. The original harmonic mean estimator
     When $\theta_{k,i} \sim \pi_k(\theta\mid x)$,
     $$ \frac{1}{T} \sum_{t=1}^{T} \frac{1}{L(\theta_{k,t}\mid x)} $$
     is an unbiased estimator of 1/mk(x). [Newton & Raftery, 1994]
     Highly dangerous: most often leads to an infinite variance!
 32. "The Worst Monte Carlo Method Ever"
     "The good news is that the Law of Large Numbers guarantees that this estimator is consistent, ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it's easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood."
     [Radford Neal's blog, Aug. 23, 2008]
 34. Optimal bridge sampling
     The optimal choice of auxiliary function is
     $$ \alpha = \frac{n_1 + n_2}{n_1\,\pi_1(\theta\mid x) + n_2\,\pi_2(\theta\mid x)}\,, $$
     leading to
     $$ \widehat B_{12} \approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2,i}\mid x)}{n_1\,\pi_1(\theta_{2,i}\mid x) + n_2\,\pi_2(\theta_{2,i}\mid x)}}{n_1^{-1} \sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1,i}\mid x)}{n_1\,\pi_1(\theta_{1,i}\mid x) + n_2\,\pi_2(\theta_{1,i}\mid x)}} $$
     Back later!
 35. Optimal bridge sampling (2)
     Reason:
     $$ \frac{\mathrm{Var}(\widehat B_{12})}{B_{12}^2} \approx \frac{1}{n_1 n_2} \left\{ \frac{\int \pi_1(\theta)\,\pi_2(\theta)\,[n_1\,\pi_1(\theta) + n_2\,\pi_2(\theta)]\,\alpha(\theta)^2\,d\theta}{\left(\int \pi_1(\theta)\,\pi_2(\theta)\,\alpha(\theta)\,d\theta\right)^{2}} - 1 \right\} $$
     (by the δ method)
     Drag: dependence on the unknown normalising constants, solved iteratively.
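That iterative resolution can be sketched as follows (a hedged reading of Meng and Wong's scheme, not code from the talk): rewriting α in terms of the unnormalised densities leaves only the current estimate of B12 in the formula, which is updated to a fixed point.

    # Iterated optimal bridge sampling: tilde_pi1, tilde_pi2 unnormalised,
    # th1 ~ pi1, th2 ~ pi2; B is the running estimate of B12 = Z1/Z2
    bridge_optimal <- function(tilde_pi1, tilde_pi2, th1, th2, niter = 25) {
      n1 <- length(th1); n2 <- length(th2)
      B <- 1
      for (it in 1:niter) {
        num <- mean(tilde_pi1(th2) / (n1 * tilde_pi1(th2) + n2 * B * tilde_pi2(th2)))
        den <- mean(tilde_pi2(th1) / (n1 * tilde_pi1(th1) + n2 * B * tilde_pi2(th1)))
        B <- num / den
      }
      B
    }
    bridge_optimal(function(t) exp(-t^2/2), function(t) exp(-(t - 1)^2/2),
                   rnorm(1e4), rnorm(1e4, 1))     # B12 = 1 on this toy pair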
 37. Extension to varying dimensions
     When π1 and π2 live on different dimensions, for instance θ2 = (θ1, ψ), introduction of a pseudo-posterior density ω(ψ | θ1, x) to augment π1(θ1 | x) into a joint distribution π1(θ1 | x) × ω(ψ | θ1, x) on θ2, so that
     $$ B_{12} = \frac{\int \tilde\pi_1(\theta_1\mid x)\,\omega(\psi\mid\theta_1, x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi\mid x)\,d\theta_1\,d\psi}{\int \tilde\pi_2(\theta_1,\psi\mid x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1\mid x)\,\omega(\psi\mid\theta_1, x)\,d\theta_1\,d\psi}\,. $$
 39. Illustration for the Pima Indian dataset
     Use of the MLE-induced conditional of β3 given (β1, β2) as a pseudo-posterior, and a mixture of both MLE approximations on β3 in the bridge sampling estimate.
     R bridge sampling code (hmprobit, probitlpost and meanw are course helper functions; ginv is from library(MASS) and dmvnorm from library(mvtnorm)):

         cova=model2$cov.unscaled
         expecta=model2$coeff[,1]
         covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]
         probit1=hmprobit(Niter,y,X1)
         probit2=hmprobit(Niter,y,X2)
         pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
         probit1p=cbind(probit1,pseudo)
         bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],
             meanw(probit2[,1:2]),sqrt(covw),log=T))/
           (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],cova[3,3])))/
           mean(exp(probitlpost(probit1p,y,X2))/
           (dmvnorm(probit1p,expecta,cova)+dnorm(pseudo,expecta[3],cova[3,3])))
 41. Diabetes in Pima Indian women (cont'd)
     [Figure: boxplots of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations each, for a simulation from the prior (MC), the above bridge sampler and the above importance sampler (IS).]
 42. Ratio importance sampling
     Another identity:
     $$ B_{12} = \frac{\mathbb E_\varphi[\tilde\pi_1(\theta)/\varphi(\theta)]}{\mathbb E_\varphi[\tilde\pi_2(\theta)/\varphi(\theta)]} $$
     for any density ϕ with sufficiently large support. [Torrie & Valleau, 1977]
     Use of a single sample θ1, ..., θn from ϕ:
     $$ \widehat B_{12} = \frac{\sum_{i=1}^{n} \tilde\pi_1(\theta_i)/\varphi(\theta_i)}{\sum_{i=1}^{n} \tilde\pi_2(\theta_i)/\varphi(\theta_i)} $$
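A sketch on the same Gaussian toy as above (ours, not the slides'), with a heavy-tailed Student t proposal playing the role of ϕ:

    # Single-sample ratio importance sampling; the two unnormalised Gaussian
    # kernels have equal normalising constants, so B12 = 1
    tilde_pi1 <- function(th) exp(-th^2/2)
    tilde_pi2 <- function(th) exp(-(th - 1)^2/2)
    th <- rt(1e5, df = 3)                        # phi = t_3, large support
    w1 <- tilde_pi1(th) / dt(th, df = 3)
    w2 <- tilde_pi2(th) / dt(th, df = 3)
    sum(w1) / sum(w2)                            # should be close to 1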
 44. Ratio importance sampling (2)
     Approximate variance:
     $$ \frac{\mathrm{var}(\widehat B_{12})}{B_{12}^2} \approx \frac{1}{n}\, \mathbb E_\varphi\!\left[\frac{(\pi_1(\theta) - \pi_2(\theta))^2}{\varphi(\theta)^2}\right] $$
     Optimal choice:
     $$ \varphi^*(\theta) = \frac{|\,\pi_1(\theta) - \pi_2(\theta)\,|}{\int |\,\pi_1(\eta) - \pi_2(\eta)\,|\,d\eta} $$
     [Chen, Shao & Ibrahim, 2000]
 46. Improving upon bridge sampler
     Theorem 5.5.3: the asymptotic variance of the optimal ratio importance sampling estimator is smaller than the asymptotic variance of the optimal bridge sampling estimator. [Chen, Shao & Ibrahim, 2000]
     Does not require the normalising constant $\int |\,\pi_1(\eta) - \pi_2(\eta)\,|\,d\eta$, but does require a simulation from
     $$ \varphi^*(\theta) \propto |\,\pi_1(\theta) - \pi_2(\theta)\,|\,. $$
 48. Generalisation to point null situations
     When
     $$ B_{12} = \frac{\int_{\Theta_1} \tilde\pi_1(\theta_1)\,d\theta_1}{\int_{\Theta_2} \tilde\pi_2(\theta_2)\,d\theta_2} $$
     and Θ2 = Θ1 × Ψ, we get θ2 = (θ1, ψ) and
     $$ B_{12} = \mathbb E_{\pi_2}\!\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi\mid\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right] $$
     holds for any conditional density ω(ψ | θ1).
 49. X-dimen'al bridge sampling
     Generalisation of the previous identity: for any α,
     $$ B_{12} = \frac{\mathbb E_{\pi_2}[\tilde\pi_1(\theta_1)\,\omega(\psi\mid\theta_1)\,\alpha(\theta_1,\psi)]}{\mathbb E_{\pi_1\times\omega}[\tilde\pi_2(\theta_1,\psi)\,\alpha(\theta_1,\psi)]} $$
     and, for any density ϕ,
     $$ B_{12} = \frac{\mathbb E_\varphi[\tilde\pi_1(\theta_1)\,\omega(\psi\mid\theta_1)/\varphi(\theta_1,\psi)]}{\mathbb E_\varphi[\tilde\pi_2(\theta_1,\psi)/\varphi(\theta_1,\psi)]} $$
     [Chen, Shao & Ibrahim, 2000]
     Optimal choice: ω(ψ | θ1) = π2(ψ | θ1). [Theorem 5.8.2]
 51. Approximating Zk from a posterior sample
     Use of the [harmonic mean] identity
     $$ \mathbb E^{\pi_k}\!\left[\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)} \,\Big|\, x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\, \frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,d\theta_k = \frac{1}{Z_k} $$
     no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006]
     Direct exploitation of the MCMC output.
 53. Comparison with regular importance sampling
     Harmonic mean: constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk(θk)Lk(θk) for the approximation
     $$ \widehat Z_{1k} = 1 \Big/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})} $$
     to have a finite variance. E.g., use finite support kernels (like Epanechnikov's kernel) for ϕ.
 55. Comparison with regular importance sampling (cont'd)
     Compare $\widehat Z_{1k}$ with a standard importance sampling approximation
     $$ \widehat Z_{2k} = \frac{1}{T} \sum_{t=1}^{T} \frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})} $$
     where the $\theta_k^{(t)}$'s are generated from the density ϕ(·) (with fatter tails, like t's).
 56. HPD indicator as ϕ
     Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and ϕ as its indicator:
     $$ \varphi(\theta) = \frac{1}{T} \sum_{t} \mathbb I_{d(\theta,\,\theta^{(t)}) \le \epsilon} $$
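A hedged illustration of the finite-variance requirement (ours, not the slides'): a uniform, hence finite-support, ϕ built from sample quantiles rather than an actual HPD hull.

    # Gelfand-Dey estimate of Z for the toy target pi(theta)L(theta) =
    # exp(-theta^2/2), whose true Z is sqrt(2*pi) = 2.5066...
    th <- rnorm(1e5)                             # stand-in for an MCMC sample
    q <- quantile(th, c(0.25, 0.75))
    phi <- dunif(th, q[1], q[2])                 # zero outside [q1, q2]
    1 / mean(phi * exp(th^2 / 2))                # estimate of Z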
 58. Diabetes in Pima Indian women (cont'd)
     [Figure: boxplots of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations each, for the above harmonic mean sampler and importance samplers; both spreads lie within (3.102, 3.116).]
 59. Approximating Zk using a mixture representation
     Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density
     $$ \tilde\varphi_k(\theta_k) \propto \omega_1\,\pi_k(\theta_k)\,L_k(\theta_k) + \varphi(\theta_k)\,, $$
     where ϕ(·) is arbitrary (but normalised).
     Note: ω1 is not a probability weight.
 61. Approximating Z using a mixture representation (cont'd)
     Corresponding MCMC (= Gibbs) sampler. At iteration t:
     1. Take δ(t) = 1 with probability
     $$ \omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)})\right\} $$
     and δ(t) = 2 otherwise;
     2. If δ(t) = 1, generate $\theta_k^{(t)} \sim \mathrm{MCMC}(\theta_k^{(t-1)}, \theta_k)$, where MCMC(θk, θk') denotes an arbitrary MCMC kernel associated with the posterior πk(θk | x) ∝ πk(θk)Lk(θk);
     3. If δ(t) = 2, generate $\theta_k^{(t)} \sim \varphi(\theta_k)$ independently.
 64. Evidence approximation by mixtures
     Rao-Blackwellised estimate
     $$ \hat\xi = \frac{1}{T} \sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\} $$
     converges to ω1 Zk / (ω1 Zk + 1).
     Deduce $\widehat Z_{3k}$ from $\omega_1 \widehat Z_{3k} / (\omega_1 \widehat Z_{3k} + 1) = \hat\xi$, i.e.
     $$ \widehat Z_{3k} = \frac{\sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}}{\omega_1 \sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \Big/ \left\{\omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)})\right\}} $$
     [Bridge sampler]
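A compact toy rendering of slides 59-65 under stated assumptions: the target πk(θk)Lk(θk) is exp(−θ²/2), so Zk = √(2π), and the MCMC kernel of step 2 is an exact posterior draw, a valid if idealised choice.

    omega1 <- 0.5; Tn <- 1e5
    tildepi <- function(th) exp(-th^2 / 2)       # unnormalised posterior
    phi     <- function(th) dnorm(th, 0, 2)      # arbitrary normalised density
    th <- numeric(Tn)
    for (t in 2:Tn) {
      p1 <- omega1 * tildepi(th[t-1]) / (omega1 * tildepi(th[t-1]) + phi(th[t-1]))
      th[t] <- if (runif(1) < p1) rnorm(1) else rnorm(1, 0, 2)
    }
    num <- omega1 * tildepi(th) / (omega1 * tildepi(th) + phi(th))
    den <- phi(th) / (omega1 * tildepi(th) + phi(th))
    sum(num) / (omega1 * sum(den))               # compare with sqrt(2*pi) = 2.5066...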
 66. Chib's representation
     Direct application of Bayes' theorem: given x ∼ fk(x | θk) and θk ∼ πk(θk),
     $$ Z_k = m_k(x) = \frac{f_k(x\mid\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k\mid x)} $$
     Use of an approximation to the posterior,
     $$ \widehat Z_k = \widehat m_k(x) = \frac{f_k(x\mid\theta_k^*)\,\pi_k(\theta_k^*)}{\hat\pi_k(\theta_k^*\mid x)}\,. $$
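The identity holds exactly for every θk*, which a conjugate toy model (our assumption: x ~ N(θ, 1), θ ~ N(0, 1), so the posterior is N(x/2, 1/2) and m(x) = N(x; 0, 2)) makes visible:

    x <- 1.3
    theta_star <- x / 2      # any value works; high-density points are stabler
    dnorm(x, theta_star, 1) * dnorm(theta_star, 0, 1) /
      dnorm(theta_star, x / 2, sqrt(1/2))        # equals...
    dnorm(x, 0, sqrt(2))                         # ...the exact evidence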
 68. Case of latent variables
     For a missing variable z as in mixture models, natural Rao-Blackwell estimate
     $$ \hat\pi_k(\theta_k^*\mid x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^*\mid x, z_k^{(t)})\,, $$
     where the $z_k^{(t)}$'s are Gibbs sampled latent variables.
 69. Label switching
     A mixture model [special case of missing variable model] is invariant under permutations of the indices of its components. E.g., the mixtures
     $$ 0.3\,\mathcal N(0,1) + 0.7\,\mathcal N(2.3,1) \quad\text{and}\quad 0.7\,\mathcal N(2.3,1) + 0.3\,\mathcal N(0,1) $$
     are exactly the same!
     The component parameters θi are therefore not identifiable marginally, since they are exchangeable.
 71. Connected difficulties
     1. Number of modes of the likelihood of order O(k!): maximization, and even [MCMC] exploration of the posterior surface, become harder.
     2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical: the posterior expectation of θ1 equals the posterior expectation of θ2.
 73. License
     Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch simultaneously enough component allocations at once.
     [Figure: Gibbs traces and pairwise plots of the µi, pi and σi over 500 iterations.]
 74. Label switching paradox
     We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler.
     If we observe it, then we do not know how to estimate the parameters.
     If we do not, then we are uncertain about the convergence!
 77. Compensation for label switching
     For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory:
     $$ \pi_k(\theta_k\mid x) = \pi_k(\sigma(\theta_k)\mid x) = \frac{1}{k!} \sum_{\sigma\in\mathfrak S_k} \pi_k(\sigma(\theta_k)\mid x) $$
     for all σ's in $\mathfrak S_k$, the set of all permutations of {1, ..., k}. Consequences on the numerical approximation, biased by an order k!.
     Recover the theoretical symmetry by using
     $$ \hat\pi_k(\theta_k^*\mid x) = \frac{1}{T\,k!} \sum_{\sigma\in\mathfrak S_k} \sum_{t=1}^{T} \pi_k(\sigma(\theta_k^*)\mid x, z_k^{(t)})\,. $$
     [Berkhof, Mechelen & Gelman, 2003]
 79. Galaxy dataset
     n = 82 galaxies as a mixture of k normal distributions with both mean and variance unknown. [Roeder, 1992]
     [Figure: histogram (relative frequency) of the data with the average density overlaid.]
 80. Galaxy dataset (k)
     Using only the original estimate, with θk* as the MAP estimator,
     $$ \log(\hat m_k(x)) = -105.1396 $$
     for k = 3 (based on $10^3$ simulations), while introducing the permutations leads to
     $$ \log(\hat m_k(x)) = -103.3479 $$
     Note that −105.1396 + log(3!) = −103.3479.

         k        2        3        4        5        6        7        8
         m̂k(x) -115.68  -103.35  -102.66  -101.93  -102.88  -105.48  -108.44

     Estimations of the marginal likelihoods by the symmetrised Chib's approximation (based on $10^5$ Gibbs iterations and, for k > 5, 100 permutations selected at random in $\mathfrak S_k$). [Lee, Marin, Mengersen & Robert, 2008]
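The k! correction can be checked in one line:

    -105.1396 + log(factorial(3))    # -103.3478, the symmetrised value above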
 83. Case of the probit model
     For the completion by z,
     $$ \hat\pi(\theta\mid x) = \frac{1}{T} \sum_{t} \pi(\theta\mid x, z^{(t)}) $$
     is a simple average of normal densities.
     R code for Chib's approximation (gibbsprobit and probitlpost are course helper functions; dmvlnorm returns the log multivariate normal density):

         gibbs1=gibbsprobit(Niter,y,X1)
         gibbs2=gibbsprobit(Niter,y,X2)
         bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
             sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
           mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
             sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
 85. Diabetes in Pima Indian women (cont'd)
     [Figure: boxplots of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations each, for the above Chib's and importance samplers; values lie around 0.0240-0.0255.]
86. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

The Savage–Dickey ratio

Special representation of the Bayes factor used for simulation.

Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that
$$\pi_1(\psi|\theta_0) = \pi_0(\psi)$$
then
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)},$$
with the obvious notations
$$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,d\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,d\psi.$$
[Dickey, 1971; Verdinelli & Wasserman, 1995]
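A minimal runnable sketch of the resulting Monte Carlo estimator, on an assumed toy model (not from the slides): $x \sim N(\theta, \psi)$ with independent priors $\theta \sim N(0,1)$ and $\psi \sim IG(2,2)$, so that Dickey's condition $\pi_1(\psi|\theta_0) = \pi_0(\psi)$ holds by construction. The numerator $\pi_1(\theta_0|x)$ is estimated by a Rao–Blackwell average of conditional densities over Gibbs draws of $\psi$:

    ## Savage-Dickey estimate of B01 for H0: theta = 0 in x ~ N(theta, psi)
    set.seed(1)
    x <- rnorm(50, mean = 0.3); n <- length(x)
    a <- b <- 2; niter <- 1e4; theta0 <- 0
    theta <- mean(x); psi <- var(x); num <- numeric(niter)
    for (t in 1:niter) {
      psi <- 1 / rgamma(1, a + n/2, b + sum((x - theta)^2)/2) # psi | theta, x
      v <- 1 / (n/psi + 1); m <- v * sum(x) / psi             # theta | psi, x
      theta <- rnorm(1, m, sqrt(v))
      num[t] <- dnorm(theta0, m, sqrt(v))   # pi1(theta0 | x, psi^(t))
    }
    mean(num) / dnorm(theta0, 0, 1)         # B01 = posterior / prior at theta0

(No burn-in is discarded here; a real implementation would drop the early iterations.)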
87. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Measure-theoretic difficulty

The representation depends on the choice of versions of conditional densities:
$$B_{01} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{\int\!\!\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,d\psi\,d\theta} \quad \text{[by definition]}$$
$$= \frac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0, \psi)\,d\psi\;\pi_1(\theta_0)}{\int\!\!\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,d\psi\,d\theta\;\pi_1(\theta_0)} \quad \text{[specific version of } \pi_1(\psi|\theta_0)\text{]}$$
$$= \frac{\int \pi_1(\theta_0, \psi)\,f(x|\theta_0, \psi)\,d\psi}{m_1(x)\,\pi_1(\theta_0)} \quad \text{[specific version of } \pi_1(\theta_0, \psi)\text{]}$$
$$= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}$$

⚠ Hence Dickey's (1971) condition is not a genuine condition: it can always be met by picking an appropriate version of $\pi_1(\psi|\theta_0)$.
88. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Similar measure-theoretic difficulty

Verdinelli–Wasserman extension:
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
depends on similar choices of versions.

Monte Carlo implementation relies on continuous versions of all densities without making mention of it. [Chen, Shao & Ibrahim, 2000]
90. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Computational implementation

Starting from the (new) prior
$$\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$$
define the associated posterior
$$\tilde\pi_1(\theta, \psi|x) = \frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\tilde m_1(x)}$$
and impose
$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{\tilde m_1(x)}$$
to hold. Then
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}$$
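To see why the identity holds, substitute the imposed representation into the definition of the Bayes factor under Dickey's condition:
$$B_{01} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{m_1(x)} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\cdot\frac{\tilde m_1(x)}{m_1(x)}\,.$$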
92. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

First ratio

If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi_1(\theta, \psi|x)$, then
$$\frac{1}{T}\sum_t \tilde\pi_1(\theta_0|x, \psi^{(t)})$$
converges to $\tilde\pi_1(\theta_0|x)$ (if the right version is used in $\theta_0$).

When $\tilde\pi_1(\theta_0|x, \psi)$ is unavailable, replace it with
$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$
94. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Bridge revival (1)

Since $m_1(x)/\tilde m_1(x)$ is unknown, apparent failure!

Use of the identity
$$\mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$$
based on the same sample from $\tilde\pi_1$.
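In R, given draws (theta, psi) from the auxiliary posterior $\tilde\pi_1(\theta, \psi|x)$, the correction is a one-line average; dpi1cond(psi, theta) and dpi0(psi) are hypothetical vectorised evaluators of $\pi_1(\psi|\theta)$ and $\pi_0(\psi)$, to be supplied for the model at hand:

    ## estimate of m1(x) / m1~(x) from the sample of the tilde-posterior
    bridge1 <- function(theta, psi, dpi1cond, dpi0)
      mean(dpi1cond(psi, theta) / dpi0(psi))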
96. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Bridge revival (2)

Alternative identity
$$\mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
suggests using a second sample $(\bar\theta^{(1)}, \bar\psi^{(1)}, \bar z^{(1)}), \ldots, (\bar\theta^{(T)}, \bar\psi^{(T)}, \bar z^{(T)}) \sim \pi_1(\theta, \psi|x)$ and
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$

Resulting estimate:
$$\widehat B_{01} = \frac{1}{\pi_1(\theta_0)}\left[\frac{1}{T}\sum_t \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})\right]\left[\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}\right]$$
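A sketch of the combined estimator with the same hypothetical helpers as above, plus rb(z, psi) for the conditional density $\tilde\pi_1(\theta_0|x, z, \psi)$ and pi1theta0 for the prior value $\pi_1(\theta_0)$; (psi, z) come from the $\tilde\pi_1$ posterior, (theta_bar, psi_bar) from the $\pi_1$ posterior:

    ## two-sample Savage-Dickey / bridge estimate of B01
    B01hat <- mean(rb(z, psi)) / pi1theta0 *
              mean(dpi0(psi_bar) / dpi1cond(psi_bar, theta_bar))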
98. On some computational methods for Bayesian model choice / Importance sampling solutions / The Savage–Dickey ratio

Diabetes in Pima Indian women (cont'd)

Comparison of the variability of the Bayes factor approximations, based on 100 replicates of 20,000 simulations each, for the importance, Chib's, Savage–Dickey and bridge samplers above.

[Figure: boxplots of the 100 approximations for the IS, Chib, Savage–Dickey and Bridge samplers, all ranging over roughly 2.8 to 3.4.]
99. On some computational methods for Bayesian model choice / Cross-model solutions / Variable selection

Bayesian variable selection

Example of a regression setting: one dependent random variable y and a set $\{x_1, \ldots, x_k\}$ of k explanatory variables.

Question: Are all the $x_i$'s involved in the regression?

Assumption: every subset $\{i_1, \ldots, i_q\}$ of q (0 ≤ q ≤ k) explanatory variables, $\{1_n, x_{i_1}, \ldots, x_{i_q}\}$, is a proper set of explanatory variables for the regression of y. [Intercept included in every corresponding model]

Computational issue: $2^k$ models in competition... [Marin & Robert, Bayesian Core, 2007]
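For small k, the $2^k$ models can simply be enumerated. A minimal sketch, assuming a Zellner g-prior with zero prior mean and g = n (the closed-form log-marginal below, up to a constant common to all models, follows from that assumed setup; it is not the slides' code):

    ## exhaustive Bayesian variable selection under an assumed g-prior
    logmarg <- function(y, X, g = length(y)) {
      P <- X %*% solve(crossprod(X), t(X))          # projection on span(X)
      -ncol(X)/2 * log(g + 1) -
        length(y)/2 * log(sum(y^2) - g/(g + 1) * sum(y * (P %*% y)))
    }
    set.seed(4)
    k <- 3; n <- 50
    X <- matrix(rnorm(n * k), n, k); y <- rnorm(n, X[, 1])
    gams <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), k))) # 2^k subsets
    lm_all <- apply(gams, 1, function(gam)
      logmarg(y, cbind(1, X[, gam, drop = FALSE])))  # intercept always in
    w <- exp(lm_all - max(lm_all)); round(w / sum(w), 3) # model probabilities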
103. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Reversible jump

Idea: set up a proper measure-theoretic framework for designing moves between models $M_k$. [Green, 1995]

Create a reversible kernel K on $H = \bigcup_k \{k\} \times \Theta_k$ such that
$$\int_A \int_B K(x, dy)\,\pi(x)\,dx = \int_B \int_A K(y, dx)\,\pi(y)\,dy$$
for the invariant density π [x is of the form $(k, \theta^{(k)})$].
105. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Local moves

For a move between two models, $M_1$ and $M_2$, the Markov chain being in state $\theta_1 \in M_1$, denote by $K_{1\to2}(\theta_1, d\theta)$ and $K_{2\to1}(\theta_2, d\theta)$ the corresponding kernels, under the detailed balance condition
$$\pi(d\theta_1)\,K_{1\to2}(\theta_1, d\theta) = \pi(d\theta_2)\,K_{2\to1}(\theta_2, d\theta),$$
and take, wlog, $\dim(M_2) > \dim(M_1)$.

Proposal expressed as
$$\theta_2 = \Psi_{1\to2}(\theta_1, v_{1\to2})$$
where $v_{1\to2}$ is a random variable of dimension $\dim(M_2) - \dim(M_1)$, generated as
$$v_{1\to2} \sim \varphi_{1\to2}(v_{1\to2})\,.$$
107. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Local moves (2)

In this case, $q_{1\to2}(\theta_1, d\theta_2)$ has density
$$\varphi_{1\to2}(v_{1\to2}) \left|\frac{\partial\Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|^{-1},$$
by the Jacobian rule. [Reverse importance link]

If $\varrho_{1\to2}$ is the probability of choosing a move to $M_2$ while in $M_1$, the acceptance probability reduces to
$$\alpha(\theta_1, v_{1\to2}) = 1 \wedge \frac{\pi(M_2, \theta_2)\,\varrho_{2\to1}}{\pi(M_1, \theta_1)\,\varrho_{1\to2}\,\varphi_{1\to2}(v_{1\to2})}\left|\frac{\partial\Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|.$$

If several models are considered simultaneously, the kernel aggregates all the moves, as in
$$K(x, B) = \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\,q_m(x, dy) + \omega(x)\,\mathbb{I}_B(x)$$
110. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Generic reversible jump acceptance probability

Acceptance probability of $\theta_2 = \Psi_{1\to2}(\theta_1, v_{1\to2})$ is
$$\alpha(\theta_1, v_{1\to2}) = 1 \wedge \frac{\pi(M_2, \theta_2)\,\varrho_{2\to1}}{\pi(M_1, \theta_1)\,\varrho_{1\to2}\,\varphi_{1\to2}(v_{1\to2})}\left|\frac{\partial\Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|$$
while the acceptance probability of $\theta_1$, with $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$, is
$$\alpha(\theta_1, v_{1\to2}) = 1 \wedge \frac{\pi(M_1, \theta_1)\,\varrho_{1\to2}\,\varphi_{1\to2}(v_{1\to2})}{\pi(M_2, \theta_2)\,\varrho_{2\to1}}\left|\frac{\partial\Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|^{-1}$$

⚠ Difficult calibration
113. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Green's sampler

Algorithm. Iteration t (t ≥ 1): if $x^{(t)} = (m, \theta^{(m)})$,
1. Select model $M_n$ with probability $\pi_{mn}$
2. Generate $u_{mn} \sim \varphi_{mn}(u)$ and set $(\theta^{(n)}, v_{nm}) = \Psi_{m\to n}(\theta^{(m)}, u_{mn})$
3. Take $x^{(t+1)} = (n, \theta^{(n)})$ with probability
$$\min\left\{\frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})}\,\frac{\pi_{nm}\,\varphi_{nm}(v_{nm})}{\pi_{mn}\,\varphi_{mn}(u_{mn})}\,\left|\frac{\partial\Psi_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})}\right|,\ 1\right\}$$
and take $x^{(t+1)} = x^{(t)}$ otherwise.
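A minimal runnable illustration of Green's sampler on an assumed two-model toy problem (not from the slides): M1: x ~ N(0,1) with no parameter, versus M2: x ~ N(θ,1) with θ ~ N(0,1) and equal prior model weights. The dimension-matching map is the identity θ = v, so the Jacobian is 1, and a within-model Gibbs refresh of θ is added under M2:

    ## toy reversible jump between M1: N(0,1) and M2: N(theta,1)
    set.seed(2)
    x <- rnorm(30, 0.5); n <- length(x); xb <- mean(x)
    ll <- function(th) sum(dnorm(x, th, 1, log = TRUE))
    qm <- xb; qs <- sqrt(1/n)          # proposal for v: N(xb, 1/n)
    niter <- 1e4; k <- 1; theta <- 0; ks <- integer(niter)
    for (t in 1:niter) {
      if (k == 1) {                    # move M1 -> M2: theta = v, Jacobian 1
        v <- rnorm(1, qm, qs)
        logA <- ll(v) + dnorm(v, 0, 1, log = TRUE) -
                ll(0) - dnorm(v, qm, qs, log = TRUE)
        if (log(runif(1)) < logA) { k <- 2; theta <- v }
      } else {                         # reverse move M2 -> M1
        logA <- ll(0) + dnorm(theta, qm, qs, log = TRUE) -
                ll(theta) - dnorm(theta, 0, 1, log = TRUE)
        if (log(runif(1)) < logA) k <- 1
        else theta <- rnorm(1, n*xb/(n+1), sqrt(1/(n+1)))  # Gibbs refresh
      }
      ks[t] <- k
    }
    mean(ks == 2)                      # estimate of P(M2 | x)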
114. On some computational methods for Bayesian model choice / Cross-model solutions / Reversible jump

Interpretation

The representation puts us back in a fixed-dimension setting:
- $M_1 \times V_{1\to2}$ and $M_2$ are in one-to-one relation.
- Reversibility imposes that $\theta_1$ is derived as $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$.
- The move appears like a regular Metropolis–Hastings move from the couple $(\theta_1, v_{1\to2})$ to $\theta_2$ when the stationary distributions are $\pi(M_1, \theta_1) \times \varphi_{1\to2}(v_{1\to2})$ and $\pi(M_2, \theta_2)$, and when the proposal distribution is deterministic.
116. On some computational methods for Bayesian model choice / Cross-model solutions / Saturation schemes

Alternative

Saturation of the parameter space $H = \bigcup_k \{k\} \times \Theta_k$ by creating
- $\theta = (\theta_1, \ldots, \theta_D)$, gathering the parameters of all models,
- a model index M,
- pseudo-priors $\pi_j(\theta_j|M = k)$ for $j \neq k$.
[Carlin & Chib, 1995]

Validation by
$$\mathbb{P}(M = k|x) = \int \mathbb{P}(M = k|x, \theta)\,\pi(\theta|x)\,d\theta = p_k Z_k$$
where the (marginal) posterior is [not $\pi_k$!]
$$\pi(\theta|x) = \sum_{k=1}^{D} \mathbb{P}(\theta, M = k|x) = \sum_{k=1}^{D} p_k\, Z_k\, \pi_k(\theta_k|x) \prod_{j\neq k} \pi_j(\theta_j|M = k)\,.$$
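For contrast with reversible jump, a sketch of the Carlin & Chib sampler on the same assumed toy pair as above, with the pseudo-prior for θ under M1 set to N(x̄, 1/n), a rough match to the M2 posterior as the method recommends:

    ## Carlin & Chib saturation sampler, toy version
    set.seed(3)
    x <- rnorm(30, 0.5); n <- length(x); xb <- mean(x)
    ll <- function(th) sum(dnorm(x, th, 1, log = TRUE))
    pm <- n * xb / (n + 1); ps <- sqrt(1/(n + 1))  # M2 posterior of theta
    niter <- 1e4; theta <- pm; Ms <- integer(niter)
    for (t in 1:niter) {
      # indicator: P(M = k | theta, x) prop. to f(x|k,theta) pi(theta|M = k)
      l1 <- ll(0)     + dnorm(theta, xb, sqrt(1/n), log = TRUE) # pseudo-prior
      l2 <- ll(theta) + dnorm(theta, 0, 1, log = TRUE)          # true prior
      M <- if (runif(1) < 1/(1 + exp(l2 - l1))) 1 else 2
      # parameter: pseudo-prior draw under M1, posterior draw under M2
      theta <- if (M == 1) rnorm(1, xb, sqrt(1/n)) else rnorm(1, pm, ps)
      Ms[t] <- M
    }
    mean(Ms == 2)                      # estimate of P(M2 | x)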
