On some computational methods for Bayesian model choice




Cours OFPR given in CREST on March 2, 5, 9 and 12


Transcript of "Computational methods for Bayesian model choice"

  1. On some computational methods for Bayesian model choice. Christian P. Robert, CREST-INSEE and Université Paris Dauphine, http://www.ceremade.dauphine.fr/~xian. Cours OFPR, CREST, Malakoff, 2-12 mars 2009.
  2. Outline: 1. Introduction; 2. Importance sampling solutions; 3. Cross-model solutions; 4. Nested sampling; 5. ABC model choice.
  3. Construction of Bayes tests. Definition (Test): given a hypothesis H0 : θ ∈ Θ0 on the parameter θ of a statistical model, a test is a statistical procedure that takes its values in {0, 1}. Example (Normal mean): for x ∼ N(θ, 1), decide whether or not θ ≤ 0.
  5. The 0-1 loss. Neyman-Pearson loss for testing hypotheses: test H0 : θ ∈ Θ0 versus H1 : θ ∉ Θ0, so the decision space is D = {0, 1}. The 0-1 loss is
$$L(\theta, d) = \begin{cases} 1 - d & \text{if } \theta \in \Theta_0, \\ d & \text{otherwise.} \end{cases}$$
  7. Type-one and type-two errors. The associated risk is
$$R(\theta, \delta) = \mathbb{E}_\theta[L(\theta, \delta(x))] = \begin{cases} P_\theta(\delta(x) = 0) & \text{if } \theta \in \Theta_0, \\ P_\theta(\delta(x) = 1) & \text{otherwise.} \end{cases}$$
Theorem (Bayes test): the Bayes estimator associated with π and with the 0-1 loss is
$$\delta^\pi(x) = \begin{cases} 1 & \text{if } \pi(\theta \in \Theta_0 \mid x) > \pi(\theta \notin \Theta_0 \mid x), \\ 0 & \text{otherwise.} \end{cases}$$
  9. Bayes factor. Definition (Bayes factor): for testing H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0, under the prior π(Θ0)π0(θ) + π(Θ0^c)π1(θ), the central quantity is
$$B_{01} = \frac{\pi(\Theta_0 \mid x)}{\pi(\Theta_0^c \mid x)} \Big/ \frac{\pi(\Theta_0)}{\pi(\Theta_0^c)} = \frac{\int_{\Theta_0} f(x \mid \theta)\,\pi_0(\theta)\,d\theta}{\int_{\Theta_0^c} f(x \mid \theta)\,\pi_1(\theta)\,d\theta}.$$
[Jeffreys, 1939]
  10. Self-contained concept. Outside the decision-theoretic environment, the Bayes factor eliminates the impact of π(Θ0) but depends on the choice of (π0, π1); it is the Bayesian/marginal equivalent of the likelihood ratio. Jeffreys' scale of evidence:
  if log10(B10^π) is between 0 and 0.5, the evidence against H0 is weak;
  if between 0.5 and 1, substantial;
  if between 1 and 2, strong;
  if above 2, decisive.
  It requires the computation of the marginal/evidence under both hypotheses/models.
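Jeffreys' scale above is easy to mechanise. The helper below is a sketch of my own (the function name and the handling of negative values, which the scale reads as evidence in favour of H0, are choices not made on the slide):

```python
def jeffreys_evidence(log10_B10):
    """Map log10 of the Bayes factor B10 onto Jeffreys' scale.

    Positive values measure evidence against H0; negative values are
    read symmetrically as evidence against H1 (i.e. favouring H0).
    """
    b = abs(log10_B10)
    if b <= 0.5:
        strength = "weak"
    elif b <= 1:
        strength = "substantial"
    elif b <= 2:
        strength = "strong"
    else:
        strength = "decisive"
    against = "H0" if log10_B10 > 0 else "H1"
    return f"{strength} evidence against {against}"
```

For instance, the slide's hot-hand value log10(B10) = -0.79 falls in the "substantial evidence against H1" band, consistent with "slightly favours H0".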
  11. Hot hand. Example (Binomial homogeneity): consider H0 : yi ∼ B(ni, p) (i = 1, . . . , G) vs. H1 : yi ∼ B(ni, pi). Take conjugate priors pi ∼ Be(α = ξ/ω, β = (1 − ξ)/ω), with a uniform prior on E[pi | ξ, ω] = ξ and on p (ω is fixed):
$$B_{10} = \frac{\int_0^1 \prod_{i=1}^{G} \left[ \int_0^1 p_i^{y_i} (1-p_i)^{n_i - y_i}\, p_i^{\alpha-1} (1-p_i)^{\beta-1}\, \frac{\Gamma(1/\omega)}{\Gamma(\xi/\omega)\,\Gamma((1-\xi)/\omega)}\, dp_i \right] d\xi}{\int_0^1 p^{\sum_i y_i} (1-p)^{\sum_i (n_i - y_i)}\, dp}.$$
For instance, log10(B10) = −0.79 for ω = 0.005 and G = 138, which slightly favours H0. [Kass & Raftery, 1995]
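The inner p_i integrals above are Beta functions, so only the one-dimensional ξ integral needs numerical treatment. A sketch (the original hot-hand data of G = 138 games is not reproduced here; the data, ω, and grid size below are synthetic placeholders of my own):

```python
import math

def log_beta(a, b):
    # log of the Beta function via log-gamma, to avoid overflow
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log10_B10(y, n, omega, grid=200):
    """Binomial-homogeneity Bayes factor B10 (in log10), with the
    xi integral approximated by a midpoint rule on (0, 1)."""
    vals = []
    for k in range(grid):
        xi = (k + 0.5) / grid
        a, b = xi / omega, (1 - xi) / omega
        # log marginal likelihood under H1 given xi: a product of
        # Beta-binomial terms, one per game
        lp = sum(log_beta(yi + a, ni - yi + b) - log_beta(a, b)
                 for yi, ni in zip(y, n))
        vals.append(lp)
    m = max(vals)  # log-sum-exp shift for numerical stability
    log_num = m + math.log(sum(math.exp(v - m) for v in vals) / grid)
    # log marginal under H0 with a uniform prior on p: a single Beta integral
    S, N = sum(y), sum(n)
    log_den = log_beta(S + 1, N - S + 1)
    return (log_num - log_den) / math.log(10)

# synthetic illustration: ten games of ten shots each, once strongly
# heterogeneous and once perfectly homogeneous
b_het = log10_B10([0, 10] * 5, [10] * 10, omega=0.5)
b_hom = log10_B10([5] * 10, [10] * 10, omega=0.5)
```

As expected, the heterogeneous data favour H1 (b_het > 0) while the homogeneous data favour H0 (b_hom < 0).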
  14. Model choice and model comparison. Choice between models: several models are available for the same observation,
$$M_i : x \sim f_i(x \mid \theta_i), \qquad i \in \mathcal{I},$$
where the index set I can be finite or infinite. Replace hypotheses with models, but keep marginal likelihoods and Bayes factors.
  16. Bayesian model choice. Probabilise the entire model/parameter space: allocate probabilities pi to all models Mi and define priors πi(θi) for each parameter space Θi; then compute
$$\pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\,\pi_i(\theta_i)\,d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x \mid \theta_j)\,\pi_j(\theta_j)\,d\theta_j},$$
and either take the largest π(Mi | x) to determine the "best" model, or use the averaged predictive
$$\sum_j \pi(M_j \mid x) \int_{\Theta_j} f_j(x' \mid \theta_j)\,\pi_j(\theta_j \mid x)\,d\theta_j.$$
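Given the per-model evidences, turning them into posterior model probabilities is a normalisation step. A minimal sketch (the function name and log-scale inputs are my own choices, assumed for numerical stability):

```python
import math

def model_posteriors(log_evidences, prior_probs):
    """Convert per-model log evidences log Z_i and prior probabilities
    p_i into posterior probabilities pi(M_i | x), using a max-shift so
    the exponentials cannot overflow."""
    m = max(log_evidences)
    weights = [p * math.exp(z - m)
               for z, p in zip(log_evidences, prior_probs)]
    total = sum(weights)
    return [w / total for w in weights]

# demo: two models with equal prior weight, model 1's evidence 3x larger
probs = model_posteriors([math.log(3.0), 0.0], [0.5, 0.5])
```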
  20. Evidence. All these problems end up with a similar quantity, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\, L_k(\theta_k)\, d\theta_k,$$
aka the marginal likelihood.
  21. Importance sampling. Paradox: simulation from f (the true density) is not necessarily optimal. The alternative to direct sampling from f is importance sampling, based on the alternative representation
$$\mathbb{E}_f[h(X)] = \int_{\mathcal{X}} h(x)\, \frac{f(x)}{g(x)}\, g(x)\, dx,$$
which allows us to use distributions other than f.
  23. Importance sampling algorithm. Evaluation of
$$\mathbb{E}_f[h(X)] = \int_{\mathcal{X}} h(x)\, f(x)\, dx$$
proceeds in two steps: (1) generate a sample X1, . . . , Xm from a distribution g; (2) use the approximation
$$\frac{1}{m} \sum_{j=1}^{m} h(X_j)\, \frac{f(X_j)}{g(X_j)}.$$
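The two steps above can be sketched directly (a minimal illustration of my own, working on the log scale for the density ratio; the test target f = N(0, 1) with proposal g = N(0, 2) and h(x) = x² is an assumed example, not from the slides):

```python
import math
import random

def importance_expectation(h, log_f, log_g, sample_g, n=100_000, seed=1):
    """Estimate E_f[h(X)] by drawing X_1..X_n from g and averaging
    h(X_j) f(X_j) / g(X_j), as in the algorithm above."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_g(rng)
        total += h(x) * math.exp(log_f(x) - log_g(x))
    return total / n

# demo: E_f[X^2] = 1 for f = N(0, 1), using a wider N(0, 2) proposal so
# the importance weights stay bounded
log_f = lambda x: -0.5 * x * x - 0.5 * math.log(2 * math.pi)
log_g = lambda x: -x * x / 8 - math.log(2) - 0.5 * math.log(2 * math.pi)
est = importance_expectation(lambda x: x * x, log_f, log_g,
                             lambda rng: rng.gauss(0, 2))
```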
  24. Bayes factor approximation. When approximating the Bayes factor
$$B_{01} = \frac{\int_{\Theta_0} f_0(x \mid \theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x \mid \theta_1)\,\pi_1(\theta_1)\,d\theta_1},$$
use importance functions g0 and g1 and samples θ0^i ∼ g0, θ1^i ∼ g1:
$$\widehat{B}_{01} = \frac{n_0^{-1} \sum_{i=1}^{n_0} f_0(x \mid \theta_0^i)\,\pi_0(\theta_0^i)\big/g_0(\theta_0^i)}{n_1^{-1} \sum_{i=1}^{n_1} f_1(x \mid \theta_1^i)\,\pi_1(\theta_1^i)\big/g_1(\theta_1^i)}.$$
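A sanity check of this estimator (a sketch under assumptions of my own: two hypothetical unnormalised "likelihood x prior" products with Gaussian shape, whose evidences Z1 = sqrt(2π) and Z2 = sqrt(4π) are known in closed form, so the true ratio is 1/sqrt(2)):

```python
import math
import random

def log_norm_pdf(x, mu, sigma):
    # log density of N(mu, sigma^2)
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def evidence_is(log_target, mu, sigma, n, rng):
    """n^{-1} sum of target(theta_i) / proposal(theta_i),
    theta_i ~ N(mu, sigma), estimating the target's evidence."""
    acc = 0.0
    for _ in range(n):
        t = rng.gauss(mu, sigma)
        acc += math.exp(log_target(t) - log_norm_pdf(t, mu, sigma))
    return acc / n

rng = random.Random(7)
# exp(-t^2/2) has Z1 = sqrt(2*pi); exp(-(t-1)^2/4) has Z2 = sqrt(4*pi)
Z1_hat = evidence_is(lambda t: -0.5 * t * t, 0.0, 2.0, 50_000, rng)
Z2_hat = evidence_is(lambda t: -0.25 * (t - 1) ** 2, 1.0, 2.0, 50_000, rng)
B_hat = Z1_hat / Z2_hat   # true value: 1/sqrt(2) ~ 0.7071
```

The proposals are deliberately wider than their targets so the weights stay bounded; in a real model, f_k(x|θ)π_k(θ) replaces the synthetic log targets.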
  25. Bridge sampling. Special case: if the unnormalised posteriors
$$\pi_1(\theta_1 \mid x) \propto \tilde\pi_1(\theta_1 \mid x), \qquad \pi_2(\theta_2 \mid x) \propto \tilde\pi_2(\theta_2 \mid x)$$
live on the same space (Θ1 = Θ2), then
$$B_{12} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_i \mid x)}{\tilde\pi_2(\theta_i \mid x)}, \qquad \theta_i \sim \pi_2(\theta \mid x).$$
[Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
  26. Bridge sampling variance. The bridge sampling estimator does poorly if
$$\frac{\operatorname{var}(\widehat{B}_{12})}{B_{12}^2} = \frac{1}{n}\, \mathbb{E}\left[\left(\frac{\pi_1(\theta) - \pi_2(\theta)}{\pi_2(\theta)}\right)^2\right]$$
is large, i.e. if π1 and π2 have little overlap.
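A quick numerical check of the simple bridge identity (same synthetic setup as before, assumed for illustration: unnormalised targets exp(-t²/2) and exp(-(t-1)²/4) on a common space, with known evidences and decent overlap, so the variance bound above is benign):

```python
import math
import random

rng = random.Random(3)
# unnormalised "posteriors": Z1 = sqrt(2*pi), Z2 = sqrt(4*pi),
# so the true Bayes factor is B12 = Z1/Z2 = 1/sqrt(2)
log_pi1 = lambda t: -0.5 * t * t
log_pi2 = lambda t: -0.25 * (t - 1) ** 2

n = 50_000
# theta_i ~ pi2, i.e. N(1, sqrt(2)); average the unnormalised ratio
B12_hat = sum(math.exp(log_pi1(t) - log_pi2(t))
              for t in (rng.gauss(1, math.sqrt(2)) for _ in range(n))) / n
```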
  28. (Further) bridge sampling. In addition, for any function α(·) making the integrals finite,
$$B_{12} = \frac{\int \tilde\pi_1(\theta \mid x)\,\alpha(\theta)\,\pi_2(\theta \mid x)\,d\theta}{\int \tilde\pi_2(\theta \mid x)\,\alpha(\theta)\,\pi_1(\theta \mid x)\,d\theta} \approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i} \mid x)\,\alpha(\theta_{2i})}{n_1^{-1} \sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i} \mid x)\,\alpha(\theta_{1i})}, \qquad \theta_{ji} \sim \pi_j(\theta \mid x).$$
  29. An infamous example. When
$$\alpha(\theta) = \frac{1}{\tilde\pi_1(\theta)\,\tilde\pi_2(\theta)},$$
one obtains the harmonic mean approximation to B12,
$$\widehat{B}_{12} = \frac{n_2^{-1} \sum_{i=1}^{n_2} 1\big/\tilde\pi_2(\theta_{2i} \mid x)}{n_1^{-1} \sum_{i=1}^{n_1} 1\big/\tilde\pi_1(\theta_{1i} \mid x)}, \qquad \theta_{ji} \sim \pi_j(\theta \mid x).$$
[Newton & Raftery, 1994] Infamous: it most often leads to an infinite variance! [Radford Neal's blog, 2008]
  31. "The Worst Monte Carlo Method Ever". "The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it's easy for people to not realize this, and to naively accept estimates that are nowhere close to the correct value of the marginal likelihood." [Radford Neal's blog, Aug. 23, 2008]
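The failure mode is easiest to see in a conjugate normal model, where the evidence is available in closed form. The sketch below (my own construction, not from the slides) checks the harmonic mean identity Z ≈ n / Σ 1/L(θ_i), θ_i ∼ posterior, in the regime τ² < 1 where its variance is still finite; for τ² > 1, i.e. a prior flatter than the likelihood, the second moment of 1/L under the posterior diverges and the same estimator has infinite variance, which is the pathology above:

```python
import math
import random

# model: y ~ N(theta, 1), theta ~ N(0, tau^2);
# closed-form evidence: Z = N(y; 0, 1 + tau^2)
rng = random.Random(11)
y, tau2, n = 1.0, 0.25, 50_000          # tau^2 < 1: finite-variance regime
mu_post = tau2 * y / (1 + tau2)
sd_post = math.sqrt(tau2 / (1 + tau2))

def log_lik(theta):
    return -0.5 * (y - theta) ** 2 - 0.5 * math.log(2 * math.pi)

# harmonic mean of the likelihood over posterior draws
inv_lik = [math.exp(-log_lik(rng.gauss(mu_post, sd_post)))
           for _ in range(n)]
Z_hat = n / sum(inv_lik)
Z_true = (math.exp(-0.5 * y * y / (1 + tau2))
          / math.sqrt(2 * math.pi * (1 + tau2)))
```

Setting tau2 above 1 makes Z_hat erratically undershoot across reruns even for large n, exactly as Neal's quote warns.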
  33. Optimal bridge sampling. The optimal choice of auxiliary function is
$$\alpha^\star(\theta) = \frac{n_1 + n_2}{n_1\,\pi_1(\theta \mid x) + n_2\,\pi_2(\theta \mid x)},$$
leading to
$$\widehat{B}_{12} \approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i} \mid x)}{n_1\,\pi_1(\theta_{2i} \mid x) + n_2\,\pi_2(\theta_{2i} \mid x)}}{n_1^{-1} \sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i} \mid x)}{n_1\,\pi_1(\theta_{1i} \mid x) + n_2\,\pi_2(\theta_{1i} \mid x)}}, \qquad \theta_{ji} \sim \pi_j(\theta \mid x).$$
Back later!
Optimal bridge sampling (2)
Reason:
  Var(B̂12)/B12² ≈ (1/(n1 n2)) { ∫ π1(θ) π2(θ) [n1 π1(θ) + n2 π2(θ)] α(θ)² dθ / ( ∫ π1(θ) π2(θ) α(θ) dθ )² − 1 }
(by the δ method)
Dependence on the unknown normalising constants solved iteratively
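The iterative solution mentioned on the slide can be sketched with the standard fixed-point scheme for the optimal α, on a toy pair of Gaussian kernels of my own choosing (true ratio of normalising constants 1/1.5):

```python
import numpy as np

# q1 = N(0,1) kernel, q2 = N(0.5, 1.5^2) kernel, both unnormalised,
# so B12 = Z1/Z2 = sqrt(2*pi) / (1.5*sqrt(2*pi)) = 1/1.5.
rng = np.random.default_rng(2)
n1 = n2 = 5_000
th1 = rng.normal(0.0, 1.0, n1)          # draws from pi1
th2 = rng.normal(0.5, 1.5, n2)          # draws from pi2

logq1 = lambda t: -0.5 * t**2
logq2 = lambda t: -0.5 * ((t - 0.5) / 1.5)**2

l2 = np.exp(logq1(th2) - logq2(th2))    # q1/q2 at the pi2 draws
l1 = np.exp(logq1(th1) - logq2(th1))    # q1/q2 at the pi1 draws
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

r = 1.0                                  # initial guess for B12
for _ in range(100):                     # fixed-point iteration: the optimal
    num = np.mean(l2 / (s1 * l2 + s2 * r))   # alpha depends on the unknown
    den = np.mean(1.0 / (s1 * l1 + s2 * r))  # ratio, so plug in the current r
    r = num / den

B12_hat = r
print(B12_hat)
```

The iteration typically stabilises in a handful of steps; 100 iterations is overkill but cheap.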
Ratio importance sampling
Another identity:
  B12 = Eϕ[ π̃1(θ)/ϕ(θ) ] / Eϕ[ π̃2(θ)/ϕ(θ) ]
for any density ϕ with sufficiently large support
[Torrie & Valleau, 1977]
Use of a single sample θ1, …, θn from ϕ:
  B̂12 = Σ_{i=1}^n π̃1(θi)/ϕ(θi) / Σ_{i=1}^n π̃2(θi)/ϕ(θi)
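A minimal sketch of the single-sample ratio estimator (my own toy example): two Gaussian kernels with known normalising constants, and a deliberately wide ϕ = N(0, 2²) covering both.

```python
import numpy as np

# q1 = N(0,1) kernel, q2 = N(0.5, 1.5^2) kernel, so B12 = Z1/Z2 = 1/1.5.
rng = np.random.default_rng(3)
n = 20_000
th = rng.normal(0.0, 2.0, n)                         # one sample from phi

log_phi = -0.5 * (th / 2.0)**2 - np.log(2.0 * np.sqrt(2 * np.pi))
logq1 = -0.5 * th**2
logq2 = -0.5 * ((th - 0.5) / 1.5)**2

# same-sample ratio of importance sampling estimates of Z1 and Z2
B12_hat = np.exp(logq1 - log_phi).sum() / np.exp(logq2 - log_phi).sum()
print(B12_hat)
```

Using the same draws in numerator and denominator makes the two errors positively correlated, which is why the ratio is more stable than two independent importance sampling runs.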
Ratio importance sampling (2)
Approximate variance:
  var(B̂12)/B12² = (1/n) Eϕ[ (π1(θ) − π2(θ))² / ϕ(θ)² ]
Optimal choice:
  ϕ*(θ) = |π1(θ) − π2(θ)| / ∫ |π1(η) − π2(η)| dη
[Chen, Shao & Ibrahim, 2000]
Improving upon bridge sampler
Theorem 5.5.3: The asymptotic variance of the optimal ratio importance sampling estimator is smaller than the asymptotic variance of the optimal bridge sampling estimator
[Chen, Shao & Ibrahim, 2000]
Does not require the normalising constant ∫ |π1(η) − π2(η)| dη but a simulation from
  ϕ*(θ) ∝ |π1(θ) − π2(θ)|.
Generalisation to point null situations
When
  B12 = ∫_{Θ1} π̃1(θ1) dθ1 / ∫_{Θ2} π̃2(θ2) dθ2
and Θ2 = Θ1 × Ψ, we get θ2 = (θ1, ψ) and
  B12 = Eπ2[ π̃1(θ1) ω(ψ|θ1) / π̃2(θ1, ψ) ]
holds for any conditional density ω(ψ|θ1).
X-dimen'al bridge sampling
Generalisation of the previous identity: For any α,
  B12 = Eπ2[ π̃1(θ1) ω(ψ|θ1) α(θ1, ψ) ] / Eπ1×ω[ π̃2(θ1, ψ) α(θ1, ψ) ]
and, for any density ϕ,
  B12 = Eϕ[ π̃1(θ1) ω(ψ|θ1) / ϕ(θ1, ψ) ] / Eϕ[ π̃2(θ1, ψ) / ϕ(θ1, ψ) ]
[Chen, Shao & Ibrahim, 2000]
Optimal choice: ω(ψ|θ1) = π2(ψ|θ1)   [Theorem 5.8.2]
Approximating Zk from a posterior sample
Use of the [harmonic mean] identity
  Eπk[ ϕ(θk) / (πk(θk) Lk(θk)) | x ] = ∫ [ ϕ(θk) / (πk(θk) Lk(θk)) ] [ πk(θk) Lk(θk) / Zk ] dθk = 1/Zk
no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
Comparison with regular importance sampling
Harmonic mean: constraint opposed to usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than πk(θk) Lk(θk) for the approximation
  Ẑ1k = 1 / [ (1/T) Σ_{t=1}^T ϕ(θk^{(t)}) / (πk(θk^{(t)}) Lk(θk^{(t)})) ]
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov's kernel) for ϕ
Comparison with regular importance sampling (cont'd)
Compare Ẑ1k with a standard importance sampling approximation
  Ẑ2k = (1/T) Σ_{t=1}^T πk(θk^{(t)}) Lk(θk^{(t)}) / ϕ(θk^{(t)})
where the θk^{(t)}'s are generated from the density ϕ(·) (with fatter tails like t's)
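Both estimators can be checked side by side on a toy conjugate model (my own, not from the slides): Ẑ1k with a finite-support uniform ϕ (lighter tails than prior × likelihood, as required), and Ẑ2k with a fat-tailed Student t(3) proposal.

```python
import math
import numpy as np

# Prior N(0,1), likelihood N(theta,1), x = 0: Zk = 1/(2*sqrt(pi)),
# posterior N(0, 1/2).
rng = np.random.default_rng(4)
x, T = 0.0, 200_000
Z_true = 1.0 / (2.0 * math.sqrt(math.pi))

def prior_lik(t):
    return np.exp(-0.5 * t**2 - 0.5 * (x - t)**2) / (2 * np.pi)

# Z1k: harmonic-mean identity with phi uniform on [-1,1] (finite support,
# hence lighter tails than prior*lik), evaluated on posterior draws.
post = rng.normal(x / 2, math.sqrt(0.5), T)
phi_u = np.where(np.abs(post) <= 1.0, 0.5, 0.0)
Z1 = 1.0 / np.mean(phi_u / prior_lik(post))

# Z2k: standard importance sampling with a fat-tailed t(3) proposal.
nu, scale = 3, 1.0
ts = rng.standard_t(nu, T) * scale
c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
phi_t = c / scale * (1 + (ts / scale)**2 / nu) ** (-(nu + 1) / 2)
Z2 = np.mean(prior_lik(ts) / phi_t)

print(Z_true, Z1, Z2)
```

Both routes have finite variance here; the interesting contrast with the previous slide is that the tail requirements point in opposite directions.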
Approximating Zk using a mixture representation
Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density
  ϕk(θk) ∝ ω1 πk(θk) Lk(θk) + ϕ(θk),
where ϕ(·) is arbitrary (but normalised)
Note: ω1 is not a probability weight
Approximating Zk using a mixture representation (cont'd)
Corresponding MCMC (= Gibbs) sampler: at iteration t
1. Take δ^{(t)} = 1 with probability
  ω1 πk(θk^{(t−1)}) Lk(θk^{(t−1)}) / [ ω1 πk(θk^{(t−1)}) Lk(θk^{(t−1)}) + ϕ(θk^{(t−1)}) ]
and δ^{(t)} = 2 otherwise;
2. If δ^{(t)} = 1, generate θk^{(t)} ∼ MCMC(θk^{(t−1)}, θk^{(t)}), where MCMC(θk, θk′) denotes an arbitrary MCMC kernel associated with the posterior πk(θk|x) ∝ πk(θk) Lk(θk);
3. If δ^{(t)} = 2, generate θk^{(t)} ∼ ϕ(θk) independently
Evidence approximation by mixtures
Rao-Blackwellised estimate
  ξ̂ = (1/T) Σ_{t=1}^T ω1 πk(θk^{(t)}) Lk(θk^{(t)}) / [ ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)}) ],
converges to ω1 Zk / (ω1 Zk + 1)
Deduce Ẑ3k from ω1 Ẑ3k / (ω1 Ẑ3k + 1) = ξ̂, i.e.
  Ẑ3k = [ Σ_{t=1}^T ω1 πk(θk^{(t)}) Lk(θk^{(t)}) / {ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)})} ] / [ ω1 Σ_{t=1}^T ϕ(θk^{(t)}) / {ω1 πk(θk^{(t)}) Lk(θk^{(t)}) + ϕ(θk^{(t)})} ]
[Bridge sampler]
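A sketch of this mixture scheme on the same kind of toy conjugate model as above (my own example). For brevity the arbitrary MCMC kernel is replaced by an exact posterior draw, which is a valid (if unrealistic) kernel; in practice any Metropolis step targeting πk(θk|x) would take its place.

```python
import math
import numpy as np

# Prior N(0,1), likelihood N(theta,1), x = 0: Zk = 1/(2 sqrt(pi)),
# posterior N(0, 1/2).  Instrumental phi = N(0,1), weight w1 = 1.
rng = np.random.default_rng(5)
x, T, w1 = 0.0, 50_000, 1.0
Z_true = 1.0 / (2.0 * math.sqrt(math.pi))

def q(t):                                  # prior * likelihood, unnormalised
    return np.exp(-0.5 * t**2 - 0.5 * (x - t)**2) / (2 * np.pi)

def phi(t):                                # instrumental N(0,1) density
    return np.exp(-0.5 * t**2) / math.sqrt(2 * math.pi)

theta = 0.0
num = den = 0.0
for _ in range(T):
    p1 = w1 * q(theta) / (w1 * q(theta) + phi(theta))
    num += p1                              # Rao-Blackwellised weights
    den += 1.0 - p1
    if rng.random() < p1:                  # delta = 1: posterior kernel move
        theta = rng.normal(x / 2, math.sqrt(0.5))
    else:                                  # delta = 2: refresh from phi
        theta = rng.normal(0.0, 1.0)

Z3 = num / (w1 * den)                      # solves w1*Z/(w1*Z+1) = xi_hat
print(Z_true, Z3)
```

Rao-Blackwellisation (accumulating p1 rather than the indicator δ^{(t)}) noticeably reduces the variance of ξ̂ at no extra cost.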
Chib's representation
Direct application of Bayes' theorem: given x ∼ fk(x|θk) and θk ∼ πk(θk),
  Zk = mk(x) = fk(x|θk) πk(θk) / πk(θk|x)
Use of an approximation to the posterior:
  Ẑk = m̂k(x) = fk(x|θk*) πk(θk*) / π̂k(θk*|x).
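The representation holds at any point θk*. On a toy conjugate model (my own illustration) where the posterior is available in closed form, the identity can be checked against brute-force numerical integration of prior × likelihood:

```python
import math
import numpy as np

# Prior N(0,1), likelihood N(theta,1), x = 0; posterior is N(x/2, 1/2).
x = 0.0
grid = np.linspace(-10, 10, 200_001)
dx = grid[1] - grid[0]

def prior(t):
    return np.exp(-0.5 * t**2) / np.sqrt(2 * np.pi)

def lik(t):
    return np.exp(-0.5 * (x - t)**2) / np.sqrt(2 * np.pi)

Z_quad = (prior(grid) * lik(grid)).sum() * dx       # reference evidence

theta_star = 0.3                                    # any point works
post = np.exp(-((theta_star - x / 2)**2)) / np.sqrt(np.pi)  # N(x/2,1/2) pdf
Z_chib = lik(theta_star) * prior(theta_star) / post

print(Z_quad, Z_chib)
```

In realistic settings the posterior ordinate π̂k(θk*|x) is the only estimated quantity, which is why θk* is taken at a high-density point such as the MAP.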
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell estimate
  π̂k(θk*|x) = (1/T) Σ_{t=1}^T πk(θk*|x, zk^{(t)}),
where the zk^{(t)}'s are Gibbs sampled latent variables
Label switching
A mixture model [special case of missing variable model] is invariant under permutations of the indices of the components.
E.g., mixtures 0.3 N(0,1) + 0.7 N(2.3,1) and 0.7 N(2.3,1) + 0.3 N(0,1) are exactly the same!
⇒ The component parameters θi are not identifiable marginally since they are exchangeable
Connected difficulties
1. Number of modes of the likelihood of order O(k!):
⇒ Maximization and even [MCMC] exploration of the posterior surface harder
2. Under exchangeable priors on (θ, p) [prior invariant under permutation of the indices], all posterior marginals are identical:
⇒ Posterior expectation of θ1 equal to posterior expectation of θ2
License
Since Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks energy to switch simultaneously enough component allocations at once
[Figure: Gibbs output over 500 iterations — traces of the µi, σi and pi and pairwise scatterplots of (µi, pi), (σi, pi) and (µi, σi), showing no label switching]
Label switching paradox
We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler.
If we observe it, then we do not know how to estimate the parameters.
If we do not, then we are uncertain about the convergence!!!
Compensation for label switching
For mixture models, zk^{(t)} usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory
  πk(θk|x) = πk(σ(θk)|x) = (1/k!) Σ_{σ∈Sk} πk(σ(θk)|x)
for all σ's in Sk, the set of all permutations of {1, …, k}.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
  π̂k(θk*|x) = (1/(T k!)) Σ_{σ∈Sk} Σ_{t=1}^T πk(σ(θk*)|x, zk^{(t)}).
[Berkhof, van Mechelen & Gelman, 2003]
Galaxy dataset
n = 82 galaxies as a mixture of k normal distributions with both mean and variance unknown.
[Roeder, 1992]
[Figure: histogram of the (standardised) data with average density estimate]
Galaxy dataset (k)
Using only the original estimate, with θk* as the MAP estimator,
  log m̂k(x) = −105.1396
for k = 3 (based on 10³ simulations), while introducing the permutations leads to
  log m̂k(x) = −103.3479
Note that −105.1396 + log(3!) = −103.3479

  k           2        3        4        5        6        7        8
  log m̂k(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44

Estimations of the marginal likelihoods by the symmetrised Chib's approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100 permutations selected at random in Sk).
[Lee, Marin, Mengersen & Robert, 2008]
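The relation between the two reported values is exactly the log(k!) correction: a sampler stuck in a single labelling out of the k! equivalent ones overstates the posterior ordinate by a factor k!, hence understates the log evidence by log(k!). A one-line check:

```python
import math

# For k = 3 components, the symmetrisation adds log(3!) to the log evidence.
k = 3
correction = math.log(math.factorial(k))   # log(6) ~ 1.7918
corrected = -105.1396 + correction
print(corrected)                           # recovers -103.3479 up to rounding
```

The same additive correction applies for any k as long as the chain stays in one labelling throughout.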
Bayesian variable selection
Regression setting: one dependent random variable y and a set {x1, …, xk} of k explanatory variables.
Question: Are all xi's involved in the regression?
Assumption: every subset {i1, …, iq} of q (0 ≤ q ≤ k) explanatory variables, {1n, xi1, …, xiq}, is a proper set of explanatory variables for the regression of y [intercept included in every corresponding model]
Computational issue: 2^k models in competition…
Model notations
1. X = [1n x1 ⋯ xk] is the matrix containing 1n and all the k potential predictor variables
2. Each model Mγ is associated with a binary indicator vector γ ∈ Γ = {0,1}^k, where γi = 1 means that the variable xi is included in the model Mγ
3. qγ = 1ᵀγ is the number of variables included in the model Mγ
4. t1(γ) and t0(γ) are the indices of variables included in the model and the indices of variables not included in the model
Model indicators
For β ∈ R^{k+1} and X, we define βγ as the subvector
  βγ = (β0, (βi)_{i∈t1(γ)})
and Xγ as the submatrix of X where only the column 1n and the columns in t1(γ) have been left.
Models in competition
The model Mγ is thus defined as
  y | γ, βγ, σ², X ∼ Nn(Xγ βγ, σ² In)
where βγ ∈ R^{qγ+1} and σ² ∈ R*₊ are the unknown parameters.
Warning: σ² is common to all models and thus uses the same prior for all models
Informative G-prior
Many (2^k) models in competition: we cannot expect a practitioner to specify a prior on every Mγ in a completely subjective and autonomous manner.
Shortcut: We derive all priors from a single global prior associated with the so-called full model that corresponds to γ = (1, …, 1).
Prior definitions
(i) For the full model, Zellner's G-prior:
  β | σ², X ∼ Nk+1(β̃, c σ² (XᵀX)⁻¹)   and   σ² ∼ π(σ²|X) = σ⁻²
(ii) For each model Mγ, the prior distribution of βγ conditional on σ² is fixed as
  βγ | γ, σ² ∼ N_{qγ+1}(β̃γ, c σ² (Xγᵀ Xγ)⁻¹),
where β̃γ = (Xγᵀ Xγ)⁻¹ Xγᵀ X β̃, and same prior on σ².
Prior completion
The joint prior for model Mγ is the improper prior
  π(βγ, σ²|γ) ∝ (σ²)^{−(qγ+1)/2−1} exp[ −(βγ − β̃γ)ᵀ (Xγᵀ Xγ) (βγ − β̃γ) / (2 c σ²) ].
Prior completion (2)
Infinitely many ways of defining a prior on the model index γ: choice of the uniform prior π(γ|X) = 2⁻ᵏ.
Posterior distribution of γ central to variable selection since it is proportional to the marginal density of y on Mγ (or evidence of Mγ):
  π(γ|y, X) ∝ f(y|γ, X) π(γ|X) ∝ f(y|γ, X)
  = ∫ [ ∫ f(y|γ, β, σ², X) π(β|γ, σ², X) dβ ] π(σ²|X) dσ².
Since
  f(y|γ, σ², X) = ∫ f(y|γ, β, σ², X) π(β|γ, σ²) dβ
  = (c+1)^{−(qγ+1)/2} (2π)^{−n/2} (σ²)^{−n/2} exp[ −yᵀy/(2σ²) + ( c yᵀXγ(XγᵀXγ)⁻¹Xγᵀy + 2 yᵀXγβ̃γ − β̃γᵀXγᵀXγβ̃γ ) / (2σ²(c+1)) ],
this posterior density satisfies
  π(γ|y, X) ∝ (c+1)^{−(qγ+1)/2} [ yᵀy − (c/(c+1)) yᵀXγ(XγᵀXγ)⁻¹Xγᵀy − (1/(c+1)) ( 2 yᵀXγβ̃γ − β̃γᵀXγᵀXγβ̃γ ) ]^{−n/2}.
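This closed form makes exhaustive enumeration feasible for small k. A sketch on synthetic data of my own (not the caterpillar data), taking the common default β̃ = 0 so that the bracket reduces to yᵀy − c/(c+1) yᵀXγ(XγᵀXγ)⁻¹Xγᵀy:

```python
import itertools
import numpy as np

# Synthetic regression: only x1 and x3 (indices 0 and 2) are truly active.
rng = np.random.default_rng(6)
n, k, c = 100, 4, 100.0
X = rng.normal(size=(n, k))
y = 1.0 + 3.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(0, 0.3, n)

def log_post_gamma(gamma):
    cols = [np.ones(n)] + [X[:, i] for i in range(k) if gamma[i]]
    Xg = np.column_stack(cols)
    qg = Xg.shape[1] - 1
    # y' X_g (X_g'X_g)^{-1} X_g' y is y times its projection onto span(X_g)
    yhat = Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0]
    bracket = y @ y - c / (c + 1) * (y @ yhat)
    return -(qg + 1) / 2 * np.log(c + 1) - n / 2 * np.log(bracket)

models = list(itertools.product([0, 1], repeat=k))
logs = np.array([log_post_gamma(g) for g in models])
probs = np.exp(logs - logs.max())
probs /= probs.sum()                        # normalise over the 2^k models

incl = np.array([sum(p for g, p in zip(models, probs) if g[i] == 1)
                 for i in range(k)])
print(incl)                                 # posterior inclusion probabilities
```

With this signal-to-noise ratio the active variables get inclusion probabilities near one and the spurious ones stay low.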
Pine processionary caterpillars
  t1(γ)            π(γ|y,X)
  0,1,2,4,5        0.2316
  0,1,2,4,5,9      0.0374
  0,1,9            0.0344
  0,1,2,4,5,10     0.0328
  0,1,4,5          0.0306
  0,1,2,9          0.0250
  0,1,2,4,5,7      0.0241
  0,1,2,4,5,8      0.0238
  0,1,2,4,5,6      0.0237
  0,1,2,3,4,5      0.0232
  0,1,6,9          0.0146
  0,1,2,3,9        0.0145
  0,9              0.0143
  0,1,2,6,9        0.0135
  0,1,4,5,9        0.0128
  0,1,3,9          0.0117
  0,1,2,8          0.0115
Pine processionary caterpillars (cont'd)
Interpretation: the model Mγ with the highest posterior probability is t1(γ) = (1, 2, 4, 5), which corresponds to the variables
- altitude,
- slope,
- height of the tree sampled in the center of the area, and
- diameter of the tree sampled in the center of the area.
Corresponds to the five variables identified in the R regression output
Noninformative extension
For Zellner's noninformative prior with π(c) = 1/c, we have
  π(γ|y, X) ∝ Σ_{c=1}^∞ c⁻¹ (c+1)^{−(qγ+1)/2} [ yᵀy − (c/(c+1)) yᵀXγ(XγᵀXγ)⁻¹Xγᵀy ]^{−n/2}.
Pine processionary caterpillars
  t1(γ)              π(γ|y,X)
  0,1,2,4,5          0.0929
  0,1,2,4,5,9        0.0325
  0,1,2,4,5,10       0.0295
  0,1,2,4,5,7        0.0231
  0,1,2,4,5,8        0.0228
  0,1,2,4,5,6        0.0228
  0,1,2,3,4,5        0.0224
  0,1,2,3,4,5,9      0.0167
  0,1,2,4,5,6,9      0.0167
  0,1,2,4,5,8,9      0.0137
  0,1,4,5            0.0110
  0,1,2,4,5,9,10     0.0100
  0,1,2,3,9          0.0097
  0,1,2,9            0.0093
  0,1,2,4,5,7,9      0.0092
  0,1,2,6,9          0.0092
Stochastic search for the most likely model
When k gets large, it is impossible to compute the posterior probabilities of the 2^k models.
Need for a tailored algorithm that samples from π(γ|y, X) and selects the most likely models.
Can be done by Gibbs sampling, given the availability of the full conditional posterior probabilities of the γi's.
If γ₋ᵢ = (γ1, …, γ_{i−1}, γ_{i+1}, …, γk) (1 ≤ i ≤ k),
  π(γi | y, γ₋ᵢ, X) ∝ π(γ|y, X)
(to be evaluated in both γi = 0 and γi = 1)
Gibbs sampling for variable selection
Initialization: Draw γ⁰ from the uniform distribution on Γ
Iteration t: Given (γ1^{(t−1)}, …, γk^{(t−1)}), generate
1. γ1^{(t)} according to π(γ1 | y, γ2^{(t−1)}, …, γk^{(t−1)}, X)
2. γ2^{(t)} according to π(γ2 | y, γ1^{(t)}, γ3^{(t−1)}, …, γk^{(t−1)}, X)
…
k. γk^{(t)} according to π(γk | y, γ1^{(t)}, …, γ_{k−1}^{(t)}, X)
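A compact sketch of this Gibbs scan (my own synthetic data, β̃ = 0 closed form): each full conditional only requires evaluating the joint π(γ|y, X) at γi = 0 and γi = 1.

```python
import numpy as np

# Synthetic regression: only x1 and x3 (indices 0 and 2) are truly active.
rng = np.random.default_rng(7)
n, k, c, T = 100, 4, 100.0, 3_000
X = rng.normal(size=(n, k))
y = 1.0 + 3.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(0, 0.3, n)

def log_post_gamma(gamma):
    Xg = np.column_stack([np.ones(n)] + [X[:, i] for i in range(k) if gamma[i]])
    yhat = Xg @ np.linalg.lstsq(Xg, y, rcond=None)[0]
    bracket = y @ y - c / (c + 1) * (y @ yhat)
    return -Xg.shape[1] / 2 * np.log(c + 1) - n / 2 * np.log(bracket)

gamma = list(rng.integers(0, 2, k))        # uniform initialisation on Gamma
counts = np.zeros(k)
for t in range(T):
    for i in range(k):                     # one full conditional per variable
        lp = []
        for v in (0, 1):
            gamma[i] = v
            lp.append(log_post_gamma(gamma))
        p1 = 1.0 / (1.0 + np.exp(lp[0] - lp[1]))   # P(gamma_i = 1 | rest)
        gamma[i] = int(rng.random() < p1)
    counts += gamma
incl_mcmc = counts / T                     # no burn-in here, for brevity
print(incl_mcmc)
```

For k = 4 this is of course overkill (exhaustive enumeration is cheap); the scheme only pays off when 2^k is out of reach.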
MCMC interpretation
After T ≫ 1 MCMC iterations, output used to approximate the posterior probabilities π(γ|y, X) by empirical averages
  π̂(γ|y, X) = (1/(T − T0 + 1)) Σ_{t=T0}^T I_{γ^{(t)}=γ},
where the T0 first values are eliminated as burn-in.
And approximation of the probability to include the i-th variable,
  P̂π(γi = 1|y, X) = (1/(T − T0 + 1)) Σ_{t=T0}^T I_{γi^{(t)}=1}.
Pine processionary caterpillars
  γi     P̂π(γi=1|y,X) informative    P̂π(γi=1|y,X) noninformative
  γ1     0.8624    0.8844
  γ2     0.7060    0.7716
  γ3     0.1482    0.2978
  γ4     0.6671    0.7261
  γ5     0.6515    0.7006
  γ6     0.1678    0.3115
  γ7     0.1371    0.2880
  γ8     0.1555    0.2876
  γ9     0.4039    0.5168
  γ10    0.1151    0.2609
Probabilities of inclusion with both informative (β̃ = 0₁₁, c = 100) and noninformative Zellner's priors
Reversible jump
Idea: Set up a proper measure-theoretic framework for designing moves between models Mk
[Green, 1995]
Create a reversible kernel K on H = ∪k {k} × Θk such that
  ∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy
for the invariant density π [x is of the form (k, θ^{(k)})]
Local moves
For a move between two models, M1 and M2, the Markov chain being in state θ1 ∈ M1, denote by K1→2(θ1, dθ) and K2→1(θ2, dθ) the corresponding kernels, under the detailed balance condition
  π(dθ1) K1→2(θ1, dθ) = π(dθ2) K2→1(θ2, dθ),
and take, wlog, dim(M2) > dim(M1).
Proposal expressed as
  θ2 = Ψ1→2(θ1, v1→2)
where v1→2 is a random variable of dimension dim(M2) − dim(M1), generated as
  v1→2 ∼ ϕ1→2(v1→2).
Local moves (2)
In this case, q1→2(θ1, dθ2) has density
  ϕ1→2(v1→2) |∂Ψ1→2(θ1, v1→2)/∂(θ1, v1→2)|⁻¹,
by the Jacobian rule.
Reverse importance link: if ϖ1→2 is the probability of choosing a move to M2 while in M1, the acceptance probability reduces to
  α(θ1, v1→2) = 1 ∧ [ π(M2, θ2) ϖ2→1 / (π(M1, θ1) ϖ1→2 ϕ1→2(v1→2)) ] |∂Ψ1→2(θ1, v1→2)/∂(θ1, v1→2)|.
⇒ Difficult calibration
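A minimal reversible-jump sketch on a toy target of my own (not from the slides): M1 has θ ∈ R with kernel e^{−θ²/2}, M2 has θ ∈ R² with kernel e^{−|θ|²/2}, equal prior model weights. The up-move uses Ψ1→2(θ, v) = (θ, v) with v ∼ N(0,1), so the Jacobian is 1, and the true posterior probability of M2 is Z2/(Z1+Z2) with Z1 = √(2π), Z2 = 2π, about 0.715.

```python
import math
import numpy as np

rng = np.random.default_rng(8)
T = 40_000
phi = lambda v: math.exp(-0.5 * v**2) / math.sqrt(2 * math.pi)
k, th = 1, np.array([0.0])
visits2 = 0
for _ in range(T):
    if k == 1:                            # propose the up-move to M2
        v = rng.normal()
        # alpha = q2(theta,v) / (q1(theta) * phi(v)); the Gaussian kernels
        # cancel, leaving the constant sqrt(2*pi) > 1: always accepted here
        if rng.random() < math.exp(-0.5 * v**2) / phi(v):
            k, th = 2, np.array([th[0], v])
    else:                                 # propose the down-move to M1
        v = th[1]                         # (theta1, v) = Psi^{-1}(theta2)
        if rng.random() < phi(v) / math.exp(-0.5 * v**2):
            k, th = 1, th[:1]
    # within-model random-walk Metropolis refreshment
    prop = th + rng.normal(0, 0.5, th.shape)
    if rng.random() < math.exp(-0.5 * (prop @ prop - th @ th)):
        th = prop
    visits2 += (k == 2)
freq2 = visits2 / T
print(freq2)
```

Even in this trivial case the acceptance ratio mixes target densities, the proposal density of v and a Jacobian, which is the calibration difficulty the slide points to.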
Interpretation

The representation puts us back in a fixed-dimension setting:
$M_1 \times V_{1\to2}$ and $M_2$ are in one-to-one relation;
reversibility imposes that $\theta_1$ is derived as $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$;
this appears like a regular Metropolis–Hastings move from the couple $(\theta_1, v_{1\to2})$ to $\theta_2$ when the stationary distributions are $\pi(M_1, \theta_1) \times \varphi_{1\to2}(v_{1\to2})$ and $\pi(M_2, \theta_2)$, and when the proposal distribution is deterministic (??)
Pseudo-deterministic reasoning

Consider the proposals
$$\theta_2 \sim N(\Psi_{1\to2}(\theta_1, v_{1\to2}), \varepsilon) \quad\text{and}\quad \Psi_{1\to2}(\theta_1, v_{1\to2}) \sim N(\theta_2, \varepsilon)$$
The reciprocal proposal has density
$$\frac{\exp\left\{-(\theta_2 - \Psi_{1\to2}(\theta_1, v_{1\to2}))^2 / 2\varepsilon\right\}}{\sqrt{2\pi\varepsilon}} \times \left| \frac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$$
by the Jacobian rule. Thus the Metropolis–Hastings acceptance probability is
$$1 \wedge \frac{\pi(M_2, \theta_2)}{\pi(M_1, \theta_1)\, \varphi_{1\to2}(v_{1\to2})} \left| \frac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$$
This does not depend on $\varepsilon$: let $\varepsilon$ go to 0.
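Why the acceptance probability is free of $\varepsilon$ can be checked in one line (a sketch spelling out the cancellation): the forward and reciprocal Gaussian kernels share the same exponent and the same normalising constant, so their ratio reduces to the Jacobian factor alone.

```latex
\frac{q_{2\to1}\bigl(\theta_2, (\theta_1, v_{1\to2})\bigr)}
     {q_{1\to2}\bigl((\theta_1, v_{1\to2}), \theta_2\bigr)}
= \frac{\dfrac{\exp\{-(\Psi_{1\to2}(\theta_1, v_{1\to2}) - \theta_2)^2/2\varepsilon\}}{\sqrt{2\pi\varepsilon}}
        \left|\dfrac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|}
       {\dfrac{\exp\{-(\theta_2 - \Psi_{1\to2}(\theta_1, v_{1\to2}))^2/2\varepsilon\}}{\sqrt{2\pi\varepsilon}}}
= \left|\frac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})}\right|
```

Every $\varepsilon$ term cancels exactly, so the limit $\varepsilon \to 0$ is immediate.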
Generic reversible jump acceptance probability

If several models are considered simultaneously, with probability $\pi_{1\to2}$ of choosing the move to $M_2$ while in $M_1$, as in
$$K(x, B) = \sum_{m=1}^{\infty} \int_B \rho_m(x, y)\, q_m(x, dy) + \omega(x)\, \mathbb{I}_B(x)$$
the acceptance probability of $\theta_2 = \Psi_{1\to2}(\theta_1, v_{1\to2})$ is
$$\alpha(\theta_1, v_{1\to2}) = 1 \wedge \frac{\pi(M_2, \theta_2)\, \pi_{2\to1}}{\pi(M_1, \theta_1)\, \pi_{1\to2}\, \varphi_{1\to2}(v_{1\to2})} \left| \frac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|$$
while the acceptance probability of $\theta_1$ with $(\theta_1, v_{1\to2}) = \Psi_{1\to2}^{-1}(\theta_2)$ is
$$\alpha(\theta_1, v_{1\to2}) = 1 \wedge \frac{\pi(M_1, \theta_1)\, \pi_{1\to2}\, \varphi_{1\to2}(v_{1\to2})}{\pi(M_2, \theta_2)\, \pi_{2\to1}} \left| \frac{\partial \Psi_{1\to2}(\theta_1, v_{1\to2})}{\partial(\theta_1, v_{1\to2})} \right|^{-1}$$
Green's sampler

Algorithm — Iteration $t$ ($t \geq 1$): if $x^{(t)} = (m, \theta^{(m)})$,
1. Select model $M_n$ with probability $\pi_{mn}$
2. Generate $u_{mn} \sim \varphi_{mn}(u)$ and set $(\theta^{(n)}, v_{nm}) = \Psi_{m\to n}(\theta^{(m)}, u_{mn})$
3. Take $x^{(t+1)} = (n, \theta^{(n)})$ with probability
$$\min\left\{ \frac{\pi(n, \theta^{(n)})}{\pi(m, \theta^{(m)})} \frac{\pi_{nm}\, \varphi_{nm}(v_{nm})}{\pi_{mn}\, \varphi_{mn}(u_{mn})} \left| \frac{\partial \Psi_{m\to n}(\theta^{(m)}, u_{mn})}{\partial(\theta^{(m)}, u_{mn})} \right|,\ 1 \right\}$$
and take $x^{(t+1)} = x^{(t)}$ otherwise.
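The three steps above can be sketched on a toy pair of models (an illustrative setup, not from the slides): model 1 has a scalar $\theta \sim N(0,1)$, model 2 a pair $(a,b)$ of independent $N(0,1)$ components, with dimension matching $\Psi(\theta, u) = (\theta - u, \theta + u)$, Jacobian determinant 2, and $u \sim \varphi = N(0,1)$. With only two models, the move probabilities $\pi_{12} = \pi_{21} = 1$ cancel from the ratio.

```python
import numpy as np

def log_post(m, theta):
    """Unnormalised log pi(m, theta): standard normal components, equal
    prior model probabilities (toy target, not from the slides)."""
    return -0.5 * np.sum(np.atleast_1d(theta) ** 2)

def log_phi(u):
    """Log density of the completion variable u ~ N(0, 1)."""
    return -0.5 * u ** 2 - 0.5 * np.log(2 * np.pi)

LOG_JAC = np.log(2.0)   # |d(a,b) / d(theta,u)| = 2 for Psi(theta,u)

def green_step(rng, m, theta):
    """One iteration of Green's sampler between the two toy models."""
    if m == 1:
        # up-move: (a, b) = Psi(theta, u), u generated from phi
        u = rng.normal()
        prop = np.array([theta - u, theta + u])
        log_alpha = (log_post(2, prop) - log_post(1, theta)
                     - log_phi(u) + LOG_JAC)
        return (2, prop) if np.log(rng.uniform()) < log_alpha else (1, theta)
    else:
        # down-move: invert Psi to recover (theta, u)
        a, b = theta
        th1, u = (a + b) / 2.0, (b - a) / 2.0
        log_alpha = (log_post(1, th1) - log_post(2, theta)
                     + log_phi(u) - LOG_JAC)
        return (1, th1) if np.log(rng.uniform()) < log_alpha else (2, theta)

rng = np.random.default_rng(0)
state = (1, 0.0)
visited = set()
for _ in range(2000):
    state = green_step(rng, *state)
    visited.add(state[0])
```

The chain moves freely across dimensions because each up-move pairs with the deterministic inverse of $\Psi$, exactly as in the generic acceptance probabilities above.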
Mixture of normal distributions

$$M_k = \left\{ (p_{jk}, \mu_{jk}, \sigma_{jk})\ :\ \sum_{j=1}^{k} p_{jk}\, N(\mu_{jk}, \sigma_{jk}^2) \right\}$$

Restrict moves from $M_k$ to adjacent models, like $M_{k+1}$ and $M_{k-1}$, with probabilities $\pi_{k(k+1)}$ and $\pi_{k(k-1)}$.
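One way such an adjacent move might look is a birth proposal from $M_k$ to $M_{k+1}$ (a hypothetical sketch, not the specific scheme of the slides): draw a weight $w \sim \mathrm{Beta}(1, k)$ and a new component from assumed diffuse proposals, rescale the old weights by $1 - w$, and account for the Jacobian $(1-w)^{k-1}$ of that rescaling in the acceptance ratio.

```python
import numpy as np

def birth_move(rng, p, mu, sigma):
    """Hypothetical birth proposal M_k -> M_{k+1} for a normal mixture:
    draw a new component, rescale the existing weights, and return the
    new state together with the log |Jacobian| of the weight map."""
    k = len(p)
    w = rng.beta(1.0, k)                  # weight of the new component
    mu_new = rng.normal(0.0, 10.0)        # assumed diffuse mean proposal
    sigma_new = rng.lognormal(0.0, 1.0)   # assumed positive scale proposal
    p_new = np.append(p * (1.0 - w), w)   # weights still sum to one
    # Jacobian of (p_1, ..., p_{k-1}, w) -> rescaled weights: (1-w)^(k-1)
    log_jac = (k - 1) * np.log(1.0 - w)
    return p_new, np.append(mu, mu_new), np.append(sigma, sigma_new), log_jac

rng = np.random.default_rng(1)
p2, mu2, s2, lj = birth_move(rng, np.array([0.6, 0.4]),
                             np.array([-1.0, 2.0]), np.array([1.0, 0.5]))
```

The matching death move would delete a component, renormalise the remaining weights, and use the inverse Jacobian, so that the pair satisfies the reversibility condition of the generic sampler.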