                Importance sampling methods for Bayesian
                 discrimination between embedded models

                                             Christian P. Robert

                                 Université Paris Dauphine & CREST-INSEE
                                 http://www.ceremade.dauphine.fr/~xian


           45th Scientific Meeting of the Italian Statistical Society
                          Padova, 16 June 2010
                        Joint work with J.-M. Marin

Outline



      1    Bayesian model choice

      2    Importance sampling model comparison solutions compared

Model choice as model comparison




      Choice between models
      Several models available for the same observation

                                    Mi : x ∼ fi (x|θi ),                    i∈I

      Substitute hypotheses with models

Bayesian model choice

      Probabilise the entire model/parameter space
              allocate probabilities pi to all models Mi
              define priors πi (θi ) for each parameter space Θi
              compute

      $$\pi(M_i \mid x) = \frac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\,\pi_i(\theta_i)\,\mathrm d\theta_i}{\sum_j p_j \int_{\Theta_j} f_j(x \mid \theta_j)\,\pi_j(\theta_j)\,\mathrm d\theta_j}$$

              take the largest π(Mi |x) to determine the “best” model
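
      As a minimal illustration (an assumed two-model toy, not part of the
      deck), the posterior model probabilities follow directly from the
      evidences Z and the prior model weights p:

      # posterior model probabilities pi(M_i|x) from evidences Z and weights p
      post.model <- function(Z, p = rep(1/length(Z), length(Z))) {
        w <- p * Z
        w / sum(w)
      }
      post.model(c(0.8, 0.3))   # two models: returns about 0.73 and 0.27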

Bayes factor


      Definition (Bayes factors)
      For comparing model M0 with θ ∈ Θ0 vs. M1 with θ ∈ Θ1 , under
      priors π0 (θ) and π1 (θ), central quantity

      $$B_{01} = \frac{\pi(\Theta_0 \mid x)}{\pi(\Theta_1 \mid x)} \bigg/ \frac{\pi(\Theta_0)}{\pi(\Theta_1)} = \frac{\int_{\Theta_0} f_0(x \mid \theta_0)\,\pi_0(\theta_0)\,\mathrm d\theta_0}{\int_{\Theta_1} f_1(x \mid \theta_1)\,\pi_1(\theta_1)\,\mathrm d\theta_1}$$

                                                                                     [Jeffreys, 1939]

Evidence



      Problems using a similar quantity, the evidence

      $$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm d\theta_k\,,$$

      aka the marginal likelihood.
                                                                                  [Jeffreys, 1939]

A comparison of importance sampling solutions

      1    Bayesian model choice

      2    Importance sampling model comparison solutions compared
             Regular importance
             Bridge sampling
             Mixtures to bridge
             Harmonic means
             Chib’s solution
             The Savage–Dickey ratio


                                                                            [Marin & Robert, 2010]

Bayes factor approximation

      When approximating the Bayes factor

      $$B_{01} = \frac{\int_{\Theta_0} f_0(x \mid \theta_0)\,\pi_0(\theta_0)\,\mathrm d\theta_0}{\int_{\Theta_1} f_1(x \mid \theta_1)\,\pi_1(\theta_1)\,\mathrm d\theta_1}$$

      use of importance functions $\varphi_0$ and $\varphi_1$ and

      $$\widehat B_{01} = \frac{n_0^{-1} \sum_{i=1}^{n_0} f_0(x \mid \theta_0^i)\,\pi_0(\theta_0^i) \big/ \varphi_0(\theta_0^i)}{n_1^{-1} \sum_{i=1}^{n_1} f_1(x \mid \theta_1^i)\,\pi_1(\theta_1^i) \big/ \varphi_1(\theta_1^i)}\,, \qquad \theta_j^i \sim \varphi_j$$
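
      A minimal R sketch of this estimator (an assumed toy pair where both
      evidences are known in closed form, so the output can be checked; not
      the deck's example):

      set.seed(1)
      x <- 1.5                                  # single observation
      # M0: x ~ N(0,1) (evidence exact); M1: x ~ N(theta,1), theta ~ N(0,1)
      theta <- rnorm(1e4, mean = x/2, sd = 1)   # importance function phi_1
      w <- dnorm(x, theta) * dnorm(theta) / dnorm(theta, x/2, 1)
      Z1.hat <- mean(w)                         # IS estimate of Z_1
      B01.hat <- dnorm(x) / Z1.hat
      B01.hat - dnorm(x) / dnorm(x, 0, sqrt(2)) # small error vs. exact B_01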

Diabetes in Pima Indian women

      Example (R benchmark)
      “A population of women who were at least 21 years old, of Pima
      Indian heritage and living near Phoenix (AZ), was tested for
      diabetes according to WHO criteria. The data were collected by
      the US National Institute of Diabetes and Digestive and Kidney
      Diseases.”
      200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes

Probit modelling on Pima Indian women


      Probability of diabetes function of above variables

                             P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
      Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
      g-prior modelling:

                                         $\beta \sim N_3\big(0,\; n\,(X^{\mathrm T} X)^{-1}\big)$

Importance sampling for the Pima Indian dataset


      Use of the importance function inspired by the MLE distribution
                                 $\beta \sim N(\hat\beta,\, \hat\Sigma)$

      R importance sampling code (probitlpost and dmvlnorm, the log-posterior
      and log multivariate normal density, are the authors' helper functions;
      rmvnorm is from the mvtnorm package)
      # probit fits under H0 (X1, two covariates) and H1 (X2, three covariates)
      model1=summary(glm(y~-1+X1,family=binomial(link=probit)))
      model2=summary(glm(y~-1+X2,family=binomial(link=probit)))
      # draws from the overdispersed MLE-based normal importance functions
      is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled)
      is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)
      # ratio of the two importance sampling evidence estimates
      bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1],
           sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)-
           dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))

Diabetes in Pima Indian women
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicates of 20,000 simulations from the prior and
      from the above MLE importance sampler

      [Figure: boxplots of the 100 Bayes factor estimates, Monte Carlo
      (prior sampling) vs. importance sampling]

Bridge sampling


      Special case:
      If
      $$\pi_1(\theta_1 \mid x) \propto \tilde\pi_1(\theta_1 \mid x)\,, \qquad \pi_2(\theta_2 \mid x) \propto \tilde\pi_2(\theta_2 \mid x)$$
      live on the same space (Θ1 = Θ2 ), then
      $$B_{12} \approx \frac{1}{n} \sum_{i=1}^{n} \frac{\tilde\pi_1(\theta_i \mid x)}{\tilde\pi_2(\theta_i \mid x)}\,, \qquad \theta_i \sim \pi_2(\theta \mid x)$$

                           [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
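
      A short sketch of this special case on an assumed toy pair of
      unnormalised posteriors whose normalising constants are known, so
      that B12 = 0.5 exactly (not the deck's code):

      set.seed(2)
      pi1.t <- function(th) exp(-th^2/2)        # Z_1 = sqrt(2*pi)
      pi2.t <- function(th) exp(-(th-1)^2/8)    # Z_2 = 2*sqrt(2*pi)
      th2 <- rnorm(1e4, 1, 2)                   # theta_i ~ pi_2 = N(1, 2^2)
      B12.hat <- mean(pi1.t(th2) / pi2.t(th2))  # estimates Z_1/Z_2 = 0.5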

(Further) bridge sampling

      General identity:

      $$B_{12} = \frac{\int \tilde\pi_1(\theta \mid x)\,\alpha(\theta)\,\pi_2(\theta \mid x)\,\mathrm d\theta}{\int \tilde\pi_2(\theta \mid x)\,\alpha(\theta)\,\pi_1(\theta \mid x)\,\mathrm d\theta} \qquad \forall\, \alpha(\cdot)$$

      $$\approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i} \mid x)\,\alpha(\theta_{2i})}{n_1^{-1} \sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i} \mid x)\,\alpha(\theta_{1i})}\,, \qquad \theta_{ji} \sim \pi_j(\theta \mid x)$$

Optimal bridge sampling
      The optimal choice of auxiliary function is
      $$\alpha^\star = \frac{n_1 + n_2}{n_1\,\pi_1(\theta \mid x) + n_2\,\pi_2(\theta \mid x)}$$
      leading to
      $$B_{12} \approx \frac{n_2^{-1} \sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i} \mid x) \big/ \left\{ n_1 \pi_1(\theta_{2i} \mid x) + n_2 \pi_2(\theta_{2i} \mid x) \right\}}{n_1^{-1} \sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i} \mid x) \big/ \left\{ n_1 \pi_1(\theta_{1i} \mid x) + n_2 \pi_2(\theta_{1i} \mid x) \right\}}$$

                                                                                    Back later!

      Drawback: dependence on the unknown normalising constants,
      solved iteratively
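
      The iterative resolution can be sketched with Meng & Wong's fixed
      point, reusing the toy pair pi1.t and pi2.t defined above (an
      illustrative assumption, not the deck's code):

      n1 <- n2 <- 1e4
      th1 <- rnorm(n1, 0, 1)                    # draws from pi_1
      th2 <- rnorm(n2, 1, 2)                    # draws from pi_2
      l1 <- pi1.t(th1) / pi2.t(th1)             # unnormalised ratios at theta_1i
      l2 <- pi1.t(th2) / pi2.t(th2)             # ... and at theta_2i
      s1 <- n1 / (n1 + n2); s2 <- n2 / (n1 + n2)
      B <- 1                                    # starting value for B_12
      for (it in 1:50)                          # iterate the fixed point
        B <- mean(l2 / (s1*l2 + s2*B)) / mean(1 / (s1*l1 + s2*B))
      B                                         # stabilises near the true 0.5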

Extension to varying dimensions

      When dim(Θ1 ) ≠ dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a
      pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into a
      joint distribution
      $$\pi_1(\theta_1 \mid x) \times \omega(\psi \mid \theta_1, x)$$
      on Θ2 so that

      $$B_{12} = \frac{\int \tilde\pi_1(\theta_1 \mid x)\,\alpha(\theta_1, \psi)\,\pi_2(\theta_1, \psi \mid x)\,\mathrm d\theta_1\,\omega(\psi \mid \theta_1, x)\,\mathrm d\psi}{\int \tilde\pi_2(\theta_1, \psi \mid x)\,\alpha(\theta_1, \psi)\,\pi_1(\theta_1 \mid x)\,\omega(\psi \mid \theta_1, x)\,\mathrm d\theta_1\,\mathrm d\psi}$$

      $$= E^{\pi_2}\!\left[ \frac{\tilde\pi_1(\theta_1)\,\omega(\psi \mid \theta_1)}{\tilde\pi_2(\theta_1, \psi)} \right] = \frac{E^{\varphi}\big[ \tilde\pi_1(\theta_1)\,\omega(\psi \mid \theta_1) \big/ \varphi(\theta_1, \psi) \big]}{E^{\varphi}\big[ \tilde\pi_2(\theta_1, \psi) \big/ \varphi(\theta_1, \psi) \big]}$$

      for any conditional density ω(ψ|θ1 ) and any joint density ϕ.

Illustration for the Pima Indian dataset

      Use of the MLE-induced conditional of β3 given (β1 , β2 ) as a
      pseudo-posterior, and of a mixture of both MLE approximations on β3 ,
      in the bridge sampling estimate
      R bridge sampling code (hmprobit, probitlpost and meanw are the
      authors' helper functions; ginv is from MASS and dmvnorm from mvtnorm)
      library(MASS); library(mvtnorm)
      cova=model2$cov.unscaled
      expecta=model2$coeff[,1]
      # conditional variance of beta3 given (beta1,beta2) under the MLE normal
      covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3]

      # posterior samples under both models
      probit1=hmprobit(Niter,y,X1)
      probit2=hmprobit(Niter,y,X2)
      # pseudo-posterior completion of the missing beta3 component
      pseudo=rnorm(Niter,meanw(probit1),sqrt(covw))
      probit1p=cbind(probit1,pseudo)

      bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]),
           sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3],
           sqrt(cova[3,3]))))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+
           dnorm(pseudo,expecta[3],sqrt(cova[3,3]))))

Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 × 20,000 simulations from the prior (MC), the above
      bridge sampler and the above importance sampler

      [Figure: boxplots of the 100 Bayes factor estimates for MC, bridge
      sampling and IS]

Approximating Zk using a mixture representation


                                                                                    Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
                                                                          [Chopin & Robert, 2010]
Importance sampling methods for Bayesian discrimination between embedded models
  Importance sampling model comparison solutions compared
     Mixtures to bridge



Approximating Zk using a mixture representation


                                                                                    Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

                                ϕk (θk ) ∝ ω1 πk (θk )Lk (θk ) + ϕ(θk ) ,

      where ϕ(·) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
                                                                          [Chopin & Robert, 2010]
Importance sampling methods for Bayesian discrimination between embedded models
  Importance sampling model comparison solutions compared
     Mixtures to bridge



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
          1   Take δ (t) = 1 with probability
              $$\omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) \Big/ \left\{ \omega_1\,\pi_k(\theta_k^{(t-1)})\,L_k(\theta_k^{(t-1)}) + \varphi(\theta_k^{(t-1)}) \right\}$$
              and δ (t) = 2 otherwise;
          2   If δ (t) = 1, generate $\theta_k^{(t)} \sim \mathrm{MCMC}(\theta_k^{(t-1)}, \theta_k^{(t)})$ where
              MCMC(θk , θk′ ) denotes an arbitrary MCMC kernel associated
              with the posterior πk (θk |x) ∝ πk (θk )Lk (θk );
          3   If δ (t) = 2, generate $\theta_k^{(t)} \sim \varphi(\theta_k)$ independently

Evidence approximation by mixtures
      Rao-Blackwellised estimate
      $$\hat\xi = \frac{1}{T} \sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \Big/ \left\{ \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)}) \right\}$$
      converges to ω1 Zk /{ω1 Zk + 1}
      Deduce $\hat Z_k$ from $\omega_1 \hat Z_k / \{\omega_1 \hat Z_k + 1\} = \hat\xi$, i.e.

      $$\hat Z_k = \frac{\sum_{t=1}^{T} \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) \big/ \left\{ \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)}) \right\}}{\omega_1 \sum_{t=1}^{T} \varphi(\theta_k^{(t)}) \big/ \left\{ \omega_1\,\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)}) + \varphi(\theta_k^{(t)}) \right\}}$$

                                                                               [Bridge sampler]
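
      The whole scheme fits in a few lines on an assumed conjugate toy
      model, x_i ∼ N(θ, 1) with θ ∼ N(0, 1), whose evidence is known in
      closed form (a sketch, not the deck's code):

      set.seed(3)
      n <- 20; x <- rnorm(n, 0.5)
      logpiL <- function(th)                    # log prior + log likelihood
        dnorm(th, log=TRUE) + sapply(th, function(t) sum(dnorm(x, t, log=TRUE)))
      mu <- mean(x); tau <- 1/sqrt(n)           # MLE-based instrumental phi
      logphi <- function(th) dnorm(th, mu, tau, log=TRUE)
      w1 <- 10; Tn <- 1e4
      th <- numeric(Tn); th[1] <- mu
      for (t in 2:Tn) {
        a <- w1 * exp(logpiL(th[t-1]))
        if (runif(1) < a / (a + exp(logphi(th[t-1])))) {  # delta(t) = 1
          prop <- th[t-1] + rnorm(1, 0, 0.5)              # random walk MCMC move
          th[t] <- ifelse(log(runif(1)) < logpiL(prop) - logpiL(th[t-1]),
                          prop, th[t-1])
        } else th[t] <- rnorm(1, mu, tau)                 # delta(t) = 2: from phi
      }
      r <- w1*exp(logpiL(th)) / (w1*exp(logpiL(th)) + exp(logphi(th)))
      Z.hat <- sum(r) / (w1 * sum(1 - r))       # Rao-Blackwellised estimate above
      Z.exact <- (2*pi)^(-n/2)/sqrt(n+1)*exp(-0.5*(sum(x^2)-sum(x)^2/(n+1)))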

The original harmonic mean estimator



      When θkt ∼ πk (θ|x),
      $$\frac{1}{T} \sum_{t=1}^{T} \frac{1}{L_k(\theta_{kt})}$$
      is an unbiased estimator of 1/mk (x)
                                                                        [Newton & Raftery, 1994]

      Highly dangerous: Most often leads to an infinite variance!!!

Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity
      $$E^{\pi_k}\!\left[ \left. \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)} \,\right|\, x \right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\, \frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\, \mathrm d\theta_k = \frac{1}{Z_k}$$
      no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
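
      On the same assumed toy normal model as in the earlier sketch, where
      the posterior is exactly N(Σxi /(n+1), 1/(n+1)), the identity recovers
      the evidence with a deliberately lighter-tailed ϕ:

      set.seed(4)
      mun <- sum(x)/(n+1); taun <- 1/sqrt(n+1)      # exact posterior parameters
      th <- rnorm(1e4, mun, taun)                   # stand-in for MCMC output
      logphi.gd <- dnorm(th, mun, taun/2, log=TRUE) # lighter tails than pi x L
      Z.hat.gd <- 1 / mean(exp(logphi.gd - logpiL(th)))
      Z.hat.gd - Z.exact                            # small estimation error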

Comparison with regular importance sampling


      Harmonic mean: constraint opposed to the usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      πk (θk )Lk (θk ) for the approximation
      $$\widehat Z_{1k} = 1 \bigg/ \frac{1}{T} \sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$
      to have a finite variance.
      E.g., use finite support kernels for ϕ

Comparison with regular importance sampling (cont’d)



      Compare $\widehat Z_{1k}$ with a standard importance sampling approximation
      $$\widehat Z_{2k} = \frac{1}{T} \sum_{t=1}^{T} \frac{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}{\varphi(\theta_k^{(t)})}$$
      where the $\theta_k^{(t)}$'s are generated from the density ϕ(·) (with fatter
      tails, like t distributions)

HPD indicator as ϕ
      Use the convex hull of the MCMC simulations corresponding to the
      10% HPD region (easily derived!) and ϕ as indicator:
      $$\varphi(\theta) = \frac{10}{T} \sum_{t \in \mathrm{HPD}} \mathbb I_{d(\theta, \theta^{(t)}) \le \epsilon}$$
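
      A one-dimensional sketch of this choice, reusing the posterior draws
      th and logpiL from the toy sketches above (the radius eps and the
      1/(2 eps) kernel normalisation are illustrative assumptions):

      hpd <- th[logpiL(th) >= quantile(logpiL(th), 0.9)] # 10% highest-density draws
      eps <- 0.1
      phi.hpd <- function(theta)               # indicator mixture over HPD draws
        (10 / length(th)) * sum(abs(theta - hpd) <= eps) / (2 * eps)
      Z.hat.hpd <- 1 / mean(sapply(th, phi.hpd) * exp(-logpiL(th)))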

Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicates of 20,000 simulations from the above
      harmonic mean sampler and importance samplers

      [Figure: boxplots of the 100 estimates, harmonic mean vs. importance
      sampling, both tightly concentrated around 3.11]

Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
      θk ∼ πk (θk ),
      $$Z_k = m_k(x) = \frac{f_k(x \mid \theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k \mid x)}$$
      Use of an approximation to the posterior
      $$\widehat Z_k = \widehat m_k(x) = \frac{f_k(x \mid \theta_k^{*})\,\pi_k(\theta_k^{*})}{\hat\pi_k(\theta_k^{*} \mid x)}\,.$$
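
      In the toy normal model of the earlier sketches the posterior density
      is available exactly, so the identity can be checked at any θ* (in
      realistic settings the denominator is the Rao-Blackwell estimate of
      the next slide):

      theta.star <- mun                         # any high posterior density point
      log.Z.chib <- dnorm(theta.star, log=TRUE) +
        sum(dnorm(x, theta.star, log=TRUE)) -
        dnorm(theta.star, mun, taun, log=TRUE)
      exp(log.Z.chib) - Z.exact                 # zero up to rounding error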

Case of latent variables



      For missing variable z as in mixture models, natural Rao-Blackwell
      estimate
      $$\hat\pi_k(\theta_k^{*} \mid x) = \frac{1}{T} \sum_{t=1}^{T} \pi_k(\theta_k^{*} \mid x, z_k^{(t)})\,,$$
      where the $z_k^{(t)}$'s are Gibbs sampled latent variables
                                                                                  Skip difficulties...

Label switching


      A mixture model [special case of missing variable model] is
      invariant under permutations of the indices of the components.
      E.g., mixtures
                          0.3N (0, 1) + 0.7N (2.3, 1)
      and
                                       0.7N (2.3, 1) + 0.3N (0, 1)
      are exactly the same!
       ⇒ The component parameters θi are not identifiable
      marginally since they are exchangeable
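
      A quick numerical confirmation that the two orderings define one and
      the same density:

      d1 <- function(x) 0.3*dnorm(x, 0, 1) + 0.7*dnorm(x, 2.3, 1)
      d2 <- function(x) 0.7*dnorm(x, 2.3, 1) + 0.3*dnorm(x, 0, 1)
      all.equal(d1(seq(-3, 6, 0.1)), d2(seq(-3, 6, 0.1)))   # TRUE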

Connected difficulties


          1   Number of modes of the likelihood of order O(k!):
              ⇒ maximisation and even [MCMC] exploration of the
              posterior surface are harder
          2   Under exchangeable priors on (θ, p) [prior invariant under
              permutation of the indices], all posterior marginals are
              identical:
              ⇒ the posterior expectation of θ1 equals the posterior
              expectation of θ2

License


      Since the Gibbs output does not produce exchangeability, the Gibbs
      sampler has not explored the whole parameter space: it lacks the
      energy to switch enough component allocations at once

      [Figure: Gibbs sequences of µi , pi and σi over n = 0, . . . , 500
      iterations and pairwise scatter plots of (µi , pi ), (pi , σi ) and
      (σi , µi ), with µi ranging over (−1, 3), pi over (0.2, 0.5) and σi
      over (0.4, 1.0)]

Label switching paradox




      We should observe the exchangeability of the components [label
      switching] to conclude about convergence of the Gibbs sampler.
      If we observe it, then we do not know how to estimate the
      parameters.
      If we do not, then we are uncertain about the convergence!!!

Compensation for label switching
      For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a
      balanced way, despite the symmetry predicted by the theory
      $$\pi_k(\theta_k \mid x) = \pi_k(\sigma(\theta_k) \mid x) = \frac{1}{k!} \sum_{\sigma \in S_k} \pi_k(\sigma(\theta_k) \mid x)$$
      for all σ’s in Sk , the set of all permutations of {1, . . . , k}.
      Consequences on numerical approximation, biased by an order k!
      Recover the theoretical symmetry by using
      $$\hat\pi_k(\theta_k^{*} \mid x) = \frac{1}{T\,k!} \sum_{\sigma \in S_k} \sum_{t=1}^{T} \pi_k(\sigma(\theta_k^{*}) \mid x, z_k^{(t)})\,.$$

                                                     [Berkhof, Mechelen, & Gelman, 2003]
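
      A sketch of the symmetrised average, where post.dens is a hypothetical
      user function returning πk (θ*|x, z) for a k × p matrix theta.star of
      component parameters and one allocation vector z, and permn comes from
      the combinat package:

      library(combinat)                        # permn(): all k! permutations
      sym.rb.dens <- function(theta.star, z.draws, post.dens, k) {
        perms <- permn(1:k)
        mean(sapply(perms, function(s)         # (1/(T k!)) double sum above
          mean(apply(z.draws, 1, function(z)
            post.dens(theta.star[s, , drop=FALSE], z)))))
      }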

Galaxy dataset

      n = 82 galaxies as a mixture of k normal distributions with both
      mean and variance unknown.
                                                          [Roeder, 1992]
      [Figure: histogram of the galaxy data with the average density
      estimate overlaid]

Galaxy dataset (k)
      Using only the original estimate, with $\theta_k^{*}$ as the MAP estimator,
      $$\log \hat m_k(x) = -105.1396$$
      for k = 3 (based on 10³ simulations), while introducing the
      permutations leads to
      $$\log \hat m_k(x) = -103.3479$$
      Note that
      $$-105.1396 + \log(3!) = -103.3479$$

         k              2         3         4         5         6         7         8
         log m̂k(x)   −115.68   −103.35   −102.66   −101.93   −102.88   −105.48   −108.44

      Estimates of the marginal likelihoods by the symmetrised Chib’s
      approximation (based on 10⁵ Gibbs iterations and, for k > 5, 100
      permutations selected at random in Sk ).
                                [Lee, Marin, Mengersen & Robert, 2008]

Case of the probit model

      For the completion by z,
      $$\hat\pi(\theta \mid x) = \frac{1}{T} \sum_{t=1}^{T} \pi(\theta \mid x, z^{(t)})$$
      is a simple average of normal densities
      R Chib's approximation code (gibbsprobit, the authors' Gibbs sampler
      for the probit model, returns the simulated conditional means mu and
      the covariance Sigma2; dmvlnorm is their log multivariate normal density)
      gibbs1=gibbsprobit(Niter,y,X1)
      gibbs2=gibbsprobit(Niter,y,X2)
      # Rao-Blackwell posterior density estimates at the MLEs, divided by the
      # unnormalised posterior values: ratio of the two inverse evidences
      bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
              sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
            mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
              sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))

Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicates of 20,000 simulations from the above
      Chib’s and importance samplers

      [Figure: boxplots of the 100 estimates, Chib’s method vs. importance
      sampling, both around 0.025]

The Savage–Dickey ratio


      Special representation of the Bayes factor used for simulation

      Original version (Dickey, AoMS, 1971)

Savage’s density ratio theorem
      Given a test H0 : θ = θ0 in a model f (x|θ, ψ) with a nuisance
      parameter ψ, under priors π0 (ψ) and π1 (θ, ψ) such that

                                              π1 (ψ|θ0 ) = π0 (ψ)

      then
      $$B_{01} = \frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\,,$$
      with the obvious notations
      $$\pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm d\psi\,, \qquad \pi_1(\theta \mid x) = \int \pi_1(\theta, \psi \mid x)\,\mathrm d\psi\,,$$
                                        [Dickey, 1971; Verdinelli & Wasserman, 1995]
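
      In the toy normal model used in earlier sketches there is no nuisance
      parameter, so the condition holds trivially and the ratio can be
      checked against the direct evidence ratio (an illustrative degenerate
      case, reusing mun, taun, x and Z.exact from above):

      B01.sd <- dnorm(0, mun, taun) / dnorm(0)  # pi_1(theta_0|x)/pi_1(theta_0)
      B01.direct <- exp(sum(dnorm(x, 0, log=TRUE))) / Z.exact
      B01.sd - B01.direct                       # zero up to rounding error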

Measure-theoretic difficulty
      Representation depends on the choice of versions of conditional
      densities:
      $$B_{01} = \frac{\int \pi_0(\psi)\,f(x \mid \theta_0, \psi)\,\mathrm d\psi}{\int \pi_1(\theta, \psi)\,f(x \mid \theta, \psi)\,\mathrm d\psi\,\mathrm d\theta} \qquad \text{[by definition]}$$
      $$= \frac{\int \pi_1(\psi \mid \theta_0)\,f(x \mid \theta_0, \psi)\,\mathrm d\psi}{\int \pi_1(\theta, \psi)\,f(x \mid \theta, \psi)\,\mathrm d\psi\,\mathrm d\theta}\,\frac{\pi_1(\theta_0)}{\pi_1(\theta_0)} \qquad \text{[specific version of } \pi_1(\psi \mid \theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]}$$
      $$= \frac{\int \pi_1(\theta_0, \psi)\,f(x \mid \theta_0, \psi)\,\mathrm d\psi}{m_1(x)\,\pi_1(\theta_0)} \qquad \text{[specific version of } \pi_1(\theta_0, \psi)\text{]}$$
      $$= \frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)} \qquad \text{[version dependent]}$$
Choice of density version



Dickey's (1971) condition is not a condition: if
\[
\frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x \mid \theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)}
\]
is chosen as a version, then the Savage–Dickey representation holds
Savage–Dickey paradox


Verdinelli–Wasserman extension:
\[
B_{01} = \frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\;
\mathbb{E}^{\pi_1(\psi \mid x, \theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi \mid \theta_0)}\right]
\]
similarly depends on choices of versions...
...but the Monte Carlo implementation relies on specific versions of all densities without mentioning it
                                          [Chen, Shao & Ibrahim, 2000]
Computational implementation
Starting from the (new) prior
\[
\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)
\]
define the associated posterior
\[
\tilde\pi_1(\theta, \psi \mid x) = \pi_0(\psi)\,\pi_1(\theta)\, f(x \mid \theta, \psi) \big/ \tilde m_1(x)
\]
and impose
\[
\frac{\tilde\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\, f(x \mid \theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}
\]
to hold. Then
\[
B_{01} = \frac{\tilde\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}
\]
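
A one-line check that the decomposition is exact under the imposed version (nothing beyond the definitions above is used):
\[
\frac{\tilde\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\,\frac{\tilde m_1(x)}{m_1(x)}
= \frac{\int \pi_0(\psi)\, f(x \mid \theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}\,\frac{\tilde m_1(x)}{m_1(x)}
= \frac{\int \pi_0(\psi)\, f(x \mid \theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)} = B_{01}
\]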
First ratio


If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi(\theta, \psi \mid x)$, then
\[
\frac{1}{T} \sum_{t} \tilde\pi_1(\theta_0 \mid x, \psi^{(t)})
\]
converges to $\tilde\pi_1(\theta_0 \mid x)$ (if the right version is used in $\theta_0$), with
\[
\tilde\pi_1(\theta_0 \mid x, \psi) = \frac{\pi_1(\theta_0)\, f(x \mid \theta_0, \psi)}{\int \pi_1(\theta)\, f(x \mid \theta, \psi)\,\mathrm{d}\theta}
\]
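
As a concrete illustration, a minimal R sketch under an assumed toy model (my choice, not the talk's probit example): $y_i \sim \mathcal{N}(\theta, \psi)$, test H0 : θ = 0, with independent priors $\theta \sim \mathcal{N}(0, v)$ and $\psi \sim \mathcal{IG}(a, b)$, so that $\pi_1(\psi \mid \theta_0) = \pi_0(\psi)$ holds by construction, $\tilde\pi_1 = \pi_1$, $\tilde m_1 = m_1$, and the averaged conditional density alone yields the Bayes factor.

R code (sketch)
## Savage-Dickey via Rao-Blackwellised Gibbs output (toy model above)
set.seed(1)
n <- 50; y <- rnorm(n, 0.3, 1)        # simulated data
v <- 1; a <- 2; b <- 2                # prior hyperparameters
T <- 1e4; burn <- 1e3                 # Gibbs iterations and burn-in
theta <- mean(y); dens <- numeric(T)
for (t in 1:(T + burn)) {
  psi   <- 1 / rgamma(1, a + n/2, b + sum((y - theta)^2)/2)  # psi | theta, y
  prec  <- n/psi + 1/v                                       # theta | psi, y is
  m     <- (sum(y)/psi) / prec                               # N(m, 1/prec)
  theta <- rnorm(1, m, sqrt(1/prec))
  if (t > burn) dens[t - burn] <- dnorm(0, m, sqrt(1/prec))  # pi1(0 | y, psi^(t))
}
B01 <- mean(dens) / dnorm(0, 0, sqrt(v))  # Savage-Dickey ratio estimate
B01

Since the prior here is already of product form, no bridge correction is needed; the slides below handle the general case.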
Rao–Blackwellisation with latent variables
When $\tilde\pi_1(\theta_0 \mid x, \psi)$ is unavailable, replace it with
\[
\frac{1}{T} \sum_{t=1}^{T} \tilde\pi_1(\theta_0 \mid x, z^{(t)}, \psi^{(t)})
\]
via data completion by a latent variable $z$ such that
\[
f(x \mid \theta, \psi) = \int \tilde f(x, z \mid \theta, \psi)\,\mathrm{d}z
\]
and such that $\tilde\pi_1(\theta, \psi, z \mid x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x, z \mid \theta, \psi)$ is available in closed form, including the normalising constant, based on the version
\[
\frac{\tilde\pi_1(\theta_0 \mid x, z, \psi)}{\pi_1(\theta_0)} = \frac{\tilde f(x, z \mid \theta_0, \psi)}{\int \tilde f(x, z \mid \theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta}\,.
\]
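
For instance (my instantiation, assuming a probit-type model with Gaussian completion $z \mid \beta \sim \mathcal{N}_n(X_1\psi + X_2\theta, I_n)$ and prior $\theta \sim \mathcal{N}(0, V)$): the truncation indicators tying $z$ to $x$ cancel between numerator and denominator, both remaining densities are Gaussian, and
\[
\frac{\tilde\pi_1(\theta_0 \mid x, z, \psi)}{\pi_1(\theta_0)}
= \frac{\varphi_n\!\left(z;\, X_1\psi + X_2\theta_0,\, I_n\right)}
       {\varphi_n\!\left(z;\, X_1\psi,\, I_n + X_2 V X_2^{\mathsf{T}}\right)}
\]
with $\varphi_n(\cdot\,; \mu, \Sigma)$ the $\mathcal{N}_n(\mu, \Sigma)$ density.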
Bridge revival (1)

Since $m_1(x)/\tilde m_1(x)$ is unknown, apparent failure!
Use of the bridge identity
\[
\mathbb{E}^{\tilde\pi_1(\theta,\psi \mid x)}\!\left[\frac{\pi_1(\theta, \psi)\, f(x \mid \theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\, f(x \mid \theta, \psi)}\right]
= \mathbb{E}^{\tilde\pi_1(\theta,\psi \mid x)}\!\left[\frac{\pi_1(\psi \mid \theta)}{\pi_0(\psi)}\right]
= \frac{m_1(x)}{\tilde m_1(x)}
\]
to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
\[
\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)} \mid \theta^{(t)})}{\pi_0(\psi^{(t)})}
\]
based on the same sample from $\tilde\pi_1$.
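
In code, with draws theta_t, psi_t from the $\tilde\pi_1$ posterior and hypothetical density evaluators dpi1_cond(psi, theta) for $\pi_1(\psi \mid \theta)$ and dpi0(psi) for $\pi_0(\psi)$, the estimate is one line (a sketch, not the talk's code):

R code (sketch)
# estimates m1(x) / m1~(x) from the same pi1-tilde posterior sample
ratio_hat <- mean(dpi1_cond(psi_t, theta_t) / dpi0(psi_t))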
Bridge revival (2)
Alternative identity
\[
\mathbb{E}^{\pi_1(\theta,\psi \mid x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x \mid \theta, \psi)}{\pi_1(\theta, \psi)\, f(x \mid \theta, \psi)}\right]
= \mathbb{E}^{\pi_1(\theta,\psi \mid x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi \mid \theta)}\right]
= \frac{\tilde m_1(x)}{m_1(x)}
\]
suggests using a second sample $(\bar\theta^{(1)}, \bar\psi^{(1)}, \bar z^{(1)}), \ldots, (\bar\theta^{(T)}, \bar\psi^{(T)}, \bar z^{(T)}) \sim \pi_1(\theta, \psi \mid x)$ and the ratio estimate
\[
\frac{1}{T} \sum_{t=1}^{T} \pi_0(\bar\psi^{(t)}) \big/ \pi_1(\bar\psi^{(t)} \mid \bar\theta^{(t)})
\]
Resulting unbiased estimate:
\[
\widehat{B}_{01} = \left[\frac{1}{T} \sum_{t} \frac{\tilde\pi_1(\theta_0 \mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right]
\left[\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)} \mid \bar\theta^{(t)})}\right]
\]
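
Continuing the toy sketch from the First ratio slide (still an assumed example, reusing y, n, a, b, T and burn from there, but now with the dependent prior $\pi_1(\theta \mid \psi) = \mathcal{N}(0, \psi)$ and $\pi_0(\psi) = \mathcal{IG}(a, b)$, so that $\pi_1(\psi \mid \theta_0) \neq \pi_0(\psi)$ and the bridge correction matters): the marginal $\pi_1(\theta)$ is a scaled Student $t_{2a}$ density, handled through its scale-mixture representation $\theta \mid \lambda \sim \mathcal{N}(0, \lambda)$, $\lambda \sim \mathcal{IG}(a, b)$, with $\lambda$ playing the role of the completion variable $z$.

R code (sketch)
dinvgamma <- function(x, shape, rate) dgamma(1/x, shape, rate) / x^2
## -- sample 1: Gibbs on the pi1-tilde posterior of (theta, psi, lambda | y)
theta <- mean(y); dens <- numeric(T)
for (t in 1:(T + burn)) {
  psi    <- 1 / rgamma(1, a + n/2, b + sum((y - theta)^2)/2)   # psi | theta, y
  lambda <- 1 / rgamma(1, a + 1/2, b + theta^2/2)              # lambda | theta
  prec   <- n/psi + 1/lambda                                   # theta | psi, lambda, y
  m      <- (sum(y)/psi) / prec
  theta  <- rnorm(1, m, sqrt(1/prec))
  if (t > burn) dens[t - burn] <- dnorm(0, m, sqrt(1/prec))
}
pi1_theta0 <- dt(0, df = 2*a) / sqrt(b/a)   # pi1(0): scaled Student t density
first <- mean(dens) / pi1_theta0            # pi1-tilde(0 | y) / pi1(0)
## -- sample 2: Gibbs on the pi1 posterior, with theta | psi ~ N(0, psi) a priori
theta <- mean(y); psi_bar <- theta_bar <- numeric(T)
for (t in 1:(T + burn)) {
  psi   <- 1 / rgamma(1, a + (n + 1)/2, b + sum((y - theta)^2)/2 + theta^2/2)
  prec  <- (n + 1)/psi
  m     <- (sum(y)/psi) / prec
  theta <- rnorm(1, m, sqrt(1/prec))
  if (t > burn) { psi_bar[t - burn] <- psi; theta_bar[t - burn] <- theta }
}
## bridge correction: pi1(psi | theta) = InvGamma(a + 1/2, b + theta^2/2)
second <- mean(dinvgamma(psi_bar, a, b) /
               dinvgamma(psi_bar, a + 1/2, b + theta_bar^2/2))
B01 <- first * second
B01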
Difference with Verdinelli–Wasserman representation


The above leads to the representation
\[
B_{01} = \frac{\tilde\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\;
\mathbb{E}^{\pi_1(\theta,\psi \mid x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi \mid \theta)}\right]
\]
which shows how our approach differs from Verdinelli and Wasserman's
\[
B_{01} = \frac{\pi_1(\theta_0 \mid x)}{\pi_1(\theta_0)}\;
\mathbb{E}^{\pi_1(\psi \mid x, \theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi \mid \theta_0)}\right]
\]
Difference with Verdinelli–Wasserman approximation
In terms of implementation,
\[
\widehat{B}^{\mathrm{MR}}_{01}(x) = \left[\frac{1}{T} \sum_{t} \frac{\tilde\pi_1(\theta_0 \mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right]
\left[\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)} \mid \bar\theta^{(t)})}\right]
\]
formally resembles
\[
\widehat{B}^{\mathrm{VW}}_{01}(x) = \left[\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_1(\theta_0 \mid x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right]
\left[\frac{1}{T} \sum_{t=1}^{T} \frac{\pi_0(\tilde\psi^{(t)})}{\pi_1(\tilde\psi^{(t)} \mid \theta_0)}\right].
\]
But the simulated sequences differ: the first estimate involves simulations from $\tilde\pi_1(\theta, \psi, z \mid x)$ and from $\pi_1(\theta, \psi, z \mid x)$, while the second relies on simulations from $\pi_1(\theta, \psi, z \mid x)$ and from $\pi_1(\psi, z \mid x, \theta_0)$.
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, from the above importance, Chib's, Savage–Dickey and bridge samplers

[Boxplots of the four approximations (IS, Chib, Savage–Dickey, Bridge), all concentrating between 2.8 and 3.4]

More Related Content

What's hot

Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Christian Robert
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemErika G. G.
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Christian Robert
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABCChristian Robert
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum EntropyJiawang Liu
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsChristian Robert
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choiceChristian Robert
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?Christian Robert
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forestsChristian Robert
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsChristian Robert
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationChristian Robert
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingJulyan Arbel
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes FactorsChristian Robert
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerChristian Robert
 

What's hot (20)

ABC model choice
ABC model choiceABC model choice
ABC model choice
 
Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]Columbia workshop [ABC model choice]
Columbia workshop [ABC model choice]
 
Boston talk
Boston talkBoston talk
Boston talk
 
von Mises lecture, Berlin
von Mises lecture, Berlinvon Mises lecture, Berlin
von Mises lecture, Berlin
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation Problem
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum Entropy
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in Statistics
 
Computational tools for Bayesian model choice
Computational tools for Bayesian model choiceComputational tools for Bayesian model choice
Computational tools for Bayesian model choice
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
Approximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forestsApproximate Bayesian model choice via random forests
Approximate Bayesian model choice via random forests
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Approximating Bayes Factors
Approximating Bayes FactorsApproximating Bayes Factors
Approximating Bayes Factors
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 

Similar to Bayesian discrimination IS methods

An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testingChristian Robert
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Christian Robert
 
Statistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityStatistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityChristian Robert
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihoodChristian Robert
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and datahaharrington
 
Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?Christian Robert
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methodsChristian Robert
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010Christian Robert
 
Elementary Probability and Information Theory
Elementary Probability and Information TheoryElementary Probability and Information Theory
Elementary Probability and Information TheoryKhalidSaghiri2
 
Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325Yi-Hsin Liu
 
Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325Yi-Hsin Liu
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsJulyan Arbel
 
Tro07 sparse-solutions-talk
Tro07 sparse-solutions-talkTro07 sparse-solutions-talk
Tro07 sparse-solutions-talkmpbchina
 
Rss talk for Bayes 250 by Steven Walker
Rss talk for Bayes 250 by Steven WalkerRss talk for Bayes 250 by Steven Walker
Rss talk for Bayes 250 by Steven WalkerChristian Robert
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessAlessandro Panella
 

Similar to Bayesian discrimination IS methods (20)

MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Multidimensional Monot...
 
An overview of Bayesian testing
An overview of Bayesian testingAn overview of Bayesian testing
An overview of Bayesian testing
 
Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)Bayesian model choice (and some alternatives)
Bayesian model choice (and some alternatives)
 
ABC in Varanasi
ABC in VaranasiABC in Varanasi
ABC in Varanasi
 
Bayesian Core: Chapter 6
Bayesian Core: Chapter 6Bayesian Core: Chapter 6
Bayesian Core: Chapter 6
 
Statistics symposium talk, Harvard University
Statistics symposium talk, Harvard UniversityStatistics symposium talk, Harvard University
Statistics symposium talk, Harvard University
 
ABC and empirical likelihood
ABC and empirical likelihoodABC and empirical likelihood
ABC and empirical likelihood
 
JSM 2011 round table
JSM 2011 round tableJSM 2011 round table
JSM 2011 round table
 
JSM 2011 round table
JSM 2011 round tableJSM 2011 round table
JSM 2011 round table
 
Non-parametric analysis of models and data
Non-parametric analysis of models and dataNon-parametric analysis of models and data
Non-parametric analysis of models and data
 
Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?Is ABC a new empirical Bayes approach?
Is ABC a new empirical Bayes approach?
 
Course on Bayesian computational methods
Course on Bayesian computational methodsCourse on Bayesian computational methods
Course on Bayesian computational methods
 
San Antonio short course, March 2010
San Antonio short course, March 2010San Antonio short course, March 2010
San Antonio short course, March 2010
 
Elementary Probability and Information Theory
Elementary Probability and Information TheoryElementary Probability and Information Theory
Elementary Probability and Information Theory
 
Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325
 
Team meeting 100325
Team meeting 100325Team meeting 100325
Team meeting 100325
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Tro07 sparse-solutions-talk
Tro07 sparse-solutions-talkTro07 sparse-solutions-talk
Tro07 sparse-solutions-talk
 
Rss talk for Bayes 250 by Steven Walker
Rss talk for Bayes 250 by Steven WalkerRss talk for Bayes 250 by Steven Walker
Rss talk for Bayes 250 by Steven Walker
 
Bayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet ProcessBayesian Nonparametrics: Models Based on the Dirichlet Process
Bayesian Nonparametrics: Models Based on the Dirichlet Process
 

More from Christian Robert

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceChristian Robert
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinChristian Robert
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?Christian Robert
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Christian Robert
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking componentsChristian Robert
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihoodChristian Robert
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)Christian Robert
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Christian Robert
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussionChristian Robert
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceChristian Robert
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsChristian Robert
 

More from Christian Robert (20)

Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 
the ABC of ABC
the ABC of ABCthe ABC of ABC
the ABC of ABC
 
CISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergenceCISEA 2019: ABC consistency and convergence
CISEA 2019: ABC consistency and convergence
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 

Recently uploaded

भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 

Recently uploaded (20)

भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 

Bayesian discrimination IS methods

  • 1. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling methods for Bayesian discrimination between embedded models Christian P. Robert Universit´ Paris Dauphine & CREST-INSEE e http://www.ceremade.dauphine.fr/~xian 45th Scientific Meeting of the Italian Statistical Society Padova, 16 giugno 2010 Joint work with J.-M. Marin
  • 2. Importance sampling methods for Bayesian discrimination between embedded models Outline 1 Bayesian model choice 2 Importance sampling model comparison solutions compared
  • 3. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Model choice as model comparison Choice between models Several models available for the same observation Mi : x ∼ fi (x|θi ), i∈I Subsitute hypotheses with models
  • 4. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Model choice as model comparison Choice between models Several models available for the same observation Mi : x ∼ fi (x|θi ), i∈I Subsitute hypotheses with models
  • 5. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model,
  • 6. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model,
  • 7. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model,
  • 8. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Model choice Bayesian model choice Probabilise the entire model/parameter space allocate probabilities pi to all models Mi define priors πi (θi ) for each parameter space Θi compute pi fi (x|θi )πi (θi )dθi Θi π(Mi |x) = pj fj (x|θj )πj (θj )dθj j Θj take largest π(Mi |x) to determine “best” model,
  • 9. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Bayes factor Bayes factor Definition (Bayes factors) For comparing model M0 with θ ∈ Θ0 vs. M1 with θ ∈ Θ1 , under priors π0 (θ) and π1 (θ), central quantity f0 (x|θ0 )π0 (θ0 )dθ0 π(Θ0 |x) π(Θ0 ) Θ0 B01 = = π(Θ1 |x) π(Θ1 ) f1 (x|θ)π1 (θ1 )dθ1 Θ1 [Jeffreys, 1939]
  • 10. Importance sampling methods for Bayesian discrimination between embedded models Bayesian model choice Evidence Evidence Problems using a similar quantity, the evidence Zk = πk (θk )Lk (θk ) dθk , Θk aka the marginal likelihood. [Jeffreys, 1939]
  • 11. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared A comparison of importance sampling solutions 1 Bayesian model choice 2 Importance sampling model comparison solutions compared Regular importance Bridge sampling Mixtures to bridge Harmonic means Chib’s solution The Savage–Dickey ratio [Marin & Robert, 2010]
  • 12. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Bayes factor approximation When approximating the Bayes factor f0 (x|θ0 )π0 (θ0 )dθ0 Θ0 B01 = f1 (x|θ1 )π1 (θ1 )dθ1 Θ1 use of importance functions 0 and 1 and n−1 0 n0 i i i=1 f0 (x|θ0 )π0 (θ0 )/ i 0 (θ0 ) B01 = n−1 1 n1 i i i=1 f1 (x|θ1 )π1 (θ1 )/ i 1 (θ1 )
  • 13. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Diabetes in Pima Indian women Example (R benchmark) “A population of women who were at least 21 years old, of Pima Indian heritage and living near Phoenix (AZ), was tested for diabetes according to WHO criteria. The data were collected by the US National Institute of Diabetes and Digestive and Kidney Diseases. 200 Pima Indian women with observed variables plasma glucose concentration in oral glucose tolerance test diastolic blood pressure diabetes pedigree function presence/absence of diabetes
  • 14. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Probit modelling on Pima Indian women Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g-prior modelling: β ∼ N3 (0, n XT X)−1
  • 15. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Probit modelling on Pima Indian women Probability of diabetes function of above variables P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) , Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a g-prior modelling: β ∼ N3 (0, n XT X)−1
  • 16. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Importance sampling for the Pima Indian dataset Use of the importance function inspired from the MLE estimate distribution ˆ ˆ β ∼ N (β, Σ) R Importance sampling code model1=summary(glm(y~-1+X1,family=binomial(link=probit))) is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled) is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled) bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1], sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)- dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
  • 17. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Importance sampling for the Pima Indian dataset Use of the importance function inspired from the MLE estimate distribution ˆ ˆ β ∼ N (β, Σ) R Importance sampling code model1=summary(glm(y~-1+X1,family=binomial(link=probit))) is1=rmvnorm(Niter,mean=model1$coeff[,1],sigma=2*model1$cov.unscaled) is2=rmvnorm(Niter,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled) bfis=mean(exp(probitlpost(is1,y,X1)-dmvlnorm(is1,mean=model1$coeff[,1], sigma=2*model1$cov.unscaled))) / mean(exp(probitlpost(is2,y,X2)- dmvlnorm(is2,mean=model2$coeff[,1],sigma=2*model2$cov.unscaled)))
  • 18. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Regular importance Diabetes in Pima Indian women Comparison of the variation of the Bayes factor approximations based on 100 replicas for 20, 000 simulations from the prior and the above MLE importance sampler 5 4 q 3 2 Monte Carlo Importance sampling
  • 19. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Bridge sampling Special case: If π1 (θ1 |x) ∝ π1 (θ1 |x) ˜ π2 (θ2 |x) ∝ π2 (θ2 |x) ˜ live on the same space (Θ1 = Θ2 ), then n 1 π1 (θi |x) ˜ B12 ≈ θi ∼ π2 (θ|x) n π2 (θi |x) ˜ i=1 [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
  • 20. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling (Further) bridge sampling General identity: π2 (θ|x)α(θ)π1 (θ|x)dθ ˜ B12 = ∀ α(·) π1 (θ|x)α(θ)π2 (θ|x)dθ ˜ n1 1 π2 (θ1i |x)α(θ1i ) ˜ n1 i=1 ≈ n2 θji ∼ πj (θ|x) 1 π1 (θ2i |x)α(θ2i ) ˜ n2 i=1
  • 21. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Optimal bridge sampling The optimal choice of auxiliary function is n1 + n2 α = n1 π1 (θ|x) + n2 π2 (θ|x) leading to n1 1 π2 (θ1i |x) ˜ n1 n1 π1 (θ1i |x) + n2 π2 (θ1i |x) i=1 B12 ≈ n2 1 π1 (θ2i |x) ˜ n2 n1 π1 (θ2i |x) + n2 π2 (θ2i |x) i=1 Back later! Drawback: Dependence on the unknown normalising constants solved iteratively
  • 22. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Optimal bridge sampling The optimal choice of auxiliary function is n1 + n2 α = n1 π1 (θ|x) + n2 π2 (θ|x) leading to n1 1 π2 (θ1i |x) ˜ n1 n1 π1 (θ1i |x) + n2 π2 (θ1i |x) i=1 B12 ≈ n2 1 π1 (θ2i |x) ˜ n2 n1 π1 (θ2i |x) + n2 π2 (θ2i |x) i=1 Back later! Drawback: Dependence on the unknown normalising constants solved iteratively
  • 23. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Extension to varying dimensions When dim(Θ1 ) = dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into joint distribution π1 (θ1 |x) × ω(ψ|θ1 , x) on Θ2 so that π1 (θ1 |x)α(θ1 , ψ)π2 (θ1 , ψ|x)dθ1 ω(ψ|θ1 , x) dψ ˜ B12 = π2 (θ1 , ψ|x)α(θ1 , ψ)π1 (θ1 |x)dθ1 ω(ψ|θ1 , x) dψ ˜ π1 (θ1 )ω(ψ|θ1 ) ˜ Eϕ [˜1 (θ1 )ω(ψ|θ1 )/ϕ(θ1 , ψ)] π = Eπ2 = π2 (θ1 , ψ) ˜ Eϕ [˜2 (θ1 , ψ)/ϕ(θ1 , ψ)] π for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
  • 24. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Extension to varying dimensions When dim(Θ1 ) = dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into joint distribution π1 (θ1 |x) × ω(ψ|θ1 , x) on Θ2 so that π1 (θ1 |x)α(θ1 , ψ)π2 (θ1 , ψ|x)dθ1 ω(ψ|θ1 , x) dψ ˜ B12 = π2 (θ1 , ψ|x)α(θ1 , ψ)π1 (θ1 |x)dθ1 ω(ψ|θ1 , x) dψ ˜ π1 (θ1 )ω(ψ|θ1 ) ˜ Eϕ [˜1 (θ1 )ω(ψ|θ1 )/ϕ(θ1 , ψ)] π = Eπ2 = π2 (θ1 , ψ) ˜ Eϕ [˜2 (θ1 , ψ)/ϕ(θ1 , ψ)] π for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
  • 25. Importance sampling methods for Bayesian discrimination between embedded models Importance sampling model comparison solutions compared Bridge sampling Illustration for the Pima Indian dataset Use of the MLE induced conditional of β3 given (β1 , β2 ) as a pseudo-posterior and mixture of both MLE approximations on β3 in bridge sampling estimate R bridge sampling code cova=model2$cov.unscaled expecta=model2$coeff[,1] covw=cova[3,3]-t(cova[1:2,3])%*%ginv(cova[1:2,1:2])%*%cova[1:2,3] probit1=hmprobit(Niter,y,X1) probit2=hmprobit(Niter,y,X2) pseudo=rnorm(Niter,meanw(probit1),sqrt(covw)) probit1p=cbind(probit1,pseudo) bfbs=mean(exp(probitlpost(probit2[,1:2],y,X1)+dnorm(probit2[,3],meanw(probit2[,1:2]), sqrt(covw),log=T))/ (dmvnorm(probit2,expecta,cova)+dnorm(probit2[,3],expecta[3], cova[3,3])))/ mean(exp(probitlpost(probit1p,y,X2))/(dmvnorm(probit1p,expecta,cova)+ dnorm(pseudo,expecta[3],cova[3,3])))
• 27. Bridge sampling: Diabetes in Pima Indian women (cont'd)
Comparison of the variation of the Bayes factor approximations based on 100 × 20,000 simulations from the prior (MC), the above bridge sampler and the above importance sampler.
[Boxplots of the three approximations, labelled MC, Bridge and IS, on a common scale from 2 to 5]
• 28. Mixtures to bridge: Approximating Z_k using a mixture representation
Bridge sampling redux: design a specific mixture for simulation [importance sampling] purposes, with density

   ϕ_k(θ_k) ∝ ω_1 π_k(θ_k) L_k(θ_k) + ϕ(θ_k),

where ϕ(·) is arbitrary (but normalised). Note: ω_1 is not a probability weight.
[Chopin & Robert, 2010]
• 30. Mixtures to bridge: Approximating Z_k using a mixture representation (cont'd)
Corresponding MCMC (= Gibbs) sampler; at iteration t:
1. Take δ^{(t)} = 1 with probability

      ω_1 π_k(θ_k^{(t−1)}) L_k(θ_k^{(t−1)}) / { ω_1 π_k(θ_k^{(t−1)}) L_k(θ_k^{(t−1)}) + ϕ(θ_k^{(t−1)}) }

   and δ^{(t)} = 2 otherwise;
2. If δ^{(t)} = 1, generate θ_k^{(t)} ∼ MCMC(θ_k^{(t−1)}, ·), where MCMC(θ_k, ·) denotes an arbitrary MCMC kernel associated with the posterior π_k(θ_k|x) ∝ π_k(θ_k) L_k(θ_k);
3. If δ^{(t)} = 2, generate θ_k^{(t)} ∼ ϕ(θ_k) independently.
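A minimal R sketch of this sampler on a hypothetical one-dimensional example (all choices illustrative): π̃_k(θ) = π_k(θ)L_k(θ) is taken as the unnormalised N(2,1) density, ϕ as the standard normal density, and the step-2 kernel as a random-walk Metropolis move:

   set.seed(42)
   tpik <- function(theta) exp(-(theta-2)^2/2) # toy pi_k(theta)L_k(theta), Z_k = sqrt(2*pi)
   phi  <- dnorm                               # arbitrary (normalised) density
   omega1 <- 1                                 # positive weight, not a probability
   T <- 1e5; theta <- numeric(T)
   for (t in 2:T) {
     w <- omega1*tpik(theta[t-1])
     if (runif(1) < w/(w + phi(theta[t-1]))) { # delta(t) = 1: move within pi_k(.|x)
       prop <- theta[t-1] + rnorm(1, sd=.5)    # random-walk Metropolis kernel
       theta[t] <- if (runif(1) < tpik(prop)/tpik(theta[t-1])) prop else theta[t-1]
     } else theta[t] <- rnorm(1)               # delta(t) = 2: independent draw from phi
   }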
• 33. Mixtures to bridge: Evidence approximation by mixtures
Rao–Blackwellised estimate

   ξ̂ = (1/T) Σ_{t=1}^{T} ω_1 π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) / { ω_1 π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) + ϕ(θ_k^{(t)}) }

converges to ω_1 Z_k / { ω_1 Z_k + 1 }. Deduce Ẑ_{3k} from ω_1 Ẑ_{3k} / { ω_1 Ẑ_{3k} + 1 } = ξ̂, i.e.

   Ẑ_{3k} = (1/ω_1) Σ_{t=1}^{T} [ ω_1 π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) / { ω_1 π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) + ϕ(θ_k^{(t)}) } ] / Σ_{t=1}^{T} [ ϕ(θ_k^{(t)}) / { ω_1 π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) + ϕ(θ_k^{(t)}) } ]

[Bridge sampler]
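Continuing the sketch above, the Rao–Blackwellised estimate ξ̂ and the deduced evidence approximation take two lines; here Ẑ_k should be close to the true value √(2π) ≈ 2.5066:

   w  <- omega1*tpik(theta)
   xi <- mean(w/(w + phi(theta)))      # converges to omega1*Zk/(omega1*Zk + 1)
   Zhat <- xi/((1 - xi)*omega1)        # inverting the relation above
   Zhat                                # close to sqrt(2*pi) = 2.5066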
• 35. Harmonic means: The original harmonic mean estimator
When θ_{kt} ∼ π_k(θ|x),

   (1/T) Σ_{t=1}^{T} 1 / L(θ_{kt}|x)

is an unbiased estimator of 1/m_k(x). [Newton & Raftery, 1994]
Highly dangerous: most often leads to an infinite variance!
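The danger is easy to reproduce. A minimal R sketch in a hypothetical conjugate model x ∼ N(θ,1), θ ∼ N(0,1), where m_k(x) is known in closed form: the weights 1/L(θ|x) have no finite second moment under the posterior, and the replications of the estimator spread widely around the truth (all the more so as the prior flattens).

   set.seed(42)
   x <- 1                                          # a single observation
   true.m <- dnorm(x, 0, sqrt(2))                  # closed-form evidence
   hm <- replicate(100, {
     theta <- rnorm(1e4, x/2, sqrt(1/2))           # exact posterior sample
     1/mean(1/dnorm(x, theta, 1))                  # harmonic mean estimate of m(x)
   })
   c(true.m, range(hm))                            # wide spread around the truth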
• 37. Harmonic means: Approximating Z_k from a posterior sample
Use of the [harmonic mean] identity

   E^{π_k}[ ϕ(θ_k) / { π_k(θ_k) L_k(θ_k) } | x ] = ∫ [ ϕ(θ_k) / { π_k(θ_k) L_k(θ_k) } ] × [ π_k(θ_k) L_k(θ_k) / Z_k ] dθ_k = 1/Z_k,

no matter what the proposal ϕ(·) is. [Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output.
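A minimal R sketch of the Gelfand–Dey estimator in the same hypothetical conjugate setting as above, with a light-tailed normal ϕ centred at the posterior mode:

   theta <- rnorm(1e4, x/2, sqrt(1/2))              # posterior (MCMC-like) sample
   varphi <- function(t) dnorm(t, x/2, sqrt(1/3))   # lighter tails than the posterior
   Zhat <- 1/mean(varphi(theta)/(dnorm(theta)*dnorm(x, theta, 1)))
   c(Zhat, dnorm(x, 0, sqrt(2)))                    # Gelfand-Dey vs closed form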
• 39. Harmonic means: Comparison with regular importance sampling
Harmonic mean: constraint opposed to the usual importance sampling constraints: ϕ(θ) must have lighter (rather than fatter) tails than π_k(θ_k) L_k(θ_k) for the approximation

   Ẑ_{1k} = 1 / [ (1/T) Σ_{t=1}^{T} ϕ(θ_k^{(t)}) / { π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) } ]

to have a finite variance. E.g., use finite-support kernels for ϕ.
• 41. Harmonic means: Comparison with regular importance sampling (cont'd)
Compare Ẑ_{1k} with a standard importance sampling approximation

   Ẑ_{2k} = (1/T) Σ_{t=1}^{T} π_k(θ_k^{(t)}) L_k(θ_k^{(t)}) / ϕ(θ_k^{(t)}),

where the θ_k^{(t)}'s are generated from the density ϕ(·) (with fatter tails, like t's).
• 42. Harmonic means: HPD indicator as ϕ
Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and an indicator-type ϕ:

   ϕ(θ) = (10/T) Σ_{t∈HPD} I{ d(θ, θ^{(t)}) ≤ ε }

(the small radius ε was lost in the extraction, and the uniform kernels are understood as normalised by their common volume).
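A minimal R sketch of this construction in one dimension, where the balls are intervals: keep the 10% of posterior draws with the highest unnormalised posterior values and take ϕ as the corresponding (normalised) uniform-kernel mixture; the radius eps is an illustrative tuning parameter, continuing the sketch above.

   tpost <- dnorm(theta)*dnorm(x, theta, 1)            # unnormalised posterior at draws
   hpd   <- theta[tpost >= quantile(tpost, .9)]        # the 10% HPD-region draws
   eps   <- .05                                        # kernel radius (illustrative)
   varphi.hpd <- function(t) sapply(t, function(u)
     mean(abs(u - hpd) <= eps)/(2*eps))                # normalised uniform-kernel mixture
   Zhat2 <- 1/mean(varphi.hpd(theta)/tpost)            # Gelfand-Dey with HPD-based phi
   Zhat2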
• 43. Harmonic means: Diabetes in Pima Indian women (cont'd)
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, for the above harmonic mean and importance samplers.
[Boxplots of the two approximations, labelled Harmonic mean and Importance sampling, over the narrow range 3.102 to 3.116]
• 44. Chib's solution: Chib's representation
Direct application of Bayes' theorem: given x ∼ f_k(x|θ_k) and θ_k ∼ π_k(θ_k),

   Z_k = m_k(x) = f_k(x|θ_k) π_k(θ_k) / π_k(θ_k|x),

an identity valid at every value of θ_k. Use of an approximation to the posterior at a particular value θ_k*:

   Ẑ_k = m̂_k(x) = f_k(x|θ_k*) π_k(θ_k*) / π̂_k(θ_k*|x).
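A minimal R sketch of Chib's identity, once more in the hypothetical conjugate setting used above, where the posterior density is available exactly, so that Ẑ_k reproduces the closed-form evidence; in latent-variable models the exact posterior density is replaced by the Rao–Blackwell average of the next slide.

   theta.star <- x/2                                   # any value works; take the posterior mode
   Zchib <- dnorm(x, theta.star, 1)*dnorm(theta.star)/
              dnorm(theta.star, x/2, sqrt(1/2))        # f(x|theta*) pi(theta*) / pi(theta*|x)
   c(Zchib, dnorm(x, 0, sqrt(2)))                      # identical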
• 46. Chib's solution: Case of latent variables
For a missing variable z, as in mixture models, natural Rao–Blackwell estimate

   π̂_k(θ_k*|x) = (1/T) Σ_{t=1}^{T} π_k(θ_k*|x, z_k^{(t)}),

where the z_k^{(t)}'s are Gibbs-sampled latent variables. Skip difficulties...
• 47. Chib's solution: Label switching
A mixture model [a special case of missing-variable model] is invariant under permutations of the indices of its components. E.g., the mixtures 0.3 N(0,1) + 0.7 N(2.3,1) and 0.7 N(2.3,1) + 0.3 N(0,1) are exactly the same!
⇒ The component parameters θ_i are not identifiable marginally, since they are exchangeable.
• 49. Chib's solution: Connected difficulties
1. The number of modes of the likelihood is of order O(k!):
   ⇒ maximization, and even [MCMC] exploration, of the posterior surface is harder.
2. Under exchangeable priors on (θ, p) [priors invariant under permutation of the indices], all posterior marginals are identical:
   ⇒ the posterior expectation of θ_1 equals the posterior expectation of θ_2.
• 51. Chib's solution: License
Since the Gibbs output does not produce exchangeability, the Gibbs sampler has not explored the whole parameter space: it lacks the energy to switch enough component allocations simultaneously.
[Trace plots over 500 iterations and pairwise scatter plots of the Gibbs sequences of the component parameters (µ_i, p_i, σ_i)]
• 52. Chib's solution: Label switching paradox
We should observe the exchangeability of the components [label switching] to conclude about convergence of the Gibbs sampler. If we observe it, then we do not know how to estimate the parameters. If we do not, then we are uncertain about the convergence!
• 55. Chib's solution: Compensation for label switching
For mixture models, the z_k^{(t)}'s usually fail to visit all configurations in a balanced way, despite the symmetry predicted by the theory:

   π_k(θ_k|x) = π_k(σ(θ_k)|x) = (1/k!) Σ_{σ∈S_k} π_k(σ(θ_k)|x)

for all σ's in S_k, the set of all permutations of {1, ..., k}. Consequence: the numerical approximation is biased by a factor of order k!. Recover the theoretical symmetry by using

   π̃_k(θ_k*|x) = (1/(T k!)) Σ_{σ∈S_k} Σ_{t=1}^{T} π_k(σ(θ_k*)|x, z_k^{(t)}).

[Berkhof, van Mechelen & Gelman, 2003]
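A minimal R sketch of the symmetrised estimate; cond.dens and zsamp are hypothetical placeholders for the model's full-conditional density π_k(·|x, z) and for the list of Gibbs-sampled allocations, with theta.star stored as a k × p matrix of component parameters. (For k > 5, the permutation loop would be replaced by a random subset of permutations, as in the Galaxy illustration below.)

   perms <- function(v) {                    # all permutations of v (fine for small k)
     if (length(v) <= 1) return(list(v))
     out <- list()
     for (i in seq_along(v))
       out <- c(out, lapply(perms(v[-i]), function(p) c(v[i], p)))
     out
   }
   sym.dens <- function(theta.star, zsamp, cond.dens, k) {
     tot <- 0
     for (z in zsamp)                        # Gibbs-sampled allocations
       for (s in perms(1:k))                 # all k! relabellings of theta*
         tot <- tot + cond.dens(theta.star[s, , drop=FALSE], z)
     tot/(length(zsamp)*factorial(k))
   }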
• 57. Chib's solution: Galaxy dataset
n = 82 galaxies, modelled as a mixture of k normal distributions with both mean and variance unknown. [Roeder, 1992]
[Histogram of the (rescaled) galaxy data over the range −2 to 3, with the average mixture density overlaid]
• 58. Chib's solution: Galaxy dataset (cont'd)
Using only the original estimate, with θ_k* taken as the MAP estimator,

   log(m̂_k(x)) = −105.1396

for k = 3 (based on 10^3 simulations), while introducing the permutations leads to

   log(m̂_k(x)) = −103.3479.

Note that −105.1396 + log(3!) = −103.3479.

   k         2        3        4        5        6        7        8
   m̂_k(x)  −115.68  −103.35  −102.66  −101.93  −102.88  −105.48  −108.44

Estimates of the (log) marginal likelihoods by the symmetrised Chib's approximation (based on 10^5 Gibbs iterations and, for k > 5, 100 permutations selected at random in S_k). [Lee, Marin, Mengersen & Robert, 2008]
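The log(k!) relation between the two estimates can be checked directly:

   -105.1396 + log(factorial(3))   # = -103.3479, the symmetrised value for k = 3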
• 61. Chib's solution: Case of the probit model
For the completion by z,

   π̂(θ*|x) = (1/T) Σ_t π(θ*|x, z^{(t)})

is a simple average of normal densities.

R Chib's approximation code (the code computes bfchi, i.e. Chib's estimate of the Bayes factor):

   gibbs1=gibbsprobit(Niter,y,X1)
   gibbs2=gibbsprobit(Niter,y,X2)
   bfchi=mean(exp(dmvlnorm(t(t(gibbs2$mu)-model2$coeff[,1]),mean=rep(0,3),
           sigma=gibbs2$Sigma2)-probitlpost(model2$coeff[,1],y,X2)))/
         mean(exp(dmvlnorm(t(t(gibbs1$mu)-model1$coeff[,1]),mean=rep(0,2),
           sigma=gibbs1$Sigma2)-probitlpost(model1$coeff[,1],y,X1)))
• 63. Chib's solution: Diabetes in Pima Indian women (cont'd)
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, for the above Chib's and importance samplers.
[Boxplots of the two approximations, labelled Chib's method and importance sampling, over the range 0.0240 to 0.0255]
• 64. The Savage–Dickey ratio: The Savage–Dickey ratio
A special representation of the Bayes factor used for simulation purposes. Original version: (Dickey, AoMS, 1971).
• 65. The Savage–Dickey ratio: Savage's density ratio theorem
Given a test H_0: θ = θ_0 in a model f(x|θ, ψ) with a nuisance parameter ψ, under priors π_0(ψ) and π_1(θ, ψ) such that π_1(ψ|θ_0) = π_0(ψ), then

   B_{01} = π_1(θ_0|x) / π_1(θ_0),

with the obvious notations

   π_1(θ) = ∫ π_1(θ, ψ) dψ,   π_1(θ|x) = ∫ π_1(θ, ψ|x) dψ.

[Dickey, 1971; Verdinelli & Wasserman, 1995]
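A minimal R sketch of the theorem on a hypothetical toy test with no nuisance parameter (so that the version issues discussed next do not arise): H_0: θ = 0 for x ∼ N(θ, 1) and θ ∼ N(0, τ²), where both the Savage–Dickey ratio and the exact Bayes factor are available in closed form.

   x <- 1.5; tau <- 2
   post.mean <- tau^2*x/(1 + tau^2)                 # posterior theta | x is normal
   post.sd   <- tau/sqrt(1 + tau^2)
   B01.sd <- dnorm(0, post.mean, post.sd)/dnorm(0, 0, tau)  # posterior/prior at theta0
   B01.ex <- dnorm(x, 0, 1)/dnorm(x, 0, sqrt(1 + tau^2))    # m0(x)/m1(x)
   c(B01.sd, B01.ex)                                        # equal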
• 66. The Savage–Dickey ratio: Measure-theoretic difficulty
The representation depends on the choice of versions of conditional densities:

   B_{01} = ∫ π_0(ψ) f(x|θ_0, ψ) dψ / ∫∫ π_1(θ, ψ) f(x|θ, ψ) dψ dθ
            [by definition]
          = ∫ π_1(ψ|θ_0) f(x|θ_0, ψ) dψ π_1(θ_0) / [ ∫∫ π_1(θ, ψ) f(x|θ, ψ) dψ dθ π_1(θ_0) ]
            [specific version of π_1(ψ|θ_0) and arbitrary version of π_1(θ_0)]
          = ∫ π_1(θ_0, ψ) f(x|θ_0, ψ) dψ / [ m_1(x) π_1(θ_0) ]
            [specific version of π_1(θ_0, ψ)]
          = π_1(θ_0|x) / π_1(θ_0)
            [version dependent]
• 68. The Savage–Dickey ratio: Choice of density version
⇒ Dickey's (1971) condition is not a condition: if

   π_1(θ_0|x) / π_1(θ_0) = ∫ π_0(ψ) f(x|θ_0, ψ) dψ / m_1(x)

is chosen as a version, then the Savage–Dickey representation holds.
• 70. The Savage–Dickey ratio: Savage–Dickey paradox
The Verdinelli–Wasserman extension,

   B_{01} = [ π_1(θ_0|x) / π_1(θ_0) ] E^{π_1(ψ|x,θ_0)}[ π_0(ψ) / π_1(ψ|θ_0) ],

similarly depends on choices of versions... but its Monte Carlo implementation relies on specific versions of all densities without making mention of it. [Chen, Shao & Ibrahim, 2000]
• 72. The Savage–Dickey ratio: Computational implementation
Starting from the (new) prior

   π̃_1(θ, ψ) = π_1(θ) π_0(ψ),

define the associated posterior

   π̃_1(θ, ψ|x) = π_0(ψ) π_1(θ) f(x|θ, ψ) / m̃_1(x)

and impose the version

   π̃_1(θ_0|x) / π_1(θ_0) = ∫ π_0(ψ) f(x|θ_0, ψ) dψ / m̃_1(x)

to hold. Then

   B_{01} = [ π̃_1(θ_0|x) / π_1(θ_0) ] × [ m̃_1(x) / m_1(x) ]
• 74. The Savage–Dickey ratio: First ratio
If (θ^{(1)}, ψ^{(1)}), ..., (θ^{(T)}, ψ^{(T)}) ∼ π̃_1(θ, ψ|x), then

   (1/T) Σ_t π̃_1(θ_0|x, ψ^{(t)})

converges to π̃_1(θ_0|x) (provided the right version is used in θ_0), with

   π̃_1(θ_0|x, ψ) = π_1(θ_0) f(x|θ_0, ψ) / ∫ π_1(θ) f(x|θ, ψ) dθ.
• 75. The Savage–Dickey ratio: Rao–Blackwellisation with latent variables
When π̃_1(θ_0|x, ψ) is unavailable, replace it with

   (1/T) Σ_{t=1}^{T} π̃_1(θ_0|x, z^{(t)}, ψ^{(t)})

via data completion by a latent variable z such that

   f(x|θ, ψ) = ∫ f̃(x, z|θ, ψ) dz

and such that π̃_1(θ, ψ, z|x) ∝ π_0(ψ) π_1(θ) f̃(x, z|θ, ψ) is available in closed form, including its normalising constant, based on the version

   π̃_1(θ_0|x, z, ψ) / π_1(θ_0) = f̃(x, z|θ_0, ψ) / ∫ f̃(x, z|θ, ψ) π_1(θ) dθ.
• 77. The Savage–Dickey ratio: Bridge revival (1)
Since m_1(x)/m̃_1(x) is unknown, apparent failure! Use of the bridge identity

   E^{π̃_1(θ,ψ|x)}[ π_1(θ, ψ) f(x|θ, ψ) / { π_0(ψ) π_1(θ) f(x|θ, ψ) } ] = E^{π̃_1(θ,ψ|x)}[ π_1(ψ|θ) / π_0(ψ) ] = m_1(x) / m̃_1(x)

to (biasedly) estimate m_1(x)/m̃_1(x) by

   (1/T) Σ_{t=1}^{T} π_1(ψ^{(t)}|θ^{(t)}) / π_0(ψ^{(t)}),

based on the same sample from π̃_1.
• 79. The Savage–Dickey ratio: Bridge revival (2)
The alternative identity

   E^{π_1(θ,ψ|x)}[ π_0(ψ) π_1(θ) f(x|θ, ψ) / { π_1(θ, ψ) f(x|θ, ψ) } ] = E^{π_1(θ,ψ|x)}[ π_0(ψ) / π_1(ψ|θ) ] = m̃_1(x) / m_1(x)

suggests using a second sample (θ̄^{(1)}, ψ̄^{(1)}, z̄^{(1)}), ..., (θ̄^{(T)}, ψ̄^{(T)}, z̄^{(T)}) ∼ π_1(θ, ψ|x) and the ratio estimate

   (1/T) Σ_{t=1}^{T} π_0(ψ̄^{(t)}) / π_1(ψ̄^{(t)}|θ̄^{(t)}).

Resulting unbiased estimate:

   B̂_{01} = [ (1/T) Σ_t π̃_1(θ_0|x, z^{(t)}, ψ^{(t)}) / π_1(θ_0) ] × [ (1/T) Σ_{t=1}^{T} π_0(ψ̄^{(t)}) / π_1(ψ̄^{(t)}|θ̄^{(t)}) ]
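In code, the estimator takes the following shape; every name below is a hypothetical placeholder, the density evaluators and the two simulated sequences being assumed to come from problem-specific (e.g. Gibbs) samplers:

   B01.MR <- function(tilde.cond,    # function(z, psi): tilde-pi1(theta0 | x, z, psi)
                      prior.theta0,  # pi1(theta0)
                      z1, psi1,      # T draws from tilde-pi1(theta, psi, z | x)
                      psi2, theta2,  # T draws (psi-bar, theta-bar) from pi1(theta, psi, z | x)
                      pi0.psi,       # function(psi): pi0(psi)
                      cond.psi) {    # function(psi, theta): pi1(psi | theta)
     first  <- mean(mapply(tilde.cond, z1, psi1))/prior.theta0
     second <- mean(pi0.psi(psi2)/cond.psi(psi2, theta2))
     first*second
   }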
• 81. The Savage–Dickey ratio: Difference with the Verdinelli–Wasserman representation
The above leads to the representation

   B_{01} = [ π̃_1(θ_0|x) / π_1(θ_0) ] E^{π_1(θ,ψ|x)}[ π_0(ψ) / π_1(ψ|θ) ],

which shows how our approach differs from Verdinelli and Wasserman's

   B_{01} = [ π_1(θ_0|x) / π_1(θ_0) ] E^{π_1(ψ|x,θ_0)}[ π_0(ψ) / π_1(ψ|θ_0) ].
• 82. The Savage–Dickey ratio: Difference with the Verdinelli–Wasserman approximation
In terms of implementation,

   B̂_{01}^{MR}(x) = [ (1/T) Σ_t π̃_1(θ_0|x, z^{(t)}, ψ^{(t)}) / π_1(θ_0) ] × [ (1/T) Σ_{t=1}^{T} π_0(ψ̄^{(t)}) / π_1(ψ̄^{(t)}|θ̄^{(t)}) ]

formally resembles

   B̂_{01}^{VW}(x) = [ (1/T) Σ_{t=1}^{T} π̃_1(θ_0|x, z^{(t)}, ψ^{(t)}) / π_1(θ_0) ] × [ (1/T) Σ_{t=1}^{T} π_0(ψ^{(t)}) / π_1(ψ^{(t)}|θ_0) ].

But the simulated sequences differ: the first average involves simulations from π̃_1(θ, ψ, z|x) and from π_1(θ, ψ, z|x), while the second relies on simulations from π̃_1(θ, ψ, z|x) and from π_1(ψ, z|x, θ_0).
• 84. The Savage–Dickey ratio: Diabetes in Pima Indian women (cont'd)
Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, for the above importance, Chib's, Savage–Dickey and bridge samplers.
[Boxplots of the four approximations, labelled IS, Chib, Savage–Dickey and Bridge, over the range 2.8 to 3.4]