On some computational methods for Bayesian model choice




             On some computational methods for Bayesian
                          model choice

                                            Christian P. Robert

                               CREST-INSEE and Université Paris Dauphine
                               http://www.ceremade.dauphine.fr/~xian


            Joint work with Nicolas Chopin and Jean-Michel Marin
On some computational methods for Bayesian model choice




Outline


           1   Introduction

           2   Importance sampling solutions

           3   Cross-model solutions

           4   Nested sampling

           5   Mixture example
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Bayes factor

      Definition (Bayes factors)
      For testing hypotheses H0 : θ ∈ Θ0 vs. Ha : θ ∉ Θ0 , under the prior

                      π(Θ0 ) π0 (θ) + π(Θ0^c ) π1 (θ) ,

      the central quantity is

                      B01 = [ π(Θ0 |x) / π(Θ0^c |x) ] ÷ [ π(Θ0 ) / π(Θ0^c ) ]

                          = ∫_{Θ0} f (x|θ) π0 (θ) dθ  /  ∫_{Θ0^c} f (x|θ) π1 (θ) dθ


                                                                            [Jeffreys, 1939]
On some computational methods for Bayesian model choice
  Introduction
     Bayes factor



Self-contained concept

      Outside decision-theoretic environment:
                 eliminates impact of π(Θ0 ) but depends on the choice of
                 (π0 , π1 )
                 Bayesian/marginal equivalent to the likelihood ratio
                 Jeffreys’ scale of evidence:
                     if log10(B10^π) is between 0 and 0.5, evidence against H0 is weak,
                     if log10(B10^π) is between 0.5 and 1, evidence is substantial,
                     if log10(B10^π) is between 1 and 2, evidence is strong, and
                     if log10(B10^π) is above 2, evidence is decisive
                 (a small helper implementing this scale is sketched after this list)
                 Requires the computation of the marginal/evidence under
                 both hypotheses/models
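
      As a small illustration, a Python helper encoding the thresholds of Jeffreys' scale
      above (the function name and the example value are arbitrary choices):

def jeffreys_scale(log10_B10):
    """Map log10 of the Bayes factor B10 to Jeffreys' qualitative scale."""
    if log10_B10 < 0:
        return "evidence supporting H0"
    if log10_B10 <= 0.5:
        return "weak evidence against H0"
    if log10_B10 <= 1:
        return "substantial evidence against H0"
    if log10_B10 <= 2:
        return "strong evidence against H0"
    return "decisive evidence against H0"

# example: B10 = 25, i.e. log10(B10) ~ 1.4, is strong evidence against H0
print(jeffreys_scale(1.4))
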
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Model choice and model comparison



      Choice between models
      Several models available for the same observation

                                    Mi : x ∼ fi (x|θi ),   i∈I

      where I can be finite or infinite
On some computational methods for Bayesian model choice
  Introduction
     Model choice



Bayesian resolution
      Probabilise the entire model/parameter space
          allocate probabilities pi to all models Mi
          define priors πi (θi ) for each parameter space Θi
          compute

              π(Mi |x) = pi ∫_{Θi} fi (x|θi ) πi (θi ) dθi  /  Σ_j pj ∫_{Θj} fj (x|θj ) πj (θj ) dθj

          take the largest π(Mi |x) to determine the “best” model,
          or use the averaged predictive

              Σ_j π(Mj |x) ∫_{Θj} fj (x′ |θj ) πj (θj |x) dθj
On some computational methods for Bayesian model choice
  Introduction
     Evidence



Evidence




      All these problems end up with a similar quantity, the evidence

                                            Z = ∫ π(θ) L(θ) dθ ,

      aka the marginal likelihood.
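
      As a minimal illustration of the evidence as an integral, a Python sketch for an
      assumed toy model (one observation x ∼ N(θ, 1) with prior θ ∼ N(0, 1)), where Z also
      has a closed form (the marginal density of x is normal with mean 0 and variance 2):

import numpy as np
from scipy import integrate, stats

x = 1.3                                               # observed datum (arbitrary)
prior = stats.norm(0, 1)                              # prior: theta ~ N(0, 1)
lik = lambda t: stats.norm.pdf(x, loc=t, scale=1.0)   # likelihood L(theta)

# evidence Z = integral of pi(theta) L(theta) d(theta), by quadrature
Z_quad, _ = integrate.quad(lambda t: prior.pdf(t) * lik(t), -10.0, 10.0)

# closed form for this conjugate toy model: x ~ N(0, variance 1 + 1)
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)
print(Z_quad, Z_exact)
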
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Bridge sampling


      If
                                          π1 (θ1 |x) ∝ π̃1 (θ1 |x)
                                          π2 (θ2 |x) ∝ π̃2 (θ2 |x)
      live on the same space, then

              B12 ≈ (1/n) Σ_{i=1}^{n} π̃1 (θi |x) / π̃2 (θi |x) ,        θi ∼ π2 (θ|x)

                           [Gelman & Meng, 1998; Chen, Shao & Ibrahim, 2000]
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



(Further) bridge sampling

      In addition,

              B12 = ∫ π̃1 (θ|x) α(θ) π2 (θ|x) dθ  /  ∫ π̃2 (θ|x) α(θ) π1 (θ|x) dθ        ∀ α(·)

                  ≈ [ (1/n2) Σ_{i=1}^{n2} π̃1 (θ2i |x) α(θ2i ) ]  /  [ (1/n1) Σ_{i=1}^{n1} π̃2 (θ1i |x) α(θ1i ) ] ,

                                                                        θji ∼ πj (θ|x)
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Regular importance



Optimal bridge sampling

      The optimal choice of auxiliary function α is

              α⋆ = (n1 + n2) / { n1 π̃1 (θ|x) + n2 π̃2 (θ|x) }

      leading to

              B12 ≈ [ (1/n2) Σ_{i=1}^{n2} π̃1 (θ2i |x) / { n1 π̃1 (θ2i |x) + n2 π̃2 (θ2i |x) } ]

                     /  [ (1/n1) Σ_{i=1}^{n1} π̃2 (θ1i |x) / { n1 π̃1 (θ1i |x) + n2 π̃2 (θ1i |x) } ]

                                                                                    Back later!
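
      A minimal Python sketch of this bridge estimate, under the assumption of two artificial
      unnormalised densities with known normalising constants, so that the exact B12 = Z1/Z2
      is available as a check:

import numpy as np

rng = np.random.default_rng(0)

# two unnormalised "posteriors" with known normalising constants:
# pi1 is a N(0,1) kernel (Z1 = sqrt(2 pi)), pi2 a N(1, sqrt(2)) kernel (Z2 = sqrt(4 pi))
pi1 = lambda t: np.exp(-0.5 * t**2)
pi2 = lambda t: np.exp(-0.25 * (t - 1.0)**2)
B12_true = np.sqrt(2 * np.pi) / np.sqrt(4 * np.pi)

n1, n2 = 10_000, 10_000
th1 = rng.normal(0.0, 1.0, n1)                  # draws from pi1(theta|x)
th2 = rng.normal(1.0, np.sqrt(2.0), n2)         # draws from pi2(theta|x)

# bridge estimate with alpha proportional to 1 / (n1*pi1 + n2*pi2)
num = np.mean(pi1(th2) / (n1 * pi1(th2) + n2 * pi2(th2)))
den = np.mean(pi2(th1) / (n1 * pi1(th1) + n2 * pi2(th1)))
print(num / den, B12_true)                      # both close to 0.707
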
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Z from a posterior sample


      Use of the identity

              E^π [ ϕ(θ) / { π(θ) L(θ) } | x ] = ∫ [ ϕ(θ) / { π(θ) L(θ) } ] [ π(θ) L(θ) / Z ] dθ = 1/Z

      no matter what the proposal ϕ(θ) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of MCMC output
                                                                                   RB-RJ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: a constraint opposed to the usual importance sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails than
      π(θ)L(θ) for the approximation

              Ẑ1 = 1 / [ (1/T) Σ_{t=1}^{T} ϕ(θ^(t) ) / { π(θ^(t) ) L(θ^(t) ) } ]

      to have a finite variance.
      E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Comparison with regular importance sampling (cont’d)



      Compare Ẑ1 with a standard importance sampling approximation

              Ẑ2 = (1/T) Σ_{t=1}^{T} π(θ^(t) ) L(θ^(t) ) / ϕ(θ^(t) )

      where the θ^(t) ’s are generated from the density ϕ(θ) (with fatter
      tails, like a t distribution’s)
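
      A sketch contrasting Ẑ1 and Ẑ2 on an assumed toy conjugate model (one observation
      x ∼ N(θ, 1), prior θ ∼ N(0, 1)), where the posterior and the exact Z are both available;
      the choices of ϕ with lighter, resp. fatter, tails are arbitrary:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x, T = 1.3, 100_000

prior = stats.norm(0, 1)
lik = lambda t: stats.norm.pdf(x, loc=t, scale=1.0)
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)              # closed-form evidence
post = stats.norm(x / 2, 1 / np.sqrt(2))                # exact posterior of the toy model

# Z1: harmonic-mean identity, phi with LIGHTER tails than pi(theta)L(theta)
theta_post = post.rvs(T, random_state=rng)
phi_light = stats.norm(x / 2, 0.5 / np.sqrt(2))
Z1 = 1.0 / np.mean(phi_light.pdf(theta_post) / (prior.pdf(theta_post) * lik(theta_post)))

# Z2: standard importance sampling, phi with FATTER tails (Student t)
phi_fat = stats.t(df=3, loc=x / 2, scale=1.0)
theta_is = phi_fat.rvs(T, random_state=rng)
Z2 = np.mean(prior.pdf(theta_is) * lik(theta_is) / phi_fat.pdf(theta_is))

print(Z1, Z2, Z_exact)
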
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Z using a mixture representation



                                                                   Bridge sampling redux

      Design a specific mixture for simulation [importance sampling]
      purposes, with density

              ϕ̃(θ) ∝ ω1 π(θ) L(θ) + ϕ(θ) ,

      where ϕ(θ) is arbitrary (but normalised)
      Note: ω1 is not a probability weight
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Approximating Z using a mixture representation (cont’d)

      Corresponding MCMC (=Gibbs) sampler
      At iteration t
          1   Take δ^(t) = 1 with probability

                  ω1 π(θ^(t−1) ) L(θ^(t−1) ) / { ω1 π(θ^(t−1) ) L(θ^(t−1) ) + ϕ(θ^(t−1) ) }

              and δ^(t) = 2 otherwise;
          2   If δ^(t) = 1, generate θ^(t) ∼ MCMC(θ^(t−1) , θ^(t) ), where
              MCMC(θ, θ′ ) denotes an arbitrary MCMC kernel associated
              with the posterior π(θ|x) ∝ π(θ) L(θ);
          3   If δ^(t) = 2, generate θ^(t) ∼ ϕ(θ) independently
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Harmonic means



Evidence approximation by mixtures
      Rao-Blackwellised estimate

              ξ̂ = (1/T) Σ_{t=1}^{T} ω1 π(θ^(t) ) L(θ^(t) ) / { ω1 π(θ^(t) ) L(θ^(t) ) + ϕ(θ^(t) ) } ,

      converges to ω1 Z / { ω1 Z + 1 }
      Deduce Ẑ3 from ω1 Ẑ3 / { ω1 Ẑ3 + 1 } = ξ̂, i.e.

              Ẑ3 = [ Σ_{t=1}^{T} ω1 π(θ^(t) ) L(θ^(t) ) / { ω1 π(θ^(t) ) L(θ^(t) ) + ϕ(θ^(t) ) } ]

                    /  [ ω1 Σ_{t=1}^{T} ϕ(θ^(t) ) / { ω1 π(θ^(t) ) L(θ^(t) ) + ϕ(θ^(t) ) } ]


                                                                                [Bridge sampler]
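
      A sketch of the whole scheme (Gibbs sampler over (δ, θ), then the Rao-Blackwellised Ẑ3)
      on the assumed toy conjugate model (one observation x ∼ N(θ, 1), prior θ ∼ N(0, 1)); the
      MCMC kernel is replaced here by an exact posterior draw, which is only possible because
      the toy posterior is known in closed form:

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x, T, omega1 = 1.3, 20_000, 1.0

prior = stats.norm(0, 1)
lik = lambda t: stats.norm.pdf(x, loc=t, scale=1.0)
piL = lambda t: prior.pdf(t) * lik(t)                 # unnormalised posterior pi(theta)L(theta)
post = stats.norm(x / 2, 1 / np.sqrt(2))              # exact posterior (toy model only)
phi = stats.norm(0, 2)                                # arbitrary normalised phi
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta, draws = 0.0, np.empty(T)
for t in range(T):
    p1 = omega1 * piL(theta) / (omega1 * piL(theta) + phi.pdf(theta))
    if rng.random() < p1:                             # delta = 1: posterior move
        theta = post.rvs(random_state=rng)
    else:                                             # delta = 2: draw from phi
        theta = phi.rvs(random_state=rng)
    draws[t] = theta

num = omega1 * piL(draws)
xi_hat = np.mean(num / (num + phi.pdf(draws)))        # Rao-Blackwellised estimate
Z3 = xi_hat / (omega1 * (1.0 - xi_hat))               # solves omega1*Z/(omega1*Z + 1) = xi_hat
print(Z3, Z_exact)
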
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
      θk ∼ πk (θk ),

              mk (x) = fk (x|θk ) πk (θk ) / πk (θk |x) ,

      an identity valid for any value of θk . Use of an approximation to the
      posterior density at a chosen point θk∗ :

              m̂k (x) = fk (x|θk∗ ) πk (θk∗ ) / π̂k (θk∗ |x) .
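
      A sketch of Chib's identity on the same assumed toy conjugate model; here π̂(θ∗|x) is a
      simple kernel density estimate built from posterior draws, standing in for the
      Rao-Blackwell estimate discussed next:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = 1.3

prior = stats.norm(0, 1)
lik = lambda t: stats.norm.pdf(x, loc=t, scale=1.0)
post = stats.norm(x / 2, 1 / np.sqrt(2))            # exact posterior of the toy model
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta_star = x / 2                                  # any high-posterior-density point
theta_post = post.rvs(5_000, random_state=rng)      # posterior (e.g. MCMC) sample

# approximate pi(theta*|x) from the sample, here by a Gaussian kernel density estimate
post_dens_hat = stats.gaussian_kde(theta_post)(theta_star)[0]

m_hat = lik(theta_star) * prior.pdf(theta_star) / post_dens_hat
print(m_hat, Z_exact)
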
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Case of latent variables



      For missing variables z, as in mixture models, a natural Rao-Blackwell
      estimate is

              π̂k (θk∗ |x) = (1/T) Σ_{t=1}^{T} πk (θk∗ |x, zk^(t) ) ,

      where the zk^(t) ’s are the latent variables simulated by a Gibbs
      sampler.
On some computational methods for Bayesian model choice
  Importance sampling solutions
     Chib’s solution



Compensation for label switching
      For mixture models, zk^(t) usually fails to visit all configurations in a
      balanced way, despite the symmetry predicted by the theory

              πk (θk |x) = πk (σ(θk )|x) = (1/k!) Σ_{σ∈Sk} πk (σ(θk )|x)

      for all σ’s in Sk , the set of all permutations of {1, . . . , k}.
      Consequence: the numerical approximation is biased by a factor of order k!
      Recover the theoretical symmetry by using

              π̃k (θk∗ |x) = 1/(T k!) Σ_{σ∈Sk} Σ_{t=1}^{T} πk (σ(θk∗ )|x, zk^(t) ) .

                                                    [Berkhof, Mechelen, & Gelman, 2003]
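
      A sketch of the symmetrised Rao-Blackwell estimate on an assumed toy two-component
      Gaussian mixture with equal known weights and unit variances (so the two mean
      parameters are exchangeable a priori); all numerical settings are illustrative:

import numpy as np
from itertools import permutations
from scipy import stats

rng = np.random.default_rng(4)

# toy mixture 0.5 N(mu1, 1) + 0.5 N(mu2, 1), exchangeable priors mu_j ~ N(0, sqrt(10))
x = np.concatenate([rng.normal(0, 1, 30), rng.normal(3, 1, 30)])
prior_var, k, T = 10.0, 2, 2_000
theta_star = np.array([0.0, 3.0])      # point at which pi(theta*|x) is evaluated

mu, dens_avg = np.array([-1.0, 1.0]), 0.0
for t in range(T):
    # z | mu: allocation probabilities (the equal weights cancel)
    logp = stats.norm.logpdf(x[:, None], loc=mu[None, :], scale=1.0)
    p1 = 1.0 / (1.0 + np.exp(logp[:, 0] - logp[:, 1]))          # P(z_i = 1 | ...)
    z = (rng.random(x.size) < p1).astype(int)
    # mu | z: conjugate normal updates
    v = 1.0 / (1.0 / prior_var + np.bincount(z, minlength=k))
    m = v * np.array([x[z == j].sum() for j in range(k)])
    mu = rng.normal(m, np.sqrt(v))
    # permutation-averaged Rao-Blackwell density at theta_star
    dens_avg += np.mean([np.prod(stats.norm.pdf(np.array(sig), loc=m, scale=np.sqrt(v)))
                         for sig in permutations(theta_star)]) / T

print(dens_avg)       # symmetrised estimate of pi(theta*|x)
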
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Reversible jump


      Idea: Set up a proper measure-theoretic framework for designing
      moves between models Mk
                                                          [Green, 1995]
      Create a reversible kernel K on H = ∪k {k} × Θk such that

              ∫_A ∫_B K(x, dy) π(x) dx = ∫_B ∫_A K(y, dx) π(y) dy

      for the invariant density π [x is of the form (k, θ^(k) )]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves
      For a move between two models, M1 and M2 , the Markov chain
      being in state θ1 ∈ M1 , denote by K1→2 (θ1 , dθ) and K2→1 (θ2 , dθ)
      the corresponding kernels, under the detailed balance condition

                          π(dθ1 ) K1→2 (θ1 , dθ) = π(dθ2 ) K2→1 (θ2 , dθ) ,

      and take, wlog, dim(M2 ) > dim(M1 ).
      Proposal expressed as

                                           θ2 = Ψ1→2 (θ1 , v1→2 )

      where v1→2 is a random variable of dimension
      dim(M2 ) − dim(M1 ), generated as

                                          v1→2 ∼ ϕ1→2 (v1→2 ) .
On some computational methods for Bayesian model choice
  Cross-model solutions
     Reversible jump



Local moves (2)

      In this case, q1→2 (θ1 , dθ2 ) has density

              ϕ1→2 (v1→2 ) | ∂Ψ1→2 (θ1 , v1→2 ) / ∂(θ1 , v1→2 ) |^{−1} ,

      by the Jacobian rule.
                                                                                    Reverse importance link

      With probability ̟1→2 of choosing the move to M2 while in M1 , the
      acceptance probability reduces to

              α(θ1 , v1→2 ) = 1 ∧ [ π(M2 , θ2 ) ̟2→1 / { π(M1 , θ1 ) ̟1→2 ϕ1→2 (v1→2 ) } ]
                                   × | ∂Ψ1→2 (θ1 , v1→2 ) / ∂(θ1 , v1→2 ) |

                                           Difficult calibration!
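
      A minimal reversible jump sketch for an assumed pair of toy models (M1: y ∼ N(0,1) with
      no parameter; M2: y ∼ N(θ,1), θ ∼ N(0,1)); the birth move uses the identity map θ2 = v,
      so the Jacobian is 1, and the exact π(M1|y) is available for checking:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y, T = 1.0, 20_000
rho = np.array([0.5, 0.5])                              # prior model probabilities

f1 = stats.norm(0, 1).pdf(y)                            # M1: y ~ N(0, 1), no parameter
f2 = lambda th: stats.norm.pdf(y, loc=th, scale=1.0)    # M2: y ~ N(theta, 1)
prior2 = stats.norm(0, 1)                               # theta ~ N(0, 1) under M2
phi = stats.norm(y / 2, np.sqrt(0.5))                   # proposal for the extra dimension v

# exact posterior model probability, for checking
m = np.array([f1, stats.norm(0, np.sqrt(2)).pdf(y)])
exact = m[0] / m.sum()

k, theta, count1 = 1, None, 0
for t in range(T):
    if k == 1:                                          # birth move 1 -> 2: theta2 = v, |J| = 1
        v = phi.rvs(random_state=rng)
        a = (rho[1] * f2(v) * prior2.pdf(v)) / (rho[0] * f1 * phi.pdf(v))
        if rng.random() < min(1.0, a):
            k, theta = 2, v
    else:                                               # death move 2 -> 1 (reverse ratio)
        a = (rho[0] * f1 * phi.pdf(theta)) / (rho[1] * f2(theta) * prior2.pdf(theta))
        if rng.random() < min(1.0, a):
            k, theta = 1, None
    if k == 2:                                          # within-model refresh (exact draw, toy case)
        theta = stats.norm(y / 2, np.sqrt(0.5)).rvs(random_state=rng)
    count1 += (k == 1)

print(count1 / T, exact)                                # frequency of M1 vs exact pi(M1|y)
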
On some computational methods for Bayesian model choice
  Cross-model solutions
     Saturation schemes



Alternative
      Saturation of the parameter space H = ∪k {k} × Θk by creating
          a model index M
          pseudo-priors πj (θj |M = k) for j ≠ k
                                                 [Carlin & Chib, 1995]

      Validation by

              π(M = k|y) = ∫ P (M = k|y, θ) π(θ|y) dθ = Zk

      where the (marginal) posterior is

              π(θ|y) = Σ_{k=1}^{D} π(θ, M = k|y)

                     = Σ_{k=1}^{D} ρk mk (y) πk (θk |y) Π_{j≠k} πj (θj |M = k) .
On some computational methods for Bayesian model choice
  Cross-model solutions
     Saturation schemes



MCMC implementation
      Run a Markov chain (M^(t) , θ1^(t) , . . . , θD^(t) ) with stationary
      distribution π(θ, M = k|y) by
          1   Pick M^(t) = k with probability P (M = k|y, θ^(t−1) )
          2   Generate θk^(t) from the posterior πk (θk |y) [or by an MCMC step]
          3   Generate θj^(t) (j ≠ k) from the pseudo-prior πj (θj |M = k)

      Approximate π(M = k|y) = Zk by

              ρ̌k (y) ∝ ρk Σ_{t=1}^{T} { fk (y|θk^(t) ) πk (θk^(t) ) Π_{j≠k} πj (θj^(t) |M = k)

                          /  Σ_{ℓ=1}^{D} ρℓ fℓ (y|θℓ^(t) ) πℓ (θℓ^(t) ) Π_{j≠ℓ} πj (θj^(t) |M = ℓ) }
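
      A sketch of the composite-space sampler for two assumed toy normal models with a single
      observation y; the pseudo-priors are taken equal to the priors and the conditional
      posteriors are exact (conjugate), both of which are conveniences of this illustration:

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
y, T = 0.5, 10_000
rho = np.array([0.5, 0.5])                              # prior model probabilities

lik = [lambda th: stats.norm.pdf(y, loc=th, scale=1.0),     # M1: y ~ N(theta1, 1)
       lambda th: stats.norm.pdf(y, loc=th, scale=2.0)]     # M2: y ~ N(theta2, 2)
prior = [stats.norm(0, 1), stats.norm(0, 1)]
pseudo = [stats.norm(0, 1), stats.norm(0, 1)]           # pseudo-priors (here: the priors)
post = [stats.norm(y / 2, np.sqrt(1 / 2)),              # exact conditional posteriors
        stats.norm(y / 5, np.sqrt(4 / 5))]

theta = np.array([0.0, 0.0])
weights = np.zeros((T, 2))
for t in range(T):
    # P(M = k | y, theta): likelihood x prior x pseudo-prior of the other block
    w = np.array([rho[k] * lik[k](theta[k]) * prior[k].pdf(theta[k]) *
                  pseudo[1 - k].pdf(theta[1 - k]) for k in range(2)])
    w /= w.sum()
    M = rng.choice(2, p=w)
    theta[M] = post[M].rvs(random_state=rng)            # posterior step, active model
    theta[1 - M] = pseudo[1 - M].rvs(random_state=rng)  # pseudo-prior step, inactive model
    weights[t] = w

m = np.array([stats.norm(0, np.sqrt(2)).pdf(y), stats.norm(0, np.sqrt(5)).pdf(y)])
print(weights.mean(axis=0), m / m.sum())                # estimate vs exact pi(M = k | y)
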
On some computational methods for Bayesian model choice
  Cross-model solutions
     Implementation error



Scott’s (2002) proposal


      Suggest estimating P (M = k|y) by

              ρ̃k (y) ∝ ρk Σ_{t=1}^{T} { fk (y|θk^(t) ) / Σ_{j=1}^{D} ρj fj (y|θj^(t) ) } ,

      based on D simultaneous and independent MCMC chains

              (θk^(t) )t ,        1 ≤ k ≤ D,

      with stationary distributions πk (θk |y) [instead of the above joint]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Implementation error



Congdon’s (2006) extension



      Selecting flat [prohibited!] pseudo-priors, uses instead

              ρ̂k (y) ∝ ρk Σ_{t=1}^{T} { fk (y|θk^(t) ) πk (θk^(t) ) / Σ_{j=1}^{D} ρj fj (y|θj^(t) ) πj (θj^(t) ) } ,

      where again the θk^(t) ’s are MCMC chains with stationary
      distributions πk (θk |y)
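
      A sketch computing both weights on the two-normal-model setting of Example (2) further
      down, where the exact π(M = 1|y) is known; exact posterior draws stand in for the
      independent MCMC chains:

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y, T = 1.0, 100_000
rho = np.array([0.5, 0.5])

# M1: y ~ N(theta, 1), theta ~ N(0, 1);   M2: y ~ N(theta, 1), theta ~ N(5, 1)
prior = [stats.norm(0, 1), stats.norm(5, 1)]
post = [stats.norm(y / 2, np.sqrt(0.5)), stats.norm((y + 5) / 2, np.sqrt(0.5))]
lik = lambda th: stats.norm.pdf(y, loc=th, scale=1.0)

# exact answer from the closed-form evidences
m = np.array([stats.norm(0, np.sqrt(2)).pdf(y), stats.norm(5, np.sqrt(2)).pdf(y)])
exact = m[0] / m.sum()

# independent chains, one per model (exact posterior draws in this toy case)
th = np.array([p.rvs(T, random_state=rng) for p in post])
f = np.array([lik(th[0]), lik(th[1])])                          # f_k(y | theta_k^(t))
fp = f * np.array([prior[0].pdf(th[0]), prior[1].pdf(th[1])])   # f_k * pi_k

scott = np.array([np.mean(f[k] / (rho @ f)) for k in range(2)]) * rho
congdon = np.array([np.mean(fp[k] / (rho @ fp)) for k in range(2)]) * rho
print(exact, scott[0] / scott.sum(), congdon[0] / congdon.sum())
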
On some computational methods for Bayesian model choice
  Cross-model solutions
     Implementation error



Examples

      Example (Model choice)
      Model M1 : y|θ ∼ U(0, θ) with prior θ ∼ Exp(1) versus model
      M2 : y|θ ∼ Exp(θ) with prior θ ∼ Exp(1). Equal prior weights on
      both models: ρ1 = ρ2 = 0.5.


      [Figure: approximations of π(M = 1|y) by Scott’s (2002) method (green)
      and Congdon’s (2006) method (brown), based on N = 10^6 simulations.]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Implementation error



Examples (2)

      Example (Model choice (2))
      Normal model M1 : y ∼ N (θ, 1) with θ ∼ N (0, 1) vs. normal
      model M2 : y ∼ N (θ, 1) with θ ∼ N (5, 1)

      [Figure: comparison of both approximations with π(M = 1|y): Scott’s
      (2002) (green, mixed dashes) and Congdon’s (2006) (brown, long dashes),
      based on N = 10^4 simulations.]
On some computational methods for Bayesian model choice
  Cross-model solutions
     Implementation error



Examples (3)

      Example (Model choice (3))
      Model M1 : y ∼ N (0, 1/ω) with ω ∼ Exp(a) vs.
      M2 : exp(y) ∼ Exp(λ) with λ ∼ Exp(b).

      [Figure: comparison of Congdon’s (2006) approximation (brown, dashed)
      with π(M = 1|y) when (a, b) equals (.24, 8.9), (.56, .7), (4.1, .46)
      and (.98, .081), resp., based on N = 10^4 simulations.]
On some computational methods for Bayesian model choice
  Nested sampling
     Purpose



Nested sampling: Goal


      Skilling’s (2007) technique using the one-dimensional
      representation:

              Z = E^π [L(θ)] = ∫_0^1 ϕ(x) dx

      with
              ϕ^{−1} (l) = P^π (L(θ) > l).

      Note: ϕ(·) is intractable in most cases.
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: First approximation

      Approximate Z by a Riemann sum:

              Ẑ = Σ_{i=1}^{j} (xi−1 − xi ) ϕ(xi )

      where the xi ’s are either
          deterministic:      xi = e^{−i/N} ,
          or random:
              x0 = 1 ,        xi+1 = ti xi ,        ti ∼ Be(N, 1) ,
          so that E[log xi ] = −i/N .
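
      A short sketch of the two choices of xi (the values of N and j are arbitrary), checking
      the stated expectation E[log xi] = −i/N on the random version:

import numpy as np

rng = np.random.default_rng(8)
N, j = 100, 300

# deterministic sequence x_i = exp(-i/N)
x_det = np.exp(-np.arange(1, j + 1) / N)

# random sequence: x_0 = 1, x_{i+1} = t_i * x_i with t_i ~ Beta(N, 1)
x_rand = np.cumprod(rng.beta(N, 1, size=j))

# check E[log x_j] = -j/N by averaging over replications
reps = np.cumprod(rng.beta(N, 1, size=(2_000, j)), axis=1)
print(np.log(x_det[-1]), np.log(reps[:, -1]).mean(), -j / N)
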
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Extraneous white noise
      Take

              Z = ∫ e^{−θ} dθ = ∫ (1/δ) e^{−(1−δ)θ} · δ e^{−δθ} dθ = E_δ [ (1/δ) e^{−(1−δ)θ} ]

              Ẑ = (1/N) Σ_{i=1}^{N} δ^{−1} e^{−(1−δ)θi } (xi−1 − xi ) ,        θi ∼ E(δ) I(θi ≤ θi−1 )

      Comparison of variances and MSEs:

              N        deterministic      random
              50            4.64           10.5
                            4.65           10.5
              100           2.47            4.9
                            2.48           5.02
              500           .549           1.01
                            .550           1.14
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Second approximation

      Replace the (intractable) ϕ(xi ) by ϕi , obtained by

      Nested sampling
      Start with N values θ1 , . . . , θN sampled from π
      At iteration i,
          1   Take ϕi = L(θk ), where θk is the point with smallest
              likelihood in the pool of θi ’s
          2   Replace θk with a sample from the prior constrained to
              L(θ) > ϕi : the current N points are then sampled from the
              prior constrained to L(θ) > ϕi .
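
      A basic nested sampling sketch for the assumed toy conjugate model (x ∼ N(θ,1),
      θ ∼ N(0,1)); the constrained-prior step is done by plain rejection from the prior,
      which is only viable in such a low-dimensional toy, and the run length is fixed
      rather than adaptive:

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x, N = 1.3, 100
prior = stats.norm(0, 1)
logL = lambda th: stats.norm.logpdf(x, loc=th, scale=1.0)
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta = prior.rvs(N, random_state=rng)          # N "live" points from the prior
logLs = logL(theta)
Z, X_prev = 0.0, 1.0
for i in range(1, 5 * N + 1):                   # fixed-length run (some truncation error remains)
    k = int(np.argmin(logLs))                   # worst live point
    X_i = np.exp(-i / N)                        # deterministic shrinkage of the prior mass
    Z += (X_prev - X_i) * np.exp(logLs[k])
    X_prev = X_i
    # replace the worst point by a prior draw constrained to L > L_k
    while True:                                 # rejection sampling, fine in this 1-D toy
        prop = prior.rvs(random_state=rng)
        if logL(prop) > logLs[k]:
            theta[k], logLs[k] = prop, logL(prop)
            break

print(Z, Z_exact)
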
On some computational methods for Bayesian model choice
  Nested sampling
     Implementation



Nested sampling: Third approximation


      Iterate the above steps until a given stopping iteration j is
      reached: e.g.,
              observe very small changes in the approximation Ẑ;
              reach the maximal value of L(θ) when the likelihood is
              bounded and its maximum is known;
              truncate the integral Z at level ǫ, i.e. replace

                  ∫_0^1 ϕ(x) dx      with      ∫_ǫ^1 ϕ(x) dx
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



Approximation error


      Error = Ẑ − Z

            = Σ_{i=1}^{j} (xi−1 − xi ) ϕi − ∫_0^1 ϕ(x) dx

            = − ∫_0^ǫ ϕ(x) dx                                              (Truncation Error)

              + [ Σ_{i=1}^{j} (xi−1 − xi ) ϕ(xi ) − ∫_ǫ^1 ϕ(x) dx ]        (Quadrature Error)

              + Σ_{i=1}^{j} (xi−1 − xi ) { ϕi − ϕ(xi ) }                   (Stochastic Error)



                                                                  [Dominated by Monte Carlo!]
On some computational methods for Bayesian model choice
  Nested sampling
     Error rates



A CLT for the Stochastic Error

      The (dominating) stochastic error is OP (N^{−1/2} ):

              N^{1/2} { Stochastic Error } →_D N (0, V )

      with
              V = − ∫∫_{s,t∈[ǫ,1]} s ϕ′ (s) t ϕ′ (t) log(s ∨ t) ds dt.

                                                          [Proof based on Donsker’s theorem]


      The number of simulated points equals the number of iterations j,
      and is a multiple of N : if one stops at the first iteration j such that
      e^{−j/N} < ǫ, then j = N ⌈− log ǫ⌉.
On some computational methods for Bayesian model choice
  Nested sampling
     Impact of dimension



Curse of dimension


      For a simple Gaussian-Gaussian model of dimension dim(θ) = d,
      the following three quantities are O(d):
          1   the asymptotic variance of the NS estimator;
          2   the number of iterations (necessary to reach a given truncation
              error);
          3   the cost of one simulated sample.

      Therefore, the CPU time necessary for achieving error level e is

                                                     O(d^3 /e^2 )
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Sampling from constr’d priors

      Exact simulation from the constrained prior is intractable in most
      cases!

      Skilling (2007) proposes to use MCMC, but:
              this introduces a bias (through the stopping rule);
              if the MCMC stationary distribution is the unconstrained prior,
              it becomes more and more difficult to sample points such that
              L(θ) > l as l increases.

      If such sampling were implementable, then a slice sampler could be
      devised at the same cost!
On some computational methods for Bayesian model choice
  Nested sampling
     Constraints



Illustration of MCMC bias




      [Figure: log-relative error (left) and average number of iterations (right)
      against the dimension d, for a Gaussian-Gaussian model with d parameters,
      when using T = 10 iterations of the Gibbs sampler.]
On some computational methods for Bayesian model choice
  Nested sampling
     Importance variant



An IS variant of nested sampling

      Consider an instrumental prior π̃ and likelihood L̃, the weight function

              w(θ) = π(θ) L(θ) / { π̃(θ) L̃(θ) }

      and the weighted NS estimator

              Ẑ = Σ_{i=1}^{j} (xi−1 − xi ) ϕi w(θi ).

      Then choose (π̃, L̃) so that sampling from π̃ constrained to
      L̃(θ) > l is easy; e.g. N (c, Id ) constrained to ∥c − θ∥ < r.
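
      A sketch of the weighted variant on the same assumed toy model, with instrumental prior
      π̃ = N(c, 1) and instrumental likelihood L̃(θ) = exp(−(θ − c)²/2), so that the constrained
      set {L̃ > l} is an interval around c and π̃ restricted to it is a truncated normal:

import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x, N = 1.3, 100
c = x / 2                                                     # centre near the toy posterior mode

prior = stats.norm(0, 1)                                      # true prior pi
lik = lambda th: stats.norm.pdf(x, loc=th, scale=1.0)         # true likelihood L
ins_prior = stats.norm(c, 1)                                  # instrumental prior
ins_lik = lambda th: np.exp(-0.5 * (th - c) ** 2)             # instrumental likelihood
w = lambda th: prior.pdf(th) * lik(th) / (ins_prior.pdf(th) * ins_lik(th))
Z_exact = stats.norm(0, np.sqrt(2)).pdf(x)

theta = ins_prior.rvs(N, random_state=rng)
logLt = np.log(ins_lik(theta))
Z, X_prev = 0.0, 1.0
for i in range(1, 6 * N + 1):
    k = int(np.argmin(logLt))
    X_i = np.exp(-i / N)
    Z += (X_prev - X_i) * ins_lik(theta[k]) * w(theta[k])
    X_prev = X_i
    # {L~ > l} is the interval |theta - c| < r: sample pi~ truncated to it directly
    r = np.sqrt(-2.0 * logLt[k])
    prop = stats.truncnorm.rvs(-r, r, loc=c, scale=1.0, random_state=rng)
    theta[k], logLt[k] = prop, np.log(ins_lik(prop))

print(Z, Z_exact)
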
On some computational methods for Bayesian model choice
  Mixture example




Benchmark: Target distribution




      Posterior distribution on (µ, σ) associated with the mixture

                                     pN (0, 1) + (1 − p)N (µ, σ) ,

      when p is known
On some computational methods for Bayesian model choice
  Mixture example




Experiment


           n observations with µ = 2 and σ = 3/2,
           use of a uniform prior both on (−2, 6) for µ
           and on (.001, 16) for log σ 2 ,
           occurrences of posterior bursts at µ = xi ,
           computation of the various estimates of Z
           (a grid-based sketch is given after this slide)
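
      A grid-based sketch of this benchmark (simulated data and a coarser grid than the
      850 × 950 one used later for the comparison; the value of p and the sample size are
      arbitrary assumptions of this illustration):

import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(11)
p, mu_true, sigma_true, n = 0.7, 2.0, 1.5, 16          # p is treated as known
comp = rng.random(n) < p
x = np.where(comp, rng.normal(0, 1, n), rng.normal(mu_true, sigma_true, n))

def log_lik(mu, sigma):
    """Log-likelihood of the mixture p N(0,1) + (1-p) N(mu, sigma)."""
    return np.sum(np.log(p * stats.norm.pdf(x, 0, 1) +
                         (1 - p) * stats.norm.pdf(x, mu, sigma)))

# uniform priors: mu on (-2, 6), log sigma^2 on (.001, 16), as in the slide above
mus = np.linspace(-2, 6, 200)
logs2 = np.linspace(0.001, 16, 200)
grid = np.array([[log_lik(m, np.exp(s2 / 2)) for s2 in logs2] for m in mus])

# log-evidence by Riemann integration of L against the uniform prior
prior_dens = 1.0 / (8.0 * (16 - 0.001))
dmu, ds2 = mus[1] - mus[0], logs2[1] - logs2[0]
log_Z = logsumexp(grid) + np.log(dmu * ds2 * prior_dens)
print(log_Z)
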
On some computational methods for Bayesian model choice
  Mixture example




Experiment (cont’d)




    MCMC sample for n = 16 observations from the mixture, and nested sampling
    sequence with M = 1000 starting points.
On some computational methods for Bayesian model choice
  Mixture example




Experiment (cont’d)




    MCMC sample for n = 50 observations from the mixture, and nested sampling
    sequence with M = 1000 starting points.
On some computational methods for Bayesian model choice
  Mixture example




Comparison


      Monte Carlo and MCMC (=Gibbs) outputs based on T = 10⁴ simulations, and
      numerical integration based on an 850 × 950 grid in the (µ, σ) parameter
      space.
      Nested sampling approximation based on a starting sample of M = 1000
      points followed by at least 10³ further simulations from the constrained
      prior, with a stopping rule at 95% of the observed maximum likelihood.
      Constrained-prior simulation based on 50 values simulated by a random walk,
      accepting only steps leading to a likelihood higher than the bound.
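
As a sketch of the numerical-integration benchmark above, reusing mixture_loglik, x and log_prior from the previous sketches, the evidence can be approximated by a Riemann sum over the (µ, log σ²) grid on which the prior is uniform (the full 850 × 950 grid is slow in pure Python; shrink it for a quick check).

import numpy as np

mus = np.linspace(-2.0, 6.0, 850)
log_s2s = np.linspace(0.001, 16.0, 950)
dmu, dls = mus[1] - mus[0], log_s2s[1] - log_s2s[0]

Z = 0.0
for mu in mus:
    for ls in log_s2s:
        sigma = np.exp(0.5 * ls)
        # mass of the grid cell: likelihood x prior density x cell area
        Z += np.exp(mixture_loglik(mu, sigma, x) + log_prior(mu, sigma)) * dmu * dls
print("grid-based evidence Z ≈", Z)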
On some computational methods for Bayesian model choice
  Mixture example




Comparison (cont’d)




      Graph based on a sample of 10 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Mixture example




Comparison (cont’d)




      Graph based on a sample of 50 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Mixture example




Comparison (cont’d)




      Graph based on a sample of 100 observations for µ = 2 and
      σ = 3/2 (150 replicas).
On some computational methods for Bayesian model choice
  Mixture example




Comparison (cont’d)



      Nested sampling gets less reliable as the sample size increases.
      The most reliable approach is the mixture estimate Z3, although the
      harmonic-mean solution Z1 is close to Chib's solution [taken as the gold
      standard].
      The Monte Carlo method Z2 also produces poor approximations to Z.
      (The kernel φ used in Z2 is a t non-parametric kernel estimate with
      standard bandwidth estimation.)
