On resolving the Savage–Dickey paradox




                   On resolving the Savage–Dickey paradox

                                         Christian P. Robert

                                 Université Paris Dauphine & CREST-INSEE
                                 http://www.ceremade.dauphine.fr/~xian


                                          Frontiers...
                                  San Antonio, March 19, 2010
                                  Joint work with J.-M. Marin
On resolving the Savage–Dickey paradox




Outline


      1    Importance sampling solutions compared

      2    The Savage–Dickey ratio




                                         Happy B’day, Jim!!!
On resolving the Savage–Dickey paradox




Evidence



      Bayesian model choice and hypothesis testing both rely on the same
      quantity, the evidence

            Zk = ∫_{Θk} πk (θk ) Lk (θk ) dθk ,   k = 1, 2, . . .

      aka the marginal likelihood.
                                                                           [Jeffreys, 1939]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared




Importance sampling solutions


      1    Importance sampling solutions compared
             Regular importance
             Bridge sampling
             Harmonic means
             Chib’s representation

      2    The Savage–Dickey ratio
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Regular importance



Bayes factor approximation

      When approximating the Bayes factor

            B01 = ∫_{Θ0} f0 (x|θ0 ) π0 (θ0 ) dθ0 / ∫_{Θ1} f1 (x|θ1 ) π1 (θ1 ) dθ1

      use importance functions ϕ0 and ϕ1 and

            B̂01 = [n0⁻¹ Σ_{i=1}^{n0} f0 (x|θ0^i ) π0 (θ0^i ) / ϕ0 (θ0^i )]
                   / [n1⁻¹ Σ_{i=1}^{n1} f1 (x|θ1^i ) π1 (θ1^i ) / ϕ1 (θ1^i )] ,   θk^i ∼ ϕk
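A minimal sketch of this estimator, on a hypothetical conjugate toy of my own choosing rather than the probit example that follows: with x ∼ N(θ, 1) and θ ∼ N(0, 1), the evidence m(x) = N(x; 0, √2) is available in closed form, and taking ϕ equal to the exact posterior makes every importance weight constant, so the estimate is exact.

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def evidence_is(lik, prior_pdf, phi_sample, phi_pdf, n, rng):
    # n^{-1} sum of L(theta_i) pi(theta_i) / phi(theta_i), theta_i ~ phi
    total = 0.0
    for _ in range(n):
        th = phi_sample(rng)
        total += lik(th) * prior_pdf(th) / phi_pdf(th)
    return total / n

# toy model: x ~ N(theta, 1), theta ~ N(0, 1), so m(x) = N(x; 0, sqrt(2))
x = 1.0
lik = lambda th: norm_pdf(x, th, 1.0)
prior = lambda th: norm_pdf(th, 0.0, 1.0)

# proposal = exact posterior N(x/2, sqrt(1/2)): every weight equals m(x)
mu, sd = x / 2, math.sqrt(0.5)
rng = random.Random(0)
z_hat = evidence_is(lik, prior,
                    lambda r: r.gauss(mu, sd),
                    lambda th: norm_pdf(th, mu, sd),
                    1000, rng)
z_true = norm_pdf(x, 0.0, math.sqrt(2.0))
```

With the exact posterior as ϕ the weights have zero variance; any practical ϕ (such as the MLE-based proposal of the next slides) trades this ideal off against tractability.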
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Regular importance



Probit modelling on Pima Indian women
      Example (R benchmark)
      200 Pima Indian women with observed variables
              plasma glucose concentration in oral glucose tolerance test
              diastolic blood pressure
              diabetes pedigree function
              presence/absence of diabetes


      Probability of diabetes as a function of the above variables:

            P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) .

      Test of H0 : β3 = 0 for the 200 observations of Pima.tr, based on
      the g-prior modelling

            β ∼ N3 (0, n (XᵀX)⁻¹ )

      Use of an importance function inspired by the asymptotic
      distribution of the MLE:

            β ∼ N (β̂, Σ̂)
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Regular importance



Diabetes in Pima Indian women
      Comparison of the variation of the Bayes factor approximations
      based on 100 replicates of 20,000 simulations from the prior and
      from the above MLE importance sampler

      [Boxplots: Monte Carlo (prior sampling) vs. importance sampling]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Bridge sampling



Bridge sampling

      General identity, valid for any function α(·):

            B12 = ∫ π̃1 (θ|x) α(θ) π2 (θ|x) dθ / ∫ π̃2 (θ|x) α(θ) π1 (θ|x) dθ

                ≈ [n2⁻¹ Σ_{i=1}^{n2} π̃1 (θ2i |x) α(θ2i )]
                  / [n1⁻¹ Σ_{i=1}^{n1} π̃2 (θ1i |x) α(θ1i )] ,   θji ∼ πj (θ|x)
                                                                                   Back later!
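A sketch of the identity with the simplest choice α ≡ 1, on a hypothetical pair of unnormalised densities (not from the talk) whose normalising constants are known, so the target ratio B12 = Z1/Z2 = √(1/2) can be checked:

```python
import math
import random

def bridge_estimate(tilde1, tilde2, draws1, draws2, alpha):
    # B12 ≈ mean over pi2-draws of tilde1*alpha / mean over pi1-draws of tilde2*alpha
    num = sum(tilde1(t) * alpha(t) for t in draws2) / len(draws2)
    den = sum(tilde2(t) * alpha(t) for t in draws1) / len(draws1)
    return num / den

tilde1 = lambda t: math.exp(-t * t / 2)         # ∝ N(0, 1),       Z1 = sqrt(2*pi)
tilde2 = lambda t: math.exp(-(t - 1) ** 2 / 4)  # ∝ N(1, sqrt(2)), Z2 = sqrt(4*pi)

rng = random.Random(1)
draws1 = [rng.gauss(0.0, 1.0) for _ in range(20000)]             # theta_1i ~ pi1
draws2 = [rng.gauss(1.0, math.sqrt(2.0)) for _ in range(20000)]  # theta_2i ~ pi2

b12 = bridge_estimate(tilde1, tilde2, draws1, draws2, lambda t: 1.0)
# target: Z1/Z2 = sqrt(1/2) ≈ 0.707
```

α ≡ 1 recovers a two-sample analogue of importance sampling; the next slide's optimal α reduces the variance further.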
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Bridge sampling



Optimal bridge sampling

      The asymptotically optimal choice of auxiliary function is

            α(θ) ∝ 1 / { n1 π1 (θ|x) + n2 π2 (θ|x) }

      leading to

            B12 ≈ [n2⁻¹ Σ_{i=1}^{n2} π̃1 (θ2i |x) / {n1 π1 (θ2i |x) + n2 π2 (θ2i |x)}]
                  / [n1⁻¹ Σ_{i=1}^{n1} π̃2 (θ1i |x) / {n1 π1 (θ1i |x) + n2 π2 (θ1i |x)}]

      Since this α involves the normalised posteriors, the estimate is
      computed by iterating between B12 and α in practice.
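A sketch of the optimal-α estimate on the same hypothetical density pair as before; in this toy the normalised densities are known, so no iteration is needed (in real problems one iterates between the α weights and the current estimate of B12):

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy pair: pi1 = N(0,1), pi2 = N(1, sqrt(2)); Z1 = sqrt(2*pi), Z2 = sqrt(4*pi)
tilde1 = lambda t: math.exp(-t * t / 2)
tilde2 = lambda t: math.exp(-(t - 1) ** 2 / 4)
p1 = lambda t: norm_pdf(t, 0.0, 1.0)
p2 = lambda t: norm_pdf(t, 1.0, math.sqrt(2.0))

n1 = n2 = 20000
rng = random.Random(2)
d1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
d2 = [rng.gauss(1.0, math.sqrt(2.0)) for _ in range(n2)]

# optimal bridge function (normalised densities known here)
alpha = lambda t: 1.0 / (n1 * p1(t) + n2 * p2(t))
num = sum(tilde1(t) * alpha(t) for t in d2) / n2
den = sum(tilde2(t) * alpha(t) for t in d1) / n1
b12_opt = num / den  # ≈ Z1/Z2 = sqrt(1/2)
```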
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Bridge sampling



Extension to varying dimensions

      When dim(Θ1 ) ≠ dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduce a
      pseudo-posterior density ω(ψ|θ1 , x) augmenting π1 (θ1 |x) into the
      joint distribution

            π1 (θ1 |x) × ω(ψ|θ1 , x)

      on Θ2 , so that

            B12 = ∫ π̃1 (θ1 |x) ω(ψ|θ1 , x) α(θ1 , ψ) π2 (θ1 , ψ|x) dθ1 dψ
                  / ∫ π̃2 (θ1 , ψ|x) α(θ1 , ψ) π1 (θ1 |x) ω(ψ|θ1 , x) dθ1 dψ

                = E^{π2}[ π̃1 (θ1 ) ω(ψ|θ1 ) / π̃2 (θ1 , ψ) ]

                = E^{ϕ}[ π̃1 (θ1 ) ω(ψ|θ1 ) / ϕ(θ1 , ψ) ] / E^{ϕ}[ π̃2 (θ1 , ψ) / ϕ(θ1 , ψ) ]

      for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Bridge sampling



Illustration for the Pima Indian dataset

      Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
      pseudo-posterior and mixture of both MLE approximations on β3
      in bridge sampling estimate
      [Boxplots: MC vs. Bridge vs. IS approximations of B01 ]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



The original harmonic mean estimator



      When θk^(t) ∼ πk (θ|x),

            (1/T ) Σ_{t=1}^{T} 1 / Lk (θk^(t) |x)

      is an unbiased estimator of 1/mk (x)
                                                                 [Newton & Raftery, 1994]

      Highly dangerous: Most often leads to an infinite variance!!!
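A sketch of why this is dangerous, on a hypothetical conjugate toy (x ∼ N(θ, 1), θ ∼ N(0, 1), not a model from the talk): the estimator is trivial to code, yet even here a direct computation shows the second moment of 1/L(θ) under the posterior is infinite, so no central limit theorem applies.

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy conjugate model: x ~ N(theta, 1), theta ~ N(0, 1), observed x = 1
# posterior is N(x/2, sqrt(1/2)); true evidence m(x) = N(x; 0, sqrt(2))
x = 1.0
rng = random.Random(3)
post_draws = [rng.gauss(x / 2, math.sqrt(0.5)) for _ in range(5000)]

# harmonic mean of likelihood values: unbiased for 1/m(x)...
inv_m_hat = sum(1.0 / norm_pdf(x, th, 1.0) for th in post_draws) / len(post_draws)
m_hat = 1.0 / inv_m_hat

# ...but Var(1/L) is infinite here: rare posterior draws far from x
# contribute exp((x - theta)^2 / 2) terms, so on other seeds the
# estimate can be arbitrarily far from the truth
```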
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



“The Worst Monte Carlo Method Ever”


      “The good news is that the Law of Large Numbers guarantees that this
      estimator is consistent ie, it will very likely be very close to the correct
      answer if you use a sufficiently large number of points from the posterior
      distribution.
      The bad news is that the number of points required for this estimator to
      get close to the right answer will often be greater than the number of
      atoms in the observable universe. The even worse news is that it’s easy
      for people to not realize this, and to naïvely accept estimates that are
      nowhere close to the correct value of the marginal likelihood.”
                                           [Radford Neal’s blog, Aug. 23, 2008]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



Approximating Zk from a posterior sample



      Use of the [harmonic mean] identity

            E^{πk}[ ϕ(θk ) / {πk (θk ) Lk (θk )} | x ]
                  = ∫ [ϕ(θk ) / {πk (θk ) Lk (θk )}] · [πk (θk ) Lk (θk ) / Zk ] dθk = 1/Zk ,

      which holds no matter what the proposal ϕ(·) is.
                          [Gelfand & Dey, 1994; Bartolucci et al., 2006]
      Direct exploitation of the MCMC output
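A sketch of the identity on the same hypothetical conjugate toy as before: since the posterior is available in closed form there, choosing ϕ equal to it makes every term of the average exactly 1/Z, which is precisely the limiting case the identity promises for any density ϕ.

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy conjugate model: x ~ N(theta, 1), theta ~ N(0, 1), observed x = 1
x = 1.0
prior = lambda th: norm_pdf(th, 0.0, 1.0)
lik = lambda th: norm_pdf(x, th, 1.0)
post_pdf = lambda th: norm_pdf(th, x / 2, math.sqrt(0.5))  # exact posterior

rng = random.Random(4)
draws = [rng.gauss(x / 2, math.sqrt(0.5)) for _ in range(1000)]

# E^pi[ phi / (pi * L) ] = 1/Z for any density phi;
# with phi = exact posterior, every term equals 1/Z exactly
inv_z = sum(post_pdf(th) / (prior(th) * lik(th)) for th in draws) / len(draws)
z_hat = 1.0 / inv_z
z_true = norm_pdf(x, 0.0, math.sqrt(2.0))
```

In real problems the posterior density is unknown, hence the light-tailed kernel choices of the next slides.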
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



Comparison with regular importance sampling


      Harmonic mean: constraint opposed to the usual importance-sampling
      constraints: ϕ(θ) must have lighter (rather than fatter) tails
      than πk (θk ) Lk (θk ) for the approximation

            Ẑk = 1 / [ T⁻¹ Σ_{t=1}^{T} ϕ(θk^(t) ) / {πk (θk^(t) ) Lk (θk^(t) )} ]

      to have a finite variance.
      E.g., use finite-support kernels (like the Epanechnikov kernel) for ϕ
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



HPD indicator as ϕ


      Use the convex hull of the MCMC simulations corresponding to the
      10% HPD region (easily derived!) and ϕ as the indicator

            ϕ(θ) = (10/T ) Σ_{t∈HPD} I_{d(θ, θ^(t) ) ≤ ε}
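A one-dimensional stand-in for this idea, a sketch only: instead of a convex hull, take ϕ uniform on a hypothetical central-quantile interval of the posterior draws (a crude analogue of a 10% HPD region), which is a finite-support density and hence gives a finite-variance Gelfand–Dey estimate in the conjugate toy used above.

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy conjugate model: x ~ N(theta, 1), theta ~ N(0, 1), observed x = 1
x = 1.0
prior = lambda th: norm_pdf(th, 0.0, 1.0)
lik = lambda th: norm_pdf(x, th, 1.0)

rng = random.Random(5)
draws = sorted(rng.gauss(x / 2, math.sqrt(0.5)) for _ in range(20000))

# phi = uniform density on the 45%-55% quantile interval of the draws
lo = draws[int(0.45 * len(draws))]
hi = draws[int(0.55 * len(draws))]
phi = lambda t: (1.0 / (hi - lo)) if lo <= t <= hi else 0.0

# Gelfand-Dey average over the (reused) posterior draws
inv_z = sum(phi(t) / (prior(t) * lik(t)) for t in draws) / len(draws)
z_hat = 1.0 / inv_z
z_true = norm_pdf(x, 0.0, math.sqrt(2.0))
```

Reusing the same draws for the quantiles and the average introduces a small bias, ignored in this sketch.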
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Harmonic means



Diabetes in Pima Indian women (cont’d)

      Comparison of the variation of the Bayes factor approximations
      based on 100 replicates of 20,000 simulations from the above
      harmonic mean sampler and from importance samplers

      [Boxplots: harmonic mean vs. importance sampling, both tightly
      concentrated on the 3.102–3.116 scale]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Chib’s representation



Chib’s representation


      Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
      θk ∼ πk (θk ),

            Zk = mk (x) = fk (x|θk ) πk (θk ) / πk (θk |x)

      Use of an approximation to the posterior:

            Ẑk = m̂k (x) = fk (x|θk* ) πk (θk* ) / π̂k (θk* |x) .
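A sketch of the identity on the hypothetical conjugate toy used earlier: there the posterior is known exactly, so Chib's formula returns the evidence exactly at any evaluation point θ* (in real problems only π̂k is available, whence the approximation on the slide).

```python
import math

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy conjugate model: x ~ N(theta, 1), theta ~ N(0, 1), observed x = 1
# posterior is N(x/2, sqrt(1/2)) in closed form
x = 1.0
theta_star = 0.3  # arbitrary evaluation point; any value works

# Chib's identity: Z = L(theta*) pi(theta*) / pi(theta* | x)
z_chib = (norm_pdf(x, theta_star, 1.0) * norm_pdf(theta_star, 0.0, 1.0)
          / norm_pdf(theta_star, x / 2, math.sqrt(0.5)))
z_true = norm_pdf(x, 0.0, math.sqrt(2.0))
```

In practice θ* is taken near the posterior mode, where π̂k (θ* |x) is most accurately estimated.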
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Chib’s representation



Case of latent variables



      For missing variables z, as in mixture models, natural
      Rao–Blackwell estimate

            π̂k (θk* |x) = (1/T ) Σ_{t=1}^{T} πk (θk* |x, zk^(t) ) ,

      where the zk^(t) ’s are Gibbs-sampled latent variables
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Chib’s representation



Compensation for label switching
      For mixture models, the zk^(t) ’s usually fail to visit all
      configurations in a balanced way, despite the symmetry predicted
      by the theory:

            πk (θk |x) = πk (σ(θk )|x) = (1/k!) Σ_{σ∈Sk} πk (σ(θk )|x)

      for all σ’s in Sk , the set of all permutations of {1, . . . , k}.
      Consequence on the numerical approximation: a bias of order k!
      Recover the theoretical symmetry by using

            π̂k (θk* |x) = (1/(T k!)) Σ_{σ∈Sk} Σ_{t=1}^{T} πk (σ(θk* )|x, zk^(t) ) .
                                                    [Berkhof, Mechelen, & Gelman, 2003]
On resolving the Savage–Dickey paradox
  Importance sampling solutions compared
     Chib’s representation



Case of the probit model
      For the completion by z,

            π̂(θ|x) = (1/T ) Σ_t π(θ|x, z (t) )

      is a simple average of normal densities

      [Boxplots: Chib’s method vs. importance sampling, on the
      0.0240–0.0255 scale]
On resolving the Savage–Dickey paradox
      The Savage–Dickey ratio




1    Importance sampling solutions compared

2    The Savage–Dickey ratio
       Measure-theoretic aspects
       Computational implications
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



The Savage–Dickey ratio representation




      Special representation of the Bayes factor used for simulation

      Original version (Dickey, AoMS, 1971)
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



Savage’s density ratio theorem
      Given a test H0 : θ = θ0 in a model f (x|θ, ψ) with a nuisance
      parameter ψ, under priors π0 (ψ) and π1 (θ, ψ) such that

            π1 (ψ|θ0 ) = π0 (ψ) ,

      then

            B01 = π1 (θ0 |x) / π1 (θ0 ) ,

      with the obvious notations

            π1 (θ) = ∫ π1 (θ, ψ) dψ ,   π1 (θ|x) = ∫ π1 (θ, ψ|x) dψ .
                                         [Dickey, 1971; Verdinelli & Wasserman, 1995]
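The theorem can be checked numerically on a hypothetical two-observation toy (my construction, not from the talk): x1 ∼ N(θ, 1), x2 ∼ N(ψ, 1), with θ and ψ a priori independent standard normals, so Dickey's condition π1(ψ|θ0) = π0(ψ) holds automatically; both sides of the identity are then available in closed form.

```python
import math

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy: x1 ~ N(theta, 1), x2 ~ N(psi, 1); H0: theta = theta0 = 0
# pi1(theta, psi) = N(0,1) x N(0,1), pi0(psi) = N(0,1) = pi1(psi | theta0)
x1, x2, theta0 = 1.2, -0.7, 0.0

# direct Bayes factor: the psi-marginals are identical under both models
m0 = norm_pdf(x1, theta0, 1.0) * norm_pdf(x2, 0.0, math.sqrt(2.0))
m1 = norm_pdf(x1, 0.0, math.sqrt(2.0)) * norm_pdf(x2, 0.0, math.sqrt(2.0))
b01_direct = m0 / m1

# Savage-Dickey ratio: marginal posterior of theta given x1 is N(x1/2, sqrt(1/2))
b01_sd = (norm_pdf(theta0, x1 / 2, math.sqrt(0.5))
          / norm_pdf(theta0, 0.0, 1.0))
```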
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



Rephrased
      “Suppose that f0 (θ) = f1 (θ|φ = φ0 ). As f0 (x|θ) = f1 (x|θ, φ = φ0 ),

            f0 (x) = ∫ f1 (x|θ, φ = φ0 ) f1 (θ|φ = φ0 ) dθ = f1 (x|φ = φ0 ) ,

      i.e., the numerator of the Bayes factor is the value of f1 (x|φ) at φ = φ0 ,
      while the denominator is an average of the values of f1 (x|φ) for φ ≠ φ0 ,
      weighted by the prior distribution f1 (φ) under the augmented model.
      Applying Bayes’ theorem to the right-hand side of [the above] we get

            f0 (x) = f1 (φ0 |x) f1 (x) / f1 (φ0 )

      and hence the Bayes factor is given by

            B = f0 (x) / f1 (x) = f1 (φ0 |x) / f1 (φ0 ) ,

      the ratio of the posterior to prior densities at φ = φ0 under the
      augmented model.”
                                                                          [O’Hagan & Forster, 1996]
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



Measure-theoretic difficulty
      Representation depends on the choice of versions of conditional
      densities:

      B01 = ∫ π0 (ψ) f (x|θ0 , ψ) dψ / ∫ π1 (θ, ψ) f (x|θ, ψ) dψ dθ
                                          [by definition]
          = [∫ π1 (ψ|θ0 ) f (x|θ0 , ψ) dψ · π1 (θ0 )]
            / [∫ π1 (θ, ψ) f (x|θ, ψ) dψ dθ · π1 (θ0 )]
                                          [specific version of π1 (ψ|θ0 )
                                           and arbitrary version of π1 (θ0 )]
          = ∫ π1 (θ0 , ψ) f (x|θ0 , ψ) dψ / [m1 (x) π1 (θ0 )]
                                          [specific version of π1 (θ0 , ψ)]
          = π1 (θ0 |x) / π1 (θ0 )         [version dependent]
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



Choice of density version



      Dickey’s (1971) condition is not a condition: if

            π1 (θ0 |x) / π1 (θ0 ) = ∫ π0 (ψ) f (x|θ0 , ψ) dψ / m1 (x)

      is chosen as a version, then the Savage–Dickey representation holds
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Measure-theoretic aspects



Savage–Dickey paradox


      Verdinelli–Wasserman extension:

            B01 = [π1 (θ0 |x) / π1 (θ0 )] · E^{π1 (ψ|x,θ0 )}[ π0 (ψ) / π1 (ψ|θ0 ) ]

      similarly depends on choices of versions...
      ...but its Monte Carlo implementation relies on specific versions
      of all densities without making mention of it
                                          [Chen, Shao & Ibrahim, 2000]
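A sketch of the Verdinelli–Wasserman estimate on the hypothetical two-observation toy used for the theorem, now with π0(ψ) ≠ π1(ψ|θ0) (a wider N(0, 2) nuisance prior under H0, my choice) so that the correction expectation is nontrivial; the true Bayes factor remains available in closed form for comparison:

```python
import math
import random

def norm_pdf(t, mu, sd):
    return math.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# toy: x1 ~ N(theta, 1), x2 ~ N(psi, 1); H0: theta = theta0 = 0
# M1 prior: theta, psi iid N(0, 1); H0 nuisance prior: pi0(psi) = N(0, sd0)
x1, x2, theta0, sd0 = 1.2, -0.7, 0.0, 2.0

# closed-form Bayes factor for reference
b01_true = (norm_pdf(x1, theta0, 1.0) * norm_pdf(x2, 0.0, math.sqrt(1 + sd0 ** 2))
            / (norm_pdf(x1, 0.0, math.sqrt(2.0)) * norm_pdf(x2, 0.0, math.sqrt(2.0))))

# Savage-Dickey ratio times Verdinelli-Wasserman correction, the latter
# averaged over pi1(psi | x, theta0) = N(x2/2, sqrt(1/2))
sd_ratio = norm_pdf(theta0, x1 / 2, math.sqrt(0.5)) / norm_pdf(theta0, 0.0, 1.0)
rng = random.Random(6)
psis = [rng.gauss(x2 / 2, math.sqrt(0.5)) for _ in range(20000)]
corr = sum(norm_pdf(p, 0.0, sd0) / norm_pdf(p, 0.0, 1.0) for p in psis) / len(psis)
b01_vw = sd_ratio * corr
```

Note the Monte Carlo step silently uses specific closed-form versions of π1(ψ|x, θ0), π0 and π1(ψ|θ0), which is exactly the version-choice issue raised on this slide.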
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



A computational exploitation
      Starting from the (instrumental) prior

      \[
      \tilde\pi_1(\theta,\psi) = \pi_1(\theta)\,\pi_0(\psi)
      \]

      define the associated posterior

      \[
      \tilde\pi_1(\theta,\psi|x) = \pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta,\psi)\big/\tilde m_1(x)
      \]

      and impose the choice of version

      \[
      \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}
      = \frac{\int \pi_0(\psi)\, f(x|\theta_0,\psi)\, d\psi}{\tilde m_1(x)}
      \]

      Then
      \[
      B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,\frac{\tilde m_1(x)}{m_1(x)}
      \]
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



First ratio


      If $(\theta^{(1)},\psi^{(1)}),\ldots,(\theta^{(T)},\psi^{(T)}) \sim \tilde\pi_1(\theta,\psi|x)$, then
      \[
      \frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x,\psi^{(t)})
      \]
      converges to $\tilde\pi_1(\theta_0|x)$ provided the right version is used in $\theta_0$

      \[
      \tilde\pi_1(\theta_0|x,\psi) = \frac{\pi_1(\theta_0)\, f(x|\theta_0,\psi)}{\int \pi_1(\theta)\, f(x|\theta,\psi)\, d\theta}
      \]
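As an illustration of this Rao–Blackwell average, here is a minimal Python sketch on a hypothetical toy model that is not from the talk: $x_i\,|\,\theta,\psi \sim \mathcal{N}(\theta, 1/\psi)$ with $\pi_1(\theta) = \mathcal{N}(0,1)$ and $\pi_0(\psi)$ a Gamma prior, so that $\tilde\pi_1(\theta_0|x,\psi)$ is a closed-form normal density and a Gibbs sampler can target $\tilde\pi_1(\theta,\psi|x)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical toy model (not from the talk): x_i | theta, psi ~ N(theta, 1/psi),
# prior pi1(theta) = N(0, 1), instrumental prior pi0(psi) = Gamma(a, b).
# Conditioning on psi, the theta-posterior is a closed-form normal, so
# tilde-pi1(theta0 | x, psi) is available exactly.
n, theta0, a, b = 20, 0.0, 2.0, 2.0
x = rng.normal(0.3, 1.0, size=n)
sx, T = x.sum(), 5000

def theta_given_psi(psi):
    """Mean and variance of theta | psi, x under the N(0, 1) prior."""
    v = 1.0 / (1.0 + n * psi)
    return v * psi * sx, v

# Gibbs sampler targeting tilde-pi1(theta, psi | x)
theta, psi = 0.0, 1.0
rb = np.empty(T)
for t in range(T):
    m, v = theta_given_psi(psi)
    theta = rng.normal(m, np.sqrt(v))
    psi = rng.gamma(a + n / 2, 1.0 / (b + 0.5 * ((x - theta) ** 2).sum()))
    m, v = theta_given_psi(psi)                  # Rao-Blackwell term at theta0
    rb[t] = stats.norm.pdf(theta0, m, np.sqrt(v))

post_at_theta0 = rb.mean()                       # estimates tilde-pi1(theta0 | x)
print(post_at_theta0)
```

The averaged conditional densities replace an intractable marginal density evaluation at $\theta_0$ by closed-form terms, which is the point of the first ratio.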
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



Rao–Blackwellisation with latent variables
      When $\tilde\pi_1(\theta_0|x,\psi)$ is unavailable, replace it with
      \[
      \frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x,z^{(t)},\psi^{(t)})
      \]
      via data completion by a latent variable $z$ such that
      \[
      f(x|\theta,\psi) = \int \tilde f(x,z|\theta,\psi)\, dz
      \]
      and such that $\tilde\pi_1(\theta,\psi,z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x,z|\theta,\psi)$ is available in closed
      form, including the normalising constant, based on the version
      \[
      \frac{\tilde\pi_1(\theta_0|x,z,\psi)}{\pi_1(\theta_0)}
      = \frac{\tilde f(x,z|\theta_0,\psi)}{\int \tilde f(x,z|\theta,\psi)\,\pi_1(\theta)\, d\theta}\,.
      \]
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



Bridge revival (1)

      Since $m_1(x)/\tilde m_1(x)$ is unknown, apparent failure!
      Use of the bridge identity
      \[
      \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\theta,\psi)\, f(x|\theta,\psi)}{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta,\psi)}\right]
      = \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right]
      = \frac{m_1(x)}{\tilde m_1(x)}
      \]
      to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
      \[
      \frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}
      \]
      based on the same sample from $\tilde\pi_1$.
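The bridge average can be sketched in Python on the same hypothetical toy model used earlier (not from the talk): $x_i\,|\,\theta,\psi \sim \mathcal{N}(\theta,1/\psi)$, $\pi_1(\theta)=\mathcal{N}(0,1)$, with $\pi_1(\psi|\theta)$ and $\pi_0(\psi)$ two distinct Gamma densities, so the bridge term is a ratio of Gamma pdfs evaluated along the $\tilde\pi_1$ Gibbs chain.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical toy model (not from the talk): x_i | theta, psi ~ N(theta, 1/psi),
# pi1(theta) = N(0, 1), with pi1(psi|theta) = Gamma(a1, b1) under the original
# prior and pi0(psi) = Gamma(a0, b0) under the instrumental prior tilde-pi1.
n = 20
x = rng.normal(0.3, 1.0, size=n)
a0, b0, a1, b1 = 2.0, 2.0, 3.0, 1.0
sx, T = x.sum(), 5000

# Gibbs sampler targeting tilde-pi1(theta, psi | x), i.e. under pi0(psi)
theta, psi = 0.0, 1.0
ratio = np.empty(T)
for t in range(T):
    v = 1.0 / (1.0 + n * psi)
    theta = rng.normal(v * psi * sx, np.sqrt(v))
    psi = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * ((x - theta) ** 2).sum()))
    # bridge term pi1(psi | theta) / pi0(psi) evaluated at the current draw
    ratio[t] = (stats.gamma.pdf(psi, a1, scale=1.0 / b1)
                / stats.gamma.pdf(psi, a0, scale=1.0 / b0))

m1_over_tilde_m1 = ratio.mean()                  # estimates m1(x) / tilde-m1(x)
print(m1_over_tilde_m1)
```

No new simulation is needed: the very draws used for the Rao–Blackwell average are recycled in the bridge term, which is what "based on the same sample" means here.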
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



Bridge revival (2)
      Alternative identity
      \[
      \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\, f(x|\theta,\psi)}{\pi_1(\theta,\psi)\, f(x|\theta,\psi)}\right]
      = \mathbb{E}^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]
      = \frac{\tilde m_1(x)}{m_1(x)}
      \]
      suggests using a second sample $(\bar\theta^{(t)},\bar\psi^{(t)},\bar z^{(t)}) \sim \pi_1(\theta,\psi|x)$ and
      the ratio estimate
      \[
      \frac{1}{T}\sum_{t=1}^{T} \pi_0(\bar\psi^{(t)})\big/\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})
      \]
      Resulting unbiased estimate:
      \[
      \widehat{B}_{01}
      = \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\tilde\pi_1(\theta_0|x,z^{(t)},\psi^{(t)})}{\pi_1(\theta_0)}\right\}
        \left\{\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}\right\}
      \]
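The two-sample estimate can be sketched end to end in Python on a hypothetical toy model that is not from the talk: $x_i\,|\,\theta,\psi \sim \mathcal{N}(\theta,1/\psi)$, $\pi_1(\theta)=\mathcal{N}(0,1)$, $\pi_1(\psi|\theta)$ and $\pi_0(\psi)$ two distinct Gamma densities. No data augmentation is needed in this toy model, so the Rao–Blackwell term conditions on $\psi$ only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical toy model (not from the talk): x_i | theta, psi ~ N(theta, 1/psi),
# pi1(theta) = N(0, 1), pi1(psi|theta) = Gamma(a1, b1), pi0(psi) = Gamma(a0, b0);
# testing theta = theta0.
n, theta0 = 20, 0.0
x = rng.normal(0.3, 1.0, size=n)
a0, b0, a1, b1 = 2.0, 2.0, 3.0, 1.0
sx, T = x.sum(), 5000

def gibbs(a, b):
    """Gibbs sampler for (theta, psi) | x under a Gamma(a, b) prior on psi."""
    theta, psi, out = 0.0, 1.0, np.empty((T, 2))
    for t in range(T):
        v = 1.0 / (1.0 + n * psi)
        theta = rng.normal(v * psi * sx, np.sqrt(v))
        psi = rng.gamma(a + n / 2, 1.0 / (b + 0.5 * ((x - theta) ** 2).sum()))
        out[t] = theta, psi
    return out

tilde = gibbs(a0, b0)   # sample from tilde-pi1(theta, psi | x)
bar = gibbs(a1, b1)     # second sample, from pi1(theta, psi | x)

# First factor: Rao-Blackwell estimate of tilde-pi1(theta0 | x) / pi1(theta0)
v = 1.0 / (1.0 + n * tilde[:, 1])
rb = stats.norm.pdf(theta0, v * tilde[:, 1] * sx, np.sqrt(v)).mean()
first = rb / stats.norm.pdf(theta0, 0.0, 1.0)

# Second factor: bridge estimate of tilde-m1(x) / m1(x) from the second sample
second = (stats.gamma.pdf(bar[:, 1], a0, scale=1.0 / b0)
          / stats.gamma.pdf(bar[:, 1], a1, scale=1.0 / b1)).mean()

B01 = first * second
print(B01)
```

Each factor is a plain Monte Carlo average over its own chain, so the product reproduces the structure of the resulting estimate above.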
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



Difference with Verdinelli–Wasserman representation


      The above leads to the representation
      \[
      B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,
               \mathbb{E}^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]
      \]
      which shows how our approach differs from Verdinelli and Wasserman's
      \[
      B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\,
               \mathbb{E}^{\pi_1(\psi|x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]
      \]
                                                                    [for referees only!!]
On resolving the Savage–Dickey paradox
  The Savage–Dickey ratio
     Computational implications



Diabetes in Pima Indian women (cont’d)
      Comparison of the variation of the Bayes factor approximations, based on
      100 replicas of 20,000 simulations each, for the importance sampling,
      Chib's, Savage–Dickey and bridge samplers

      [Boxplots of the 100 Bayes factor approximations, ranging over roughly
      2.8–3.4, for the IS, Chib, Savage–Dickey and bridge samplers]

Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
Mohammed Sikander
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 

Recently uploaded (20)

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
Chapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdfChapter -12, Antibiotics (One Page Notes).pdf
Chapter -12, Antibiotics (One Page Notes).pdf
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
Multithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race conditionMultithreading_in_C++ - std::thread, race condition
Multithreading_in_C++ - std::thread, race condition
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 

Savage–Dickey paradox

  • 1. On resolving the Savage–Dickey paradox. Christian P. Robert, Université Paris Dauphine & CREST-INSEE, http://www.ceremade.dauphine.fr/~xian. Frontiers..., San Antonio, March 19, 2010. Joint work with J.-M. Marin.
  • 2. Outline: 1. Importance sampling solutions compared; 2. The Savage–Dickey ratio. Happy B'day, Jim!!!
  • 4. Evidence: Bayesian model choice and hypothesis testing rely on a similar quantity, the evidence
$$Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,d\theta_k, \qquad k = 1, 2, \ldots,$$
aka the marginal likelihood. [Jeffreys, 1939]
  • 5. Importance sampling solutions: 1. Importance sampling solutions compared (regular importance, bridge sampling, harmonic means, Chib's representation); 2. The Savage–Dickey ratio.
  • 6. Bayes factor approximation: when approximating the Bayes factor
$$B_{01} = \frac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,d\theta_1},$$
use importance functions $\varphi_0$ and $\varphi_1$ and
$$\widehat{B}_{01} = \frac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)}$$
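Not part of the talk, but the two-sample importance estimator above can be sanity-checked on a toy pair of models where both evidences are known in closed form: $x|\theta \sim N(\theta,1)$ under both models, with priors $\theta \sim N(0,1)$ and $\theta \sim N(1,1)$, so $Z_k = N(x;\mu_k,2)$ exactly. The observation $x = 1.5$, the Gaussian proposals $\varphi_k$ and the sample size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.5                       # single observation (illustrative)
n = 100_000

def npdf(v, m, s2):
    # Gaussian density with mean m and variance s2
    return np.exp(-0.5 * (v - m) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)

def evidence_is(mu):
    # importance function: Gaussian centred at the posterior mean,
    # deliberately wider (variance 1) than the posterior (variance 1/2)
    center = (x + mu) / 2.0
    theta = rng.normal(center, 1.0, size=n)
    w = npdf(x, theta, 1.0) * npdf(theta, mu, 1.0) / npdf(theta, center, 1.0)
    return w.mean()

B01_hat = evidence_is(0.0) / evidence_is(1.0)     # priors N(0,1) vs N(1,1)
B01_exact = npdf(x, 0.0, 2.0) / npdf(x, 1.0, 2.0)
print(B01_hat, B01_exact)
```

With a proposal wider than each integrand the weights have small variance, so the two Monte Carlo evidences land close to their closed-form values.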
  • 7. Probit modelling on Pima Indian women. Example (R benchmark): 200 Pima Indian women with observed variables: plasma glucose concentration in an oral glucose tolerance test, diastolic blood pressure, diabetes pedigree function, presence/absence of diabetes. Probability of diabetes as a function of the above variables,
$$P(y = 1|x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3).$$
Test of $H_0: \beta_3 = 0$ for the 200 observations of Pima.tr, based on a g-prior modelling $\beta \sim \mathcal{N}_3(0,\, n\,(X^{\mathrm{T}}X)^{-1})$. Use of an importance function inspired by the MLE estimate distribution, $\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$.
  • 11. Diabetes in Pima Indian women: comparison of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations from the prior and from the above MLE importance sampler. [Boxplots: Monte Carlo vs. importance sampling.]
  • 12. Bridge sampling. General identity:
$$B_{12} = \frac{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,d\theta}{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,d\theta} \quad \forall\,\alpha(\cdot)$$
$$\approx \frac{n_1^{-1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}{n_2^{-1}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}, \qquad \theta_{ji} \sim \pi_j(\theta|x).$$
Back later!
  • 13. Optimal bridge sampling. The optimal choice of auxiliary function is
$$\alpha^\star \propto \frac{1}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)},$$
leading to
$$B_{12} \approx \frac{n_1^{-1}\sum_{i=1}^{n_1} \dfrac{\tilde\pi_2(\theta_{1i}|x)}{n_1\,\pi_1(\theta_{1i}|x) + n_2\,\pi_2(\theta_{1i}|x)}}{n_2^{-1}\sum_{i=1}^{n_2} \dfrac{\tilde\pi_1(\theta_{2i}|x)}{n_1\,\pi_1(\theta_{2i}|x) + n_2\,\pi_2(\theta_{2i}|x)}}$$
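Again as an outside-the-slides sketch: bridge sampling for the ratio of two known normalising constants, using the simple geometric bridge $\alpha(\theta) = 1/\sqrt{\tilde\pi_1(\theta)\tilde\pi_2(\theta)}$ rather than the optimal $\alpha$ above; the two Gaussian targets and the sample sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# unnormalised densities with known normalising constants (for checking):
def q1(t):
    return np.exp(-t**2 / 2.0)             # Z1 = sqrt(2*pi)
def q2(t):
    return np.exp(-(t - 1.0)**2 / 8.0)     # Z2 = sqrt(8*pi), so Z1/Z2 = 0.5

th1 = rng.normal(0.0, 1.0, T)   # draws from normalised q1, i.e. N(0, 1)
th2 = rng.normal(1.0, 2.0, T)   # draws from normalised q2, i.e. N(1, 4)

# geometric bridge: q1*alpha = sqrt(q1/q2) and q2*alpha = sqrt(q2/q1)
num = np.mean(np.sqrt(q1(th2) / q2(th2)))   # averaged over q2-draws
den = np.mean(np.sqrt(q2(th1) / q1(th1)))   # averaged over q1-draws
r_hat = num / den                           # estimates Z1/Z2
print(r_hat)
```

The geometric bridge always yields finite-variance terms here, since each average integrates $\sqrt{q_1 q_2}$ against a normalised density.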
  • 14. Extension to varying dimensions. When $\dim(\Theta_1) \ne \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduce a pseudo-posterior density $\omega(\psi|\theta_1, x)$ augmenting $\pi_1(\theta_1|x)$ into the joint distribution $\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$ on $\Theta_2$, so that
$$B_{12} = \frac{\int \tilde\pi_1(\theta_1|x)\,\omega(\psi|\theta_1,x)\,\alpha(\theta_1,\psi)\,\pi_2(\theta_1,\psi|x)\,d\theta_1\,d\psi}{\int \tilde\pi_2(\theta_1,\psi|x)\,\alpha(\theta_1,\psi)\,\pi_1(\theta_1|x)\,\omega(\psi|\theta_1,x)\,d\theta_1\,d\psi} = E^{\pi_2}\!\left[\frac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1,\psi)}\right] = \frac{E^{\varphi}\!\left[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1,\psi)\right]}{E^{\varphi}\!\left[\tilde\pi_2(\theta_1,\psi)/\varphi(\theta_1,\psi)\right]}$$
for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.
  • 16. Illustration for the Pima Indian dataset: use of the MLE-induced conditional of $\beta_3$ given $(\beta_1, \beta_2)$ as a pseudo-posterior, and a mixture of both MLE approximations on $\beta_3$ in the bridge sampling estimate. [Boxplots: MC vs. Bridge vs. IS.]
  • 18. The original harmonic mean estimator. When $\theta_{kt} \sim \pi_k(\theta|x)$,
$$\frac{1}{T}\sum_{t=1}^{T} \frac{1}{L(\theta_{kt}|x)}$$
is an unbiased estimator of $1/m_k(x)$. [Newton & Raftery, 1994] Highly dangerous: most often leads to an infinite variance!!!
  • 20. "The Worst Monte Carlo Method Ever". "The good news is that the Law of Large Numbers guarantees that this estimator is consistent ie, it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it's easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood." [Radford Neal's blog, Aug. 23, 2008]
  • 21. Approximating $Z_k$ from a posterior sample. Use of the [harmonic mean] identity
$$E^{\pi_k}\!\left[\left.\frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\right|\,x\right] = \int \frac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\frac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,d\theta_k = \frac{1}{Z_k},$$
no matter what the proposal $\varphi(\cdot)$ is. [Gelfand & Dey, 1994; Bartolucci et al., 2006] Direct exploitation of the MCMC output.
  • 23. Comparison with regular importance sampling. Harmonic mean: constraint opposed to the usual importance sampling constraints: $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k)\,L_k(\theta_k)$ for the approximation
$$\widehat{1/Z_k} = \frac{1}{T}\sum_{t=1}^{T} \frac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$$
to have a finite variance. E.g., use finite-support kernels (like Epanechnikov's kernel) for $\varphi$.
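A sketch of the stabilised identity under an assumed conjugate-normal toy posterior ($x \sim N(\theta,1)$, $\theta \sim N(0,1)$, so the posterior $N(x/2, 1/2)$ and the evidence $N(x;0,2)$ are in closed form), with a $\varphi$ whose tails are deliberately lighter than the posterior's:

```python
import numpy as np

rng = np.random.default_rng(2)
x, T = 1.0, 200_000

def npdf(v, m, s2):
    # Gaussian density with mean m and variance s2
    return np.exp(-0.5 * (v - m)**2 / s2) / np.sqrt(2.0 * np.pi * s2)

theta = rng.normal(x / 2.0, np.sqrt(0.5), T)   # exact posterior draws
# phi: lighter-tailed than the posterior (variance 1/4 < 1/2),
# so the weights phi / (prior * likelihood) have finite variance
w = npdf(theta, x / 2.0, 0.25) / (npdf(theta, 0.0, 1.0) * npdf(x, theta, 1.0))
Z_hat = 1.0 / w.mean()      # harmonic-mean-identity estimate of the evidence
Z_true = npdf(x, 0.0, 2.0)
print(Z_hat, Z_true)
```

Swapping the light-tailed $\varphi$ for one with heavier tails than the posterior is exactly what breaks the original harmonic mean estimator.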
  • 25. HPD indicator as $\varphi$: use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and $\varphi$ as an indicator,
$$\varphi(\theta) = \frac{10}{T}\sum_{t\in\mathrm{HPD}} \mathbb{I}_{d(\theta,\theta^{(t)}) \le \epsilon}$$
  • 26. Diabetes in Pima Indian women (cont'd): comparison of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations, for the above harmonic mean sampler and importance samplers. [Boxplots: harmonic mean vs. importance sampling, both concentrated around 3.10–3.12.]
  • 27. Chib's representation: direct application of Bayes' theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,
$$Z_k = m_k(x) = \frac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}.$$
Use of an approximation to the posterior:
$$\widehat{Z}_k = \widehat{m}_k(x) = \frac{f_k(x|\theta_k^\ast)\,\pi_k(\theta_k^\ast)}{\widehat{\pi}_k(\theta_k^\ast|x)}.$$
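Chib's identity is exact at any $\theta^\ast$; on an assumed conjugate-normal toy model (where the posterior is available in closed form, so no approximation $\widehat\pi$ is needed) it recovers the evidence to machine precision:

```python
import math

x = 1.0

def npdf(v, m, s2):
    # Gaussian density with mean m and variance s2
    return math.exp(-0.5 * (v - m)**2 / s2) / math.sqrt(2.0 * math.pi * s2)

# x ~ N(theta, 1), theta ~ N(0, 1): posterior theta|x ~ N(x/2, 1/2)
theta_star = x / 2.0   # any point works; a high-density point is numerically stable
Z = npdf(x, theta_star, 1.0) * npdf(theta_star, 0.0, 1.0) / npdf(theta_star, x / 2.0, 0.5)
print(Z, npdf(x, 0.0, 2.0))   # both equal the evidence m(x) = N(x; 0, 2)
```

In realistic models the denominator is the only quantity that must be estimated, e.g. by the Rao–Blackwell average of the next slide.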
  • 29. Case of latent variables: for a missing-variable model as in mixtures, natural Rao–Blackwell estimate
$$\widehat{\pi}_k(\theta_k^\ast|x) = \frac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^\ast|x, z_k^{(t)}),$$
where the $z_k^{(t)}$'s are Gibbs-sampled latent variables.
  • 30. Compensation for label switching: for mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory,
$$\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \frac{1}{k!}\sum_{\sigma\in\mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$$
for all $\sigma$'s in $\mathfrak{S}_k$, the set of all permutations of $\{1,\ldots,k\}$. Consequences on the numerical approximation, biased by an order $k!$. Recover the theoretical symmetry by using
$$\widehat{\pi}_k(\theta_k^\ast|x) = \frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^\ast)|x, z_k^{(t)}).$$
[Berkhof, Mechelen & Gelman, 2003]
  • 32. Case of the probit model: for the completion by $z$,
$$\widehat{\pi}(\theta|x) = \frac{1}{T}\sum_t \pi(\theta|x, z^{(t)})$$
is a simple average of normal densities. [Boxplots: Chib's method vs. importance sampling, values around 0.0240–0.0255.]
  • 33. The Savage–Dickey ratio: 1. Importance sampling solutions compared; 2. The Savage–Dickey ratio (measure-theoretic aspects, computational implications).
  • 35. The Savage–Dickey ratio representation: special representation of the Bayes factor used for simulation. Original version (Dickey, AoMS, 1971).
  • 37. Savage's density ratio theorem. Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that $\pi_1(\psi|\theta_0) = \pi_0(\psi)$, then
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)},$$
with the obvious notations
$$\pi_1(\theta) = \int \pi_1(\theta,\psi)\,d\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta,\psi|x)\,d\psi.$$
[Dickey, 1971; Verdinelli & Wasserman, 1995]
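A numerical check of the ratio, in the simplest setting without a nuisance parameter and on an assumed toy model ($x \sim N(\theta,1)$, with $\pi_1(\theta) = N(0,1)$ and $\theta_0 = 0$): the posterior-to-prior density ratio at $\theta_0$ coincides with the directly computed Bayes factor.

```python
import math

x, theta0 = 1.0, 0.0

def npdf(v, m, s2):
    # Gaussian density with mean m and variance s2
    return math.exp(-0.5 * (v - m)**2 / s2) / math.sqrt(2.0 * math.pi * s2)

# direct computation: B01 = f(x|theta0) / m1(x), with m1(x) = N(x; 0, 2)
B01_direct = npdf(x, theta0, 1.0) / npdf(x, 0.0, 2.0)
# Savage-Dickey: posterior theta|x ~ N(x/2, 1/2) under pi1
B01_sd = npdf(theta0, x / 2.0, 0.5) / npdf(theta0, 0.0, 1.0)
print(B01_direct, B01_sd)
```

With densities available in closed form the two values agree exactly; the measure-theoretic subtleties of the next slides arise because, in general, conditional densities are only defined up to a null set.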
  • 38. Rephrased. "Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,
$$f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\,f_1(\theta|\phi = \phi_0)\,d\theta = f_1(x|\phi = \phi_0),$$
i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average of the values of $f_1(x|\phi)$ for $\phi \ne \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes' theorem to the right-hand side of [the above] we get
$$f_0(x) = f_1(\phi_0|x)\,f_1(x)\big/f_1(\phi_0)$$
and hence the Bayes factor is given by
$$B = f_0(x)\big/f_1(x) = f_1(\phi_0|x)\big/f_1(\phi_0),$$
the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model." [O'Hagan & Forster, 1996]
  • 40. Measure-theoretic difficulty: the representation depends on the choice of versions of conditional densities:
$$B_{01} = \frac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,d\psi}{\int\!\!\int \pi_1(\theta,\psi)\,f(x|\theta,\psi)\,d\psi\,d\theta} \quad \text{[by definition]}$$
$$= \frac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0,\psi)\,d\psi\;\pi_1(\theta_0)}{\int\!\!\int \pi_1(\theta,\psi)\,f(x|\theta,\psi)\,d\psi\,d\theta\;\pi_1(\theta_0)} \quad \text{[specific version of } \pi_1(\psi|\theta_0) \text{ and arbitrary version of } \pi_1(\theta_0)\text{]}$$
$$= \frac{\int \pi_1(\theta_0,\psi)\,f(x|\theta_0,\psi)\,d\psi}{m_1(x)\,\pi_1(\theta_0)} \quad \text{[specific version of } \pi_1(\theta_0,\psi)\text{]}$$
$$= \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} \quad \text{[version dependent]}$$
  • 42. Choice of density version. Dickey's (1971) condition is not a condition: if
$$\frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0,\psi)\,d\psi}{m_1(x)}$$
is chosen as a version, then the Savage–Dickey representation holds.
  • 44. Savage–Dickey paradox. The Verdinelli–Wasserman extension
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\psi|x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$$
similarly depends on choices of versions... but the Monte Carlo implementation relies on specific versions of all densities without making mention of it. [Chen, Shao & Ibrahim, 2000]
  • 46. A computational exploitation. Starting from the (instrumental) prior
$$\tilde\pi_1(\theta,\psi) = \pi_1(\theta)\,\pi_0(\psi),$$
define the associated posterior
$$\tilde\pi_1(\theta,\psi|x) = \pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta,\psi)\big/\tilde m_1(x)$$
and impose the choice of version
$$\frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \frac{\int \pi_0(\psi)\,f(x|\theta_0,\psi)\,d\psi}{\tilde m_1(x)}.$$
Then
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\frac{\tilde m_1(x)}{m_1(x)}$$
  • 48. First ratio. If $(\theta^{(1)},\psi^{(1)}),\ldots,(\theta^{(T)},\psi^{(T)}) \sim \tilde\pi_1(\theta,\psi|x)$, then
$$\frac{1}{T}\sum_t \tilde\pi_1(\theta_0|x, \psi^{(t)})$$
converges to $\tilde\pi_1(\theta_0|x)$, provided the right version is used in $\theta_0$:
$$\tilde\pi_1(\theta_0|x,\psi) = \frac{\pi_1(\theta_0)\,f(x|\theta_0,\psi)}{\int \pi_1(\theta)\,f(x|\theta,\psi)\,d\theta}$$
  • 49. Rao–Blackwellisation with latent variables. When $\tilde\pi_1(\theta_0|x,\psi)$ is unavailable, replace it with
$$\frac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})$$
via data completion by a latent variable $z$ such that
$$f(x|\theta,\psi) = \int \tilde f(x,z|\theta,\psi)\,dz$$
and such that $\tilde\pi_1(\theta,\psi,z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x,z|\theta,\psi)$ is available in closed form, including the normalising constant, based on the version
$$\frac{\tilde\pi_1(\theta_0|x,z,\psi)}{\pi_1(\theta_0)} = \frac{\tilde f(x,z|\theta_0,\psi)}{\int \tilde f(x,z|\theta,\psi)\,\pi_1(\theta)\,d\theta}.$$
  • 51. Bridge revival (1). Since the ratio $\tilde m_1(x)/m_1(x)$ is unknown, apparent failure! Use of the bridge identity
$$E^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\theta,\psi)\,f(x|\theta,\psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta,\psi)}\right] = E^{\tilde\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \frac{m_1(x)}{\tilde m_1(x)}$$
to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})},$$
based on the same sample from $\tilde\pi_1$.
  • 53. Bridge revival (2). Alternative identity:
$$E^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta,\psi)}{\pi_1(\theta,\psi)\,f(x|\theta,\psi)}\right] = E^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \frac{\tilde m_1(x)}{m_1(x)}$$
suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta,\psi|x)$ and the ratio estimate
$$\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}.$$
Resulting unbiased estimate:
$$\widehat{B}_{01} = \left\{\frac{1}{T}\sum_t \frac{\tilde\pi_1(\theta_0|x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right\}\;\frac{1}{T}\sum_{t=1}^{T} \frac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$$
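The resulting two-factor estimator can be exercised end to end on an assumed toy model with a genuine nuisance parameter: $x_1|\theta \sim N(\theta,1)$, $x_2|\psi \sim N(\psi,1)$, with $\pi_1(\theta) = N(0,1)$, $\pi_1(\psi|\theta) = N(\theta,1)$ and $\pi_0(\psi) = N(0,1)$, so that the exact Bayes factor is available for comparison. The Gibbs sampler, burn-in and sample size are illustrative choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(3)
x1, x2 = 1.0, 2.0
theta0 = 0.0
T = 200_000

def npdf(v, m, s2):
    # Gaussian density with mean m and variance s2
    return np.exp(-0.5 * (v - m)**2 / s2) / np.sqrt(2.0 * np.pi * s2)

# First factor: under the instrumental prior pi1(theta)pi0(psi) the model
# factorises, so tilde-pi1(theta0|x, psi) = N(theta0; x1/2, 1/2), free of psi,
# and the Monte Carlo average over psi-draws reduces to the exact value.
factor1 = npdf(theta0, x1 / 2.0, 0.5) / npdf(theta0, 0.0, 1.0)

# Second factor: average of pi0(psi)/pi1(psi|theta) over the pi1-posterior,
# sampled by a two-block Gibbs sampler (conditionals derived by completing
# the square in the joint density).
theta, psi = 0.0, 0.0
ratios = np.empty(T)
for t in range(-1000, T):                     # 1000 burn-in sweeps
    theta = rng.normal((x1 + psi) / 3.0, np.sqrt(1.0 / 3.0))
    psi = rng.normal((x2 + theta) / 2.0, np.sqrt(0.5))
    if t >= 0:
        ratios[t] = npdf(psi, 0.0, 1.0) / npdf(psi, theta, 1.0)
B01_hat = factor1 * ratios.mean()

# exact Bayes factor for this toy model (all marginals Gaussian)
m0 = npdf(x1, 0.0, 1.0) * npdf(x2, 0.0, 2.0)
m1 = npdf(x1, 0.0, 2.0) * npdf(x2, x1 / 2.0, 2.5)
print(B01_hat, m0 / m1)
```

Because $\pi_1(\psi|\theta_0) = N(\theta_0,1) = N(0,1) = \pi_0(\psi)$, Dickey's condition holds and the estimator targets the genuine Bayes factor.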
  • 55. Difference with the Verdinelli–Wasserman representation. The above leads to the representation
$$B_{01} = \frac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\theta,\psi|x)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right],$$
which shows how our approach differs from Verdinelli and Wasserman's
$$B_{01} = \frac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\; E^{\pi_1(\psi|x,\theta_0)}\!\left[\frac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right].$$
[for referees only!!]
  • 56. Diabetes in Pima Indian women (cont'd): comparison of the variation of the Bayes factor approximations, based on 100 replicas of 20,000 simulations, for the above importance, Chib's, Savage–Dickey and bridge samplers. [Boxplots: IS, Chib, Savage–Dickey, Bridge, values around 2.8–3.4.]