Savage–Dickey paradox

This is my talk in San Antonio, March 19, 2010

Transcript

  • 1. On resolving the Savage–Dickey paradox
    Christian P. Robert, Université Paris Dauphine & CREST-INSEE
    http://www.ceremade.dauphine.fr/~xian
    Frontiers..., San Antonio, March 19, 2010
    Joint work with J.-M. Marin
  • 2–3. Outline
    1 Importance sampling solutions compared
    2 The Savage–Dickey ratio
    Happy B'day, Jim!!!
  • 4. Evidence
    Bayesian model choice and hypothesis testing rely on a similar quantity, the evidence
    $Z_k = \int_{\Theta_k} \pi_k(\theta_k)\,L_k(\theta_k)\,\mathrm{d}\theta_k, \qquad k = 1, 2, \ldots$
    aka the marginal likelihood. [Jeffreys, 1939]
  • 5. Importance sampling solutions
    1 Importance sampling solutions compared
      Regular importance sampling
      Bridge sampling
      Harmonic means
      Chib's representation
    2 The Savage–Dickey ratio
  • 6. Bayes factor approximation
    When approximating the Bayes factor
    $B_{01} = \dfrac{\int_{\Theta_0} f_0(x|\theta_0)\,\pi_0(\theta_0)\,\mathrm{d}\theta_0}{\int_{\Theta_1} f_1(x|\theta_1)\,\pi_1(\theta_1)\,\mathrm{d}\theta_1}$
    use of importance functions $\varphi_0$ and $\varphi_1$ and
    $\widehat{B}_{01} = \dfrac{n_0^{-1}\sum_{i=1}^{n_0} f_0(x|\theta_0^i)\,\pi_0(\theta_0^i)/\varphi_0(\theta_0^i)}{n_1^{-1}\sum_{i=1}^{n_1} f_1(x|\theta_1^i)\,\pi_1(\theta_1^i)/\varphi_1(\theta_1^i)}$
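    A minimal R sketch of this estimator on a toy conjugate example that is not part of the talk: both evidences are available in closed form, so the importance-sampling approximation can be checked. The model, priors and importance functions below are illustrative assumptions.

      ## Toy check of the importance-sampling Bayes factor estimator:
      ## x_1..x_n ~ N(theta, 1); M0: theta ~ N(0, tau0^2), M1: theta ~ N(0, tau1^2)
      set.seed(1)
      n <- 50; x <- rnorm(n, mean = 0.3, sd = 1)
      loglik <- function(theta) sapply(theta, function(t) sum(dnorm(x, t, 1, log = TRUE)))
      evidence_exact <- function(tau2) {          # closed-form log evidence
        s <- sum(x); ss <- sum(x^2)
        -n/2 * log(2 * pi) - 0.5 * log(n * tau2 + 1) -
          0.5 * (ss - s^2 * tau2 / (n * tau2 + 1))
      }
      tau0 <- 0.1; tau1 <- 1
      B01_exact <- exp(evidence_exact(tau0^2) - evidence_exact(tau1^2))
      ## importance functions: normal approximations centred at the MLE (here xbar)
      phi_mean <- mean(x); phi_sd <- 1 / sqrt(n)
      is_evidence <- function(tau, N = 2e4) {
        th <- rnorm(N, phi_mean, 2 * phi_sd)      # draws from the importance function
        w  <- exp(loglik(th) + dnorm(th, 0, tau, log = TRUE) -
                  dnorm(th, phi_mean, 2 * phi_sd, log = TRUE))
        mean(w)                                   # unbiased estimate of the evidence
      }
      B01_is <- is_evidence(tau0) / is_evidence(tau1)
      c(exact = B01_exact, importance = B01_is)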
  • 7–10. Probit modelling on Pima Indian women
    Example (R benchmark): 200 Pima Indian women with observed variables
      plasma glucose concentration in oral glucose tolerance test
      diastolic blood pressure
      diabetes pedigree function
      presence/absence of diabetes
    Probability of diabetes as a function of the above variables,
    $\mathbb{P}(y = 1\,|\,x) = \Phi(x_1\beta_1 + x_2\beta_2 + x_3\beta_3)$
    Test of $H_0: \beta_3 = 0$ for the 200 observations of Pima.tr, based on a g-prior modelling:
    $\beta \sim \mathcal{N}_3\big(0,\, n\,(X^{\mathrm{T}}X)^{-1}\big)$
    Use of an importance function inspired by the MLE estimate distribution, $\beta \sim \mathcal{N}(\hat\beta, \hat\Sigma)$
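    A sketch of this setting in R, assuming the Pima.tr data from MASS and the mvtnorm package for multivariate normal densities; the exact preprocessing used in the talk is not specified, so this is illustrative rather than a reproduction.

      library(MASS)                               # provides Pima.tr and mvrnorm
      y <- as.integer(Pima.tr$type == "Yes")
      X <- as.matrix(Pima.tr[, c("glu", "bp", "ped")])
      n <- nrow(X)
      Sigma_prior <- n * solve(crossprod(X))      # g-prior: beta ~ N3(0, n (X'X)^-1)
      log_lik <- function(beta) sum(dbinom(y, 1, pnorm(X %*% beta), log = TRUE))
      ## importance function from the MLE: beta ~ N(betahat, Sigmahat)
      fit <- glm(y ~ X - 1, family = binomial(link = "probit"))
      betahat <- coef(fit); Sigmahat <- vcov(fit)
      M <- 2e4
      betas <- mvrnorm(M, betahat, Sigmahat)
      logw <- apply(betas, 1, log_lik) +
        mvtnorm::dmvnorm(betas, sigma = Sigma_prior, log = TRUE) -
        mvtnorm::dmvnorm(betas, mean = betahat, sigma = Sigmahat, log = TRUE)
      Z1_hat <- mean(exp(logw))                   # evidence of the full model
      ## the same recipe on the model without ped gives Z0_hat, and B01 = Z0_hat / Z1_hat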
  • 11. Diabetes in Pima Indian women
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations, from the prior and from the above MLE importance sampler.
    [Boxplots comparing the Monte Carlo (prior) and importance sampling approximations.]
  • 12. Bridge sampling
    General identity:
    $B_{12} = \dfrac{\int \tilde\pi_1(\theta|x)\,\alpha(\theta)\,\pi_2(\theta|x)\,\mathrm{d}\theta}{\int \tilde\pi_2(\theta|x)\,\alpha(\theta)\,\pi_1(\theta|x)\,\mathrm{d}\theta} \qquad \forall\,\alpha(\cdot)$
    $\approx \dfrac{n_2^{-1}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\,\alpha(\theta_{2i})}{n_1^{-1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\,\alpha(\theta_{1i})}, \qquad \theta_{ji} \sim \pi_j(\theta|x)$
    Back later!
  • 13. Optimal bridge sampling
    The optimal choice of auxiliary function is
    $\alpha^\star \propto \dfrac{1}{n_1\,\pi_1(\theta|x) + n_2\,\pi_2(\theta|x)}$
    leading to
    $B_{12} \approx \dfrac{n_2^{-1}\sum_{i=1}^{n_2} \tilde\pi_1(\theta_{2i}|x)\big/\{n_1\,\pi_1(\theta_{2i}|x) + n_2\,\pi_2(\theta_{2i}|x)\}}{n_1^{-1}\sum_{i=1}^{n_1} \tilde\pi_2(\theta_{1i}|x)\big/\{n_1\,\pi_1(\theta_{1i}|x) + n_2\,\pi_2(\theta_{1i}|x)\}}$
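    A small R sketch of the bridge estimator with this choice of α, on an assumed toy pair of unnormalised densities whose normalising constants are known, so the answer can be checked. In practice the normalised posteriors appearing in α are themselves unknown, which leads to the usual iterative refinement of the estimate; that step is omitted here.

      set.seed(2)
      n1 <- n2 <- 1e4
      ## "posteriors": pi1 = N(0, 1), pi2 = N(0.5, 2^2), with unnormalised versions
      pit1 <- function(th) exp(-th^2 / 2)           # normalising constant sqrt(2*pi)
      pit2 <- function(th) exp(-(th - 0.5)^2 / 8)   # normalising constant sqrt(8*pi)
      p1 <- function(th) dnorm(th, 0, 1)
      p2 <- function(th) dnorm(th, 0.5, 2)
      th1 <- rnorm(n1, 0, 1); th2 <- rnorm(n2, 0.5, 2)    # samples from pi1 and pi2
      alpha <- function(th) 1 / (n1 * p1(th) + n2 * p2(th))
      B12_hat <- mean(pit1(th2) * alpha(th2)) / mean(pit2(th1) * alpha(th1))
      c(estimate = B12_hat, truth = sqrt(2 * pi) / sqrt(8 * pi))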
  • 14–15. Extension to varying dimensions
    When $\dim(\Theta_1) \neq \dim(\Theta_2)$, e.g. $\theta_2 = (\theta_1, \psi)$, introduction of a pseudo-posterior density $\omega(\psi|\theta_1, x)$ augmenting $\pi_1(\theta_1|x)$ into the joint distribution $\pi_1(\theta_1|x) \times \omega(\psi|\theta_1, x)$ on $\Theta_2$, so that
    $B_{12} = \dfrac{\int \tilde\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\alpha(\theta_1, \psi)\,\pi_2(\theta_1, \psi|x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}{\int \tilde\pi_2(\theta_1, \psi|x)\,\alpha(\theta_1, \psi)\,\pi_1(\theta_1|x)\,\omega(\psi|\theta_1, x)\,\mathrm{d}\theta_1\,\mathrm{d}\psi}$
    $= \mathbb{E}^{\pi_2}\!\left[\dfrac{\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)}{\tilde\pi_2(\theta_1, \psi)}\right] = \dfrac{\mathbb{E}^{\varphi}\big[\tilde\pi_1(\theta_1)\,\omega(\psi|\theta_1)/\varphi(\theta_1, \psi)\big]}{\mathbb{E}^{\varphi}\big[\tilde\pi_2(\theta_1, \psi)/\varphi(\theta_1, \psi)\big]}$
    for any conditional density $\omega(\psi|\theta_1)$ and any joint density $\varphi$.
  • 16–17. Illustration for the Pima Indian dataset
    Use of the MLE-induced conditional of $\beta_3$ given $(\beta_1, \beta_2)$ as a pseudo-posterior, and of a mixture of both MLE approximations on $\beta_3$, in the bridge sampling estimate.
    [Boxplots comparing the Monte Carlo, bridge sampling and importance sampling approximations.]
  • 18–19. The original harmonic mean estimator
    When $\theta_{kt} \sim \pi_k(\theta|x)$,
    $\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{1}{L(\theta_{kt}|x)}$
    is an unbiased estimator of $1/m_k(x)$. [Newton & Raftery, 1994]
    Highly dangerous: most often leads to an infinite variance!!!
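    A quick R illustration on an assumed conjugate toy model (not from the talk), where the exact evidence is known; the heavy-tailed behaviour of the replicated estimates illustrates the infinite-variance warning.

      set.seed(3)
      n <- 30; x <- rnorm(n, 1, 1)                 # data: N(theta, 1), prior theta ~ N(0, 1)
      s <- sum(x); ss <- sum(x^2)
      logZ <- -n/2 * log(2*pi) - 0.5 * log(n + 1) - 0.5 * (ss - s^2 / (n + 1))  # exact
      est <- replicate(50, {
        th <- rnorm(2e4, s / (n + 1), sqrt(1 / (n + 1)))   # exact posterior draws
        loglik <- -n/2 * log(2*pi) - 0.5 * (ss - 2 * th * s + n * th^2)
        M <- max(loglik)
        exp(M) / mean(exp(-(loglik - M)))          # harmonic mean of the likelihoods
      })
      summary(log(est) - logZ)                     # spread and bias over 50 replicas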
  • 20. "The Worst Monte Carlo Method Ever"
    "The good news is that the Law of Large Numbers guarantees that this estimator is consistent, i.e., it will very likely be very close to the correct answer if you use a sufficiently large number of points from the posterior distribution. The bad news is that the number of points required for this estimator to get close to the right answer will often be greater than the number of atoms in the observable universe. The even worse news is that it's easy for people to not realize this, and to naïvely accept estimates that are nowhere close to the correct value of the marginal likelihood." [Radford Neal's blog, Aug. 23, 2008]
  • 21–22. Approximating $Z_k$ from a posterior sample
    Use of the [harmonic mean] identity
    $\mathbb{E}^{\pi_k}\!\left[\dfrac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\Big|\,x\right] = \int \dfrac{\varphi(\theta_k)}{\pi_k(\theta_k)\,L_k(\theta_k)}\,\dfrac{\pi_k(\theta_k)\,L_k(\theta_k)}{Z_k}\,\mathrm{d}\theta_k = \dfrac{1}{Z_k}$
    no matter what the proposal $\varphi(\cdot)$ is. [Gelfand & Dey, 1994; Bartolucci et al., 2006]
    Direct exploitation of the MCMC output.
  • 23–24. Comparison with regular importance sampling
    Harmonic mean: constraint opposed to the usual importance sampling constraints: $\varphi(\theta)$ must have lighter (rather than fatter) tails than $\pi_k(\theta_k)\,L_k(\theta_k)$ for the approximation
    $\widehat{Z}_k = 1\bigg/\;\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{\varphi(\theta_k^{(t)})}{\pi_k(\theta_k^{(t)})\,L_k(\theta_k^{(t)})}$
    to have a finite variance.
    E.g., use finite-support kernels (like Epanechnikov's kernel) for $\varphi$.
  • 25. HPD indicator as $\varphi$
    Use the convex hull of the MCMC simulations corresponding to the 10% HPD region (easily derived!) and $\varphi$ as the indicator
    $\varphi(\theta) \propto \sum_{t \in \mathrm{HPD}} \mathbb{I}_{\,d(\theta,\,\theta^{(t)}) \le \epsilon}$
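    An R sketch of the Gelfand–Dey estimator with a finite-support $\varphi$, on the same assumed toy model as in the harmonic mean illustration above; for simplicity $\varphi$ is a uniform over a small central box of the posterior draws, a crude stand-in for the HPD-indicator construction of the talk.

      set.seed(4)
      n <- 30; x <- rnorm(n, 1, 1); s <- sum(x); ss <- sum(x^2)
      logZ <- -n/2 * log(2*pi) - 0.5 * log(n + 1) - 0.5 * (ss - s^2 / (n + 1))  # exact
      th <- rnorm(2e4, s / (n + 1), sqrt(1 / (n + 1)))      # exact posterior draws
      loglik  <- -n/2 * log(2*pi) - 0.5 * (ss - 2 * th * s + n * th^2)
      logpost <- loglik + dnorm(th, 0, 1, log = TRUE)       # unnormalised log target
      q <- quantile(th, c(0.45, 0.55))                      # small central region
      phi <- dunif(th, q[1], q[2])                          # proper density, zero outside
      w <- phi * exp(-(logpost - max(logpost)))             # phi / (prior x likelihood)
      logZ_hat <- max(logpost) - log(mean(w))
      c(log_estimate = logZ_hat, log_truth = logZ)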
  • 26. Diabetes in Pima Indian women (cont'd)
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, from the above harmonic mean sampler and from the importance samplers.
    [Boxplots comparing the harmonic mean and importance sampling approximations, both concentrated around 3.10–3.12.]
  • 27–28. Chib's representation
    Direct application of Bayes' theorem: given $x \sim f_k(x|\theta_k)$ and $\theta_k \sim \pi_k(\theta_k)$,
    $Z_k = m_k(x) = \dfrac{f_k(x|\theta_k)\,\pi_k(\theta_k)}{\pi_k(\theta_k|x)}$
    Use of an approximation to the posterior:
    $\widehat{Z}_k = \widehat{m}_k(x) = \dfrac{f_k(x|\theta_k^{*})\,\pi_k(\theta_k^{*})}{\widehat{\pi}_k(\theta_k^{*}|x)}$
  • 29. Case of latent variables
    For a missing variable $z$, as in mixture models, natural Rao–Blackwell estimate
    $\widehat{\pi}_k(\theta_k^{*}|x) = \dfrac{1}{T}\sum_{t=1}^{T} \pi_k(\theta_k^{*}\,|\,x, z_k^{(t)})$
    where the $z_k^{(t)}$'s are Gibbs-sampled latent variables.
  • 30–31. Compensation for label switching
    For mixture models, $z_k^{(t)}$ usually fails to visit all configurations in a balanced way, despite the symmetry predicted by the theory:
    $\pi_k(\theta_k|x) = \pi_k(\sigma(\theta_k)|x) = \dfrac{1}{k!}\sum_{\sigma \in \mathfrak{S}_k} \pi_k(\sigma(\theta_k)|x)$
    for all $\sigma$'s in $\mathfrak{S}_k$, the set of all permutations of $\{1, \ldots, k\}$.
    Consequences on the numerical approximation, biased by an order of $k!$.
    Recover the theoretical symmetry by using
    $\widehat{\pi}_k(\theta_k^{*}|x) = \dfrac{1}{T\,k!}\sum_{\sigma \in \mathfrak{S}_k}\sum_{t=1}^{T} \pi_k(\sigma(\theta_k^{*})\,|\,x, z_k^{(t)})$
    [Berkhof, Mechelen & Gelman, 2003]
  • 32. Case of the probit model
    For the completion by $z$,
    $\widehat{\pi}(\theta|x) = \dfrac{1}{T}\sum_{t} \pi(\theta\,|\,x, z^{(t)})$
    is a simple average of normal densities.
    [Boxplots comparing Chib's method and importance sampling, both concentrated around 0.024–0.026.]
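    A sketch of Chib's estimate for this probit model in R, combining the Albert & Chib (1993) latent-variable Gibbs sampler with the Rao–Blackwell density average above; data setup as in the earlier Pima sketch, truncated normals drawn by inversion, and numerical safeguards (burn-in, underflow protection) omitted for brevity.

      library(MASS)
      y <- as.integer(Pima.tr$type == "Yes")
      X <- as.matrix(Pima.tr[, c("glu", "bp", "ped")])
      n <- nrow(X); XtX <- crossprod(X)
      B <- solve(XtX) * n / (n + 1)                 # cov of beta | z under the g-prior
      cholB <- chol(B)
      ldmvnorm <- function(v, mu, Sigma) {          # log multivariate normal density
        R <- chol(Sigma); q <- backsolve(R, v - mu, transpose = TRUE)
        -0.5 * length(v) * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(q^2)
      }
      log_lik <- function(b) sum(dbinom(y, 1, pnorm(X %*% b), log = TRUE))
      beta <- coef(glm(y ~ X - 1, family = binomial(link = "probit")))   # start at MLE
      beta_star <- beta                             # evaluation point for Chib's identity
      niter <- 5000; rb <- numeric(niter)
      for (it in 1:niter) {
        mu <- drop(X %*% beta)
        u  <- ifelse(y == 1, runif(n, pnorm(-mu), 1), runif(n, 0, pnorm(-mu)))
        z  <- mu + qnorm(u)                         # truncated normals z_i | beta, y_i
        m  <- drop(B %*% crossprod(X, z))           # beta | z ~ N(m, B)
        beta <- drop(m + t(cholB) %*% rnorm(ncol(X)))
        rb[it] <- ldmvnorm(beta_star, m, B)         # log pi(beta* | x, z^(t))
      }
      log_post_star <- log(mean(exp(rb - max(rb)))) + max(rb)   # Rao-Blackwell average
      log_evidence <- log_lik(beta_star) +
        ldmvnorm(beta_star, rep(0, ncol(X)), n * solve(XtX)) - log_post_star
      log_evidence                                  # log Z for the three-covariate model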
  • 33–34. The Savage–Dickey ratio
    1 Importance sampling solutions compared
    2 The Savage–Dickey ratio
      Measure-theoretic aspects
      Computational implications
  • 35–36. The Savage–Dickey ratio representation
    Special representation of the Bayes factor used for simulation.
    Original version (Dickey, AoMS, 1971).
  • 37. Savage's density ratio theorem
    Given a test $H_0: \theta = \theta_0$ in a model $f(x|\theta, \psi)$ with a nuisance parameter $\psi$, under priors $\pi_0(\psi)$ and $\pi_1(\theta, \psi)$ such that $\pi_1(\psi|\theta_0) = \pi_0(\psi)$, then
    $B_{01} = \dfrac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}$,
    with the obvious notations
    $\pi_1(\theta) = \int \pi_1(\theta, \psi)\,\mathrm{d}\psi, \qquad \pi_1(\theta|x) = \int \pi_1(\theta, \psi|x)\,\mathrm{d}\psi$.
    [Dickey, 1971; Verdinelli & Wasserman, 1995]
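    A numerical check of this identity in R, on an assumed conjugate toy model where everything is available in closed form: $x_i \sim \mathcal{N}(\theta, \psi)$ with $H_0: \theta = 0$, a normal-inverse-gamma prior under the alternative, and $\pi_0(\psi)$ taken equal to $\pi_1(\psi|\theta_0)$ so that Dickey's condition holds.

      set.seed(5)
      n <- 25; x <- rnorm(n, 0.4, 1.2)
      a <- 2; b <- 2; lam <- 1                 # psi ~ IG(a, b), theta | psi ~ N(0, psi/lam)
      xbar <- mean(x); ssd <- sum((x - xbar)^2)
      ## posterior normal-inverse-gamma parameters
      lam_n <- lam + n; mu_n <- n * xbar / lam_n
      a_n <- a + n/2;  b_n <- b + 0.5 * (ssd + lam * n * xbar^2 / lam_n)
      ## Savage-Dickey: marginal prior and posterior of theta are (scaled) Student t's
      s0 <- sqrt(b / (a * lam)); s1 <- sqrt(b_n / (a_n * lam_n))
      B01_sd <- (dt(-mu_n / s1, df = 2 * a_n) / s1) / (dt(0, df = 2 * a) / s0)
      ## direct computation of the two evidences, on the log scale
      logZ1 <- -n/2 * log(2*pi) + 0.5 * log(lam / lam_n) +
        a * log(b) - a_n * log(b_n) + lgamma(a_n) - lgamma(a)
      a0 <- a + 0.5                            # pi_0(psi) = pi_1(psi | theta = 0) = IG(a0, b)
      logZ0 <- -n/2 * log(2*pi) + a0 * log(b) - (a0 + n/2) * log(b + sum(x^2) / 2) +
        lgamma(a0 + n/2) - lgamma(a0)
      c(savage_dickey = B01_sd, direct = exp(logZ0 - logZ1))   # the two values coincide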
  • 38–39. Rephrased
    "Suppose that $f_0(\theta) = f_1(\theta|\phi = \phi_0)$. As $f_0(x|\theta) = f_1(x|\theta, \phi = \phi_0)$,
    $f_0(x) = \int f_1(x|\theta, \phi = \phi_0)\,f_1(\theta|\phi = \phi_0)\,\mathrm{d}\theta = f_1(x|\phi = \phi_0)\,,$
    i.e., the numerator of the Bayes factor is the value of $f_1(x|\phi)$ at $\phi = \phi_0$, while the denominator is an average of the values of $f_1(x|\phi)$ for $\phi \neq \phi_0$, weighted by the prior distribution $f_1(\phi)$ under the augmented model. Applying Bayes' theorem to the right-hand side of [the above] we get
    $f_0(x) = f_1(\phi_0|x)\,f_1(x)\big/f_1(\phi_0)$
    and hence the Bayes factor is given by
    $B = f_0(x)\big/f_1(x) = f_1(\phi_0|x)\big/f_1(\phi_0)\,,$
    the ratio of the posterior to prior densities at $\phi = \phi_0$ under the augmented model." [O'Hagan & Forster, 1996]
  • 40–41. Measure-theoretic difficulty
    The representation depends on the choice of versions of conditional densities:
    $B_{01} = \dfrac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\int\!\!\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta}$  [by definition]
    $= \dfrac{\int \pi_1(\psi|\theta_0)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi\;\,\pi_1(\theta_0)}{\int\!\!\int \pi_1(\theta, \psi)\,f(x|\theta, \psi)\,\mathrm{d}\psi\,\mathrm{d}\theta\;\,\pi_1(\theta_0)}$  [specific version of $\pi_1(\psi|\theta_0)$ and arbitrary version of $\pi_1(\theta_0)$]
    $= \dfrac{\int \pi_1(\theta_0, \psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)\,\pi_1(\theta_0)}$  [specific version of $\pi_1(\theta_0, \psi)$]
    $= \dfrac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}$  [version dependent]
  • 42–43. Choice of density version
    Dickey's (1971) condition is not a condition: if
    $\dfrac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \dfrac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{m_1(x)}$
    is chosen as a version, then the Savage–Dickey representation holds.
  • 44–45. Savage–Dickey paradox
    Verdinelli–Wasserman extension:
    $B_{01} = \dfrac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x,\,\theta_0)}\!\left[\dfrac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$
    similarly depends on choices of versions...
    ...but the Monte Carlo implementation relies on specific versions of all densities without making mention of it. [Chen, Shao & Ibrahim, 2000]
  • 46–47. A computational exploitation
    Starting from the (instrumental) prior
    $\tilde\pi_1(\theta, \psi) = \pi_1(\theta)\,\pi_0(\psi)$
    define the associated posterior
    $\tilde\pi_1(\theta, \psi|x) = \dfrac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\tilde m_1(x)}$
    and impose the choice of version
    $\dfrac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)} = \dfrac{\int \pi_0(\psi)\,f(x|\theta_0, \psi)\,\mathrm{d}\psi}{\tilde m_1(x)}$
    Then
    $B_{01} = \dfrac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\dfrac{\tilde m_1(x)}{m_1(x)}$
  • 48. First ratio
    If $(\theta^{(1)}, \psi^{(1)}), \ldots, (\theta^{(T)}, \psi^{(T)}) \sim \tilde\pi_1(\theta, \psi|x)$, then
    $\dfrac{1}{T}\sum_{t} \tilde\pi_1(\theta_0\,|\,x, \psi^{(t)})$
    converges to $\tilde\pi_1(\theta_0|x)$, provided the right version is used in $\theta_0$:
    $\tilde\pi_1(\theta_0\,|\,x, \psi) = \dfrac{\pi_1(\theta_0)\,f(x|\theta_0, \psi)}{\int \pi_1(\theta)\,f(x|\theta, \psi)\,\mathrm{d}\theta}$
  • 49–50. Rao–Blackwellisation with latent variables
    When $\tilde\pi_1(\theta_0\,|\,x, \psi)$ is unavailable, replace it with
    $\dfrac{1}{T}\sum_{t=1}^{T} \tilde\pi_1(\theta_0\,|\,x, z^{(t)}, \psi^{(t)})$
    via data completion by a latent variable $z$ such that
    $f(x|\theta, \psi) = \int \tilde f(x, z|\theta, \psi)\,\mathrm{d}z$
    and such that $\tilde\pi_1(\theta, \psi, z|x) \propto \pi_0(\psi)\,\pi_1(\theta)\,\tilde f(x, z|\theta, \psi)$ is available in closed form, including the normalising constant, based on the version
    $\dfrac{\tilde\pi_1(\theta_0\,|\,x, z, \psi)}{\pi_1(\theta_0)} = \dfrac{\tilde f(x, z|\theta_0, \psi)}{\int \tilde f(x, z|\theta, \psi)\,\pi_1(\theta)\,\mathrm{d}\theta}$
  • 51–52. Bridge revival (1)
    Since $m_1(x)/\tilde m_1(x)$ is unknown, apparent failure!
    Use of the bridge identity
    $\mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\!\left[\dfrac{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\tilde\pi_1(\theta, \psi|x)}\!\left[\dfrac{\pi_1(\psi|\theta)}{\pi_0(\psi)}\right] = \dfrac{m_1(x)}{\tilde m_1(x)}$
    to (biasedly) estimate $m_1(x)/\tilde m_1(x)$ by
    $\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{\pi_1(\psi^{(t)}|\theta^{(t)})}{\pi_0(\psi^{(t)})}$
    based on the same sample from $\tilde\pi_1$.
  • 53–54. Bridge revival (2)
    Alternative identity:
    $\mathbb{E}^{\pi_1(\theta, \psi|x)}\!\left[\dfrac{\pi_0(\psi)\,\pi_1(\theta)\,f(x|\theta, \psi)}{\pi_1(\theta, \psi)\,f(x|\theta, \psi)}\right] = \mathbb{E}^{\pi_1(\theta, \psi|x)}\!\left[\dfrac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right] = \dfrac{\tilde m_1(x)}{m_1(x)}$
    suggests using a second sample $(\bar\theta^{(t)}, \bar\psi^{(t)}, \bar z^{(t)}) \sim \pi_1(\theta, \psi|x)$ and the ratio estimate
    $\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}$
    Resulting unbiased estimate:
    $\widehat{B}_{01} = \left\{\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{\tilde\pi_1(\theta_0\,|\,x, z^{(t)}, \psi^{(t)})}{\pi_1(\theta_0)}\right\} \left\{\dfrac{1}{T}\sum_{t=1}^{T} \dfrac{\pi_0(\bar\psi^{(t)})}{\pi_1(\bar\psi^{(t)}|\bar\theta^{(t)})}\right\}$
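    An R sketch of this estimator on the same assumed conjugate toy model used after the density ratio theorem ($x_i \sim \mathcal{N}(\theta, \psi)$, $H_0: \theta = 0$). The Student marginal prior $\pi_1(\theta)$ is handled through an auxiliary scale variable $\omega$ that plays the role of the completion variable $z$; this augmentation, and the toy model itself, are illustrative choices rather than the talk's probit implementation.

      set.seed(5)
      n <- 25; x <- rnorm(n, 0.4, 1.2)
      a <- 2; b <- 2; lam <- 1
      xbar <- mean(x); ssd <- sum((x - xbar)^2)
      dinvgamma <- function(v, a, b) exp(a * log(b) - lgamma(a) - (a + 1) * log(v) - b / v)
      ## Gibbs sampler for the instrumental posterior pit1(theta, psi, omega | x),
      ## where pit1(theta, psi) = pi1(theta) pi0(psi) and pi0 = IG(a + 1/2, b)
      niter <- 1e4; rb <- numeric(niter)
      theta <- 0
      for (it in 1:niter) {                      # burn-in omitted for brevity
        psi   <- 1 / rgamma(1, a + 0.5 + n/2, b + sum((x - theta)^2) / 2)
        omega <- 1 / rgamma(1, a + 0.5, b + lam * theta^2 / 2)
        prec  <- lam / omega + n / psi           # theta | psi, omega, x is normal
        mth   <- (n * xbar / psi) / prec
        theta <- rnorm(1, mth, 1 / sqrt(prec))
        rb[it] <- dnorm(0, mth, 1 / sqrt(prec))  # pit1(theta0 = 0 | x, omega, psi)
      }
      ## exact draws from the posterior pi1(theta, psi | x) (conjugate NIG)
      lam_n <- lam + n; mu_n <- n * xbar / lam_n
      a_n <- a + n/2;  b_n <- b + 0.5 * (ssd + lam * n * xbar^2 / lam_n)
      psi_bar   <- 1 / rgamma(niter, a_n, b_n)
      theta_bar <- rnorm(niter, mu_n, sqrt(psi_bar / lam_n))
      bridge <- mean(dinvgamma(psi_bar, a + 0.5, b) /
                     dinvgamma(psi_bar, a + 0.5, b + lam * theta_bar^2 / 2))
      pi1_theta0 <- dt(0, 2 * a) / sqrt(b / (a * lam))     # Student prior density at 0
      B01_hat <- mean(rb) / pi1_theta0 * bridge
      s1 <- sqrt(b_n / (a_n * lam_n))                      # exact Savage-Dickey value
      c(estimate = B01_hat, exact = (dt(-mu_n / s1, 2 * a_n) / s1) / pi1_theta0)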
  • 55. Difference with the Verdinelli–Wasserman representation
    The above leads to the representation
    $B_{01} = \dfrac{\tilde\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\theta, \psi|x)}\!\left[\dfrac{\pi_0(\psi)}{\pi_1(\psi|\theta)}\right]$
    which shows how our approach differs from Verdinelli and Wasserman's
    $B_{01} = \dfrac{\pi_1(\theta_0|x)}{\pi_1(\theta_0)}\;\mathbb{E}^{\pi_1(\psi|x,\,\theta_0)}\!\left[\dfrac{\pi_0(\psi)}{\pi_1(\psi|\theta_0)}\right]$
    [for referees only!!]
  • 56. Diabetes in Pima Indian women (cont'd)
    Comparison of the variation of the Bayes factor approximations based on 100 replicas of 20,000 simulations each, from the above importance, Chib's, Savage–Dickey's and bridge samplers.
    [Boxplots comparing the IS, Chib, Savage–Dickey and bridge approximations, all roughly between 2.8 and 3.4.]