This document discusses various importance sampling methods for approximating marginal likelihoods, including regular importance sampling, bridge sampling, and harmonic means. It compares these methods on a probit model example using data on diabetes in Pima Indian women. Regular importance sampling uses the MLE distribution as an importance function. Bridge sampling introduces a pseudo-posterior to handle models with different parameter dimensions. Harmonic means directly uses the posterior sample but requires a proposal distribution with lighter tails than the posterior.
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
I used this set of slides for an Overview lecture I gave at the University of Zurich for the 1st year students following the course of Formale Grundlagen der Informatik.
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
Aggregate of three different papers on Rao-Blackwellisation, from Casella & Robert (1996), to Douc & Robert (2010), to Banterle et al. (2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015
I used this set of slides for an Overview lecture I gave at the University of Zurich for the 1st year students following the course of Formale Grundlagen der Informatik.
The first report of Machine Learning Seminar organized by Computational Linguistics Laboratory at Kazan Federal University. See http://cll.niimm.ksu.ru/cms/lang/en_US/main/seminars/mlseminar
In many applications one observes rapid change of the solution in the boundary region. Accurate and numerically efficient resolution of the solution close to the moving boundaries is considered to be an important problem. We develop an approach to the optimization of the discretization grids for finite-difference scheme. Using the suggested approach we are able to achieve the exponential convergence of the boundary Neumann- to-Dirichlet maps. It increases the convergence order without increasing the stencil size of the finite-difference scheme and preserves stability.
Basics of probability in statistical simulation and stochastic programmingSSA KPI
AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 2.
More info at http://summerschool.ssa.org.ua
Scientific Computing with Python Webinar 9/18/2009:Curve FittingEnthought, Inc.
This webinar will provide an overview of the tools that SciPy and NumPy provide for regression analysis including linear and non-linear least-squares and a brief look at handling other error metrics. We will also demonstrate simple GUI tools that can make some problems easier and provide a quick overview of the new Scikits package statsmodels whose API is maturing in a separate package but should be incorporated into SciPy in the future.
Biological screening of herbal drugs: Introduction and Need for
Phyto-Pharmacological Screening, New Strategies for evaluating
Natural Products, In vitro evaluation techniques for Antioxidants, Antimicrobial and Anticancer drugs. In vivo evaluation techniques
for Anti-inflammatory, Antiulcer, Anticancer, Wound healing, Antidiabetic, Hepatoprotective, Cardio protective, Diuretics and
Antifertility, Toxicity studies as per OECD guidelines
Francesca Gottschalk - How can education support child empowerment.pptxEduSkills OECD
Francesca Gottschalk from the OECD’s Centre for Educational Research and Innovation presents at the Ask an Expert Webinar: How can education support child empowerment?
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdfTechSoup
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
Read| The latest issue of The Challenger is here! We are thrilled to announce that our school paper has qualified for the NATIONAL SCHOOLS PRESS CONFERENCE (NSPC) 2024. Thank you for your unwavering support and trust. Dive into the stories that made us stand out!
Honest Reviews of Tim Han LMA Course Program.pptxtimhan337
Personal development courses are widely available today, with each one promising life-changing outcomes. Tim Han’s Life Mastery Achievers (LMA) Course has drawn a lot of interest. In addition to offering my frank assessment of Success Insider’s LMA Course, this piece examines the course’s effects via a variety of Tim Han LMA course reviews and Success Insider comments.
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
1. On resolving the Savage–Dickey paradox
On resolving the Savage–Dickey paradox
Christian P. Robert
Universit´ Paris Dauphine & CREST-INSEE
e
http://www.ceremade.dauphine.fr/~xian
Frontiers...
San Antonio, March 19, 2010
Joint work with J.-M. Marin
2. On resolving the Savage–Dickey paradox
Outline
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
3. On resolving the Savage–Dickey paradox
Outline
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Happy B’day, Jim!!!
4. On resolving the Savage–Dickey paradox
Evidence
Bayesian model choice and hypothesis testing relies on a similar
quantity, the evidence
Zk = πk (θk )Lk (θk ) dθk , k = 1, 2, . . .
Θk
aka the marginal likelihood.
[Jeffreys, 1939]
5. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Importance sampling solutions
1 Importance sampling solutions compared
Regular importance
Bridge sampling
Harmonic means
Chib’s representation
2 The Savage–Dickey ratio
6. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Bayes factor approximation
When approximating the Bayes factor
f0 (x|θ0 )π0 (θ0 )dθ0
Θ0
B01 =
f1 (x|θ1 )π1 (θ1 )dθ1
Θ1
use of importance functions ϕ0 and ϕ1 and
n−1
0
n0 i i i
i=1 f0 (x|θ0 )π0 (θ0 )/ϕ0 (θ0 )
B01 =
n−1
1
n1 i i i
i=1 f1 (x|θ1 )π1 (θ1 )/ϕ1 (θ1 )
7. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes function of above variables
P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
g-prior modelling:
β ∼ N3 (0, n XT X)−1
Use of importance function inspired from the MLE estimate
distribution
ˆ ˆ
β ∼ N (β, Σ)
8. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes function of above variables
P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
g-prior modelling:
β ∼ N3 (0, n XT X)−1
Use of importance function inspired from the MLE estimate
distribution
ˆ ˆ
β ∼ N (β, Σ)
9. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes function of above variables
P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
g-prior modelling:
β ∼ N3 (0, n XT X)−1
Use of importance function inspired from the MLE estimate
distribution
ˆ ˆ
β ∼ N (β, Σ)
10. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Probit modelling on Pima Indian women
Example (R benchmark)
200 Pima Indian women with observed variables
plasma glucose concentration in oral glucose tolerance test
diastolic blood pressure
diabetes pedigree function
presence/absence of diabetes
Probability of diabetes function of above variables
P(y = 1|x) = Φ(x1 β1 + x2 β2 + x3 β3 ) ,
Test of H0 : β3 = 0 for 200 observations of Pima.tr based on a
g-prior modelling:
β ∼ N3 (0, n XT X)−1
Use of importance function inspired from the MLE estimate
distribution
ˆ ˆ
β ∼ N (β, Σ)
11. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Regular importance
Diabetes in Pima Indian women
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20, 000 simulations from the prior and
the above MLE importance sampler
5
4
q
3
2
Monte Carlo Importance sampling
13. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Optimal bridge sampling
The optimal choice of auxiliary function is
1
α ∝
n1 π1 (θ|x) + n2 π2 (θ|x)
leading to
n1
1 π2 (θ1i |x)
˜
n1 n1 π1 (θ1i |x) + n2 π2 (θ1i |x)
i=1
B12 ≈ n2
1 π1 (θ2i |x)
˜
n2 n1 π1 (θ2i |x) + n2 π2 (θ2i |x)
i=1
14. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Extension to varying dimensions
When dim(Θ1 ) = dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a
pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into
joint distribution
π1 (θ1 |x) × ω(ψ|θ1 , x)
on Θ2 so that
π1 (θ1 |x)α(θ1 , ψ)π2 (θ1 , ψ|x)dθ1 ω(ψ|θ1 , x) dψ
˜
B12 =
π2 (θ1 , ψ|x)α(θ1 , ψ)π1 (θ1 |x)dθ1 ω(ψ|θ1 , x) dψ
˜
π1 (θ1 )ω(ψ|θ1 )
˜ Eϕ [˜1 (θ1 )ω(ψ|θ1 )/ϕ(θ1 , ψ)]
π
= Eπ2 =
π2 (θ1 , ψ)
˜ Eϕ [˜2 (θ1 , ψ)/ϕ(θ1 , ψ)]
π
for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
15. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Extension to varying dimensions
When dim(Θ1 ) = dim(Θ2 ), e.g. θ2 = (θ1 , ψ), introduction of a
pseudo-posterior density, ω(ψ|θ1 , x), augmenting π1 (θ1 |x) into
joint distribution
π1 (θ1 |x) × ω(ψ|θ1 , x)
on Θ2 so that
π1 (θ1 |x)α(θ1 , ψ)π2 (θ1 , ψ|x)dθ1 ω(ψ|θ1 , x) dψ
˜
B12 =
π2 (θ1 , ψ|x)α(θ1 , ψ)π1 (θ1 |x)dθ1 ω(ψ|θ1 , x) dψ
˜
π1 (θ1 )ω(ψ|θ1 )
˜ Eϕ [˜1 (θ1 )ω(ψ|θ1 )/ϕ(θ1 , ψ)]
π
= Eπ2 =
π2 (θ1 , ψ)
˜ Eϕ [˜2 (θ1 , ψ)/ϕ(θ1 , ψ)]
π
for any conditional density ω(ψ|θ1 ) and any joint density ϕ.
16. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Illustration for the Pima Indian dataset
Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
pseudo-posterior and mixture of both MLE approximations on β3
in bridge sampling estimate
17. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Bridge sampling
Illustration for the Pima Indian dataset
Use of the MLE induced conditional of β3 given (β1 , β2 ) as a
pseudo-posterior and mixture of both MLE approximations on β3
in bridge sampling estimate
5
4
q
q
3
q
2
MC Bridge IS
18. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
The original harmonic mean estimator
When θki ∼ πk (θ|x),
T
1 1
T L(θkt |x)
t=1
is an unbiased estimator of 1/mk (x)
[Newton & Raftery, 1994]
Highly dangerous: Most often leads to an infinite variance!!!
19. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
The original harmonic mean estimator
When θki ∼ πk (θ|x),
T
1 1
T L(θkt |x)
t=1
is an unbiased estimator of 1/mk (x)
[Newton & Raftery, 1994]
Highly dangerous: Most often leads to an infinite variance!!!
20. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
“The Worst Monte Carlo Method Ever”
“The good news is that the Law of Large Numbers guarantees that this
estimator is consistent ie, it will very likely be very close to the correct
answer if you use a sufficiently large number of points from the posterior
distribution.
The bad news is that the number of points required for this estimator to
get close to the right answer will often be greater than the number of
atoms in the observable universe. The even worse news is that it’s easy
for people to not realize this, and to na¨ ıvely accept estimates that are
nowhere close to the correct value of the marginal likelihood.”
[Radford Neal’s blog, Aug. 23, 2008]
21. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Approximating Zk from a posterior sample
Use of the [harmonic mean] identity
ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1
Eπk x = dθk =
πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk
no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
22. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Approximating Zk from a posterior sample
Use of the [harmonic mean] identity
ϕ(θk ) ϕ(θk ) πk (θk )Lk (θk ) 1
Eπk x = dθk =
πk (θk )Lk (θk ) πk (θk )Lk (θk ) Zk Zk
no matter what the proposal ϕ(·) is.
[Gelfand & Dey, 1994; Bartolucci et al., 2006]
Direct exploitation of the MCMC output
23. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling
constraints: ϕ(θ) must have lighter (rather than fatter) tails than
πk (θk )Lk (θk ) for the approximation
T (t)
1 ϕ(θk )
Z1k = 1 (t) (t)
T πk (θk )Lk (θk )
t=1
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
24. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Comparison with regular importance sampling
Harmonic mean: Constraint opposed to usual importance sampling
constraints: ϕ(θ) must have lighter (rather than fatter) tails than
πk (θk )Lk (θk ) for the approximation
T (t)
1 ϕ(θk )
Z1k = 1 (t) (t)
T πk (θk )Lk (θk )
t=1
to have a finite variance.
E.g., use finite support kernels (like Epanechnikov’s kernel) for ϕ
25. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
HPD indicator as ϕ
Use the convex hull of MCMC simulations corresponding to the
10% HPD region (easily derived!) and ϕ as indicator:
10
ϕ(θ) = Id(θ,θ(t) )≤
T
t∈HPD
26. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Harmonic means
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20, 000 simulations for a simulation from
the above harmonic mean sampler and importance samplers
3.116
q
3.114
3.112
3.110
3.108
3.106
3.104
q
3.102
Harmonic mean Importance sampling
27. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Chib’s representation
Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
θk ∼ πk (θk ),
fk (x|θk ) πk (θk )
Zk = mk (x) =
πk (θk |x)
Use of an approximation to the posterior
∗ ∗
fk (x|θk ) πk (θk )
Zk = mk (x) = .
ˆ ∗
πk (θk |x)
28. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Chib’s representation
Direct application of Bayes’ theorem: given x ∼ fk (x|θk ) and
θk ∼ πk (θk ),
fk (x|θk ) πk (θk )
Zk = mk (x) =
πk (θk |x)
Use of an approximation to the posterior
∗ ∗
fk (x|θk ) πk (θk )
Zk = mk (x) = .
ˆ ∗
πk (θk |x)
29. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Case of latent variables
For missing variable z as in mixture models, natural Rao-Blackwell
estimate
T
∗ 1 ∗ (t)
πk (θk |x) = πk (θk |x, zk ) ,
T
t=1
(t)
where the zk ’s are Gibbs sampled latent variables
30. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Compensation for label switching
(t)
For mixture models, zk usually fails to visit all configurations in a
balanced way, despite the symmetry predicted by the theory
1
πk (θk |x) = πk (σ(θk )|x) = πk (σ(θk )|x)
k!
σ∈S
for all σ’s in Sk , set of all permutations of {1, . . . , k}.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
T
∗ 1 ∗ (t)
πk (θk |x) = πk (σ(θk )|x, zk ) .
T k!
σ∈Sk t=1
[Berkhof, Mechelen, & Gelman, 2003]
31. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Compensation for label switching
(t)
For mixture models, zk usually fails to visit all configurations in a
balanced way, despite the symmetry predicted by the theory
1
πk (θk |x) = πk (σ(θk )|x) = πk (σ(θk )|x)
k!
σ∈S
for all σ’s in Sk , set of all permutations of {1, . . . , k}.
Consequences on numerical approximation, biased by an order k!
Recover the theoretical symmetry by using
T
∗ 1 ∗ (t)
πk (θk |x) = πk (σ(θk )|x, zk ) .
T k!
σ∈Sk t=1
[Berkhof, Mechelen, & Gelman, 2003]
32. On resolving the Savage–Dickey paradox
Importance sampling solutions compared
Chib’s representation
Case of the probit model
For the completion by z,
1
π (θ|x) =
ˆ π(θ|x, z (t) )
T t
is a simple average of normal densities
q
0.0255
q
q q
q
0.0250
0.0245
0.0240
q
q
Chib's method importance sampling
33. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Measure-theoretic aspects
Computational implications
34. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
1 Importance sampling solutions compared
2 The Savage–Dickey ratio
Measure-theoretic aspects
Computational implications
35. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
The Savage–Dickey ratio representation
Special representation of the Bayes factor used for simulation
Original version (Dickey, AoMS, 1971)
36. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
The Savage–Dickey ratio representation
Special representation of the Bayes factor used for simulation
Original version (Dickey, AoMS, 1971)
37. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Savage’s density ratio theorem
Given a test H0 : θ = θ0 in a model f (x|θ, ψ) with a nuisance
parameter ψ, under priors π0 (ψ) and π1 (θ, ψ) such that
π1 (ψ|θ0 ) = π0 (ψ)
then
π1 (θ0 |x)
B01 = ,
π1 (θ0 )
with the obvious notations
π1 (θ) = π1 (θ, ψ)dψ , π1 (θ|x) = π1 (θ, ψ|x)dψ ,
[Dickey, 1971; Verdinelli & Wasserman, 1995]
38. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Rephrased
“Suppose that f0 (θ) = f1 (θ|φ = φ0 ). As f0 (x|θ) = f1 (x|θ, φ = φ0 ),
Z
f0 (x) = f1 (x|θ, φ = φ0 )f1 (θ|φ = φ0 ) dθ = f1 (x|φ = φ0 ) ,
i.e., the denumerator of the Bayes factor is the value of f1 (x|φ) at φ = φ0 , while the denominator is an average
of the values of f1 (x|φ) for φ = φ0 , weighted by the prior distribution f1 (φ) under the augmented model.
Applying Bayes’ theorem to the right-hand side of [the above] we get
‹
f0 (x) = f1 (φ0 |x)f1 (x) f1 (φ0 )
and hence the Bayes factor is given by
‹ ‹
B = f0 (x) f1 (x) = f1 (φ0 |x) f1 (φ0 ) .
the ratio of the posterior to prior densities at φ = φ0 under the augmented model.”
[O’Hagan & Forster, 1996]
39. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Rephrased
“Suppose that f0 (θ) = f1 (θ|φ = φ0 ). As f0 (x|θ) = f1 (x|θ, φ = φ0 ),
Z
f0 (x) = f1 (x|θ, φ = φ0 )f1 (θ|φ = φ0 ) dθ = f1 (x|φ = φ0 ) ,
i.e., the denumerator of the Bayes factor is the value of f1 (x|φ) at φ = φ0 , while the denominator is an average
of the values of f1 (x|φ) for φ = φ0 , weighted by the prior distribution f1 (φ) under the augmented model.
Applying Bayes’ theorem to the right-hand side of [the above] we get
‹
f0 (x) = f1 (φ0 |x)f1 (x) f1 (φ0 )
and hence the Bayes factor is given by
‹ ‹
B = f0 (x) f1 (x) = f1 (φ0 |x) f1 (φ0 ) .
the ratio of the posterior to prior densities at φ = φ0 under the augmented model.”
[O’Hagan & Forster, 1996]
40. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Measure-theoretic difficulty
Representation depends on the choice of versions of conditional
densities:
π0 (ψ)f (x|θ0 , ψ) dψ
B01 = [by definition]
π1 (θ, ψ)f (x|θ, ψ) dψdθ
π1 (ψ|θ0 )f (x|θ0 , ψ) dψ π1 (θ0 )
= [specific version of π1 (ψ|θ0 )
π1 (θ, ψ)f (x|θ, ψ) dψdθ π1 (θ0 )
and arbitrary version of π1 (θ0 )]
π1 (θ0 , ψ)f (x|θ0 , ψ) dψ
= [specific version of π1 (θ0 , ψ)]
m1 (x)π1 (θ0 )
π1 (θ0 |x)
= [version dependent]
π1 (θ0 )
41. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Measure-theoretic difficulty
Representation depends on the choice of versions of conditional
densities:
π0 (ψ)f (x|θ0 , ψ) dψ
B01 = [by definition]
π1 (θ, ψ)f (x|θ, ψ) dψdθ
π1 (ψ|θ0 )f (x|θ0 , ψ) dψ π1 (θ0 )
= [specific version of π1 (ψ|θ0 )
π1 (θ, ψ)f (x|θ, ψ) dψdθ π1 (θ0 )
and arbitrary version of π1 (θ0 )]
π1 (θ0 , ψ)f (x|θ0 , ψ) dψ
= [specific version of π1 (θ0 , ψ)]
m1 (x)π1 (θ0 )
π1 (θ0 |x)
= [version dependent]
π1 (θ0 )
42. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Choice of density version
c Dickey’s (1971) condition is not a condition:
If
π1 (θ0 |x) π0 (ψ)f (x|θ0 , ψ) dψ
=
π1 (θ0 ) m1 (x)
is chosen as a version, then Savage–Dickey’s representation holds
43. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Choice of density version
c Dickey’s (1971) condition is not a condition:
If
π1 (θ0 |x) π0 (ψ)f (x|θ0 , ψ) dψ
=
π1 (θ0 ) m1 (x)
is chosen as a version, then Savage–Dickey’s representation holds
44. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Savage–Dickey paradox
Verdinelli-Wasserman extension:
π1 (θ0 |x) π1 (ψ|x,θ0 ,x) π0 (ψ)
B01 = E
π1 (θ0 ) π1 (ψ|θ0 )
similarly depends on choices of versions...
...but Monte Carlo implementation relies on specific versions of all
densities without making mention of it
[Chen, Shao & Ibrahim, 2000]
45. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Measure-theoretic aspects
Savage–Dickey paradox
Verdinelli-Wasserman extension:
π1 (θ0 |x) π1 (ψ|x,θ0 ,x) π0 (ψ)
B01 = E
π1 (θ0 ) π1 (ψ|θ0 )
similarly depends on choices of versions...
...but Monte Carlo implementation relies on specific versions of all
densities without making mention of it
[Chen, Shao & Ibrahim, 2000]
46. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
A computational exploitation
Starting from the (instrumental) prior
π1 (θ, ψ) = π1 (θ)π0 (ψ)
˜
define the associated posterior
π1 (θ, ψ|x) = π0 (ψ)π1 (θ)f (x|θ, ψ) m1 (x)
˜ ˜
and impose the choice of version
π1 (θ0 |x)
˜ π0 (ψ)f (x|θ0 , ψ) dψ
=
π0 (θ0 ) m1 (x)
˜
Then
π1 (θ0 |x) m1 (x)
˜ ˜
B01 =
π1 (θ0 ) m1 (x)
47. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
A computational exploitation
Starting from the (instrumental) prior
π1 (θ, ψ) = π1 (θ)π0 (ψ)
˜
define the associated posterior
π1 (θ, ψ|x) = π0 (ψ)π1 (θ)f (x|θ, ψ) m1 (x)
˜ ˜
and impose the choice of version
π1 (θ0 |x)
˜ π0 (ψ)f (x|θ0 , ψ) dψ
=
π0 (θ0 ) m1 (x)
˜
Then
π1 (θ0 |x) m1 (x)
˜ ˜
B01 =
π1 (θ0 ) m1 (x)
48. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
First ratio
If (θ(1) , ψ (1) ), . . . , (θ(T ) , ψ (T ) ) ∼ π (θ, ψ|x), then
˜
1
π1 (θ0 |x, ψ (t) )
˜
T t
converges to π1 (θ0 |x) provided the right version is used in θ0
˜
π1 (θ0 )f (x|θ0 , ψ)
π1 (θ0 |x, ψ) =
˜
π1 (θ)f (x|θ, ψ) dθ
49. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Rao–Blackwellisation with latent variables
When π1 (θ0 |x, ψ) unavailable, replace with
˜
T
1
π1 (θ0 |x, z (t) , ψ (t) )
˜
T
t=1
via data completion by latent variable z such that
f (x|θ, ψ) = ˜
f (x, z|θ, ψ) dz
˜
and that π1 (θ, ψ, z|x) ∝ π0 (ψ)π1 (θ)f (x, z|θ, ψ) available in closed
˜
form, including the normalising constant, based on version
π1 (θ0 |x, z, ψ)
˜ ˜
f (x, z|θ0 , ψ)
= .
π1 (θ0 ) ˜
f (x, z|θ, ψ)π1 (θ) dθ
50. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Rao–Blackwellisation with latent variables
When π1 (θ0 |x, ψ) unavailable, replace with
˜
T
1
π1 (θ0 |x, z (t) , ψ (t) )
˜
T
t=1
via data completion by latent variable z such that
f (x|θ, ψ) = ˜
f (x, z|θ, ψ) dz
˜
and that π1 (θ, ψ, z|x) ∝ π0 (ψ)π1 (θ)f (x, z|θ, ψ) available in closed
˜
form, including the normalising constant, based on version
π1 (θ0 |x, z, ψ)
˜ ˜
f (x, z|θ0 , ψ)
= .
π1 (θ0 ) ˜
f (x, z|θ, ψ)π1 (θ) dθ
51. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (1)
Since m1 (x)/m1 (x) is unknown, apparent failure!
˜
Use of the bridge identity
π1 (θ, ψ)f (x|θ, ψ) π1 (ψ|θ) m1 (x)
Eπ1 (θ,ψ|x)
˜
= Eπ1 (θ,ψ|x)
˜
=
π0 (ψ)π1 (θ)f (x|θ, ψ) π0 (ψ) m1 (x)
˜
to (biasedly) estimate m1 (x)/m1 (x) by
˜
T
π1 (ψ (t) |θ(t) )
T
t=1
π0 (ψ (t) )
based on the same sample from π1 .
˜
52. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (1)
Since m1 (x)/m1 (x) is unknown, apparent failure!
˜
Use of the bridge identity
π1 (θ, ψ)f (x|θ, ψ) π1 (ψ|θ) m1 (x)
Eπ1 (θ,ψ|x)
˜
= Eπ1 (θ,ψ|x)
˜
=
π0 (ψ)π1 (θ)f (x|θ, ψ) π0 (ψ) m1 (x)
˜
to (biasedly) estimate m1 (x)/m1 (x) by
˜
T
π1 (ψ (t) |θ(t) )
T
t=1
π0 (ψ (t) )
based on the same sample from π1 .
˜
53. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (2)
Alternative identity
π0 (ψ)π1 (θ)f (x|θ, ψ) π0 (ψ) m1 (x)
˜
Eπ1 (θ,ψ|x) = Eπ1 (θ,ψ|x) =
π1 (θ, ψ)f (x|θ, ψ) π1 (ψ|θ) m1 (x)
¯ ¯
suggests using a second sample (θ(t) , ψ (t) , z (t) ) ∼ π1 (θ, ψ|x) and
the ratio estimate
T
1 ¯ ¯ ¯
π0 (ψ (t) ) π1 (ψ (t) |θ(t) )
T
t=1
Resulting unbiased estimate:
(t) , ψ (t) ) T ¯
1 t π1 (θ0 |x, z
˜ 1 π0 (ψ (t) )
B01 = ¯ ¯
T π1 (θ0 ) T
t=1
π1 (ψ (t) |θ(t) )
54. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Bridge revival (2)
Alternative identity
π0 (ψ)π1 (θ)f (x|θ, ψ) π0 (ψ) m1 (x)
˜
Eπ1 (θ,ψ|x) = Eπ1 (θ,ψ|x) =
π1 (θ, ψ)f (x|θ, ψ) π1 (ψ|θ) m1 (x)
¯ ¯
suggests using a second sample (θ(t) , ψ (t) , z (t) ) ∼ π1 (θ, ψ|x) and
the ratio estimate
T
1 ¯ ¯ ¯
π0 (ψ (t) ) π1 (ψ (t) |θ(t) )
T
t=1
Resulting unbiased estimate:
(t) , ψ (t) ) T ¯
1 t π1 (θ0 |x, z
˜ 1 π0 (ψ (t) )
B01 = ¯ ¯
T π1 (θ0 ) T
t=1
π1 (ψ (t) |θ(t) )
55. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Difference with Verdinelli–Wasserman representation
The above leads to the representation
π1 (θ0 |x) π1 (θ,ψ|x) π0 (ψ)
˜
B01 = E
π1 (θ0 ) π1 (ψ|θ)
shows how our approach differs from Verdinelli and Wasserman’s
π1 (θ0 |x) π1 (ψ|x,θ0 ,x) π0 (ψ)
B01 = E
π1 (θ0 ) π1 (ψ|θ0 )
[for referees only!!]
56. On resolving the Savage–Dickey paradox
The Savage–Dickey ratio
Computational implications
Diabetes in Pima Indian women (cont’d)
Comparison of the variation of the Bayes factor approximations
based on 100 replicas for 20, 000 simulations for a simulation from
the above importance, Chib’s, Savage–Dickey’s and bridge
samplers
q
3.4
3.2
3.0
2.8
q
IS Chib Savage−Dickey Bridge