1. Bayesian Case Studies, week 2
Robin J. Ryder
14 January 2013
Robin J. Ryder Bayesian Case Studies, week 2
2. Reminder: Poisson model, Conjugate Gamma prior
For the Poisson model $Y_i \sim \mathcal{P}(\lambda)$ with a $\Gamma(a, b)$ prior on $\lambda$, the posterior is
$$\pi(\lambda \mid Y) \sim \Gamma\!\left(a + \sum_{i=1}^{n} y_i,\; b + n\right)$$
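As a sanity check on the conjugate update above, here is a minimal Python sketch; the data-generating rate (3.0) and the prior values $a = b = 1$ are illustrative choices, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 counts from a Poisson(3) model
y = rng.poisson(lam=3.0, size=50)

# Gamma(a, b) prior on lambda (shape a, rate b) -- illustrative values
a, b = 1.0, 1.0

# Conjugate update: the posterior is Gamma(a + sum(y_i), b + n)
a_post = a + y.sum()
b_post = b + len(y)

# Posterior mean of lambda under the shape/rate parametrization
post_mean = a_post / b_post
```

Note that NumPy's Gamma sampler is parametrized by shape and *scale*, so samples from this posterior would be drawn with `rng.gamma(a_post, 1 / b_post)`.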
3. Model choice
We have an extra binary variable $Z_i$. We would like to check whether $Y_i$ depends on $Z_i$, and therefore need to choose between two models:
$$M_1:\quad Y_i \overset{\text{i.i.d.}}{\sim} \mathcal{P}(\lambda), \qquad \lambda \sim \Gamma(a, b)$$
$$M_2:\quad Y_i \mid Z_i = k \overset{\text{i.i.d.}}{\sim} \mathcal{P}(\lambda_k), \qquad \lambda_1 \sim \Gamma(a, b),\ \lambda_2 \sim \Gamma(a, b)$$
4. The model index is a parameter
We now consider an extra parameter $M \in \{1, 2\}$ which indicates the model index. We can put a prior on $M$, for example a uniform prior: $P[M = k] = 1/2$. Inside model $k$, we denote the parameters by $\theta_k$ and the prior on $\theta_k$ by $\pi_k$.
We are then interested in the posterior distribution
$$P[M = k \mid y] \propto P[M = k] \int L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
5. Bayes factor
The evidence for or against a model given data is summarized in
the Bayes factor:
$$B_{21}(y) = \frac{P[M = 2 \mid y] \,/\, P[M = 1 \mid y]}{P[M = 2] \,/\, P[M = 1]} = \frac{m_2(y)}{m_1(y)}$$
where
$$m_k(y) = \int_{\Theta_k} L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
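Since the Bayes factor converts prior odds into posterior odds, $P[M = 2 \mid y]$ follows directly from $B_{21}$ and the model prior. A small sketch (the helper name is ours; the uniform prior $P[M = 2] = 1/2$ is the default):

```python
import math

def posterior_prob_m2(log_B21, prior_m2=0.5):
    """P[M = 2 | y] from the natural-log Bayes factor and the prior P[M = 2]."""
    prior_odds = prior_m2 / (1.0 - prior_m2)
    # Posterior odds = Bayes factor * prior odds
    post_odds = prior_odds * math.exp(log_B21)
    return post_odds / (1.0 + post_odds)
```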
6. Bayes factor
Note that the quantity
$$m_k(y) = \int_{\Theta_k} L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
corresponds to the normalizing constant of the posterior when we write
$$\pi(\theta_k \mid y) \propto L(\theta_k \mid y)\, \pi_k(\theta_k)$$
7. Interpreting the Bayes factor
Jeffreys’ scale of evidence states that:
If log10 (B21 ) is between 0 and 0.5, then the evidence in favor
of model 2 is weak
between 0.5 and 1, it is substantial
between 1 and 2, it is strong
above 2, it is decisive
(and symmetrically for negative values)
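The scale above can be wrapped in a small helper; the function name and return strings are our own choices:

```python
def jeffreys_label(log10_B21):
    """Interpret log10 of the Bayes factor B21 on Jeffreys' scale of evidence."""
    s = abs(log10_B21)
    if s < 0.5:
        strength = "weak"
    elif s < 1:
        strength = "substantial"
    elif s < 2:
        strength = "strong"
    else:
        strength = "decisive"
    # Positive values favor model 2, negative values favor model 1
    favored = 2 if log10_B21 >= 0 else 1
    return f"{strength} evidence in favor of model {favored}"
```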
8. Analytical value
Remember that
$$\int_0^{\infty} \lambda^{a-1} e^{-b\lambda}\, d\lambda = \frac{\Gamma(a)}{b^a}$$
9. Analytical value
Remember that
$$\int_0^{\infty} \lambda^{a-1} e^{-b\lambda}\, d\lambda = \frac{\Gamma(a)}{b^a}$$
Thus
$$m_1(y) = \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma\!\left(a + \sum y_i\right)}{(b + n)^{a + \sum y_i}}$$
and
$$m_2(y) = \frac{b^{2a}}{\Gamma(a)^2} \cdot \frac{\Gamma\!\left(a + \sum y_i^H\right)\Gamma\!\left(a + \sum y_i^F\right)}{(b + n_H)^{a + \sum y_i^H}\, (b + n_F)^{a + \sum y_i^F}}$$
(Both expressions omit the common factor $\prod_i 1/y_i!$, which cancels in the Bayes factor.)
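The closed-form marginals translate directly into code. A sketch using the standard library's `math.lgamma` plus NumPy; like the slides, it drops the common $\prod 1/y_i!$ factor, which cancels in the Bayes factor, and the simulated data and 0/1 group labels (standing in for the H/F split) are illustrative:

```python
import math
import numpy as np

def log_m1(y, a, b):
    # log m1(y) under M1, omitting the common prod(1/y_i!) factor
    n, s = len(y), sum(y)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n))

def log_m2(y, z, a, b):
    # log m2(y) under M2: the product of per-group M1-style marginals
    y, z = np.asarray(y), np.asarray(z)
    return sum(log_m1(y[z == k], a, b) for k in (0, 1))

# Hypothetical data where the two groups really do have different rates
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=100)
y = rng.poisson(np.where(z == 0, 2.0, 4.0))

log10_B21 = (log_m2(y, z, 1.0, 1.0) - log_m1(y, 1.0, 1.0)) / math.log(10)
```

Since the data here were generated under $M_2$, `log10_B21` typically lands well above 2 on Jeffreys' scale.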
10. Monte Carlo
Let
Let
$$I = \int h(x)\, g(x)\, dx$$
where $g$ is a density. Then take $x_1, \ldots, x_N$ i.i.d. from $g$ and we have
$$\hat{I}_{MC} = \frac{1}{N} \sum_{i=1}^{N} h(x_i)$$
which converges (almost surely) to $I$.
When implementing this, you need to check convergence!
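A one-line instance of this estimator, with $g$ the standard normal density and $h(x) = x^2$, so the true value of $I$ is $\mathrm{Var}(X) = 1$ (the choices of $h$ and $N$ are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# I = ∫ h(x) g(x) dx with h(x) = x^2 and g the N(0,1) density; true I = 1
N = 100_000
x = rng.standard_normal(N)   # x_1, ..., x_N i.i.d. from g
I_hat = np.mean(x**2)        # (1/N) * sum of h(x_i)
```

Checking convergence in practice means, e.g., plotting the running mean `np.cumsum(x**2) / np.arange(1, N + 1)` and verifying that it stabilizes.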
11. Harmonic mean estimator
Take a sample from the posterior distribution π1 (θ1 |y ). Note that
$$E_{\pi_1|y}\!\left[\frac{1}{L(\theta_1 \mid y)}\right] = \int \frac{1}{L(\theta_1 \mid y)}\, \pi_1(\theta_1 \mid y)\, d\theta_1 = \int \frac{1}{L(\theta_1 \mid y)} \cdot \frac{\pi_1(\theta_1)\, L(\theta_1 \mid y)}{m_1(y)}\, d\theta_1 = \frac{1}{m_1(y)}$$
thus giving an easy way to estimate m1 (y ) by Monte Carlo.
However, this method is in general not advised, since the
associated estimator has infinite variance.
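The identity can be tried on the conjugate Poisson–Gamma model, where $m_1(y)$ is known in closed form. The sketch below (data, prior values, and sample sizes all illustrative) computes the harmonic mean estimate in log space to avoid overflow:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Toy Poisson-Gamma setup where m1(y) is known analytically
y = rng.poisson(3.0, size=20)
a, b = 1.0, 1.0
n, s = len(y), int(y.sum())

# Sample lambda from the conjugate posterior Gamma(a + s, b + n)
lam = rng.gamma(shape=a + s, scale=1.0 / (b + n), size=50_000)

# Log-likelihood log L(lambda | y), including the prod(y_i!) term
log_fact = sum(math.lgamma(yi + 1) for yi in y)
loglik = s * np.log(lam) - n * lam - log_fact

# Harmonic mean estimator of m1(y): 1 / mean(1/L), computed in log space
log_inv = -loglik
log_m1_hat = -(np.log(np.mean(np.exp(log_inv - log_inv.max()))) + log_inv.max())

# Analytic value for comparison
log_m1_true = (a * math.log(b) - math.lgamma(a)
               + math.lgamma(a + s) - (a + s) * math.log(b + n) - log_fact)
```

Re-running with different seeds typically shows larger fluctuations than the sample size would suggest -- the infinite-variance problem in action.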
12. Importance sampling
$$I = \int h(x)\, g(x)\, dx$$
If we wish to perform Monte Carlo but cannot easily sample from $g$, we can re-write
$$I = \int \frac{h(x)\, g(x)}{\gamma(x)}\, \gamma(x)\, dx$$
where $\gamma$ is easy to sample from. Then take $x_1, \ldots, x_N$ i.i.d. from $\gamma$ and we have
$$\hat{I}_{IS} = \frac{1}{N} \sum_{i=1}^{N} \frac{h(x_i)\, g(x_i)}{\gamma(x_i)}$$
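A concrete instance, again with $h(x) = x^2$ and $g$ the $N(0,1)$ density, using a wider normal $N(0, 4)$ as the proposal $\gamma$ (all choices illustrative; a proposal with heavier tails than $g$ keeps the weights well behaved):

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x, sd=1.0):
    # Density of the Normal(0, sd^2) distribution
    return np.exp(-x**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

# Target: I = ∫ h(x) g(x) dx with h(x) = x^2 and g = N(0,1); true I = 1
N = 200_000
x = rng.normal(0.0, 2.0, size=N)       # x_1, ..., x_N i.i.d. from gamma
weights = phi(x, 1.0) / phi(x, 2.0)    # importance weights g(x_i) / gamma(x_i)
I_hat = np.mean(x**2 * weights)
```

Had we chosen a proposal *narrower* than $g$, the weights would blow up in the tails and the estimator could again have infinite variance.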