1. Bayesian Case Studies, week 2
Robin J. Ryder
14 January 2013
Robin J. Ryder Bayesian Case Studies, week 2
2. Reminder: Poisson model, Conjugate Gamma prior
For the Poisson model $Y_i \sim \mathcal{P}(\lambda)$ with a $\Gamma(a, b)$ prior on $\lambda$, the posterior is
$$\pi(\lambda \mid Y) \sim \Gamma\!\left(a + \sum_{i=1}^{n} y_i,\; b + n\right)$$
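As a sanity check on the conjugate update above, here is a minimal Python sketch; the data-generating rate (3.0) and the prior values $a = b = 1$ are illustrative choices, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 counts from a Poisson(3) model
y = rng.poisson(lam=3.0, size=50)

# Gamma(a, b) prior on lambda (shape a, rate b) -- illustrative values
a, b = 1.0, 1.0

# Conjugate update: the posterior is Gamma(a + sum(y_i), b + n)
a_post = a + y.sum()
b_post = b + len(y)

# Posterior mean of lambda under the shape/rate parametrization
post_mean = a_post / b_post
```

Note that NumPy's Gamma sampler is parametrized by shape and *scale*, so samples from this posterior would be drawn with `rng.gamma(a_post, 1 / b_post)`.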
3. Model choice
We have an extra binary variable $Z_i$. We would like to check whether $Y_i$ depends on $Z_i$, and therefore need to choose between two models:
$$M_1:\quad Y_i \overset{\text{i.i.d.}}{\sim} \mathcal{P}(\lambda), \qquad \lambda \sim \Gamma(a, b)$$
$$M_2:\quad Y_i \mid Z_i = k \overset{\text{i.i.d.}}{\sim} \mathcal{P}(\lambda_k), \qquad \lambda_1 \sim \Gamma(a, b),\ \lambda_2 \sim \Gamma(a, b)$$
4. The model index is a parameter
We now consider an extra parameter $M \in \{1, 2\}$ which indicates the model index. We can put a prior on $M$, for example a uniform prior: $P[M = k] = 1/2$. Inside model $k$, we denote the parameters by $\theta_k$ and the prior on $\theta_k$ by $\pi_k$.
We are then interested in the posterior distribution
$$P[M = k \mid y] \propto P[M = k] \int L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
5. Bayes factor
The evidence for or against a model given data is summarized in
the Bayes factor:
$$B_{21}(y) = \frac{P[M = 2 \mid y] \,/\, P[M = 1 \mid y]}{P[M = 2] \,/\, P[M = 1]} = \frac{m_2(y)}{m_1(y)}$$
where
$$m_k(y) = \int_{\Theta_k} L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
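Since the Bayes factor converts prior odds into posterior odds, $P[M = 2 \mid y]$ follows directly from $B_{21}$ and the model prior. A small sketch (the helper name is ours; the uniform prior $P[M = 2] = 1/2$ is the default):

```python
import math

def posterior_prob_m2(log_B21, prior_m2=0.5):
    """P[M = 2 | y] from the natural-log Bayes factor and the prior P[M = 2]."""
    prior_odds = prior_m2 / (1.0 - prior_m2)
    # Posterior odds = Bayes factor * prior odds
    post_odds = prior_odds * math.exp(log_B21)
    return post_odds / (1.0 + post_odds)
```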
6. Bayes factor
Note that the quantity
$$m_k(y) = \int_{\Theta_k} L(\theta_k \mid y)\, \pi_k(\theta_k)\, d\theta_k$$
corresponds to the normalizing constant of the posterior when we write
$$\pi(\theta_k \mid y) \propto L(\theta_k \mid y)\, \pi_k(\theta_k)$$
7. Interpreting the Bayes factor
Jeffreys’ scale of evidence states that:
If log10 (B21 ) is between 0 and 0.5, then the evidence in favor
of model 2 is weak
between 0.5 and 1, it is substantial
between 1 and 2, it is strong
above 2, it is decisive
(and symmetrically for negative values)
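The scale above can be wrapped in a small helper; the function name and return strings are our own choices:

```python
def jeffreys_label(log10_B21):
    """Interpret log10 of the Bayes factor B21 on Jeffreys' scale of evidence."""
    s = abs(log10_B21)
    if s < 0.5:
        strength = "weak"
    elif s < 1:
        strength = "substantial"
    elif s < 2:
        strength = "strong"
    else:
        strength = "decisive"
    # Positive values favor model 2, negative values favor model 1
    favored = 2 if log10_B21 >= 0 else 1
    return f"{strength} evidence in favor of model {favored}"
```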
8. Analytical value
Remember that
$$\int_0^{\infty} \lambda^{a-1} e^{-b\lambda}\, d\lambda = \frac{\Gamma(a)}{b^a}$$
9. Analytical value
Remember that
$$\int_0^{\infty} \lambda^{a-1} e^{-b\lambda}\, d\lambda = \frac{\Gamma(a)}{b^a}$$
Thus
$$m_1(y) = \frac{b^a}{\Gamma(a)} \cdot \frac{\Gamma\!\left(a + \sum y_i\right)}{(b + n)^{a + \sum y_i}}$$
and
$$m_2(y) = \frac{b^{2a}}{\Gamma(a)^2} \cdot \frac{\Gamma\!\left(a + \sum y_i^H\right)\Gamma\!\left(a + \sum y_i^F\right)}{(b + n_H)^{a + \sum y_i^H}\, (b + n_F)^{a + \sum y_i^F}}$$
(Both expressions omit the common factor $\prod_i 1/y_i!$, which cancels in the Bayes factor.)
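The closed-form marginals translate directly into code. A sketch using the standard library's `math.lgamma` plus NumPy; like the slides, it drops the common $\prod 1/y_i!$ factor, which cancels in the Bayes factor, and the simulated data and 0/1 group labels (standing in for the H/F split) are illustrative:

```python
import math
import numpy as np

def log_m1(y, a, b):
    # log m1(y) under M1, omitting the common prod(1/y_i!) factor
    n, s = len(y), sum(y)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n))

def log_m2(y, z, a, b):
    # log m2(y) under M2: the product of per-group M1-style marginals
    y, z = np.asarray(y), np.asarray(z)
    return sum(log_m1(y[z == k], a, b) for k in (0, 1))

# Hypothetical data where the two groups really do have different rates
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=100)
y = rng.poisson(np.where(z == 0, 2.0, 4.0))

log10_B21 = (log_m2(y, z, 1.0, 1.0) - log_m1(y, 1.0, 1.0)) / math.log(10)
```

Since the data here were generated under $M_2$, `log10_B21` typically lands well above 2 on Jeffreys' scale.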
10. Monte Carlo
Let
Let
$$I = \int h(x)\, g(x)\, dx$$
where $g$ is a density. Then take $x_1, \ldots, x_N$ i.i.d. from $g$ and we have
$$\hat{I}_{MC} = \frac{1}{N} \sum_{i=1}^{N} h(x_i)$$
which converges (almost surely) to $I$.
When implementing this, you need to check convergence!
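A one-line instance of this estimator, with $g$ the standard normal density and $h(x) = x^2$, so the true value of $I$ is $\mathrm{Var}(X) = 1$ (the choices of $h$ and $N$ are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# I = ∫ h(x) g(x) dx with h(x) = x^2 and g the N(0,1) density; true I = 1
N = 100_000
x = rng.standard_normal(N)   # x_1, ..., x_N i.i.d. from g
I_hat = np.mean(x**2)        # (1/N) * sum of h(x_i)
```

Checking convergence in practice means, e.g., plotting the running mean `np.cumsum(x**2) / np.arange(1, N + 1)` and verifying that it stabilizes.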
11. Harmonic mean estimator
Take a sample from the posterior distribution π1 (θ1 |y ). Note that
$$E_{\pi_1|y}\!\left[\frac{1}{L(\theta_1 \mid y)}\right] = \int \frac{1}{L(\theta_1 \mid y)}\, \pi_1(\theta_1 \mid y)\, d\theta_1 = \int \frac{1}{L(\theta_1 \mid y)} \cdot \frac{\pi_1(\theta_1)\, L(\theta_1 \mid y)}{m_1(y)}\, d\theta_1 = \frac{1}{m_1(y)}$$
thus giving an easy way to estimate m1 (y ) by Monte Carlo.
However, this method is in general not advised, since the
associated estimator has infinite variance.
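The identity can be tried on the conjugate Poisson–Gamma model, where $m_1(y)$ is known in closed form. The sketch below (data, prior values, and sample sizes all illustrative) computes the harmonic mean estimate in log space to avoid overflow:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Toy Poisson-Gamma setup where m1(y) is known analytically
y = rng.poisson(3.0, size=20)
a, b = 1.0, 1.0
n, s = len(y), int(y.sum())

# Sample lambda from the conjugate posterior Gamma(a + s, b + n)
lam = rng.gamma(shape=a + s, scale=1.0 / (b + n), size=50_000)

# Log-likelihood log L(lambda | y), including the prod(y_i!) term
log_fact = sum(math.lgamma(yi + 1) for yi in y)
loglik = s * np.log(lam) - n * lam - log_fact

# Harmonic mean estimator of m1(y): 1 / mean(1/L), computed in log space
log_inv = -loglik
log_m1_hat = -(np.log(np.mean(np.exp(log_inv - log_inv.max()))) + log_inv.max())

# Analytic value for comparison
log_m1_true = (a * math.log(b) - math.lgamma(a)
               + math.lgamma(a + s) - (a + s) * math.log(b + n) - log_fact)
```

Re-running with different seeds typically shows larger fluctuations than the sample size would suggest -- the infinite-variance problem in action.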
12. Importance sampling
$$I = \int h(x)\, g(x)\, dx$$
If we wish to perform Monte Carlo but cannot easily sample from $g$, we can re-write
$$I = \int \frac{h(x)\, g(x)}{\gamma(x)}\, \gamma(x)\, dx$$
where $\gamma$ is easy to sample from. Then take $x_1, \ldots, x_N$ i.i.d. from $\gamma$ and we have
$$\hat{I}_{IS} = \frac{1}{N} \sum_{i=1}^{N} \frac{h(x_i)\, g(x_i)}{\gamma(x_i)}$$
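A concrete instance, again with $h(x) = x^2$ and $g$ the $N(0,1)$ density, using a wider normal $N(0, 4)$ as the proposal $\gamma$ (all choices illustrative; a proposal with heavier tails than $g$ keeps the weights well behaved):

```python
import numpy as np

rng = np.random.default_rng(3)

def phi(x, sd=1.0):
    # Density of the Normal(0, sd^2) distribution
    return np.exp(-x**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

# Target: I = ∫ h(x) g(x) dx with h(x) = x^2 and g = N(0,1); true I = 1
N = 200_000
x = rng.normal(0.0, 2.0, size=N)       # x_1, ..., x_N i.i.d. from gamma
weights = phi(x, 1.0) / phi(x, 2.0)    # importance weights g(x_i) / gamma(x_i)
I_hat = np.mean(x**2 * weights)
```

Had we chosen a proposal *narrower* than $g$, the weights would blow up in the tails and the estimator could again have infinite variance.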