Computing Bayesian posterior with empirical likelihood in population genetics

Computing Bayesian posterior with empirical
likelihood in population genetics
Pierre Pudlo
INRA & U. Montpellier 2
SSB, 18 sept. 2012
Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 1 / 29

Table of contents
1 Likelihood free methods
2 Empirical likelihood
3 Population genetics
4 Numerical experiments

Table of contents

When the likelihood is not completely known
Likelihood intractable: `(yj) =
Z
U
`(y,uj)du
where u is a latent process
Pudlo (INRA U. Montpellier 2) ABCel GenPop SSB 09/2012 4 / 29

Z
U
`(y,uj)du
I Hidden Markov and other dynamic models: latent process
which is not observed
,! Classical answer: Markov chain Monte Carlo,. . .

Z
U
`(y,uj)du
I Hidden Markov and other dynamic models: latent process
which is not observed
,! Classical answer: Markov chain Monte Carlo,. . .
I Population genetics: the whole gene genealogy is unobserved
Likelihood is an integral over
I all possible gene genealogies
I all possible mutations along the genealogies
,! Classical answer: Approximate Bayesian computation (ABC)

ABC in a nutshell
Posterior distribution is the conditional distribution of
()`(xj) ()
knowing that x = xobs
Methodology
Draw a (large) set of particles (i, xi) from () and use a
nonparametric estimate of the conditional density
(jxobs) / ()`(xobsj)
Seminal papers
I Tavar ´ e, Balding, Griffith and Donnelly (1997, Genetics)
I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular
Biology and Evolution)

ABC in a nutshell
()`(xj) ()
Methodology
Shortcomings.
I time consuming – If simulation of the latent process is not
straightforward

ABC in a nutshell
()`(xj) ()
Methodology
Shortcomings.
I time consuming – If simulation of the latent process is not
straightforward
I curse of dimensionality vs. lost of information –
I If x lies in a high dimensional space X (often), we are unable to
estimate of the conditional density
I Hence, we project the (observed and simulated) datasets on a
space with smaller dimension (trough summary statistics)
: X ! Rd (summary statistics)

You went to school to learn, girl (. . . )
Why 2 plus 2 makes four
Now, now, now, I’m gonna teach you (. . . )
All you gotta do is repeat after me!
A, B, C!
It’s easy as 1, 2, 3!
Or simple as Do, re, mi! (. . . )
Surveys
I Beaumont (2010, Annual Review of Ecology, Evolution, and
Systematics)
I Lopes, Beaumont (2010, Genetics and Evolution)
I Csill ` ery, Blum, Gaggiotti, Franc¸ois (2010, Trends in Ecology and
Evolution)
I Marin, Pudlo, Robert, Ryder (to appear, Statis. and Comput.)

Table of contents

Empirical likelihood (EL)
Owen (1988, Biometrika),
Owen (2001, Chapman
Hall)
Data
Empirical
Likelihood
Parameters'
definition in
term of a
moment
constraint
x = (x1, ..., xK)
Profiled likelihood
x
xi
Black box
Input:
I Independent data
x = (x1, . . . , xK)
I Definition of the parameters
E(h(xi,)) = 0
Output:
I A profiled “likelihood function”
Lel(jx)
EL do not require a fully defined and
often complex (hence debatable)
parametric model

Owen (1988, Biometrika), Owen (2001, Chapman Hall)
Assume that the dataset x is composed of n independent replicates
x = (x1, . . . , xn) of some X F
Generalized moment condition model
The law F of X satisfy
EF

h(X,)

= 0,
where h is a known function, and an unknown parameter
Empirical likelihood
Lel(jx) = max
p
Yn
i=1
pi ()
for all p such that 0 6 pi 6 1,
P
pi = 1,
P
i pih(xi,) = 0.

Empirical likelihood
Lel(jx) = max
p
Yn
i=1
pi ()
for all p such that 0 6 pi 6 1,
P
pi = 1,
P
i pih(xi,) = 0.
Why ()?
(1) L(p) :=
Yn
i=1
pi is the likehood of
discrete distribution p =
P
i pixi
(2)
X
i
pih(xi,) = 0 implies that p
fulfils the moment condition
How to compute?
Newton-Lagrange scheme.
The constraint is linear (in p)
See Owen (2001) chap. 12
package emplik in R

EL examples
Data
Empirical
Likelihood
Parameters'
definition in
term of a
moment
constraint
x = (x1, ..., xK)
Profiled likelihood
x
xi
Simple case: = mean
h(xi,) = xi -
I Lel(jx) is maximal at
^
= 1
K(x1 + + xK)
I if enough data (K large),
if data from a true parametric
model,
then Lel true likelihood

EL examples
Data
Empirical
Likelihood
Parameters'
definition in
term of a
moment
constraint
x = (x1, ..., xK)
Profiled likelihood
x
xi
h(xi,) = xi -
red = param. likelihood
black = Lel
Example 1: Gaussian model
K = 20
1.0 1.5 2.0 2.5 3.0
0.0 0.2 0.4 0.6 0.8 1.0
theta
el.normal
K = 200
1.0 1.5 2.0 2.5 3.0
0.0 0.2 0.4 0.6 0.8 1.0
theta
el.normal

EL examples
Data
Empirical
Likelihood
Parameters'
definition in
term of a
moment
constraint
x = (x1, ..., xK)
Profiled likelihood
x
xi
red = true likelihood
blue = Gaussian likelihood
black = Lel
Example 2: non Gaussian model
K = 200
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
0.0 0.2 0.4 0.6 0.8 1.0
theta
el.normal

EL examples
Data
Empirical
Likelihood
Parameters'
definition in
term of a
moment
constraint
x = (x1, ..., xK)
Profiled likelihood
x
xi
red = true likelihood
blue = Gaussian likelihood
black = Lel
Example 2: non Gaussian model
K = 200
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
0.0 0.2 0.4 0.6 0.8 1.0
theta
el.normal
I Lel is better than a false model.

Table of contents

Neutral model at a given microsatellite locus, in a
closed panmictic population at equilibrium
Sample of 8 genes

Kingman’s genealogy
When time axis is
normalized,
T(k) Exp(k(k - 1)=2)

When time axis is
normalized,
T(k) Exp(k(k - 1)=2)
Mutations according to
the Simple stepwise
Mutation Model (SMM)
date of the mutations
Poisson process with
intensity =2 over the
branches

Observations: leafs of the tree
^ = ?
When time axis is
normalized,
T(k) Exp(k(k - 1)=2)
Mutations according to
the Simple stepwise
Mutation Model (SMM)
date of the mutations
Poisson process with
intensity =2 over the
branches
MRCA = 100
independent mutations:
1 with pr. 1=2

Much more interesting models. . .
I several independent locus
Independent gene genealogies and mutations
I different populations
linked by an evolutionary scenario made of divergences,
admixtures, migrations between populations, etc.
I larger sample size
usually between 50 and 100 genes
A typical evolutionary scenario:
MRCA
POP 0 POP 1 POP 2
2
1

Moment condition in population genetics?
Main difficulty
Derive a constraint
EF

h(X,)

= 0,
on the parameters of interest when X is the genotypes of the sample
of individuals at a given locus
E.g., in phylogeography, is composed of
I dates of divergence between populations,
I ratio of population sizes,
I mutation rates, etc.
None of them are moments of the distribution of the allelic states of the
sample

Moment condition in population genetics?
Main difficulty
Derive a constraint
EF

h(X,)

= 0,
on the parameters of interest when X is the genotypes of the sample
of individuals at a given locus
E.g., in phylogeography, is composed of
I dates of divergence between populations,
I ratio of population sizes,
I mutation rates, etc.
None of them are moments of the distribution of the allelic states of the
sample
,! h = pairwise composite scores whose zero is the pairwise
maximum likelihood estimator

Pairwise composite likelihood?
The intra-locus pairwise likelihood
`2(xkj) =
Y
ij
k, xj
`2(xi
kj)
with x1
k, . . . , xnk
: allelic states of the gene sample at the k-th locus
Sample of size 2
) coalescent tree is a single line joining xi
k to xj
k
k, xj
) closed form of `2(xi
kj)
The pairwise score function
r log `2(xkj) =
X
ij
k, xj
r log `2(xi
kj)

Composite lik , approximation of lik
Composite likelihoods are often much more narrow than the
distribution of the model (See, e.g., Cooley et al., 2012)
Hence,
I produces too narrow posterior distributions
I confidence intervals or credibility intervals are not reliable
But,
E(r log `2(xj)) =
Z
r

log `2(xj)

`(xj)dx
=
X
ij
Z
r

log `2(xi, xjj)

`2(xi, xjj)dxidxj
=
X
ij
Z
r`2(xi, xjj)
`2(xi, xjj)
`2(xi, xjj)dxidxj
=
X
ij
r
Z
`2(xi, xjj)dxidxj

= 0
SPaufdelo (wINRitAh UE. MLonbtpeellicera2)use we only AuBsCeel GpenoPospition of its mode SSB 09/2012 19 / 29

Pairwise likelihood: a simple case
Assumptions
I sample closed, panmictic
population at equilibrium
I marker: microsatellite
I mutation rate: =2
k et xj
if xi
k are two genes of the
sample,
`2(xi
k, xj
kj) depends only on
k - xj
= xi
k

Assumptions
I mutation rate: =2
k et xj
if xi
sample,
`2(xi
k, xj
kj) depends only on
k - xj
= xi
k
`2(j) =
1
p
1 + 2
()jj
with
() =

1 + +
p
1 + 2

Assumptions
I mutation rate: =2
k et xj
if xi
sample,
`2(xi
k, xj
kj) depends only on
k - xj
= xi
k
`2(j) =
1
p
1 + 2
()jj
with
() =

1 + +
p
1 + 2
Pairwise score function
@ log `2(j) =
1
-
1 + 2
+
jj

p
1 + 2

Pairwise likelihood: 2 diverging populations
MRCA
POP a POP b

Assumptions
I : divergence date of pop.
a and b
I =2: mutation rate
Let xi
k and xj
k be two genes
coming resp. from pop. a and b
Set = xi
k - xj
k.
Then `2(j, ) =
e-
p
1 + 2
X+1
k=-1
()jkjI-k().
where
In(z) nth-order modified Bessel
function of the first kind

Pairwise likelihood: 2 diverging populations
MRCA
POP a POP b

Assumptions
I : divergence date of pop.
a and b
I =2: mutation rate
Let xi
k and xj
k be two genes
coming resp. from pop. a and b
Set = xi
k - xj
k.
A 2-dim score function
@ log `2(j, ) =
- +

2
`2( - 1j, ) + `2( + 1j, )
`2(j, )
@ log `2(j, ) =
1
- -
1 + 2
+

2
`2( - 1j, ) + `2( + 1j, )
`2(j, )
+
q(j, )
`2(j, )
where
q(j, ) :=
e-
p
1 + 2
0()
()
1X
k=-1
jkj()jkjI-k()

Table of contents

A first experiment
Evolutionary scenario:
MRCA
POP 0 POP 1

Dataset:
I 50 genes per populations,
I 100 microsat. loci
Assumptions:
I Ne identical over all
populations
I = (log10 , log10 )
I uniform prior over
(-1., 1.5) (-1., 1.)

A first experiment
MRCA
POP 0 POP 1

Dataset:
Assumptions:
populations
I = (log10 , log10 )
I uniform prior over
(-1., 1.5) (-1., 1.)
Comparison of the original
ABC with ABCel
ESS=7034
log(theta)
Density
0.00 0.05 0.10 0.15 0.20 0.25
0 5 10 15
log(tau1)
Density
−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3
0 1 2 3 4 5 6 7
histogram = ABCel
curve = original ABC
vertical line = “true” parameter

ABC vs. ABCel on 100 replicates of the 1st experiment
Accuracy:
log10 log10
ABC ABCel ABC ABCel
(1) 0.097 0.094 0.315 0.117
(2) 0.071 0.059 0.272 0.077
(3) 0.68 0.81 1.0 0.80
(1) Root Mean Square Error of the posterior mean
(2) Median Absolute Deviation of the posterior median
(3) Coverage of the credibility interval of probability 0.8
Computation time: on a recent 6-core computer (C++/OpenMP)
I ABC 4 hours
I ABCel 2 minutes

Second experiment
MRCA
POP 0 POP 1 POP 2
2
1
Dataset:
Assumptions:
populations
I = log10(, 1, 2)
I non-informative uniform prior

Second experiment
MRCA
POP 0 POP 1 POP 2
2
1
Dataset:
Assumptions:
populations
I = log10(, 1, 2)
I non-informative uniform prior
Comparison of the original ABC
with ABCel
histogram = ABCel
curve = original ABC
vertical line = “true” parameter

ABC vs. ABCel on 100 replicates of the 2nd
experiment
Accuracy:
log10 log10 1 log10 2
ABC ABCel ABC ABCel ABC ABCel
(1) 0.0059 0.0794 0.472 0.483 29.3 4.76
(2) 0.048 0.053 0.32 0.28 4.13 3.36
(3) 0.79 0.76 0.88 0.76 0.89 0.79
(1) Root Mean Square Error of the posterior mean
(2) Median Absolute Deviation of the posterior median
(3) Coverage of the credibility interval of probability 0.8
Computation time: on a recent 6-core computer (C++/OpenMP)
I ABC 6 hours
I ABCel 8 minutes

Why?
On large datasets, ABCel gives more accurate results than ABC
ABC simplifies the dataset through summary statistics
Due to the large dimension of x, the original ABC algorithm estimates

(xobs)

,
where (xobs) is some (non-linear) projection of the observed dataset
on a space with smaller dimension
,! Some information is lost
ABCelsimplifies the model through a generalized moment condition
model.
,! Here, the moment condition model is based on pairwise
composition likelihood

Computing Bayesian posterior with empirical likelihood in population genetics

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (20)

Similar to Computing Bayesian posterior with empirical likelihood in population genetics

Similar to Computing Bayesian posterior with empirical likelihood in population genetics (20)

Recently uploaded

Recently uploaded (20)

Computing Bayesian posterior with empirical likelihood in population genetics