1. ABC methodology and applications
Christian P. Robert
Université Paris-Dauphine, University of Warwick, & IUF
École d'Hiver, Les Diablerets, CH, Feb. 4-8 2016
2. Outline
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests
6 ABC estimation via random forests
7 [some] asymptotics of ABC
3. A motivating if pedestrian example
paired and orphan socks
A drawer contains an unknown number of socks, some of which
can be paired and some of which are orphans (single). One takes
at random 11 socks without replacement from this drawer: no pair
can be found among those. What can we infer about the total
number of socks in the drawer?
• sounds like an impossible task
• one observation x = 11 and two unknowns, nsocks and npairs
• writing the likelihood is a challenge [exercise]
5. Feller’s shoes
A closet contains n pairs of shoes. If 2r shoes are chosen
at random (with 2r < n), what is the probability that
there will be (a) no complete pair, (b) exactly one
complete pair, (c) exactly two complete pairs among
them?
[Feller, 1970, Chapter II, Exercise 26]
Resolution as
$p_j = \binom{n}{j}\, 2^{2r-2j} \binom{n-j}{2r-2j} \Big/ \binom{2n}{2r}$
being the probability of obtaining j pairs among those 2r shoes, or, for an odd number t of shoes,
$p_j = 2^{t-2j} \binom{n}{j} \binom{n-j}{t-2j} \Big/ \binom{2n}{t}$
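As a sanity check on the resolution above, here is a minimal Python sketch (the slides themselves use R). The function name `feller_pairs` and its argument order are illustrative choices, not part of the original material; `m` stands for the number of shoes drawn, even or odd.

```python
from fractions import Fraction
from math import comb


def feller_pairs(n, m, j):
    """Probability of exactly j complete pairs when m shoes are drawn
    at random, without replacement, from a closet holding n pairs."""
    if j < 0 or 2 * j > m or m > 2 * n:
        return Fraction(0)
    # choose the j complete pairs, then m - 2j further pairs that each
    # contribute a single shoe (left or right: factor 2 per pair)
    favourable = comb(n, j) * comb(n - j, m - 2 * j) * 2 ** (m - 2 * j)
    return Fraction(favourable, comb(2 * n, m))


# the probabilities over j form a distribution
total = sum(feller_pairs(10, 7, j) for j in range(0, 4))
```

Exact fractions are used so that the probabilities can be checked to sum to one without rounding error.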
If one draws 11 socks out of m socks made of f orphans and g pairs, with f + 2g = m, the number k of socks from the orphan group is hypergeometric H(11, m, f) and the probability to observe 11 orphan (unmatched) socks in total is
$\sum_{k=0}^{11} \frac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}} \times \frac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$
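The sum above can be evaluated exactly. The following Python sketch is only an illustration: the name `prob_no_pair` and the `n_draw` argument (11 in the sock example) are assumptions made here. The factor $\binom{2g}{11-k}$ cancels between the two ratios, which the code exploits.

```python
from fractions import Fraction
from math import comb


def prob_no_pair(f, g, n_draw=11):
    """Probability that n_draw socks drawn without replacement from
    f orphans and g pairs contain no complete pair."""
    m = f + 2 * g
    if n_draw > m:
        return Fraction(0)
    favourable = 0
    for k in range(n_draw + 1):
        # k socks from the orphans, n_draw - k socks from distinct
        # pairs, each of those pairs giving one of its two socks
        favourable += comb(f, k) * comb(g, n_draw - k) * 2 ** (n_draw - k)
    return Fraction(favourable, comb(m, n_draw))


example = prob_no_pair(2, 3, n_draw=3)
```

For instance, with 2 orphans and 3 pairs, 18 of the 56 possible draws of 3 socks contain a pair, so `example` is 38/56.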
8. A prioris on socks
Given parameters $n_{socks}$ and $n_{pairs}$, set of socks
$S = \{s_1, s_1, \ldots, s_{n_{pairs}}, s_{n_{pairs}}, s_{n_{pairs}+1}, \ldots, s_{n_{socks}}\}$
and 11 socks picked at random from S give X unique socks.
Rasmus' reasoning
If you are a family of 3-4 persons then a guesstimate would be that
you have something like 15 pairs of socks in store. It is also
possible that you have much more than 30 socks. So as a prior for
$n_{socks}$ I'm going to use a negative binomial with mean 30 and
standard deviation 15.
On $2 n_{pairs}/n_{socks}$ I'm going to put a Beta prior distribution that puts
most of the probability over the range 0.75 to 1.0.
[Rasmus Bååth's Research Blog, Oct 20th, 2014]
10. Simulating the experiment
Given a prior distribution on $n_{socks}$ and $n_{pairs}$,
$n_{socks} \sim \mathcal{N}eg(30, 15)$, $n_{pairs} \mid n_{socks} \sim (n_{socks}/2)\, \mathcal{B}e(15, 2)$
possible to
1 generate new values of $n_{socks}$ and $n_{pairs}$,
2 generate a new observation of X, number of unique socks out of 11.
3 accept the pair ($n_{socks}$, $n_{pairs}$) if the realisation of X is equal to 11
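The three steps can be sketched as follows, in Python with the standard library only. This is a hedged illustration rather than the code behind the slides: the function names are made up here, the negative binomial is drawn as a Gamma-Poisson mixture matched to mean 30 and standard deviation 15, the rounding of the number of pairs is an assumption, and X is taken to be the number of distinct sock labels among the 11 picked.

```python
import math
import random


def rneg_binom(mean, sd, rng):
    """Negative binomial draw with given mean and sd (sd**2 > mean),
    obtained as a Poisson draw with a Gamma-distributed rate."""
    size = mean ** 2 / (sd ** 2 - mean)
    lam = rng.gammavariate(size, mean / size)  # shape, scale
    # Knuth's multiplication method for a Poisson(lam) draw
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_socks(rng, n_picked=11):
    """Steps 1 and 2: draw (n_socks, n_pairs) from the prior, then X."""
    n_socks = rneg_binom(30, 15, rng)
    prop_pairs = rng.betavariate(15, 2)
    n_pairs = round((n_socks // 2) * prop_pairs)  # assumed rounding
    n_odd = n_socks - 2 * n_pairs
    # paired socks share a label, orphans have a label of their own
    socks = list(range(n_pairs)) * 2 + list(range(n_pairs, n_pairs + n_odd))
    picked = rng.sample(socks, min(n_picked, n_socks))
    return n_socks, n_pairs, len(set(picked))


def abc_socks(n_sim, seed=0, observed=11):
    """Step 3: keep the prior draws whose simulated X equals 11."""
    rng = random.Random(seed)
    draws = (simulate_socks(rng) for _ in range(n_sim))
    return [(ns, npairs) for ns, npairs, x in draws if x == observed]


accepted = abc_socks(5000, seed=1)
```

The accepted pairs in `accepted` form a sample from the distribution of ($n_{socks}$, $n_{pairs}$) given X = 11, under the prior assumed in the sketch.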
12. Meaning
[Figure: density of the accepted values of ns, x-axis from 0 to 60]
The outcome of this simulation method returns a distribution on
the pair ($n_{socks}$, $n_{pairs}$) that is the conditional distribution of the
pair given the observation X = 11
Proof: Generations from $\pi(n_{socks}, n_{pairs})$ are accepted with probability
$P\{X = 11 \mid (n_{socks}, n_{pairs})\}$
Proof (continued): hence accepted values are distributed from
$\pi(n_{socks}, n_{pairs}) \times P\{X = 11 \mid (n_{socks}, n_{pairs})\} \propto \pi(n_{socks}, n_{pairs} \mid X = 11)$
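The normalisation step left implicit in the proof can be written out as below. The shorthand $(n_s, n_p)$ for ($n_{socks}$, $n_{pairs}$) is introduced here only to keep the display short.

```latex
\begin{align*}
P\{\text{accepted value} = (n_s, n_p)\}
  &= \frac{\pi(n_s, n_p)\, P\{X = 11 \mid (n_s, n_p)\}}
          {\sum_{(n_s', n_p')} \pi(n_s', n_p')\, P\{X = 11 \mid (n_s', n_p')\}} \\
  &= \frac{\pi(n_s, n_p)\, P\{X = 11 \mid (n_s, n_p)\}}{P\{X = 11\}}
   = \pi(n_s, n_p \mid X = 11)
\end{align*}
```

The denominator is the overall acceptance probability, which does not depend on the value being accepted.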
14. Econ’ections
15. Usages of simulation in Econometrics
Similar exploration of simulation-based techniques in Econometrics
• Simulated method of moments
• Method of simulated moments
• Simulated pseudo-maximum-likelihood
• Indirect inference
[Gouriéroux & Monfort, 1996]
16. Simulated method of moments
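Given observations $y^o_{1:n}$ from a model
$y_t = r(y_{1:(t-1)}, \epsilon_t, \theta)$, $\epsilon_t \sim g(\cdot)$
simulate $\epsilon^\star_{1:n}$, derive
$y_t(\theta) = r(y_{1:(t-1)}, \epsilon^\star_t, \theta)$
and estimate θ by
$\arg\min_\theta \sum_{t=1}^n (y^o_t - y_t(\theta))^2$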
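To make the path-wise criterion concrete, here is a minimal Python sketch on an AR(1) model, $r(y_{1:(t-1)}, \epsilon_t, \theta) = \theta y_{t-1} + \epsilon_t$ with standard normal noise. The model, the grid search and all function names are assumptions chosen for illustration; they do not come from the slides.

```python
import random


def simulate_ar1(theta, eps):
    """Generate y_t = theta * y_{t-1} + eps_t, starting from y_0 = 0."""
    y, prev = [], 0.0
    for e in eps:
        prev = theta * prev + e
        y.append(prev)
    return y


def smm_criterion(theta, y_obs, eps_star):
    """Sum over t of (y_obs[t] - y_t(theta))**2, where y_t(theta) is
    rebuilt from the observed past and the simulated noise eps_star."""
    total = 0.0
    for t in range(1, len(y_obs)):
        y_sim = theta * y_obs[t - 1] + eps_star[t]
        total += (y_obs[t] - y_sim) ** 2
    return total


def smm_estimate(y_obs, eps_star, grid):
    """Grid-search minimiser of the criterion."""
    return min(grid, key=lambda th: smm_criterion(th, y_obs, eps_star))


rng = random.Random(42)
n = 2000
y_obs = simulate_ar1(0.6, [rng.gauss(0, 1) for _ in range(n)])
eps_star = [rng.gauss(0, 1) for _ in range(n)]  # simulated once, then fixed
grid = [i / 100 for i in range(-99, 100)]
theta_hat = smm_estimate(y_obs, eps_star, grid)
```

The simulated noise is drawn once and reused for every value of θ, so that the criterion is a smooth function of θ rather than a noisy one.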
17. Simulated method of moments
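Given observations $y^o_{1:n}$ from a model
$y_t = r(y_{1:(t-1)}, \epsilon_t, \theta)$, $\epsilon_t \sim g(\cdot)$
simulate $\epsilon^\star_{1:n}$, derive
$y_t(\theta) = r(y_{1:(t-1)}, \epsilon^\star_t, \theta)$
and estimate θ by
$\arg\min_\theta \Big\{ \sum_{t=1}^n y^o_t - \sum_{t=1}^n y_t(\theta) \Big\}^2$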
18. Method of simulated moments
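Given a statistic vector K(y) with
$E_\theta[K(Y_t) \mid y_{1:(t-1)}] = k(y_{1:(t-1)}; \theta)$
find an unbiased estimator of $k(y_{1:(t-1)}; \theta)$,
$\tilde k(\epsilon_t, y_{1:(t-1)}; \theta)$
Estimate θ by
$\arg\min_\theta \sum_{t=1}^n \Big[ K(y_t) - \sum_{s=1}^S \tilde k(\epsilon^s_t, y_{1:(t-1)}; \theta)/S \Big]$
[Pakes & Pollard, 1989]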
19. Indirect inference
Minimise (in θ) the distance between estimators ˆβ based on
pseudo-models for genuine observations and for observations
simulated under the true model and the parameter θ.
[Gouriéroux, Monfort, & Renault, 1993;
Smith, 1993; Gallant & Tauchen, 1996]
20. Indirect inference (PML vs. PSE)
Example of the pseudo-maximum-likelihood (PML)
$\hat\beta(y) = \arg\max_\beta \sum_t \log f(y_t \mid \beta, y_{1:(t-1)})$
leading to
$\arg\min_\theta \|\hat\beta(y^o) - \hat\beta(y_1(\theta), \ldots, y_S(\theta))\|^2$
when
$y_s(\theta) \sim f(y \mid \theta)$, $s = 1, \ldots, S$
21. Indirect inference (PML vs. PSE)
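Example of the pseudo-score-estimator (PSE)
$\hat\beta(y) = \arg\min_\beta \Big\{ \sum_t \frac{\partial \log f}{\partial \beta}(y_t \mid \beta, y_{1:(t-1)}) \Big\}^2$
leading to
$\arg\min_\theta \|\hat\beta(y^o) - \hat\beta(y_1(\theta), \ldots, y_S(\theta))\|^2$
when
$y_s(\theta) \sim f(y \mid \theta)$, $s = 1, \ldots, S$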
22. Consistent indirect inference
...in order to get a unique solution the dimension of
the auxiliary parameter β must be larger than or equal to
the dimension of the initial parameter θ. If the problem is
just identified the different methods become easier...
Consistency depending on the criterion and on the asymptotic
identifiability of θ
[Gouriéroux & Monfort, 1996, p. 66]
24. AR(2) vs. MA(1) example
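true (MA) model
$y_t = \epsilon_t - \theta \epsilon_{t-1}$
and [wrong!] auxiliary (AR) model
$y_t = \beta_1 y_{t-1} + \beta_2 y_{t-2} + u_t$
R code
# observed MA(1) series with theta = 0.5
eps <- rnorm(250)
x <- c(eps[1], eps[2:250] - 0.5 * eps[1:249])
# noise simulated once and reused for every candidate theta
simeps <- rnorm(250)
propeta <- seq(-0.99, 0.99, length.out = 199)
dist <- rep(0, 199)
# AR(2) auxiliary estimate on the observed series
bethat <- as.vector(arima(x, order = c(2, 0, 0), include.mean = FALSE)$coef)
for (t in 1:199) {
  # MA(1) series simulated under theta = propeta[t]
  xsim <- c(simeps[1], simeps[2:250] - propeta[t] * simeps[1:249])
  betsim <- as.vector(arima(xsim, order = c(2, 0, 0), include.mean = FALSE)$coef)
  dist[t] <- sum((betsim - bethat)^2)
}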
25. AR(2) vs. MA(1) example
One sample:
[Figure: distance as a function of θ, for θ between −1 and 1]
26. AR(2) vs. MA(1) example
Many samples:
[Figure: horizontal axis from 0.2 to 1.0, vertical axis from 0 to 6]
27. Choice of pseudo-model
Pick model such that
1 $\hat\beta(\theta)$ not flat (i.e. sensitive to changes in θ)
2 $\hat\beta(\theta)$ not dispersed (i.e. robust against changes in $y_s(\theta)$)
[Frigessi & Heggland, 2004]
28. ABC using indirect inference (1)
We present a novel approach for developing summary statistics
for use in approximate Bayesian computation (ABC) algorithms by
using indirect inference(...) In the indirect inference approach to
ABC the parameters of an auxiliary model fitted to the data become
the summary statistics. Although applicable to any ABC technique,
we embed this approach within a sequential Monte Carlo algorithm
that is completely adaptive and requires very little tuning(...)
[Drovandi, Pettitt & Faddy, 2011]
Indirect inference provides summary statistics for ABC...
29. ABC using indirect inference (2)
...the above result shows that, in the limit as h → 0, ABC will
be more accurate than an indirect inference method whose auxiliary
statistics are the same as the summary statistic that is used for
ABC(...) Initial analysis showed that which method is more
accurate depends on the true value of θ.
[Fearnhead and Prangle, 2012]
Indirect inference provides estimates rather than global inference...
30. Genetics of ABC
31. Genetic background of ABC
ABC is a recent computational technique that only requires a
generative model, i.e., being able to sample from the density $f(\cdot \mid \theta)$.
This technique stemmed from population genetics models, about
15 years ago, and population geneticists still contribute
significantly to methodological developments of ABC.
[Griffith & al., 1997; Tavaré & al., 1999]
32. Population genetics
[Part derived from the teaching material of Raphael Leblois, ENS Lyon, November 2010]
• Describe the genotypes, estimate the allele frequencies,
determine their distribution among individuals, within populations
and between populations;
• Predict and understand the evolution of gene frequencies in
populations as a result of various factors.
Analyses the effect of various evolutionary forces (mutation, drift,
migration, selection) on the evolution of gene frequencies in time
and space.
33. Wright-Fisher model
• A population of constant
size, in which individuals
reproduce at the same time.
• Each gene in a generation is
a copy of a gene of the
previous generation.
• In the absence of mutation
and selection, allele
frequencies drift (increase
and decrease) inevitably
until the fixation of an
allele.
• Drift thus leads to the loss
of genetic variation within
populations.
34. Coalescent theory
[Kingman, 1982; Tajima, Tavaré, etc.]
Coalescent theory is interested in the genealogy of a sample of
genes, traced back in time to the common ancestor of the sample.
35. Common ancestor
[Figure: genealogy of a sample of genes, with the time of coalescence (T) on the vertical axis]
Modelling of the genetic drift process by "going back in time"
to the common ancestor of a sample of genes.
The different lineages merge (coalesce) as we go back into the past.
36. Neutral mutations
• Under the assumption of
neutrality of the genetic markers under study, the mutations
are independent of the
genealogy, i.e. the genealogy depends only on the
demographic processes.
• We construct the genealogy
according to the
demographic parameters (e.g. N),
then we add a posteriori the
mutations on the different branches, from the MRCA
to the leaves of the tree.
• This produces polymorphism data under the
demographic and mutational models considered.
37. Neutral model at a given microsatellite locus, in a closed
panmictic population at equilibrium
Kingman's genealogy
When the time axis is normalized,
$T(k) \sim \mathcal{E}xp(k(k-1)/2)$
Mutations according to the Simple stepwise Mutation Model (SMM)
• dates of the mutations ∼ Poisson process with intensity θ/2 over the
branches
• MRCA = 100
• independent mutations: ±1 with pr. 1/2
Observations: leaves of the tree
$\hat\theta = ?$
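The generative model of this slide (Kingman's genealogy, then stepwise mutations dropped on the branches from an MRCA valued at 100) can be sketched as follows. The function names and the data structures are illustrative assumptions; only the distributions come from the slide.

```python
import math
import random


def rpoisson(lam, rng):
    """Poisson(lam) draw by Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_smm_sample(n, theta, rng, mrca=100):
    """Allele sizes at the n leaves of a Kingman coalescent tree under
    the stepwise mutation model with mutation intensity theta / 2."""
    active = list(range(n))            # lineages still to coalesce
    height = {i: 0.0 for i in active}  # time of each node, leaves at 0
    children = {}
    next_id, time = n, 0.0
    while len(active) > 1:
        k = len(active)
        time += rng.expovariate(k * (k - 1) / 2)  # T(k) ~ Exp(k(k-1)/2)
        a, b = rng.sample(active, 2)              # pair that coalesces
        active.remove(a)
        active.remove(b)
        children[next_id] = (a, b)
        height[next_id] = time
        active.append(next_id)
        next_id += 1
    root = active[0]
    value = {root: mrca}
    stack = [root]
    while stack:                       # walk from the MRCA to the leaves
        node = stack.pop()
        for child in children.get(node, ()):
            length = height[node] - height[child]
            n_mut = rpoisson(theta * length / 2, rng)
            steps = sum(rng.choice((-1, 1)) for _ in range(n_mut))
            value[child] = value[node] + steps
            stack.append(child)
    return [value[i] for i in range(n)]


sample = simulate_smm_sample(10, 2.0, random.Random(5))
```

Being able to produce such a sample for any θ is all that ABC requires from the model.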
40. Much more interesting models. . .
• several independent loci
Independent gene genealogies and mutations
• different populations
linked by an evolutionary scenario made of divergences,
admixtures, migrations between populations, selection
pressure, etc.
• larger sample size
usually between 50 and 100 genes
41. Available population scenarios
Between populations: three types of events, backward in time
• the divergence is the fusion between two populations,
• the admixture is the split of a population into two parts,
• the migration allows the move of some lineages of a
population to another.
[Figure 2.2: Example of the genealogy of five individuals from a single closed population at equilibrium. The sampled individuals are the leaves of the dendrogram; the inter-coalescence times T2, . . . , T5 are independent, and Tk is exponential with parameter k(k − 1)/2.]
[Figure 2.3: Graphical representations of the three types of inter-population events of a demographic scenario: (a) divergence, (b) admixture, with rates r and 1 − r, (c) migration, with rates m12 and m21. There are two families of inter-population events. The first family is simple and corresponds to instantaneous inter-population events, as is the case for a divergence or an admixture. (a) Two populations that evolve and then merge, in the case of a divergence. (b) Three populations that evolve in parallel, for an admixture. In this situation, each tube represents (one may imagine that it carries inside) the genealogy of the population, which evolves independently of the ...]
42. A complex scenario
The goal is to discriminate between different population scenarios
from a dataset of polymorphism (DNA sample) y observed at the
present time.
[Figure 2.1: Example of a complex evolutionary scenario made of inter-population events (divergence, admixture, migration), involving populations Pop1 to Pop6 with effective sizes Ne1 to Ne6, event times t1 to t5, admixture rates r and s, and migration rates m and m'.]
43. Demo-genetic inference
Each model is characterized by a set of parameters θ that cover
historical (divergence times, admixture times, ...), demographic
(population sizes, admixture rates, migration rates, ...) and genetic
(mutation rate, ...) factors.
The goal is to estimate these parameters from a dataset of
polymorphism (DNA sample) y observed at the present time.
Problem: most of the time, we cannot calculate the likelihood of
the polymorphism data $f(y \mid \theta)$.
44. Intractable likelihood
Missing (too missing!) data structure:
$f(y \mid \theta) = \int_{\mathcal{G}} f(y \mid G, \theta)\, f(G \mid \theta)\, dG$
The genealogies are considered as nuisance parameters.
This problem thus differs from the phylogenetic approach
where the tree is the parameter of interest.
45. A genuine example of application
Pygmy populations: do they have a common origin? Are there
many exchanges between pygmy and non-pygmy populations?
47. Simulation results
Several possible scenarios, with choice of scenario by ABC.
Scenario 1a is largely supported over the others, which argues for a
common origin of the pygmy populations of West Africa.
[Verdu et al.]
Scenario 1A is chosen.
49. Instance of ecological questions [message in a beetle]
• How did the Asian ladybird
beetle arrive in Europe?
• Why do they swarm right
now?
• What are the routes of
invasion?
• How to get rid of them?
• Why did the chicken cross
the road?
[Lombaert & al., 2010, PLoS ONE]
beetles in forests
50. Worldwide invasion routes of Harmonia Axyridis
For each outbreak, the arrow indicates the most likely invasion
pathway and the associated posterior probability, with 95% credible
intervals in brackets
[Estoup et al., 2012, Molecular Ecology Res.]
52. A population genetic illustration of ABC model choice
Two populations (1 and 2) having diverged at a fixed known time
in the past and third population (3) which diverged from one of
those two populations (models 1 and 2, respectively).
Observation of 50 diploid individuals/population genotyped at 5,
50 or 100 independent microsatellite loci.
Stepwise mutation model: the number of repeats of the mutated
gene increases or decreases by one. Mutation rate µ common to all
loci set to 0.005 (single parameter) with uniform prior distribution
µ ∼ U[0.0001, 0.01]
54. A population genetic illustration of ABC model choice
Summary statistics associated to the $(\delta\mu)^2$ distance
$x_{l,i,j}$ repeat number of the allele at locus l = 1, . . . , L for individual
i = 1, . . . , 100 within the population j = 1, 2, 3. Then
$(\delta\mu)^2_{j_1,j_2} = \frac{1}{L} \sum_{l=1}^{L} \Big( \frac{1}{100} \sum_{i_1=1}^{100} x_{l,i_1,j_1} - \frac{1}{100} \sum_{i_2=1}^{100} x_{l,i_2,j_2} \Big)^2$
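The distance is straightforward to compute from a table of repeat numbers. In the Python sketch below, the data layout (one dictionary per locus, mapping a population label to the list of repeat numbers of its gene copies) and the name `delta_mu_sq` are assumptions made for illustration; the toy data has two copies per population instead of 100.

```python
def delta_mu_sq(x, j1, j2):
    """Squared difference between the mean repeat numbers of populations
    j1 and j2, averaged over the loci in x."""
    total = 0.0
    for locus in x:
        mean1 = sum(locus[j1]) / len(locus[j1])
        mean2 = sum(locus[j2]) / len(locus[j2])
        total += (mean1 - mean2) ** 2
    return total / len(x)


# toy data: L = 2 loci, 3 populations, 2 gene copies per population
x = [
    {1: [10, 12], 2: [11, 11], 3: [14, 16]},
    {1: [20, 20], 2: [22, 24], 3: [20, 20]},
]
d12 = delta_mu_sq(x, 1, 2)  # (0**2 + 3**2) / 2
d13 = delta_mu_sq(x, 1, 3)  # (4**2 + 0**2) / 2
```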
55. A population genetic illustration of ABC model choice
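For two copies of locus l with allele sizes $x_{l,i,j_1}$ and $x_{l,i',j_2}$, most
recent common ancestor at coalescence time $\tau_{j_1,j_2}$, gene genealogy
distance of $2\tau_{j_1,j_2}$, hence number of mutations Poisson with
parameter $2\mu\tau_{j_1,j_2}$. Therefore,
$E\{(x_{l,i,j_1} - x_{l,i',j_2})^2 \mid \tau_{j_1,j_2}\} = 2\mu\tau_{j_1,j_2}$
and, with t the divergence time of populations 1 and 2 and t' the divergence time of population 3,
                        Model 1      Model 2
$E\{(\delta\mu)^2_{1,2}\}$    $2\mu_1 t$     $2\mu_2 t$
$E\{(\delta\mu)^2_{1,3}\}$    $2\mu_1 t'$    $2\mu_2 t$
$E\{(\delta\mu)^2_{2,3}\}$    $2\mu_1 t$     $2\mu_2 t'$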
56. A population genetic illustration of ABC model choice
Thus,
• Bayes factor based only on distance $(\delta\mu)^2_{1,2}$ not convergent: if
$\mu_1 = \mu_2$, same expectation
• Bayes factor based only on distance $(\delta\mu)^2_{1,3}$ or $(\delta\mu)^2_{2,3}$ not
convergent: if $\mu_1 = 2\mu_2$ or $2\mu_1 = \mu_2$, same expectation
• if two of the three distances are used, Bayes factor converges:
there is no $(\mu_1, \mu_2)$ for which all expectations are equal
57. A population genetic illustration of ABC model choice
[Figure: boxplots for 5, 50 and 100 loci, vertical axis from 0 to 1, in three panels: DM2(12); DM2(13); DM2(13) & DM2(23)]
Posterior probabilities that the data is from model 1 for 5, 50
and 100 loci