ABC: convergence and misspecification
Christian P. Robert
Université Paris-Dauphine PSL, Paris & University of Warwick, Coventry
Joint work with D. Frazier, G. Martin, and J. Rousseau
Outline
motivating example
Approximate Bayesian computation
ABC for model choice
[some] asymptotics of ABC
ABC under misspecification
A motivating if pedestrian example
paired and orphan socks
A drawer contains an unknown number of socks, some of which
can be paired and some of which are orphans (single). One takes
at random 11 socks without replacement from this drawer: no pair
can be found among those. What can we infer about the total
number of socks in the drawer?
sounds like an impossible task
one observation x = 11 and two unknowns, nsocks and npairs
writing the likelihood is a challenge [exercise]
Feller’s shoes
A closet contains n pairs of shoes. If 2r shoes are chosen
at random (with 2r < n), what is the probability that
there will be (a) no complete pair, (b) exactly one
complete pair, (c) exactly two complete pairs among
them?
[Feller, 1970, Chapter II, Exercise 26]
Resolution as
pj = C(n, j) 2^{2r−2j} C(n−j, 2r−2j) / C(2n, 2r)
being the probability of obtaining j pairs among those 2r shoes, or, for an odd number t of shoes,
pj = 2^{t−2j} C(n, j) C(n−j, t−2j) / C(2n, t)
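The resolution is easy to sanity-check by simulation; a minimal Python sketch for the even case, with arbitrary test values of n, r and j (the labelling of shoes by pair index is an implementation device, not part of the slide):

```python
import numpy as np
from math import comb

def p_pairs(n, r, j):
    # P(exactly j complete pairs among 2r shoes drawn from n pairs)
    return comb(n, j) * 2**(2*r - 2*j) * comb(n - j, 2*r - 2*j) / comb(2*n, 2*r)

rng = np.random.default_rng(0)
n, r, j = 10, 3, 1
shoes = np.repeat(np.arange(n), 2)          # 2n shoes, labelled by pair
hits = sum((np.unique(rng.choice(shoes, size=2*r, replace=False),
                      return_counts=True)[1] == 2).sum() == j
           for _ in range(100_000))
print(p_pairs(n, r, j), hits / 100_000)     # both values should agree closely
```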
If one draws 11 socks out of m socks made of f orphans and g pairs, with f + 2g = m, the number k of socks from the orphan group is hypergeometric H(11, m, f) and the probability to observe 11 distinct socks in total is
Σ_{k=0}^{11} [C(f, k) C(2g, 11−k) / C(m, 11)] × [2^{11−k} C(g, 11−k) / C(2g, 11−k)]
A prioris on socks
Given parameters nsocks and npairs, the set of socks
S = {s1, s1, . . . , s_{npairs}, s_{npairs}, s_{npairs+1}, . . . , s_{nsocks}}
and 11 socks picked at random from S give X unique socks.
Rasmus’ reasoning
If you are a family of 3-4 persons then a guesstimate would be that you have something like 15 pairs of socks in store. It is also possible that you have much more than 30 socks. So as a prior for nsocks I’m going to use a negative binomial with mean 30 and standard deviation 15.
On 2npairs/nsocks I’m going to put a Beta prior distribution that puts most of the probability over the range 0.75 to 1.0.
[Rasmus Bååth’s Research Blog, Oct 20th, 2014]
Simulating the experiment
Given a prior distribution on nsocks and npairs,
nsocks ∼ Neg(30, 15), npairs | nsocks ∼ ⌊nsocks/2⌋ Be(15, 2)
it is possible to
1. generate new values of nsocks and npairs,
2. generate a new observation of X, the number of unique socks out of 11,
3. accept the pair (nsocks, npairs) if the realisation of X is equal to 11
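A minimal Python sketch of this accept–reject scheme (Rasmus Bååth's original implementation is in R; the negative binomial size and probability below are derived from the stated mean 30 and standard deviation 15, and the Beta prior is placed on the paired proportion 2npairs/nsocks as on the previous slide):

```python
import numpy as np

rng = np.random.default_rng(1)
r = 900 / 195                      # NegBin size: solves mean 30, variance 225
p = r / (r + 30)                   # matching success probability

def prior_draw():
    n_socks = rng.negative_binomial(r, p)
    prop_pairs = rng.beta(15, 2)               # prior on 2*n_pairs/n_socks
    return n_socks, int(n_socks // 2 * prop_pairs)

def n_unique(n_socks, n_pairs, n_picked=11):
    # socks sharing a label form a pair; orphans get singleton labels
    labels = np.concatenate([np.repeat(np.arange(n_pairs), 2),
                             np.arange(n_pairs, n_socks - n_pairs)])
    return len(np.unique(rng.choice(labels, size=n_picked, replace=False)))

accepted = []
while len(accepted) < 1_000:
    n_socks, n_pairs = prior_draw()
    if n_socks >= 11 and n_unique(n_socks, n_pairs) == 11:   # X = 11: no pair
        accepted.append((n_socks, n_pairs))

print(np.mean([s for s, _ in accepted]))    # ABC posterior mean of n_socks
```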
Meaning
[Histogram of the accepted values of nsocks, density up to ≈ 0.06 over the range 0–60]
The outcome of this simulation method returns a distribution on the pair (nsocks, npairs) that is the conditional distribution of the pair given the observation X = 11
Proof: Generations from π(nsocks, npairs) are accepted with probability
P{X = 11 | (nsocks, npairs)}
hence accepted values are distributed from
π(nsocks, npairs) × P{X = 11 | (nsocks, npairs)} ∝ π(nsocks, npairs | X = 11)
Approximate Bayesian computation
motivating example
Approximate Bayesian computation
ABC basics
Automated summary selection
ABC for model choice
[some] asymptotics of ABC
ABC under misspecification
Intractable likelihoods
Cases when the likelihood function f(y|θ) is unavailable and when the completion step
f(y|θ) = ∫_Z f(y, z|θ) dz
is impossible or too costly because of the dimension of z
© MCMC cannot be implemented
The ABC method
Bayesian setting: target is π(θ)f(x|θ)
When likelihood f(x|θ) not in closed form, likelihood-free rejection technique:
ABC algorithm
For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating
θ′ ∼ π(θ), z ∼ f(z|θ′),
until the auxiliary variable z is equal to the observed value, z = y.
[Tavaré et al., 1997]
A as A...pproximative
When y is a continuous random variable, equality z = y is replaced with a tolerance condition,
ρ(y, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε)
[Pritchard et al., 1999]
ABC algorithm
Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f(·|θ′)
  until ρ{η(z), η(y)} ≤ ε
  set θi = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
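A minimal Python sketch of Algorithm 1 on a toy Normal-mean model; the model, prior and tolerance are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)     # observed sample (illustrative)
eta_y = y.mean()                      # summary statistic eta(y): sample mean
eps = 0.05                            # tolerance

theta_abc = []
for _ in range(10):                   # N accepted draws
    while True:
        theta = rng.normal(0.0, 10.0)            # theta' from the prior pi(.)
        z = rng.normal(theta, 1.0, size=50)      # z from the likelihood f(.|theta')
        if abs(z.mean() - eta_y) <= eps:         # rho{eta(z), eta(y)} <= eps
            theta_abc.append(theta)
            break
```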
Output
The likelihood-free algorithm samples from the marginal in z of:
π_ε(θ, z|y) = π(θ) f(z|θ) I_{A_{ε,y}}(z) / ∫_{A_{ε,y}×Θ} π(θ) f(z|θ) dz dθ ,
where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}.
The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
π_ε(θ|y) = ∫ π_ε(θ, z|y) dz ≈ π(θ|η(y)) .
Dogger Bank re-enactment
Battle of Dogger Bank on Jan 24, 1915, between British and
German fleets: how likely was the British victory?
[MacKay, Price, and Wood, 2016]
Dogger Bank re-enactment
Battle of Dogger Bank on Jan 24, 1915, between British and
German fleets: ABC simulation of posterior distribution
[MacKay, Price, and Wood, 2016]
ABC advances
Simulating from the prior is often poor in efficiency
Either modify the proposal distribution on θ to increase the density of x’s within the vicinity of y...
[Marjoram et al., 2003; Bortot et al., 2007; Beaumont et al., 2009]
...or view the problem as conditional density estimation and develop techniques to allow for a larger ε
[Beaumont et al., 2002; Blum & François, 2009]
...or even include ε in the inferential framework [ABCµ]
[Ratmann et al., 2009]
ABC consistency
Recent studies on large sample properties of ABC posterior distributions and ABC posterior means
[Li & Fearnhead, 2018a,b; Frazier et al., 2018]
Under regularity conditions on the summary statistics, incl. convergence at speed dT, characterisation of the rate of posterior concentration as a function of the tolerance convergence:
less stringent condition on the tolerance decrease than for asymptotic normality of the posterior;
asymptotic normality of the posterior mean does not require asymptotic normality of the posterior itself
Cases for limiting ABC distributions
1. dT εT −→ +∞;
2. dT εT −→ c;
3. dT εT −→ 0
and limiting ABC mean convergent for ε²_T = o(1/dT)
[Frazier et al., 2018]
Noisily exact ABC
Idea: Modify the data from the start
ỹ = y⁰ + εζ₁
with ζ₁ a perturbation with the same scale ε as ABC
[see Fearnhead-Prangle]
run ABC on ỹ
Then ABC produces an exact simulation from π(θ|ỹ)
[Dean et al., 2011; Fearnhead and Prangle, 2012]
Consistent noisy ABC
Degrading the data improves the estimation performances:
Noisy ABC-MLE is asymptotically (in n) consistent
under further assumptions, the noisy ABC-MLE is asymptotically normal
increase in variance of order ε⁻²
likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
Semi-automatic ABC
Fearnhead and Prangle (2012) study ABC and the selection of the summary statistic
ABC then considered from a purely inferential viewpoint and calibrated for estimation purposes
Use of a randomised (or ‘noisy’) version of the summary statistics
η̃(y) = η(y) + τζ
Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic [calibration constraint: ABC approximation with same posterior mean as the true randomised posterior]
Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistic η(y)!
Fully automatic ABC
Implementation of ABC still requires input of collection of
summaries
Towards automation
statistical projection techniques (LDA, PCA, NP-GLS, &tc.)
variable selection
machine learning approaches
bypassing summaries altogether
ABC with Wasserstein distance
Use as distance between simulated and observed samples the Wasserstein distance:
Wp(y1:n, z1:n)^p = inf_{σ∈Sn} (1/n) Σ_{i=1}^n ρ(yi, z_{σ(i)})^p ,  (1)
covers well- and mis-specified cases
only depends on data space distance ρ(·, ·)
covers iid and dynamic models (curve matching)
computationally feasible (linear in dimension, cubic in sample size)
Hilbert curve approximation
[Bernton et al., 2017]
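For univariate samples of equal size the infimum over permutations in (1) is attained by matching order statistics, so the distance reduces to a sort; a minimal sketch (the multivariate case relies on the Hilbert curve ordering or assignment solvers, not shown here):

```python
import numpy as np

def wasserstein(y, z, p=1):
    # equal-size 1D samples: the optimal sigma pairs order statistics, so
    # Wp(y, z)^p = (1/n) sum_i |y_(i) - z_(i)|^p
    return np.mean(np.abs(np.sort(y) - np.sort(z)) ** p) ** (1 / p)

rng = np.random.default_rng(0)
y, z = rng.normal(0.0, 1.0, 100), rng.normal(0.5, 1.0, 100)
print(wasserstein(y, z))   # usable in place of d{eta(z), eta(y)} in ABC
```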
Consistent inference with Wasserstein distance
As ε → 0 [and n fixed]
If either
1. f_θ^{(n)} is n-exchangeable and D(y1:n, z1:n) = 0 if and only if z1:n = y_{σ(1:n)} for some σ ∈ Sn, or
2. D(y1:n, z1:n) = 0 if and only if z1:n = y1:n,
then, at y1:n fixed, the ABC posterior converges strongly to the posterior as ε → 0.
[Bernton et al., 2017]
Consistent inference with Wasserstein distance
As n → ∞ [at ε fixed]
WABC distribution with a fixed ε does not converge in n to a
Dirac mass
[Bernton et al., 2017]
Consistent inference with Wasserstein distance
As εn → 0 and n → ∞
Under a range of assumptions, if fn(εn) → 0 and P(W(µ̂n, µ∗) ≤ εn) → 1, then the WABC posterior with threshold εn + ε satisfies
π_{εn+ε}({θ ∈ H : W(µ∗, µθ) > ε + 4εn/3 + f_n^{−1}(ε_n^L/R)} | y1:n) →^P δ
[Bernton et al., 2017]
A bivariate Gaussian illustration
100 observations from a bivariate Normal with variance 1 and covariance 0.55
Compare WABC with ABC based on the raw Euclidean distance and on the Euclidean distance between sample means, on 10⁶ model simulations.
ABC for model choice
motivating example
Approximate Bayesian computation
ABC for model choice
[some] asymptotics of ABC
ABC under misspecification
Bayesian model choice
Several models M1, M2, . . . are considered simultaneously for a
dataset y and the model index M is part of the inference.
Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm)
Goal is to derive the posterior distribution of M, challenging
computational target when models are complex.
Generic ABC for model choice
Algorithm 2 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    Generate m from the prior π(M = m)
    Generate θm from the prior πm(θm)
    Generate z from the model fm(z|θm)
  until ρ{η(z), η(y)} < ε
  Set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]
ABC estimates
Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m
(1/T) Σ_{t=1}^T I_{m(t)=m} .
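A Python sketch of ABC-MC and of this frequency estimate on the Gauss-versus-Laplace benchmark discussed below, with the sample median as summary; the common prior on θm and the tolerance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(0.0, 1.0, size=100)       # observed data, truly Gaussian here
eta_y, eps = np.median(y), 0.02

counts = {1: 0, 2: 0}
for _ in range(100_000):
    m = int(rng.integers(1, 3))                          # m ~ pi(M = m), uniform
    theta = rng.normal(0.0, 5.0)                         # theta_m ~ pi_m
    if m == 1:
        z = rng.normal(theta, 1.0, size=100)             # M1: N(theta, 1)
    else:
        z = rng.laplace(theta, 1 / np.sqrt(2), size=100) # M2: Laplace, variance one
    if abs(np.median(z) - eta_y) < eps:
        counts[m] += 1

print(counts[1] / (counts[1] + counts[2]))   # frequency estimate of pi(M = 1 | y)
```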
Limiting behaviour of B12 (under sufficiency)
If η(y) is a sufficient statistic for both models,
fi(y|θi) = gi(y) f_i^η(η(y)|θi)
Thus
B12(y) = ∫_{Θ1} π(θ1) g1(y) f_1^η(η(y)|θ1) dθ1 / ∫_{Θ2} π(θ2) g2(y) f_2^η(η(y)|θ2) dθ2
= g1(y) ∫ π1(θ1) f_1^η(η(y)|θ1) dθ1 / g2(y) ∫ π2(θ2) f_2^η(η(y)|θ2) dθ2
= [g1(y)/g2(y)] B^η_{12}(y) .
[Didelot, Everitt, Johansen & Lawson, 2011]
© No discrepancy only when cross-model sufficiency
© Inability to evaluate loss brought by summary statistics
A stylised problem
Central question to the validation of ABC for model choice:
When is a Bayes factor based on an insufficient statistic T(y) consistent?
Note/warning: consistency drawn on T(y) through B^T_{12}(y) necessarily differs from consistency drawn on y through B12(y)
[Marin, Pillai, X, & Rousseau, JRSS B, 2013]
A benchmark if toy example
Comparison suggested by referee of PNAS paper [thanks!]:
[X, Cornuet, Marin, & Pillai, Aug. 2011]
Model M1: y ∼ N(θ1, 1) opposed to model M2: y ∼ L(θ2, 1/√2), Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one).
Four possible statistics
1. sample mean ȳ (sufficient for M1 if not M2);
2. sample median med(y) (insufficient);
3. sample variance var(y) (ancillary);
4. median absolute deviation mad(y) = med(|y − med(y)|);
A benchmark if toy example
[Boxplots of the ABC posterior probabilities of M1 for Gauss and Laplace samples, n = 100]
Framework
Starting from the sample
y = (y1, . . . , yn)
the observed sample, not necessarily iid, with true distribution
y ∼ Pn
Summary statistics
T(y) = Tⁿ = (T1(y), T2(y), · · · , Td(y)) ∈ R^d
with true distribution Tⁿ ∼ Gn.
Framework
© Comparison of
– under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ R^{p1}
– under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ R^{p2}
turned into
– under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|Tⁿ)
– under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|Tⁿ)
Assumptions
A collection of asymptotic “standard” assumptions:
[A1] is a standard central limit theorem under the true model with
asymptotic mean µ0
[A2] controls the large deviations of the estimator Tn
from the
model mean µ(θ)
[A3] is the standard prior mass condition found in Bayesian
asymptotics (di effective dimension of the parameter)
[A4] restricts the behaviour of the model density against the true
density
[Think CLT!]
Asymptotic marginals
Asymptotically, under [A1]–[A4]
mi(t) = ∫_{Θi} gi(t|θi) πi(θi) dθi
is such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl v_n^{d−di} ≤ mi(Tⁿ) ≤ Cu v_n^{d−di}
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,
mi(Tⁿ) = o_{Pn}[v_n^{d−τi} + v_n^{d−αi}].
Between-model consistency
Consequence of the above is that the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of Tⁿ under both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl v_n^{−(d1−d2)} ≤ m1(Tⁿ)/m2(Tⁿ) ≤ Cu v_n^{−(d1−d2)} ,
where Cl, Cu = O_{Pn}(1), irrespective of the true model.
© Only depends on the difference d1 − d2: no consistency
Between-model consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1(Tⁿ)/m2(Tⁿ) ≥ Cu min( v_n^{−(d1−α2)}, v_n^{−(d1−τ2)} )
Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of Tⁿ:
null hypothesis that both models are compatible with the statistic Tⁿ,
H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0
against
H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0
testing procedure provides estimates of the mean of Tⁿ under each model and checks for equality
Checking in practice
Under each model Mi, generate an ABC sample θi,l, l = 1, · · · , L
For each θi,l, generate yi,l ∼ Fi,n(·|θi,l), derive Tⁿ(yi,l) and compute
µ̂i = (1/L) Σ_{l=1}^L Tⁿ(yi,l), i = 1, 2 .
Conditionally on Tⁿ(y),
√L {µ̂i − E^π[µi(θi)|Tⁿ(y)]} ⇝ N(0, Vi),
Test for a common mean
H0 : µ̂1 ∼ N(µ0, V1), µ̂2 ∼ N(µ0, V2)
against the alternative of different means
H1 : µ̂i ∼ N(µi, Vi), with µ1 ≠ µ2 .
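A loose Python sketch of the Monte Carlo side of this check, forming a Wald-type statistic for a common mean; the simulators, parameter draws and summary function are placeholders to be supplied by the ABC runs above:

```python
import numpy as np

def mean_check(simulate1, simulate2, thetas1, thetas2, T_stat):
    # mu_hat_i: Monte Carlo estimate of the mean of T^n under model i
    T1 = np.array([T_stat(simulate1(t)) for t in thetas1])
    T2 = np.array([T_stat(simulate2(t)) for t in thetas2])
    mu1, mu2 = T1.mean(axis=0), T2.mean(axis=0)
    V = T1.var(axis=0) / len(T1) + T2.var(axis=0) / len(T2)
    return (mu1 - mu2) ** 2 / V    # large values argue against a common mean
```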
Toy example: Laplace versus Gauss
[Boxplots of the normalised χ² statistic without and with mad, for Gauss and Laplace samples]
[some] asymptotics of ABC
motivating example
Approximate Bayesian computation
ABC for model choice
[some] asymptotics of ABC
asymptotic setup
consistency of ABC posteriors
asymptotic posterior shape
asymptotic behaviour of EABC [θ]
ABC under misspecification
asymptotic setup
asymptotic: y = y^(n) ∼ P^n_θ and ε = εn, n → +∞
parametric: θ ∈ R^k, k fixed
concentration of summary statistics η(zⁿ):
∃b : θ ↦ b(θ) such that ‖η(zⁿ) − b(θ)‖ = o_{Pθ}(1), ∀θ
Objects of interest:
posterior concentration and asymptotic shape of πε(·|η(y^(n))) (normality?)
convergence of the posterior mean θ̂ = E_ABC[θ|η(y^(n))]
asymptotic acceptance rate
[Frazier et al., 2018]
consistency of ABC posteriors
ABC algorithm Bayesian consistent at θ0 if for any δ > 0,
Π(‖θ − θ0‖ > δ | ‖η(y) − η(z)‖ ≤ ε) → 0
as n → +∞, ε → 0
Bayesian consistency implies that sets containing θ0 have posterior probability tending to one as n → +∞, with implication being the existence of a specific rate of concentration
Concentration around the true value and Bayesian consistency impose less stringent conditions on the convergence speed of the tolerance εn to zero, when compared with asymptotic normality of the ABC posterior
asymptotic normality of the ABC posterior mean does not require asymptotic normality of the ABC posterior
consistency of ABC posteriors
Concentration of summary η(z): there exists b(θ) such that
‖η(z) − b(θ)‖ = o_{Pθ}(1)
Consistency:
Π_{εn}(‖θ − θ0‖ ≤ δ | η(y)) = 1 + op(1)
Convergence rate: there exists δn = o(1) such that
Π_{εn}(‖θ − θ0‖ ≤ δn | η(y)) = 1 + op(1)
Point estimator consistency
θ̂ = E_ABC[θ|η(y^(n))], E_ABC[θ|η(y^(n))] − θ0 = op(1)
vn(E_ABC[θ|η(y^(n))] − θ0) ⇒ N(0, v)
Rate of convergence
Πε(·| ‖η(y) − η(z)‖ ≤ ε) concentrates at rate λn → 0 if
lim sup_{ε→0} lim sup_{n→+∞} Πε(‖θ − θ0‖ > λn M | ‖η(y) − η(z)‖ ≤ ε) → 0
in P0-probability when M goes to infinity.
Posterior rate of concentration related to the rate at which information accumulates about the true parameter vector
Related results
existing studies on the large sample properties of ABC, in which
the asymptotic properties of point estimators derived from ABC
have been the primary focus
[Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a]
assumption of a CLT for the summary statistic plus regularity assumptions on the convergence of the density of the summary statistic to its normal limit (incl. existence of an Edgeworth expansion with exponential tail control)
convergence rate of the posterior mean if εT = o(1/v_T^{3/5})
acceptance probability chosen as an arbitrary density in ‖η(y) − η(z)‖
stronger conditions on η than our own weak convergence of vT{η(z) − b(θ)}, with non-uniform polynomial deviations
excludes non-compact parameter spaces
when εT ≍ v_T^{−1}, only deviation bounds and weak convergence are required (weaker than convergence of densities)
when εT = o(1/vT), no requirement on the convergence rate
characterisation of the asymptotic behaviour of the ABC posterior for all εT = o(1) with posterior concentration
asymptotic normality and unbiasedness of the posterior mean remain achievable even when lim_T vT εT = ∞, provided εT = o(1/v_T^{1/2})
Li and Fearnhead (2018a) point out that if kη > kθ ≥ 1 and if εT = o(1/v_T^{3/5}), the posterior mean is asymptotically normal, unbiased, but not asymptotically efficient
if vT εT → ∞ and vT ε²_T = o(1), for kη > kθ ≥ 1,
E^{Πε}{vT(θ − θ0)} = {∇b(θ0)ᵀ ∇b(θ0)}^{−1} ∇b(θ0)ᵀ vT{η(y) − b(θ0)} + op(1)
Convergence when εn ≳ σn
Under (main) assumptions
(A1) ∃σn → 0 such that
Pθ(σn^{−1}‖η(z) − b(θ)‖ > u) ≤ c(θ)h(u), lim_{u→+∞} h(u) = 0
(A2)
Π(‖b(θ) − b(θ0)‖ ≤ u) ≍ u^D , u ≈ 0
posterior consistency
posterior concentration rate λn that depends on the deviation control of d²{η(z), b(θ)}
posterior concentration rate for b(θ) bounded from below by O(εn)
Convergence when εn ≳ σn
Under (A1) and (A2),
Π_{εn}(‖b(θ) − b(θ0)‖ ≲ εn + σn h^{−1}(ε_n^D) | η(y)) = 1 + o_{p0}(1)
If also ‖θ − θ0‖ ≤ L‖b(θ) − b(θ0)‖^α locally and θ ↦ b(θ) is 1-1,
Π_{εn}(‖θ − θ0‖ ≲ ε_n^α + σ_n^α (h^{−1}(ε_n^D))^α =: δn | η(y)) = 1 + o_{p0}(1)
Comments
(A1): if Pθ(σn^{−1}‖η(z) − b(θ)‖ > u) ≤ c(θ)h(u), two cases
1. Polynomial tail: h(u) ≲ u^{−κ}, then δn = εn + σn ε_n^{−D/κ}
2. Exponential tail: h(u) ≲ e^{−cu}, then δn = εn + σn log(1/εn)
E.g., η(y) = n^{−1} Σ_i g(yi) with moments on g (case 1) or a Laplace transform (case 2)
Comments
(A2): Π(‖b(θ) − b(θ0)‖ ≤ u) ≍ u^D: if Π regular enough then D = dim(θ)
no need to approximate the density f(η(y)|θ).
Same results hold when εn = o(σn) if (A1) is replaced with
inf_{|x|≤M} Pθ(‖σn^{−1}(η(z) − b(θ)) − x‖ ≤ u) ≳ u^D , u ≈ 0
proof
Simple enough proof: assume σn ≤ δεn and
‖η(y) − b(θ0)‖ ≲ σn, ‖η(y) − η(z)‖ ≤ εn
Hence
‖b(θ) − b(θ0)‖ > δn ⇒ ‖η(z) − b(θ)‖ > δn − εn − σn := tn
Also, if ‖b(θ) − b(θ0)‖ ≤ εn/3 and ‖η(z) − b(θ)‖ ≤ εn/3,
‖η(y) − η(z)‖ ≤ ‖η(z) − b(θ)‖ + σn + εn/3 ≤ εn
and
Π_{εn}(‖b(θ) − b(θ0)‖ > δn | y)
≤ ∫_{‖b(θ)−b(θ0)‖>δn} Pθ(‖η(z) − b(θ)‖ > tn) dΠ(θ) / ∫_{‖b(θ)−b(θ0)‖≤εn/3} Pθ(‖η(z) − b(θ)‖ ≤ εn/3) dΠ(θ)
≲ ε_n^{−D} h(tn σn^{−1}) ∫_Θ c(θ) dΠ(θ)
Assumptions
Applicable to a broad range of data structures
[A1] ensures that η(z) concentrates on b(θ): inescapable
[A2] controls the degree of prior mass in a neighbourhood of θ0, standard in Bayesian asymptotics
[A2] If Π absolutely continuous with prior density p bounded, above and below, near θ0, then D = dim(θ) = kθ
[A3] identification condition critical for getting posterior concentration around θ0, b being injective depending on the true structural model and the particular choice of η.
Summary statistic and (in)consistency
Consider the moving average MA(2) model
yt = et + θ1 e_{t−1} + θ2 e_{t−2}, et ∼ i.i.d. N(0, 1)
and
−2 ≤ θ1 ≤ 2, θ1 + θ2 ≥ −1, θ1 − θ2 ≤ 1.
summary statistics equal to sample autocovariances
ηj(y) = T^{−1} Σ_{t=1+j}^T yt y_{t−j} , j = 0, 1
with
η0(y) →P E[y²_t] = 1 + (θ01)² + (θ02)² and η1(y) →P E[yt y_{t−1}] = θ01(1 + θ02)
For the ABC target pε(θ|η(y)) to be degenerate at θ0,
0 = b(θ0) − b(θ) = ( 1 + (θ01)² + (θ02)² − [1 + θ1² + θ2²] , θ01(1 + θ02) − θ1(1 + θ2) )
must have the unique solution θ = θ0
Take θ01 = .6, θ02 = .2: the equation has two solutions,
θ1 = .6, θ2 = .2 and θ1 ≈ .5453, θ2 ≈ .3204
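The two roots quoted above can be recovered numerically; a small sketch solving b(θ) = b(θ0) from two starting points (scipy assumed available):

```python
import numpy as np
from scipy.optimize import fsolve

theta0 = np.array([0.6, 0.2])

def b(theta):
    # asymptotic means of the two sample autocovariances
    t1, t2 = theta
    return np.array([1 + t1**2 + t2**2, t1 * (1 + t2)])

for start in ([0.6, 0.2], [0.5, 0.35]):
    print(fsolve(lambda t: b(t) - b(theta0), start))
    # -> [0.6, 0.2] and approximately [0.5453, 0.3204]
```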
Concentration for the MA(2) model
True value θ0 = (0.6, 0.2)
Summaries: first three autocorrelations
Tolerance proportional to εT = 1/T^{0.4}
Rejection of normality of these posteriors
Asymptotic shape of posterior distribution
Shape of
Πε(·| ‖η(y) − η(z)‖ ≤ εn)
for several connections between εn and the rate at which η(yⁿ) satisfies a CLT
Three different regimes:
1. σn = o(εn) −→ Uniform limit
2. σn ≍ εn −→ perturbed Gaussian limit
3. εn = o(σn) −→ Gaussian limit
scaling matrices
Introduction of a sequence of (k, k) p.d. matrices Σn(θ) such that for all θ near θ0
c1‖Dn x‖ ≤ ‖Σn(θ) x‖ ≤ c2‖Dn x‖, Dn = diag(dn(1), · · · , dn(k)),
with 0 < c1, c2 < +∞, dn(j) → +∞ for all j’s
Possibly different convergence rates for the components of η(z)
Reordering components so that
dn(1) ≤ · · · ≤ dn(k)
with the assumption that
lim inf_n dn(j)εn = lim sup_n dn(j)εn
New assumptions
(B1) Concentration of summary η: Σn(θ) ∈ R^{k1×k1} is o(1),
Σn(θ)^{−1}{η(z) − b(θ)} ⇒ N_{k1}(0, Id), (Σn(θ)Σn(θ0)^{−1})n = C^o
(B2) b(θ) is C¹ and
‖θ − θ0‖ ≲ ‖b(θ) − b(θ0)‖
(B3) Dominated convergence and
lim_n Pθ( Σn(θ)^{−1}{η(z) − b(θ)} ∈ u + B(0, un) ) / Π_j un(j) = ϕ(u)
main result
Set Σn(θ) = σn D(θ) for θ ≈ θ0 and Z^o = Σn(θ0)^{−1}(η(y) − b(θ0)); then under (B1) and (B2),
when εn σn^{−1} → +∞:
Π_{εn}[ε_n^{−1}(θ − θ0) ∈ A | y] ⇒ U_{B0}(A), B0 = {x ∈ R^k ; ‖b′(θ0)ᵀ x‖ ≤ 1}
when εn σn^{−1} → c:
Π_{εn}[Σn(θ0)^{−1}(θ − θ0) − Z^o ∈ A | y] ⇒ Qc(A), Qc ≠ N
when εn σn^{−1} → 0 and (B3) holds, set
Vn = [b′(θ0)]ᵀ Σn(θ0) b′(θ0)
then
Π_{εn}[V_n^{−1}(θ − θ0) − Z̃^o ∈ A | y] ⇒ Φ(A)
intuition (?!)
Set x(θ) = σn^{−1}(θ − θ0) − Z^o (k = 1)
πn := Π_{εn}[ε_n^{−1}(θ − θ0) ∈ A | y]
= ∫_{|θ−θ0|≤un} I_{x(θ)∈A} Pθ(‖σn^{−1}(η(z) − b(θ)) + x(θ)‖ ≤ σn^{−1}εn) p(θ) dθ
  / ∫_{|θ−θ0|≤un} Pθ(‖σn^{−1}(η(z) − b(θ)) + x(θ)‖ ≤ σn^{−1}εn) p(θ) dθ + op(1)
If εn/σn ≫ 1:
Pθ(‖σn^{−1}(η(z) − b(θ)) + x(θ)‖ ≤ σn^{−1}εn) = 1 + o(1), iff ‖x‖ ≤ σn^{−1}εn + o(1)
If εn/σn = o(1):
Pθ(‖σn^{−1}(η(z) − b(θ)) + x‖ ≤ σn^{−1}εn) = ϕ(x)σn(1 + o(1))
more comments
Surprising: U(−εn, εn) limit when εn ≫ σn, but not that surprising since εn = o(1) means concentration around θ0 and σn = o(εn) implies that b(θ) − b(θ0) ≈ η(z) − η(y)
again, no need to control the approximation of f(η(y)|θ) by a Gaussian density: merely a control of the distribution
generalisation to the case where the eigenvalues dn,1, . . . , dn,k of Σn differ
behaviour of E_ABC(θ|y) consistent with Li & Fearnhead (2016)
even more comments
If (also) p(θ) is Hölder β,
E_ABC(θ|y) − θ0 = σn Z^o / b′(θ0)  [score for f(η(y)|θ)]
  + Σ_{j=1}^{⌊β/2⌋} ε_n^{2j} Hj(θ0, p, b)  [bias from threshold approximation]
  + o(σn) + O(ε_n^{β+1})
with
if ε_n² = o(σn): Efficiency
E_ABC(θ|y) − θ0 = σn Z^o / b′(θ0) + o(σn)
the Hj(θ0, p, b)’s are deterministic
© we gain nothing by getting a first crude θ̂(y) = E_ABC(θ|y) for some η(y) and then rerunning ABC with θ̂(y)
Illustration in the MA(2) setting
Sample sizes of T = 500, 1000
Asymptotic normality rejected for εT = 1/T^{0.4}, and for θ1 when T = 500 and εT = 1/T^{0.55}
asymptotic behaviour of E_ABC[θ]
When p = dim(η(y)) = d = dim(θ) and εn = o(n^{−3/10}),
E_ABC[dT(θ − θ0)|y^o] ⇒ N(0, {(∇b^o)ᵀ Σ^{−1} ∇b^o}^{−1})
[Li & Fearnhead (2018a)]
In fact, if ε_n^{β+1} √n = o(1), with β the Hölder-smoothness of π,
E_ABC[(θ − θ0)|y^o] = (∇b^o)^{−1}Z^o/√n + Σ_{j=1}^k hj(θ0) ε_n^{2j} + op(1), 2k = ⌊β⌋
Iterating for fixed p mildly interesting: if
η̃(y) = E_ABC[θ|y^o]
then
E_ABC[θ|η̃(y)] = θ0 + (∇b^o)^{−1}Z^o/√n + [π′(θ0)/π(θ0)] ε_n² + o(·)
[Fearnhead & Prangle, 2012]
more asymptotic behaviour of E_ABC[θ]
Li and Fearnhead (2018a,b) consider that
E_ABC[dT(θ − θ0)|y^o]
is not optimal when p > d
If √n ε_n² = o(1) and εn √n ≠ o(1),
√n[E_ABC(θ) − θ0] = P_{∇b^o} Z^o + op(1)
Z^o = √n(η(y) − b^o)
P_{∇b^o} Z^o = {(∇b^o)ᵀ ∇b^o}^{−1} (∇b^o)ᵀ Z^o
and V_as(P_{∇b^o} Z^o) ≥ [ (∇b^o)ᵀ V_as(Z^o)^{−1} (∇b^o) ]^{−1}
If εn √n = o(1),
√n[E_ABC(θ) − θ0] = {(∇b^o)ᵀ Σ^{−1} ∇b^o}^{−1} (∇b^o)ᵀ Σ^{−1} Z^o + op(1)
impact of the dimension of η
dimension of η(·) does not impact the above result, but impacts the acceptance probability
if εn = o(σn), k1 = dim(η(y)), k = dim(θ) & k1 ≥ k:
αn := Pr(‖y − z‖ ≤ εn) ≍ ε_n^{k1} σ_n^{−k1+k}
if εn ≳ σn:
αn := Pr(‖y − z‖ ≤ εn) ≍ ε_n^k
If we choose αn:
αn = o(σ_n^k) leads to εn = σn(αn σ_n^{−k})^{1/k1} = o(σn)
αn ≳ σ_n^k leads to εn ≍ α_n^{1/k} .
Illustration in the MA(2) setting
Sample sizes of T = 500, 1000
Asymptotic normality accepted for all graphs
Practical implications
In practice, tolerance determined by a quantile (nearest neighbours):
Select all θi associated with the α = δ/N smallest distances d²{η(zi), η(y)} for some δ
Then (i) if εT ≍ v_T^{−1} or εT = o(v_T^{−1}), the acceptance rate associated with the threshold εT is
αT = pr(‖η(z) − η(y)‖ ≤ εT) ≍ (vT εT)^{kη} × v_T^{−kθ} ≲ v_T^{−kθ}
(ii) if εT ≳ v_T^{−1},
αT = pr(‖η(z) − η(y)‖ ≤ εT) ≍ ε_T^{kθ} ≳ v_T^{−kθ}
Curse of dimensionality
For reasonable statistical behaviour, the rate of decline of αT is the faster the larger the dimension kθ of θ, but unaffected by the dimension kη of η
Theoretical justification for dimension reduction methods that process parameter components individually and independently of other components
[Beaumont & al., 2002; Fearnhead & Prangle, 2012; Martin & al., 2016]
importance sampling approach of Li & Fearnhead (2018a) yields acceptance rates αT = O(1) when εT = O(1/vT)
Monte Carlo error
Link the choice of εT to the Monte Carlo error associated with NT draws in the Algorithm
Conditions (on εT) under which
α̂T = αT{1 + op(1)}
where α̂T = Σ_{i=1}^{NT} 1l[d{η(y), η(zi)} ≤ εT]/NT is the proportion of accepted draws from NT simulated draws of θ
Either
(i) εT = o(v_T^{−1}) and (vT εT)^{−kη} ε_T^{−kθ} ≤ M NT
or
(ii) εT ≳ v_T^{−1} and ε_T^{−kθ} ≤ M NT
for M large enough;
conclusion on ABC consistency
asymptotic description of ABC: different regimes depending on εn & σn
no point in choosing εn arbitrarily small: just εn = o(σn)
no asymptotic gain in iterative ABC
results under weak(er) conditions by not studying g(η(z)|θ)
Mis-ABC-fied
motivating example
Approximate Bayesian computation
ABC for model choice
[some] asymptotics of ABC
ABC under misspecification
Misspecification
Consequences
Illustration
Assumed data generating process (DGP) is z ∼ N(θ, 1) but actual DGP is y ∼ N(θ, σ̃²)
Use of summaries
sample mean η1(y) = (1/n) Σ_{i=1}^n yi
centered summary η2(y) = (1/(n−1)) Σ_{i=1}^n (yi − η1(y))² − 1
Three ABC:
ABC-AR: accept/reject approach with Kε(d{η(z), η(y)}) = 1l[d{η(z), η(y)} ≤ ε] and d{x, y} = ‖x − y‖
ABC-K: smooth rejection approach, with Kε(d{η(z), η(y)}) a univariate Gaussian kernel
ABC-Reg: post-processing ABC approach with weighted linear regression adjustment
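A minimal sketch of ABC-AR in this misspecified setting (the prior on θ and the constants are illustrative assumptions): the centered variance summary cannot be matched under the assumed unit-variance model, so the infimum distance is positive and the tolerance is set through the αn = n^{−5/9} quantile used on the next slide:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2_tilde = 100, 4.0
y = rng.normal(1.0, np.sqrt(sigma2_tilde), size=n)   # actual DGP: variance != 1

def eta(x):
    # (sample mean, centered sample variance)
    return np.array([x.mean(), x.var(ddof=1) - 1.0])

eta_y, thetas, dists = eta(y), [], []
for _ in range(50_000):
    theta = rng.normal(0.0, 5.0)              # illustrative prior on theta
    z = rng.normal(theta, 1.0, size=n)        # assumed DGP: variance 1
    thetas.append(theta)
    dists.append(np.linalg.norm(eta(z) - eta_y))

keep = np.argsort(dists)[: int(50_000 * n ** (-5 / 9))]   # alpha_n quantile
print(np.array(thetas)[keep].mean())          # ABC-AR posterior mean
```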
Illustration
Posterior means for ABC-AR, ABC-K and ABC-Reg as the misspecification σ̃² increases (N = 50,000 simulated data sets)
αn = n^{−5/9} quantile for ABC-AR
ABC-K and ABC-Reg bandwidth of n^{−5/9}
Framework
Data y with true distribution P0
Model P := {Pθ : θ ∈ Θ ⊂ R^{kθ}}
Summary statistic η(y) = (η1(y), ..., η_{kη}(y))
Misspecification
inf_{θ∈Θ} D(P0||Pθ) = inf_{θ∈Θ} ∫ log[dP0(y)/dPθ(y)] dP0(y) > 0,
with
θ∗ = arg inf_{θ∈Θ} D(P0||Pθ)
[Müller, 2013]
ABC misspecification
for b0 (resp. b(θ)) the limit of η(y) (resp. η(z)),
inf_{θ∈Θ} d{b0, b(θ)} > 0
ABC pseudo-true value
θ∗ = arg inf_{θ∈Θ} d{b0, b(θ)} .
Compulsory tolerance
Under standard identification conditions on b(·) ∈ R^{kη}, there exists ε∗ such that
ε∗ = inf_{θ∈Θ} d{b0, b(θ)} > 0
© For tolerances εn = o(1), once εn < ε∗, no draw of θ is selected and the posterior Πε[A|η(y)] is ill-conditioned
But an appropriately chosen tolerance sequence (εn)n allows the ABC-based posterior to concentrate on the distance-dependent pseudo-true value θ∗
ABC Posterior Concentration Under Misspecification
Assumptions
[A0] There exist a continuous b : Θ → B and a decreasing ρn(·) such that ρn(u) → 0 at infinity and
Pθ[d{η(z), b(θ)} > u] ≤ c(θ)ρn(u), ∫_Θ c(θ)dΠ(θ) < +∞
with existence of either
(i) Polynomial deviations: a positive sequence vn → +∞ and u0, κ > 0 such that ρn(u) = v_n^{−κ} u^{−κ}, for u ≤ u0.
(ii) Exponential deviations: hθ(·) > 0 such that Pθ[d{η(z), b(θ)} > u] ≤ c(θ)e^{−hθ(u vn)} and there exist c, C > 0 such that
∫_Θ c(θ)e^{−hθ(u vn)} dΠ(θ) ≤ C e^{−c(u vn)^τ}
ABC Posterior Concentration Under Misspecification
Assumptions
[A1] There exist D > 0 and M0, δ0 > 0 such that, for all δ0 ≥ δ > 0 and M ≥ M0, there exists Sδ ⊂ {θ ∈ Θ : d{b(θ), b0} − ε∗ ≤ δ} for which
In case (i), D < κ and
∫_{Sδ} (1 − c(θ)/M) dΠ(θ) ≳ δ^D .
In case (ii),
∫_{Sδ} (1 − c(θ)e^{−hθ(M)}) dΠ(θ) ≳ δ^D .
ABC Posterior Concentration Under Misspecification
Under [A0] and [A1], for εn ↓ ε∗ with
εn ≥ ε∗ + M v_n^{−1} + v_{0,n}^{−1} ,
if Mn is a sequence going to infinity and δn ≥ Mn{(εn − ε∗) + ρn(εn − ε∗)}, then
Πε[d{b(θ), b0} ≥ ε∗ + δn | η(y)] = oP0(1), (2)
provided
ρn(εn − ε∗) ≲ (εn − ε∗)^{−D/κ} in case (i)
ρn(εn − ε∗) ≲ |log(εn − ε∗)|^{1/τ} in case (ii)
Further,
Πε[d{θ, θ∗} > δ | η(y)] = oP0(1)
[Bernton et al., 2017; Frazier et al., 2017]
ABC Asymptotic Posterior
Under further assumptions
existence of a positive definite matrix Σn(θ0) such that for all ‖θ − θ0‖ ≤ δ and all 0 < u ≤ δvn,
Pθ[‖Σn(θ0){η(z) − b(θ)}‖ > u] ≤ c0 u^{−κ}
parametric CLT: existence of a sequence of positive definite matrices Σn(θ) such that for all θ in a neighbourhood of θ0, ‖Σn(θ)‖ ≍ vn, with vn → +∞, and
Σn(θ){η(z) − b(θ)} ⇒ N(0, I_{kη}),
classical concentration assumptions
If limn vn(εn − ε∗) = 0, then for Z⁰_n = Σn(θ∗){η(y) − b0} and Φ_{kη}(·) the standard Normal measure,
lim_{n→+∞} Πε[Σn(θ∗){b(θ) − b(θ∗)} − Z⁰_n ∈ B | η(y)] = Φ_{kη}(B)
Accept/Reject ABC revisited
Given the link between εn and αn in Frazier & al. (2018), posterior concentration achieved through a sequence of αn converging to zero
ABC algorithm based on an appropriate αn always yields (under appropriate regularity) an approach that concentrates asymptotically, regardless of misspecification.
(i) if (εn − ε∗) ≍ v_n^{−1} or (εn − ε∗) = o(v_n^{−1}), then
αn = Pr(‖η(z) − η(y)‖ ≤ εn) ≍ (vn{εn − ε∗})^{kη} × v_n^{−kθ} ≲ v_n^{−kθ}
(ii) if (εn − ε∗) ≳ v_n^{−1}, then
αn = Pr(‖η(z) − η(y)‖ ≤ εn) ≍ {εn − ε∗}^{kθ} ≳ v_n^{−kθ}
Regression adjustment
In the case of a scalar θ, ABC-Reg first runs ABC-AR, with tolerance εn, and obtains selected draws and summaries {θi, η(zi)}
ABC-Reg then uses linear regression to predict accepted values of θ from η(z) through
θi = α + βᵀ{η(y) − η(zi)} + νi
ABC-Reg defines the adjusted parameter draw as
θ̃i = θi + β̂ᵀ{η(y) − η(zi)}
where
β̂ = [ (1/N) Σ_{i=1}^N {η(zi) − η̄}{η(zi) − η̄}ᵀ ]^{−1} (1/N) Σ_{i=1}^N {η(zi) − η̄}(θi − θ̄)
ABC-Reg pseudo-consistency
If
εn ≥ ε∗ + M v_n^{−1} + v_{0,n}^{−1} ,
with
(i) ε∗ = inf_{θ∈Θ} d{b(θ), b0} > 0
(ii) θ∗ = arg inf_{θ∈Θ} d{b(θ), b0} exists
(iii) for some β0 with ‖β0‖ > 0, ‖β̂ − β0‖ = o_{Pθ}(1),
then
Πε[|θ̃ − θ̃∗| > δ | η(y)] = oP0(1),
[Frazier & al., 2017]
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment modelsa discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
a discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models
 
short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018short course at CIRM, Bayesian Masterclass, October 2018
short course at CIRM, Bayesian Masterclass, October 2018
 
prior selection for mixture estimation
prior selection for mixture estimationprior selection for mixture estimation
prior selection for mixture estimation
 
Coordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like samplerCoordinate sampler: A non-reversible Gibbs-like sampler
Coordinate sampler: A non-reversible Gibbs-like sampler
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 

Recently uploaded

Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
muralinath2
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
rakeshsharma20142015
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
Sérgio Sacani
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 

Recently uploaded (20)

Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Anemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditionsAnemia_ different types_causes_ conditions
Anemia_ different types_causes_ conditions
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 

ABC convergence under well- and mis-specified models

• 1. ABC: convergence and misspecification Christian P. Robert Université Paris-Dauphine PSL, Paris & University of Warwick, Coventry Joint work with D. Frazier, G. Martin, and J. Rousseau
• 2. Outline motivating example Approximate Bayesian computation ABC for model choice [some] asymptotics of ABC ABC under misspecification
• 3. A motivating if pedestrian example paired and orphan socks A drawer contains an unknown number of socks, some of which can be paired and some of which are orphans (single). One takes at random 11 socks without replacement from this drawer: no pair can be found among those. What can we infer about the total number of socks in the drawer? sounds like an impossible task: one observation x = 11 and two unknowns, nsocks and npairs; writing the likelihood is a challenge [exercise]
• 5. Feller's shoes A closet contains n pairs of shoes. If 2r shoes are chosen at random (with 2r < n), what is the probability that there will be (a) no complete pair, (b) exactly one complete pair, (c) exactly two complete pairs among them? [Feller, 1970, Chapter II, Exercise 26]
• 6. Resolution as
$$p_j = \binom{n}{j}\, 2^{2r-2j} \binom{n-j}{2r-2j} \bigg/ \binom{2n}{2r}$$
the probability of obtaining j pairs among those 2r shoes, or, for an odd number t of shoes,
$$p_j = 2^{t-2j} \binom{n}{j} \binom{n-j}{t-2j} \bigg/ \binom{2n}{t}$$
• 7. If one draws 11 socks out of m socks made of f orphans and g pairs, with f + 2g = m, the number k of socks from the orphan group is hypergeometric H(11, m, f) and the probability of observing 11 distinct socks in total is
$$\sum_{k=0}^{11} \frac{\binom{f}{k}\binom{2g}{11-k}}{\binom{m}{11}} \times \frac{2^{11-k}\binom{g}{11-k}}{\binom{2g}{11-k}}$$
• 8. A prioris on socks Given parameters nsocks and npairs, the set of socks is S = {s1, s1, . . . , s_npairs, s_npairs, s_npairs+1, . . . , s_nsocks} and 11 socks picked at random from S give X unique socks. Rasmus' reasoning: "If you are a family of 3-4 persons then a guesstimate would be that you have something like 15 pairs of socks in store. It is also possible that you have much more than 30 socks. So as a prior for nsocks I'm going to use a negative binomial with mean 30 and standard deviation 15. On 2npairs/nsocks I'm going to put a Beta prior distribution that puts most of the probability over the range 0.75 to 1.0." [Rasmus Bååth's Research Blog, Oct 20th, 2014]
• 10. Simulating the experiment Given a prior distribution on nsocks and npairs, nsocks ∼ Neg(30, 15) and npairs | nsocks ∼ ⌊nsocks/2⌋ Be(15, 2), it is possible to 1. generate new values of nsocks and npairs, 2. generate a new observation of X, the number of unique socks out of 11, 3. accept the pair (nsocks, npairs) if the realisation of X is equal to 11
• 12. Meaning [histogram of the simulated values of nsocks] The outcome of this simulation method is a distribution on the pair (nsocks, npairs) that is the conditional distribution of the pair given the observation X = 11. Proof: generations from π(nsocks, npairs) are accepted with probability P{X = 11 | (nsocks, npairs)}, hence the accepted values are distributed from π(nsocks, npairs) × P{X = 11 | (nsocks, npairs)} ∝ π(nsocks, npairs | X = 11)
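As a concrete illustration, here is a minimal Python sketch of this accept/reject scheme (the seed, the number of prior draws, and the mean/standard-deviation parameterisation of the negative binomial are illustrative choices, not taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_binomial_mean_sd(mean, sd, size, rng):
    # negative binomial parameterised by mean and standard deviation (var > mean)
    var = sd**2
    p = mean / var
    r = mean * p / (1 - p)
    return rng.negative_binomial(r, p, size)

N = 20_000
n_socks = neg_binomial_mean_sd(30, 15, N, rng)
prop_pairs = rng.beta(15, 2, N)                       # share of socks in pairs
n_pairs = np.floor(n_socks // 2 * prop_pairs).astype(int)
n_odd = n_socks - 2 * n_pairs

accepted = []
for ns, npair, nodd in zip(n_socks, n_pairs, n_odd):
    if ns < 11:
        continue                                      # cannot draw 11 socks
    # label each pair 0..npair-1 twice, each orphan once
    socks = np.repeat(np.arange(npair + nodd),
                      np.r_[np.full(npair, 2), np.full(nodd, 1)].astype(int))
    draw = rng.choice(socks, size=11, replace=False)
    if np.unique(draw).size == 11:                    # all 11 socks distinct
        accepted.append((ns, npair))

post = np.array(accepted)      # draws from pi(nsocks, npairs | X = 11)
print(post.mean(axis=0))
```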
• 14. Approximate Bayesian computation motivating example Approximate Bayesian computation ABC basics Automated summary selection ABC for model choice [some] asymptotics of ABC ABC under misspecification
• 15. Intractable likelihoods Cases when the likelihood function f(y|θ) is unavailable and when the completion step f(y|θ) = ∫_Z f(y, z|θ) dz is impossible or too costly because of the dimension of z: MCMC cannot be implemented
• 16. The ABC method Bayesian setting: target is π(θ)f(x|θ). When the likelihood f(x|θ) is not in closed form, a likelihood-free rejection technique: ABC algorithm For an observation y ∼ f(y|θ), under the prior π(θ), keep jointly simulating θ′ ∼ π(θ), z ∼ f(z|θ′), until the auxiliary variable z is equal to the observed value, z = y. [Tavaré et al., 1997]
• 18. A as A...pproximative When y is a continuous random variable, equality z = y is replaced with a tolerance condition ρ(y, z) ≤ ε, where ρ is a distance. Output distributed from π(θ) Pθ{ρ(y, z) < ε} ∝ π(θ | ρ(y, z) < ε) [Pritchard et al., 1999]
• 20. ABC algorithm Algorithm 1 Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior distribution π(·)
    generate z from the likelihood f(·|θ′)
  until ρ{η(z), η(y)} ≤ ε
  set θi = θ′
end for
where η(y) defines a (not necessarily sufficient) statistic
• 21. Output The likelihood-free algorithm samples from the marginal in z of
$$\pi_\varepsilon(\theta, z \mid y) = \frac{\pi(\theta)\, f(z\mid\theta)\, \mathbb{I}_{A_{\varepsilon,y}}(z)}{\int_{A_{\varepsilon,y}\times\Theta} \pi(\theta) f(z\mid\theta)\, dz\, d\theta},$$
where A_{ε,y} = {z ∈ D : ρ(η(z), η(y)) < ε}. The idea behind ABC is that the summary statistics coupled with a small tolerance should provide a good approximation of the posterior distribution:
$$\pi_\varepsilon(\theta\mid y) = \int \pi_\varepsilon(\theta, z\mid y)\, dz \approx \pi(\theta\mid \eta(y)).$$
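The rejection sampler is short enough to spell out in full; the following hedged Python sketch implements Algorithm 1 for a toy Normal-mean problem, with the prior, summary, and tolerance all being illustrative placeholders rather than choices from the slides:

```python
import numpy as np

def abc_rejection(y_obs, prior_sampler, simulator, summary, dist, eps, n_samples, rng):
    """Plain ABC accept/reject: keep theta whenever the simulated summary
    falls within eps of the observed one (Algorithm 1)."""
    s_obs = summary(y_obs)
    kept = []
    while len(kept) < n_samples:
        theta = prior_sampler(rng)
        z = simulator(theta, rng)
        if dist(summary(z), s_obs) <= eps:
            kept.append(theta)
    return np.asarray(kept)

# toy usage: normal mean with a diffuse prior and the sample mean as summary
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=100)
post = abc_rejection(
    y_obs=y,
    prior_sampler=lambda r: r.normal(0.0, 10.0),
    simulator=lambda th, r: r.normal(th, 1.0, size=100),
    summary=lambda x: x.mean(),
    dist=lambda a, b: abs(a - b),
    eps=0.05, n_samples=200, rng=rng,
)
```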
• 23. Dogger Bank re-enactment Battle of Dogger Bank on Jan 24, 1915, between British and German fleets: how likely was the British victory? [MacKay, Price, and Wood, 2016]
• 25. Dogger Bank re-enactment Battle of Dogger Bank on Jan 24, 1915, between British and German fleets: ABC simulation of the posterior distribution [MacKay, Price, and Wood, 2016]
• 26. ABC advances Simulating from the prior is often poor in efficiency. Either modify the proposal distribution on θ to increase the density of x's within the vicinity of y... [Marjoram et al., 2003; Bortot et al., 2007; Beaumont et al., 2009] ...or view the problem as a conditional density estimation and develop techniques to allow for a larger ε [Beaumont et al., 2002; Blum & François, 2009] ...or even include ε in the inferential framework [ABCµ] [Ratmann et al., 2009]
• 30. ABC consistency Recent studies on the large sample properties of ABC posterior distributions and ABC posterior means [Li & Fearnhead, 2018a,b; Frazier et al., 2018]: under regularity conditions on the summary statistics, incl. convergence at speed d_T, characterisation of the rate of posterior concentration as a function of the tolerance convergence; less stringent condition on the tolerance decrease than for asymptotic normality of the posterior; asymptotic normality of the posterior mean does not require asymptotic normality of the posterior itself. Cases for limiting ABC distributions: 1. ε_T d_T → +∞; 2. ε_T d_T → c; 3. ε_T d_T → 0; and the limiting ABC mean is convergent for ε_T² = o(1/d_T) [Frazier et al., 2018]
• 32. Noisily exact ABC Idea: modify the data from the start, ỹ = y⁰ + εζ₁, with noise ζ₁ on the same scale as the ABC tolerance [see Fearnhead-Prangle], then run ABC on ỹ. Then ABC produces an exact simulation from π(θ | ỹ) [Dean et al., 2011; Fearnhead and Prangle, 2012]
• 34. Consistent noisy ABC Degrading the data improves the estimation performance: noisy ABC-MLE is asymptotically (in n) consistent; under further assumptions, the noisy ABC-MLE is asymptotically normal; increase in variance of order ε⁻²; likely degradation in precision or computing time due to the lack of summary statistic [curse of dimensionality]
• 35. Semi-automatic ABC Fearnhead and Prangle (2012) study ABC and the selection of the summary statistic. ABC is then considered from a purely inferential viewpoint and calibrated for estimation purposes. Use of a randomised (or 'noisy') version of the summary statistics, η̃(y) = η(y) + τζ. Derivation of a well-calibrated version of ABC, i.e. an algorithm that gives proper predictions for the distribution associated with this randomised summary statistic [calibration constraint: ABC approximation with the same posterior mean as the true randomised posterior]. Optimality of the posterior expectation E[θ|y] of the parameter of interest as summary statistic η(y)!
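A hedged sketch of the semi-automatic idea: the conditional expectation E[θ|y] is unknown, but it can be approximated by regressing simulated parameter values on features of simulated data in a pilot run, the fitted value then serving as a one-dimensional summary. The simulator, features, and prior below are illustrative stand-ins, not the choices of Fearnhead and Prangle:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulator(theta, r):
    return r.normal(theta, 1.0, size=50)

# pilot run: parameter draws from the prior and their simulated datasets
theta_pilot = rng.normal(0.0, 5.0, size=5000)
X = np.stack([simulator(t, rng) for t in theta_pilot])
feats = np.column_stack([X.mean(1), np.median(X, 1), X.var(1)])  # candidate features
A = np.column_stack([np.ones(len(feats)), feats])
coef, *_ = np.linalg.lstsq(A, theta_pilot, rcond=None)           # linear pilot fit

def eta(y):
    f = np.array([y.mean(), np.median(y), y.var()])
    return coef[0] + f @ coef[1:]     # fitted E[theta|y], used as the summary
```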
  • 37. Fully automatic ABC Implementation of ABC still requires input of collection of summaries Towards automation statistical projection techniques (LDA, PCA, NP-GLS, &tc.) variable selection machine learning approaches bypassing summaries altogether
• 38. ABC with Wasserstein distance Use as distance between simulated and observed samples the Wasserstein distance
$$W_p(y_{1:n}, z_{1:n})^p = \inf_{\sigma\in\mathcal{S}_n} \frac{1}{n}\sum_{i=1}^n \rho(y_i, z_{\sigma(i)})^p,$$
which covers well- and mis-specified cases, only depends on the data space distance ρ(·, ·), covers iid and dynamic models (curve matching), and is computationally feasible (linear in dimension, cubic in sample size), with a Hilbert curve approximation [Bernton et al., 2017]
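For equally sized samples the infimum over permutations is an optimal assignment problem, solvable exactly; a small Python sketch (sample sizes and distributions are arbitrary placeholders, and the Euclidean ground distance is assumed):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_p(y, z, p=1):
    """Empirical p-Wasserstein distance between two equal-size samples,
    computed exactly via optimal assignment (cubic in sample size)."""
    cost = np.linalg.norm(y[:, None, :] - z[None, :, :], axis=-1) ** p
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean() ** (1.0 / p)

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, size=(100, 2))
z = rng.normal(0.3, 1.0, size=(100, 2))
print(wasserstein_p(y, z, p=2))
```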
• 39. Consistent inference with Wasserstein distance As ε → 0 [and n fixed]: if either 1. f_θ^(n) is n-exchangeable and D(y_{1:n}, z_{1:n}) = 0 if and only if z_{1:n} = y_{σ(1:n)} for some σ ∈ S_n, or 2. D(y_{1:n}, z_{1:n}) = 0 if and only if z_{1:n} = y_{1:n}, then, at y_{1:n} fixed, the ABC posterior converges strongly to the posterior as ε → 0. [Bernton et al., 2017]
  • 40. Consistent inference with Wasserstein distance As n → ∞ [at ε fixed] WABC distribution with a fixed ε does not converge in n to a Dirac mass [Bernton et al., 2017]
• 41. Consistent inference with Wasserstein distance As ε_n → 0 and n → ∞: under a range of assumptions, if f_n(ε_n) → 0 and P(W(μ̂_n, μ*) ≤ ε_n) → 1, then the WABC posterior with threshold ε_n + ε satisfies
$$\pi_{\varepsilon_n+\varepsilon}\big(\{\theta \in \mathcal{H} : W(\mu_*, \mu_\theta) > \varepsilon + 4\varepsilon_n/3 + f_n^{-1}(\varepsilon_n^L/R)\} \mid y_{1:n}\big) \le \delta$$
in P-probability [Bernton et al., 2017]
• 42. A bivariate Gaussian illustration 100 observations from a bivariate Normal with variance 1 and covariance 0.55. Compare WABC with ABC based on the raw Euclidean distance and on the Euclidean distance between sample means, over 10⁶ model simulations.
• 43. ABC for model choice motivating example Approximate Bayesian computation ABC for model choice [some] asymptotics of ABC ABC under misspecification
• 44. Bayesian model choice Several models M1, M2, . . . are considered simultaneously for a dataset y and the model index M is part of the inference. Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm). The goal is to derive the posterior distribution of M, a challenging computational target when models are complex.
• 45. Generic ABC for model choice Algorithm 2 Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    Generate m from the prior π(M = m)
    Generate θm from the prior πm(θm)
    Generate z from the model fm(z|θm)
  until ρ{η(z), η(y)} < ε
  Set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]
• 46. ABC estimates Posterior probability π(M = m|y) approximated by the frequency of acceptances from model m,
$$\frac{1}{T}\sum_{t=1}^T \mathbb{I}_{m^{(t)}=m}.$$
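A compact, hedged sketch of Algorithm 2 and of the frequency estimate above (the simulators, priors, summaries, and tolerance are illustrative choices; the Gauss/Laplace pair anticipates the benchmark below):

```python
import numpy as np

# ABC model choice: the model index is simulated along with the parameters,
# and posterior model probabilities are estimated by acceptance frequencies.
def abc_mc(y_obs, models, eta, eps, T, rng):
    s_obs, kept = eta(y_obs), []
    while len(kept) < T:
        m = rng.integers(len(models))        # uniform prior on the model index
        z = models[m](rng)                   # draws theta_m and z jointly
        if np.linalg.norm(eta(z) - s_obs) <= eps:
            kept.append(m)
    kept = np.asarray(kept)
    return np.array([np.mean(kept == m) for m in range(len(models))])

rng = np.random.default_rng(4)
models = [
    lambda r: r.normal(r.normal(0, 2), 1, size=100),                # M1: Gauss
    lambda r: r.laplace(r.normal(0, 2), 1 / np.sqrt(2), size=100),  # M2: Laplace
]
eta = lambda x: np.array([np.median(x), np.median(np.abs(x - np.median(x)))])
y = rng.normal(0.2, 1, size=100)
print(abc_mc(y, models, eta, eps=0.2, T=500, rng=rng))  # estimates of pi(M=m|y)
```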
• 47. Limiting behaviour of B12 (under sufficiency) If η(y) is a sufficient statistic for both models, f_i(y|θ_i) = g_i(y) f_i^η(η(y)|θ_i). Thus
$$B_{12}(y) = \frac{\int_{\Theta_1} \pi_1(\theta_1)\, g_1(y)\, f_1^\eta(\eta(y)\mid\theta_1)\, d\theta_1}{\int_{\Theta_2} \pi_2(\theta_2)\, g_2(y)\, f_2^\eta(\eta(y)\mid\theta_2)\, d\theta_2} = \frac{g_1(y) \int \pi_1(\theta_1) f_1^\eta(\eta(y)\mid\theta_1)\, d\theta_1}{g_2(y) \int \pi_2(\theta_2) f_2^\eta(\eta(y)\mid\theta_2)\, d\theta_2} = \frac{g_1(y)}{g_2(y)}\, B_{12}^\eta(y).$$
[Didelot, Everitt, Johansen & Lawson, 2011] No discrepancy only when there is cross-model sufficiency; inability to evaluate the loss brought by the summary statistics
• 49. A stylised problem Central question to the validation of ABC for model choice: When is a Bayes factor based on an insufficient statistic T(y) consistent? Note/warning: the inference drawn on T(y) through B_12^T(y) necessarily differs from the inference drawn on y through B12(y) [Marin, Pillai, X, & Rousseau, JRSS B, 2013]
• 51. A benchmark if toy example Comparison suggested by a referee of the PNAS paper [thanks!]: [X, Cornuet, Marin, & Pillai, Aug. 2011] Model M1: y ∼ N(θ1, 1), opposed to model M2: y ∼ L(θ2, 1/√2), the Laplace distribution with mean θ2 and scale parameter 1/√2 (variance one). Four possible statistics: 1. sample mean ȳ (sufficient for M1 if not M2); 2. sample median med(y) (insufficient); 3. sample variance var(y) (ancillary); 4. median absolute deviation mad(y) = med(|y − med(y)|)
• 52. [boxplots of the four statistics under Gauss and Laplace samples, n = 100]
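For reference, the four candidate statistics are immediate to compute; a small sketch under both models (sample size and seed are illustrative only):

```python
import numpy as np

def four_summaries(x):
    med = np.median(x)
    return {
        "mean": x.mean(),                    # sufficient under M1, not under M2
        "median": med,                       # insufficient
        "variance": x.var(ddof=1),           # ancillary: variance one in both models
        "mad": np.median(np.abs(x - med)),   # median absolute deviation
    }

rng = np.random.default_rng(5)
print(four_summaries(rng.normal(0, 1, 100)))                 # Gauss sample
print(four_summaries(rng.laplace(0, 1 / np.sqrt(2), 100)))   # Laplace sample
```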
• 53. Framework Starting from the observed sample y = (y1, . . . , yn), not necessarily iid, with true distribution y ∼ Pn, and summary statistics T(y) = Tⁿ = (T1(y), T2(y), · · · , Td(y)) ∈ R^d with true distribution Tⁿ ∼ Gn.
• 54. Framework Comparison of – under M1, y ∼ F1,n(·|θ1) where θ1 ∈ Θ1 ⊂ R^{p1} – under M2, y ∼ F2,n(·|θ2) where θ2 ∈ Θ2 ⊂ R^{p2} turned into – under M1, T(y) ∼ G1,n(·|θ1), and θ1|T(y) ∼ π1(·|Tⁿ) – under M2, T(y) ∼ G2,n(·|θ2), and θ2|T(y) ∼ π2(·|Tⁿ)
  • 55. Assumptions A collection of asymptotic “standard” assumptions: [A1] is a standard central limit theorem under the true model with asymptotic mean µ0 [A2] controls the large deviations of the estimator Tn from the model mean µ(θ) [A3] is the standard prior mass condition found in Bayesian asymptotics (di effective dimension of the parameter) [A4] restricts the behaviour of the model density against the true density [Think CLT!]
• 56. Asymptotic marginals Asymptotically, under [A1]–[A4],
$$m_i(t) = \int_{\Theta_i} g_i(t\mid\theta_i)\,\pi_i(\theta_i)\,d\theta_i$$
is such that (i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
$$C_l\, v_n^{d-d_i} \le m_i(T^n) \le C_u\, v_n^{d-d_i},$$
and (ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,
$$m_i(T^n) = o_{P_n}\big[v_n^{d-\tau_i} + v_n^{d-\alpha_i}\big].$$
• 57. Between-model consistency A consequence of the above is that the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of Tⁿ under both models. And only by this mean value! Indeed, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0, then
$$C_l\, v_n^{-(d_1-d_2)} \le m_1(T^n)\big/m_2(T^n) \le C_u\, v_n^{-(d_1-d_2)},$$
where Cl, Cu = OPn(1), irrespective of the true model: this only depends on the difference d1 − d2, hence no consistency. Else, if inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0, then
$$m_1(T^n)\big/m_2(T^n) \ge C_u \min\big(v_n^{-(d_1-\alpha_2)},\, v_n^{-(d_1-\tau_2)}\big).$$
• 60. Checking for adequate statistics Run a practical check of the relevance (or non-relevance) of Tⁿ: null hypothesis that both models are compatible with the statistic Tⁿ, H0: inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0, against H1: inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0; the testing procedure provides estimates of the mean of Tⁿ under each model and checks for equality
• 61. Checking in practice Under each model Mi, generate an ABC sample θi,l, l = 1, · · · , L. For each θi,l, generate yi,l ∼ Fi,n(·|θi,l), derive Tⁿ(yi,l) and compute
$$\hat\mu_i = \frac{1}{L}\sum_{l=1}^L T^n(y_{i,l}),\quad i = 1, 2.$$
Conditionally on Tⁿ(y),
$$\sqrt{L}\,\{\hat\mu_i - E^\pi[\mu_i(\theta_i)\mid T^n(y)]\} \rightsquigarrow N(0, V_i).$$
Test for a common mean, H0: µ̂1 ∼ N(µ0, V1), µ̂2 ∼ N(µ0, V2), against the alternative of different means, H1: µ̂i ∼ N(µi, Vi) with µ1 ≠ µ2.
• 62. Toy example: Laplace versus Gauss [boxplots of the normalised χ² statistics, without and with mad]
• 63. [some] asymptotics of ABC motivating example Approximate Bayesian computation ABC for model choice [some] asymptotics of ABC asymptotic setup consistency of ABC posteriors asymptotic posterior shape asymptotic behaviour of EABC[θ] ABC under misspecification
• 64. asymptotic setup asymptotic: y = y^(n) ∼ P_θ^n and ε = ε_n, n → +∞; parametric: θ ∈ R^k, k fixed; concentration of the summary statistics η(zⁿ): ∃ b : θ → b(θ) such that ‖η(zⁿ) − b(θ)‖ = o_{P_θ}(1) for all θ. Objects of interest: posterior concentration and asymptotic shape of π_ε(·|η(y^(n))) (normality?); convergence of the posterior mean θ̂ = E_ABC[θ|η(y^(n))]; asymptotic acceptance rate [Frazier et al., 2018]
• 66. consistency of ABC posteriors The ABC algorithm is Bayesian consistent at θ0 if for any δ > 0, Π(‖θ − θ0‖ > δ | ‖η(y) − η(z)‖ ≤ ε) → 0 as n → +∞, ε → 0. Bayesian consistency implies that sets containing θ0 have posterior probability tending to one as n → +∞, the implication being the existence of a specific rate of concentration. Concentration around the true value and Bayesian consistency impose less stringent conditions on the convergence speed of the tolerance ε_n to zero, when compared with asymptotic normality of the ABC posterior; asymptotic normality of the ABC posterior mean does not require asymptotic normality of the ABC posterior
• 68. consistency of ABC posteriors Concentration of the summary η(z): there exists b(θ) such that ‖η(z) − b(θ)‖ = o_{P_θ}(1). Consistency: Π_{ε_n}(‖θ − θ0‖ ≤ δ | η(y)) = 1 + o_p(1). Convergence rate: there exists δ_n = o(1) such that Π_{ε_n}(‖θ − θ0‖ ≤ δ_n | η(y)) = 1 + o_p(1). Point estimator consistency: for θ̂ = E_ABC[θ|η(y^(n))], E_ABC[θ|η(y^(n))] − θ0 = o_p(1) and v_n(E_ABC[θ|η(y^(n))] − θ0) ⇒ N(0, v)
• 70. Rate of convergence Π_ε(·|‖η(y) − η(z)‖ ≤ ε) concentrates at rate λ_n → 0 if
$$\limsup_{\varepsilon\to 0}\ \limsup_{n\to+\infty}\ \Pi_\varepsilon(\|\theta - \theta_0\| > \lambda_n M \mid \|\eta(y) - \eta(z)\| \le \varepsilon) \to 0$$
in P0-probability when M goes to infinity. The posterior rate of concentration is related to the rate at which information accumulates about the true parameter vector
• 72. Related results Existing studies on the large sample properties of ABC have primarily focused on the asymptotic properties of point estimators derived from ABC [Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2018a]:
– assumption of a CLT for the summary statistic, plus regularity assumptions on the convergence of the density of the summary statistics to its normal limit (incl. existence of an Edgeworth expansion with exponential tail control); convergence rate of the posterior mean if ε_T = o(1/v_T^{3/5}); acceptance probability chosen as an arbitrary density of ‖η(y) − η(z)‖
– stronger conditions on η than our own: weak convergence of v_T{η(z) − b(θ)} with non-uniform polynomial deviations, excluding non-compact parameter spaces; when ε_T ≳ v_T⁻¹, only deviation bounds and weak convergence are required (weaker than convergence of densities); when ε_T = o(1/v_T), no requirement on the convergence rate
– characterisation of the asymptotic behaviour of the ABC posterior for all ε_T = o(1) with posterior concentration; asymptotic normality and unbiasedness of the posterior mean remain achievable even when lim_T v_T ε_T = ∞, provided ε_T = o(1/v_T^{1/2})
– Li and Fearnhead (2018a) point out that if k_η > k_θ ≥ 1 and ε_T = o(1/v_T^{3/5}), the posterior mean is asymptotically normal and unbiased, but not asymptotically efficient; if v_T ε_T → ∞ and v_T ε_T² = o(1), for k_η > k_θ ≥ 1,
$$E^{\Pi_\varepsilon}\{v_T(\theta - \theta_0)\} = \{\nabla b(\theta_0)^{\mathsf T} \nabla b(\theta_0)\}^{-1} \nabla b(\theta_0)^{\mathsf T}\, v_T\{\eta(y) - b(\theta_0)\} + o_p(1)$$
• 77. Convergence when ε_n ≳ σ_n Under the (main) assumptions
(A1) ∃ σ_n → 0 such that P_θ(σ_n⁻¹‖η(z) − b(θ)‖ > u) ≤ c(θ)h(u), with lim_{u→+∞} h(u) = 0
(A2) Π(‖b(θ) − b(θ0)‖ ≤ u) ≍ u^D for u ≈ 0
one obtains posterior consistency and a posterior concentration rate λ_n that depends on the deviation control of d2{η(z), b(θ)}, with the posterior concentration rate for b(θ) bounded from below by O(ε_n). More precisely,
$$\Pi_{\varepsilon_n}\big(\|b(\theta) - b(\theta_0)\| \lesssim \varepsilon_n + \sigma_n h^{-1}(\varepsilon_n^D) \mid \eta(y)\big) = 1 + o_{p_0}(1).$$
If also ‖θ − θ0‖ ≤ L‖b(θ) − b(θ0)‖^α locally and θ → b(θ) is one-to-one, then
$$\Pi_{\varepsilon_n}\big(\|\theta - \theta_0\| \lesssim \varepsilon_n^\alpha + \sigma_n^\alpha (h^{-1}(\varepsilon_n^D))^\alpha =: \delta_n \mid \eta(y)\big) = 1 + o_{p_0}(1).$$
• 79. Comments (A1): if P_θ(σ_n⁻¹‖η(z) − b(θ)‖ > u) ≤ c(θ)h(u), two cases: 1. polynomial tail, h(u) ≍ u^{−κ}: then δ_n = ε_n + σ_n ε_n^{−D/κ}; 2. exponential tail, h(u) ≍ e^{−cu}: then δ_n = ε_n + σ_n log(1/ε_n). E.g., η(y) = n⁻¹ Σ_i g(y_i) with moment conditions on g (case 1) or a Laplace transform (case 2)
• 81. Comments (A2): Π(‖b(θ) − b(θ0)‖ ≤ u) ≍ u^D: if Π is regular enough then D = dim(θ); no need to approximate the density f(η(y)|θ). The same results hold when ε_n = o(σ_n) if (A1) is replaced with
$$\inf_{\|x\| \le M} P_\theta\big(\|\sigma_n^{-1}(\eta(z) - b(\theta)) - x\| \le u\big) \gtrsim u^D, \quad u \approx 0.$$
• 82. proof Simple enough proof: assume σ_n ≤ δε_n and ‖η(y) − b(θ0)‖ ≲ σ_n, ‖η(y) − η(z)‖ ≤ ε_n. Hence ‖b(θ) − b(θ0)‖ > δ_n implies ‖η(z) − b(θ)‖ > δ_n − ε_n − σ_n := t_n. Also, if ‖b(θ) − b(θ0)‖ ≤ ε_n/3 and ‖η(z) − b(θ)‖ ≤ ε_n/3, then ‖η(y) − η(z)‖ ≤ ‖η(z) − b(θ)‖ + σ_n + ε_n/3 ≤ ε_n, so that
$$\Pi_{\varepsilon_n}(\|b(\theta) - b(\theta_0)\| > \delta_n \mid y) \le \frac{\int_{\|b(\theta)-b(\theta_0)\|>\delta_n} P_\theta(\|\eta(z) - b(\theta)\| > t_n)\, d\Pi(\theta)}{\int_{\|b(\theta)-b(\theta_0)\|\le\varepsilon_n/3} P_\theta(\|\eta(z) - b(\theta)\| \le \varepsilon_n/3)\, d\Pi(\theta)} \lesssim \varepsilon_n^{-D}\, h(t_n\sigma_n^{-1}) \int_\Theta c(\theta)\, d\Pi(\theta)$$
• 84. Assumptions Applicable to a broad range of data structures: [A1] ensures that η(z) concentrates on b(θ) (inescapable); [A2] controls the degree of prior mass in a neighbourhood of θ0, standard in Bayesian asymptotics; [A2] if Π is absolutely continuous with a prior density p bounded above and below near θ0, then D = dim(θ) = kθ; [A3] identification condition, critical for getting posterior concentration around θ0, with b injective depending on the true structural model and the particular choice of η.
• 85. Summary statistic and (in)consistency Consider the moving average MA(2) model y_t = e_t + θ1 e_{t−1} + θ2 e_{t−2}, e_t ∼ iid N(0, 1), with −2 ≤ θ1 ≤ 2, θ1 + θ2 ≥ −1, θ1 − θ2 ≤ 1, and summary statistics equal to the sample autocovariances
$$\eta_j(y) = T^{-1}\sum_{t=1+j}^T y_t y_{t-j}, \quad j = 0, 1,$$
with η0(y) →^P E[y_t²] = 1 + θ01² + θ02² and η1(y) →^P E[y_t y_{t−1}] = θ01(1 + θ02). For the ABC target p_ε(θ|η(y)) to be degenerate at θ0,
$$0 = b(\theta_0) - b(\theta) = \begin{pmatrix} 1 + \theta_{01}^2 + \theta_{02}^2 - \{1 + \theta_1^2 + \theta_2^2\} \\ \theta_{01}(1 + \theta_{02}) - \theta_1(1 + \theta_2) \end{pmatrix}$$
must have the unique solution θ = θ0. Take θ01 = .6, θ02 = .2: the equation has two solutions, θ1 = .6, θ2 = .2 and θ1 ≈ .5453, θ2 ≈ .3204
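This identification failure is easy to check numerically: solving b(θ) = b(θ0) for the binding function b(θ) = (1 + θ1² + θ2², θ1(1 + θ2)) from two different starting points recovers both roots (a hedged sketch, with scipy's fsolve used purely as a convenient root finder):

```python
import numpy as np
from scipy.optimize import fsolve

# binding function b(theta) = (E[y_t^2], E[y_t y_{t-1}]) of the MA(2) model
def b(theta):
    t1, t2 = theta
    return np.array([1 + t1**2 + t2**2, t1 * (1 + t2)])

theta0 = np.array([0.6, 0.2])
target = b(theta0)

# solving b(theta) = b(theta0) from two starting points exhibits both roots
root1 = fsolve(lambda th: b(th) - target, x0=[0.6, 0.2])
root2 = fsolve(lambda th: b(th) - target, x0=[0.5, 0.35])
print(root1, root2)   # (0.6, 0.2) and approximately (0.5453, 0.3204)
```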
• 87. Concentration for the MA(2) model True value θ0 = (0.6, 0.2); summaries: first three autocorrelations; tolerance proportional to ε_T = 1/T^{0.4}; rejection of normality of these posteriors
• 88. Asymptotic shape of posterior distribution Shape of Π(·| ‖η(y) − η(z)‖ ≤ ε_n) for several connections between ε_n and the rate at which η(yⁿ) satisfies a CLT. Three different regimes: 1. σ_n = o(ε_n) → uniform limit; 2. σ_n ≍ ε_n → perturbed Gaussian limit; 3. ε_n = o(σ_n) → Gaussian limit
• 90. scaling matrices Introduction of a sequence of (k, k) p.d. matrices Σn(θ) such that for all θ near θ0, c1‖Dn‖* ≤ ‖Σn(θ)‖* ≤ c2‖Dn‖*, with Dn = diag(dn(1), · · · , dn(k)), 0 < c1, c2 < +∞, and dn(j) → +∞ for all j. Possibly different convergence rates for the components of η(z); reorder the components so that dn(1) ≤ · · · ≤ dn(k), with the assumption that lim inf_n dn(j)ε_n = lim sup_n dn(j)ε_n (i.e., each dn(j)ε_n has a limit)
• 91. New assumptions (B1) Concentration of the summary η: Σn(θ) ∈ R^{k1×k1} is o(1), Σn(θ)⁻¹{η(z) − b(θ)} ⇒ N_{k1}(0, Id), with (Σn(θ)Σn(θ0)⁻¹)_n = C^o; (B2) b(θ) is C¹ and ‖θ − θ0‖ ≲ ‖b(θ) − b(θ0)‖; (B3) dominated convergence and
$$\lim_n \frac{P_\theta\big(\Sigma_n(\theta)^{-1}\{\eta(z) - b(\theta)\} \in u + B(0, u_n)\big)}{\prod_j u_n(j)} = \varphi(u)$$
• 92. main result Set Σn(θ) = σn D(θ) for θ ≈ θ0 and Z^o = Σn(θ0)⁻¹(η(y) − b(θ0)); then under (B1) and (B2), when ε_n σ_n⁻¹ → +∞,
$$\Pi_{\varepsilon_n}[\varepsilon_n^{-1}(\theta - \theta_0) \in A \mid y] \Rightarrow U_{B_0}(A), \qquad B_0 = \{x \in \mathbb{R}^k : \|b'(\theta_0)^{\mathsf T} x\| \le 1\}$$
• 93. main result Set Σn(θ) = σn D(θ) for θ ≈ θ0 and Z^o = Σn(θ0)⁻¹(η(y) − b(θ0)); then under (B1) and (B2), when ε_n σ_n⁻¹ → c,
$$\Pi_{\varepsilon_n}[\Sigma_n(\theta_0)^{-1}(\theta - \theta_0) - Z^o \in A \mid y] \Rightarrow Q_c(A),$$
with Q_c a perturbation of the Gaussian limit
• 94. main result Set Σn(θ) = σn D(θ) for θ ≈ θ0 and Z^o = Σn(θ0)⁻¹(η(y) − b(θ0)); then under (B1) and (B2), when ε_n σ_n⁻¹ → 0 and (B3) holds, setting V_n = [b′(θ0)]^T Σn(θ0) b′(θ0),
$$\Pi_{\varepsilon_n}[V_n^{-1}(\theta - \theta_0) - \tilde Z^o \in A \mid y] \Rightarrow \Phi(A)$$
• 95. intuition (?!) Set x(θ) = σ_n⁻¹(θ − θ0) − Z^o (k = 1); then
$$\pi_n := \Pi_{\varepsilon_n}[\varepsilon_n^{-1}(\theta - \theta_0) \in A \mid y] = \frac{\int_{|\theta-\theta_0|\le u_n} \mathbb{I}_{x(\theta)\in A}\, P_\theta\big(\|\sigma_n^{-1}(\eta(z) - b(\theta)) + x(\theta)\| \le \sigma_n^{-1}\varepsilon_n\big)\, p(\theta)\, d\theta}{\int_{|\theta-\theta_0|\le u_n} P_\theta\big(\|\sigma_n^{-1}(\eta(z) - b(\theta)) + x(\theta)\| \le \sigma_n^{-1}\varepsilon_n\big)\, p(\theta)\, d\theta} + o_p(1)$$
If ε_n/σ_n ≫ 1: P_θ(‖σ_n⁻¹(η(z) − b(θ)) + x(θ)‖ ≤ σ_n⁻¹ε_n) = 1 + o(1) iff ‖x‖ ≤ σ_n⁻¹ε_n + o(1). If ε_n/σ_n = o(1): P_θ(‖σ_n⁻¹(η(z) − b(θ)) + x‖ ≤ σ_n⁻¹ε_n) ∝ φ(x)(1 + o(1))
• 96. more comments Surprising: a U(−ε_n, ε_n) limit when ε_n ≫ σ_n! But not that surprising, since ε_n = o(1) means concentration around θ0, and σ_n = o(ε_n) implies that b(θ) − b(θ0) ≈ η(z) − η(y). Again, no need to control the approximation of f(η(y)|θ) by a Gaussian density: merely a control of the distribution. Generalisation to the case where the eigenvalues of Σn have distinct rates dn,1 ≤ · · · ≤ dn,k. Behaviour of E_ABC(θ|y) consistent with Li & Fearnhead (2016)
• 98. even more comments If (also) p(θ) is Hölder β,
$$E_{ABC}(\theta\mid y) - \theta_0 = \frac{\sigma_n Z^o}{b'(\theta_0)}\ [\text{score for } f(\eta(y)\mid\theta)] + \sum_{j=1}^{\lfloor\beta/2\rfloor} \varepsilon_n^{2j} H_j(\theta_0, p, b)\ [\text{bias from threshold approximation}] + o(\sigma_n) + O(\varepsilon_n^{\beta+1})$$
with, if ε_n² = o(σ_n), efficiency: E_ABC(θ|y) − θ0 = σ_n Z^o/b′(θ0) + o(σ_n); the H_j(θ0, p, b) are deterministic. We gain nothing by getting a first crude θ̂(y) = E_ABC(θ|y) for some η(y) and then rerunning ABC with θ̂(y)
• 99. Illustration in the MA(2) setting Sample sizes of T = 500, 1000; asymptotic normality rejected for ε_T = 1/T^{0.4}, and for θ1 with T = 500 and ε_T = 1/T^{0.55}
• 100. asymptotic behaviour of E_ABC[θ] When p = dim(η(y)) = d = dim(θ) and ε_n = o(n^{−3/10}),
$$E_{ABC}[d_T(\theta - \theta_0)\mid y^o] \Rightarrow N\big(0, \{(\nabla b^o)^{\mathsf T} \Sigma^{-1} \nabla b^o\}^{-1}\big)$$
[Li & Fearnhead (2018a)] In fact, if ε_n^{β+1}√n = o(1), with β the Hölder-smoothness of π,
$$E_{ABC}[(\theta - \theta_0)\mid y^o] = \frac{(\nabla b^o)^{-1} Z^o}{\sqrt n} + \sum_{j=1}^k h_j(\theta_0)\,\varepsilon_n^{2j} + o_p(1), \qquad 2k = \lfloor\beta\rfloor.$$
Iterating for fixed p is mildly interesting: if η̃(y) = E_ABC[θ|y^o], then
$$E_{ABC}[\theta\mid\tilde\eta(y)] = \theta_0 + \frac{(\nabla b^o)^{-1} Z^o}{\sqrt n} + \frac{\pi'(\theta_0)}{\pi(\theta_0)}\,\varepsilon_n^2 + o(\cdot)$$
[Fearnhead & Prangle, 2012]
• 102. more asymptotic behaviour of E_ABC[θ] Li and Fearnhead (2018a,b) consider that E_ABC[d_T(θ − θ0)|y^o] is not optimal when p > d. If √n ε_n² = o(1) while ε_n√n does not vanish,
$$\sqrt n\,[E_{ABC}(\theta) - \theta_0] = P_{\nabla b^o} Z^o + o_p(1), \qquad Z^o = \sqrt n\,(\eta(y) - b^o),$$
$$P_{\nabla b^o} Z^o = \big((\nabla b^o)^{\mathsf T} \nabla b^o\big)^{-1} (\nabla b^o)^{\mathsf T} Z^o, \qquad V_{as}(P_{\nabla b^o} Z^o) \ge \big((\nabla b^o)^{\mathsf T} V_{as}(Z^o)^{-1} \nabla b^o\big)^{-1}.$$
If ε_n√n = o(1),
$$\sqrt n\,[E_{ABC}(\theta) - \theta_0] = \big((\nabla b^o)^{\mathsf T} \Sigma^{-1} \nabla b^o\big)^{-1} (\nabla b^o)^{\mathsf T} \Sigma^{-1} Z^o + o_p(1)$$
• 103. impact of the dimension of η The dimension of η(·) does not impact the above result, but impacts the acceptance probability: if ε_n = o(σ_n), k1 = dim(η(y)), k = dim(θ) and k1 ≥ k,
$$\alpha_n := \Pr(\|y - z\| \le \varepsilon_n) \asymp \varepsilon_n^{k_1}\,\sigma_n^{-k_1+k};$$
if ε_n ≳ σ_n,
$$\alpha_n := \Pr(\|y - z\| \le \varepsilon_n) \asymp \varepsilon_n^{k}.$$
If we choose α_n instead: α_n = o(σ_n^k) leads to ε_n = σ_n(α_n σ_n^{−k})^{1/k_1} = o(σ_n), while α_n ≳ σ_n leads to ε_n ≍ α_n^{1/k}.
  • 104. Illustration in the MA(2) setting Sample sizes of T = 500, 1000 Asymptotic normality accepted for all graphs
• 105. Practical implications In practice, the tolerance is determined by a quantile (nearest neighbours): select all θi associated with the α = δ/N smallest distances d2{η(zi), η(y)} for some δ. Then (i) if ε_T ≍ v_T⁻¹ or ε_T = o(v_T⁻¹), the acceptance rate associated with the threshold ε_T is
$$\alpha_T = \mathrm{pr}(\|\eta(z) - \eta(y)\| \le \varepsilon_T) \asymp (v_T\varepsilon_T)^{k_\eta} \times v_T^{-k_\theta} \lesssim v_T^{-k_\theta};$$
(ii) if ε_T ≫ v_T⁻¹,
$$\alpha_T = \mathrm{pr}(\|\eta(z) - \eta(y)\| \le \varepsilon_T) \asymp \varepsilon_T^{k_\theta} \gg v_T^{-k_\theta}.$$
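In code, the quantile version amounts to keeping the α-fraction of reference-table draws with the smallest distances; a hedged Python sketch with placeholder summaries standing in for an actual simulator:

```python
import numpy as np

# nearest-neighbour tolerance: keep the alpha = delta/N draws whose simulated
# summaries are closest to eta(y), and read off the implied threshold.
def abc_knn(theta_draws, eta_draws, eta_obs, alpha):
    d = np.linalg.norm(eta_draws - eta_obs, axis=1)
    n_keep = max(1, int(alpha * len(theta_draws)))
    keep = np.argsort(d)[:n_keep]
    return theta_draws[keep], d[keep].max()   # accepted draws, implied tolerance

rng = np.random.default_rng(6)
theta = rng.normal(0, 5, size=(100_000, 1))
etas = theta + rng.normal(0, 0.1, size=theta.shape)  # stand-in simulated summaries
post, eps_implied = abc_knn(theta, etas, np.array([0.7]), alpha=0.001)
```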
• 107. Curse of dimensionality For reasonable statistical behaviour, the rate of decline of α_T is the faster the larger the dimension of θ, kθ, but is unaffected by the dimension of η, kη. Theoretical justification for dimension reduction methods that process parameter components individually and independently of the other components [Beaumont & al., 2002; Fearnhead & Prangle, 2012; Martin & al., 2016]; the importance sampling approach of Li & Fearnhead (2018a) yields acceptance rates α_T = O(1) when ε_T = O(1/v_T)
• 109. Monte Carlo error Link the choice of ε_T to the Monte Carlo error associated with N_T draws in the Algorithm: conditions (on ε_T) under which α̂_T = α_T{1 + o_p(1)}, where
$$\hat\alpha_T = \sum_{i=1}^{N_T} 1\!\mathrm{l}\,[d\{\eta(y), \eta(z_i)\} \le \varepsilon_T]\big/N_T$$
is the proportion of accepted draws from N_T simulated draws of θ. Either (i) ε_T = o(v_T⁻¹) and (v_T ε_T)^{−k_η} ε_T^{−k_θ} ≤ M N_T, or (ii) ε_T ≳ v_T⁻¹ and ε_T^{−k_θ} ≤ M N_T, for M large enough
• 110. conclusion on ABC consistency asymptotic description of ABC: different regimes depending on ε_n and σ_n; no point in choosing ε_n arbitrarily small: just take ε_n = o(σ_n); no asymptotic gain in iterative ABC; results under weak(er) conditions by not studying g(η(z)|θ)
• 111. Mis-ABC-fied motivating example Approximate Bayesian computation ABC for model choice [some] asymptotics of ABC ABC under misspecification Misspecification Consequences
• 112. Illustration Assumed data generating process (DGP) is z ∼ N(θ, 1), but the actual DGP is y ∼ N(θ, σ̃²). Use of the summaries: sample mean η1(y) = (1/n) Σ_{i=1}^n y_i and centred summary η2(y) = (1/(n−1)) Σ_{i=1}^n (y_i − η1(y))² − 1. Three ABC versions: ABC-AR, accept/reject approach with Kε(d{η(z), η(y)}) = 1l[d{η(z), η(y)} ≤ ε] and d{x, y} = ‖x − y‖; ABC-K, smooth rejection approach, with Kε(d{η(z), η(y)}) a univariate Gaussian kernel; ABC-Reg, post-processing ABC approach with weighted linear regression adjustment
• 113. Illustration Posterior means for ABC-AR, ABC-K and ABC-Reg as the misspecification σ² increases (N = 50,000 simulated data sets); α_n = n^{−5/9} quantile for ABC-AR; ABC-K and ABC-Reg bandwidth of n^{−5/9}
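A hedged re-creation of this experiment for ABC-AR only (the sample size, prior, and true σ̃² are illustrative choices; ABC-K and ABC-Reg would post-process the same reference table):

```python
import numpy as np

# data from N(theta, sigma2) with sigma2 != 1, while the assumed model is N(theta, 1)
rng = np.random.default_rng(7)
n, sigma2_true, theta_true = 100, 2.0, 1.0
y = rng.normal(theta_true, np.sqrt(sigma2_true), size=n)
eta_obs = np.array([y.mean(), y.var(ddof=1) - 1.0])   # eta2 targets 0 in-model

N = 50_000
theta = rng.normal(0, 5, size=N)                      # illustrative prior
z = rng.normal(theta[:, None], 1.0, size=(N, n))      # simulate the assumed model
eta_sim = np.column_stack([z.mean(axis=1), z.var(axis=1, ddof=1) - 1.0])
dist = np.linalg.norm(eta_sim - eta_obs, axis=1)
keep = dist <= np.quantile(dist, n ** (-5 / 9))       # alpha_n = n^{-5/9} quantile
print(theta[keep].mean())    # ABC-AR posterior mean under misspecification
```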
• 114. Framework Data y with true distribution P0; model P := {P_θ : θ ∈ Θ ⊂ R^{kθ}}; summary statistic η(y) = (η1(y), ..., η_{kη}(y)). Misspecification:
$$\inf_{\theta\in\Theta} D(P_0\|P_\theta) = \inf_{\theta\in\Theta} \int \log\frac{dP_0(y)}{dP_\theta(y)}\, dP_0(y) > 0,$$
with θ* = arg inf_{θ∈Θ} D(P0‖Pθ) [Müller, 2013]. ABC misspecification: for b0 (resp. b(θ)) the limit of η(y) (resp. η(z)), inf_{θ∈Θ} d{b0, b(θ)} > 0; ABC pseudo-true value θ* = arg inf_{θ∈Θ} d{b0, b(θ)}.
• 115. Compulsory tolerance Under standard identification conditions on b(·) ∈ R^{kη}, there exists ε* such that ε* = inf_{θ∈Θ} d{b0, b(θ)} > 0. For tolerances ε_n = o(1), once ε_n < ε*, no draw of θ is selected and the posterior Π_ε[A|η(y)] is ill-conditioned. But an appropriately chosen tolerance sequence (ε_n)_n allows the ABC-based posterior to concentrate on the distance-dependent pseudo-true value θ*
• 117. ABC Posterior Concentration Under Misspecification Assumptions [A0] There exist a continuous b : Θ → B and a decreasing ρ_n(·) such that ρ_n(u) → 0 at infinity and
$$P_\theta\big[d\{\eta(z), b(\theta)\} > u\big] \le c(\theta)\rho_n(u), \qquad \int_\Theta c(\theta)\,d\Pi(\theta) < +\infty,$$
with existence of either (i) polynomial deviations: a positive sequence v_n → +∞ and u0, κ > 0 such that ρ_n(u) = v_n^{−κ} u^{−κ} for u ≥ u0, or (ii) exponential deviations: h_θ(·) > 0 such that P_θ[d{η(z), b(θ)} > u] ≤ c(θ)e^{−h_θ(uv_n)} and there exist c, C > 0 such that
$$\int_\Theta c(\theta)e^{-h_\theta(uv_n)}\, d\Pi(\theta) \le Ce^{-c(uv_n)^\tau}$$
• 118. ABC Posterior Concentration Under Misspecification Assumptions [A1] There exist D > 0 and M0, δ0 > 0 such that, for all δ0 ≥ δ > 0 and M ≥ M0, there exists S_δ ⊂ {θ ∈ Θ : d{b(θ), b0} − ε* ≤ δ} for which, in case (i), D < κ and
$$\int_{S_\delta} \Big(1 - \frac{c(\theta)}{M}\Big)\, d\Pi(\theta) \gtrsim \delta^D,$$
and in case (ii),
$$\int_{S_\delta} \big(1 - c(\theta)e^{-h_\theta(M)}\big)\, d\Pi(\theta) \gtrsim \delta^D.$$
• 119. ABC Posterior Concentration Under Misspecification Under [A0] and [A1], for ε_n ↓ ε* with ε_n ≥ ε* + Mv_n⁻¹ + v_{0,n}⁻¹, if M_n is a sequence going to infinity and δ_n ≥ M_n{(ε_n − ε*) + ρ_n(ε_n − ε*)}, then
$$\Pi_\varepsilon\big[d\{b(\theta), b_0\} \ge \varepsilon^* + \delta_n \mid \eta(y)\big] = o_{P_0}(1),$$
provided ρ_n(ε_n − ε*) ≲ (ε_n − ε*)^{−D/κ} in case (i), and ρ_n(ε_n − ε*) ≲ |log(ε_n − ε*)|^{1/τ} in case (ii). Further, Π_ε[d{θ, θ*} > δ | η(y)] = o_{P_0}(1) [Bernton & al., 2017; Frazier & al., 2017]
• 121. ABC Asymptotic Posterior Under further assumptions: existence of a positive definite matrix Σ_n(θ0) such that for all ‖θ − θ0‖ ≤ δ and all 0 < u ≤ δv_n,
$$P_\theta\big[\|\Sigma_n(\theta_0)\{\eta(z) - b(\theta)\}\| > u\big] \le c_0 u^{-\kappa};$$
a parametric CLT: existence of a sequence of positive definite matrices Σ_n(θ) such that, for all θ in a neighbourhood of θ0, ‖Σ_n(θ)‖* ≍ v_n with v_n → +∞ and Σ_n(θ){η(z) − b(θ)} ⇒ N(0, I_{kη}); and classical concentration assumptions. If lim_n v_n(ε_n − ε*) = 0, then for Z_n^0 = Σ_n(θ*){η(y) − b0} and Φ_{kη}(·) the standard Normal,
$$\lim_{n\to+\infty} \Pi\big[\Sigma_n(\theta^*)\{b(\theta) - b(\theta^*)\} - Z_n^0 \in B \mid \eta(y)\big] = \Phi_{k_\eta}(B)$$
• 122. Accept/Reject ABC revisited Given the link between ε_n and α_n in Frazier & al. (2018), posterior concentration is achieved through a sequence of α_n converging to zero: an ABC algorithm based on an appropriate α_n always yields (under appropriate regularity) an approach that concentrates asymptotically, regardless of misspecification. (i) If (ε_n − ε*) ≍ v_n⁻¹ or (ε_n − ε*) = o(v_n⁻¹), then
$$\alpha_n = \Pr(\|\eta(z) - \eta(y)\| \le \varepsilon_n) \asymp (v_n\{\varepsilon_n - \varepsilon^*\})^{k_\eta} \times v_n^{-k_\theta} \lesssim v_n^{-k_\theta};$$
(ii) if (ε_n − ε*) ≫ v_n⁻¹, then
$$\alpha_n = \Pr(\|\eta(z) - \eta(y)\| \le \varepsilon_n) \asymp \{\varepsilon_n - \varepsilon^*\}^{k_\theta} \gg v_n^{-k_\theta}.$$
• 123. Regression adjustment In the case of a scalar θ, ABC-Reg first runs ABC-AR, with tolerance ε_n, and obtains the selected draws and summaries {θi, η(zi)}. ABC-Reg then uses linear regression to predict the accepted values of θ from η(z), through θi = α + β^T{η(y) − η(zi)} + νi, and defines the adjusted parameter draw as θ̃i = θi + β̂^T{η(y) − η(zi)}, where
$$\hat\beta = \Big\{\frac{1}{N}\sum_{i=1}^N \big(\eta(z_i) - \bar\eta\big)\big(\eta(z_i) - \bar\eta\big)^{\mathsf T}\Big\}^{-1} \frac{1}{N}\sum_{i=1}^N \big(\eta(z_i) - \bar\eta\big)\big(\theta_i - \bar\theta\big)$$
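The adjustment itself is a one-line least-squares fit; a hedged sketch for scalar θ with a one-dimensional summary (all inputs below are placeholder draws standing in for an actual ABC-AR run):

```python
import numpy as np

# ABC-Reg adjustment: shift each accepted draw by a least-squares regression
# of theta on the (centred) summaries, evaluated at the observed summary.
def abc_reg_adjust(theta_acc, eta_acc, eta_obs):
    X = eta_acc - eta_acc.mean(axis=0)
    beta, *_ = np.linalg.lstsq(X, theta_acc - theta_acc.mean(), rcond=None)
    return theta_acc + (eta_obs - eta_acc) @ beta   # theta_i + beta'(eta(y) - eta(z_i))

rng = np.random.default_rng(8)
theta_acc = rng.normal(1.0, 0.3, size=500)          # placeholder ABC-AR output
eta_acc = theta_acc[:, None] + rng.normal(0, 0.2, size=(500, 1))
adjusted = abc_reg_adjust(theta_acc, eta_acc, np.array([1.05]))
```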
• 124. ABC-Reg pseudo-consistency If ε_n ≥ ε* + Mv_n⁻¹ + v_{0,n}⁻¹, with (i) ε* = inf_{θ∈Θ} d{b(θ), b0} > 0, (ii) θ* = arg inf_{θ∈Θ} d{b(θ), b0} existing, and (iii) ‖β̂ − β0‖ = o_{P_θ}(1) for some β0 with ‖β0‖ > 0, then Π_ε[|θ̃ − θ̃*| > δ | η(y)] = o_{P_0}(1) [Frazier & al., 2017]