Asymptotics of ABC
Christian P. Robert
U. Paris Dauphine PSL & Warwick U.
Workshop in honour of Prof. Donald Poskitt and
Gael Martin, 29 Nov 2023
Co-authors
▶ Jean-Marie Cornuet, Arnaud Estoup, Jean-Michel
Marin (Montpellier)
▶ David Frazier, Gael Martin (Monash U)
▶ Kerrie Mengersen (QUT)
▶ Espen Bernton, Pierre Jacob, Natesh Pillai
(Harvard U)
▶ Caroline Lawless, Judith Rousseau (U Oxford)
▶ Grégoire Clarté, Robin Ryder, Julien Stoehr (U
Paris Dauphine)
Outline
1 Motivation
2 Concept
3 Implementation
4 Convergence
5 Advanced topics
Posterior distribution
Central concept of Bayesian inference:
π(θ | xobs) ∝ π(θ) × L(θ | xobs)
with θ the parameter, xobs the observation, π(θ) the prior, and L(θ | xobs) the likelihood
Drives
▶ derivation of optimal decisions
▶ assessment of uncertainty
▶ model selection
▶ prediction
[McElreath, 2015]
Monte Carlo representation
Exploration of the Bayesian posterior π(θ|xobs) may (!) require producing a sample
θ1, . . . , θT
distributed from π(θ|xobs) (or asymptotically, by Markov chain Monte Carlo, aka MCMC)
1978: Kloek & van Dijk, ∫ M(θ)I(θ) dθ = E[M(θ)]
1984: Geman & Geman
1986/1989: Tierney & Kadane
1987: Tanner & Wong, π̂(θ|y) = m^−1 Σ_{j=1}^m p(θ | z^(j), y)
1990: Gelfand & Smith, [X | Y, Z], [Y | Z, X], [Z | X, Y]
1994: Carter & Kohn / Frühwirth-Schnatter
1999/2003: Roberts & Rosenthal / Neal
[McElreath, 2015]
From then ’til now!
[Martin et al., 2023-2024]
Difficulties
MCMC = workhorse of practical Bayesian analysis (BUGS, JAGS, Stan, etc.), except when the product
π(θ) × L(θ | xobs)
is well-defined but numerically unavailable or too costly to compute
Only partial solutions are available:
▶ demarginalisation (latent variables)
▶ exchange algorithm (auxiliary variables)
▶ pseudo-marginal (unbiased estimator)
Example 1: Dynamic mixture
Mixture model
{1 − wµ,τ(x)} fβ,λ(x) + wµ,τ(x) gε,σ(x) ,   x > 0
where
▶ fβ,λ Weibull density,
▶ gε,σ generalised Pareto density, and
▶ wµ,τ Cauchy (arctan) cdf
Intractable normalising constant
C(µ, τ, β, λ, ε, σ) = ∫₀^∞ {(1 − wµ,τ(x)) fβ,λ(x) + wµ,τ(x) gε,σ(x)} dx
[Frigessi, Haug & Rue, 2002]
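As a minimal illustration (not from the slides), the constant can be approximated numerically for fixed parameter values; the R sketch below assumes purely illustrative values and hand-codes the generalised Pareto density:
# hedged sketch: numerical approximation of C(mu,tau,beta,lambda,eps,sigma)
dgp=function(x,eps,sigma) (1/sigma)*(1+eps*x/sigma)^(-1/eps-1)   # generalised Pareto density
wgt=function(x,mu,tau) pcauchy(x,mu,tau)                         # Cauchy (arctan) cdf weight
dmix=function(x,mu,tau,beta,lambda,eps,sigma)
  (1-wgt(x,mu,tau))*dweibull(x,beta,lambda)+wgt(x,mu,tau)*dgp(x,eps,sigma)
integrate(dmix,0,Inf,mu=1,tau=.5,beta=2,lambda=1,eps=.3,sigma=1)$value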
Example 2: truncated Normal
Given complex set A ⊂ Rk (with k large), truncated Normal
model
f(x | µ, Σ, A) ∝ exp{−(x − µ)^T Σ^−1 (x − µ)/2} I_A(x)
with intractable normalising constant
C(µ, Σ, A) = ∫_A exp{−(x − µ)^T Σ^−1 (x − µ)/2} dx
Example 3: robust Normal statistics
Normal sample
x1, . . . , xn ∼ N(µ, σ²)
summarised into the (insufficient) statistics
µ̂n = med(x1, . . . , xn)
and
σ̂n = mad(x1, . . . , xn) = med |xi − µ̂n|
Under a conjugate prior π(µ, σ²), the posterior is close to intractable,
but simulation of (µ̂n, σ̂n) is straightforward
Insufficient Gibbs sampling
[Luciano et al., 2023]
Example 4: exponential random graph
ERGM: binary random vector x indexed
by all edges on set of nodes plus graph
f(x | θ) = exp(θ^T S(x)) / C(θ)
with S(x) vector of statistics and C(θ) intractable normalising constant
[Grelaud et al., 2009; Everitt, 2012; Bouranis et al., 2017]
Realistic[er] applications
▶ Kingman’s coalescent in population genetics
[Tavaré et al., 1997; Beaumont et al., 2003]
▶ α-stable distributions
[Peters et al, 2012]
▶ complex networks
[Dutta et al., 2018]
▶ astrostatistics & cosmostatistics
[Cameron & Pettitt, 2012; Ishida et al., 2015]
▶ state space models
[Martin et al., 2019]
Auxiliary likelihood-based ABC in state space
models
[Martin et al., 2019]
Concept
2 Concept
algorithm
summary statistics
random forests
calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics
A?B?C?
▶ A stands for approximate [wrong
likelihood]
▶ B stands for Bayesian [right prior]
▶ C stands for computation [producing
a parameter sample]
A?B?C?
▶ Rough version of the data [from dot
to ball]
▶ Non-parametric approximation of
the likelihood [near actual
observation]
▶ Use of non-sufficient statistics
[dimension reduction]
▶ Monte Carlo error [and no
unbiasedness]
ABC+
timeframe
1997/1999: Tavaré et al. / Pritchard et al.
1999: Jordan et al., ELBO(q) = E[log p(θ, y)] − E[log q(θ)]
2003/2009: Beaumont / Andrieu & Roberts, α(θn, θ0) = min{1, [π̂θ0/π̂θn] [q(θn|θ0)/q(θ0|θn)]}
2009: Rue et al., π̃(θ|y) ∝ π(x, θ, y)/πG(x|θ, y) |x=x∗(θ)
2011: Neal / Hoffman & Gelman
2015+: Hybrid Approximate Methods, pA(θ|y)
[Martin et al., 2023]
A seemingly naïve representation
When likelihood f(x|θ) not in closed form, likelihood-free
rejection technique:
ABC-AR algorithm
For an observation xobs ∼ f(x|θ), under the prior π(θ), keep
jointly simulating
θ′ ∼ π(θ) ,   z ∼ f(z|θ′) ,
until the auxiliary variable z is equal to the observed value, z = xobs
[Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
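A minimal R sketch (toy discrete model, not from the slides) of this exact-match rejection scheme, assuming a Binomial(10, θ) observation and a U(0, 1) prior:
xobs=7; N=1e5
theta=runif(N)          # theta' ~ prior
z=rbinom(N,10,theta)    # z ~ f(z | theta')
post=theta[z==xobs]     # keep only exact matches z = xobs
# post is then a sample from pi(theta | xobs)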
Why does it work?
The mathematical proof is trivial:
f(θi) ∝ Σ_{z∈D} π(θi) f(z|θi) I_y(z) ∝ π(θi) f(y|θi) = π(θi|y)
[Accept–Reject 101]
But very impractical when
Pθ(Z = xobs) ≈ 0
A as approximative
When y is a continuous random variable, strict equality
z = xobs
is replaced with a tolerance zone
ρ(xobs, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(xobs, z) < ε}, by definition proportional to π(θ | ρ(xobs, z) < ε)
[Pritchard et al., 1999]
ABC algorithm
Algorithm Likelihood-free rejection sampler
for i = 1 to N do
repeat
generate θ′ from prior π(·)
generate z from sampling density f(·|θ′)
until ρ{η(z), η(xobs)} ≤ ε
set θi = θ′
end for
where η(xobs) defines a (not necessarily sufficient) statistic
Custom: η(xobs) called summary statistic
Example 3: robust Normal statistics
mu=rnorm(N<-1e6) #prior
sig=sqrt(rgamma(N,2,2))
medobs=median(obs)
madobs=mad(obs) #summary
for(t in diz<-1:N){
psud=rnorm(1e2)/sig[t]+mu[t]
medpsu=median(psud)-medobs
madpsu=mad(psud)-madobs
diz[t]=medpsu^2+madpsu^2}
#ABC subsample
subz=which(diz<quantile(diz,.1))
Exact ABC posterior
Algorithm samples from marginal in z of [exact] posterior
πε^ABC(θ, z | xobs) = π(θ) f(z|θ) I_{Aε,xobs}(z) / ∫_{Aε,xobs×Θ} π(θ) f(z|θ) dz dθ ,
where Aε,xobs = {z ∈ D : ρ{η(z), η(xobs)} < ε}.
Intuition that proper summary statistics coupled with small tolerance ε = εη should provide good approximation of the posterior distribution:
πε^ABC(θ | xobs) = ∫ πε^ABC(θ, z | xobs) dz ≈ π{θ | η(xobs)}
summary statistics
2 Concept
algorithm
summary statistics
random forests
calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics
Why summaries?
▶ reduction of dimension
▶ improvement of signal to noise ratio
▶ reduce tolerance ε considerably
▶ whole data may be unavailable (as in Example 3)
medobs=median(obs)
madobs=mad(obs) #summary
Example 6: MA inference
Moving average model MA(2):
xt = εt + θ1 εt−1 + θ2 εt−2 ,   εt iid∼ N(0, 1)
Comparison of raw series:
Example 6: MA inference
Moving average model MA(2):
xt = εt + θ1 εt−1 + θ2 εt−2 ,   εt iid∼ N(0, 1)
[Feller, 1970]
Comparison of acf’s:
Example 6: MA inference
Summary vs. raw:
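For illustration (hypothetical θ values, not from the slides), an R sketch simulating an MA(2) series and using the first two autocorrelations as summaries:
simMA2=function(n,t1,t2){e=rnorm(n+2); e[-(1:2)]+t1*e[2:(n+1)]+t2*e[1:n]}
x=simMA2(200,.6,.2)                          # simulated MA(2) series
eta=acf(x,lag.max=2,plot=FALSE)$acf[2:3]     # summary: first two autocorrelations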
Why not summaries?
▶ loss of sufficient information when πABC(θ|xobs) replaced
with πABC(θ|η(xobs))
▶ arbitrariness of summaries
▶ uncalibrated approximation
▶ whole data may be available (at same cost as summaries)
▶ (empirical) distributions may be compared (Wasserstein
distances)
[Bernton et al., 2019]
ABC with Wasserstein distances
[Bernton et al., 2019]
ABC with Wasserstein distances
Use as distance between simulated and observed samples the
Wasserstein distance:
Wp(y1:n, z1:n)^p = inf_{σ∈Sn} (1/n) Σ_{i=1}^n ρ(yi, zσ(i))^p ,
▶ covers well- and mis-specified cases
▶ only depends on data space distance ρ(·, ·)
▶ covers iid and dynamic models (curve matching)
▶ computationally feasible (linear in dimension, cubic in sample size)
▶ Hilbert curve approximation in higher dimensions
[Bernton et al., 2019]
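In the univariate case the distance reduces to matching order statistics; a minimal R sketch (illustrative, assuming equal sample sizes):
wass=function(y,z,p=1) mean(abs(sort(y)-sort(z))^p)^(1/p)   # p-Wasserstein in 1d
wass(rnorm(100),rnorm(100,mean=1))                           # distance between two samples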
Consistent inference with Wasserstein distance
As ε → 0 [and n fixed]
If either
1. fθ^(n) is n-exchangeable and D(y1:n, z1:n) = 0 if and only if z1:n = yσ(1:n) for some σ ∈ Sn, or
2. D(y1:n, z1:n) = 0 if and only if z1:n = y1:n.
then, at y1:n fixed, ABC posterior converges strongly to original
posterior as ε → 0.
[Bernton et al., 2019]
Consistent inference with Wasserstein distance
As n → ∞ [at ε fixed]
WABC distribution with a fixed ε does not converge in n to a
Dirac mass
[Bernton et al., 2019]
Consistent inference with Wasserstein distance
As εn → 0 and n → ∞
Under a range of assumptions, if fn(εn) → 0 and P(W(µ̂n, µ⋆) ≤ εn) → 1, then the WABC posterior with threshold εn + ε⋆ satisfies
πεn+ε⋆({θ ∈ H : W(µ⋆, µθ) > ε⋆ + 4εn/3 + fn^−1(εn^L/R)} | y1:n) ≤ δ, with P-probability going to one
[Bernton et al., 2019]
Summary selection strategies
Fundamental difficulty of selecting summary statistics when
there is no non-trivial sufficient statistic [except when done by
experimenters from the field]
1. Starting from large collection of summary statistics, Joyce
and Marjoram (2008) consider the sequential inclusion into the
ABC target, with a stopping rule based on a likelihood ratio test
2. Based on decision-theoretic principles, Fearnhead and
Prangle (2012) end up with E[θ|xobs] as the optimal summary
statistic
3. Use of indirect inference by Drovandi, Pettitt & Faddy (2011) with estimators of parameters of auxiliary model as summary statistics
4. Starting from large collection of summary statistics, Raynal et al. (2018, 2019) rely on random forests to build estimators and select summaries
Semi-automated ABC
Use of summary statistic η(·), importance proposal g(·), kernel K(·) ≤ 1 with bandwidth h ↓ 0 such that
(θ, z) ∼ g(θ) f(z|θ)
accepted with probability (hence the bound)
K[{η(z) − η(xobs)}/h]
and the corresponding importance weight defined by
π(θ) / g(θ)
Theorem Optimality of posterior expectation E[θ|xobs] of parameter of interest as summary statistics η(xobs)
[Fearnhead  Prangle, 2012; Sisson et al., 2019]
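A hedged R sketch of the underlying idea (toy model, names and summaries are illustrative, not from the slides): a pilot regression of θ on raw summaries from a reference table provides η(x) ≈ E[θ|x], then used as the ABC summary:
pilot=data.frame(theta=runif(1e4),s1=NA,s2=NA)    # reference table, theta ~ prior
for(i in 1:1e4){z=rnorm(50,pilot$theta[i])        # toy model z | theta
  pilot$s1[i]=mean(z); pilot$s2[i]=median(z)}
fit=lm(theta~s1+s2,data=pilot)                    # eta(x) = fitted E[theta | s(x)]
eta=function(x) predict(fit,data.frame(s1=mean(x),s2=median(x)))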
random forests
2 Concept
algorithm
summary statistics
random forests
calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics
Random forests
Technique that stemmed from Leo Breiman’s bagging (or
bootstrap aggregating) machine learning algorithm for both
classification [testing] and regression [estimation]
[Breiman, 1996]
Improved performances by averaging over classification schemes
of randomly generated training sets, creating a “forest” of
(CART) decision trees, inspired by Amit and Geman (1997)
ensemble learning
[Breiman, 2001]
Growing the forest
Breiman’s solution for inducing random features in the trees of
the forest:
▶ bootstrap resampling of the dataset and
▶ random subsetting [of size √t] of the covariates driving the classification or regression at every node of each tree
Covariate (summary) xτ that drives the node separation
xτ ≷ cτ
and the separation bound cτ chosen by minimising entropy or
Gini index
ABC with random forests
Idea: Starting with
▶ possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical software, methodologies, to machine-learning alternatives)
▶ ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data
run R randomForest to infer M or θ from (η1i, . . . , ηpi)
Average of the trees is resulting summary statistics, highly
non-linear predictor of the model index
Potential selection of most active summaries, calibrated against
pure noise
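A hedged R sketch (illustrative reference table and column names) of fitting a regression random forest to an ABC reference table, via the randomForest package:
library(randomForest)
theta=runif(1e4)                                     # parameter draws from the prior
ref=data.frame(theta=theta,eta1=theta+rnorm(1e4,sd=.1),eta2=rnorm(1e4))
rf=randomForest(theta~.,data=ref,ntree=500)          # forest of regression trees
predict(rf,newdata=data.frame(eta1=.4,eta2=0))       # prediction at observed summaries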
Classification of summaries by random forests
Given large collection of summary statistics, rather than
selecting a subset and excluding the others, estimate each
parameter by random forests
▶ handles thousands of predictors
▶ ignores useless components
▶ fast estimation with good local properties
▶ automatised with few calibration steps
▶ substitute for Fearnhead and Prangle (2012) preliminary estimation of θ̂(yobs)
▶ includes a natural (classification) distance measure that
avoids choice of either distance or tolerance
[Marin et al., 2016, 2018]
calibration of tolerance
2 Concept
algorithm
summary statistics
random forests
calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics
Calibration of tolerance
Calibration of threshold ε
▶ from scratch [how small is small?]
▶ from k-nearest neighbour perspective [quantile of prior
predictive]
subz=which(diz<quantile(diz,.1))
▶ from asymptotics [convergence speed]
▶ related with choice of distance [automated selection by
random forests]
[Fearnhead & Prangle, 2012; Biau et al., 2013; Liu & Fearnhead 2018]
Implementation
3 Implementation
ABC-MCMC
ABC-PMC
post-processed ABC
4 Convergence
5 Advanced topics
Software
Several ABC R packages for performing parameter
estimation and model selection
[Nunes  Prangle, 2017]
ABC-IS
Basic ABC algorithm limitations
▶ blind [no learning]
▶ inefficient [curse of dimension]
▶ inapplicable to improper priors
Importance sampling version
▶ importance density g(θ)
▶ bounded kernel function Kh with bandwidth h
▶ acceptance probability of
Kh{ρ[η(xobs), η(x{θ})]} π(θ) / {g(θ) maxθ Aθ}
[Fearnhead & Prangle, 2012]
ABC-MCMC
Markov chain (θ(t)) created via transition function
θ(t+1) =
  θ′ ∼ Kω(θ′|θ(t))   if x ∼ f(x|θ′) is such that x ≈ y
                      and u ∼ U(0, 1) ≤ π(θ′)Kω(θ(t)|θ′) / π(θ(t))Kω(θ′|θ(t)) ,
  θ(t)               otherwise,
has the posterior π(θ|y) as stationary distribution
[Marjoram et al, 2003]
ABC-MCMC
Algorithm Likelihood-free MCMC sampler
get (θ(0), z(0)) by Algorithm 1
for t = 1 to N do
generate θ′ from Kω(·|θ(t−1)), z′ from f(·|θ′), u from U[0,1],
if u ≤ [π(θ′)Kω(θ(t−1)|θ′) / π(θ(t−1))Kω(θ′|θ(t−1))] I_{Aε,xobs}(z′) then
set (θ(t), z(t)) = (θ′, z′)
else
(θ(t), z(t)) = (θ(t−1), z(t−1)),
end if
end for
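A hedged R sketch of this sampler for a toy N(θ, 1) model with a N(0, 10²) prior, Gaussian random-walk proposal and the sample mean as summary (all choices illustrative):
abcmcmc=function(xobs,N=1e4,eps=.1,omega=.5){
  n=length(xobs); etaobs=mean(xobs)
  theta=numeric(N); theta[1]=rnorm(1,0,10)
  for(t in 2:N){
    prop=rnorm(1,theta[t-1],omega)                        # K_omega proposal
    z=rnorm(n,prop)                                       # pseudo-data z ~ f(.|theta')
    mh=dnorm(prop,0,10)/dnorm(theta[t-1],0,10)            # prior ratio (K symmetric)
    theta[t]=if(abs(mean(z)-etaobs)<=eps && runif(1)<=mh) prop else theta[t-1]}
  theta}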
ABC-PMC
Generate a sample at iteration t by
π̂t(θ^(t)) ∝ Σ_{j=1}^N ωj^(t−1) Kt(θ^(t) | θj^(t−1))
modulo acceptance of the associated xt, with tolerance εt ↓, and use importance weight associated with accepted simulation θi^(t)
ωi^(t) ∝ π(θi^(t)) / π̂t(θi^(t))
© Still likelihood free
[Sisson et al., 2007; Beaumont et al., 2009]
ABC-SMC
Use of a kernel Kt associated with target πεt and derivation of the backward kernel
Lt−1(z, z′) = πεt(z′) Kt(z′, z) / πεt(z)
Update of the weights
ωi^(t) ∝ ωi^(t−1) [Σ_{m=1}^M I_{Aεt}(xim^(t))] / [Σ_{m=1}^M I_{Aεt−1}(xim^(t−1))]
when xim^(t) ∼ Kt(xi^(t−1), ·)
[Del Moral, Doucet  Jasra, 2009]
ABC-NP
Better usage of [prior] simulations by adjustment: instead of throwing away θ′ such that ρ(η(z), η(xobs)) > ε, replace θ's with locally regressed transforms
θ∗ = θ − {η(z) − η(xobs)}^T β̂
[Csilléry et al., TEE, 2010]
where β̂ is obtained by [NP] weighted least square regression on (η(z) − η(xobs)) with weights
Kδ{ρ(η(z), η(xobs))}
[Beaumont et al., 2002, Genetics]
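A hedged R sketch of the local linear adjustment (kernel choice and variable names are illustrative): theta are accepted draws, eta their summaries (one row each), etaobs the observed summaries:
adjust=function(theta,eta,etaobs,delta){
  d=sweep(eta,2,etaobs)                      # eta(z) - eta(xobs)
  w=exp(-rowSums(d^2)/(2*delta^2))           # kernel weights K_delta (one possible choice)
  bhat=coef(lm(theta~d,weights=w))[-1]       # weighted least squares slopes
  theta-d%*%bhat}                            # theta* = theta - {eta(z)-eta(xobs)}' beta-hat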
ABC-NN
Incorporating non-linearities and heteroscedasticities:
θ∗ = m̂(η(xobs)) + [θ − m̂(η(z))] σ̂(η(xobs)) / σ̂(η(z))
where
▶ m̂(η) estimated by non-linear regression (e.g., neural network)
▶ σ̂(η) estimated by non-linear regression on residuals
log{θi − m̂(ηi)}² = log σ²(ηi) + ξi
[Blum & François, 2009; Li & Fearnhead, 2018; Lawless et al., 2023]
Convergence
4 Convergence
ABC inference consistency
ABC model choice consistency
ABC under misspecification
5 Advanced topics
Asymptotics of ABC
Since πABC(· | xobs) is an approximation of π(· | xobs) or π(· | η(xobs)), the coherence of ABC-based inference needs to be established on its own
[Li & Fearnhead, 2018a,b; Frazier et al., 2018, 2020]
Meaning
▶ establishing large sample (n) properties of ABC posteriors
and ABC procedures
▶ finding sufficient conditions and checks on summary
statistics η(·)
▶ determining proper rate ε = εn of convergence of tolerance
to 0
▶ [mostly] ignoring Monte Carlo errors
Consistency of ABC posteriors
ABC algorithm Bayesian consistent at θ0 if for any δ  0,
Π(∥θ − θ0∥ > δ | ρ{η(xobs), η(Z)} ≤ ε) → 0
as n → +∞, ε → 0
Bayesian consistency implies that sets containing θ0 have
posterior probability tending to one as n → +∞, with
implication being the existence of a specific rate of
concentration
Consistency of ABC posteriors
ABC algorithm Bayesian consistent at θ0 if for any δ  0,
Π(∥θ − θ0∥ > δ | ρ{η(xobs), η(Z)} ≤ ε) → 0
as n → +∞, ε → 0
▶ Concentration around true value and Bayesian consistency
impose less stringent conditions on the convergence speed
of tolerance εn to zero, when compared with asymptotic
normality of ABC posterior
▶ asymptotic normality of ABC posterior mean does not
require asymptotic normality of ABC posterior
Asymptotic setup
Assumptions:
▶ asymptotic: xobs = xobs(n) ∼ P^n_θ0 and ε = εn, n → +∞
▶ parametric: θ ∈ R^k, k fixed
▶ concentration of summary statistic η(z^n): ∃ b : θ → b(θ) such that η(z^n) − b(θ) = oPθ(1), ∀θ
▶ identifiability of parameter: b(θ) ≠ b(θ′) when θ ≠ θ′
Consistency of ABC posteriors
▶ Concentration of summary η(z): there exists b(θ) such that
η(z) − b(θ) = oPθ(1)
▶ Consistency:
Πεn(∥θ − θ0∥ ≤ δ | η(xobs)) = 1 + op(1)
▶ Convergence rate: there exists δn = o(1) such that
Πεn(∥θ − θ0∥ ≤ δn | η(xobs)) = 1 + op(1)
Consistency of ABC posteriors
▶ Consistency:
Πεn(∥θ − θ0∥ ≤ δ | η(xobs)) = 1 + op(1)
▶ Convergence rate: there exists δn = o(1) such that
Πεn(∥θ − θ0∥ ≤ δn | η(xobs)) = 1 + op(1)
▶ Point estimator consistency:
θ̂ε = EABC[θ | η(xobs(n))] ,   EABC[θ | η(xobs(n))] − θ0 = op(1)
vn(EABC[θ | η(xobs(n))] − θ0) ⇒ N(0, v)
Asymptotic shape of posterior distribution
Shape of
Π(· | ∥η(xobs) − η(z)∥ ≤ εn)
depending on relation between εn and rate σn at which η(xobs(n)) satisfies a CLT
Three different regimes:
1. σn = o(εn) → Uniform limit
2. σn ≍ εn → perturbed Gaussian limit
3. σn ≫ εn → Gaussian limit
Curse of dimension
▶ for reasonable statistical behavior, decline of acceptance αn
the faster the larger the dimension of θ, kθ, but unaffected
by dimension of η, kη
▶ theoretical justification for dimension reduction methods
that process parameter components individually and
independently of other components
[Fearnhead & Prangle, 2012; Martin et al., 2019]
▶ importance sampling approach of Li  Fearnhead (2018a)
yields acceptance rates αn = O(1), when εn = O(1/vn)
Monte Carlo error
Link the choice of εn to Monte Carlo error associated with Nn
draws in ABC Algorithm
Conditions (on εn) under which
α̂n = αn{1 + op(1)}
where α̂n = Σ_{i=1}^{Nn} I[d{η(y), η(z)} ≤ εn] / Nn is the proportion of accepted draws from Nn simulated draws of θ
Either
(i) εn = o(vn^−1) and (vn εn)^(−kη) εn^(−kθ) ≤ M Nn
or
(ii) εn ≳ vn^−1 and εn^(−kθ) ≤ M Nn
for M large enough
What if rates differ?
[Lawless et al., 2023]
What if rates differ?
Different convergence rates for summaries, i.e., for all 1 ≤ j ≤ k,
existence of sequence (vnj)n such that
vnj (b(θ)j − η(z)j) = OP(1)
and tolerance sequence (εn)n satisfies
lim_{n→∞} vnj εn = 0   ∀ 1 ≤ j ≤ k0
lim_{n→∞} vnj εn = ∞   ∀ k0 + 1 ≤ j ≤ k
called slow and fast components, resp.
[Lawless et al., 2023]
What if rates differ?
Under some assumptions slightly weaker than those of Frazier
et al. (2018) and Li  Fearnhead (2018),
1. ABC posterior distribution concentrates: for a sequence (λn)n with λn ↓ 0
∫_{|θ−θ0|≥λn} πεn(θ | η(y)) dθ = oP(1)
[Lawless et al., 2023]
What if rates differ?
Under some assumptions slightly weaker than those of Frazier
et al. (2018) and Li  Fearnhead (2018),
2. Asymptotically,
πεn(θ | η(y)) ∝ I{|∇b(2)(θ0)(θ − θ0)| ≤ εn} (1 − |∇b(2)(θ0)(θ − θ0)|²/εn²)^(k0/2) ,
where b(2) = (b(k0+1), . . . , bk)
[Lawless et al., 2023]
What if rates differ?
Under some assumptions slightly weaker than those of Frazier
et al. (2018) and Li  Fearnhead (2018),
3. Local linear post-processing à la Beaumont (2002) achieves posterior contraction rate
O(max(v_{n,k0+1}, εn²))
while vanilla ABC faces posterior contraction rate
O(max(v_{n,k0+1}, εn))
i.e. εn is replaced by εn² at no additional computational cost
[Lawless et al., 2023]
What if rates differ?
[Figure: posterior risks E[(θ − θ0)²]^(1/2) of vanilla ABC and regression-adjusted ABC against εn, for 10⁴ iid Xi ∼ U(θ, θ + 1) with summaries S1 = (1/√n) Σ_{i=1}^{√n} Xi [v1n = n^(−1/4)] and S2 = min_{1≤i≤n} Xi [vn = n^(−1)]; the risks decrease at rates εn and εn², resp., until εn < v_{n,k0+1} = n^(−1)]
[Lawless et al., 2023]
ABC model choice consistency
4 Convergence
ABC inference consistency
ABC model choice consistency
ABC under misspecification
5 Advanced topics
Bayesian model choice
Model candidates M1, M2, . . . to be compared for dataset xobs
making model index M part of inference
Use of a prior distribution, π(M = m), plus a prior distribution
on the parameter conditional on the value m of the model
index, πm(θm)
Goal to derive the posterior distribution of M, challenging
computational target when models are complex
[Savage, 1964; Berger, 1980]
Generic ABC for model choice
Algorithm Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
repeat
Generate m from the prior π(M = m)
Generate θm from the prior πm(θm)
Generate z from the model fm(z|θm)
until ρ{η(z), η(xobs)} < ε
Set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]
ABC model choice consistency
Leaving approximations aside, limiting ABC procedure is Bayes
factor based on η(xobs)
B12(η(xobs))
Potential loss of information at the testing level
[Robert et al., 2010]
When is Bayes factor based on insufficient statistic η(xobs)
consistent?
[Marin et al., 2013]
Example 7: Gauss versus Laplace
Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2: xobs ∼ L(θ2, 1/√2)⊗n, Laplace distribution with mean θ2 and variance one
Four possible statistics η(xobs)
1. sample mean xobs (sufficient for M1 if not M);
2. sample median med(xobs) (insufficient);
3. sample variance var(xobs) (ancillary);
4. median absolute deviation
mad(xobs) = med(|xobs − med(xobs)|);
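A hedged R sketch of ABC-MC for this example, using mad(xobs) as the (single) summary; obs denotes the observed sample, assumed available, and the prior choices are illustrative:
rlap=function(n,m,b) m+b*(rexp(n)-rexp(n))    # Laplace(m,b), variance 2*b^2
N=1e5; n=100; madobs=mad(obs)
m=sample(1:2,N,replace=TRUE)                  # model index, uniform prior
th=rnorm(N,0,10)                              # common vague prior on the location
dis=numeric(N)
for(t in 1:N){
  z=if(m[t]==1) rnorm(n,th[t]) else rlap(n,th[t],1/sqrt(2))
  dis[t]=abs(mad(z)-madobs)}
keep=dis<quantile(dis,.01)
table(m[keep])/sum(keep)                      # approximate posterior model probabilities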
Example 7: Gauss versus Laplace
Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2: xobs ∼ L(θ2, 1/√2)⊗n, Laplace distribution with mean θ2 and variance one
[Figure: boxplots of results for the Gauss and Laplace models, n = 100, two panels]
Consistency
Summary statistics
η(xobs) = (τ1(xobs), τ2(xobs), · · · , τd(xobs)) ∈ R^d
with
▶ true distribution η ∼ Gn, true mean µ0,
▶ distribution Gi,n under model Mi, corresponding posteriors πi(· | η^n)
Assumptions of central limit theorem and large deviations for η(xobs) under the true model, plus usual Bayesian asymptotics (with di the effective dimension of the parameter)
[Pillai et al., 2013]
Asymptotic marginals
Asymptotically
mi,n(t) =
Z
Θi
gi,n(t|θi) πi(θi) dθi
such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl
√
n
d−di
≤ mi,n(ηn
) ≤ Cu
√
n
d−di
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi}  0
mi,n(ηn
) = oPn [
√
n
d−τi
+
√
n
d−αi
].
Between-model consistency
Consequence of above is that asymptotic behaviour of the Bayes
factor is driven by the asymptotic mean value µ(θ) of ηn under
both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl √n^(−(d1−d2)) ≤ m1,n(η^n) / m2,n(η^n) ≤ Cu √n^(−(d1−d2)) ,
where Cl, Cu = OPn(1), irrespective of the true model.
© Only depends on the difference d1 − d2: no consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1,n(η^n) / m2,n(η^n) ≥ Cu min(√n^(−(d1−α2)), √n^(−(d1−τ2)))
Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of η^n: null hypothesis that both models are compatible with the statistic η^n
H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0
against
H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0
testing procedure provides estimates of mean of η^n under each model and checks for equality
ABC under misspecification
4 Convergence
ABC inference consistency
ABC model choice consistency
ABC under misspecification
5 Advanced topics
ABC under misspecification
ABC methods rely on simulations z(θ) from the model to
identify those close to xobs
What is happening when the model is wrong?
▶ for some tolerance sequences εn ↓ ε∗, well-behaved ABC
posteriors that concentrate posterior mass on pseudo-true
value
▶ if εn too large, asymptotic limit of ABC posterior uniform
with radius of order εn − ε∗
▶ even if √n{εn − ε∗} → 2c ∈ R, limiting distribution no longer Gaussian
▶ ABC credible sets invalid confidence sets
[Frazier et al., 2020]
Example 8: Normal model with wrong variance
Assumed data generating process (DGP) is z ∼ N(θ, 1) but
actual DGP is xobs ∼ N(θ, σ̃2)
Use of summaries
▶ sample mean η1(xobs) = (1/n) Σ_{i=1}^n xi
▶ centered summary η2(xobs) = 1/(n−1) Σ_{i=1}^n (xi − η1(xobs))² − 1
Three ABC:
▶ ABC-AR: accept/reject approach with Kε(d{η(z), η(xobs)}) = I(d{η(z), η(xobs)} ≤ ε) and d{x, y} = ∥x − y∥
▶ ABC-K: smooth rejection approach, with
Kε(d{η(z), η(xobs)}) univariate Gaussian kernel
▶ ABC-Reg: post-processing ABC approach with weighted
linear regression adjustment
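A hedged R sketch of the ABC-AR version of this experiment (sample size, prior and DGP values are illustrative, not from the slides):
n=100; xobs=rnorm(n,mean=1,sd=2)              # misspecified DGP: true sd is 2, not 1
eta=function(x) c(mean(x),var(x)-1)           # summaries eta1, eta2
etaobs=eta(xobs); N=1e5
theta=rnorm(N,0,5)                            # prior draws (illustrative prior)
dis=sapply(theta,function(th) sqrt(sum((eta(rnorm(n,th))-etaobs)^2)))
mean(theta[dis<quantile(dis,n^(-5/9))])       # ABC-AR posterior mean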
Example 8: Normal model with wrong variance
▶ posterior means for ABC-AR, ABC-K and ABC-Reg as σ2
increases (N = 50, 000 simulated data sets)
▶ αn = n−5/9 quantile for ABC-AR
▶ ABC-K and ABC-Reg bandwidth of n−5/9
[Frazier et al., 2020]
ABC misspecification
▶ data xobs with true distribution P0, assumed issued from model Pθ, θ ∈ Θ ⊂ R^kθ, and summary statistic η(xobs) = (η1(xobs), ..., ηkη(xobs))
▶ misspecification
inf_{θ∈Θ} D(P0||Pθ) = inf_{θ∈Θ} ∫ log[dP0(x)/dPθ(x)] dP0(x) > 0,
with
θ∗ = arg inf_{θ∈Θ} D(P0||Pθ)
[Muller, 2013]
▶ ABC misspecification: for b0 (resp. b(θ)) limit of η(xobs) (resp. η(z))
inf_{θ∈Θ} d{b0, b(θ)} > 0
▶ ABC pseudo-true value:
θ∗ = arg inf_{θ∈Θ} d{b0, b(θ)}.
Minimum tolerance
Under identification conditions on b(·) ∈ R^kη, there exists ε∗ such that
ε∗ = inf_{θ∈Θ} d{b0, b(θ)} > 0
Once εn < ε∗ no draw of θ to be selected and posterior Πε[A | η(xobs)] ill-conditioned
But appropriately chosen tolerance sequence (εn)n allows ABC-based posterior to concentrate on distance-dependent pseudo-true value θ∗
ABC concentration under misspecification
Assumptions
[A0] Existence of unique b0 such that d(η(xobs), b0) = oP0(1) and of sequence v0,n → +∞ such that
lim inf_{n→+∞} P0[ d(η(xobs(n)), b0) ≥ v0,n^−1 ] = 1.
ABC concentration under misspecification
Assumptions
[A1] Existence of injective map b : Θ → B ⊂ R^kη and function ρn with ρn(·) ↓ 0 as n → +∞, and ρn(·) non-increasing, such that
Pθ[d(η(Z), b(θ)) > u] ≤ c(θ)ρn(u),   ∫_Θ c(θ) dΠ(θ) < ∞
and assume either
(i) Polynomial deviations: existence of vn ↑ +∞ and u0, κ > 0 such that ρn(u) = vn^(−κ) u^(−κ), for u ≤ u0
(ii) Exponential deviations: existence of hθ(·) > 0 such that Pθ[d(η(z), b(θ)) > u] ≤ c(θ) e^(−hθ(u vn)) and existence of m, C > 0 such that
∫_Θ c(θ) e^(−hθ(u vn)) dΠ(θ) ≤ C e^(−m·(u vn)^τ), for u ≤ u0.
ABC concentration under misspecification
Assumptions
[A2] Existence of D > 0 and M0, δ0 > 0 such that, for all δ0 ≥ δ > 0 and M ≥ M0, existence of Sδ ⊂ {θ ∈ Θ : d(b(θ), b0) − ε∗ ≤ δ} for which
(i) In case (i), D < κ and ∫_{Sδ} (1 − c(θ)/M) dΠ(θ) ≳ δ^D.
(ii) In case (ii), ∫_{Sδ} (1 − c(θ) e^(−hθ(M))) dΠ(θ) ≳ δ^D.
Consistency
Assume [A0]–[A2], with εn ↓ ε∗ and
εn ≥ ε∗ + M vn^−1 + v0,n^−1,
for M large enough. Let Mn ↑ ∞ and δn ≥ Mn(εn − ε∗), then
Πε[ d(b(θ), b0) ≥ ε∗ + δn | η(xobs) ] = oP0(1),
1. if δn ≥ Mn vn^−1 un^(−D/κ) = o(1) in case (i)
2. if δn ≥ Mn vn^−1 |log(un)|^(1/τ) = o(1) in case (ii)
with un = εn − (ε∗ + M vn^−1 + v0,n^−1) ≥ 0.
[Bernton et al., 2017; Frazier et al., 2020]
Regression adjustment under misspecification
Accepted value θ artificially related to η(xobs) and η(z) through local linear regression model
θ′ = µ + β⊺{η(xobs) − η(z)} + ν,
where ν is the model residual
[Beaumont et al., 2003]
Asymptotic behavior of ABC-Reg posterior
Π̃ε[· | η(xobs)]
determined by behavior of
Πε[· | η(xobs)], β̂, and {η(xobs) − η(z)}
Regression adjustment under misspecification
▶ ABC-Reg takes draws of asymptotically optimal θ, perturbed in a manner that need not preserve original optimality
▶ for ∥β0∥ large, pseudo-true value θ̃∗ possibly outside Θ
▶ extends to nonlinear regression adjustments [Blum & François, 2010]
▶ potential correction of the adjustment [Frazier et al., 2020]
▶ local regression adjustments with smaller posterior
variability than ABC-AR but fake precision
Advanced topics
5 Advanced topics
big data issues
big parameter issues
Computational bottleneck
Time per iteration increases with sample size n of the data:
cost of sampling O(n1+?) associated with a reasonable
acceptance probability makes ABC infeasible for large datasets
▶ surrogate models to get samples (e.g., using copulas)
▶ direct sampling of summary statistics (e.g., synthetic
likelihood)
[Wood, 2010]
▶ borrow from proposals for scalable MCMC (e.g., divide & conquer)
Approximate ABC [AABC]
Idea: approximations on both parameter and model spaces by resorting to bootstrap techniques.
[Buzbas  Rosenberg, 2015]
Procedure scheme
1. Sample (θi, xi), i = 1, . . . , m, from prior predictive
2. Simulate θ∗ ∼ π(·) and assign weight wi to dataset x(i)
simulated under k-closest θi to θ∗
3. Generate dataset x∗ as bootstrap weighted sample from
(x(1), . . . , x(k))
Drawbacks
▶ If m too small, prior predictive sample may miss
informative parameters
▶ large error and misleading representation of true posterior
▶ only suited for models with very few parameters
Divide-and-conquer perspectives
1. divide the large dataset into smaller batches
2. sample from the batch posterior
3. combine the result to get a sample from the targeted
posterior
Alternative via ABC-EP
[Barthelmé & Chopin, 2014]
Geometric combination: WASP
Subset posteriors: given partition xobs[1], . . . , xobs[B] of observed data xobs, define
π(θ | xobs[b]) ∝ π(θ) Π_{j∈[b]} f(xobs_j | θ)^B .
[Srivastava et al., 2015]
Subset posteriors are combined via Wasserstein barycenter
[Cuturi, 2014]
Drawback: requires sampling from f(· | θ)^B by ABC means. Should be feasible for latent variable (z) representations when f(x | z, θ) available in closed form
[Doucet & Robert, 2001]
Alternative: backfeed subset posteriors as priors to other subsets, partitioning summaries
Consensus ABC
Naïve scheme
▶ For each data batch b = 1, . . . , B
1. Sample (θ1[b], . . . , θn[b]) from diffused prior π̃(·) ∝ π(·)^(1/B)
2. Run ABC to sample from batch posterior π̂(· | d(S(xobs[b]), S(x[b])) < ε)
3. Compute sample posterior variance Σb^−1
▶ Combine batch posterior approximations
θj = Σ_{b=1}^B Σb θj[b] / Σ_{b=1}^B Σb
Remark Diffuse prior π̃(·) being non-informative calls for ABC-MCMC steps
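A hedged R sketch of the combination step for a scalar parameter (batches is a list of equal-sized ABC samples, one per batch; the precision weighting mirrors the formula above):
combine=function(batches){
  prec=sapply(batches,function(b) 1/var(b))   # Sigma_b (scalar precisions here)
  draws=sapply(batches,identity)              # n x B matrix of theta_j^[b]
  as.vector(draws%*%prec/sum(prec))}          # precision-weighted consensus draws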
big parameter issues
5 Advanced topics
big data issues
big parameter issues
Big parameter issues
Curse of dimension: as dim(Θ) = kθ increases
▶ exploration of parameter space gets harder
▶ summary statistic η forced to increase, since at least of
dimension kη ≥ dim(Θ)
▶ acceptance probability decreases
For a given tolerance εn, average acceptance probability
αn = ∫ π(θ) Pθ(|η(y) − η(z)| ≤ εn) dθ ≍ Ln εn^d
[Lawless et al., 2023]
Some solutions
▶ adopt more local algorithms like ABC-MCMC or
ABC-SMC
▶ aim at posterior marginals and approximate joint posterior
by copula
[Li et al., 2016]
▶ run ABC-Gibbs
[Clarté et al., 2016]
Example 11: Hierarchical MA(2)
▶ xi iid∼ MA2(µi, σi)
▶ µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1) with (βi,1, βi,2, βi,3) iid∼ Dir(α1, α2, α3)
▶ σi iid∼ IG(σ1, σ2)
▶ α = (α1, α2, α3), with prior E(1)⊗3
▶ σ = (σ1, σ2) with prior C(1)+⊗2
[Hierarchical DAG: α → µ1, µ2, . . . , µn; σ → σ1, σ2, . . . , σn; (µi, σi) → xi]
Example 11: Hierarchical MA(2)
© Based on 10⁶ prior and 10³ posterior simulations, 3n summary statistics, and series of length 100, ABC-Rej posterior hardly distinguishable from prior!
ABC-Gibbs
When parameter decomposed into θ = (θ1, . . . , θn)
Algorithm ABC-Gibbs sampler
starting point θ(0) = (θ1(0), . . . , θn(0)), observation xobs
for i = 1, . . . , N do
for j = 1, . . . , n do
θj(i) ∼ πεj(· | x⋆, sj, θ1(i), . . . , θj−1(i), θj+1(i−1), . . . , θn(i−1))
end for
end for
Divide & conquer:
▶ one tolerance εj for each parameter θj
▶ one statistic sj for each parameter θj
[Clarté et al., 2019]
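A hedged R sketch of a single conditional update within one ABC-Gibbs sweep (names, model and summaries are illustrative, for a normal hierarchical toy model):
abcgibbs_step=function(simulate,sobs,eps,rcond){
  repeat{ th=rcond()                          # propose from the conditional prior
    if(abs(simulate(th)-sobs)<eps) return(th)}}
# e.g. update of mu_j given alpha, with summary s_j = x_j:
# mu[j]=abcgibbs_step(function(m) rnorm(1,m), x[j], eps_mu, function() rnorm(1,alpha,1))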
Compatibility
When using ABC-Gibbs conditionals with different acceptance
events, e.g., different statistics
π(α)π(sα(µ) | α)   and   π(µ)f(sµ(x⋆) | α, µ),
conditionals are incompatible
▶ ABC-Gibbs does not necessarily converge (even for tolerance equal to zero)
▶ potential limiting distribution
- not a genuine posterior (double use of data)
- unknown [except for a specific version]
- possibly far from genuine posterior(s)
[Clarté et al., 2016]
Convergence
In hierarchical case n = 2,
Theorem If there exists 0 < κ < 1/2 such that
sup_{θ1,θ̃1} ∥πε2(· | x⋆, s2, θ1) − πε2(· | x⋆, s2, θ̃1)∥TV = κ
the ABC-Gibbs Markov chain geometrically converges in total variation to stationary distribution νε, with geometric rate 1 − 2κ.
Example 11: Hierarchical MA(2)
Separation from the prior for identical number of simulations
Explicit limiting distribution
For model
xj | µj ∼ π(xj | µj) ,   µj | α i.i.d.∼ π(µj | α) ,   α ∼ π(α)
alternative ABC based on:
π̃(α, µ | x⋆) ∝ π(α) q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃ [generate a new µ̃]
× ∫ f(x̃ | µ) π(x⋆ | µ) 1{η(sµ(x⋆), sµ(x̃)) < εµ} dx̃,
with q arbitrary distribution on µ
Explicit limiting distribution
For model
xj | µj ∼ π(xj | µj) ,   µj | α i.i.d.∼ π(µj | α) ,   α ∼ π(α)
induces full conditionals
π̃(α | µ) ∝ π(α) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃
and
π̃(µ | α, x⋆) ∝ q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃ × ∫ f(x̃ | µ) π(x⋆ | µ) 1{η(sµ(x⋆), sµ(x̃)) < εµ} dx̃
now compatible with new artificial joint
Explicit limiting distribution
For model
xj | µj ∼ π(xj | µj) ,   µj | α i.i.d.∼ π(µj | α) ,   α ∼ π(α)
that is,
▶ prior simulations of α ∼ π(α) and of µ̃ ∼ π(µ̃ | α) until η(sα(µ), sα(µ̃)) < εα
▶ simulation of µ from instrumental q(µ) and of auxiliary variables µ̃ and x̃ until both constraints satisfied
Explicit limiting distribution
For model
xj | µj ∼ π(xj | µj) ,   µj | α i.i.d.∼ π(µj | α) ,   α ∼ π(α)
Resulting Gibbs sampler stationary for posterior proportional to
π(α, µ) × q(sα(µ)) [projection] × f(sµ(x⋆) | µ) [projection]
that is, for likelihood associated with sµ(x⋆) and prior distribution proportional to π(α, µ)q(sα(µ)) [exact!]
Thank you Gael!!!
And all the best for the next move!
ERC postdoc position
Ongoing 2023-2030 ERC funding for PhD positions and
postdoctoral collaborations with
▶ Michael Jordan (more Paris than
Berkeley)
▶ Eric Moulines (Paris)
▶ Gareth Roberts (Warwick)
▶ myself (more Paris than Warwick)
on OCEAN (On IntelligenCE And Networks) project

 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 

Recently uploaded (20)

Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 

Realistic[er] applications
▶ Kingman's coalescent in population genetics [Tavaré et al., 1997; Beaumont et al., 2003]
▶ α-stable distributions [Peters et al., 2012]
▶ complex networks [Dutta et al., 2018]
▶ astrostatistics & cosmostatistics [Cameron & Pettitt, 2012; Ishida et al., 2015]
▶ state space models [Martin et al., 2019]

Auxiliary likelihood-based ABC in state space models
[Martin et al., 2019]

Concept
2 Concept: algorithm, summary statistics, random forests, calibration of tolerance
3 Implementation
4 Convergence
5 Advanced topics

A?B?C?
▶ A stands for approximate [wrong likelihood]
▶ B stands for Bayesian [right prior]
▶ C stands for computation [producing a parameter sample]

A?B?C?
▶ Rough version of the data [from dot to ball]
▶ Non-parametric approximation of the likelihood [near actual observation]
▶ Use of non-sufficient statistics [dimension reduction]
▶ Monte Carlo error [and no unbiasedness]

ABC+ timeframe
Timeline of approximate methods, 1997/1999 to 2015+: ABC (Tavaré et al., 1997; Pritchard et al., 1999), variational Bayes with ELBO(q) = E[log p(θ, y)] − E[log q(θ)] (Jordan et al.), pseudo-marginal MCMC with acceptance α(θn, θ′) = min{1, π̂(θ′)q(θn|θ′) / π̂(θn)q(θ′|θn)} (Beaumont; Andrieu & Roberts), INLA with π̃(θ|y) ∝ π(x, θ, y)/πG(x|θ, y) evaluated at x = x∗(θ) (Rue et al.), HMC/NUTS (Neal; Hoffman & Gelman), and hybrid approximate methods pA(θ|y)
[Martin et al., 2023]

A seemingly naïve representation
When the likelihood f(x|θ) is not in closed form, likelihood-free rejection technique:
ABC-AR algorithm
For an observation xobs ∼ f(x|θ), under the prior π(θ), keep jointly simulating
θ′ ∼ π(θ), z ∼ f(z|θ′),
until the auxiliary variable z is equal to the observed value, z = xobs
[Diggle & Gratton, 1984; Rubin, 1984; Tavaré et al., 1997]
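As a minimal illustration of this exact-match version, here is a sketch in R on a hypothetical discrete model (a Poisson observation with an Exponential prior, not an example from the talk):

# Exact-match ABC rejection for a toy Poisson model (hypothetical illustration)
# model: x ~ Poisson(theta), prior: theta ~ Exp(1)
set.seed(1)
xobs <- 4L                  # observed discrete value
N <- 1e5                    # number of prior draws
theta <- rexp(N)            # simulate from the prior
z <- rpois(N, theta)        # simulate pseudo-data given each theta
post <- theta[z == xobs]    # keep draws whose pseudo-data match exactly
length(post) / N            # acceptance rate, estimating the prior predictive of xobs
hist(post, breaks = 50)     # posterior sample

Because z is discrete, the event z = xobs has positive probability and the accepted draws are exact simulations from π(θ | xobs).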
Why does it work?
The mathematical proof is trivial:
f(θi) ∝ Σ_{z∈D} π(θi) f(z|θi) I_y(z) ∝ π(θi) f(y|θi) = π(θi|y)
[Accept–Reject 101]
But very impractical when
Pθ(Z = xobs) ≈ 0

A as approximative
When y is a continuous random variable, strict equality z = xobs is replaced with a tolerance zone
ρ(xobs, z) ≤ ε
where ρ is a distance
Output distributed from
π(θ) Pθ{ρ(xobs, z) < ε} ∝ (by definition) π(θ | ρ(xobs, z) < ε)
[Pritchard et al., 1999]

ABC algorithm
Algorithm: Likelihood-free rejection sampler
for i = 1 to N do
  repeat
    generate θ′ from the prior π(·)
    generate z from the sampling density f(·|θ′)
  until ρ{η(z), η(xobs)} ≤ ε
  set θi = θ′
end for
where η(xobs) defines a (not necessarily sufficient) statistic
Custom: η(xobs) is called a summary statistic
Example 3: robust Normal statistics
# obs: the observed Normal sample, summarised by its median and MAD
mu=rnorm(N<-1e6)                  #prior draws for mu
sig=sqrt(rgamma(N,2,2))           #prior draws for the scale
medobs=median(obs)
madobs=mad(obs)                   #summary statistics of the data
for(t in diz<-1:N){
  psud=rnorm(1e2)/sig[t]+mu[t]    #pseudo-sample given (mu[t],sig[t])
  medpsu=median(psud)-medobs
  madpsu=mad(psud)-madobs
  diz[t]=medpsu^2+madpsu^2}       #squared distance between summaries
subz=which(diz<quantile(diz,.1))  #ABC subsample: keep the 10% closest draws
Exact ABC posterior
Algorithm samples from the marginal in z of the [exact] posterior
π_ε^ABC(θ, z | xobs) = π(θ) f(z|θ) I_{Aε,xobs}(z) / ∫_{Aε,xobs×Θ} π(θ) f(z|θ) dz dθ,
where Aε,xobs = {z ∈ D : ρ{η(z), η(xobs)} < ε}
Intuition that proper summary statistics coupled with a small tolerance ε = εη should provide a good approximation of the posterior distribution:
π_ε^ABC(θ | xobs) = ∫ π_ε^ABC(θ, z | xobs) dz ≈ π{θ | η(xobs)}

summary statistics

Why summaries?
▶ reduction of dimension
▶ improvement of signal-to-noise ratio
▶ reduce tolerance ε considerably
▶ whole data may be unavailable (as in Example 3)
medobs=median(obs)
madobs=mad(obs) #summary
Example 6: MA inference
Moving average model MA(2):
xt = εt + θ1 εt−1 + θ2 εt−2, εt ∼ iid N(0, 1)
[Feller, 1970]
[Figures: comparison of the raw series, comparison of their acf's, and summary-based vs. raw-data outputs]
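A possible rendering of this example in R, with the first two sample autocovariances as summaries (the exact summaries, prior and tolerance used on the slides are assumptions here):

# ABC for the MA(2) example, summaries = autocovariances at lags 1 and 2 (a sketch)
set.seed(42)
n <- 100
theta0 <- c(0.6, 0.2)                          # true (theta1, theta2)
xobs <- arima.sim(list(ma = theta0), n = n)    # observed series
summ <- function(x) c(acf(x, lag.max = 2, type = "covariance", plot = FALSE)$acf[2:3])
sobs <- summ(xobs)
N <- 1e4
th1 <- runif(N, -2, 2); th2 <- runif(N, -1, 1) # uniform prior over the
keep <- (th1 + th2 > -1) & (th1 - th2 < 1)     # identifiability triangle
th1 <- th1[keep]; th2 <- th2[keep]
dst <- sapply(seq_along(th1), function(i) {
  z <- arima.sim(list(ma = c(th1[i], th2[i])), n = n)
  sum((summ(z) - sobs)^2)
})
eps <- quantile(dst, 0.01)                     # tolerance as a small quantile
post <- cbind(th1, th2)[dst <= eps, ]
colMeans(post)                                 # ABC posterior means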
Why not summaries?
▶ loss of sufficient information when πABC(θ|xobs) is replaced with πABC(θ|η(xobs))
▶ arbitrariness of summaries
▶ uncalibrated approximation
▶ whole data may be available (at same cost as summaries)
▶ (empirical) distributions may be compared (Wasserstein distances)
[Bernton et al., 2019]

ABC with Wasserstein distances
[Bernton et al., 2019]

ABC with Wasserstein distances
Use as distance between simulated and observed samples the Wasserstein distance:
Wp(y1:n, z1:n)^p = inf_{σ∈Sn} (1/n) Σ_{i=1}^n ρ(yi, zσ(i))^p
▶ covers well- and mis-specified cases
▶ only depends on the data space distance ρ(·, ·)
▶ covers iid and dynamic models (curve matching)
▶ computationally feasible (linear in dimension, cubic in sample size)
▶ Hilbert curve approximation in higher dimensions
[Bernton et al., 2019]
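In one dimension the infimum over permutations is attained by matching order statistics, so the distance reduces to comparing sorted samples; a minimal sketch (equal sample sizes assumed, higher dimensions require the Hilbert-curve or transport solvers mentioned above):

# 1-D Wasserstein-p distance between two samples of equal size
wasserstein1d <- function(y, z, p = 1) {
  stopifnot(length(y) == length(z))
  mean(abs(sort(y) - sort(z))^p)^(1/p)   # optimal coupling = sorted matching
}
# toy usage: distance between observed data and pseudo-data
set.seed(1)
yobs <- rnorm(200, 0, 1)
zsim <- rnorm(200, 0.5, 1)
wasserstein1d(yobs, zsim, p = 1)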
Consistent inference with Wasserstein distance
As ε → 0 [and n fixed]
If either
1. f_θ^(n) is n-exchangeable and D(y1:n, z1:n) = 0 if and only if z1:n = yσ(1:n) for some σ ∈ Sn, or
2. D(y1:n, z1:n) = 0 if and only if z1:n = y1:n,
then, at y1:n fixed, the ABC posterior converges strongly to the original posterior as ε → 0
[Bernton et al., 2019]

As n → ∞ [at ε fixed]
The W-ABC distribution with a fixed ε does not converge in n to a Dirac mass
[Bernton et al., 2019]

As εn → 0 and n → ∞
Under a range of assumptions, if fn(εn) → 0 and P(W(^µn, µ⋆) ≤ εn) → 1, then the W-ABC posterior with threshold εn + ε⋆ puts mass at most δ, with probability going to one, on the set
{θ ∈ H : W(µ⋆, µθ) > ε⋆ + 4εn/3 + fn^{-1}(εn^L/R)}
[Bernton et al., 2019]
Summary selection strategies
Fundamental difficulty of selecting summary statistics when there is no non-trivial sufficient statistic [except when done by experimenters from the field]
1. Starting from a large collection of summary statistics, Joyce and Marjoram (2008) consider their sequential inclusion into the ABC target, with a stopping rule based on a likelihood ratio test
2. Based on decision-theoretic principles, Fearnhead and Prangle (2012) end up with E[θ|xobs] as the optimal summary statistic
3. Use of indirect inference by Drovandi, Pettitt and Faddy (2011), with estimators of the parameters of an auxiliary model as summary statistics
4. Starting from a large collection of summary statistics, Raynal et al. (2018, 2019) rely on random forests to build estimators and select summaries

Semi-automated ABC
Use of summary statistic η(·), importance proposal g(·), kernel K(·) ≤ 1 with bandwidth h ↓ 0 such that (θ, z) ∼ g(θ)f(z|θ) is accepted with probability (hence the bound)
K[{η(z) − η(xobs)}/h]
and the corresponding importance weight defined by
π(θ)/g(θ)
Theorem: optimality of the posterior expectation E[θ|xobs] of the parameter of interest as summary statistic η(xobs)
[Fearnhead & Prangle, 2012; Sisson et al., 2019]
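In practice the unknown E[θ|x] is estimated from a pilot run by regressing θ on candidate summaries; a minimal sketch of that regression step (the pilot model and candidate summaries below are placeholders, not those of the talk):

# Semi-automatic summary à la Fearnhead & Prangle (2012): regression step only
set.seed(2)
N <- 1e4
theta <- rnorm(N)                                 # pilot draws from the prior
z <- matrix(rnorm(N * 50, mean = theta), N, 50)   # pseudo-data, 50 obs per draw
cand <- cbind(m = rowMeans(z),                    # candidate summaries
              md = apply(z, 1, median),
              s2 = apply(z, 1, var))
fit <- lm(theta ~ cand)                           # linear proxy for E[theta | z]
# the fitted linear combination becomes the (scalar) summary for a second ABC run
eta <- function(x) sum(coef(fit) * c(1, mean(x), median(x), var(x)))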
random forests

Random forests
Technique that stemmed from Leo Breiman's bagging (or bootstrap aggregating) machine learning algorithm for both classification [testing] and regression [estimation] [Breiman, 1996]
Improved performances by averaging over classification schemes of randomly generated training sets, creating a "forest" of (CART) decision trees, inspired by Amit and Geman (1997) ensemble learning [Breiman, 2001]

Growing the forest
Breiman's solution for inducing random features in the trees of the forest:
▶ bootstrap resampling of the dataset and
▶ random sub-setting [of size √t] of the covariates driving the classification or regression at every node of each tree
Covariate (summary) xτ that drives the node separation xτ ≷ cτ and the separation bound cτ chosen by minimising entropy or Gini index

ABC with random forests
Idea: starting with
▶ a possibly large collection of summary statistics (η1, . . . , ηp) (from scientific theory input to available statistical software, methodologies, to machine-learning alternatives)
▶ an ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data
run R randomForest to infer M or θ from (η1i, . . . , ηpi)
The average of the trees is the resulting summary statistic, a highly non-linear predictor of the model index
Potential selection of the most active summaries, calibrated against pure noise

Classification of summaries by random forests
Given a large collection of summary statistics, rather than selecting a subset and excluding the others, estimate each parameter by random forests
▶ handles thousands of predictors
▶ ignores useless components
▶ fast estimation with good local properties
▶ automatised with few calibration steps
▶ substitute to Fearnhead and Prangle (2012) preliminary estimation of ^θ(yobs)
▶ includes a natural (classification) distance measure that avoids the choice of either distance or tolerance
[Marin et al., 2016, 2018]
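A bare-bones sketch of the regression version with the randomForest package, on a made-up reference table (the abcrf package of Marin et al. provides a tuned implementation; the model, summaries and noise control below are illustrations, not the talk's examples):

# Random-forest regression on an ABC reference table
library(randomForest)
set.seed(3)
N <- 5e3
theta <- runif(N, -2, 2)                          # prior draws
z <- matrix(rnorm(N * 30, mean = theta), N, 30)   # simulated pseudo-data
ref <- data.frame(theta = theta,                  # reference table
                  m  = rowMeans(z),
                  md = apply(z, 1, median),
                  s  = apply(z, 1, sd),
                  noise = rnorm(N))               # pure-noise column as control
rf <- randomForest(theta ~ ., data = ref, ntree = 200)
importance(rf)                                    # active summaries vs the noise column
xobs <- rnorm(30, mean = 0.5)                     # observed data
newd <- data.frame(m = mean(xobs), md = median(xobs), s = sd(xobs), noise = 0)
predict(rf, newd)                                 # forest estimate of theta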
calibration of tolerance

Calibration of tolerance
Calibration of the threshold ε
▶ from scratch [how small is small?]
▶ from a k-nearest neighbour perspective [quantile of the prior predictive]
  subz=which(diz<quantile(diz,.1))
▶ from asymptotics [convergence speed]
▶ related with the choice of distance [automated selection by random forests]
[Fearnhead & Prangle, 2012; Biau et al., 2013; Liu & Fearnhead, 2018]

Software
Several ABC R packages for performing parameter estimation and model selection
[Nunes & Prangle, 2017]
ABC-IS
Basic ABC algorithm limitations
▶ blind [no learning]
▶ inefficient [curse of dimension]
▶ inapplicable to improper priors
Importance sampling version
▶ importance density g(θ)
▶ bounded kernel function Kh with bandwidth h
▶ acceptance probability of
Kh{ρ[η(xobs), η(x{θ})]} π(θ) / {g(θ) max_θ Aθ}
[Fearnhead & Prangle, 2012]

ABC-MCMC
Markov chain (θ(t)) created via the transition function
θ(t+1) = θ′ ∼ Kω(θ′|θ(t)) if x ∼ f(x|θ′) is such that x ≈ y and u ∼ U(0, 1) ≤ π(θ′)Kω(θ(t)|θ′) / {π(θ(t))Kω(θ′|θ(t))},
θ(t+1) = θ(t) otherwise,
has the posterior π(θ|y) as stationary distribution
[Marjoram et al., 2003]

Algorithm: Likelihood-free MCMC sampler
get (θ(0), z(0)) from the rejection sampler (Algorithm 1)
for t = 1 to N do
  generate θ′ from Kω(·|θ(t−1)), z′ from f(·|θ′), u from U[0,1]
  if u ≤ π(θ′)Kω(θ(t−1)|θ′) / {π(θ(t−1))Kω(θ′|θ(t−1))} I_{Aε,xobs}(z′) then
    set (θ(t), z(t)) = (θ′, z′)
  else
    (θ(t), z(t)) = (θ(t−1), z(t−1))
  end if
end for
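A minimal sketch of ABC-MCMC in R, on a toy Normal-mean model with a random-walk proposal (so that the Kω terms cancel); the model, prior and tolerance are assumptions for illustration:

# ABC-MCMC for x ~ N(theta, 1) with prior theta ~ N(0, 10), summary = sample mean
set.seed(4)
n <- 50
xobs <- rnorm(n, 1.5, 1)
sobs <- mean(xobs)
eps <- 0.05
lprior <- function(th) dnorm(th, 0, sqrt(10), log = TRUE)
T <- 1e4
theta <- numeric(T)
theta[1] <- 0                                    # crude start (ideally from a rejection run)
for (t in 2:T) {
  prop <- theta[t - 1] + rnorm(1, 0, 0.5)        # random-walk proposal
  z <- rnorm(n, prop, 1)                         # pseudo-data under the proposal
  hit <- abs(mean(z) - sobs) <= eps              # ABC acceptance event
  if (hit && log(runif(1)) <= lprior(prop) - lprior(theta[t - 1]))
    theta[t] <- prop else theta[t] <- theta[t - 1]
}
mean(theta[-(1:1000)])                           # posterior mean after burn-in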
ABC-PMC
Generate a sample at iteration t by
^πt(θ(t)) ∝ Σ_{j=1}^N ωj(t−1) Kt(θ(t) | θj(t−1))
modulo acceptance of the associated xt, with tolerance εt ↓, and use the importance weight associated with an accepted simulation θi(t),
ωi(t) ∝ π(θi(t)) / ^πt(θi(t))
© Still likelihood-free
[Sisson et al., 2007; Beaumont et al., 2009]

ABC-SMC
Use of a kernel Kt associated with target πεt and derivation of the backward kernel
Lt−1(z, z′) = πεt(z′) Kt(z′, z) / πεt(z)
Update of the weights
ωi(t) ∝ ωi(t−1) Σ_{m=1}^M I_{Aεt}(xim(t)) / Σ_{m=1}^M I_{Aεt−1}(xim(t−1))
when xim(t) ∼ Kt(xi(t−1), ·)
[Del Moral, Doucet & Jasra, 2009]

ABC-NP
Better usage of [prior] simulations by adjustment: instead of throwing away θ′ such that ρ(η(z), η(xobs)) > ε, replace the θ's with locally regressed transforms
θ∗ = θ − {η(z) − η(xobs)}ᵀ ^β [Csilléry et al., TEE, 2010]
where ^β is obtained by [NP] weighted least squares regression on (η(z) − η(xobs)) with weights
Kδ{ρ(η(z), η(xobs))}
[Beaumont et al., 2002, Genetics]
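A sketch of this local-linear adjustment with an Epanechnikov kernel (the usual choice in Beaumont et al., 2002), written as a standalone helper; argument names are mine:

# Local-linear regression adjustment of accepted ABC draws
# theta: accepted parameter draws, S: matrix of their summaries, sobs: observed summaries
abc_adjust <- function(theta, S, sobs, delta) {
  d <- sqrt(rowSums(sweep(S, 2, sobs)^2))         # distances to the observed summaries
  w <- ifelse(d < delta, 1 - (d / delta)^2, 0)    # Epanechnikov weights
  keep <- w > 0
  X <- sweep(S[keep, , drop = FALSE], 2, sobs)    # centred summaries
  fit <- lm(theta[keep] ~ X, weights = w[keep])   # weighted local-linear fit
  beta <- coef(fit)[-1]
  theta[keep] - drop(X %*% beta)                  # adjusted draws
}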
ABC-NN
Incorporating non-linearities and heteroscedasticities:
θ∗ = ^m(η(xobs)) + [θ − ^m(η(z))] ^σ(η(xobs)) / ^σ(η(z))
where
▶ ^m(η) estimated by non-linear regression (e.g., neural network)
▶ ^σ(η) estimated by non-linear regression on the residuals
log{θi − ^m(ηi)}² = log σ²(ηi) + ξi
[Blum & François, 2009; Li & Fearnhead, 2018; Lawless et al., 2023]

Convergence
4 Convergence: ABC inference consistency, ABC model choice consistency, ABC under misspecification
5 Advanced topics

Asymptotics of ABC
Since πABC(· | xobs) is an approximation of π(· | xobs) or π(· | η(xobs)), the coherence of ABC-based inference needs to be established on its own
[Li & Fearnhead, 2018a,b; Frazier et al., 2018, 2020]
Meaning
▶ establishing large sample (n) properties of ABC posteriors and ABC procedures
▶ finding sufficient conditions and checks on summary statistics η(·)
▶ determining the proper rate ε = εn of convergence of the tolerance to 0
▶ [mostly] ignoring Monte Carlo errors
Consistency of ABC posteriors
ABC algorithm Bayesian consistent at θ0 if for any δ > 0,
Π(∥θ − θ0∥ > δ | ρ{η(xobs), η(Z)} ≤ ε) → 0 as n → +∞, ε → 0
Bayesian consistency implies that sets containing θ0 have posterior probability tending to one as n → +∞, with the implication being the existence of a specific rate of concentration
▶ concentration around the true value and Bayesian consistency impose less stringent conditions on the convergence speed of the tolerance εn to zero than asymptotic normality of the ABC posterior
▶ asymptotic normality of the ABC posterior mean does not require asymptotic normality of the ABC posterior

Asymptotic setup
Assumptions:
▶ asymptotic: xobs = xobs(n) ∼ Pθ0^n and ε = εn, n → +∞
▶ parametric: θ ∈ Rk, k fixed
▶ concentration of the summary statistic η(zn): there exists b : θ → b(θ) such that η(zn) − b(θ) = oPθ(1) for all θ
▶ identifiability of the parameter: b(θ) ≠ b(θ′) when θ ≠ θ′

Consistency of ABC posteriors
▶ Concentration of summary η(z): there exists b(θ) such that η(z) − b(θ) = oPθ(1)
▶ Consistency: Πεn(∥θ − θ0∥ ≤ δ | η(xobs)) = 1 + op(1)
▶ Convergence rate: there exists δn = o(1) such that Πεn(∥θ − θ0∥ ≤ δn | η(xobs)) = 1 + op(1)
▶ Point estimator consistency: for ^θε = EABC[θ | η(xobs(n))],
EABC[θ | η(xobs(n))] − θ0 = op(1) and vn(EABC[θ | η(xobs(n))] − θ0) ⇒ N(0, v)

Asymptotic shape of posterior distribution
Shape of Π(· | ∥η(xobs) − η(z)∥ ≤ εn) depending on the relation between εn and the rate σn at which η(xobs(n)) satisfies a CLT
Three different regimes:
1. σn = o(εn) −→ Uniform limit
2. σn ≍ εn −→ perturbed Gaussian limit
3. σn ≫ εn −→ Gaussian limit

Curse of dimension
▶ for reasonable statistical behaviour, the acceptance probability αn declines the faster the larger the dimension of θ, kθ, but is unaffected by the dimension of η, kη
▶ theoretical justification for dimension reduction methods that process parameter components individually and independently of other components [Fearnhead & Prangle, 2012; Martin et al., 2019]
▶ importance sampling approach of Li & Fearnhead (2018a) yields acceptance rates αn = O(1) when εn = O(1/vn)

Monte Carlo error
Link the choice of εn to the Monte Carlo error associated with Nn draws in the ABC algorithm
Conditions (on εn) under which
^αn = αn{1 + op(1)}
where ^αn = Σ_{i=1}^{Nn} I[d{η(y), η(z)} ≤ εn]/Nn is the proportion of accepted draws from Nn simulated draws of θ
Either
(i) εn = o(vn^{−1}) and (vnεn)^{−kη} εn^{−kθ} ≤ M Nn, or
(ii) εn ≳ vn^{−1} and εn^{−kθ} ≤ M Nn,
for M large enough
What if rates differ?
[Lawless et al., 2023]

What if rates differ?
Different convergence rates for the summaries, i.e., for all 1 ≤ j ≤ k, existence of a sequence (vnj)n such that
vnj(b(θ)j − η(z)j) = OP(1)
and a tolerance sequence (εn)n satisfying
lim_{n→∞} vnj εn = 0 for all 1 ≤ j ≤ k0 and lim_{n→∞} vnj εn = ∞ for all k0 + 1 ≤ j ≤ k,
called slow and fast components, resp.
[Lawless et al., 2023]

Under some assumptions slightly weaker than those of Frazier et al. (2018) and Li & Fearnhead (2018),
1. the ABC posterior distribution concentrates: for a sequence (λn)n with λn ↓ 0,
∫_{|θ−θ0|≥λn} πεn(θ | η(y)) dθ = oP(1)
2. asymptotically,
πεn(θ | η(y)) is proportional to I{|∇b(2)(θ0)(θ−θ0)| ≤ εn} (1 − |∇b(2)(θ0)(θ−θ0)|²/εn²)^{k0/2},
where b(2) = (b_{k0+1}, . . . , b_k)
3. local linear post-processing à la Beaumont (2002) achieves posterior contraction rate O(max(vn,k0+1, εn²)) while vanilla ABC faces posterior contraction rate O(max(vn,k0+1, εn)), i.e., εn is replaced by εn² at no additional computational cost
[Lawless et al., 2023]

What if rates differ?
[Figure: posterior risks E[(θ − θ0)²]^{1/2} of vanilla and regression-adjusted ABC against εn, for 10^4 iid Xi ∼ U(θ, θ + 1) with summaries S1 = n^{−1/2} Σ_{i=1}^{√n} Xi [v1n = n^{−1/4}] and S2 = min_{1≤i≤n} Xi [vn = n^{−1}]: the risks decrease at rates εn and εn², resp., until εn reaches vn,k0+1 = n^{−1}]
[Lawless et al., 2023]
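A rough rendering of this toy experiment (prior, tolerance and adjustment choices below are mine, so the numbers will only qualitatively match the figure):

# Summaries with different rates: X_i ~ U(theta, theta+1),
# S1 = mean of the first sqrt(n) observations, S2 = min of all observations
set.seed(5)
n <- 1e4; theta0 <- 0
xobs <- runif(n, theta0, theta0 + 1)
summ <- function(x) c(mean(x[1:floor(sqrt(length(x)))]), min(x))
sobs <- summ(xobs)
N <- 1e4
theta <- runif(N, -1, 1)                         # flat prior around theta0
S <- t(sapply(theta, function(th) summ(runif(n, th, th + 1))))
d <- sqrt(rowSums(sweep(S, 2, sobs)^2))
keep <- d <= quantile(d, 0.01)
vanilla <- theta[keep]                           # vanilla ABC draws
X <- sweep(S[keep, , drop = FALSE], 2, sobs)     # local-linear adjustment
adj <- vanilla - drop(X %*% coef(lm(vanilla ~ X))[-1])
c(vanilla  = sqrt(mean((vanilla - theta0)^2)),   # posterior risks
  adjusted = sqrt(mean((adj - theta0)^2)))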
ABC model choice consistency

Bayesian model choice
Model candidates M1, M2, . . . to be compared for a dataset xobs, making the model index M part of the inference
Use of a prior distribution π(M = m), plus a prior distribution on the parameter conditional on the value m of the model index, πm(θm)
Goal is to derive the posterior distribution of M, a challenging computational target when models are complex
[Savage, 1964; Berger, 1980]

Generic ABC for model choice
Algorithm: Likelihood-free model choice sampler (ABC-MC)
for t = 1 to T do
  repeat
    generate m from the prior π(M = m)
    generate θm from the prior πm(θm)
    generate z from the model fm(z|θm)
  until ρ{η(z), η(xobs)} < ε
  set m(t) = m and θ(t) = θm
end for
[Cornuet et al., DIYABC, 2009]

ABC model choice consistency
Leaving approximations aside, the limiting ABC procedure is the Bayes factor based on η(xobs),
B12(η(xobs))
Potential loss of information at the testing level [Robert et al., 2010]
When is the Bayes factor based on the insufficient statistic η(xobs) consistent? [Marin et al., 2013]

Example 7: Gauss versus Laplace
Model M1: xobs ∼ N(θ1, 1)⊗n opposed to model M2: xobs ∼ L(θ2, 1/√2)⊗n, the Laplace distribution with mean θ2 and variance one
Four possible statistics η(xobs):
1. sample mean x̄obs (sufficient for M1 if not for M);
2. sample median med(xobs) (insufficient);
3. sample variance var(xobs) (ancillary);
4. median absolute deviation mad(xobs) = med(|xobs − med(xobs)|)
[Figure: boxplots of ABC posterior probabilities of M1 under data simulated from each model, n = 100, for the different summary choices]
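A sketch of ABC model choice for this example with the mad summary (the priors on θ1, θ2 below are assumptions, not those of the talk):

# ABC-MC for Gauss vs Laplace, summary = mad
set.seed(6)
n <- 100
xobs <- rnorm(n)                                   # data generated from M1
sobs <- mad(xobs)
rlaplace <- function(n, m, b) m + b * (rexp(n) - rexp(n))  # Laplace(m, b) simulator
N <- 1e5
mdl <- sample(1:2, N, replace = TRUE)              # uniform prior on the model index
theta <- rnorm(N, 0, 2)                            # assumed prior on the location
d <- numeric(N)
for (i in 1:N) {
  z <- if (mdl[i] == 1) rnorm(n, theta[i], 1) else rlaplace(n, theta[i], 1 / sqrt(2))
  d[i] <- abs(mad(z) - sobs)
}
keep <- d <= quantile(d, 0.001)
table(mdl[keep]) / sum(keep)                       # ABC posterior model probabilities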
Consistency
Summary statistics
η(xobs) = (τ1(xobs), τ2(xobs), · · · , τd(xobs)) ∈ Rd
with
▶ true distribution η ∼ Gn, true mean µ0,
▶ distribution Gi,n under model Mi, corresponding posteriors πi(· | ηn)
Assumptions of central limit theorem and large deviations for η(xobs) under the true model, plus usual Bayesian asymptotics, with di the effective dimension of the parameter
[Pillai et al., 2013]

Asymptotic marginals
Asymptotically,
mi,n(t) = ∫_{Θi} gi,n(t|θi) πi(θi) dθi
is such that
(i) if inf{|µi(θi) − µ0|; θi ∈ Θi} = 0,
Cl √n^{d−di} ≤ mi,n(ηn) ≤ Cu √n^{d−di},
and
(ii) if inf{|µi(θi) − µ0|; θi ∈ Θi} > 0,
mi,n(ηn) = oPn[√n^{d−τi} + √n^{d−αi}]

Between-model consistency
Consequence of the above is that the asymptotic behaviour of the Bayes factor is driven by the asymptotic mean value µ(θ) of ηn under both models. And only by this mean value!
Indeed, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} = inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
Cl √n^{−(d1−d2)} ≤ m1,n(ηn)/m2,n(ηn) ≤ Cu √n^{−(d1−d2)},
where Cl, Cu = OPn(1), irrespective of the true model.
© Only depends on the difference d1 − d2: no consistency
Else, if
inf{|µ0 − µ2(θ2)|; θ2 ∈ Θ2} > inf{|µ0 − µ1(θ1)|; θ1 ∈ Θ1} = 0
then
m1,n(ηn)/m2,n(ηn) ≥ Cu min(√n^{−(d1−α2)}, √n^{−(d1−τ2)})

Checking for adequate statistics
Run a practical check of the relevance (or non-relevance) of ηn: null hypothesis that both models are compatible with the statistic ηn,
H0 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} = 0
against
H1 : inf{|µ2(θ2) − µ0|; θ2 ∈ Θ2} > 0
The testing procedure provides estimates of the mean of ηn under each model and checks for equality
ABC under misspecification

ABC under misspecification
ABC methods rely on simulations z(θ) from the model to identify those close to xobs. What is happening when the model is wrong?
▶ for some tolerance sequences εn ↓ ε∗, well-behaved ABC posteriors that concentrate posterior mass on a pseudo-true value
▶ if εn too large, the asymptotic limit of the ABC posterior is uniform with radius of order εn − ε∗
▶ even if √n{εn − ε∗} → 2c ∈ R, the limiting distribution is no longer Gaussian
▶ ABC credible sets are invalid confidence sets
[Frazier et al., 2020]

Example 8: Normal model with wrong variance
Assumed data generating process (DGP) is z ∼ N(θ, 1) but the actual DGP is xobs ∼ N(θ, σ̃²)
Use of summaries
▶ sample mean η1(xobs) = (1/n) Σ_{i=1}^n xi
▶ centred summary η2(xobs) = (1/(n−1)) Σ_{i=1}^n (xi − η1(xobs))² − 1
Three ABC versions:
▶ ABC-AR: accept/reject approach with Kε(d{η(z), η(xobs)}) = I(d{η(z), η(xobs)} ≤ ε) and d{x, y} = ∥x − y∥
▶ ABC-K: smooth rejection approach, with Kε(d{η(z), η(xobs)}) a univariate Gaussian kernel
▶ ABC-Reg: post-processing ABC approach with weighted linear regression adjustment
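A sketch of the ABC-AR version in this misspecified setting (the prior is an assumption, and the αn = n^{−5/9} acceptance quantile follows the next slide):

# ABC-AR under misspecification: assumed model z ~ N(theta, 1), actual data N(theta0, sig2)
set.seed(7)
n <- 100; theta0 <- 1; sig2 <- 4
xobs <- rnorm(n, theta0, sqrt(sig2))
eta <- function(x) c(mean(x), var(x) - 1)          # the two summaries
sobs <- eta(xobs)
N <- 1e5
theta <- rnorm(N, 0, 5)                            # assumed prior
S <- t(sapply(theta, function(th) eta(rnorm(n, th, 1))))
d <- sqrt(rowSums(sweep(S, 2, sobs)^2))
post <- theta[d <= quantile(d, n^(-5/9))]          # alpha_n = n^(-5/9) quantile
mean(post)                                         # concentrates near a pseudo-true value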
Example 8: Normal model with wrong variance
▶ posterior means for ABC-AR, ABC-K and ABC-Reg as σ̃² increases (N = 50,000 simulated data sets)
▶ αn = n^{−5/9} quantile for ABC-AR
▶ ABC-K and ABC-Reg bandwidth of n^{−5/9}
[Frazier et al., 2020]

ABC misspecification
▶ data xobs with true distribution P0, assumed issued from model Pθ, θ ∈ Θ ⊂ R^{kθ}, and summary statistic η(xobs) = (η1(xobs), ..., η_{kη}(xobs))
▶ (model) misspecification
inf_{θ∈Θ} D(P0||Pθ) = inf_{θ∈Θ} ∫ log{dP0(x)/dPθ(x)} dP0(x) > 0,
with θ∗ = arg inf_{θ∈Θ} D(P0||Pθ) [Muller, 2013]
▶ ABC misspecification: for b0 (resp. b(θ)) the limit of η(xobs) (resp. η(z)),
inf_{θ∈Θ} d{b0, b(θ)} > 0
▶ ABC pseudo-true value: θ∗ = arg inf_{θ∈Θ} d{b0, b(θ)}

Minimum tolerance
Under identification conditions on b(·) ∈ R^{kη}, there exists ε∗ such that
ε∗ = inf_{θ∈Θ} d{b0, b(θ)} > 0
Once εn < ε∗, no draw of θ is selected and the posterior Πε[A | η(xobs)] is ill-conditioned
But an appropriately chosen tolerance sequence (εn)n allows the ABC-based posterior to concentrate on the distance-dependent pseudo-true value θ∗
ABC concentration under misspecification
Assumptions
[A0] Existence of a unique b0 such that d(η(xobs), b0) = oP0(1) and of a sequence v0,n → +∞ such that
lim inf_{n→+∞} P0[d(η(xobs(n)), b0) ≥ v0,n^{−1}] = 1

[A1] Existence of an injective map b : Θ → B ⊂ R^{kη} and of a function ρn with ρn(·) ↓ 0 as n → +∞ and ρn(·) non-increasing, such that
Pθ[d(η(Z), b(θ)) > u] ≤ c(θ)ρn(u), with ∫_Θ c(θ) dΠ(θ) < ∞,
and assume either
(i) Polynomial deviations: existence of vn ↑ +∞ and u0, κ > 0 such that ρn(u) = vn^{−κ} u^{−κ}, for u ≤ u0
(ii) Exponential deviations: existence of hθ(·) > 0 such that Pθ[d(η(z), b(θ)) > u] ≤ c(θ)e^{−hθ(u vn)} and existence of m, C > 0 such that
∫_Θ c(θ) e^{−hθ(u vn)} dΠ(θ) ≤ C e^{−m·(u vn)^τ}, for u ≤ u0

[A2] Existence of D > 0 and M0, δ0 > 0 such that, for all δ0 ≥ δ > 0 and M ≥ M0, there exists Sδ ⊂ {θ ∈ Θ : d(b(θ), b0) − ε∗ ≤ δ} for which
(i) in case (i), D < κ and ∫_{Sδ} (1 − c(θ)/M) dΠ(θ) ≳ δ^D
(ii) in case (ii), ∫_{Sδ} (1 − c(θ)e^{−hθ(M)}) dΠ(θ) ≳ δ^D

Consistency
Assume [A0]–[A2], with εn ↓ ε∗ and
εn ≥ ε∗ + M vn^{−1} + v0,n^{−1}
for M large enough. Let Mn ↑ ∞ and δn ≥ Mn(εn − ε∗); then
Πε[d(b(θ), b0) ≥ ε∗ + δn | η(xobs)] = oP0(1),
1. if δn ≥ Mn vn^{−1} un^{−D/κ} = o(1) in case (i)
2. if δn ≥ Mn vn^{−1} |log(un)|^{1/τ} = o(1) in case (ii)
with un = εn − (ε∗ + M vn^{−1} + v0,n^{−1}) ≥ 0
[Bernton et al., 2017; Frazier et al., 2020]
Regression adjustment under misspecification
Accepted values θ′ artificially related to η(xobs) and η(z) through the local linear regression model
θ′ = µ + βᵀ{η(xobs) − η(z)} + ν,
where ν is the model residual [Beaumont et al., 2003]
Asymptotic behaviour of the ABC-Reg posterior Π̃ε[· | η(xobs)] determined by the behaviour of Πε[· | η(xobs)], ^β, and {η(xobs) − η(z)}

Regression adjustment under misspecification
▶ ABC-Reg takes draws of asymptotically optimal θ, perturbed in a manner that need not preserve the original optimality
▶ for ∥β0∥ large, the pseudo-true value θ̃∗ possibly lies outside Θ
▶ extends to nonlinear regression adjustments [Blum & François, 2010]
▶ potential correction of the adjustment [Frazier et al., 2020]
▶ local regression adjustments come with smaller posterior variability than ABC-AR, but fake precision
Advanced topics
5 Advanced topics: big data issues, big parameter issues

Computational bottleneck
Time per iteration increases with the sample size n of the data: the cost of sampling, O(n^{1+?}), associated with a reasonable acceptance probability makes ABC infeasible for large datasets
▶ surrogate models to get samples (e.g., using copulas)
▶ direct sampling of summary statistics (e.g., synthetic likelihood) [Wood, 2010]
▶ borrow from proposals for scalable MCMC (e.g., divide & conquer)

Approximate ABC [AABC]
Idea: approximations on both parameter and model spaces by resorting to bootstrap techniques [Buzbas & Rosenberg, 2015]
Procedure scheme
1. Sample (θi, xi), i = 1, . . . , m, from the prior predictive
2. Simulate θ∗ ∼ π(·) and assign weight wi to the dataset x(i) simulated under the k-closest θi to θ∗
3. Generate dataset x∗ as a bootstrap weighted sample from (x(1), . . . , x(k))
Drawbacks
▶ if m is too small, the prior predictive sample may miss informative parameters
▶ large error and misleading representation of the true posterior
▶ only suited for models with very few parameters
Divide-and-conquer perspectives
1. divide the large dataset into smaller batches
2. sample from the batch posterior
3. combine the results to get a sample from the targeted posterior
Alternative via ABC-EP [Barthelmé & Chopin, 2014]

Geometric combination: WASP
Subset posteriors: given a partition xobs[1], . . . , xobs[B] of the observed data xobs, define
π(θ | xobs[b]) ∝ π(θ) Π_{j∈[b]} f(xobs_j | θ)^B
[Srivastava et al., 2015]
Subset posteriors are combined via the Wasserstein barycenter [Cuturi, 2014]
Drawback: requires sampling from f(· | θ)^B by ABC means. Should be feasible for latent variable (z) representations when f(x | z, θ) is available in closed form [Doucet & Robert, 2001]
Alternative: backfeed subset posteriors as priors to other subsets, partitioning summaries

Consensus ABC
Naïve scheme
▶ For each data batch b = 1, . . . , B
  1. sample (θ1[b], . . . , θn[b]) from the diffused prior π̃(·) ∝ π(·)^{1/B}
  2. run ABC to sample from the batch posterior ^π(· | d(S(xobs[b]), S(x[b])) < ε)
  3. compute the sample posterior variance Σb^{−1}
▶ Combine the batch posterior approximations
θj = [Σ_{b=1}^B Σb θj[b]] / [Σ_{b=1}^B Σb]
Remark: a diffuse, non-informative prior π̃(·) calls for ABC-MCMC steps
big parameter issues

Big parameter issues
Curse of dimension: as dim(Θ) = kθ increases
▶ exploration of the parameter space gets harder
▶ the summary statistic η is forced to grow, since it is at least of dimension kη ≥ dim(Θ)
▶ the acceptance probability decreases
For a given tolerance εn, the average acceptance probability
αn = ∫ π(θ) Pθ(|η(y) − η(z)| ≤ εn) dθ ≍ Ln εn^d
[Lawless et al., 2023]
Some solutions
▶ adopt more local algorithms like ABC-MCMC or ABC-SMC
▶ aim at posterior marginals and approximate the joint posterior by a copula [Li et al., 2016]
▶ run ABC-Gibbs [Clarté et al., 2016]

Example 11: Hierarchical MA(2)
▶ xi ∼ iid MA2(µi, σi)
▶ µi = (βi,1 − βi,2, 2(βi,1 + βi,2) − 1) with (βi,1, βi,2, βi,3) ∼ iid Dir(α1, α2, α3)
▶ σi ∼ iid IG(σ1, σ2)
▶ α = (α1, α2, α3), with prior E(1)⊗3
▶ σ = (σ1, σ2), with prior C(1)+⊗2
[Graphical model: α → (µ1, µ2, . . . , µn), σ → (σ1, σ2, . . . , σn), and (µi, σi) → xi]
© Based on 10^6 prior and 10^3 posterior simulations, 3n summary statistics, and series of length 100, the ABC-Rej posterior is hardly distinguishable from the prior!
ABC-Gibbs
When the parameter is decomposed into θ = (θ1, . . . , θn)
Algorithm: ABC-Gibbs sampler
starting point θ(0) = (θ1(0), . . . , θn(0)), observation xobs
for i = 1, . . . , N do
  for j = 1, . . . , n do
    θj(i) ∼ πεj(· | x⋆, sj, θ1(i), . . . , θj−1(i), θj+1(i−1), . . . , θn(i−1))
  end for
end for
Divide & conquer:
▶ one tolerance εj for each parameter θj
▶ one statistic sj for each parameter θj
[Clarté et al., 2019]
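A minimal sketch of ABC-Gibbs on a toy two-level hierarchy (a simpler stand-in for the MA(2) example; the model, summaries and tolerances below are my choices):

# ABC-Gibbs on: alpha ~ N(0, 10), mu_j | alpha ~ N(alpha, 1), x_j | mu_j ~ N(mu_j, 0.1^2)
# summary for alpha: mean(mu); summary for mu_j: x_j
set.seed(8)
J <- 20
alpha0 <- 2; mu0 <- rnorm(J, alpha0, 1); xobs <- rnorm(J, mu0, 0.1)
eps_alpha <- 0.05; eps_mu <- 0.05; N <- 2000
abc_draw <- function(rprior, rsumm, sobs, eps) {   # one ABC conditional draw
  repeat { th <- rprior(); if (abs(rsumm(th) - sobs) <= eps) return(th) }
}
alpha <- 0; mu <- xobs                             # initial values
out <- matrix(NA, N, 1 + J)
for (i in 1:N) {
  alpha <- abc_draw(function() rnorm(1, 0, sqrt(10)),      # update alpha given mu
                    function(a) mean(rnorm(J, a, 1)), mean(mu), eps_alpha)
  for (j in 1:J)                                           # update each mu_j given alpha, x_j
    mu[j] <- abc_draw(function() rnorm(1, alpha, 1),
                      function(m) rnorm(1, m, 0.1), xobs[j], eps_mu)
  out[i, ] <- c(alpha, mu)
}
mean(out[, 1])                                     # ABC-Gibbs estimate of alpha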
Compatibility
When using ABC-Gibbs conditionals with different acceptance events, e.g., different statistics
π(α)π(sα(µ) | α) and π(µ)f(sµ(x⋆) | α, µ),
the conditionals are incompatible
▶ ABC-Gibbs does not necessarily converge (even for tolerance equal to zero)
▶ potential limiting distribution
  not a genuine posterior (double use of data)
  unknown [except for a specific version]
  possibly far from genuine posterior(s)
[Clarté et al., 2016]

Convergence
In the hierarchical case n = 2,
Theorem: if there exists 0 < κ < 1/2 such that
sup_{θ1,θ̃1} ∥πε2(· | x⋆, s2, θ1) − πε2(· | x⋆, s2, θ̃1)∥TV = κ,
the ABC-Gibbs Markov chain geometrically converges in total variation to a stationary distribution νε, with geometric rate 1 − 2κ

Example 11: Hierarchical MA(2)
[Figure: separation from the prior for an identical number of simulations]
Explicit limiting distribution
For the model
xj | µj ∼ π(xj | µj), µj | α ∼ iid π(µj | α), α ∼ π(α),
an alternative ABC is based on
π̃(α, µ | x⋆) ∝ π(α) q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃ × ∫ f(x̃ | µ) π(x⋆ | µ) 1{η(sµ(x⋆), sµ(x̃)) < εµ} dx̃,
with q an arbitrary distribution on µ (the first integral generating a new µ̃)
This induces the full conditionals
π̃(α | µ) ∝ π(α) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃
and
π̃(µ | α, x⋆) ∝ q(µ) ∫ π(µ̃ | α) 1{η(sα(µ), sα(µ̃)) < εα} dµ̃ × ∫ f(x̃ | µ) π(x⋆ | µ) 1{η(sµ(x⋆), sµ(x̃)) < εµ} dx̃,
now compatible with the new artificial joint
That is,
▶ prior simulations of α ∼ π(α) and of µ̃ ∼ π(µ̃ | α) until η(sα(µ), sα(µ̃)) < εα
▶ simulation of µ from the instrumental q(µ) and of the auxiliary variables µ̃ and x̃ until both constraints are satisfied
The resulting Gibbs sampler is stationary for the posterior proportional to
π(α, µ) q(sα(µ)) f(sµ(x⋆) | µ)
[both projections], that is, for the likelihood associated with sµ(x⋆) and a prior distribution proportional to π(α, µ) q(sα(µ)) [exact!]
Thank you, Gael!!! And all the best for the next move!

ERC postdoc position
Ongoing 2023–2030 ERC funding for PhD positions and postdoctoral collaborations with
▶ Michael Jordan (more Paris than Berkeley)
▶ Eric Moulines (Paris)
▶ Gareth Roberts (Warwick)
▶ myself (more Paris than Warwick)
on the OCEAN (On IntelligenCE And Networks) project