Presented at BayesComp 2023 in Levi (Finland), based on Picchini and Tamborrino (2022). Guided sequential ABC schemes for intractable Bayesian models, arXiv:2206.12235.
Guided sequential ABC schemes for simulation-based inference

Umberto Picchini
Dept. Mathematical Sciences, Chalmers and Gothenburg University
@uPicchini

BayesComp, Levi (Finland), 12-17 March 2023
Joint work with Massimiliano Tamborrino (University of Warwick, UK)

U. Picchini, M. Tamborrino. Guided sequential ABC schemes for intractable Bayesian models. arXiv:2206.12235, 2022.
The work is about approximate Bayesian computation (ABC), a simulation-based inference methodology.

We want to construct ways to improve sequential ABC schemes.

Sequential ABC schemes are the state of the art; however, they can be “slow” in the sense that the cloud of parameter “particles” can be very diffuse, especially in the initial iterations.

This is typical because the initial proposal sampler is the (possibly vague) prior → very different from the posterior → low acceptance rate.

We explore how to make parameter proposals informed by the observed data, and hence guided by data.

This reduces the rejection rate and the computational effort, and we show that inference remains accurate.
As long as we have a realistic model (implemented as a computer program) from which we can simulate artificial datasets given model parameters θ, we can produce “some” statistical inference.

This is useful when the likelihood function of θ is unavailable.

Even for models that are very simple to write down, exact inference may be impossible due to high-dimensional integration.
A paradigm shift is the concept of a generative model, or simulator.

Say we write computer code for the model M(θ), the simulator, as an idealized representation of the phenomenon under study:

    θ∗ → M(θ∗) → y∗

As long as we are able to run an instance of the model, we simulate/generate artificial data y∗, with y∗ ∼ p(y∗|θ = θ∗).

So we have obtained a sample y∗ from the (unknown) likelihood using the simulator M(θ).

Therefore the simulator M(θ) defines the probabilistic model p(y|θ) implicitly!
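To make this concrete, here is a minimal Python sketch (not from the paper; the Gaussian toy model and all function names are illustrative assumptions). Each call to the simulator returns one draw from the implicit likelihood p(y|θ):

import numpy as np

def simulator(theta, n=100, rng=None):
    """Toy simulator M(theta): returns one artificial dataset y* given theta.

    Here the 'model' is just i.i.d. Gaussian data with mean mu and standard
    deviation exp(log_sigma); any black-box code mapping parameters to data
    would play the same role.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu, log_sigma = theta
    return rng.normal(mu, np.exp(log_sigma), size=n)

# one run of M(theta*) yields one sample y* from the implicit likelihood p(y|theta*)
y_star = simulator(theta=(1.0, 0.0))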
ABC-rejection with summary statistics

1. simulate from the prior θ∗ ∼ π(θ)
2. simulate M(θ∗) → y∗, compute S(y∗)
3. if ∥S(y∗) − S(y^o)∥ < ϵ store θ∗. Go to step 1 and repeat.

Stored θ’s are from πϵ(θ|S(y^o)), with

    πϵ(θ|S(y^o)) ∝ π(θ) ∫_Y I_{Aϵ,y^o}(y∗) p(y∗|θ) dy∗,
    Aϵ,y^o(y∗) = {y∗ ∈ Y : ∥S(y∗) − S(y^o)∥ < ϵ}.
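A minimal Python sketch of the three steps above (illustrative only; prior_sample, simulator and summary are placeholder callables supplied by the user):

import numpy as np

def abc_rejection(prior_sample, simulator, summary, s_obs, eps, n_accept):
    """ABC-rejection: keep theta* whenever ||S(y*) - S(y^o)|| < eps."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_sample()                    # 1. theta* ~ pi(theta)
        s_sim = summary(simulator(theta))         # 2. y* ~ M(theta*), s* = S(y*)
        if np.linalg.norm(s_sim - s_obs) < eps:   # 3. accept if summaries are close
            accepted.append(theta)
    return np.array(accepted)  # draws from pi_eps(theta | S(y^o))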
How to simulate the proposal θ∗?

Several possibilities exist to simulate the proposal θ∗. The most common are:

• (inefficient but parallelisable) acceptance-rejection ABC, which proposes from θ∗ ∼ π(θ);
• MCMC-ABC (serial, difficult to tune ϵ, high autocorrelations), which proposes from a kernel θ∗ ∼ q(θ∗|θ);
• SIS-ABC and SMC-ABC (parallelisable, more plug-and-play than MCMC-ABC).

Today we exclusively discuss sequential ABC (SMC-ABC and SIS-ABC).
With sequential schemes, we traverse T generations of populations of parameters.

We first sample many “particles” from the prior → accept some using a large threshold ϵ_1 → perturb them → accept some using a smaller ϵ_2 < ϵ_1 → perturb ...

We stop when it takes an unbearable time to accept further particles (e.g. stop when the acceptance rate falls below 1.5%).
SMC-ABC moves N particles through T ABC posteriors as follows:

• (t = 1) run acceptance-rejection ABC, obtaining particles θ_1^(1), ..., θ_1^(N). Assign each particle the weight w_1^(i) = 1/N.
• for t = 2:T
    for i = 1:N
      repeat until acceptance:
        sample a θ∗ from the set (θ_{t−1}^(1), ..., θ_{t−1}^(N)) with probabilities (w_{t−1}^(1), ..., w_{t−1}^(N))
        (perturb) θ∗∗ ∼ q_t(·|θ∗)
        simulate y∗ ∼ p(y|θ∗∗), get summaries s∗ = S(y∗)
        accept and store θ_t^(i) := θ∗∗ if ∥s∗ − s_y∥ < ϵ_t
      if accepted, set w_t^(i) = π(θ_t^(i)) / Σ_{j=1}^N w_{t−1}^(j) q_t(θ_t^(i)|θ_{t−1}^(j))
    end
    normalise w_t^(i) := w_t^(i) / Σ_i w_t^(i)
    reduce ϵ for the next iteration
  end
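A sketch of one generation (t ≥ 2) in Python, for concreteness. This is illustrative, not the paper's implementation: it uses the standard Gaussian kernel N(θ∗, 2Σ_{t−1}) discussed on the next slide, a weighted covariance estimate, and placeholder callables simulator, summary and prior_pdf:

import numpy as np
from scipy.stats import multivariate_normal

def smc_abc_generation(particles, weights, eps_t, s_obs,
                       simulator, summary, prior_pdf, rng):
    """One SMC-ABC generation with the Gaussian kernel q_t(.|theta*) = N(theta*, 2*Sigma_{t-1})."""
    N, _ = particles.shape
    Sigma = 2.0 * np.cov(particles, rowvar=False, aweights=weights)  # 2 x weighted covariance
    new_particles = np.empty_like(particles)
    new_weights = np.empty(N)
    for i in range(N):
        while True:
            j = rng.choice(N, p=weights)                           # resample from previous population
            theta = rng.multivariate_normal(particles[j], Sigma)   # perturb
            if np.linalg.norm(summary(simulator(theta)) - s_obs) < eps_t:
                break                                              # accept
        new_particles[i] = theta
        # q_t(theta | theta_j) for all j at once (the Gaussian kernel is symmetric in its arguments)
        kern = multivariate_normal.pdf(particles, mean=theta, cov=Sigma)
        new_weights[i] = prior_pdf(theta) / np.sum(weights * kern)
    return new_particles, new_weights / new_weights.sum()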
The “perturbing proposal” sampler is an arbitrary q_t(·|θ).

In practice, the most popular proposal sampler is a multivariate Gaussian centred at a randomly picked particle θ∗ chosen from the previous iteration:

    θ∗∗ ∼ q_t(·|θ∗) = N(θ∗, 2Σ_{t−1}),   with Σ_{t−1} = cov(θ_{t−1}^(1), ..., θ_{t−1}^(N)).

• Beaumont, Cornuet, Marin, Robert. Biometrika, 96(4):983-990, 2009.
• Toni, Welch, Strelkowa, Ipsen, Stumpf. Journal of the Royal Society Interface, 6(31):187-202, 2009.
• Filippi, Barnes, Cornebise, Stumpf. Statistical Applications in Genetics and Molecular Biology, 12(1):87-107, 2013.

This proposal is implemented in off-the-shelf software such as pyABC and ABCpy.
What we propose here is to construct ways to incorporate the observed data summaries s_y into the proposal sampler, so as to propose from some q_t(·|s_y, ...).

Therefore we construct guided (by data) sequential proposal samplers.
Guided Sequential Importance Sampling ABC (SIS-ABC)

for i = 1, ..., N do
  repeat
    θ∗ ∼ q_t(µ_{t−1}, Σ_{t−1}) = N(µ_{t−1}, 2Σ_{t−1})
    ...etc
  until ∥s∗ − s_y∥ < ϵ
end for

Goal: move from θ∗ ∼ q_t(µ_{t−1}, Σ_{t−1}) to q_t(· · · |s_y).

We build on Picchini, Simola and Corander¹:

• denote by (θ^(i), s^(i)) a (parameter, summary) pair accepted at the previous iteration;
• assume that (θ^(i), s^(i)) ∼ N(m, S), with

    m ≡ (m_θ, m_s),   S ≡ [ S_θ   S_θs
                            S_sθ  S_s ].

¹ Picchini, Simola, Corander. Sequentially guided MCMC proposals for synthetic likelihoods and correlated synthetic likelihoods. Bayesian Analysis, 2022.
Then it is well known that conditionals of a multivariate Gaussian are Gaussian:

    θ^(i) | s^(i) ∼ N(m_{θ^(i)|s^(i)}, S_{θ^(i)|s^(i)}).

Let’s use this fact.
For x_t^(i) := (θ_t^(i), s_t^(i)) ∼ N(m_t, S_t), we can estimate m_t and S_t via the (weighted) accepted particles as

    m̂_t = Σ_{i=1}^N w_t^(i) x_t^(i),   Ŝ_t = Σ_{i=1}^N w_t^(i) (x_t^(i) − m̂_t)(x_t^(i) − m̂_t)′ / (1 − Σ_{i=1}^N (w_t^(i))²).   (1)

• We have m̂_t = (m̂_θ, m̂_s)
• and

    Ŝ ≡ [ Ŝ_θ   Ŝ_θs
          Ŝ_sθ  Ŝ_s ].

Using the fact that conditionals are Gaussian, we can write a guided proposal for the next iteration t + 1:

    q_{t+1}(θ|s_y) ≡ N(m̂_{θ|s_y,t}, Ŝ_{θ|s_y,t})

with

    m̂_{θ|s_y,t} = m̂_θ + Ŝ_θs (Ŝ_s)⁻¹ (s_y − m̂_s)   (2)
    Ŝ_{θ|s_y,t} = Ŝ_θ − Ŝ_θs (Ŝ_s)⁻¹ Ŝ_sθ.   (3)
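An illustrative Python sketch of eqs. (1)-(3) (not the paper's code; the array layout, with θ-columns first and summary-columns last, is an assumption):

import numpy as np

def guided_sis_proposal(x, w, s_obs, d_theta):
    """Gaussian guided proposal of eqs. (1)-(3).

    x: (N, d_theta + d_s) array stacking the accepted (theta, s) pairs,
    w: normalised weights, s_obs: observed summaries s_y.
    Returns the mean and covariance of q_{t+1}(theta | s_y).
    """
    m_hat = w @ x                                      # weighted mean, eq. (1)
    xc = x - m_hat
    S_hat = (xc.T * w) @ xc / (1.0 - np.sum(w**2))     # weighted covariance, eq. (1)
    S_tt = S_hat[:d_theta, :d_theta]                   # S_theta
    S_ts = S_hat[:d_theta, d_theta:]                   # S_theta,s
    S_ss = S_hat[d_theta:, d_theta:]                   # S_s
    A = S_ts @ np.linalg.inv(S_ss)
    m_cond = m_hat[:d_theta] + A @ (s_obs - m_hat[d_theta:])  # eq. (2)
    S_cond = S_tt - A @ S_ts.T                                # eq. (3)
    return m_cond, S_cond

# proposals for iteration t+1 are then drawn as theta* ~ N(m_cond, S_cond)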
The guided sampler we have defined is useful for sequential importance sampling ABC (SIS-ABC).

We use it for SIS-ABC and not for SMC-ABC since our proposal has global features: neither the mean nor the covariance is particle-specific.

    q_t(θ|s_y) ≡ N(m̂_{θ|s_y,t−1}, Ŝ_{θ|s_y,t−1})

    m̂_{θ|s_y,t−1} = m̂_θ + Ŝ_θs (Ŝ_s)⁻¹ (s_y − m̂_s)   (4)
    Ŝ_{θ|s_y,t−1} = Ŝ_θ − Ŝ_θs (Ŝ_s)⁻¹ Ŝ_sθ.   (5)

We call this sampler blocked because it proposes all coordinates of θ in one block.
The previous sampler needs some adjustment (details in the paper!), as otherwise it is too “mode-seeking”, i.e. it neglects the posterior tails to focus on the region around the mode.

No time here, but the paper has details on how to fix this.
Guided sequential Monte Carlo ABC (SMC-ABC)

We have also built other proposals. What we do here is specific to SMC-ABC, since this time we condition on a sampled particle AND s_y.

Recall that in SMC-ABC we have a resampling step (snippet below):

for i = 1, ..., N do
  repeat
    pick (with replacement) θ∗ from the weighted set {θ_{t−1}^(i), w_{t−1}^(i)}_{i=1}^N
    θ∗∗ ∼ N(θ∗, 2Σ_{t−1})
    ...etc
  until ∥s∗ − s_y∥ < ϵ
end for

We are going to sample θ∗∗ conditionally on components of θ∗ and s_y.
Guided SMC-ABC

As usual, we iteratively resample a θ∗ from the previous iteration. How can we condition on both s_y and θ∗?

We cannot condition simultaneously on all coordinates of θ∗: imagine instead decomposing θ∗ = (θ∗_k, θ∗_{−k}) and now consider the stacked vector

    (θ∗_k, θ∗_{−k}, s_y).   (6)

We now place a multivariate Gaussian assumption on (6), so we can produce a “perturbation kernel”

    q_t(θ∗_k | θ∗_{−k}, s_y).

Example: θ∗ = (θ∗_1, θ∗_2, θ∗_3); then we propose θ∗∗ as

    θ∗∗_1 ∼ q(θ_1|θ∗_{2,3}, s_y),   θ∗∗_2 ∼ q(θ_2|θ∗_{1,3}, s_y),   θ∗∗_3 ∼ q(θ_3|θ∗_{1:2}, s_y),

and finally compose θ∗∗ = (θ∗∗_1, θ∗∗_2, θ∗∗_3).
Guided SMC-ABC

It is easy to show that we can write the following guided sampler (this is for the case where k is a single index, but it can be generalized):

    θ∗∗_k ∼ N(m̂∗_{k|s_y,t−1}, σ̂²_{k|s_y,t−1})

    m̂∗_{k|s_y,t−1} = m̂_k + Ŝ_{k,−k} (Ŝ_{−k,−k})⁻¹ ( [θ∗_{−k}; s_y] − [m̂_{−k}; m̂_s] ),
    σ̂²_{k|s_y,t−1} = σ̂²_k − Ŝ_{k,−k} (Ŝ_{−k,−k})⁻¹ Ŝ_{−k,k},

where all quantities are computed using accepted particles from the previous iteration.

We call this the fullcond guided SMC-ABC sampler (“fully conditional”).
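A sketch of this fully conditional update for a single coordinate k, in illustrative Python (not the paper's code; m_hat, S_hat and the index layout are assumptions consistent with eq. (1)):

import numpy as np

def fullcond_perturb(theta_star, k, m_hat, S_hat, s_obs, rng):
    """Sample theta**_k conditionally on theta*_{-k} and s_y (single index k).

    m_hat and S_hat are the weighted mean and covariance of the stacked
    vector (theta, s) from the previous generation, as in eq. (1); entries
    0..d_theta-1 refer to theta, the remaining ones to the summaries.
    """
    idx = [j for j in range(len(m_hat)) if j != k]        # "-k" block: other thetas, then summaries
    cond_val = np.concatenate([np.delete(theta_star, k), s_obs])
    S_k_mk = S_hat[k, idx]                                # S_{k,-k}
    S_mk_mk = S_hat[np.ix_(idx, idx)]                     # S_{-k,-k}
    coef = np.linalg.solve(S_mk_mk, S_k_mk)               # (S_{-k,-k})^{-1} S_{-k,k}
    m_cond = m_hat[k] + coef @ (cond_val - m_hat[idx])
    var_cond = S_hat[k, k] - coef @ S_k_mk
    return rng.normal(m_cond, np.sqrt(var_cond))

# the full proposal is composed coordinate-wise, e.g.
# theta_new = np.array([fullcond_perturb(theta_star, k, m_hat, S_hat, s_obs, rng)
#                       for k in range(len(theta_star))])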
A few important remarks

1. Besides Gaussian proposals, we also construct copula proposals (Gaussian copulas and t-copulas), with many possible choices of marginal distributions. This is discussed in the paper.
2. Experiments suggest that the Gaussianity assumption on the joint (θ, s_y) works very well even with highly non-Gaussian posterior targets.
   To allow for more flexibility, we considered (Gaussian and t) copulas with different marginals.

Note that guided Gaussian copulas also appear in Y. Chen, M. Gutmann. Adaptive Gaussian Copula ABC. AISTATS, 2019.

We go through a couple of examples now.
We compare against the following SMC-ABC samplers:

• standard: Gaussian sampler N(θ∗, 2·Σ_{t−1}) (as in Filippi et al. 2013², generalizing Beaumont et al. 2009);
• olcm: Gaussian sampler with the optimal local covariance matrix of Filippi et al. 2013.

² Filippi, Barnes, Cornebise, Stumpf. Statistical Applications in Genetics and Molecular Biology, 12(1):87-107, 2013.
Example: Twisted prior with highly correlated posterior

• Observed data y = (y_1, ..., y_{d_θ}) ∼ N(θ, Ψ), with θ = (θ_1, ..., θ_{d_θ}), Ψ = diag(σ_0, ..., σ_0).
• The prior is the twisted-normal prior, with density proportional to

    π(θ) ∝ exp( −θ_1²/200 − (θ_2 − b θ_1² + 100b)²/2 − Σ_{j=3}^{d_θ} θ_j² ).

• Here: σ_0 = 1, b = 0.1, d_θ = 5, y = (10, 0, 0, 0, 0). As before, S(y) = y.

How do our approaches cope with highly correlated parameters in the posterior?
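For concreteness, the unnormalised log prior density as a short Python function (a direct transcription of the formula above; vectorisation details are our own):

import numpy as np

def log_twisted_prior(theta, b=0.1):
    """Unnormalised log-density of the twisted-normal prior above."""
    theta = np.asarray(theta, dtype=float)
    return (-theta[0]**2 / 200.0
            - (theta[1] - b * theta[0]**2 + 100.0 * b)**2 / 2.0
            - np.sum(theta[2:]**2))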
Illustration from Nott et al. (2018), High-dimensional ABC.

In this case study we know a priori that (θ_1, θ_2) are highly correlated. We make use of this information in our fully conditional SMC-ABC.

So we sample a θ∗ = (θ∗_1, ..., θ∗_5) from the previous population and perturb it as:

• (θ∗∗_1, θ∗∗_2) ∼ q(θ_1, θ_2|s_y, θ∗_{3:5}) (in one block, since they are correlated);
• θ∗∗_3 ∼ q(θ_3|s_y, θ∗_{1,2,4,5});
• θ∗∗_4 ∼ q(θ_4|s_y, θ∗_{1,2,3,5});
• θ∗∗_5 ∼ q(θ_5|s_y, θ∗_{1:4}).
Example 2: Hierarchical g-and-k model

Setup from Clarté et al. (2021), “Componentwise approximate Bayesian computation via Gibbs-like steps”.

• x_{i1}, ..., x_{iJ} iid from the g-and-k model with parameters (A_i, B, g, k, c), where B, g, k, c are known constants.
• A_i ∼ N(α, 1), θ = (α, A_1, ..., A_n).

[Graphical model: α at the top; group parameters A_1, A_2, ..., A_n below; each A_i generating data x_{i1}, ..., x_{iJ}.]

• 21-dim. parameter θ = (α, A_1, ..., A_20).
• s(x_i) = (quant_l(x_i))_{l=0,...,8}, the nine octiles of each group ⇒ 180 summaries in total.
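The g-and-k distribution has no closed-form density, but sampling is easy via its quantile function applied to standard normal draws. A hedged sketch: the quantile formula below is the standard one from the g-and-k literature with the customary c = 0.8, and the constants B, g, k and sizes n, J are placeholders, not the values of Clarté et al.:

import numpy as np

def gk_sample(A, B, g, k, c=0.8, size=1, rng=None):
    """Draw from the g-and-k distribution by transforming standard normals z:
    Q(z) = A + B * (1 + c * tanh(g*z/2)) * (1 + z^2)^k * z."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(size)
    return A + B * (1.0 + c * np.tanh(g * z / 2.0)) * (1.0 + z**2) ** k * z

def simulate_hierarchical(alpha, n=20, J=1000, B=1.0, g=2.0, k=0.5, rng=None):
    """Hierarchical data: A_i ~ N(alpha, 1), then J g-and-k draws per group i."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(alpha, 1.0, size=n)
    return np.stack([gk_sample(A[i], B, g, k, size=J, rng=rng) for i in range(n)])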
We run guided and non-guided methods with N = 10⁴ particles and compare them with ABC-Gibbs (Clarté et al., 2021).

How do our approaches cope with high-dimensional parameter and summary spaces?

A few numbers for one run:

• (non-guided) standard SMC-ABC: 14.5M simulations to reach ϵ = 2.14, in more than 42 hours.
• (non-guided) olcm SMC-ABC: 16M simulations to reach ϵ = 0.70, in 55 hours.
• guided SIS-ABC: 1M simulations to go below ϵ = 0.60, in 2.7 hours!
Take-home message:

• We propose guided proposal samplers that make use of data information.
• These are easy to construct and incorporate into existing packages (any help for ABCpy or pyABC?).
• There is no substantial computational overhead from computing our guided proposals compared to not using them.
• We propose different Gaussian proposals, but also Gaussian copulas and t-copulas.
• For copula-based samplers, the mean/covariance/marginals are not learned via deep learning, unlike in other works.
• We challenged our methods with: highly correlated posteriors; multimodal posteriors; models with up to 20 parameters and up to 400 summary statistics.
NORDSTAT 2023

See you in Gothenburg, 19-22 June, for 3.5 days of mathstats!
Abstract submission deadline is March 30!
https://nordstat2023.org/ @nordstat2023

THANK YOU
@uPicchini