This document presents a method for improving the scalability of approximate Bayesian computation (ABC) for latent graphical models like the hidden Potts model used in image analysis. It does this by pre-computing an auxiliary model that approximates the relationship between model parameters and summary statistics, avoiding the need to simulate pseudo-data during ABC model fitting. Experimental results on both simulated and satellite image data show the method reduces ABC runtime from weeks to hours while maintaining accuracy of parameter estimates.
Bayesian modelling and computation for Raman spectroscopy
Matt Moores
Raman spectroscopy can be used to identify molecules by the characteristic scattering of light from a laser. Each Raman-active dye label has a unique spectral signature, comprising the locations and amplitudes of its peaks. The Raman spectrum is discretised into a multivariate observation that is highly collinear, hence it lends itself to a reduced-rank representation. We introduce a sequential Monte Carlo (SMC) algorithm to separate this signal into a series of peaks plus a smoothly-varying baseline, corrupted by additive white noise. By incorporating this representation into a Bayesian functional regression, we can quantify the relationship between dye concentration and peak intensity. We also estimate the model evidence using SMC to investigate long-range dependence between peaks. These methods have been implemented as an R package, using RcppEigen and OpenMP.
Approximate Bayesian computation for the Ising/Potts model
Matt Moores
Bayes’ formula involves the likelihood function, p(y|theta), which is a problem when the likelihood is unavailable in closed form. ABC is a method for approximating the posterior p(theta|y) without evaluating the likelihood. Instead, pseudo-data is simulated from a generative model and compared with the observations. This talk will give an introduction to ABC algorithms: rejection sampling, ABC-MCMC and ABC-SMC. Application of these algorithms to image analysis will be presented as an illustrative example. These methods have been implemented in the R package bayesImageS.
This is joint work with Christian Robert (Warwick/Dauphine), Kerrie Mengersen and Christopher Drovandi (QUT).
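As a companion to the description above, here is a minimal sketch of ABC rejection sampling in Python. The toy generative model (a Gaussian mean with a uniform prior), the tolerance, and all function names are our own illustration, not the Ising/Potts application or the bayesImageS implementation:

```python
import numpy as np

def abc_rejection(observed_stat, simulate, prior_sample, eps, n_samples=1000, rng=None):
    """Basic ABC rejection sampling: keep theta whenever the summary
    statistic of the pseudo-data is within eps of the observed statistic."""
    rng = rng or np.random.default_rng(0)
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)        # draw from the prior
        pseudo = simulate(theta, rng)    # simulate pseudo-data summary
        if abs(pseudo - observed_stat) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: infer the mean of a Gaussian with known variance.
rng = np.random.default_rng(42)
y = rng.normal(2.0, 1.0, size=100)
post = abc_rejection(
    observed_stat=y.mean(),
    simulate=lambda th, r: r.normal(th, 1.0, size=100).mean(),
    prior_sample=lambda r: r.uniform(-5, 5),
    eps=0.05,
    n_samples=500,
)
# post.mean() should land near the true mean of 2.0
```

ABC-MCMC and ABC-SMC replace these blind prior draws with proposals that concentrate on the accepted region, which matters when the acceptance rate under the prior is tiny.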
Pre-computation for ABC in image analysis
Matt Moores
MCMSki IV (the 5th IMS-ISBA joint meeting)
January 2014
Chamonix Mont-Blanc, France
The associated journal article has now been uploaded to arXiv: http://arxiv.org/abs/1403.4359
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-Divergences
Frank Nielsen
Slides for the paper:
On the Chi Square and Higher-Order Chi Distances for Approximating f-Divergences
published in IEEE SPL:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6654274
We are interested in finding a permutation of the entries of a given square matrix so that the maximum number of its nonzero entries are moved to one of the corners in an L-shaped fashion.
If we interpret the nonzero entries of the matrix as the edges of a graph, this problem boils down to the so-called core–periphery structure, consisting of two sets: the core, a set of nodes that is highly connected across the whole graph, and the periphery, a set of nodes that is well connected only to the nodes that are in the core.
Matrix reordering problems have applications in sparse factorizations and preconditioning, while revealing core–periphery structures in networks has applications in economic, social and communication networks.
Core–periphery detection in networks with nonlinear Perron eigenvectors
Francesco Tudisco
Core–periphery detection is a highly relevant task in exploratory network analysis. Given a network of nodes and edges, one is interested in revealing the presence and measuring the consistency of a core–periphery structure using only the network topology. This mesoscale network structure consists of two sets: the core, a set of nodes that is highly connected across the whole network, and the periphery, a set of nodes that is well connected only to the nodes that are in the core. Networks with such a core–periphery structure have been observed in several applications, including economic, social, communication and citation networks.
In this talk we discuss a new core–periphery detection model based on the optimization of a class of core–periphery quality functions. While these quality measures are highly nonconvex in general and thus hard to treat directly, we show that the global solution coincides with the nonlinear Perron eigenvector of a suitably defined parameter-dependent matrix M(x), i.e. the positive solution to the nonlinear eigenvector problem M(x)x = λx. Using recent advances in nonlinear Perron–Frobenius theory, we discuss uniqueness of the global solution and propose a nonlinear power-method-type scheme that (a) allows us to solve the optimization problem with global convergence guarantees and (b) scales effectively to very large and sparse networks. Finally, we present several numerical experiments showing that the new method largely outperforms state-of-the-art techniques for core–periphery detection.
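To make the algorithmic structure concrete, here is a sketch of a nonlinear power iteration in Python. The map x ∝ A xᵅ used below is an illustrative stand-in with the same fixed-point structure (and, for 0 < α < 1 and a nonnegative irreducible A, the same Perron–Frobenius-style convergence guarantee), not the paper's core–periphery kernel M(x):

```python
import numpy as np

def nonlinear_power_method(A, alpha=0.5, tol=1e-10, max_iter=1000):
    """Generic nonlinear power iteration for fixed points of the form
    x proportional to A @ (x ** alpha).  This is an illustrative map with
    the same structure as M(x) x = lam x, NOT the paper's quality function."""
    n = A.shape[0]
    x = np.ones(n) / np.sqrt(n)
    for _ in range(max_iter):
        y = A @ (x ** alpha)         # apply the nonlinear map
        y = y / np.linalg.norm(y)    # normalise back to the unit sphere
        if np.linalg.norm(y - x) < tol:
            break
        x = y
    return x

# Small graph: a well-connected core {0, 1, 2} with pendant periphery {3, 4}.
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
], dtype=float)
scores = nonlinear_power_method(A)
# Core nodes should receive strictly higher scores than periphery nodes.
```

For α in (0, 1) the iteration is a contraction on the positive cone, so the positive fixed point is unique and the method converges from any positive start, which mirrors the global convergence guarantee discussed in the talk.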
Small updates of matrix functions used for network centrality
Francesco Tudisco
Many relevant measures of importance for nodes and edges of a network are defined in terms of suitable entries of matrix functions $f(A)$, for different choices of $f$ and $A$. Computing the entries of $f(A)$ can be computationally challenging, and this becomes particularly prohibitive when $A$ undergoes a perturbation $A+\delta A$ and the entries of $f(A)$ have to be updated. Given the adjacency matrix $A$ of a graph $G=(V,E)$, in this talk we consider the case where $\delta A$ is a sparse matrix that yields a small perturbation of the edge structure of $G$.
In particular, we present a bound showing that the variation of the entry $f(A)_{u,v}$ decays exponentially with the distance in $G$ that separates either $u$ or $v$ from the set of nodes touched by the edges that are perturbed. Our bound depends only on the distances in the original graph $G$ and on the field of values of the perturbed matrix $A+\delta A$. We show several numerical examples in support of the proposed result.
Talk presented at the IMA Numerical Analysis and Optimization conference, Birmingham 2018
The talk is based on the paper:
S. Pozza and F. Tudisco, On the stability of network indices defined by means of matrix functions, SIMAX, 2018
Optimal interval clustering: Application to Bregman clustering and statistical mixture learning
Frank Nielsen
We present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means, k-medoids, k-medians, k-centers, etc. We extend the method to incorporate cluster size constraints and show how to choose the appropriate k by model selection. Finally, we illustrate and refine the method on two case studies: Bregman clustering and statistical mixture learning maximizing the complete likelihood.
http://arxiv.org/abs/1403.2485
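The dynamic program described in the abstract can be sketched as follows for the 1D Euclidean k-means case. This is a straightforward O(k n²) version without the paper's refinements; the function name and interval conventions are ours:

```python
import numpy as np

def interval_kmeans(xs, k):
    """Dynamic programme for the optimal clustering of n sorted scalars
    into k contiguous intervals, minimising the within-cluster sum of
    squares (the 1D Euclidean k-means case).  Returns (optimal SSE,
    list of half-open index intervals [start, end))."""
    xs = np.sort(np.asarray(xs, dtype=float))
    n = len(xs)
    # Prefix sums give O(1) SSE for any interval.
    s1 = np.concatenate([[0.0], np.cumsum(xs)])
    s2 = np.concatenate([[0.0], np.cumsum(xs ** 2)])

    def cost(i, j):  # SSE of xs[i..j], inclusive, 0-based
        m = j - i + 1
        total, total2 = s1[j + 1] - s1[i], s2[j + 1] - s2[i]
        return total2 - total * total / m

    INF = float("inf")
    D = np.full((k + 1, n + 1), INF)   # D[c][j]: best cost of first j points in c clusters
    D[0][0] = 0.0
    back = np.zeros((k + 1, n + 1), dtype=int)
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last cluster is xs[i .. j-1]
                val = D[c - 1][i] + cost(i, j - 1)
                if val < D[c][j]:
                    D[c][j], back[c][j] = val, i
    # Backtrack to recover the interval boundaries.
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = back[c][j]
        bounds.append((i, j))
        j = i
    return D[k][n], bounds[::-1]

sse, clusters = interval_kmeans([1.0, 1.1, 1.2, 5.0, 5.1, 9.0], k=3)
```

Swapping the squared-Euclidean cost for another interval cost (absolute deviation for k-medians, a Bregman divergence, or a complete log-likelihood term) gives the other cases mentioned in the abstract without changing the dynamic program.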
After applying the stochastic Galerkin method to solve a stochastic PDE, and solving the resulting large linear system, we obtain a stochastic solution (a random field) represented in a Karhunen–Loève (KLE) and polynomial chaos (PCE) basis. No sampling error is involved, only algebraic truncation error. We would now like to bypass the classical MCMC path for computing the posterior, so we develop a Bayesian update formula for the KLE-PCE coefficients.
Efficient Simulations for Contamination of Groundwater Aquifers under Uncerta...
Alexander Litvinenko
1. Solved a time-dependent density-driven flow problem with uncertain porosity and permeability in 2D and 3D.
2. Computed the propagation of uncertainty in porosity into the mass fraction.
3. Computed the mean, variance, exceedance probabilities, quantiles and risks.
4. Found that QoIs such as the number of fingers, their size, shape and propagation time can be unstable.
5. For moderate perturbations, our gPCE surrogate results are similar to qMC results.
6. Used a highly scalable solver on up to 800 computing nodes.
R package bayesImageS: Scalable Inference for Intractable Likelihoods
Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm and approximate Bayesian computation (ABC). A serious drawback of these algorithms is that they do not scale well for models with a large state space. Markov random fields, such as the Ising/Potts model and exponential random graph model (ERGM), are particularly challenging because the number of discrete variables increases linearly with the size of the image or graph. The likelihood of these models cannot be computed directly, due to the presence of an intractable normalising constant. In this context, it is necessary to employ algorithms that provide a suitable compromise between accuracy and computational cost.
Bayesian indirect likelihood (BIL) is a class of methods that approximate the likelihood function using a surrogate model. This model can be trained using a pre-computation step, utilising massively parallel hardware to simulate auxiliary variables. We review various types of surrogate model that can be used in BIL. In the case of the Potts model, we introduce a parametric approximation to the score function that incorporates its known properties, such as heteroskedasticity and critical temperature. We demonstrate this method on 2D satellite remote sensing and 3D computed tomography (CT) images. We achieve a hundredfold improvement in the elapsed runtime, compared to the exchange algorithm or ABC. Our algorithm has been implemented in the R package “bayesImageS,” which is available from CRAN.
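The pre-computation idea can be illustrated with a toy surrogate. Everything below (the stand-in generative model, the polynomial surrogate for the mean and standard deviation of the summary statistic, the grid search standing in for the sampler) is a hedged sketch of the general BIL recipe, not the Potts-model approximation from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Offline pre-computation (embarrassingly parallel in practice): simulate
# pseudo-data on a grid of parameter values and fit a parametric surrogate
# mapping theta -> (mean, sd) of the summary statistic.  The generative
# model here is a toy stand-in (mean of Gaussian draws), NOT the Potts model.
grid = np.linspace(0.0, 2.0, 41)
sims = np.array([[rng.normal(np.exp(th), 0.1 + 0.05 * th, size=200).mean()
                  for _ in range(50)] for th in grid])
mu_coef = np.polyfit(grid, sims.mean(axis=1), deg=4)
sd_coef = np.polyfit(grid, sims.std(axis=1), deg=2)

def surrogate_loglik(theta, s_obs):
    """Gaussian indirect likelihood of an observed summary statistic,
    evaluated from the pre-computed surrogate -- no fresh simulation."""
    mu = np.polyval(mu_coef, theta)
    sd = max(np.polyval(sd_coef, theta), 1e-6)
    return -0.5 * ((s_obs - mu) / sd) ** 2 - np.log(sd)

# Model-fitting stage: a grid search stands in for the SMC/MCMC sampler.
s_obs = np.exp(1.3)  # pretend the data had true theta = 1.3
thetas = np.linspace(0.0, 2.0, 201)
best = thetas[np.argmax([surrogate_loglik(t, s_obs) for t in thetas])]
```

The expensive simulation all happens before any data are seen, so the same surrogate can be reused across many images, which is where the runtime improvement reported in the abstract comes from.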
Propagation of Uncertainties in Density Driven Groundwater Flow
Alexander Litvinenko
Major goal: estimate the risk of pollution in a subsurface flow.
How? We solve density-driven groundwater flow with uncertain porosity and permeability.
1. We set up the density-driven groundwater flow problem
2. Review stochastic modelling and stochastic methods, using the UG4 framework (https://gcsc.uni-frankfurt.de/simulation-and-modelling/ug4)
3. Model uncertainty in porosity and permeability
4. Present 2D and 3D numerical experiments
bayesImageS: Bayesian computation for medical Image Segmentation using a hidd...
Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism.
In many countries, groundwater is a strategic reserve, used for drinking water and irrigation. Accurate modelling of the pollution of soil and groundwater aquifers is therefore highly important. As a model, we consider a density-driven groundwater flow problem with uncertain porosity and permeability. This problem arises in geothermal reservoir simulation, natural saline-disposal basins, modelling of contaminant plumes, and subsurface flow.
This strongly non-linear problem describes how salt or polluted water streams downward, forming "fingers". Solving it requires a very fine unstructured mesh and, therefore, substantial computational resources. Consequently, we run the parallel multigrid solver UG4 (https://github.com/UG4/ughub.wiki.git) on the Shaheen II supercomputer.
The parallelization is done in both the physical space and the stochastic space. The novelty of this work is the estimation of the risk that the pollution will reach a specific critical concentration. Additionally, we demonstrate how the multigrid UG4 solver can be run in a black-box fashion for testing different scenarios in the density-driven flow.
We solve Elder's problem in 2D and 3D domains, where the unknown porosity and permeability are modelled by random fields. For approximations in the stochastic space, we use the generalized polynomial chaos expansion. We compute quantities of interest such as the mean, variance and exceedance probabilities of the concentration. As a reference, we use the solution obtained from the quasi-Monte Carlo method.
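The quantities of interest mentioned above can be illustrated on synthetic samples. The lognormal stand-in below replaces the actual mass-fraction samples from the groundwater solver, and the critical concentration is an arbitrary illustrative value; only the post-processing step is representative:

```python
import numpy as np

# Toy illustration of the QoIs above: given samples of the concentration
# at one spatial point (drawn here from a stand-in lognormal distribution,
# NOT from the groundwater solver), compute the mean, variance, a quantile
# and the exceedance probability P(c > c_crit).
rng = np.random.default_rng(11)
c = rng.lognormal(mean=-1.0, sigma=0.5, size=100_000)

mean, var = c.mean(), c.var()          # first two moments
q95 = np.quantile(c, 0.95)             # 95% quantile of the concentration
c_crit = 0.8                           # illustrative critical concentration
p_exceed = (c > c_crit).mean()         # exceedance probability
```

In the gPCE setting these samples would come cheaply from evaluating the polynomial chaos surrogate, rather than from repeated runs of the expensive forward solver.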
Hyperon and charm baryons masses from twisted mass Lattice QCD
Christos Kallidonis
Talk given at the University of Bonn, Germany. We present results on the masses of all forty light, strange and charm baryons from Lattice QCD simulations. We elaborate on the various methods and techniques followed and examine systematic uncertainties related to isospin breaking effects and finite lattice spacing.
Major goal: estimate the risk of pollution in a subsurface flow.
How? We solve density-driven groundwater flow with uncertain porosity and permeability.
1. We set up the density-driven groundwater flow problem
2. Review stochastic modelling and stochastic methods
3. Model uncertainty in porosity and permeability
4. Present numerical methods to solve the deterministic problem
5. Show 2D and 3D examples with 0.5–8 million mesh points
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Matt Moores
So-called “inverse” problems arise when the parameters of a physical system cannot be directly observed. The mapping between these latent parameters and the space of noisy observations is represented as a mathematical model, often involving a system of differential equations. We seek to infer the parameter values that best fit our observed data. However, it is also vital to obtain accurate quantification of the uncertainty involved with these parameters, particularly when the output of the model will be used for forecasting. Bayesian inference provides well-calibrated uncertainty estimates, represented by the posterior distribution over the parameters. In this talk, I will give a brief introduction to Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior distribution and describe how they can be combined with numerical solvers for the forward model. We apply these methods to two examples of ODE models: growth curves in ecology, and thermogravimetric analysis (TGA) in chemistry. This is joint work with Matthew Berry, Mark Nelson, Brian Monaghan and Raymond Longbottom.
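As a toy version of the workflow described above, the sketch below couples a fixed-step RK4 solver for a logistic growth ODE (a stand-in forward model, not the TGA application) with a random-walk Metropolis sampler; all names and tuning constants are ours:

```python
import numpy as np

def solve_logistic(r, K, y0, ts):
    """Forward model: logistic growth dy/dt = r y (1 - y/K),
    integrated with a fixed-step RK4 scheme."""
    ys, y, t = [y0], y0, ts[0]
    for t_next in ts[1:]:
        h = t_next - t
        f = lambda y: r * y * (1 - y / K)
        k1 = f(y); k2 = f(y + h*k1/2); k3 = f(y + h*k2/2); k4 = f(y + h*k3)
        y = y + h * (k1 + 2*k2 + 2*k3 + k4) / 6
        ys.append(y); t = t_next
    return np.array(ys)

# Synthetic data generated with a known growth rate r = 0.8.
rng = np.random.default_rng(3)
ts = np.linspace(0, 10, 21)
data = solve_logistic(0.8, K=10.0, y0=0.5, ts=ts) + rng.normal(0, 0.2, size=ts.size)

def log_post(r, sigma=0.2):
    """Gaussian log-likelihood plus a flat prior on (0, 5)."""
    if not 0 < r < 5:
        return -np.inf
    resid = data - solve_logistic(r, 10.0, 0.5, ts)
    return -0.5 * np.sum((resid / sigma) ** 2)

# Random-walk Metropolis over the growth rate: each proposal triggers
# one run of the numerical forward solver.
r, lp, chain = 1.0, log_post(1.0), []
for _ in range(3000):
    prop = r + rng.normal(0, 0.05)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        r, lp = prop, lp_prop
    chain.append(r)
post_mean = np.mean(chain[1000:])  # should recover a value near r = 0.8
```

The spread of the retained chain, not just its mean, is the point: it quantifies the parameter uncertainty that must be propagated into any forecast made with the model.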
bayesImageS: an R package for Bayesian image analysis
Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
Accelerating Pseudo-Marginal MCMC using Gaussian Processes
Matt Moores
The grouped independence Metropolis-Hastings (GIMH) and Monte Carlo within Metropolis (MCWM) algorithms are pseudo-marginal methods used to perform Bayesian inference in latent variable models. These methods replace intractable likelihood calculations with unbiased estimates within Markov chain Monte Carlo algorithms. The GIMH method has the posterior of interest as its limiting distribution, but suffers from poor mixing if it is too computationally intensive to obtain high-precision likelihood estimates. The MCWM algorithm has better mixing properties, but less theoretical support. In this paper we accelerate the GIMH method by using a Gaussian process (GP) approximation to the log-likelihood and train this GP using a short pilot run of the MCWM algorithm. Our new method, GP-GIMH, is illustrated on simulated data from a stochastic volatility model and a gene network model. Our approach produces reasonable estimates of the univariate and bivariate posterior distributions, and the posterior correlation matrix, in these examples, with at least an order of magnitude improvement in computing time.
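A stripped-down version of the idea, with fixed GP hyperparameters and a synthetic log-likelihood in place of the pseudo-marginal estimates, might look like this (a sketch only, not the paper's implementation):

```python
import numpy as np

def gp_mean(X, y, xs, ell=0.5, sf=10.0, noise=1e-2):
    """Posterior mean of a 1D GP regression with an RBF kernel -- a minimal
    stand-in for the GP surrogate of the log-likelihood (hyperparameters
    are fixed here, not optimised)."""
    k = lambda a, b: sf**2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    alpha = np.linalg.solve(k(X, X) + noise * np.eye(len(X)), y)
    return k(xs, X) @ alpha

# "Pilot run": noisy log-likelihood estimates at a spread of theta values.
rng = np.random.default_rng(7)
true_loglik = lambda th: -0.5 * ((th - 1.0) / 0.3) ** 2   # synthetic target
pilot_theta = np.linspace(-1.0, 3.0, 15)
pilot_ll = true_loglik(pilot_theta) + rng.normal(0, 0.1, size=15)

# Main MCMC: Metropolis using the cheap GP mean in place of fresh
# (expensive, noisy) likelihood estimates.
th, chain = 0.0, []
ll = gp_mean(pilot_theta, pilot_ll, np.array([th]))[0]
for _ in range(5000):
    prop = th + rng.normal(0, 0.3)
    if not -1.0 <= prop <= 3.0:      # stay inside the pilot-run range,
        chain.append(th)             # where the GP is trustworthy
        continue
    ll_prop = gp_mean(pilot_theta, pilot_ll, np.array([prop]))[0]
    if np.log(rng.uniform()) < ll_prop - ll:
        th, ll = prop, ll_prop
    chain.append(th)
post_mean = np.mean(chain[500:])
```

Restricting proposals to the region covered by the pilot run matters: outside it, the GP mean reverts to its prior and can badly mislead the sampler.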
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
Matt Moores
There are many approaches to Bayesian computation with intractable likelihoods, including the exchange algorithm, approximate Bayesian computation (ABC), thermodynamic integration, and composite likelihood. These approaches vary in accuracy as well as scalability for datasets of significant size. The Potts model is an example where such methods are required, due to its intractable normalising constant. This model is a type of Markov random field, which is commonly used for image segmentation. The dimension of its parameter space increases linearly with the number of pixels in the image, making this a challenging application for scalable Bayesian computation. My talk will introduce various algorithms in the context of the Potts model and describe their implementation in C++, using OpenMP for parallelism. I will also discuss the process of releasing this software as an open source R package on the CRAN repository.
Informative Priors for Segmentation of Medical Images
Matt Moores
There is an abundance of prior information available for image-guided radiotherapy, making it ideally suited for Bayesian techniques. I will demonstrate some results from applying the method of Teo, Sapiro & Wandell (1997) to cone-beam computed tomography (CT). A previous CT scan of the same object forms the prior expectation. The posterior probabilities of class membership are smoothed by diffusion, before labeling each pixel according to the maximum a posteriori (MAP) estimate. The effect of the prior and of the smoothing is discussed and some potential extensions to this method are proposed.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, or reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (vertices with the same in-links) helps avoid duplicate computations and thus can also reduce iteration time. Road networks often contain chains which can be short-circuited before the PageRank computation, since the final ranks of chain nodes can be calculated directly; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
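The first of these optimisations, skipping vertices whose inputs have converged, can be sketched as follows; the example graph, tolerance, and dangling-node handling are our own illustrative choices, not the STICD implementation:

```python
import numpy as np

def pagerank_skip(adj, d=0.85, tol=1e-12):
    """Power-iteration PageRank that skips converged vertices: a vertex is
    only re-computed when at least one of its in-neighbours changed in the
    previous iteration.  adj maps each vertex to its out-neighbour list;
    dangling vertices spread their rank uniformly."""
    n = len(adj)
    in_edges = {v: [] for v in adj}
    for u, outs in adj.items():
        for v in outs:
            in_edges[v].append(u)
    r = np.full(n, 1.0 / n)
    active = set(adj)                     # vertices needing recomputation
    while active:
        dangling = sum(r[u] for u in adj if not adj[u])
        r_new = r.copy()
        for v in active:
            s = sum(r[u] / len(adj[u]) for u in in_edges[v])
            r_new[v] = (1 - d) / n + d * (s + dangling / n)
        changed = {v for v in active if abs(r_new[v] - r[v]) > tol}
        # Only out-neighbours of changed vertices need updating next round.
        active = {w for v in changed for w in adj[v]}
        if dangling > 0 and changed:      # dangling mass touches everyone
            active = set(adj)
        r = r_new
    return r / r.sum()

ranks = pagerank_skip({0: [1, 2], 1: [2], 2: [0], 3: [2]})
```

Note that vertex 3, which has no in-edges, is computed once and then never revisited: exactly the per-vertex work saving the paragraph describes.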
Opendatabay - Open Data Marketplace.pptx
Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first-ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Pre-processing for SMC-ABC with undirected graphical models

Matt Moores (1,2), Chris Drovandi (1), Kerrie Mengersen (1,2), Antonietta Mira (3), Christian Robert (4,5)

1 Mathematical Sciences School, Queensland University of Technology, Brisbane, Australia
2 Institute for Health and Biomedical Innovation, QUT Kelvin Grove
3 Institute of Finance, Università della Svizzera italiana, Switzerland
4 CEREMADE, Université Paris Dauphine, France
5 Department of Statistics, University of Warwick, UK

ABC in Sydney, July 4, 2014

Outline: Background · Pre-computation · Experimental Results · Conclusion
Background
Image analysis often involves:
Large datasets, with millions of pixels
Multiple images with similar characteristics
For example: satellite remote sensing (Landsat), computed
tomography (CT)
Table: Scale of common types of images

Number of pixels | Landsat (900 m²/px) | CT slices (512×512)
2⁶               | 0.06 km²            |
. . .            |                     |
5⁶               | 14.06 km²           | 0.1
10⁶              | 900.00 km²          | 3.8
15⁶              | 10251.56 km²        | 43.5
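The Landsat column follows from multiplying the pixel count by the per-pixel area (assuming 30 m resolution, i.e. 900 m² per pixel); a quick arithmetic check:

```python
# Area covered per image, assuming 900 m^2 per Landsat pixel (30 m resolution)
for n_px in (2**6, 5**6, 10**6, 15**6):
    area_km2 = n_px * 900 / 1e6   # convert m^2 to km^2
    print(f"{n_px:>10,} px -> {area_km2:10.2f} km^2")
```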
Motivation

Computational cost is dominated by simulation of pseudo-data,
e.g. the hidden Potts model in image analysis
(Grelaud et al. 2009; Everitt 2012)

Model fitting with ABC can be separated into:
learning about the summary statistic, given the parameter: f(s(w) | θ) π(θ)
choosing parameter values, given a summary statistic: π(θ | δ(s(y), s(w)))

For latent models, there is an additional step of learning about the
summary statistic, given the data: s(z) | y, θ

Grelaud, Robert, Marin, Rodolphe & Taly (2009) Bayesian Analysis 4(2)
Everitt (2012) JCGS 21(4)
hidden Markov random field

Joint distribution of observed pixel intensities y_i ∈ y and latent labels z_i ∈ z:

Pr(y, z | µ, σ², β) ∝ L(y | µ, σ², z) π(z | β)    (1)

Additive Gaussian noise:

y_i | z_i = j ∼ N(µ_j, σ_j²), i.i.d.    (2)

Potts model:

π(z_i | z_{i∼ℓ}, β) = exp{β Σ_{i∼ℓ} δ(z_i, z_ℓ)} / Σ_{j=1}^k exp{β Σ_{i∼ℓ} δ(j, z_ℓ)}    (3)

Potts (1952) Proceedings of the Cambridge Philosophical Society 48(1)
Doubly-intractable likelihood

p(β | z) = C(β)⁻¹ π(β) exp{β S(z)}    (4)

The normalising constant of the Potts model has computational complexity O(n² kⁿ), since it involves a sum over all possible combinations of the labels z ∈ Z:

C(β) = Σ_{z ∈ Z} exp{β S(z)}    (5)

S(z) is the sufficient statistic of the Potts model:

S(z) = Σ_{i∼ℓ ∈ L} δ(z_i, z_ℓ)    (6)

where L is the set of all unique neighbour pairs.
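The sufficient statistic (6) and the brute-force normalising constant (5) can be sketched in a few lines of Python (illustrative function names, not from the talk); the sum over all kⁿ labellings is feasible only for tiny lattices, which is exactly the problem:

```python
import itertools
import math

def neighbour_pairs(rows, cols):
    """All unique horizontal/vertical neighbour pairs on a rows x cols lattice."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs.append((i, i + 1))     # right neighbour
            if r + 1 < rows:
                pairs.append((i, i + cols))  # neighbour below
    return pairs

def suff_stat(z, pairs):
    """S(z): count of neighbour pairs with matching labels, eq. (6)."""
    return sum(z[i] == z[j] for i, j in pairs)

def norm_const(beta, n, k, pairs):
    """C(beta): brute-force sum over all k**n labellings, eq. (5)."""
    return sum(math.exp(beta * suff_stat(z, pairs))
               for z in itertools.product(range(k), repeat=n))
```

On a 2×2 lattice with k = 2 there are only 2⁴ = 16 labellings, so C(0) = 16; already at n = 5⁶ the sum is hopeless.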
Expectation of S(z)

Figure: Exact expectation E_{z|β}[S(z)] as a function of β.
(a) n = 12, k ∈ {2, 3, 4}; (b) k = 3, n ∈ {4, 6, 9, 12}. [Plots omitted.]
Standard deviation of S(z)

Figure: Exact standard deviation σ_{z|β}[S(z)] as a function of β.
(a) n = 12, k ∈ {2, 3, 4}; (b) k = 3, n ∈ {4, 6, 9, 12}. [Plots omitted.]
Pre-computation
The distribution of the summary statistics f(s(w)|θ) is
independent of the observed data y
By simulating pseudo-data for values of θ, we can create a
binding function φ(θ) for an auxiliary model fA(s(w)|φ(θ))
This binding function can be reused across multiple datasets,
amortising its computational cost
By replacing s(w) with pseudo summary statistics drawn from our
auxiliary model, we avoid the need to simulate pseudo-data during
model fitting.
Wood (2010) Nature 466
Cabras, Castellanos & Ruli (2014) Metron (to appear)
Simulation from f(·|β)

Figure: Approximation of S(w) | β using 1000 iterations of Swendsen-Wang (discarding 500 as burn-in).
(a) E_{z|β}(S(w)); (b) σ_{z|β}(S(w)). [Plots omitted.]

Swendsen & Wang (1987) Physical Review Letters 58
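A minimal sketch of this pre-computation step, with a single-site Gibbs sampler standing in for Swendsen-Wang (the talk uses Swendsen-Wang, which mixes far better near the critical temperature); lattice size, β grid and iteration counts here are toy choices:

```python
import math
import random
import statistics

def gibbs_potts(rows, cols, k, beta, iters, burnin, rng):
    """Single-site Gibbs sampler for the k-state Potts model.
    Returns post-burn-in draws of the summary statistic S(z)."""
    z = [[rng.randrange(k) for _ in range(cols)] for _ in range(rows)]

    def neighbour_labels(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield z[rr][cc]

    def suff_stat():
        s = 0
        for r in range(rows):
            for c in range(cols):
                if c + 1 < cols and z[r][c] == z[r][c + 1]: s += 1
                if r + 1 < rows and z[r][c] == z[r + 1][c]: s += 1
        return s

    draws = []
    for sweep in range(iters):
        for r in range(rows):
            for c in range(cols):
                # Full conditional pi(z_i | neighbours, beta), eq. (3)
                w = [math.exp(beta * sum(nb == j for nb in neighbour_labels(r, c)))
                     for j in range(k)]
                z[r][c] = rng.choices(range(k), weights=w)[0]
        if sweep >= burnin:
            draws.append(suff_stat())
    return draws

# Tabulate the mean and std. dev. of S(w) on a coarse grid of beta values
rng = random.Random(1)
table = []
for b in (0.0, 0.5, 1.0):
    d = gibbs_potts(8, 8, 3, b, 200, 100, rng)
    table.append((b, statistics.mean(d), statistics.stdev(d)))
```

The resulting `table` of (β, mean, sd) triples is the raw material for the binding functions on the next slide.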
Piecewise linear model

Figure: Binding functions φ̂_µ(β) and φ̂_σ(β) for S(w) | β with n = 5⁶, k = 3. [Plots omitted.]
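The binding functions themselves are just piecewise-linear interpolators over the pre-computed grid. A sketch with made-up knot values, chosen only to resemble the range shown in the figure (the real table has 987 knots):

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation at x; xs must be sorted and bracket x."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("beta outside the precomputed grid")

# Hypothetical coarse grid of (beta, mean, sd) knots, for illustration only
betas = [0.0, 1.0, 2.0, 3.0]
means = [10400.0, 13500.0, 27800.0, 29900.0]
sds   = [330.0, 340.0, 120.0, 40.0]

phi_mu = lambda b: interp(b, betas, means)   # binding function for the mean
phi_sd = lambda b: interp(b, betas, sds)     # binding function for the sd
```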
Approximations (A²BC?)

This method introduces additional layers of approximation error:
finite sampling points for β (even more of a problem in higher dimensions)
a piecewise linear model for the binding functions φ̂_µ(β) and φ̂_σ(β)
a Gaussian kernel: Ŝ(w) | β ∼ N(φ̂_µ(β), φ̂_σ(β)²)

But it enables us to scale up ABC to real-world problems for which it was previously infeasible.
Scalable SMC-ABC for the hidden Potts model

Algorithm 1 SMC-ABC using precomputed fA(s(w)|φ(θ))
1: Draw N particles β⁰_i ∼ π₀(β)
2: Draw N × M statistics Ŝ(w_{i,m}) ∼ N(φ̂_µ(β⁰_i), φ̂_σ(β⁰_i)²)
3: repeat
4:   Update S(z_t) | y, π_t(β)
5:   Adaptively select the ABC tolerance ε_t
6:   Update the importance weight ω_i for each particle
7:   if effective sample size (ESS) < N_min then
8:     Resample particles according to their weights
9:   end if
10:  Update particles using a random-walk proposal
     (with adaptive RWMH bandwidth σ²_t)
11: until n_accept/N < 0.015 or ε_t < 10⁻⁹ or t ≥ 100
Additive Gaussian noise

Figure: Change in the value of S(z) over SMC iterations, according to the current distribution of π_t(β|z) and L(y|µ, σ², z). [Plot omitted.]
Results
Precomputation of fA(s(w)|φ(θ)) took 13h 23m
for 987 values of β
Model fitting took 1h 42m for 40 SMC iterations
In contrast:
Generating M = 50 pseudo-datasets per particle
takes 89 hours for a single iteration
(∴ 20 weeks for 40 iterations)
Summary
Scalability of SMC-ABC can be improved by pre-computing an
auxiliary model fA(s(w)|φ(θ))
Pre-computation took 1.4 hours on a 16 core Xeon server
for 987 values of β with 15,625 pixels
(13.4 hours for 978,380 pixels)
Average runtime for SMC-ABC improved from 71.4 hours to
39 minutes with 15,625 pixels
(1.7 hours for 978,380 pixels)
The binding functions represent the nonlinear, heteroskedastic
relationship between the parameter and the summary statistic.
This method could be extended to multivariate applications, such
as estimating both β and k for the hidden Potts model, or
estimating θ for an ERGM.
Appendix

For Further Reading I

Matt Moores, Chris Drovandi, Kerrie Mengersen & Christian Robert.
Pre-processing for approximate Bayesian computation in image analysis.
arXiv:1403.4359 [stat.CO], 2014.

Simon Wood.
Statistical inference for noisy nonlinear ecological dynamic systems.
Nature, 466: 1102-04, 2010.

Stefano Cabras, María Eugenia Castellanos & Erlis Ruli.
A quasi-likelihood approximation of posterior distributions for likelihood-intractable complex models.
To appear in Metron, 2014.

Christopher Drovandi, Anthony Pettitt & Malcolm Faddy.
Approximate Bayesian computation using indirect inference.
J. R. Stat. Soc. Ser. C, 60(3): 317-37, 2011.

Richard Everitt.
Bayesian parameter estimation for latent Markov random fields and social networks.
J. Comput. Graph. Stat., 21(4): 940-60, 2012.

For Further Reading II

Christopher Drovandi, Anthony Pettitt & Anthony Lee.
Bayesian indirect inference using a parametric auxiliary model.
http://eprints.qut.edu.au/63767/3/63767.pdf

Pierre Del Moral, Arnaud Doucet & Ajay Jasra.
An adaptive sequential Monte Carlo method for approximate Bayesian computation.
Statistics and Computing, 22(5): 1009-20, 2012.

A. Grelaud, C. P. Robert, J.-M. Marin, F. Rodolphe & J.-F. Taly.
ABC likelihood-free methods for model choice in Gibbs random fields.
Bayesian Analysis, 4(2): 317-36, 2009.

Renfrey B. Potts.
Some generalized order-disorder transformations.
Proc. Cambridge Philosophical Society, 48(1): 106-9, 1952.

R. H. Swendsen & J.-S. Wang.
Nonuniversal critical dynamics in Monte Carlo simulations.
Physical Review Letters, 58: 86-8, 1987.