Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxiliary Model

Bayesian Indirect Likelihood pBII Methods Examples References
Bayesian Indirect Inference using a Parametric
Auxiliary Model
Dr Chris Drovandi
Queensland University of Technology, Australia
c.drovandi@qut.edu.au
Collaborators: Tony Pettitt and Anthony Lee
July 25, 2014
Chris Drovandi QUT Seminar 2014

Bayesian Indirect Likelihood
Bayesian inference is based on posterior distribution
p(θ|Ý ) ∝ p(Ý |θ)p(θ).
Thus Bayesian inference requires the likelihood function
p(Ý |θ)
Many models have an intractable likelihood function
Replace with some alternative tractable ‘likelihood’
˜p(θ|Ý ) ∝ ˜p(Ý |θ)p(θ).
Referred to as Bayesian Indirect Likelihood (BIL)

BIL Methods
Composite Likelihood (e.g. Pauli et al 2011)
Emulation (e.g. Gaussian process, Wilkinson 2014)
parametric BIL (uses the likelihood of an alternative
parametric model, e.g. Gallant and McCullagh 2009)
non-parametric BIL (traditional ABC method recovered, e.g.
Beaumont et al 2002)

pBII methods
pBII - Bayesian Indirect Inference methods that uses a
‘p’arametric auxiliary model in some way
Requires parametric model with parameter φ with tractable
likelihood, pA(y|φ)
Does not have to be an obvious connection between θ and φ
Assume throughout that dim(φ) ≥ dim(θ)
Two classes of pBII methods:
Use auxiliary model to form summary statistics for ABC (ABC
II)
Use auxiliary model to form replacement likelihood, pBIL

ABC II (Intro to ABC)
ABC assumes that simulation from model is straightforward,
Ü ∼ p(Ý |θ)
Compares observed and simulated data on the basis of
summary statistics, s(·)
Based on the following target distribution
pǫ(θ|Ý ) ∝ p(θ)pǫ(Ý |θ),
where ǫ is ABC tolerance and
pǫ(Ý |θ) =
Ü
p(Ü|θ)Kǫ(||s(Ý ) − s(Ü )||)dx,
where K is a kernel weighting function. This can be estimated
unbiasedly and this is enough (Andrieu and Roberts 2009)
If s(·) is suﬃcient and ǫ → 0 then pǫ(θ|Ý ) ≡ p(θ|Ý ) (does not
happen in practice)
Thus two sources of error

ABC II (Intro to ABC Cont...)
ABC has connections with kernel density estimation (Blum,
2009)
Choice of summary statistics involve trade-oﬀ between
dimensionality and information loss
Non-parametric aspect wants summary as low-dimensional as
possible
But decreasing dimension means information loss
Choice of summary statistics most crucial to ABC
approximation

ABC II Methods
In some applications can propose an alternative parametric
auxiliary model
Use auxiliary model to form summary statistic (ABC II)
Auxiliary parameter estimate as summary statistic (ABC IP
and ABC IL)
Score of auxiliary model as summary statistic (ABC IS)

ABC IP (Drovandi et al 2011)
Parameter estimates of auxiliary model are summary statistics
Simulate data from generative model Ü ∼ p(·|θ)
Estimate auxiliary parameter
φ(θ, Ü) = arg max
φ
pA(Ü|φ).
Compare with φ(Ý )
ρ(s(Ü ), s(Ý )) = (φ(Ü) − φ(Ý ))T Á (φ(Ý ))(φ(Ü ) − φ(Ý )).
Eﬃcient weighting of summary statistics using observed Fisher
information Á (φ(Ý ))

ABC IL (Gleim and Pigorsch 2013)
Same as ABC IP but uses likelihood in the discrepancy
function
Parameter estimates of auxiliary model are summary statistics
φ(θ, Ü) = arg max
φ
pA(Ü|φ).
Compare with φ(Ý )
ρ(s(Ü ), s(Ý )) = log pA(Ý |φ(Ý )) − log pA(Ý |φ(Ü)).

ABC IS (Gleim and Pigorsch 2013)
Uses scores of auxiliary model as summary statistics
Ë (Ý , φ) =
∂ log pA(Ý |φ)
∂φ1
, · · · ,
∂ log pA(Ý |φ)
∂φdim(φ)
T
,
Evaluate scores of auxiliary model based on simulated data
and φ(Ý ), Ë (Ü, φ(Ý ))
Note that Ë (Ý , φ(Ý )) = 0. Uses following discrepancy
function
ρ(s(Ü ), s(Ý )) = Ë (Ü, φ(Ý ))T Á (φ(Ý ))−1Ë (Ü , φ(Ý )).
Very fast when scores are analytic (no ﬁtting of auxiliary
model)

ABC II Assumptions
Assumption (ABC IP Assumptions)
The estimator of the auxiliary parameter, φ(θ, Ü ), is unique for all
θ with positive prior support.
Assumption (ABC IL Assumptions)
The auxiliary likelihood evaluated at the auxiliary estimate,
pA(Ý |φ(Ü , θ)), is unique for all θ with positive prior support.
Assumption (ABC IS Assumptions)
The MLE of the auxiliary model fitted to the observed data, φ(Ý ),
is an interior point of the parameter space of φ and Â(φ(Ý )) is
positive definite. The log-likelihood of the auxiliary model,
log pA(·|φ), is differentiable and the score, ËA(Ü, φ(Ý )), is unique
for any Ü that may be drawn according to any θ that has positive
prior support.

ABC II vs Traditional ABC for Summary Statistics
Advantages
Can check if auxiliary model ﬁts data well
Natural ABC discrepancy functions
Control dimensionality of summary statistics via parsimonious
auxiliary model (e.g. compare auxiliary models via AIC)
Disadvantages
ABC II can be expensive
Restricted to applications where auxiliary model can be
proposed

pdBIL (Gallant and McCullogh 2009, Reeves and Pettitt
2005)
Replaces true likelihood with auxiliary likelihood (full data
level)
Simulate n independent datasets from generative model
Ü1:n
iid
∼ p(·|θ)
φ(θ, Ü1:n) = arg max
φ
pA(Ü1:n|φ).
Provides estimate of mapping φ(θ)
Evaluate auxiliary likelihood pA(Ý |φ(θ, Ü1:n))
Not ABC. No summary statistics or ABC tolerance
Theoretically behaves very diﬀerent to ABC II

pdBIL (Cont...)
Ultimate target of pdBIL method
pA(θ|Ý ) ∝ pA(Ý |φ(θ))p(θ).
Unfortunately binding function unknown, estimate via n iid
simulations from generative model, φθ,n = φ(θ, Ü 1:n)
Target distribution of the method becomes
pA,n(θ|Ý ) ∝ pA,n(Ý |θ)p(θ),
where
pA,n(Ý |θ) =
Ü1:n
pA(Ý |φθ,n)p(Ü1:n|θ)dÜ 1:n
pA,n(Ý |θ) can be estimated unbiasedly, which is enough
(Andrieu and Roberts (2009))

pdBIL (Cont...)
Comparing pA(θ|Ý ) with p(θ|Ý )
Under suitable conditions pdBIL is exact as n → ∞ if
generative model special case of auxiliary model
Suggests in this context auxiliary model needs to be ﬂexible to
mimic behaviour of generative model
Empirical evidence suggests for good approximation auxiliary
likelihood must do well in non-negligible posterior regions

Synthetic Likeihood as a psBIL Method
Reduces data to set of summary statistics. Target distribution:
p(θ|s(Ý )) ∝ p(s(Ý )|θ)p(θ).
But p(s(Ý )|θ) is unknown
Synthetic likelihood approach Wood (2010) assumes a
parametric model, pA(s(Ý )|φ(θ)). Thus a pBIL method.
Wood takes pA to be multivariate normal N(µ(θ), Σ(θ))
For some θ, simulate n iid replicates,
Ü1:n = (s(Ü 1), . . . , s(Ü n)), iid sample from p(s(Ý )|θ). Fit
auxiliary model to this dataset:
µ(Ü 1:n, θ) =
1
n
n
i=1
s(Üi ),
Σ(Ü 1:n, θ) =
1
n
n
i=1
(s(Üi ) − µ(Ü1:n, θ))(s(Ü i ) − µ(Ü1:n, θ))T
,

ABC as BIL with Non-Parametric Auxiliary Model (npBIL)
ABC is recovered as a BIL method by selecting a
non-parametric auxiliary likelihood
Full data (npdBIL): Choose φ(θ, Ü 1:n) = Ü1:n
pA(Ý |φ(θ, Ü 1:n)) = pA(Ý |Ü 1:n) =
1
n
n
i=1
Kǫ(ρ(Ý , Ü i )),
Data reduced to summary statistic (npsBIL):
1
n
n
i=1
Kǫ(ρ(×(Ý ), ×(Ü i ))).
ABC II is a special npsBIL method where summary statistics
come from a parametric auxiliary model

pBII Methods
Figure: pBII Methods

BIL
pBIL npBIL=ABC
npdBIL npsBIL
ABC II
ABC IP ABC IL ABC IS
pdBIL psBIL
Key:
BIL - ’B’ayesian ’I’ndirect ’L’ikelihood
pBIL - ’p’arametric BIL
pdBIL - pBIL based on full ’d’ata
psBIL - pBIL based on ’s’ummary statistic
ABC - ’a’pproximate ’B’ayesian ’c’omputation
npBIL - ’n’on-’p’arametric BIL (equivalent to ABC)
npdBIL - npBIL based on full ’d’ata
npsBIL - npBIL based on ’s’ummary statistic
ABC II - ABC ’I’ndirect ’I’nference
ABC IP - ABC ’I’ndirect ’P’arameter
ABC IL - ABC ’I’ndirect ’L’ikelihood
ABC IS - ABC ’I’ndirect ’S’core
Figure: BIL framework

Comparison of pdBIL and ABC II
Theoretical Comarison
pdBIL exact when true model contained within auxiliary model
(as n → ∞). Suggests to choose flexible auxiliary model
Even when true model is special case, ABC II statistics not
sufficient in general
When ABC II have sufficient statistics, pdBIL not exact in
general, even when n → ∞
Some Toy Examples Later...
Choice of Auxiliary Model
ABC II requires good summarisation of data (independent of
specification of generative model)
Auxiliary model for pdBIL does not necessarily have to fit data
well (if generative model mis-specified). Requires flexibility
wrt chosen generative model
If generative model (approx) correct and fits data well then
same auxiliary model may be chosen

Sampling from ABC II Target - MCMC ABC
Algorithm 1 MCMC ABC algorithm of Marjoram et al 2003.
1: Set θ0
2: for i = 1 to T do
3: Draw θ∗ ∼ q(·|θi−1)
4: Simulate x∗ ∼ p(·|θ∗)
5: Compute r = p(θ∗)q(θi−1|θ∗)
p(θi−1)q(θ∗|θi−1)
1(ρ(s(x∗), s(y)) ≤ ǫ)
6: if uniform(0, 1) < r then
7: θi = θ∗
8: else
9: θi = θi−1
10: end if
11: end for

Sampling from pdBIL Target - MCMC pdBIL
Find increase in acceptance probability with increase in n
Algorithm 1 MCMC BIL algorithm (see also Gallant and McCullogh 2009).
1: Set θ0
2: Simulate x∗
n ∼ p(·|θ0)
3: Compute φ0 = arg maxφ pA(x∗
n|φ)
4: for i = 1 to T do
5: Draw θ∗ ∼ q(·|θi−1)
6: Simulate x∗
n ∼ p(·|θ∗)
7: Compute ˆφ(x∗
n) = arg maxφ pA(x∗
n|φ)
8: Compute r = pA(y| ˆφ(x∗
n))π(θ∗)q(θi−1|θ∗)
pA(y|φi−1)π(θi−1)q(θ∗|θi−1)
9: if uniform(0, 1) < r then
10: θi = θ∗
11: φi = ˆφ(x∗
n)
12: else
13: θi = θi−1
14: φi = φi−1
15: end if
16: end for

Toy Example 1 (Drovandi and Pettitt 2013)
True model Poisson(λ) and auxiliary model Normal(µ, τ)
ABC II ‘gets lucky’ as ˆµ = ¯y, suﬃcient for λ
pdBIL as n → ∞ approximates Poisson(λ) likelihood with
Normal(λ, λ) likelihood. Not exact.
28 30 32
0
0.2
0.4
0.6
0.8
n = 1
n = 10
n = ∞
true
20 30 40 50
−800
−600
−400
λ
log−likelihood
true
auxiliary
Acceptance probabilities: n = 1 46%, n = 10 67%, n = 100 72%
and n = 1000 73% for increasing n

Toy Example 1
Results for pdBIL when auxiliary model mis-speciﬁed
Normal(µ, 9) (underdispersed) Normal(µ, 49) (overdispersed)
ABC II will still work by chance
28 30 32
0
0.2
0.4
0.6
0.8
1
true
τ unspecified
τ = 16
τ = 49
29 30 31 32
−330
−320
−310
λ
log−likelihood true
τ unspecified
τ=16
τ=49

Toy Example 2
True model t-distribution(µ, σ, ν = 1) and auxiliary model
t-distribution(µ, σ, ν)
Here pdBIL exact as n → ∞ since true is special case of
auxiliary
ABC II do not produce suﬃcient statistics (full set of order
statistics are minimal suﬃcient)
−1.5 −1 −0.5 0 0.5 1 1.5
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
true
n=1
n=10
n=100
0.5 1 1.5 2 2.5
0
0.5
1
1.5
2
2.5
true
n=1
n=10
n=100

Toy Example 2 cont
−1 −0.5 0 0.5 1
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
true
ABC IP
ABC IL
ABC IS
0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6
0
0.5
1
1.5
2
2.5
true
ABC IP
ABC IL
ABC IS

Quantile Distribution Example
Model of Interest: g-and-k quantile distribution
Deﬁned in terms of its quantile function:
Q(z(p); θ) = a + b 1 + c
1 − exp(−gz(p))
1 + exp(−gz(p))
(1 + z(p)2
)k
z(p).
(1)
p - quantile, z(p) - standard normal quantile, θ = (a, b, g, k),
c = 0.8 (see Rayner and MacGillivray 2003).
Numerical likelihood evaluation possible
Simulation easier via inversion method
Data consists of 10000 independent draws with a = 3, b = 1,
g = 2 and k = 0.5

Quantile Distribution Example (Cont...)
Auxiliary model is a 3-component normal mixture model
Flexible and ﬁts data well
But breaks assumption of ABC IP (parameter estimates not
unique)
0 5 10 15 20
0
0.2
0.4
0.6
0.8
y
density
estimated density
3 component mixture density

pdBIL Results
2.95 3 3.05
0
10
20
30
n=1
n=10
n=50
posterior
(a) a
0.9 1 1.1
0
5
10
15
(b) b
1.9 2 2.1 2.2
0
5
10
(c) g
0.45 0.5 0.55
0
10
20
30
(d) k
Acc Prob: 2.8% for n = 1, 13.1% for n = 10 and 20.8% for n = 50

ABC II (no regression adjustment)
2.95 3 3.05
0
10
20
30
posterior
ABC IP
ABC IL
ABC IS
(e) a
0.9 1 1.1
0
5
10
15
(f) b
1.9 2 2.1 2.2
0
5
10
(g) g
0.45 0.5 0.55
0
10
20
30
(h) k

Compare all (note ABC has regression adjustment)
2.95 3 3.05
0
10
20
30
ABC
ABC IS
ABC IL
ABC IP
pdBIL n=50
posterior
(i) a
0.9 1 1.1
0
5
10
15
(j) b
1.9 2 2.1 2.2
0
5
10
(k) g
0.45 0.5 0.55
0
10
20
30
(l) k

4 Component Auxiliary Mixture Model
2.95 3 3.05
0
10
20
30
40
ABC
ABC IS
ABC IL
ABC IP
pdBIL n=50
posterior
(m) a
0.9 1 1.1
0
5
10
15
(n) b
1.9 2 2.1 2.2
0
5
10
(o) g
0.45 0.5 0.55
0
10
20
30
(p) k

Macroparasite Immunity Example
Estimate parameters of a Markov process model explaining
macroparasite population development with host immunity
212 hosts (cats) i = 1, . . . , 212. Each cat injected with li
juvenile Brugia pahangi larvae (approximately 100 or 200).
At time ti host is sacriﬁced and the number of matures are
recorded
Host assumed to develop an immunity
Three variable problem: M(t) matures, L(t) juveniles, I(t)
immunity.
Only L(0) and M(ti ) is observed for each host
No tractable likelihood

0 200 400 600 800 1000 1200
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Time
ProportonofMatures

Trivariate Markov Process of Riley et al (2003)
M(t) L(t)
I(t)
Mature Parasites Juvenile Parasites
Immunity
Maturation
Gain of immunity Loss of immunity
Deathduetoimmunity
Naturaldeath
Naturaldeath
Invisible
Invisible
Invisible
γL(t)
νL(t) µI I(t)
βI(t)L(t)
µLL(t)
µMM(t)

Auxiliary Beta-Binomial model
The data show too much variation for Binomial
A Beta-Binomial model has an extra parameter to capture
dispersion
p(mi |αi , βi ) =
li
mi
B(mi + αi , li − mi + βi )
B(αi , βi )
,
Useful reparameterisation pi = αi /(αi + βi ) and
θi = 1/(αi + βi )
Relate the proportion and over dispersion parameters to time,
ti , and initial larvae, li , covariates
logit(pi ) = β0 + β1 log(ti ) + β2(log(ti ))2
,
log(θi ) =
η100, if li ≈ 100
η200, if li ≈ 200
,
Five parameters φ = (β0, β1, β2, η100, η200)

pdBIL Results
0 0.5 1 1.5 2 2.5
x 10
−3
0
500
1000
1500
n=1
n=20
n=50
(q) ν
0 1 2
0
0.2
0.4
0.6
0.8
(r) µI
0 0.01 0.02 0.03
0
50
100
150
(s) µL
0 1 2 3
0
0.2
0.4
0.6
0.8
(t) β

pBII results (ABC II no regression adjustment)
0 1 2 3
x 10
−3
0
500
1000
ABC IS
ABC IP
ABC IL
pdBIL n=50
(u) ν
0 1 2
0
0.2
0.4
0.6
0.8
(v) µI
−0.01 0 0.01 0.02 0.03 0.04
0
50
100
150
(w) µL
0 1 2
0
0.2
0.4
0.6
0.8
(x) β

pBII results (ABC has regression adjustment)
0 1 2
x 10
−3
0
500
1000
1500
2000
ABC IS REG
ABC IP REG
ABC IL REG
pdBIL n=50
(y) ν
0 0.005 0.01 0.015 0.02 0.025
0
50
100
150
(z) µL

Discussion
pdBIL very different to ABC II theoretically
pdBIL needs to have a good auxiliary model. ABC II might be
useful is auxiliary model mis-specified
ABC II more flexible; can incorporate other summary statistics

Key References
Drovandi et al. (2011). Approximate Bayesian Computation
using Indirect Inference. JRSS C.
Drovandi, Pettitt and Lee (2014). Bayesian Indirect Inference
using a Parametric Auxiliary Model.
http://eprints.qut.edu.au/63767/
Gleim and Pigorsch (2013). Approximate Bayesian
Computation with Indirect Summary Statistics.
Gallant and McCullogh (2009). On the determination of
general scientiﬁc models with application to asset pricing.
JASA.
Reeves and Pettitt (2005). A theoretical framework for
approximate Bayesian computation. 20th IWSM.

Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxiliary Model

Recommended

Recommended

More Related Content

Viewers also liked

Viewers also liked (12)

Similar to Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxiliary Model

Similar to Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxiliary Model (20)

Recently uploaded

Recently uploaded (20)

Dr Chris Drovandi (QUT) - Bayesian Indirect Inference Using a Parametric Auxiliary Model