I show how to obtain approximate maximum likelihood inference for "complex" models having some latent (unobservable) component. By "complex" I mean models having a so-called intractable likelihood, where the latter is unavailable in closed form or is too difficult to approximate. I construct a version of SAEM (an EM-type algorithm) that makes it possible to conduct inference for complex models. Traditionally SAEM is implementable only for models that are fairly tractable analytically. By introducing the concept of synthetic likelihood, where information is captured by a series of user-defined summary statistics (as in approximate Bayesian computation), it is possible to automate SAEM to run on any model having some latent component.
1. A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models
Umberto Picchini
Centre for Mathematical Sciences,
Lund University
twitter: @uPicchini
umberto@maths.lth.se
18 October 2016, Department of Computer and Information Science,
Linköping University.
Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
2. This presentation is based on the working paper:
P. (2016). Likelihood-free stochastic approximation EM for inference
in complex models, arXiv:1609.03508.
3. I will consider:
the problem of parameter inference with “complex models”, i.e.
models having an intractable likelihood.
the inference problem for “incomplete data”, in the sense given
by the seminal EM-paper [Dempster et al. 1977].
In short, what I investigate is:
we have data Y arising from a generic model depending on the
unobservable X and the parameter θ.
How do we estimate θ from Y in the presence of the latent X?
4. The presence of the latent (unobservable) X means that we deal with
an incomplete data problem.
The EM algorithm1 is the standard way to conduct
maximum-likelihood inference for θ in the presence of incomplete data.
The complete data is the couple (Y, X), and the corresponding
complete likelihood is p(Y, X; θ).
The incomplete (data) likelihood is p(Y; θ).
We are interested in finding the MLE
ˆθ = arg max_{θ∈Θ} p(Y; θ)
given observations Y = (Y1, ..., Yn).
1 Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.
5. In the rest of this presentation I will discuss:
SAEM: a popular stochastic version of EM, for when EM is not
directly applicable.
Implementing SAEM is difficult! And impossible for models
with intractable likelihoods. What to do?
A quick intro to Wood’s synthetic likelihoods (SL).
Our contribution: embedding SL within SAEM.
Simulation studies.
6. EM in one slide
EM is a two-step procedure: the E-step followed by the M-step.
Define
Q(θ|θ′) = ∫ log p_{Y,X}(Y, X; θ) p_{X|Y}(X|Y; θ′) dX ≡ E_{X|Y}[log p_{Y,X}(Y, X; θ)].
At iteration k ≥ 1:
E-step: compute Q(θ|ˆθ^(k−1));
M-step: obtain ˆθ^(k) = arg max_{θ∈Θ} Q(θ|ˆθ^(k−1)).
As k → ∞ the sequence {ˆθ^(k)}_k converges to a stationary point of the
data likelihood p(Y; θ) under weak assumptions.
Typically, E-step is hard while M-step is “easy”.
7. How to get around the E-step
The E-step requires the evaluation of:
Q(θ|θ′) = ∫ log p_{Y,X}(Y, X; θ) p_{X|Y}(X|Y; θ′) dX.
This is hard, as pX|Y(X|Y; ·) is typically unknown.
MCEM [Wei-Tanner 1990]
Assume we are able to simulate draws from pX|Y(X|Y; ·) say mk times
→ Monte-Carlo approximation:
generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;
Q(θ|θ′) ≈ (1/mk) Σ_{r=1}^{mk} log p_{Y,X}(Y, x_r; θ).
Problem: mk needs to increase as k increases. Double asymptotic
problem!
8. SAEM (stochastic approximation EM)
A more efficient approximation to the E-step is given by SAEM2:
generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;
˜Q(θ|ˆθ^(k)) = (1 − γk) ˜Q(θ|ˆθ^(k−1)) + γk (1/mk) Σ_{r=1}^{mk} log p_{Y,X}(Y, x_r; θ),
with {γk} a decreasing sequence such that Σ_k γk = ∞ and Σ_k γk² < ∞.
As k → ∞ it is not required for mk to increase; in fact it is possible to
take mk ≡ 1 for all k (see the next slide for convergence properties).
2 Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.
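The stochastic approximation above can be sketched in a few lines. This is a toy illustration of my own (not the talk's implementation): the Monte Carlo term is replaced by a noisy scalar, so one can see that with a step-size sequence satisfying the two conditions, a single draw per iteration (mk ≡ 1) is enough for the recursion to settle.

```python
import random

def gamma_seq(k, k0=10):
    # hypothetical step-size schedule: gamma_k = 1 during a short burn-in,
    # then 1/(k - k0); it satisfies sum(gamma_k) = inf, sum(gamma_k^2) < inf
    return 1.0 if k <= k0 else 1.0 / (k - k0)

def saem_update(q_prev, q_mc, gamma):
    # SAEM E-step approximation: (1 - gamma)*old + gamma*(Monte Carlo term)
    return (1.0 - gamma) * q_prev + gamma * q_mc

rng = random.Random(1)
q, target = 0.0, 3.0
for k in range(1, 5001):
    noisy_draw = target + rng.gauss(0.0, 1.0)  # stands in for the m_k = 1 MC term
    q = saem_update(q, noisy_draw, gamma_seq(k))
# q ends close to the noiseless target despite using one draw per iteration
```

The averaging induced by γk = 1/(k − k0) is what removes the need for mk → ∞, in contrast with MCEM.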
9. Beautiful things happen if you manage to write log p(Y, X) as a member of
the curved exponential family, e.g.
log p(Y, X; θ) = −Λ(θ) + ⟨Sc(Y, X), Γ(θ)⟩. (1)
Here ⟨·, ·⟩ is the scalar product, Λ and Γ are two functions of θ and Sc(Y, X)
is the minimal sufficient statistic of the complete model.
Then we only need to update the sufficient statistics
sk = sk−1 + γk (Sc(Y, X^(k)) − sk−1).
Computing Sc(Y, X) for most non-trivial models is hard! But if you manage,
the M-step is often explicit:
θ^(k) = arg max_{θ∈Θ} (−Λ(θ) + ⟨sk, Γ(θ)⟩)
Only for case (1) Delyon et al. (1999) prove convergence of the sequence
{θk}k to a stationary point of p(Y; θ) under weak conditions.
10. Some considerations
General problem with all EM-type algorithms: we assumed the ability to
simulate latent states from p(X|Y). This is often not trivial.
For state-space models, plenty of possibilities given by particle filters
(sequential Monte Carlo). In this case, the sampling issue is
“solvable”.
What to do outside of state-space models? What if the model has no
dynamic structure?
What if the model is so complex that we can’t write pY,X(Y, X) in
closed form?
Example, for SDE models the transition density of the underlying Markov
process is unknown.
Then we cannot write p(X0:n) = Π_{j=1}^n p(Xj|Xj−1), hence we cannot write
p_{Y,X}(Y0:n, X0:n) = p(Y0:n|X0:n) p(X0:n).
11. If we can’t write the complete likelihood, we certainly cannot hope to
find the sufficient statistics Sc(·).
Specifically: it is impossible to apply SAEM for models having
intractable likelihoods, e.g. models for which we can’t write p(Y, X)
in closed form.
Likelihood-free methods use the ability to simulate from a model to
compensate for our ignorance about the underlying likelihood.
12. Say we formulate a statistical model p(Y; θ) such that the n observations
are assumed Yj ∼ p(Y; θ), j = 1, ..., n.
Suppose we do not know p(Y; ·), however
we do know how to implement a simulator to generate draws
from p(Y; ·).
Trivial example (but you get the idea)
y = x + ε,  x ∼ px,  ε ∼ N(0, σε²)
simulate x* ∼ px [possible even when px is unknown!]
simulate y* ∼ N(x*, σε²); then y* ∼ py(Y|σε).
Therefore, in the following we consider the case where the only thing
we know is how to forward simulate from an assumed model.
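The trivial example above really is a two-line simulator. In this sketch px is stood in by an exponential distribution purely for illustration (any black-box sampler would do); no density is ever evaluated, only draws are produced:

```python
import random

rng = random.Random(0)
sigma_eps = 0.5  # illustrative value

def simulate_y():
    # forward simulation: draw x* from p_x (here an arbitrary stand-in),
    # then y* ~ N(x*, sigma_eps^2); we never need p_x or p_y in closed form
    x_star = rng.expovariate(1.0)
    return rng.gauss(x_star, sigma_eps)

ys = [simulate_y() for _ in range(20000)]
mean_y = sum(ys) / len(ys)  # close to E[x] = 1 under this stand-in p_x
```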
13. Bayes: complex networks might not allow for trivial sampling (Gibbs-type),
i.e. when the conditional densities are unknown.
[Figure from Schadt et al. (2009), doi:10.1038/nrd2826]
14. The ability to simulate from a model even when we have no
knowledge of the analytic expressions of the underlying likelihood(s),
is central in likelihood-free methods for intractable likelihoods.
Several ways to deal with “intractable likelihoods”.
“Plug-and-play methods”: the only requirement is the ability to
simulate from the data-generating model.
particle marginal methods (PMMH, PMCMC) based on SMC
filters [Andrieu et al. 2010].
(improved) Iterated filtering [Ionides et al. 2015]
approximate Bayesian computation (ABC) [Marin et al. 2012].
Synthetic likelihoods [Wood 2010].
In the following I focus on Synthetic Likelihoods.
15. A nearly chaotic model
Two realizations from a Ricker model.
yt ∼ Poi(φNt)
Nt = r · Nt−1 · e^{−Nt−1}
Small changes in r cause major departures from the data.
Figure: one path generated with log r = 3.8 (black) and one generated with
log r = 3.799 (red); the log-likelihood is also shown as a function of log r.
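The near-chaotic sensitivity can be checked numerically on the deterministic skeleton of the Ricker map; a minimal sketch of my own (step count and starting value are arbitrary choices):

```python
import math

def ricker_skeleton(log_r, n_steps=25, n0=1.0):
    # deterministic part of the Ricker model: N_t = r * N_{t-1} * exp(-N_{t-1})
    r, n = math.exp(log_r), n0
    path = []
    for _ in range(n_steps):
        n = r * n * math.exp(-n)
        path.append(n)
    return path

p1 = ricker_skeleton(3.8)    # log r = 3.8
p2 = ricker_skeleton(3.799)  # a 0.03% change in log r
gap = max(abs(a - b) for a, b in zip(p1, p2))
# the two paths start almost identical, then separate by orders of magnitude
```

The first step differs by less than 0.02, yet within the 25 steps the paths diverge by more than the initial gap by a large factor, which is exactly why path-matching inference is fragile here.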
16. The resulting likelihood can be difficult to explore if algorithms are
badly initialized.
[Figure: the log-likelihood (in black) as a function of the parameter, in four panels: the Ricker model and three related models.]
17. A change of paradigm
from S. Wood, Nature 2010:
“Naive methods of statistical inference try to make the model
reproduce the exact course of the observed data in a way that the real
system itself would not do if repeated.”
“What is important is to identify a set of statistics that is sensitive to
the scientifically important and repeatable features of the data, but
insensitive to replicate-specific details of phase.”
In other words, with complex, stochastic and/or chaotic models we
could try to match features of the data, not the path of the data itself.
A similar approach is considered in ABC (approximate Bayesian
computation).
18. Synthetic likelihoods
y: observed data, from static or dynamic models
s(y): (vector of) summary statistics of data, e.g. mean,
autocorrelations, marginal quantiles etc.
assume
s(y) ∼ N(µθ, Σθ)
an assumption justifiable via second order Taylor expansion
(same as in Laplace approximations).
µθ and Σθ unknown: estimate them via simulations.
20. For fixed θ simulate R artificial datasets y*₁, ..., y*_R from your model and
compute the corresponding (possibly vector-valued) summaries s*₁, ..., s*_R.
compute
ˆµθ = (1/R) Σ_{r=1}^R s*_r,  ˆΣθ = (1/(R−1)) Σ_{r=1}^R (s*_r − ˆµθ)(s*_r − ˆµθ)ᵀ
compute the statistics sobs for the observed data y.
evaluate a multivariate Gaussian likelihood at sobs:
liksyn(θ) := N(sobs; ˆµθ, ˆΣθ) ∝ |ˆΣθ|^{−1/2} exp(−(sobs − ˆµθ)ᵀ ˆΣθ^{−1} (sobs − ˆµθ)/2)
This likelihood can be maximized over a varying θ or be plugged within
an MCMC algorithm targeting
ˆπ(θ|sobs) ∝ liksyn(θ) π(θ).
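The recipe above is straightforward to code. A self-contained sketch for a deliberately trivial model of my own choosing (Yj ∼ N(θ, 1); summaries are the sample mean and standard deviation; everything here is illustrative, not one of the talk's examples):

```python
import math, random

def summaries(data):
    # user-defined summary statistics: sample mean and standard deviation
    m = sum(data) / len(data)
    sd = math.sqrt(sum((d - m) ** 2 for d in data) / (len(data) - 1))
    return (m, sd)

def synthetic_loglik(s_obs, theta, R=200, n=100, seed=0):
    rng = random.Random(seed)
    # 1) simulate R artificial datasets at theta and summarize each
    sims = [summaries([rng.gauss(theta, 1.0) for _ in range(n)]) for _ in range(R)]
    # 2) sample mean and sample covariance of the simulated summaries
    mu = [sum(s[i] for s in sims) / R for i in range(2)]
    c = [[sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in sims) / (R - 1)
          for j in range(2)] for i in range(2)]
    # 3) evaluate the bivariate Gaussian log-density at the observed summaries
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    d = [s_obs[0] - mu[0], s_obs[1] - mu[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return -0.5 * (quad + math.log(det) + 2.0 * math.log(2.0 * math.pi))

rng = random.Random(123)
s_obs = summaries([rng.gauss(2.0, 1.0) for _ in range(100)])
ll_near = synthetic_loglik(s_obs, theta=2.0)  # at the data-generating value
ll_far = synthetic_loglik(s_obs, theta=0.0)   # far from it
```

As expected, the synthetic log-likelihood is much higher at the data-generating value than far from it, which is what makes it usable inside an optimizer or an MCMC chain.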
21. So the synthetic likelihood methodology assumes no specific
knowledge of the probabilistic features of the model.
Only assumes the ability to forward-generate from the model.
assumes that the analyst is able to specify “informative”
summaries.
assumes that said summaries are (approximately) Gaussian
s ∼ N(·).
Transforming the summaries to be approximately Gaussian is often not an issue
(just as we do in linear regression).
Of course the major issue (still open, also in ABC) is how to build
informative summaries. This is left unsolved.
22. I intend to use the synthetic likelihoods approach to enable
likelihood-free inference using SAEM.
This should allow SAEM to be applied to intractable likelihood
models.
23. We use synthetic likelihoods to construct a Gaussian approximation
over a set of complete summaries (S(Y), S(X)) to define a complete
synthetic loglikelihood.
the complete synthetic loglikelihood:
log p(s; θ) = log N(s; µ(θ), Σ(θ)),  (2)
with s = (S(Y), S(X)).
In (2) µ(θ) and Σ(θ) are unknown but can be estimated using
synthetic likelihoods (SL), conditionally on θ.
However we need to obtain a maximizer for the (incomplete)
synthetic loglikelihood log p(S(Y); θ).
24. SAEM with synthetic likelihoods (SL)
For given θ SL returns estimates ˆµ(θ) and ˆΣ(θ) (sample mean and
sample covariance).
Crucial result
For a Gaussian likelihood ˆµ(θ) and ˆΣ(θ) are sufficient statistics for
µ(θ) and Σ(θ). And a Gaussian is a member of the exponential family.
Recall: what SAEM does is to update sufficient statistics, perfect for
us!
At the kth SAEM iteration:
ˆµ^(k)(θ) = ˆµ^(k−1)(θ) + γk (ˆµ(θ) − ˆµ^(k−1)(θ))  (3)
ˆΣ^(k)(θ) = ˆΣ^(k−1)(θ) + γk (ˆΣ(θ) − ˆΣ^(k−1)(θ)).  (4)
25. Updating the latent variable X
At the kth iteration of SAEM we need to sample S(X^(k))|S(Y). This is trivial!
We have
S(X^(k))|S(Y) ∼ N(ˆµ^(k)_{x|y}(θ), ˆΣ^(k)_{x|y}(θ))
where
ˆµ^(k)_{x|y} = ˆµx + ˆΣxy ˆΣy^{−1} (S(Y) − ˆµy)
ˆΣ^(k)_{x|y} = ˆΣx − ˆΣxy ˆΣy^{−1} ˆΣyx
and ˆµx, ˆµy, ˆΣx, ˆΣy, ˆΣxy, ˆΣyx are extracted from (ˆµ^(k), ˆΣ^(k)).
That is, ˆµ^(k)(θ) = (ˆµx, ˆµy) and
ˆΣ^(k)(θ) = [ ˆΣx  ˆΣxy ; ˆΣyx  ˆΣy ].
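For scalar summaries the conditioning formulas above reduce to two lines; a sketch with made-up moments (the numbers are mine, for illustration only):

```python
def gauss_condition(mu_x, mu_y, s_xx, s_yy, s_xy, y_obs):
    # scalar case of the slide's formulas for a jointly Gaussian (X, Y):
    # mu_{x|y} = mu_x + S_xy * S_y^{-1} * (y - mu_y)
    # S_{x|y}  = S_x  - S_xy * S_y^{-1} * S_yx
    mu_cond = mu_x + (s_xy / s_yy) * (y_obs - mu_y)
    var_cond = s_xx - s_xy * s_xy / s_yy
    return mu_cond, var_cond

# unit variances, covariance 0.8, observed y = 1
mu_c, var_c = gauss_condition(0.0, 0.0, 1.0, 1.0, 0.8, 1.0)
# mu_c = 0.8, var_c = 0.36: the observation pulls the mean and shrinks the variance
```

In SAEM-SL the same algebra is applied blockwise, with the blocks extracted from the running moment estimates (ˆµ^(k), ˆΣ^(k)).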
26. The M-step
Now that we have simulated an S(X^(k)) (conditional on the data), let's
produce the complete summaries at iteration k:
s^(k) := (S(Y), S(X^(k)))
and maximize (M-step) the complete synthetic loglikelihood:
ˆθ^(k) = arg max_{θ∈Θ} log N(s^(k); µ(θ), Σ(θ)).  (5)
For each perturbation of θ the M-step performs a synthetic likelihood
simulation.
It returns the best found maximizer for (5) and corresponding best
( ˆµ, ˆΣ). Plug these in the updating moments equations (3)-(4).
27. The slide that follows describes a single iteration of SAEM-SL.
28. Input: observed summaries S(Y), positive integers L and R, values for ˆθ^(k−1), ˆµ^(k−1) and ˆΣ^(k−1).
Output: ˆθ^(k).
At iteration k:
1. Extract ˆµx, ˆµy, ˆΣx, ˆΣy, ˆΣxy and ˆΣyx from ˆµ^(k−1) and ˆΣ^(k−1). Compute the conditional moments ˆµ_{x|y}, ˆΣ_{x|y}.
2. Sample S(X^(k−1))|S(Y) ∼ N(ˆµ^(k−1)_{x|y}(θ), ˆΣ^(k−1)_{x|y}(θ)) and form s^(k−1) := (S(Y), S(X^(k−1))).
3. Obtain (θ^(k), µ^(k), Σ^(k)) from InternalSL(s^(k−1), ˆθ^(k−1), R), starting at ˆθ^(k−1).
4. Increase k := k + 1 and go to step 1.
Function InternalSL(s^(k−1), θstart, R):
Input: s^(k−1), starting parameters θstart, a positive integer R. Functions to compute simulated summaries S(y*) and S(x*) must be available.
Output: the best found θ* maximizing log N(s^(k−1); ˆµ, ˆΣ) and the corresponding (µ*, Σ*).
Here θc denotes a generic candidate value.
i. Simulate x*_r ∼ pX(X_{0:N}; θc), y*_r ∼ pY|X(Y_{1:n}|X_{1:n}; θc) for r = 1, ..., R.
ii. Compute user-defined summaries s*_r = (S(y*_r), S(x*_r)) for r = 1, ..., R. Construct the corresponding (ˆµ, ˆΣ).
iii. Evaluate log N(s^(k−1); ˆµ, ˆΣ).
Use a numerical procedure that performs (i)–(iii) L times to find the best θ* maximizing log N(s^(k−1); ˆµ, ˆΣ) over varying θc.
Denote with (ˆµ*, ˆΣ*) the simulated moments corresponding to the best found θ*. Set θ^(k) := θ*.
iv. Update moments:
ˆµ^(k) = ˆµ^(k−1) + γk (ˆµ* − ˆµ^(k−1))
ˆΣ^(k) = ˆΣ^(k−1) + γk (ˆΣ* − ˆΣ^(k−1)).
Return (θ^(k), ˆµ^(k), ˆΣ^(k)).
29. We have now completed all the steps required to implement a
likelihood-free version of SAEM.
Main inference problem: not clear how to construct a set of
informative (S(Y), S(X)) for θ. These are user-defined, hence
arbitrary.
Main computational bottleneck: compared to the regular
SAEM, our M-step is a numerical optimization routine. We used
Nelder-Mead, which is rather slow.
Ideal case (typically unattainable)
If we have:
1. s = (S(Y), S(X)) jointly sufficient for θ, and
2. s multivariate Gaussian,
then our likelihood-free SAEM converges to a stationary point of
p(Y; θ) under the conditions given in Delyon et al. 1999.
30. I have two examples to show:
a state-space model driven by an SDE: I compare SAEM-SL
with the regular SAEM and with direct optimization of the
synthetic likelihood.
a simple Gaussian state-space model: I compare SAEM-SL vs the
regular SAEM, iterated filtering and particle marginal methods.
A “static model” example is available in my paper3.
3
P. 2016. Likelihood-free stochastic approximation EM for inference in complex
models, arXiv:1609.03508.
31. Example: a nonlinear Gaussian state-space model
We study a standard toy model (e.g. Jasra et al.4).
Yj = Xj + σy νj,  j ≥ 1
Xj = 2 sin(e^{Xj−1}) + σx τj,
with νj, τj ∼ N(0, 1) i.i.d. and X0 = 0.
θ = (σx, σy).
4 Jasra, Singh, Martin and McCoy, 2012. Filtering via approximate Bayesian computation. Statistics and Computing.
32. We generate n = 50 observations from the model with
σx = σy = 2.23.
[Figure: the n = 50 simulated observations Y plotted against time.]
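For reference, the data-generating step can be sketched as follows (seed and implementation details are my own, not the talk's):

```python
import math, random

def simulate_ssm(n=50, sigma_x=2.23, sigma_y=2.23, seed=7):
    # X_j = 2*sin(exp(X_{j-1})) + sigma_x*tau_j,  Y_j = X_j + sigma_y*nu_j,  X_0 = 0
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x = 2.0 * math.sin(math.exp(x)) + sigma_x * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(x + sigma_y * rng.gauss(0.0, 1.0))
    return xs, ys

xs, ys = simulate_ssm()
```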
33. the standard SAEM
Let’s set-up the “standard” SAEM. We need the complete likelihood
and sufficient statistics.
Easy for this model.
p(Y, X) = p(Y|X) p(X) = Π_{j=1}^n p(Yj|Xj) p(Xj|Xj−1)
Yj|Xj ∼ N(Xj, σy²)
Xj|Xj−1 ∼ N(2 sin(e^{Xj−1}), σx²)
S_{σx²} = Σ_{j=1}^n (Xj − 2 sin(e^{Xj−1}))² and S_{σy²} = Σ_{j=1}^n (Yj − Xj)²
are sufficient for σx² and σy².
34. Plug the sufficient statistics into the complete (log)likelihood, and set to
zero the gradient w.r.t. (σx², σy²).
Explicit M-step at the kth iteration:
ˆσx²^(k) = S_{σx²}/n
ˆσy²^(k) = S_{σy²}/n
To run SAEM, the only thing still needed is a way to sample X^(k)|Y.
For this we use sequential Monte Carlo, e.g. the bootstrap filter (in the
backup slides, if needed).
I skip this sampling step. Just know that this is easily accomplished
for state space models.
35. SAEM-SL: SAEM with synthetic likelihoods
To implement SAEM-SL no knowledge of the complete likelihood is
required, nor analytic derivation of the sufficient statistics.
We just have to postulate some “reasonable” summaries for X and Y.
For each synthetic likelihood step, we simulate R = 500 realizations
of S(Xr) and S(Yr), containing:
the sample median of Xr, r = 1, ..., R;
the median absolute deviation of Xr;
the 10th, 20th, 75th and 90th percentile of Xr.
the sample median of Yr;
the median absolute deviation of Yr;
the 10th, 20th, 75th and 90th percentile of Yr.
36. Results with SAEM-SL on 30 different datasets
Starting parameter values are randomly initialised. Here R = 500.
Figure: trace plots for SAEM-SL (σx, left; σy, right) for the thirty estimation
procedures. Horizontal lines are true parameter values.
37. (M, M̄) | (500, 200) | (1000, 200) | (1000, 20)
σx (true value 2.23)
SAEM-SMC | 2.54 [2.53, 2.54] | 2.55 [2.54, 2.56] | 1.99 [1.85, 2.14]
IF2 | 1.26 [1.21, 1.41] | 1.35 [1.28, 1.41] | 1.35 [1.28, 1.41]
σy (true value 2.23)
SAEM-SMC | 0.11 [0.10, 0.13] | 0.06 [0.06, 0.07] | 1.23 [1.00, 1.39]
IF2 | 1.62 [1.56, 1.75] | 1.64 [1.58, 1.67] | 1.64 [1.58, 1.67]
Table: SAEM with bootstrap filter using M particles (resampling threshold M̄); IF2 = iterated filtering.
R | 500 | 1000
σx (true value 2.23)
SAEM-SL | 1.67 [0.42, 1.97] | 1.51 [0.82, 2.03]
σy (true value 2.23)
SAEM-SL | 2.40 [2.01, 2.63] | 2.27 [1.57, 2.57]
Table: SAEM with synthetic likelihoods. K = 60 iterations.
38. Example: state-space SDE model [P., 2016]
We consider a one-dimensional state-space model driven by a SDE.
Suppose we administer 4 mg of theophylline [Dose] to a subject.
Xt is the level of theophylline concentration in blood at time t (hrs).
Consider the following state-space model:
Yj = Xj + εj,  εj ∼ i.i.d. N(0, σε²)
dXt = (Dose·Ka·Ke/Cl · e^{−Ka·t} − Ke·Xt) dt + σ√Xt dWt,  t ≥ t0
Ke is the elimination rate constant,
Ka is the absorption rate constant,
Cl is the clearance of the drug,
σ is the intensity of the intrinsic stochastic noise.
39. We simulate a set of n = 30 observations from the model at
equispaced times.
But how to simulate from this model? No analytic solution for the
SDE is available.
We resort to the Euler-Maruyama discretization with a small stepsize
h = 0.05 on the time interval [0,30]:
X_{t+h} = Xt + (Dose·Ka·Ke/Cl · e^{−Ka·t} − Ke·Xt)·h + σ√(Xt)·Z_{t+h},  {Zt} ∼ i.i.d. N(0, h)
This implies a latent simulated process of length N + 1:
X_{0:N} = {X0, Xh, ..., X_{Nh}}.
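The discretization can be sketched as below. Parameter values are placeholders (Ka and the initial state X0 in particular are my own illustrative choices, not the paper's), and the positivity guard is a pragmatic addition, not part of the scheme as stated:

```python
import math, random

def euler_maruyama(ke, ka, cl, sigma, dose=4.0, h=0.05, t_end=30.0, x0=8.0, seed=3):
    # X_{t+h} = X_t + (Dose*Ka*Ke/Cl * exp(-Ka*t) - Ke*X_t)*h + sigma*sqrt(X_t)*Z,
    # with Z ~ N(0, h) simulated as sqrt(h)*N(0, 1)
    rng = random.Random(seed)
    n_steps = round(t_end / h)
    x, path = x0, [x0]
    for i in range(n_steps):
        t = i * h
        drift = dose * ka * ke / cl * math.exp(-ka * t) - ke * x
        x = x + drift * h + sigma * math.sqrt(x * h) * rng.gauss(0.0, 1.0)
        x = max(x, 1e-8)  # crude guard: keep the state positive for sqrt()
        path.append(x)
    return path

# ke, cl, sigma taken from the slide's ground truth; ka is a hypothetical value
path = euler_maruyama(ke=0.05, ka=1.49, cl=0.04, sigma=0.1)
```

With h = 0.05 on [0, 30] this produces the 601-point latent grid X_{0:N} that the filter and the summaries operate on.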
40. A typical realization of the process:
Figure: data (circles) and the latent process (black line), time (hrs) from 0 to 30.
41. The classic SAEM
Applying the “standard” SAEM is not really trivial here.
The complete likelihood:
p(Y, X) = p(Y|X) p(X) = Π_{j=1}^n p(Yj|Xj) · Π_{i=1}^N p(Xi|Xi−1)
Yj|Xj ∼ N(Xj, σε²)
Xi|Xi−1 ∼ not available.
Euler-Maruyama induces a Gaussian approximation:
p(xi|xi−1) ≈ (1/(σ√(2π xi−1 h))) exp(−[xi − xi−1 − (Dose·Ka·Ke/Cl · e^{−Ka·τi−1} − Ke·xi−1)h]² / (2σ² xi−1 h)).
42. The classic SAEM
I am not going to show how to obtain all the sufficient summary
statistics (see the paper).
Just trust me that it requires a bit of work.
And this is just a one-dimensional model!
We sample X(k)|Y using the bootstrap filter sequential Monte Carlo
method.
If you are not familiar with sequential Monte Carlo, worry not. Just
consider it a method returning a “best” filtered X(k) based on Y (for
linear Gaussian models you would use Kalman).
43. SAEM-SL with synthetic likelihoods
User-defined summaries for a simulation r: (s(x*_r), s(y*_r)).
s(x*_r) contains:
(i) the median value of X*_{0:N};
(ii) the median absolute deviation of X*_{0:N};
(iii) a statistic for σ computed from X*_{0:N} (see next slide);
(iv) (Σ_j (Y*_j − X*_j)²/n)^{1/2}.
s(y*_r) contains:
(i) the median value of y*_r;
(ii) its median absolute deviation;
(iii) the slope of the line connecting the first and last simulated
observation, (Y*_n − Y*_1)/(tn − t1).
44. In Miao (2014): for an SDE of the type dXt = µ(Xt)dt + σg(Xt)dWt
with t ∈ [0, T], we have
Σ_Γ |X_{i+1} − X_i|² / Σ_Γ g²(X_i)(t_{i+1} − t_i) → σ²  as |Γ| → 0,
where the convergence is in probability and Γ is a partition of [0, T].
Using the discretization {X0, X1, ..., XN} produced by the Euler-Maruyama
scheme, we can take the square root of the left-hand side in the limit above,
which should be informative for σ.
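A quick numerical sanity check of this statistic (my own toy experiment, not from the talk): for the σ√Xt diffusion we have g(x) = √x, so g²(x) = x, and the denominator is Σ Xi·h. Simulating a driftless Euler path and applying the statistic recovers σ:

```python
import math, random

def sigma_hat_from_path(xs, h):
    # statistic for sigma: sqrt( sum |X_{i+1}-X_i|^2 / sum g^2(X_i)*h ),
    # with g(x) = sqrt(x) as in the sigma*sqrt(X_t)*dW_t diffusion
    num = sum((xs[i + 1] - xs[i]) ** 2 for i in range(len(xs) - 1))
    den = sum(xs[i] * h for i in range(len(xs) - 1))
    return math.sqrt(num / den)

# toy check on a driftless Euler path dX = sigma*sqrt(X) dW
rng = random.Random(0)
sigma_true, h, x = 0.3, 1e-4, 5.0
xs = [x]
for _ in range(100000):
    x = max(x + sigma_true * math.sqrt(x * h) * rng.gauss(0.0, 1.0), 1e-6)
    xs.append(x)
sigma_hat = sigma_hat_from_path(xs, h)
# sigma_hat is close to sigma_true for a fine partition
```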
45. 100 different datasets are simulated from ground-truth parameters.
All optimizations start away from ground truth values.
SAEM-SL: each M-step simulates R = 500 summaries and uses
L = 10 Nelder-Mead iterations, for K = 100 SAEM iterations.
[Figure: trace plots for Ke, Cl, σ and σε across the estimation runs.]
46. SAEM-SMC uses the bootstrap filter with M = 500 particles to
obtain X^(k)|Y.
Cl and σ are essentially unidentified.
47. Ke Cl σ σε
true values 0.050 0.040 0.100 0.319
SAEM-SMC 0.045 [0.042,0.049] 0.085 [0.078,0.094] 0.171 [0.158,0.184] 0.395 [0.329,0.465]
SAEM-SL 0.044 [0.038,0.051] 0.033 [0.028,0.039] 0.106 [0.083,0.132] 0.266 [0.209,0.307]
optim. SL 0.063 [0.054,0.069] 0.089 [0.068,0.110] 0.304 [0.249,0.370] 0.543 [0.485,0.625]
SAEM-SMC: uses M = 500 particles to filter X(k)|Y via SMC. Runs
for K = 300 SAEM iterations.
SAEM-SL at each iteration of the M-step simulates R = 500
summaries, with L = 10 Nelder-Mead iterations (M-step) and
K = 100 SAEM iterations.
“optim. SL” denotes the direct maximization of Wood’s synthetic
(incomplete) likelihood:
ˆθ = arg max_{θ∈Θ} log N(S(Y); µ(θ), Σ(θ)).  (6)
48. How about Gaussianity of the summaries?
Here we have qq-normal plots from the 7 postulated summaries at the
obtained optimum (500 simulations each).
[Figure: normal Q-Q plots of the seven summaries, four for X (sx(1)–sx(4)) and three for Y (sy(1)–sy(3)).]
The summaries' quantiles closely follow the (not shown) reference line
corresponding to a perfect match with Gaussian quantiles.
49. Summary
We introduced SAEM-SL, a version of SAEM that is able to deal
with intractable likelihoods;
It only requires the formulation and simulation of “informative”
summaries s.
How to construct informative summaries automatically is a
difficult open problem.
if said user-defined summaries s are sufficient for θ (very
unlikely) and s ∼ N(·), then SAEM-SL converges to the true
maximum likelihood estimate for p(Y; θ).
The method can be used for intractable models, or even just to
initialize starting values for more refined algorithms (e.g.
particle MCMC).
50. Key references
Andrieu et al. 2010. Particle Markov chain Monte Carlo methods.
JRSS-B.
Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic
approximation version of the EM algorithm. Annals of Statistics.
Dempster, Laird and Rubin, 1977. Maximum likelihood from
incomplete data via the EM algorithm. JRSS-B.
Ionides et al. 2015. Inference for dynamic and latent variable models
via iterated, perturbed Bayes maps. PNAS.
Marin et al. 2012. Approximate Bayesian computational methods.
Stat. Comput.
Picchini 2016. Likelihood-free stochastic approximation EM for
inference in complex models, arXiv:1609.03508.
Wood 2010. Statistical inference for noisy nonlinear ecological
dynamic systems. Nature.
52. Justification of Gaussianity (Wood 2010)
Assuming Gaussianity for summaries s(·) can be justified from a
standard Taylor expansion.
Say that fθ(s) is the true (unknown) joint density of s.
Expand log fθ(s) around its mode µθ:
log fθ(s) ≈ log fθ(µθ) + (1/2)(s − µθ)ᵀ (∂² log fθ/∂s∂sᵀ)(s − µθ)
hence
fθ(s) ≈ const × exp(−(1/2)(s − µθ)ᵀ (−∂² log fθ/∂s∂sᵀ)(s − µθ))
i.e. s ∼ N(µθ, (−∂² log fθ/∂s∂sᵀ)^{−1}), approximately, when s ≈ µθ.
53. Asymptotic properties for synthetic likelihoods (Wood
2010)
As the number of simulated statistics R → ∞:
the maximizer ˆθ of liksyn(θ) is a consistent estimator;
ˆθ is an unbiased estimator;
ˆθ is not in general Gaussian; it will be Gaussian if Σθ
depends weakly on θ or when d = dim(s) is large.
54. Algorithm 1: Bootstrap filter with M particles and threshold 1 ≤ M̄ ≤ M. Resamples only when ESS < M̄.
Step 0. Set j = 1: for m = 1, ..., M sample X^(m)_1 ∼ p(X0), compute weights W^(m)_1 = f(Y1|X^(m)_1) and normalize weights w^(m)_1 := W^(m)_1 / Σ_{m=1}^M W^(m)_1.
Step 1.
if ESS({w^(m)_j}) < M̄ then
resample M particles {X^(m)_j, w^(m)_j} and set W^(m)_j = 1/M.
end if
Set j := j + 1; if j = n + 1, stop and return all constructed weights {W^(m)_j}_{m=1:M, j=1:n} to sample a single path. Otherwise go to step 2.
Step 2. For m = 1, ..., M sample X^(m)_j ∼ p(·|X^(m)_{j−1}). Compute
W^(m)_j := w^(m)_{j−1} p(Yj|X^(m)_j),
normalize the weights w^(m)_j := W^(m)_j / Σ_{m=1}^M W^(m)_j and go to step 1.
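A minimal bootstrap filter for the nonlinear toy model of slide 31 can be sketched as below. This is my own simplification of Algorithm 1: it resamples at every step instead of adaptively, so the normalized weights reset to uniform and the log-likelihood bookkeeping stays simple. The "data" used in the check are an arbitrary stand-in, not model-simulated.

```python
import math, random

def bootstrap_filter(ys, sigma_x, sigma_y, M=500, seed=0):
    # bootstrap filter: propagate particles through the transition density,
    # weight by the observation density, resample; returns the log-likelihood
    rng = random.Random(seed)
    xs = [0.0] * M  # X_0 = 0
    loglik = 0.0
    norm = 1.0 / (sigma_y * math.sqrt(2.0 * math.pi))
    for y in ys:
        # propagate: X_j | X_{j-1} ~ N(2*sin(exp(X_{j-1})), sigma_x^2)
        xs = [2.0 * math.sin(math.exp(x)) + sigma_x * rng.gauss(0.0, 1.0)
              for x in xs]
        # weight: Y_j | X_j ~ N(X_j, sigma_y^2)
        ws = [norm * math.exp(-0.5 * ((y - x) / sigma_y) ** 2) for x in xs]
        loglik += math.log(sum(ws) / M)
        xs = rng.choices(xs, weights=ws, k=M)  # multinomial resampling
    return loglik

rng = random.Random(42)
ys = [rng.gauss(0.0, 3.0) for _ in range(20)]  # stand-in observations
ll_fit = bootstrap_filter(ys, 2.23, 2.23)
ll_bad = bootstrap_filter(ys, 2.23, 25.0)      # grossly inflated obs. noise
```

Even this crude version ranks a plausible observation noise well above an absurd one, which is the behaviour SAEM-SMC relies on when drawing X^(k)|Y.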