A likelihood-free version of the stochastic
approximation EM algorithm (SAEM) for
parameter estimation in complex models
Umberto Picchini
Centre for Mathematical Sciences,
Lund University
twitter: @uPicchini
umberto@maths.lth.se
18 October 2016, Department of Computer and Information Science,
Linköping University.
This presentation is based on the working paper:
P. (2016). Likelihood-free stochastic approximation EM for inference
in complex models, arXiv:1609.03508.
I will consider:
the problem of parameter inference with “complex models”, i.e.
models having an intractable likelihood.
the inference problem for “incomplete data”, in the sense given
by the seminal EM-paper [Dempster et al. 1977].
In short, what I investigate is:
we have data Y arising from a generic model depending on the
unobservable X and parameter θ.
How do we estimate θ from Y, in presence of the latent X?
The presence of the latent (unobservable) X means that we deal with
an incomplete data problem.
The EM algorithm1 is the standard way to conduct
maximum-likelihood inference for θ in presence of incomplete data.
The complete data is the couple (Y, X), and the corresponding
complete likelihood is p(Y, X; θ).
The incomplete (data) likelihood is p(Y; θ).
We are interested in finding the MLE
ˆθ = arg max_{θ∈Θ} p(Y; θ)
given observations Y = (Y1, ..., Yn).
1
Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data
via the EM algorithm. JRSS-B.
In the rest of this presentation I will discuss:
SAEM: a popular stochastic version of EM, for when EM is not
directly applicable.
Implementing SAEM is difficult! And impossible for models
with intractable likelihoods. What to do?
A quick intro to Wood’s synthetic likelihoods (SL).
Our contribution embedding SL within SAEM.
Simulation studies.
EM in one slide
EM is a two-step procedure: the E-step followed by the M-step.
Define
Q(θ|θ′) = ∫ log pY,X(Y, X; θ) pX|Y(X|Y; θ′) dX ≡ EX|Y[log pY,X(Y, X; θ)].
At iteration k ≥ 1:
E-step: compute Q(θ| ˆθ(k−1));
M-step: obtain ˆθ(k) = arg max_{θ∈Θ} Q(θ| ˆθ(k−1)).
As k → ∞ the sequence { ˆθ(k)}k converges to a stationary point of the
data likelihood p(Y; θ) under weak assumptions.
Typically, E-step is hard while M-step is “easy”.
How to get around the E-step
The E-step requires the evaluation of:
Q(θ|θ′) = ∫ log pY,X(Y, X; θ) pX|Y(X|Y; θ′) dX.
This is hard, as pX|Y(X|Y; ·) is typically unknown.
MCEM [Wei-Tanner 1990]
Assume we are able to simulate draws from pX|Y(X|Y; ·) say mk times
→ Monte-Carlo approximation:
generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;
Q(θ|θ′) ≈ (1/mk) Σ_{r=1}^{mk} log pY,X(Y, xr; θ).
Problem: mk needs to increase as k increases. Double asymptotic
problem!
SAEM (stochastic approximation EM)
A more efficient approximation to the E-step is given by SAEM2
generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk;
˜Q(θ| ˆθ(k)) = (1 − γk) ˜Q(θ| ˆθ(k−1)) + γk · (1/mk) Σ_{r=1}^{mk} log pY,X(Y, xr; θ),
with {γk} a decreasing sequence such that Σk γk = ∞ and Σk γk² < ∞.
As k → ∞ it is not required for mk to increase; in fact it is possible to
take mk ≡ 1 for all k (see the next slide for convergence properties).
2
Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic
approximation version of the EM algorithm. Annals of Statistics.
Beautiful things happen if you manage to write log p(Y, X) as a member of
the curved exponential family, e.g.
log p(Y, X; θ) = −Λ(θ) + ⟨Sc(Y, X), Γ(θ)⟩. (1)
Here ⟨·, ·⟩ is the scalar product, Λ and Γ are two functions of θ and Sc(Y, X)
is the minimal sufficient statistic of the complete model.
Then we only need to update the sufficient statistics
sk = sk−1 + γk(Sc(Y, X(k)) − sk−1).
Computing Sc(Y, X) for most non-trivial models is hard! But if you manage,
the M-step is often explicit:
θ(k) = arg max_{θ∈Θ} (−Λ(θ) + ⟨sk, Γ(θ)⟩)
Only for case (1) Delyon et al. (1999) prove convergence of the sequence
{θk}k to a stationary point of p(Y; θ) under weak conditions.
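To make the update above concrete, here is a minimal Python sketch (not from the paper). The step-size schedule, with γk = 1 during a short warm-up and γk = 1/(k − K1) afterwards, is one common choice satisfying Σγk = ∞ and Σγk² < ∞, but both it and the generic shape of the statistic are my own assumptions.

    import numpy as np

    def gamma(k, K1=10):
        # Step sizes: gamma_k = 1 during a warm-up of K1 iterations, then 1/(k - K1);
        # this satisfies sum(gamma_k) = infinity and sum(gamma_k^2) < infinity.
        return 1.0 if k <= K1 else 1.0 / (k - K1)

    def update_sufficient_stat(s_prev, S_complete, k):
        # Stochastic-approximation update: s_k = s_{k-1} + gamma_k (S_c(Y, X^(k)) - s_{k-1})
        return s_prev + gamma(k) * (np.asarray(S_complete) - s_prev)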
Some considerations
General problem with all EM-type algorithms: we assumed the ability to
simulate latent states from p(X|Y). This is often not trivial.
For state-space models, plenty of possibilities given by particle filters
(sequential Monte Carlo). In this case, the sampling issue is
“solvable”.
What to do outside of state-space models? What if the model has no
dynamic structure?
What if the model is so complex that we can’t write pY,X(Y, X) in
closed form?
For example, for SDE models the transition density of the underlying Markov
process is unknown.
Then we cannot write p(X0:n) = ∏_{j=1}^n p(Xj|Xj−1), hence we cannot write
pY,X(Y0:n, X0:n) = p(Y0:n|X0:n)p(X0:n).
If we can’t write the complete likelihood certainly we cannot hope to
find the sufficient statistics Sc(·).
Specifically: it is impossible to apply SAEM for models having
intractable likelihoods, e.g. models for which we can’t write p(Y, X)
in closed form.
Likelihood-free methods use the ability to simulate from a model to
compensate for our ignorance about the underlying likelihood.
Say we formulate a statistical model p(Y; θ) such that n observations
are assumed Yj ∼ p(Y; θ), j = 1, ..., n.
Suppose we do not know p(Y; ·), however
we do know how to implement a simulator to generate draws
from p(Y; ·).
Trivial example (but you get the idea)
y = x + ε,  x ∼ px,  ε ∼ N(0, σ²ε)
simulate x∗ ∼ px(X) [possible even when px is unknown!]
simulate y∗ ∼ N(x∗, σ²ε); then y∗ ∼ py(Y|σε)
Therefore, in the following we consider the case where the only thing
we know is how to forward simulate from an assumed model.
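A minimal Python sketch of this forward simulation, purely illustrative: here px is taken to be a Student-t simulator whose density we pretend not to know, and σε = 0.5 is an arbitrary value.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma_eps = 0.5                                            # arbitrary illustrative value

    x_star = rng.standard_t(df=3, size=1000)                   # x* ~ p_x (any black-box simulator)
    y_star = x_star + sigma_eps * rng.standard_normal(1000)    # y* = x* + eps, eps ~ N(0, sigma_eps^2)
    # y_star are draws from p_y(.|sigma_eps) even though p_y has no closed form here.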
Bayes: complex networks might not allow for trivial sampling (Gibbs-type),
i.e. when the conditional densities are unknown.
[Pic from Schadt et al. (2009) doi:10.1038/nrd2826]
The ability to simulate from a model even when we have no
knowledge of the analytic expressions of the underlying likelihood(s),
is central in likelihood-free methods for intractable likelihoods.
Several ways to deal with “intractable likelihoods”.
“Plug-and-play methods”: the only requirement is the ability to
simulate from the data-generating-model.
particle marginal methods (PMMH, PMCMC) based on SMC
filters [Andrieu et al. 2010].
(improved) Iterated filtering [Ionides et al. 2015]
approximate Bayesian computation (ABC) [Marin et al. 2012].
Synthetic likelihoods [Wood 2010].
In the following I focus on Synthetic Likelihoods.
A nearly chaotic model
Two realizations from a Ricker model.
yt ∼ Poi(φNt),  Nt = r · Nt−1 · e^{−Nt−1}.
Small changes in r cause major departures from the data.
Figure: One path generated with log r = 3.8 (black) and one generated with
log r = 3.799 (red).
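A sketch of a forward simulator for this Ricker model (my own illustration; the value φ = 10 and the initial state N0 = 1 are assumptions, not taken from the slide):

    import numpy as np

    def simulate_ricker(log_r, phi, n_obs, N0=1.0, rng=None):
        # N_t = r * N_{t-1} * exp(-N_{t-1});  y_t ~ Poisson(phi * N_t)
        rng = rng or np.random.default_rng()
        r, N = np.exp(log_r), N0
        y = np.empty(n_obs, dtype=int)
        for t in range(n_obs):
            N = r * N * np.exp(-N)
            y[t] = rng.poisson(phi * N)
        return y

    # Two trajectories differing only in the third decimal of log r:
    y_black = simulate_ricker(log_r=3.8,   phi=10.0, n_obs=50, rng=np.random.default_rng(1))
    y_red   = simulate_ricker(log_r=3.799, phi=10.0, n_obs=50, rng=np.random.default_rng(1))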
The resulting likelihood can be difficult to explore if algorithms are
badly initialized.
Figure: Log-likelihood slices (in black) against model parameters such as log(r),
for the Ricker model and three other models (panels labelled Ricker, Pen, Varley, Mayna).
A change of paradigm
from S. Wood, Nature 2010:
“Naive methods of statistical inference try to make the model
reproduce the exact course of the observed data in a way that the real
system itself would not do if repeated.”
“What is important is to identify a set of statistics that is sensitive to
the scientifically important and repeatable features of the data, but
insensitive to replicate-specific details of phase.”
In other words, with complex, stochastic and/or chaotic models we could try
to match features of the data, not the path of the data itself.
A similar approach is considered in ABC (approximate Bayesian
computation).
Synthetic likelihoods
y: observed data, from static or dynamic models
s(y): (vector of) summary statistics of data, e.g. mean,
autocorrelations, marginal quantiles etc.
assume
s(y) ∼ N(µθ, Σθ)
an assumption justifiable via second order Taylor expansion
(same as in Laplace approximations).
µθ and Σθ unknown: estimate them via simulations.
Figure: Schematic representation of the synthetic likelihoods procedure.
For fixed θ simulate R artificial datasets y∗1, ..., y∗R from your model and
compute corresponding (possibly vector-valued) summaries s∗1, ..., s∗R.
compute
ˆµθ = (1/R) Σ_{r=1}^R s∗r,   ˆΣθ = (1/(R−1)) Σ_{r=1}^R (s∗r − ˆµθ)(s∗r − ˆµθ)ᵀ
compute the statistics sobs for the observed data y.
evaluate a multivariate Gaussian likelihood at sobs
liksyn(θ) := N(sobs; ˆµθ, ˆΣθ) ∝ | ˆΣθ|^{−1/2} exp( −(sobs − ˆµθ)ᵀ ˆΣθ^{−1} (sobs − ˆµθ) / 2 )
This likelihood can be maximized for a varying θ or be plugged within
an MCMC algorithm targeting
ˆπ(θ|sobs) ∝ liksyn(θ)π(θ).
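The recipe above translates almost line by line into code. A minimal Python sketch (not from the paper; simulate_data and summaries are placeholders for the user's model simulator and summary function):

    import numpy as np
    from scipy.stats import multivariate_normal

    def synthetic_loglik(theta, s_obs, simulate_data, summaries, R=500, rng=None):
        # Estimate mu_theta and Sigma_theta from R simulated summary vectors,
        # then evaluate the Gaussian synthetic log-likelihood at the observed summaries.
        rng = rng or np.random.default_rng()
        S = np.array([summaries(simulate_data(theta, rng)) for _ in range(R)])
        mu_hat = S.mean(axis=0)
        Sigma_hat = np.cov(S, rowvar=False)        # divides by R - 1
        return multivariate_normal.logpdf(s_obs, mean=mu_hat, cov=Sigma_hat)

The returned value can then be maximized over θ, or plugged into an MCMC sampler targeting ˆπ(θ|sobs) as above.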
So the synthetic likelihood methodology assumes no specific
knowledge of the probabilistic features of the model.
Only assumes the ability to forward-generate from the model.
assumes that the analyst is able to specify “informative”
summaries.
assumes that said summaries are (approximately) Gaussian
s ∼ N(·).
Transforming the summaries to be ≈ N is often not an issue (just as we
do in linear regression).
Of course the major issue (still open, also in ABC) is how to build
informative summaries. This is left unsolved.
I intend to use the synthetic likelihoods approach to enable
likelihood-free inference using SAEM.
This should allow SAEM to be applied to intractable likelihood
models.
We use synthetic likelihoods to construct a Gaussian approximation
over a set of complete summaries (S(Y), S(X)) to define a complete
synthetic loglikelihood.
the complete synthetic loglikelihood
log p(s; θ) = log N(s; µ(θ), Σ(θ)), (2)
with s = (S(Y), S(X))
In (2) µ(θ) and Σ(θ) are unknown but can be estimated using
synthetic likelihoods (SL), conditionally on θ.
However we need to obtain a maximizer for the (incomplete)
synthetic loglikelihood log p(S(Y); θ).
SAEM with synthetic likelihoods (SL)
For given θ SL returns estimates ˆµ(θ) and ˆΣ(θ) (sample mean and
sample covariance).
Crucial result
For a Gaussian likelihood ˆµ(θ) and ˆΣ(θ) are sufficient statistics for
µ(θ) and Σ(θ). And a Gaussian is a member of the exponential family.
Recall: what SAEM does is to update sufficient statistics, perfect for
us!
At kth SAEM iteration:
ˆµ(k)(θ) = ˆµ(k−1)(θ) + γ(k)( ˆµ(θ) − ˆµ(k−1)(θ))  (3)
ˆΣ(k)(θ) = ˆΣ(k−1)(θ) + γ(k)( ˆΣ(θ) − ˆΣ(k−1)(θ)).  (4)
Updating the latent variable X
At the kth iteration of SAEM we need to sample S(X(k))|S(Y). This is
trivial!
We have
S(X(k))|S(Y) ∼ N( ˆµ(k)x|y(θ), ˆΣ(k)x|y(θ))
where
ˆµ(k)x|y = ˆµx + ˆΣxy ˆΣy^{−1} (S(Y) − ˆµy)
ˆΣ(k)x|y = ˆΣx − ˆΣxy ˆΣy^{−1} ˆΣyx
and ˆµx, ˆµy, ˆΣx, ˆΣy, ˆΣxy and ˆΣyx are extracted from ( ˆµ(k), ˆΣ(k)).
That is, ˆµ(k)(θ) = ( ˆµx, ˆµy) and
ˆΣ(k)(θ) = [ ˆΣx  ˆΣxy ; ˆΣyx  ˆΣy ].
The M-step
Now that we have simulated S(X(k)) (conditional on the data), let's
produce the complete summaries at iteration k:
s(k) := (S(Y), S(X(k)))
and maximize (M-step) the complete synthetic loglikelihood:
ˆθ(k) = arg max_{θ∈Θ} log N(s(k); µ(θ), Σ(θ))  (5)
For each perturbation of θ the M-step performs a synthetic likelihood
simulation.
It returns the best found maximizer for (5) and corresponding best
( ˆµ, ˆΣ). Plug these in the updating moments equations (3)-(4).
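A sketch of this numerical M-step with Nelder-Mead (my own illustration; complete_synthetic_loglik stands for steps (i)-(iii) of the inner synthetic-likelihood evaluation at the complete summaries s(k), and L caps the number of optimizer iterations):

    import numpy as np
    from scipy.optimize import minimize

    def m_step(theta_start, s_k, complete_synthetic_loglik, L=10):
        # Maximize log N(s_k; mu(theta), Sigma(theta)) over theta; every objective
        # evaluation runs a fresh synthetic-likelihood simulation at the candidate theta.
        objective = lambda theta: -complete_synthetic_loglik(theta, s_k)
        res = minimize(objective, np.asarray(theta_start, dtype=float),
                       method="Nelder-Mead", options={"maxiter": L})
        return res.x   # best found theta; the corresponding (mu_hat, Sigma_hat) would also be kept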
The slide that follows describes a single iteration of SAEM-SL.
Input: observed summaries S(Y), positive integers L and R. Values for ˆθ(k−1), ˆµ(k−1) and ˆΣ(k−1).
Output: ˆθ(k).
At iteration k:
1. Extract ˆµx, ˆµy, ˆΣx, ˆΣy, ˆΣxy and ˆΣyx from ˆµ(k−1) and ˆΣ(k−1). Compute conditional moments ˆµx|y, ˆΣx|y.
2. Sample S(X(k−1))|S(Y) ∼ N( ˆµ(k−1)x|y(θ), ˆΣ(k−1)x|y(θ)) and form s(k−1) := (S(Y), S(X(k−1))).
3. Obtain (θ(k), µ(k), Σ(k)) from InternalSL(s(k−1), ˆθ(k−1), R) starting at ˆθ(k−1).
4. Increase k := k + 1 and go to step 1.
Function InternalSL(s(k−1), θstart, R):
Input: s(k−1), starting parameters θstart, a positive integer R. Functions to compute simulated summaries S(y∗) and S(x∗) must be available.
Output: the best found θ∗ maximizing log N(s(k); ˆµ, ˆΣ) and the corresponding (µ∗, Σ∗).
Here θc denotes a generic candidate value.
i. Simulate x∗r ∼ pX(X0:N; θc), y∗r ∼ pY|X(Y1:n|X1:n; θc) for r = 1, ..., R.
ii. Compute user-defined summaries s∗r = (S(y∗r), S(x∗r)) for r = 1, ..., R. Construct the corresponding ( ˆµ, ˆΣ).
iii. Evaluate log N(s(k); ˆµ, ˆΣ).
Use a numerical procedure that performs (i)–(iii) L times to find the best θ∗ maximizing log N(s(k); ˆµ, ˆΣ) for varying θc.
Denote with ( ˆµ∗, ˆΣ∗) the simulated moments corresponding to the best found θ∗. Set θ(k) := θ∗.
iv. Update moments:
ˆµ(k) = ˆµ(k−1) + γ(k)( ˆµ∗ − ˆµ(k−1))
ˆΣ(k) = ˆΣ(k−1) + γ(k)( ˆΣ∗ − ˆΣ(k−1)).
Return (θ(k), ˆµ(k), ˆΣ(k)).
We have now completed all the steps required to implement a
likelihood free version of SAEM.
Main inference problem: not clear how to construct a set of
informative (S(Y), S(X)) for θ. These are user-defined, hence
arbitrary.
Main computational bottleneck: compared to the regular
SAEM, our M-step is a numerical optimization routine. We used
Nelder-Mead, which is rather slow.
Ideal case (typically unattainable)
If we have:
1 s = (S(Y), S(X)) is jointly sufficient for θ and
2 s is multivariate Gaussian
then our likelihood-free SAEM converges to a stationary point of
p(Y; θ) under the conditions given in Delyon et al. (1999).
I have two examples to show:
a state-space model driven by an SDE: I compare SAEM-SL
with the regular SAEM and with direct optimization of the
synthetic likelihood.
a simple nonlinear Gaussian state-space model: I compare SAEM-SL with the
regular SAEM, iterated filtering and particle marginal methods.
A “static model” example is available in my paper3.
3
P. 2016. Likelihood-free stochastic approximation EM for inference in complex
models, arXiv:1609.03508.
Example: a nonlinear Gaussian state-space model
We study a standard toy-model (e.g. Jasra et al. 20104).
Yj = Xj + σy νj,  j ≥ 1
Xj = 2 sin(e^{Xj−1}) + σx τj,
with νj, τj ∼ N(0, 1) i.i.d. and X0 = 0.
θ = (σx, σy).
4
Jasra, Singh, Martin and McCoy, 2012. Filtering via approximate Bayesian
computation. Statistics and Computing.
We generate n = 50 observations from the model with
σx = σy = 2.23.
Figure: the n = 50 simulated observations Y plotted against time.
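A sketch of the data-generating code for this toy model (my own illustration; the random seed is arbitrary):

    import numpy as np

    def simulate_ssm(sigma_x, sigma_y, n=50, rng=None):
        # X_j = 2 sin(exp(X_{j-1})) + sigma_x * tau_j,  Y_j = X_j + sigma_y * nu_j,  X_0 = 0
        rng = rng or np.random.default_rng(123)
        X, Y = np.zeros(n + 1), np.zeros(n)
        for j in range(1, n + 1):
            X[j] = 2.0 * np.sin(np.exp(X[j - 1])) + sigma_x * rng.standard_normal()
            Y[j - 1] = X[j] + sigma_y * rng.standard_normal()
        return X, Y

    X, Y = simulate_ssm(sigma_x=2.23, sigma_y=2.23)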
the standard SAEM
Let’s set up the “standard” SAEM. We need the complete likelihood
and sufficient statistics.
Easy for this model.
p(Y, X) = p(Y|X)p(X) = ∏_{j=1}^n p(Yj|Xj) p(Xj|Xj−1)
Yj|Xj ∼ N(Xj, σ²y)
Xj|Xj−1 ∼ N(2 sin(e^{Xj−1}), σ²x)
S_{σ²x} = Σ_{j=1}^n (Xj − 2 sin(e^{Xj−1}))² and S_{σ²y} = Σ_{j=1}^n (Yj − Xj)² are
sufficient for σ²x and σ²y
Plug the sufficient statistics into the complete (log)likelihood, and set to
zero the gradient w.r.t. (σ²x, σ²y).
Explicit M-step at the kth iteration:
ˆσ²x(k) = S_{σ²x}/n,   ˆσ²y(k) = S_{σ²y}/n
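A sketch of these statistics and of the explicit M-step (my own illustration; X is a sampled latent path X0:n and Y the data; in the full SAEM the statistics would first be averaged with the γk update before maximizing):

    import numpy as np

    def m_step_toy(X, Y):
        # S_{sigma_x^2} = sum_j (X_j - 2 sin(exp(X_{j-1})))^2,  S_{sigma_y^2} = sum_j (Y_j - X_j)^2
        n = len(Y)
        S_x = np.sum((X[1:] - 2.0 * np.sin(np.exp(X[:-1]))) ** 2)
        S_y = np.sum((Y - X[1:]) ** 2)
        # Explicit maximizers of the complete likelihood
        return np.sqrt(S_x / n), np.sqrt(S_y / n)     # (sigma_x_hat, sigma_y_hat)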
To run SAEM, the only thing left is a way to sample X(k)|Y.
For this we use sequential Monte Carlo, e.g. the bootstrap filter (in
backup slides, if needed).
I skip this sampling step. Just know that this is easily accomplished
for state space models.
SAEM-SL: SAEM with synthetic likelihoods
To implement SAEM-SL no knowledge of the complete likelihood is
required, nor analytic derivation of the sufficient statistics.
We just have to postulate some “reasonable” summaries for X and Y.
For each synthetic likelihood step, we simulate R = 500 realizations
of S(Xr) and S(Yr), containing:
the sample median of Xr, r = 1, ..., R;
the median absolute deviation of Xr;
the 10th, 20th, 75th and 90th percentile of Xr.
the sample median of Yr;
the median absolute deviation of Yr;
the 10th, 20th, 75th and 90th percentile of Yr.
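A sketch of these summaries for one simulated pair (Xr, Yr) (my own illustration; the median absolute deviation is computed directly with numpy):

    import numpy as np

    def summaries(x, y):
        # For each of x and y: median, median absolute deviation, and the
        # 10th, 20th, 75th and 90th percentiles (6 values per block, 12 in total).
        def block(z):
            mad = np.median(np.abs(z - np.median(z)))
            return np.concatenate(([np.median(z), mad], np.percentile(z, [10, 20, 75, 90])))
        return np.concatenate((block(x), block(y)))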
Results with SAEM-SL on 30 different datasets
Starting parameter values are randomly initialised. Here R = 500.
Figure: trace plots for SAEM-SL (σx, left; σy, right) for the thirty estimation
procedures. Horizontal lines are true parameter values.
(M, ¯M) (500,200) (1000,200) (1000,20)
σx (true value 2.23)
SAEM-SMC 2.54 [2.53,2.54] 2.55 [2.54,2.56] 1.99 [1.85,2.14]
IF2 1.26 [1.21,1.41] 1.35 [1.28,1.41] 1.35 [1.28,1.41]
σy (true value 2.23)
SAEM-SMC 0.11 [0.10,0.13] 0.06 [0.06,0.07] 1.23 [1.00,1.39]
IF2 1.62 [1.56,1.75] 1.64 [1.58,1.67] 1.64 [1.58,1.67]
Table: SAEM with bootstrap filter using M particles; IF2 = iterated filtering.
R 500 1000
σx (true value 2.23)
SAEM-SL 1.67 [0.42,1.97] 1.51 [0.82,2.03]
σy (true value 2.23)
SAEM-SL 2.40 [2.01,2.63] 2.27 [1.57,2.57]
Table: SAEM with synthetic likelihoods. K = 60 iterations.
Example: state-space SDE model [P., 2016]
We consider a one-dimensional state-space model driven by a SDE.
Suppose we administer 4 mg of theophylline [Dose] to a subject.
Xt is the level of theophylline concentration in blood at time t (hrs).
Consider the following state-space model:
Yj = Xj + εj,  εj ∼iid N(0, σ²ε)
dXt = (Dose·Ka·Ke/Cl · e^{−Ka t} − Ke Xt) dt + σ√Xt dWt,  t ≥ t0
Ke is the elimination rate constant
Ka is the absorption rate constant
Cl the clearance of the drug
σ the intensity of intrinsic stochastic noise.
We simulate a set of n = 30 observations from the model at
equispaced times.
But how to simulate from this model? No analytic solution for the
SDE is available.
We resort to the Euler-Maruyama discretization with a small stepsize
h = 0.05 on the time interval [0,30]:
Xt+h = Xt + (Dose·Ka·Ke/Cl · e^{−Ka t} − Ke Xt) h + σ√Xt · Zt+h,  {Zt} ∼iid N(0, h)
This implies a latent simulated process of length N:
X0:N = {X0, Xh, ..., XN}.
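A sketch of this Euler-Maruyama simulator (my own illustration; Ke, Cl and σ are set to the ground-truth values used later, Ka = 1.5 is an arbitrary stand-in since Ka is not listed, and clipping the state at zero before the square root is a numerical detail not discussed on the slide):

    import numpy as np

    def simulate_theophylline(Ke, Ka, Cl, sigma, Dose=4.0, X0=0.0, h=0.05, T=30.0, rng=None):
        # Euler-Maruyama for dX = (Dose*Ka*Ke/Cl * exp(-Ka*t) - Ke*X) dt + sigma*sqrt(X) dW
        rng = rng or np.random.default_rng()
        times = np.arange(0.0, T + h, h)
        X = np.empty(len(times))
        X[0] = X0
        for i, t in enumerate(times[:-1]):
            drift = (Dose * Ka * Ke / Cl) * np.exp(-Ka * t) - Ke * X[i]
            noise = sigma * np.sqrt(max(X[i], 0.0)) * rng.normal(0.0, np.sqrt(h))  # Z ~ N(0, h)
            X[i + 1] = X[i] + drift * h + noise
        return times, X

    times, X_path = simulate_theophylline(Ke=0.05, Ka=1.5, Cl=0.04, sigma=0.1)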
A typical realization of the process:
Figure: data (circles) and the latent process (black line).
The classic SAEM
Applying the “standard” SAEM is not really trivial here.
The complete likelihood:
p(Y, X) = p(Y|X)p(X) = ∏_{j=1}^n p(Yj|Xj) · ∏_{i=1}^N p(Xi|Xi−1)
Yj|Xj ∼ N(Xj, σ²ε)
Xi|Xi−1 ∼ not available in closed form.
Euler-Maruyama induces a Gaussian approximation:
p(xi|xi−1) ≈ 1/(σ√(2π xi−1 h)) · exp( −[xi − xi−1 − (Dose·Ka·Ke/Cl · e^{−Ka τi−1} − Ke xi−1)h]² / (2σ² xi−1 h) ).
The classic SAEM
I am not going to show how to obtain all the sufficient summary
statistics (see the paper).
Just trust me that it requires a bit of work.
And this is just a one-dimensional model!
We sample X(k)|Y using the bootstrap filter sequential Monte Carlo
method.
If you are not familiar with sequential Monte Carlo, worry not. Just
consider it a method returning a “best” filtered X(k) based on Y (for
linear Gaussian models you would use Kalman).
SAEM-SL with synthetic likelihoods
User-defined summaries for a simulation r: (s(x∗r), s(y∗r)).
s(x∗r) contains:
(i) the median value of X∗0:N;
(ii) the median absolute deviation of X∗0:N;
(iii) a statistic for σ computed from X∗0:N (see next slide);
(iv) (Σj (Y∗j − X∗j)²/n)^{1/2}.
s(y∗r) contains:
(i) the median value of y∗r;
(ii) its median absolute deviation;
(iii) the slope of the line connecting the first and last simulated
observation, (Y∗n − Y∗1)/(tn − t1).
In Miao 2014: for an SDE of the type dXt = µ(Xt)dt + σg(Xt)dWt
with t ∈ [0, T], we have
Σ_Γ |Xi+1 − Xi|² / Σ_Γ g(Xi)(ti+1 − ti) → σ²  as |Γ| → 0,
where the convergence is in probability and Γ is a partition of [0, T].
We deduce that using the discretization {X0, X1, ..., XN} produced by
the Euler-Maruyama scheme, we can take the square root of the left
hand side in the limit above, which should be informative for σ.
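A sketch of this statistic computed from an Euler path (my own illustration; taking g(Xi) = Xi matches the σ√Xt diffusion of the theophylline model):

    import numpy as np

    def sigma_statistic(X, h):
        # sqrt( sum_i |X_{i+1} - X_i|^2 / sum_i g(X_i) * h ), with g(x) = x here;
        # informative for sigma as the discretization step h -> 0.
        num = np.sum(np.diff(X) ** 2)
        den = np.sum(X[:-1] * h)
        return np.sqrt(num / den)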
100 different datasets are simulated from ground-truth parameters.
All optimizations start away from ground truth values.
SAEM-SL: at each iteration of the M-step simulates R = 500
summaries, with L = 10 Nelder-Mead iterations (M-step) and
K = 100 SAEM iterations.
Figure: trace plots of the estimates of Ke, Cl, σ and σε (one panel each) across the SAEM-SL iterations.
SAEM-SMC: using the bootstrap filter with M = 500 particles to
obtain X(k)|Y.
Cl and σ are essentially unidentified.
Ke Cl σ σε
true values 0.050 0.040 0.100 0.319
SAEM-SMC 0.045 [0.042,0.049] 0.085 [0.078,0.094] 0.171 [0.158,0.184] 0.395 [0.329,0.465]
SAEM-SL 0.044 [0.038,0.051] 0.033 [0.028,0.039] 0.106 [0.083,0.132] 0.266 [0.209,0.307]
optim. SL 0.063 [0.054,0.069] 0.089 [0.068,0.110] 0.304 [0.249,0.370] 0.543 [0.485,0.625]
SAEM-SMC: uses M = 500 particles to filter X(k)|Y via SMC. Runs
for K = 300 SAEM iterations.
SAEM-SL at each iteration of the M-step simulates R = 500
summaries, with L = 10 Nelder-Mead iterations (M-step) and
K = 100 SAEM iterations.
“optim. SL” denotes the direct maximization of Wood’s synthetic
(incomplete) likelihood:
ˆθ = arg max_{θ∈Θ} log N(S(Y); µ(θ), Σ(θ)). (6)
How about Gaussianity of the summaries?
Here we have qq-normal plots from the 7 postulated summaries at the
obtained optimum (500 simulations each).
Figure: normal QQ-plots for the seven postulated summaries, sx(1)–sx(4) and sy(1)–sy(3).
The summaries' quantiles nicely follow the (not shown) line of perfect
agreement with Gaussian quantiles.
Summary
We introduced SAEM-SL, a version of SAEM that is able to deal
with intractable likelihoods;
It only requires the formulation and simulation of “informative”
summaries s.
How to construct informative summaries automatically is a
difficult open problem.
if said user-defined summaries s are sufficient for θ (very
unlikely), and if s ∼ N(·) then SAEM-SL converges to the true
maximum likelihood estimates for p(Y|θ).
The method can be used for intractable models, or even just to
initialize starting values for more refined algorithms (e.g.
particle MCMC).
Key references
Andrieu et al. 2010. Particle Markov chain Monte Carlo methods.
JRSS-B.
Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic
approximation version of the EM algorithm. Annals of Statistics.
Dempster, Laird and Rubin, 1977. Maximum likelihood from
incomplete data via the EM algorithm. JRSS-B.
Ionides et al. 2015. Inference for dynamic and latent variable models
via iterated, perturbed Bayes maps. PNAS.
Marin et al. 2012. Approximate Bayesian computational methods.
Stat. Comput.
Picchini 2016. Likelihood-free stochastic approximation EM for
inference in complex models, arXiv:1609.03508.
Wood 2010. Statistical inference for noisy nonlinear ecological
dynamic systems. Nature.
Appendix
Justification of Gaussianity (Wood 2010)
Assuming Gaussianity for summaries s(·) can be justified from a
standard Taylor expansion.
Say that fθ(s) is the true (unknown) joint density of s.
Expand fθ(s) around its mode µθ:
log fθ(s) ≈ log fθ(µθ) + (1/2)(s − µθ)ᵀ [∂² log fθ / ∂s ∂sᵀ] (s − µθ)
hence
fθ(s) ≈ const × exp( −(1/2)(s − µθ)ᵀ [−∂² log fθ / ∂s ∂sᵀ] (s − µθ) )
so that s ∼ N( µθ, [−∂² log fθ / ∂s ∂sᵀ]^{−1} ), approximately, when s ≈ µθ.
Asymptotic properties for synthetic likelihoods (Wood 2010)
As the number of simulated statistics R → ∞:
the maximizer ˆθ of liksyn(θ) is a consistent estimator.
ˆθ is an unbiased estimator.
ˆθ might not in general be Gaussian. It will be Gaussian if Σθ
depends weakly on θ or when d = dim(s) is large.
Algorithm 1 Bootstrap filter with M particles and threshold 1 ≤ ¯M ≤ M. Resamples only when ESS < ¯M.
Step 0. Set j = 1: for m = 1, ..., M sample X(m)1 ∼ p(X0), compute weights
W(m)1 = f(Y1|X(m)1) and normalize weights w(m)1 := W(m)1 / Σ_{m=1}^M W(m)1.
Step 1.
if ESS({w(m)j}) < ¯M then
resample M particles {X(m)j, w(m)j} and set W(m)j = 1/M.
end if
Set j := j + 1 and if j = n + 1, stop and return all constructed weights
{W(m)j, m = 1:M, j = 1:n} to sample a single path. Otherwise go to step 2.
Step 2. For m = 1, ..., M sample X(m)j ∼ p(·|X(m)j−1). Compute
W(m)j := w(m)j−1 · p(Yj|X(m)j),
normalize weights w(m)j := W(m)j / Σ_{m=1}^M W(m)j and go to step 1.
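A compact Python sketch of this filter for the nonlinear Gaussian toy model of the earlier example (my own illustration; multinomial resampling and the specific transition/observation densities are choices of this sketch, not prescribed by the algorithm above):

    import numpy as np

    def ess(w):
        # Effective sample size of normalized weights
        return 1.0 / np.sum(w ** 2)

    def bootstrap_filter(Y, sigma_x, sigma_y, M=500, M_bar=200, rng=None):
        # Bootstrap filter for X_j = 2 sin(exp(X_{j-1})) + sigma_x*tau_j, Y_j = X_j + sigma_y*nu_j;
        # resamples only when ESS < M_bar.
        rng = rng or np.random.default_rng()
        n = len(Y)
        X = np.zeros(M)                    # particles, X_0 = 0
        paths = np.zeros((n, M))
        logw = np.zeros(M)
        for j in range(n):
            w = np.exp(logw - logw.max()); w /= w.sum()
            if ess(w) < M_bar:             # resample and reset the weights
                idx = rng.choice(M, size=M, p=w)
                X, paths, logw = X[idx], paths[:, idx], np.zeros(M)
            X = 2.0 * np.sin(np.exp(X)) + sigma_x * rng.standard_normal(M)    # propagate
            paths[j] = X
            logw += -0.5 * ((Y[j] - X) / sigma_y) ** 2 - np.log(sigma_y)      # weight update
        w = np.exp(logw - logw.max()); w /= w.sum()
        return paths[:, rng.choice(M, p=w)], w        # one sampled path and the final weights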
More Related Content

What's hot

Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Christian Robert
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionMichael Stumpf
 
Machine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & InsuranceMachine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & InsuranceArthur Charpentier
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsChristian Robert
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015Christian Robert
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionKonkuk University, Korea
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationChristian Robert
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABCChristian Robert
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...Christian Robert
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Umberto Picchini
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite differenceManthan Chavda
 
Influence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data AnalysisInfluence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data Analysistuxette
 

What's hot (20)

Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model Selection
 
Side 2019 #12
Side 2019 #12Side 2019 #12
Side 2019 #12
 
Machine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & InsuranceMachine Learning in Actuarial Science & Insurance
Machine Learning in Actuarial Science & Insurance
 
Intractable likelihoods
Intractable likelihoodsIntractable likelihoods
Intractable likelihoods
 
WSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in StatisticsWSC 2011, advanced tutorial on simulation in Statistics
WSC 2011, advanced tutorial on simulation in Statistics
 
Slides Bank England
Slides Bank EnglandSlides Bank England
Slides Bank England
 
NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015NBBC15, Reyjavik, June 08, 2015
NBBC15, Reyjavik, June 08, 2015
 
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier DetectionA Tutorial of the EM-algorithm and Its Application to Outlier Detection
A Tutorial of the EM-algorithm and Its Application to Outlier Detection
 
random forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimationrandom forests for ABC model choice and parameter estimation
random forests for ABC model choice and parameter estimation
 
from model uncertainty to ABC
from model uncertainty to ABCfrom model uncertainty to ABC
from model uncertainty to ABC
 
Boston talk
Boston talkBoston talk
Boston talk
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Varese italie seminar
Varese italie seminarVarese italie seminar
Varese italie seminar
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
 
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...Bayesian inference for mixed-effects models driven by SDEs and other stochast...
Bayesian inference for mixed-effects models driven by SDEs and other stochast...
 
Application of interpolation and finite difference
Application of interpolation and finite differenceApplication of interpolation and finite difference
Application of interpolation and finite difference
 
Lausanne 2019 #2
Lausanne 2019 #2Lausanne 2019 #2
Lausanne 2019 #2
 
Influence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data AnalysisInfluence of the sampling on Functional Data Analysis
Influence of the sampling on Functional Data Analysis
 

Similar to A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models

Stratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationStratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationUmberto Picchini
 
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...Umberto Picchini
 
Information topology, Deep Network generalization and Consciousness quantific...
Information topology, Deep Network generalization and Consciousness quantific...Information topology, Deep Network generalization and Consciousness quantific...
Information topology, Deep Network generalization and Consciousness quantific...Pierre BAUDOT
 
!Business statistics tekst
!Business statistics tekst!Business statistics tekst
!Business statistics tekstKing Nisar
 
2_GLMs_printable.pdf
2_GLMs_printable.pdf2_GLMs_printable.pdf
2_GLMs_printable.pdfElio Laureano
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbolsAxel de Romblay
 
20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrisonComputer Science Club
 
Stratified Monte Carlo and bootstrapping for approximate Bayesian computation
Stratified Monte Carlo and bootstrapping for approximate Bayesian computationStratified Monte Carlo and bootstrapping for approximate Bayesian computation
Stratified Monte Carlo and bootstrapping for approximate Bayesian computationUmberto Picchini
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion PresentationMichael Hankin
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
 
Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...Hector Zenil
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationWork-Bench
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesChristoph Molnar
 

Similar to A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models (20)

Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Stratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computationStratified sampling and resampling for approximate Bayesian computation
Stratified sampling and resampling for approximate Bayesian computation
 
Ecmi presentation
Ecmi presentationEcmi presentation
Ecmi presentation
 
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...
Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
Information topology, Deep Network generalization and Consciousness quantific...
Information topology, Deep Network generalization and Consciousness quantific...Information topology, Deep Network generalization and Consciousness quantific...
Information topology, Deep Network generalization and Consciousness quantific...
 
Talk 5
Talk 5Talk 5
Talk 5
 
Clustering-beamer.pdf
Clustering-beamer.pdfClustering-beamer.pdf
Clustering-beamer.pdf
 
!Business statistics tekst
!Business statistics tekst!Business statistics tekst
!Business statistics tekst
 
Lecture12 xing
Lecture12 xingLecture12 xing
Lecture12 xing
 
2_GLMs_printable.pdf
2_GLMs_printable.pdf2_GLMs_printable.pdf
2_GLMs_printable.pdf
 
Regression on gaussian symbols
Regression on gaussian symbolsRegression on gaussian symbols
Regression on gaussian symbols
 
20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison20130928 automated theorem_proving_harrison
20130928 automated theorem_proving_harrison
 
Stratified Monte Carlo and bootstrapping for approximate Bayesian computation
Stratified Monte Carlo and bootstrapping for approximate Bayesian computationStratified Monte Carlo and bootstrapping for approximate Bayesian computation
Stratified Monte Carlo and bootstrapping for approximate Bayesian computation
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
Matrix Completion Presentation
Matrix Completion PresentationMatrix Completion Presentation
Matrix Completion Presentation
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...Graph Spectra through Network Complexity Measures: Information Content of Eig...
Graph Spectra through Network Complexity Measures: Information Content of Eig...
 
One Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical ComputationOne Algorithm to Rule Them All: How to Automate Statistical Computation
One Algorithm to Rule Them All: How to Automate Statistical Computation
 
Statistical Modeling: The Two Cultures
Statistical Modeling: The Two CulturesStatistical Modeling: The Two Cultures
Statistical Modeling: The Two Cultures
 

Recently uploaded

Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 

Recently uploaded (20)

Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 

A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models

  • 1. A likelihood-free version of the stochastic approximation EM algorithm (SAEM) for parameter estimation in complex models Umberto Picchini Centre for Mathematical Sciences, Lund University twitter: @uPicchini umberto@maths.lth.se 18 October 2016, Department of Computer and Information Science, Linköping University. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 2. This presentation is based on the working paper: P. (2016). Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 3. I will consider: the problem of parameter inference with “complex models”, i.e. models having an intractable likelihood. the inference problem for “incomplete data”, in the sense given by the seminal EM-paper [Dempster et al. 1977]. In two words, what I investigate is: we have data Y arising from a generic model depending on the unobservable X and parameter θ. How do we estimate θ from Y, in presence of the latent X? Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 4. The presence of the latent (unobservable) X means that we deal with an incomplete data problem. The EM algorithm1 is the standard way to conduct maximum-likelihood inference for θ in presence of incomplete data. The complete data is the couple (Y, X), and the corresponding complete likelihood is p(Y, X; θ). The incomplete (data) likelihood is p(Y; θ). We are interested in finding the MLE ˆθ = arg max θ∈Θ p(Y; θ) given observations Y = (Y1, ..., Yn). 1 Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 5. In the rest of this presentation I will discuss: SAEM: a popular stochastic version of EM, for when EM is not directly applicable. Implementing SAEM is difficult! And impossible for models with intractable likelihoods. What to do? A quick intro to Wood’s synthetic likelihoods (SL). Our contribution embedding SL within SAEM. Simulation studies. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 6. EM in one slide EM is a two steps procedure: E-step followed by the M-step. Define Q(θ|θ ) = log pY,X(Y, X; θ)pX|Y(X|Y; θ )dX ≡ EX|Y log pY,X(Y, X; θ). At iteration k 1 E-step: compute Q(θ| ˆθ (k−1) ); M-step: obtain ˆθ (k) = arg maxθ∈Θ Q(θ| ˆθ (k−1) ). As k → ∞ the sequence { ˆθ (k) }k converges to a stationary point of the data likelihood p(Y; θ) under weak assumptions. Typically, E-step is hard while M-step is “easy”. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 7. How to get around the E-step The E-step requires the evaluation of: Q(θ|θ ) = log pY,X(Y, X; θ)pX|Y(X|Y; θ )dX. This is hard, as pX|Y(X|Y; ·) is typically unknown. MCEM [Wei-Tanner 1990] Assume we are able to simulate draws from pX|Y(X|Y; ·) say mk times → Monte-Carlo approximation: generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk; Q(θ|θ ) ≈ 1 mk mk r=1 log pY,X(Y, xr; θ). Problem: mk needs to increase as k increases. Double asymptotic problem! Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 8. SAEM (stochastic approximation EM) A more efficient approximation to the E-step is is given by SAEM2 generate xr ∼ pX|Y(X|Y; ·), r = 1, ..., mk; ˜Q(θ| ˆθ (k) ) = (1 − γk) ˜Q(θ| ˆθ (k−1) ) + γk 1 mk mk r=1 log pY,X(Y, xr; θ) . With {γk} a decreasing sequence such that k γk = ∞, k γ2 k < ∞. As k → ∞, it is not required for mk to increase, in fact it is possible to take mk ≡ 1 for all k, however see next slide for convergence properties. 2 Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 9. Beautiful things happen if you manage to write log p(Y, X) as a member of the curved exponential family, e.g. log p(Y, X; θ) = −Λ(θ) + Sc(Y, X), Γ(θ) . (1) Here ... is the scalar product, Λ and Γ are two functions of θ and Sc(Y, X) is the minimal sufficient statistic of the complete model. Then we only need to update the sufficient statistics sk = sk−1 + γk(Sc(Y, X(k) ) − sk−1). Computing Sc(Y, X) for most non-trivial models is hard! But if you manage, the M-step is often explicit: θ (k) = arg max θ∈Θ (−Λ(θ) + sk, Γ(θ) ) Only for case (1) Delyon et al. (1999) prove convergence of the sequence {θk}k to a stationary point of p(Y; θ) under weak conditions. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 10. Some considerations General problem with all EM-type algorithms: we assumed the ability to simulate latent states from p(X|Y). This is often not trivial. For state-space models, plenty of possibilities given by particle filters (sequential Monte Carlo). In this case, the sampling issue is “solvable”. What to do outside of state-space models? What if the model has no dynamic structure? What if the model is so complex that we can’t write pY,X(Y, X) in closed form? Example, for SDE models the transition density of the underlying Markov process is unknown. Then we cannot write p(X0:n) = n j=1 p(Xj|Xj−1), hence we cannot write pY,X(Y0:n, X0:n) = p(Y0:n|X0:n)p(X0:n). Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 11. If we can’t write the complete likelihood certainly we cannot hope to find the sufficient statistics Sc(·). Specifically: it is impossible to apply SAEM for models having intractable likelihoods, e.g. models for which we can’t write p(Y, X) in closed form. Likelihood-free methods use the ability to simulate from a model to compensate for our ignorance about the underlying likelihood. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 12. Say we formulate a statistical model p(Y; θ) such that n observations are assumed Yj ∼ p(Y; θ), j = 1, .., n. Suppose we do not know p(Y; ·), however we do know how to implement a simulator to generate draws from p(Y; ·). Trivial example (but you get the idea) y = x + ε, x ∼ px, ε ∼ N(0, σ2 ε) simulate x∗ ∼ px(X) [possible even when px unknown!] simulate y∗ ∼ N(x∗ , σ2 ε), then y∗ ∼ py(Y|σε) Therefore, in the following we consider the case where the only thing we know is how to forward simulate from an assumed model. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 13. Bayes: complex networks might not allow for trivial sampling (Gibbs-type), i.e. when the conditional densities are unknown. [Pic from Schadt et al. (2009) doi:10.1038/nrd2826] Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 14. The ability to simulate from a model even when we have no knowledge of the analytic expressions of the underlying likelihood(s), is central in likelihood-free methods for intractable likelihoods. Several ways to deal with “intractable likelihoods”. “Plug-and-play methods”: the only requirements is the ability to simulate from the data-generating-model. particle marginal methods (PMMH, PMCMC) based on SMC filters [Andrieu et al. 2010]. (improved) Iterated filtering [Ionides et al. 2015] approximate Bayesian computation (ABC) [Marin et al. 2012]. Synthetic likelihoods [Wood 2010]. In the following I focus on Synthetic Likelihoods. Umberto Picchini umberto@maths.lth.se, twitter:@uPicchini
  • 15. A nearly chaotic model
Two realizations from a Ricker model:
y_t ∼ Poi(φ N_t),   N_t = r · N_{t−1} · e^{−N_{t−1}}.
Small changes in r cause major departures from the data.
Figure: One path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red).
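A sketch of a simulator for the Ricker dynamics written on the slide; the values of phi and N0 below are illustrative assumptions, not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ricker(T, log_r, phi=10.0, N0=1.0):
    # Simulate the Ricker dynamics shown on the slide:
    #   N_t = r * N_{t-1} * exp(-N_{t-1}),   y_t ~ Poisson(phi * N_t).
    # phi and N0 are illustrative values only.
    r = np.exp(log_r)
    N = np.empty(T)
    N[0] = N0
    for t in range(1, T):
        N[t] = r * N[t - 1] * np.exp(-N[t - 1])
    y = rng.poisson(phi * N)
    return N, y

# Two trajectories with log r = 3.8 and log r = 3.799 quickly drift apart,
# illustrating the near-chaotic sensitivity mentioned above.
N1, y1 = simulate_ricker(50, log_r=3.8)
N2, y2 = simulate_ricker(50, log_r=3.799)
```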
  • 16. The resulting likelihood can be difficult to explore if algorithms are badly initialized.
Figure: The loglikelihood is in black (Ricker model, plotted against log r).
  • 17. A change of paradigm, from S. Wood, Nature 2010:
"Naive methods of statistical inference try to make the model reproduce the exact course of the observed data in a way that the real system itself would not do if repeated."
"What is important is to identify a set of statistics that is sensitive to the scientifically important and repeatable features of the data, but insensitive to replicate-specific details of phase."
In other words, with complex, stochastic and/or chaotic models we could try to match features of the data, not the path of the data itself.
A similar approach is considered in ABC (approximate Bayesian computation).
  • 18. Synthetic likelihoods
y: observed data, from static or dynamic models.
s(y): (vector of) summary statistics of the data, e.g. mean, autocorrelations, marginal quantiles, etc.
Assume s(y) ∼ N(µ_θ, Σ_θ), an assumption justifiable via a second-order Taylor expansion (same as in Laplace approximations).
µ_θ and Σ_θ are unknown: estimate them via simulations.
  • 19. Figure: Schematic representation of the synthetic likelihoods procedure (from Wood, Nature 2010).
  • 20. For fixed θ:
simulate R artificial datasets y*_1, ..., y*_R from your model and compute the corresponding (possibly vector-valued) summaries s*_1, ..., s*_R;
compute
µ̂_θ = (1/R) Σ_{r=1}^R s*_r,   Σ̂_θ = 1/(R−1) Σ_{r=1}^R (s*_r − µ̂_θ)(s*_r − µ̂_θ)^T;
compute the statistics s_obs for the observed data y;
evaluate a multivariate Gaussian likelihood at s_obs:
lik_syn(θ) := N(s_obs; µ̂_θ, Σ̂_θ) ∝ |Σ̂_θ|^{−1/2} exp( −(s_obs − µ̂_θ)^T Σ̂_θ^{−1} (s_obs − µ̂_θ) / 2 ).
This likelihood can be maximized over θ or be plugged into an MCMC algorithm targeting π̂(θ | s_obs) ∝ lik_syn(θ) π(θ).
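The procedure above translates almost line-by-line into code. In this sketch, `simulate` and `summaries` are hypothetical user-supplied callables (model simulator and summary map), not functions defined anywhere in the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def synthetic_loglik(theta, s_obs, simulate, summaries, R=500, rng=None):
    # Wood-style synthetic log-likelihood: simulate R artificial datasets,
    # summarise them, fit a Gaussian to the summaries and evaluate it at s_obs.
    rng = rng or np.random.default_rng()
    S = np.array([summaries(simulate(theta, rng)) for _ in range(R)])  # R x d
    mu_hat = S.mean(axis=0)
    Sigma_hat = np.cov(S, rowvar=False)        # divides by R-1, as on the slide
    return multivariate_normal.logpdf(s_obs, mean=mu_hat, cov=Sigma_hat,
                                      allow_singular=True)
```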
  • 21. So the synthetic likelihood methodology:
assumes no specific knowledge of the probabilistic features of the model; it only assumes the ability to forward-generate from the model;
assumes that the analyst is able to specify "informative" summaries;
assumes that said summaries are (approximately) Gaussian, s ∼ N(·). Transforming the summaries to be approximately Gaussian is often not an issue (just as we do in linear regression).
Of course, the major issue (still open, also in ABC) is how to build informative summaries. This is left unsolved.
  • 22. I intend to use the synthetic likelihoods approach to enable likelihood-free inference using SAEM. This should allow SAEM to be applied to intractable-likelihood models.
  • 23. We use synthetic likelihoods to construct a Gaussian approximation over a set of complete summaries (S(Y), S(X)), defining the complete synthetic loglikelihood
log p(s; θ) = log N(s; µ(θ), Σ(θ)),   (2)
with s = (S(Y), S(X)).
In (2), µ(θ) and Σ(θ) are unknown but can be estimated using synthetic likelihoods (SL), conditionally on θ.
However, we need to obtain a maximizer of the (incomplete) synthetic loglikelihood log p(S(Y); θ).
  • 24. SAEM with synthetic likelihoods (SL)
For a given θ, SL returns estimates µ̂(θ) and Σ̂(θ) (sample mean and sample covariance).
Crucial result: for a Gaussian likelihood, µ̂(θ) and Σ̂(θ) are sufficient statistics for µ(θ) and Σ(θ). And a Gaussian is a member of the exponential family.
Recall: what SAEM does is update sufficient statistics, which is perfect for us! At the kth SAEM iteration:
µ̂^(k)(θ) = µ̂^(k−1)(θ) + γ_k (µ̂(θ) − µ̂^(k−1)(θ))   (3)
Σ̂^(k)(θ) = Σ̂^(k−1)(θ) + γ_k (Σ̂(θ) − Σ̂^(k−1)(θ)).   (4)
  • 25. Updating the latent variable X
At the kth iteration of SAEM we need to sample S(X^(k)) | S(Y). This is trivial! We have
S(X^(k)) | S(Y) ∼ N(µ̂^(k)_{x|y}(θ), Σ̂^(k)_{x|y}(θ))
where
µ̂^(k)_{x|y} = µ̂_x + Σ̂_{xy} Σ̂_y^{−1} (S(Y) − µ̂_y)
Σ̂^(k)_{x|y} = Σ̂_x − Σ̂_{xy} Σ̂_y^{−1} Σ̂_{yx}
and µ̂_x, µ̂_y, Σ̂_x, Σ̂_y, Σ̂_{xy} and Σ̂_{yx} are extracted from (µ̂^(k), Σ̂^(k)). That is,
µ̂^(k)(θ) = (µ̂_x, µ̂_y) and Σ̂^(k)(θ) = [ Σ̂_x, Σ̂_{xy} ; Σ̂_{yx}, Σ̂_y ].
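A minimal sketch of this conditional-Gaussian sampling step; placing the S(X) block first in the summary vector is an implementation choice of this sketch.

```python
import numpy as np

def sample_sx_given_sy(s_y, mu, Sigma, dim_x, rng=None):
    # Sample S(X) | S(Y) from the joint Gaussian N(mu, Sigma), where the first
    # dim_x coordinates of the summary vector correspond to S(X) and the
    # remaining ones to S(Y).
    rng = rng or np.random.default_rng()
    mu_x, mu_y = mu[:dim_x], mu[dim_x:]
    Sxx = Sigma[:dim_x, :dim_x]
    Sxy = Sigma[:dim_x, dim_x:]
    Syy = Sigma[dim_x:, dim_x:]
    A = Sxy @ np.linalg.inv(Syy)
    mu_cond = mu_x + A @ (s_y - mu_y)            # conditional mean
    Sigma_cond = Sxx - A @ Sxy.T                 # conditional covariance
    return rng.multivariate_normal(mu_cond, Sigma_cond)
```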
  • 26. The M-step
Now that we have simulated S(X^(k)) (conditionally on the data), let's produce the complete summaries at iteration k,
s^(k) := (S(Y), S(X^(k))),
and maximize (M-step) the complete synthetic loglikelihood:
θ̂^(k) = arg max_{θ∈Θ} log N(s^(k); µ(θ), Σ(θ)).   (5)
For each perturbation of θ, the M-step performs a synthetic likelihood simulation. It returns the best found maximizer of (5) and the corresponding best (µ̂, Σ̂). These are plugged into the moment-updating equations (3)-(4).
  • 27. The slide that follows describes a single iteration of SAEM-SL.
  • 28. Input: observed summaries S(Y), positive integers L and R, and values for θ̂^(k−1), µ̂^(k−1) and Σ̂^(k−1). Output: θ̂^(k).
At iteration k:
1. Extract µ̂_x, µ̂_y, Σ̂_x, Σ̂_y, Σ̂_{xy} and Σ̂_{yx} from µ̂^(k−1) and Σ̂^(k−1). Compute the conditional moments µ̂_{x|y}, Σ̂_{x|y}.
2. Sample S(X^(k−1)) | S(Y) ∼ N(µ̂^(k−1)_{x|y}(θ), Σ̂^(k−1)_{x|y}(θ)) and form s^(k−1) := (S(Y), S(X^(k−1))).
3. Obtain (θ^(k), µ̂^(k), Σ̂^(k)) from InternalSL(s^(k−1), θ̂^(k−1), R), starting at θ̂^(k−1).
4. Increase k := k + 1 and go to step 1.
Function InternalSL(s^(k−1), θ_start, R):
Input: s^(k−1), starting parameters θ_start, a positive integer R. Functions to compute the simulated summaries S(y*) and S(x*) must be available.
Output: the best found θ* maximizing log N(s^(k−1); µ̂, Σ̂) and the corresponding (µ̂*, Σ̂*). Here θ_c denotes a generic candidate value.
i. Simulate x*_r ∼ p_X(X_{0:N}; θ_c), y*_r ∼ p_{Y|X}(Y_{1:n} | X_{1:n}; θ_c) for r = 1, ..., R.
ii. Compute the user-defined summaries s*_r = (S(y*_r), S(x*_r)) for r = 1, ..., R. Construct the corresponding (µ̂, Σ̂).
iii. Evaluate log N(s^(k−1); µ̂, Σ̂).
Use a numerical procedure that performs (i)-(iii) L times to find the best θ* maximizing log N(s^(k−1); µ̂, Σ̂) for varying θ_c. Denote by (µ̂*, Σ̂*) the simulated moments corresponding to the best found θ*. Set θ^(k) := θ*.
iv. Update the moments:
µ̂^(k) = µ̂^(k−1) + γ_k (µ̂* − µ̂^(k−1))
Σ̂^(k) = Σ̂^(k−1) + γ_k (Σ̂* − Σ̂^(k−1)).
Return (θ^(k), µ̂^(k), Σ̂^(k)).
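The following is a rough Python sketch of one such iteration, reusing the `sample_sx_given_sy` and `gamma_schedule` helpers sketched earlier. `simulate_xy` and `summaries_xy` are hypothetical user-supplied callables (forward simulator and complete-summary map, with the X-summaries ordered first); the inner SL search is approximated with L = 10 Nelder-Mead iterations via scipy, and the moments at the accepted θ are obtained by one additional re-simulation, which is a simplification of the slide's bookkeeping.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def saem_sl_iteration(theta_prev, mu_prev, Sigma_prev, s_y_obs, dim_x, k,
                      simulate_xy, summaries_xy, R=500, rng=None):
    # One iteration of SAEM-SL (steps 1-4 above), as a sketch.
    rng = rng or np.random.default_rng()

    # Steps 1-2: sample S(X) | S(Y) under the current Gaussian approximation
    # and form the complete summary vector (X-summaries first, then S(Y)).
    s_x = sample_sx_given_sy(s_y_obs, mu_prev, Sigma_prev, dim_x, rng)
    s_complete = np.concatenate([s_x, s_y_obs])

    # Step 3 (M-step): maximise the complete synthetic loglikelihood over theta;
    # each evaluation re-estimates (mu_hat, Sigma_hat) from R forward simulations.
    def neg_complete_synloglik(theta):
        S = np.array([summaries_xy(*simulate_xy(theta, rng)) for _ in range(R)])
        mu_hat, Sigma_hat = S.mean(axis=0), np.cov(S, rowvar=False)
        return -multivariate_normal.logpdf(s_complete, mu_hat, Sigma_hat,
                                           allow_singular=True)

    res = minimize(neg_complete_synloglik, theta_prev, method="Nelder-Mead",
                   options={"maxiter": 10})            # L = 10 internal iterations
    theta_new = res.x

    # Re-simulate once at the accepted theta to obtain its (mu*, Sigma*).
    S = np.array([summaries_xy(*simulate_xy(theta_new, rng)) for _ in range(R)])
    mu_star, Sigma_star = S.mean(axis=0), np.cov(S, rowvar=False)

    # Step 4: stochastic-approximation update of the moments, eqs. (3)-(4).
    g = gamma_schedule(k)
    mu_new = mu_prev + g * (mu_star - mu_prev)
    Sigma_new = Sigma_prev + g * (Sigma_star - Sigma_prev)
    return theta_new, mu_new, Sigma_new
```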
  • 29. We have now completed all the steps required to implement a likelihood-free version of SAEM.
Main inference problem: it is not clear how to construct a set of informative summaries (S(Y), S(X)) for θ. These are user-defined, hence arbitrary.
Main computational bottleneck: compared to the regular SAEM, our M-step is a numerical optimization routine. We used Nelder-Mead, which is rather slow.
Ideal case (typically unattainable): if (1) s = (S(Y), S(X)) is jointly sufficient for θ and (2) s is multivariate Gaussian, then our likelihood-free SAEM converges to a stationary point of p(Y; θ) under the conditions given in Delyon et al. (1999).
  • 30. I have two examples to show:
a state-space model driven by an SDE: I compare SAEM-SL with the regular SAEM and with direct optimization of the synthetic likelihood;
a simple Gaussian state-space model: I compare SAEM-SL with the regular SAEM, iterated filtering and particle marginal methods.
A "static model" example is available in my paper³.
³ P. 2016. Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.
  • 31. Example: a nonlinear Gaussian state-space model
We study a standard toy model (e.g. Jasra et al. 2012⁴):
Y_j = X_j + σ_y ν_j,   j ≥ 1
X_j = 2 sin(e^{X_{j−1}}) + σ_x τ_j,
with ν_j, τ_j ∼ N(0, 1) i.i.d. and X_0 = 0. Here θ = (σ_x, σ_y).
⁴ Jasra, Singh, Martin and McCoy, 2012. Filtering via approximate Bayesian computation. Statistics and Computing.
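A minimal forward simulator for this toy model (a sketch; the random-seed handling is arbitrary):

```python
import numpy as np

def simulate_toy_ssm(n, sigma_x, sigma_y, rng=None):
    # Forward-simulate the nonlinear Gaussian state-space model:
    #   X_j = 2*sin(exp(X_{j-1})) + sigma_x * tau_j,   X_0 = 0
    #   Y_j = X_j + sigma_y * nu_j
    rng = rng or np.random.default_rng()
    X = np.zeros(n + 1)
    Y = np.zeros(n)
    for j in range(1, n + 1):
        X[j] = 2.0 * np.sin(np.exp(X[j - 1])) + sigma_x * rng.standard_normal()
        Y[j - 1] = X[j] + sigma_y * rng.standard_normal()
    return X, Y

# e.g. matching the setup on the next slide: X, Y = simulate_toy_ssm(50, 2.23, 2.23)
```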
  • 32. We generate n = 50 observations from the model with σ_x = σ_y = 2.23.
Figure: the simulated observations Y plotted against time.
  • 33. The standard SAEM
Let's set up the "standard" SAEM. We need the complete likelihood and its sufficient statistics. Easy for this model:
p(Y, X) = p(Y|X) p(X) = Π_{j=1}^n p(Y_j | X_j) p(X_j | X_{j−1})
Y_j | X_j ∼ N(X_j, σ_y²)
X_j | X_{j−1} ∼ N(2 sin(e^{X_{j−1}}), σ_x²)
S_{σ_x²} = Σ_{j=1}^n (X_j − 2 sin(e^{X_{j−1}}))² and S_{σ_y²} = Σ_{j=1}^n (Y_j − X_j)² are sufficient for σ_x² and σ_y².
  • 34. Plug the sufficient statistics into the complete (log)likelihood and set the gradient w.r.t. (σ_x², σ_y²) to zero. Explicit M-step at the kth iteration:
σ̂_x²^(k) = S_{σ_x²}/n,   σ̂_y²^(k) = S_{σ_y²}/n.
To run SAEM, the only thing left is a way to sample X^(k) | Y. For this we use sequential Monte Carlo, e.g. the bootstrap filter (in the backup slides, if needed). I skip this sampling step; just know that it is easily accomplished for state-space models.
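A minimal sketch of this SAEM update for the toy model, reusing the `gamma_schedule` helper from before; X is assumed to be a filtered path of length n+1 that includes X_0.

```python
import numpy as np

def saem_update_toy_ssm(s_prev, X, Y, k):
    # Complete-data sufficient statistics for the toy model:
    #   S_{sigma_x^2} = sum_j (X_j - 2*sin(exp(X_{j-1})))^2
    #   S_{sigma_y^2} = sum_j (Y_j - X_j)^2
    # followed by the stochastic-approximation update and the explicit M-step.
    n = len(Y)
    S_x = np.sum((X[1:] - 2.0 * np.sin(np.exp(X[:-1]))) ** 2)
    S_y = np.sum((Y - X[1:]) ** 2)
    g = gamma_schedule(k)                        # step sizes as sketched earlier
    s_new = s_prev + g * (np.array([S_x, S_y]) - s_prev)
    sigma_x2, sigma_y2 = s_new / n               # explicit M-step
    return s_new, np.sqrt(sigma_x2), np.sqrt(sigma_y2)
```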
  • 35. SAEM-SL: SAEM with synthetic likelihoods
To implement SAEM-SL, no knowledge of the complete likelihood is required, nor any analytic derivation of sufficient statistics. We just have to postulate some "reasonable" summaries for X and Y.
For each synthetic likelihood step, we simulate R = 500 realizations of S(X_r) and S(Y_r), containing:
the sample median of X_r, r = 1, ..., R; the median absolute deviation of X_r; the 10th, 20th, 75th and 90th percentiles of X_r;
the sample median of Y_r; the median absolute deviation of Y_r; the 10th, 20th, 75th and 90th percentiles of Y_r.
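These summaries are straightforward to compute; a small sketch for one simulated path:

```python
import numpy as np

def summary_stats(z):
    # The postulated summaries for one simulated path z (either X or Y):
    # median, median absolute deviation, and the 10th, 20th, 75th, 90th percentiles.
    med = np.median(z)
    mad = np.median(np.abs(z - med))
    q10, q20, q75, q90 = np.percentile(z, [10, 20, 75, 90])
    return np.array([med, mad, q10, q20, q75, q90])
```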
  • 36. Results with SAEM-SL on 30 different datasets
Starting parameter values are randomly initialised. Here R = 500.
Figure: trace plots for SAEM-SL (σ_x, left; σ_y, right) for the thirty estimation procedures. Horizontal lines are the true parameter values.
  • 37.
(M, M̄)               (500, 200)          (1000, 200)         (1000, 20)
σ_x (true value 2.23)
  SAEM-SMC            2.54 [2.53, 2.54]   2.55 [2.54, 2.56]   1.99 [1.85, 2.14]
  IF2                 1.26 [1.21, 1.41]   1.35 [1.28, 1.41]   1.35 [1.28, 1.41]
σ_y (true value 2.23)
  SAEM-SMC            0.11 [0.10, 0.13]   0.06 [0.06, 0.07]   1.23 [1.00, 1.39]
  IF2                 1.62 [1.56, 1.75]   1.64 [1.58, 1.67]   1.64 [1.58, 1.67]
Table: SAEM with bootstrap filter using M particles; IF2 = iterated filtering.

R                     500                 1000
σ_x (true value 2.23)
  SAEM-SL             1.67 [0.42, 1.97]   1.51 [0.82, 2.03]
σ_y (true value 2.23)
  SAEM-SL             2.40 [2.01, 2.63]   2.27 [1.57, 2.57]
Table: SAEM with synthetic likelihoods. K = 60 iterations.
  • 38. Example: a state-space SDE model [P., 2016]
We consider a one-dimensional state-space model driven by an SDE. Suppose we administer 4 mg of theophylline [Dose] to a subject. X_t is the level of theophylline concentration in blood at time t (hrs). Consider the following state-space model:
Y_j = X_j + ε_j,   ε_j ∼iid N(0, σ_ε²)
dX_t = (Dose·K_a·K_e/Cl · e^{−K_a t} − K_e X_t) dt + σ √X_t dW_t,   t ≥ t_0
K_e is the elimination rate constant, K_a is the absorption rate constant, Cl is the clearance of the drug, and σ is the intensity of the intrinsic stochastic noise.
  • 39. We simulate a set of n = 30 observations from the model at equispaced times. But how do we simulate from this model? No analytic solution of the SDE is available.
We resort to the Euler-Maruyama discretization with a small stepsize h = 0.05 on the time interval [0, 30]:
X_{t+h} = X_t + (Dose·K_a·K_e/Cl · e^{−K_a t} − K_e X_t) h + σ √X_t · Z_{t+h},   {Z_t} ∼iid N(0, h).
This implies a latent simulated process of length N: X_{0:N} = {X_0, X_h, ..., X_N}.
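A sketch of this Euler-Maruyama simulator; the values of K_a and X_0 below are illustrative assumptions only, not quantities stated on the slide.

```python
import numpy as np

def simulate_theophylline(Ke, Cl, sigma, T=30.0, h=0.05, Ka=1.48, Dose=4.0,
                          X0=0.0, rng=None):
    # Euler-Maruyama discretisation of
    #   dX_t = (Dose*Ka*Ke/Cl * exp(-Ka*t) - Ke*X_t) dt + sigma*sqrt(X_t) dW_t
    # on [0, T] with stepsize h. Ka and X0 are illustrative values.
    rng = rng or np.random.default_rng()
    N = int(T / h)
    X = np.empty(N + 1)
    X[0] = X0
    t = 0.0
    for i in range(N):
        drift = Dose * Ka * Ke / Cl * np.exp(-Ka * t) - Ke * X[i]
        diffusion = sigma * np.sqrt(max(X[i], 0.0))   # guard against tiny negatives
        X[i + 1] = X[i] + drift * h + diffusion * np.sqrt(h) * rng.standard_normal()
        t += h
    return X
```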
  • 40. A typical realization of the process:
Figure: data (circles) and the latent process (black line), plotted against time (hrs).
  • 41. The classic SAEM
Applying the "standard" SAEM is not really trivial here. The complete likelihood is
p(Y, X) = p(Y|X) p(X) = Π_{j=1}^n p(Y_j | X_j) · Π_{i=1}^N p(X_i | X_{i−1})
Y_j | X_j ∼ N(X_j, σ_ε²)
X_i | X_{i−1} ∼ not available.
Euler-Maruyama induces a Gaussian approximation:
p(x_i | x_{i−1}) ≈ 1/(σ √(2π x_{i−1} h)) · exp( −( x_i − x_{i−1} − (Dose·K_a·K_e/Cl · e^{−K_a τ_{i−1}} − K_e x_{i−1}) h )² / (2 σ² x_{i−1} h) ).
  • 42. The classic SAEM
I am not going to show how to obtain all the sufficient summary statistics (see the paper). Just trust me that it requires a bit of work. And this is just a one-dimensional model!
We sample X^(k) | Y using the bootstrap filter (a sequential Monte Carlo method). If you are not familiar with sequential Monte Carlo, worry not: just consider it a method returning a "best" filtered X^(k) based on Y (for linear Gaussian models you would use the Kalman filter).
  • 43. SAEM-SL with synthetic likelihoods
User-defined summaries for a simulation r: (s(x*_r), s(y*_r)).
s(x*_r) contains: (i) the median value of X*_{0:N}; (ii) the median absolute deviation of X*_{0:N}; (iii) a statistic for σ computed from X*_{0:N} (see the next slide); (iv) (Σ_j (Y*_j − X*_j)²/n)^{1/2}.
s(y*_r) contains: (i) the median value of y*_r; (ii) its median absolute deviation; (iii) the slope of the line connecting the first and last simulated observations, (Y*_n − Y*_1)/(t_n − t_1).
  • 44. In Miao (2014): for an SDE of the type dX_t = µ(X_t) dt + σ √(g(X_t)) dW_t with t ∈ [0, T], we have
Σ_Γ |X_{i+1} − X_i|² / Σ_Γ g(X_i)(t_{i+1} − t_i) → σ²   as |Γ| → 0,
where the convergence is in probability and Γ is a partition of [0, T].
We deduce that, using the discretization {X_0, X_1, ..., X_N} produced by the Euler-Maruyama scheme, we can take the square root of the left-hand side in the limit above, which should be informative for σ.
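A small sketch of the resulting statistic for σ, taking g(x) = x to match the theophylline SDE (diffusion σ√X_t):

```python
import numpy as np

def sigma_stat(X, h, g=lambda x: x):
    # Statistic for sigma based on the realised quadratic variation:
    #   sum |X_{i+1}-X_i|^2 / sum g(X_i)*(t_{i+1}-t_i)  ->  sigma^2  as the mesh -> 0.
    # For the theophylline SDE the diffusion is sigma*sqrt(X_t), so g(x) = x.
    num = np.sum(np.diff(X) ** 2)
    den = np.sum(g(X[:-1]) * h)
    return np.sqrt(num / den)
```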
  • 45. 100 different datasets are simulated from ground-truth parameters. All optimizations start away from the ground-truth values.
SAEM-SL: at each iteration the M-step simulates R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.
Figure: SAEM-SL trace plots for K_e, Cl, σ and σ_ε over the 100 estimation runs.
  • 46. SAEM-SMC, using the bootstrap filter with M = 500 particles to obtain X^(k) | Y. Cl and σ are essentially unidentified.
  • 47.
             K_e                   Cl                    σ                     σ_ε
true values  0.050                 0.040                 0.100                 0.319
SAEM-SMC     0.045 [0.042, 0.049]  0.085 [0.078, 0.094]  0.171 [0.158, 0.184]  0.395 [0.329, 0.465]
SAEM-SL      0.044 [0.038, 0.051]  0.033 [0.028, 0.039]  0.106 [0.083, 0.132]  0.266 [0.209, 0.307]
optim. SL    0.063 [0.054, 0.069]  0.089 [0.068, 0.110]  0.304 [0.249, 0.370]  0.543 [0.485, 0.625]
SAEM-SMC uses M = 500 particles to filter X^(k) | Y via SMC and runs for K = 300 SAEM iterations.
SAEM-SL: at each iteration the M-step simulates R = 500 summaries, with L = 10 Nelder-Mead iterations (M-step) and K = 100 SAEM iterations.
"optim. SL" denotes direct maximization of Wood's synthetic (incomplete) likelihood:
θ̂ = arg max_{θ∈Θ} log N(S(Y); µ(θ), Σ(θ)).   (6)
  • 48. How about Gaussianity of the summaries? Here we have normal qq-plots for the 7 postulated summaries at the obtained optimum (500 simulations each).
Figure: normal qq-plots of the summaries s_x^(1)-s_x^(4) and s_y^(1)-s_y^(3).
The summary quantiles nicely follow the line (not visible) for a perfect match with Gaussian quantiles.
  • 49. Summary
We introduced SAEM-SL, a version of SAEM that is able to deal with intractable likelihoods.
It only requires the formulation and simulation of "informative" summaries s. How to construct informative summaries automatically is a difficult open problem.
If the user-defined summaries s are sufficient for θ (very unlikely) and if s ∼ N(·), then SAEM-SL converges to the true maximum likelihood estimates for p(Y; θ).
The method can be used for intractable models, or even just to provide starting values for more refined algorithms (e.g. particle MCMC).
  • 50. Key references
Andrieu et al. 2010. Particle Markov chain Monte Carlo methods. JRSS-B.
Delyon, Lavielle and Moulines, 1999. Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics.
Dempster, Laird and Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. JRSS-B.
Ionides et al. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. PNAS.
Marin et al. 2012. Approximate Bayesian computational methods. Stat. Comput.
Picchini 2016. Likelihood-free stochastic approximation EM for inference in complex models, arXiv:1609.03508.
Wood 2010. Statistical inference for noisy nonlinear ecological dynamic systems. Nature.
  • 52. Justification of Gaussianity (Wood 2010)
Assuming Gaussianity for the summaries s(·) can be justified from a standard Taylor expansion. Say that f_θ(s) is the true (unknown) joint density of s. Expand log f_θ(s) around its mode µ_θ:
log f_θ(s) ≈ log f_θ(µ_θ) + (1/2)(s − µ_θ)^T [∂² log f_θ / ∂s ∂s^T] (s − µ_θ),
hence
f_θ(s) ≈ const × exp( −(1/2)(s − µ_θ)^T [−∂² log f_θ / ∂s ∂s^T] (s − µ_θ) ),
that is, s ∼ N( µ_θ, [−∂² log f_θ / ∂s ∂s^T]^{−1} ), approximately, when s ≈ µ_θ.
  • 53. Asymptotic properties of synthetic likelihoods (Wood 2010)
As the number of simulated statistics R → ∞:
the maximizer θ̂ of lik_syn(θ) is a consistent estimator;
θ̂ is an unbiased estimator;
θ̂ need not be Gaussian in general. It will be Gaussian if Σ_θ depends weakly on θ or when d = dim(s) is large.
  • 54. Algorithm 1: Bootstrap filter with M particles and threshold 1 ≤ M̄ ≤ M. Resamples only when ESS < M̄.
Step 0. Set j = 1: for m = 1, ..., M sample X_1^(m) ∼ p(X_0), compute the weights W_1^(m) = f(Y_1 | X_1^(m)) and normalize them, w_1^(m) := W_1^(m) / Σ_{m=1}^M W_1^(m).
Step 1. If ESS({w_j^(m)}) < M̄, then resample M particles {X_j^(m), w_j^(m)} and set W_j^(m) = 1/M. End if. Set j := j + 1; if j = n + 1, stop and return all constructed weights {W_j^(m)}_{m=1:M, j=1:n} to sample a single path. Otherwise go to step 2.
Step 2. For m = 1, ..., M sample X_j^(m) ∼ p(· | X_{j−1}^(m)). Compute W_j^(m) := w_{j−1}^(m) p(Y_j | X_j^(m)), normalize the weights, w_j^(m) := W_j^(m) / Σ_{m=1}^M W_j^(m), and go to step 1.
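A compact sketch of this bootstrap filter; `sample_x0`, `propagate` and `obs_dens` are hypothetical user-supplied, vectorized callables (prior sampler, transition sampler and observation density), and the filter returns the particle clouds and weights from which a single filtered path can then be drawn.

```python
import numpy as np

def bootstrap_filter(Y, M, Mbar, sample_x0, propagate, obs_dens, rng=None):
    # Bootstrap particle filter with ESS-based resampling (threshold Mbar <= M),
    # a sketch of Algorithm 1 above.
    rng = rng or np.random.default_rng()
    n = len(Y)
    X = sample_x0(M, rng)                       # step 0: initial particles
    W = obs_dens(Y[0], X)
    w = W / W.sum()
    history = [(X.copy(), w.copy())]
    for j in range(1, n):
        if 1.0 / np.sum(w ** 2) < Mbar:         # step 1: resample only if ESS < Mbar
            X = X[rng.choice(M, size=M, p=w)]
            w = np.full(M, 1.0 / M)
        X = propagate(X, rng)                   # step 2: propagate through the dynamics
        W = w * obs_dens(Y[j], X)               # reweight by the observation density
        w = W / W.sum()
        history.append((X.copy(), w.copy()))
    return history                              # particles and weights at each time
```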