My data are incomplete and noisy: Information-reduction statistical methods for knowledge extraction can save your day: tools and opportunities for modelling
We review parameter inference for stochastic modelling in complex scenarios, such as bad parameter initialization and near-chaotic dynamics. We show how state-of-the-art methods for state-space models can fail while, in some situations, reducing the data to summary statistics (information reduction) enables robust estimation. Wood's synthetic likelihood method is reviewed, and the lecture closes with an example of approximate Bayesian computation methodology.
Accompanying code is available at https://github.com/umbertopicchini/pomp-ricker and https://github.com/umbertopicchini/abc_g-and-k
Readership lecture given at Lund University on 7 June 2016. The lecture is of a popular-science nature, hence mathematical detail is kept to a minimum. However, numerous links and references are offered for further reading.
1. My data are incomplete and noisy!
Information-reduction statistical methods for knowledge extraction can save your day: tools and opportunities for modelling
Docent/Readership lecture
Lund, 7 June 2016
Umberto Picchini
Centre for Mathematical Sciences, Lund University
www.maths.lth.se/matstat/staff/umberto/
2. PREAMBLE: WHAT IS A DOCENT/READERSHIP LECTURE?
It's a lecture in a popular-science context. The target audience is research students within the entire Faculty of Science at Lund University. It should cover my own subject area, but not my own research output. My notation and level of exposition will be subject to the above.
3. I will discuss a few methods for parameter inference (aka calibration, the inverse problem) in the presence of incomplete and noisy data. However, I will start by explaining:
- what I mean by incomplete and noisy observations;
- why we use stochastic modelling;
- issues with state-of-the-art inference methods for dynamic models.
I will focus on two methods based on information reduction that use summary statistics of the data: synthetic likelihoods and approximate Bayesian computation. These are powerful, general and flexible methods that are also very easy to introduce to a general audience of researchers.
4. The goal is to show how state-of-the-art exact methods for parameter estimation can fail spectacularly in some scenarios, including:
- near-chaotic systems;
- noisy (stochastic) dynamical systems;
- badly initialized optimization or Bayesian MCMC algorithms.
But first, what do I mean by "incomplete data"?
5. INCOMPLETE OBSERVATIONS
There are many situations where we only have partial information about the system or scenario of interest. We oversimplify and call "incomplete" or "partially observed" those experiments where not all the variables of interest are observable. This could mean that at least one of the following holds:
1. some variables are completely unobserved (measurements for those variables are unavailable);
2. we have a system evolving continuously in time, but we only observe it at discrete time points;
3. the variables we observe are perturbed measurements of the actual variables of interest.
6. Consider a noise-perturbed "signal". Typically the values of our measurements do not exclusively represent what we are really interested in measuring. The unperturbed signal is unavailable because the measurements are corrupted with noise.
7. Therefore we have a system (think of a physical system) having
- an observable component Y;
- an unobservable/latent signal X;
- noise ε.
And of course, if we model time dynamics we can think of n discrete-time measurements
$$Y_{t_i} = X_{t_i} + \varepsilon_{t_i}, \qquad i = 1, \dots, n,$$
or, more generally,
$$Y_{t_i} = f(X_{t_i}, \varepsilon_{t_i})$$
for some arbitrary yet known function f(·).
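As a toy illustration of the decomposition above, here is a minimal R sketch that simulates a latent signal observed with additive Gaussian noise (the sine signal and the noise level are illustrative assumptions, not taken from the lecture):

```r
# Minimal sketch of the measurement model Y = X + eps: a smooth latent
# signal X, observed at n discrete times with additive Gaussian noise.
set.seed(0)
n  <- 100
ti <- seq(0, 10, length.out = n)
X  <- sin(ti)                       # latent signal (illustrative choice)
Y  <- X + rnorm(n, sd = 0.3)        # noisy discrete-time observations
plot(ti, Y, pch = 20)               # what we actually get to see
lines(ti, X, lwd = 2)               # what we would like to know
```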
8. AN EXAMPLE: CONCENTRATION OF A DRUG
Time course of a certain drug concentration in blood, measured at discrete times.
[Figure: measured drug concentration versus time in minutes.]
9. We may postulate a deterministic model:
$$\frac{dC_t}{dt} = -\mu C_t \quad\Longrightarrow\quad C_t = C_0 e^{-\mu t}, \qquad \mu > 0,$$
with measurements $Y_{t_i} = C_{t_i} + \varepsilon_{t_i}$, where $\varepsilon_{t_i} \sim N(0, \sigma^2_\varepsilon)$.
[Figure: the drug concentration data with the fitted deterministic decay curve superimposed.]
10. [Same model and plot as the previous slide.] There is some discrepancy with the fit (residual error).
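To make the example concrete, here is a minimal R sketch, with illustrative parameter values, that simulates noisy measurements from the deterministic decay model and recovers µ by nonlinear least squares:

```r
# Minimal sketch: simulate noisy measurements from C_t = C0 * exp(-mu * t)
# and estimate mu by nonlinear least squares. All values are illustrative.
set.seed(1)
C0    <- 100                                  # assumed known initial value
mu    <- 0.03                                 # "true" decay rate
sigma <- 5                                    # measurement noise sd
t_obs <- seq(0, 120, by = 12)                 # discrete observation times
y_dec <- C0 * exp(-mu * t_obs) + rnorm(length(t_obs), 0, sigma)

fit <- nls(y_dec ~ C0 * exp(-mu_hat * t_obs), start = list(mu_hat = 0.01))
summary(fit)                                  # estimate and s.e. for mu
```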
11. In the previous slides you might have noticed the introduction of a parameter µ > 0. Even though an expert researcher might have a clue about some reasonable value for µ, in reality this is at best nothing more than a guess.
In fact, when we previously wrote that we can assume
$$Y_{t_i} = f(X_{t_i}, \varepsilon_{t_i}) = f(C_{t_i}, \varepsilon_{t_i}),$$
most often what we have is
$$Y_{t_i} = f(X_{t_i}, \theta, \varepsilon_{t_i}) = f(C_{t_i}, \mu, \varepsilon_{t_i}),$$
that is, a dependence on an unknown (vector) parameter θ. In our example θ ≡ µ.
12. My main interest is to study and develop principled methods that return an estimate of θ together with an analysis of its uncertainty.
In particular, I am interested in modelling stochastic dynamics. In the previous example C_t evolved in a deterministic fashion, given a fixed starting value C_0 and a fixed value for µ. The next slide shows an alternative, stochastic approach.
14. Add systemic (white) noise:
$$\frac{dC(t)}{dt} = -\mu C(t) + \text{"white noise"},$$
and again $Y_{t_i} = C_{t_i} + \varepsilon_{t_i}$, with $\varepsilon_{t_i} \sim N(0, \sigma^2_\varepsilon)$.
[Figure: the drug concentration data with several stochastic trajectories superimposed.]
Dynamics are stochastic. The residual error can't be eliminated: it's always "there".
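One standard way to make the "white noise" precise is to write the model as the SDE $dC_t = -\mu C_t\,dt + \sigma_c\,dW_t$. Below is a minimal Euler-Maruyama sketch; the step size and parameter values are illustrative assumptions:

```r
# Minimal Euler-Maruyama sketch for a stochastic version of the decay model,
#   dC = -mu * C dt + sigma_c dW,
# observed with additive Gaussian measurement error.
set.seed(2)
mu <- 0.03; sigma_c <- 0.8; sigma_eps <- 5; C0 <- 100
dt     <- 0.1
t_grid <- seq(0, 120, by = dt)
C      <- numeric(length(t_grid)); C[1] <- C0
for (k in 2:length(t_grid))
  C[k] <- C[k-1] - mu * C[k-1] * dt + sigma_c * sqrt(dt) * rnorm(1)
t_obs <- seq(0, 120, by = 12)
idx   <- round(t_obs / dt) + 1                # grid indices of observations
y_sde <- C[idx] + rnorm(length(t_obs), 0, sigma_eps)
plot(t_grid, C, type = "l"); points(t_obs, y_sde, pch = 8)
```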
15. The problem with stochastic models is that:
1. realizations from the model might not really resemble the observed data...
2. this can happen even when we simulate using a θ which is close to its true value...
3. ...and even when we are simulating from the "true model" (simulation studies).
Although this is understandable, it is also upsetting when our methods are based on producing realizations meant to get close to the observations.
16. A NEARLY CHAOTIC MODEL
Two realizations from the Ricker model (discussed later), without measurement noise:
$$N_t = r \cdot N_{t-1} \cdot e^{-N_{t-1} + e_t}, \qquad e_t \sim N(0, \sigma^2).$$
Small changes in a parameter cause major departures from the data.
[Figure: one path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red) when σ = 0.]
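The sensitivity shown in the figure can be reproduced in a few lines. A minimal R sketch of the noise-free Ricker map (σ = 0), using the initial value N_0 = 7 quoted later in the deck; the two paths agree at first and then diverge completely:

```r
# Minimal sketch: near-chaotic sensitivity of the deterministic Ricker map.
# Two paths whose values of log(r) differ by only 0.001 soon diverge.
ricker_path <- function(logr, N0 = 7, Tmax = 50) {
  N <- numeric(Tmax); N[1] <- N0
  for (t in 2:Tmax) N[t] <- exp(logr) * N[t-1] * exp(-N[t-1])  # no noise
  N
}
matplot(cbind(ricker_path(3.8), ricker_path(3.799)), type = "l", lty = 1,
        col = c("black", "red"), xlab = "t", ylab = "N_t")
```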
17. Now, a modeller (say, a statistician):
- does not necessarily have the expertise nor the time to study the qualitative behaviour of the solutions of a series of candidate models;
- most often wants to test a range of possible models against the available data.
Simulations from the postulated models should produce trajectories approximately resembling the data. Parameter estimation should be performed to identify the "best fitting" parameters.
18. Notice that for many years even simple stochastic models have been terribly difficult to calibrate against data. It is usually impossible to write the likelihood function $p(y_{1:T} \mid \theta)$ in closed form. It is also impossible to write the filtering densities for the states, $p(x_{1:T} \mid y_{1:T}; \theta)$.
Notation:
- I am going to use lower-case letters both for random variables and for their realized values.
- I use $a_{1:T}$ to denote a sequence $(a_1, a_2, \dots, a_T)$, where T is the time horizon.
- I assume integer time indices to ease notation.
19. SSM: STATE-SPACE MODEL
[Diagram: a hidden Markov chain ... → X_{t−1} → X_t → X_{t+1} → ..., with each latent state X_t emitting an observation Y_t.]
$(Y_t \mid X_t = x_t) \sim p(y_t \mid x_t)$  (observation density)
$(X_{t+1} \mid X_t = x_t) \sim p(x_{t+1} \mid x_t)$  (transition density)
$X_0 \sim \pi(x_0)$  (initial distribution)
Example: Y_t is a subject's measured glycemia at time t; X_t is the actual glycemia at t.
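As a toy instance of this structure, the following sketch simulates a linear-Gaussian SSM; all distributional choices are illustrative assumptions, not taken from the lecture:

```r
# Minimal sketch of a state-space model: a Gaussian random-walk state X_t
# (the hidden Markov chain) and conditionally independent observations Y_t.
set.seed(3)
Tmax <- 100
X <- cumsum(rnorm(Tmax, 0, 0.5))    # X_{t+1} | X_t = x_t ~ N(x_t, 0.5^2)
Y <- X + rnorm(Tmax, 0, 1)          # Y_t | X_t = x_t ~ N(x_t, 1)
plot(Y, pch = 20); lines(X, lwd = 2)
```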
20. THE LIKELIHOOD FUNCTION 1/2
It turns out that, even for such a simple construct, it is difficult to write the likelihood function. In an SSM the data are not independent, they are only conditionally independent, which is a complication:
$$p(y_{1:T} \mid \theta) = p(y_1 \mid \theta) \prod_{t=2}^{T} p(y_t \mid y_{1:t-1}, \theta) = \;?$$
We don't have a closed-form expression for the product above because we do not know how to calculate $p(y_t \mid y_{1:t-1}, \theta)$. Let's see why.
21. THE LIKELIHOOD FUNCTION 2/2
In an SSM the observed process is assumed to depend on the latent Markov process {X_t}: we can write
$$p(y_{1:T} \mid \theta) = \int p(y_{1:T}, x_{0:T} \mid \theta)\, dx_{0:T} = \int \underbrace{p(y_{1:T} \mid x_{0:T}, \theta)}_{\text{use cond. indep.}} \times \underbrace{p(x_{0:T} \mid \theta)}_{\text{use Markovianity}}\, dx_{0:T}$$
$$= \int \prod_{t=1}^{T} p(y_t \mid x_t, \theta) \times \Big\{ p(x_0 \mid \theta) \prod_{t=1}^{T} p(x_t \mid x_{t-1}, \theta) \Big\}\, dx_{0:T}.$$
Problems:
- the expression above is a (T + 1)-dimensional integral;
- for most (nontrivial) models, the transition densities p(x_t | x_{t−1}) are unknown.
22. However, today we have quite a number of reliable Monte Carlo solutions to the integration problem. I am not going to introduce state-of-the-art methods for SSMs, but these are essentially based on particle filter (sequential Monte Carlo) methods:
- particle marginal methods and particle MCMC (Andrieu and Roberts 2009; Andrieu et al. 2010) for Bayesian inference;
- iterated filtering (Ionides et al. 2011, 2015) for maximum likelihood inference.
As shown in Andrieu and Roberts 2009:
1. obtain an approximation $\hat p(y_{1:T} \mid \theta)$ of the likelihood using particle filters;
2. plug $\hat p(y_{1:T} \mid \theta)$ into an MCMC algorithm for Bayesian inference on θ;
3. the MCMC then returns samples exactly from $\pi(\theta \mid y_{1:T})$.
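For intuition on step 1, here is a generic bootstrap particle filter sketched in base R; rinit, rprocess and dmeasure are hypothetical user-supplied model ingredients (this is not the API of pomp or any specific package), and no safeguard is included for the degenerate case where all weights are zero:

```r
# Minimal bootstrap particle filter sketch. Returns the log of the particle
# estimate of p(y_{1:T} | theta); exp of it is unbiased for the likelihood.
bootstrap_pf <- function(y, Np, rinit, rprocess, dmeasure) {
  x <- rinit(Np)                       # Np initial state particles
  loglik <- 0
  for (t in seq_along(y)) {
    x <- rprocess(x)                   # propagate particles forward
    w <- dmeasure(y[t], x)             # weight by the observation density
    loglik <- loglik + log(mean(w))    # accumulate likelihood estimate
    x <- x[sample.int(Np, Np, replace = TRUE, prob = w)]  # resample
  }
  loglik
}
```

Resampling at every step, proportionally to the weights, is what keeps the particle cloud concentrated on plausible state trajectories.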
23. If you are interested in a quick review of particle methods for parameter inference (not state inference), check my slides on SlideShare: http://goo.gl/4aZxL1
As I said, in this presentation I focus on synthetic likelihoods and approximate Bayesian computation. I now consider a simple example where the celebrated particle marginal methodology of Andrieu and Roberts¹, which is supposed to return exact Bayesian inference, does not work.
¹ C. Andrieu and G. Roberts 2009, Annals of Statistics 37(2).
24. THE STOCHASTIC RICKER MODEL
$$y_t \sim \mathrm{Pois}(\phi N_t)$$
$$N_t = r \cdot N_{t-1} \cdot e^{-N_{t-1} + e_t}, \qquad e_t \sim N(0, \sigma^2)$$
It can be used to describe the evolution in time of a population of size N_t:
- r is the intrinsic growth rate of the population;
- φ is a scale parameter;
- the process noise e_t can be interpreted as environmental noise; we assume the e_t are iid.
This is a hidden Markov model, as the dynamics of {N_t} are Markovian and we assume the measurements y_{1:T} to be conditionally independent given {N_t}.
25. Here is a realization of length T = 50 from the Ricker model.
[Figure: simulated observations y_t, latent states N_t and process noise e_t over 50 time steps.]
Ground-truth parameters are log r = 3.8, φ = 10 and σ = 0.3 (imposing N_0 = 7 and e_0 = 0). Same settings as in S. Wood 2010².
² S. Wood 2010, Nature 466, pp. 1102-1104.
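A minimal standalone sketch (not the code from the repository) that simulates such a realization with the ground-truth values above, and wires the Ricker model into the generic bootstrap_pf() sketched earlier:

```r
# Minimal sketch: simulate T = 50 observations from the stochastic Ricker
# state-space model with log r = 3.8, phi = 10, sigma = 0.3, N0 = 7.
set.seed(4)
Tmax <- 50; logr <- 3.8; phi <- 10; sigma <- 0.3
N <- numeric(Tmax); Nprev <- 7
for (t in 1:Tmax) {
  N[t]  <- exp(logr) * Nprev * exp(-Nprev + rnorm(1, 0, sigma))
  Nprev <- N[t]
}
y <- rpois(Tmax, phi * N)                 # Poisson observations

# Ricker ingredients for the generic bootstrap_pf() above:
rinit    <- function(Np) rep(7, Np)
rprocess <- function(x) exp(logr) * x * exp(-x + rnorm(length(x), 0, sigma))
dmeasure <- function(yt, x) dpois(yt, phi * x)
bootstrap_pf(y, Np = 1000, rinit, rprocess, dmeasure)  # log-lik estimate
```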
26. SOFTWARE CODE
The simulation results in the next three slides can be reproduced using the code at https://github.com/umbertopicchini/pomp-ricker
27. RUNNING pomp ricker − pmcmc.R FOR EXACT BAYESIAN INFERENCE
Suppose we are only interested in estimating the parameter r from the data, while the remaining quantities are fixed at their ground-truth values. Particle MCMC works well here, so exact inference is possible. Here is an MCMC chain of length 2000, with the likelihood estimated via the bootstrap filter (1000 particles); we let r start at r = 12.2. The true value is r = 44.7; the estimated posterior mean is 44.6 [38.5, 51.7].
[Figure: trace plot of the chain for r over 2000 iterations.]
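The deck uses the pomp package; as a package-free illustration of the same idea, here is a minimal pseudo-marginal Metropolis-Hastings sketch reusing bootstrap_pf() and the Ricker ingredients above. The flat prior on r > 0 and the random-walk step size are illustrative assumptions, not the settings used in the deck:

```r
# Minimal pseudo-marginal MH sketch targeting the posterior of r, with the
# other parameters (phi, sigma) fixed at their ground-truth values.
pmmh <- function(y, niter, r0, step = 2, Np = 1000) {
  rproc <- function(r) function(x) r * x * exp(-x + rnorm(length(x), 0, sigma))
  chain <- numeric(niter)
  r  <- r0
  ll <- bootstrap_pf(y, Np, rinit, rproc(r), dmeasure)
  for (i in 1:niter) {
    r_new <- r + rnorm(1, 0, step)              # random-walk proposal
    if (r_new > 0) {
      ll_new <- bootstrap_pf(y, Np, rinit, rproc(r_new), dmeasure)
      if (log(runif(1)) < ll_new - ll) { r <- r_new; ll <- ll_new }
    }
    chain[i] <- r
  }
  chain
}
out <- pmmh(y, niter = 2000, r0 = 12.2)  # started at r = 12.2, as in the deck
```

With the σ = 0.3 dataset the chain should drift towards the true r = 44.7; rerunning the same sketch on data generated with σ = 0.01 should reproduce the stuck-chain pathology shown in the next slides.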
28. So all is good in the considered example. We might reasonably imagine that if our model is "less stochastic" (a smaller σ), it should be even easier to conduct inference. Recall that
$$N_t = r \cdot N_{t-1} \cdot e^{-N_{t-1} + e_t}, \qquad e_t \sim N(0, \sigma^2).$$
Modellers don't know a priori the values of the underlying parameters. Suppose we now use data generated with σ = 0.01 (instead of σ = 0.3).
29. Here is what happens under the same conditions as before, except that the data were generated with σ = 0.01.
[Figure: PMCMC convergence diagnostics: trace plots of the log-likelihood, log-prior, number of filtering failures (nfail) and r over 2000 PMCMC iterations.]
The chain for r (lower panel) is stuck at the wrong mode! r ≈ 18, hence log r ≈ 2.9. It gets stuck even if we use more particles.
30. As beautifully illustrated in Fasiolo et al.³, the very interesting reason why the estimation fails for nearly deterministic dynamics is the following:
[Figure: the true likelihood for log r (black) and the particle filter approximation (red), for σ = 0.3, 0.1, 0.05 and 0.01. Figure from Fasiolo et al. 2016.]
³ M. Fasiolo, N. Pya and S. Wood 2016, Statistical Science 31(1), pp. 96-118.
31. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY
However the filter being unable to approximate the likelihood is only
consequence of something more subtle: look at the exact loglikelihood
for non-stochastic dynamics, i.e. here σ = 0.
[Figure: the true log-likelihood (black) when σ = 0, with the bifurcation diagram (grey), for the Ricker model and other models considered in Fasiolo et al. 2016.]
Go here if you are unfamiliar with bifurcations.
The instability of some models at small noise levels (small σ) means that small perturbations of the parameters produce major differences in the simulated trajectories.
Figure: One path generated with log r = 3.8 (black) and one generated with log r = 3.799 (red) when σ = 0.
A CHANGE OF PARADIGM
from S. Wood, Nature 2010:
“Naive methods of statistical inference try to make the model
reproduce the exact course of the observed data in a way that the real
system itself would not do if repeated.”
“What is important is to identify a set of statistics that is sensitive to
the scientifically important and repeatable features of the data, but
insensitive to replicate-specific details of phase.”
In other words, with complex, stochastic and/or chaotic models we should try to match features of the data, not the data path itself.
SYNTHETIC LIKELIHOODS
y: observed data, from a static or dynamic model.
s(y): (vector of) summary statistics of the data, e.g. mean, autocorrelations, marginal quantiles, etc.
Assume
s(y) ∼ N(μ_θ, Σ_θ),
an assumption justifiable via a second-order Taylor expansion (the same as in Laplace approximations).
μ_θ and Σ_θ are unknown: estimate them via simulations.
For fixed θ we simulate Nr artificial datasets y*_1, ..., y*_{Nr} and compute the corresponding (possibly vector-valued) summaries s*_1, ..., s*_{Nr}.
We then compute
μ̂_θ = (1/Nr) Σ_{i=1}^{Nr} s*_i,   Σ̂_θ = (1/(Nr − 1)) Σ_{i=1}^{Nr} (s*_i − μ̂_θ)(s*_i − μ̂_θ)ᵀ,
compute the statistics s_obs for the observed data y, and evaluate a multivariate Gaussian likelihood at s_obs:
lik_s(θ) := exp(l_s(θ)) = N(s_obs; μ̂_θ, Σ̂_θ) ∝ |Σ̂_θ|^{−1/2} exp(−(s_obs − μ̂_θ)ᵀ Σ̂_θ^{−1} (s_obs − μ̂_θ)/2).
This likelihood can be maximized over θ, or plugged into an MCMC algorithm for Bayesian inference, targeting π̂(θ|s_obs) ∝ lik_s(θ)π(θ).
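As a concrete illustration, here is a minimal R sketch of a single synthetic-likelihood evaluation. It assumes hypothetical helper functions simulate_model() and summaries() (a model simulator and a summary-statistics function) and the mvtnorm package; it is a sketch, not the implementation from the accompanying code.

# One synthetic-likelihood evaluation at a given theta.
synthetic_loglik <- function(theta, s_obs, Nr = 500) {
  # simulate Nr datasets at theta and reduce each one to its summary statistics
  S <- t(replicate(Nr, summaries(simulate_model(theta))))
  mu_hat    <- colMeans(S)  # estimate of mu_theta
  Sigma_hat <- cov(S)       # estimate of Sigma_theta (divides by Nr - 1)
  # Gaussian log-density of the observed summaries under (mu_hat, Sigma_hat)
  mvtnorm::dmvnorm(s_obs, mean = mu_hat, sigma = Sigma_hat, log = TRUE)
}

Returning the log-likelihood avoids numerical underflow when this quantity is later maximized or used inside MCMC.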
For the Ricker model, Wood (2010) uses 13 summaries (a sketch of how a few could be computed follows below), including:
the sample mean of the observations ȳ;
the number of zeros in the dataset;
the autocovariances up to lag 5;
the estimates of β0 and β1 from fitting the regression
E(y_t^{0.3}) = β0 · y_{t−1}^{0.3} + β1 · y_{t−1}^{0.6};
and a few more summaries (not important to mention here, but you get the idea).
So we have s_obs = (ȳ, #zeros, autocovariances, β0, β1, ...).
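As a rough illustration, here is a sketch of a few of these summaries in R (my own illustration, not Wood's exact code; ricker_like_summaries() is a hypothetical name, and only a subset of the 13 statistics is computed):

# A few of Wood's summaries for a numeric series y.
ricker_like_summaries <- function(y) {
  acv <- acf(y, lag.max = 5, type = "covariance", plot = FALSE)$acf[-1]  # lags 1..5
  y0 <- y[-length(y)]  # y_{t-1}
  y1 <- y[-1]          # y_t
  fit <- lm(I(y1^0.3) ~ 0 + I(y0^0.3) + I(y0^0.6))  # the no-intercept regression above
  c(mean(y), sum(y == 0), acv, coef(fit))
}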
RUNNING pomp ricker − synlik.R
We consider the dataset generated with the settings where particle methods failed (same starting values, etc.), i.e. σ = 0.01.
Here are the synthetic likelihood estimates:
                 r      σ      φ
starting value   12.18  1      20
true value       44.7   0.01   10
SL estimate      45.25  0.23   10.16
We maximised lik_s(θ) using Nelder–Mead with Nr = 5000 simulations.
Standard errors could be obtained via profile likelihood (not reported).
BAYESIAN SYNTHETIC LIKELIHOODS
Just plug lik_s(θ) into a Metropolis–Hastings MCMC procedure to sample from π̂(θ|s_obs) ∝ lik_s(θ)π(θ).
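A minimal sketch of such a sampler, reusing the synthetic_loglik() sketch from earlier; log_prior() is a hypothetical user-supplied function, and the Gaussian random-walk proposal is an arbitrary choice:

# Random-walk Metropolis-Hastings driven by a synthetic log-likelihood.
sl_mcmc <- function(theta0, s_obs, n_iter = 2000, step = 0.1) {
  d <- length(theta0)
  chain <- matrix(NA_real_, n_iter, d)
  chain[1, ] <- theta0
  lp_old <- synthetic_loglik(theta0, s_obs) + log_prior(theta0)
  for (i in 2:n_iter) {
    prop <- chain[i - 1, ] + rnorm(d, sd = step)          # propose a move
    lp_new <- synthetic_loglik(prop, s_obs) + log_prior(prop)
    # the synthetic likelihood is re-estimated at every iteration, giving a
    # pseudo-marginal-flavoured scheme (studied in Price et al. 2016)
    if (log(runif(1)) < lp_new - lp_old) {
      chain[i, ] <- prop
      lp_old <- lp_new
    } else {
      chain[i, ] <- chain[i - 1, ]
    }
  }
  chain
}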
The table and figure below are from Price et al. (2016). They consider the "nicely behaved" data obtained with σ = 0.3.
Table (from Price et al. 2016; paired entries are BSL/uBSL, and "-" indicates a result not available for uBSL because the value of n is too small; the last column is the standard deviation of the estimated log SL at θ = (3.8, 10, 0.3)):

n    acc. rate (%)  ESS log r  ESS σe  ESS φ  sd of estimated log SL
20   2.4/1.8        11/8       13/10   18/13  large
30   8.2/7.7        23/22      27/25   38/31  4.1
40   12.5/12.6      29/28      34/32   44/42  2.5
50   15.9/15.7      30/31      35/35   45/45  1.9
80   21.8/21.4      29/29      34/32   47/47  1.2
100  24.1/23.7      25/27      30/30   41/39  1.0
250  29.6/29.5      13/14      15/17   20/23  0.6
Figure (from Price et al. 2016): posterior estimates for (a) log r, (b) φ and (c) σe of the Ricker model when using BSL with n = 30, 100, 250. True parameter values: log r = 3.8, φ = 10, σ = 0.3.
APPROXIMATE BAYESIAN COMPUTATION (ABC)
Another methodology, also using information-reduction via
summaries, is Approximate Bayesian Computation (ABC).
ABC has received enormous interest. A list of review papers is
provided in the references section.
You can also check my own intro to ABC.
LIKELIHOOD-FREE REJECTION SAMPLING
Recall y is the data.
We wish to sample from π(θ|y) ∝ p(y|θ)π(θ).
1. simulate from the prior: θ* ∼ π(θ)
2. plug θ* into your model and simulate artificial data y* [this is the same as writing y* ∼ p(y|θ*)]
3. if y* = y, store θ*. Go to step 1 and repeat.
The above is a likelihood-free algorithm: it does not require knowledge of the expression of the likelihood p(y|θ).
Each accepted θ* is such that θ* ∼ π(θ|y) exactly.
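A minimal R sketch, assuming hypothetical helpers prior_sample() (draws θ* from the prior) and model_sample() (simulates y* given θ*); exact matching is only feasible for low-dimensional discrete data:

# Likelihood-free rejection sampling with exact matching of the data.
lf_rejection <- function(y, n_accept = 100) {
  accepted <- vector("list", n_accept)
  k <- 0
  while (k < n_accept) {
    theta_star <- prior_sample()
    y_star <- model_sample(theta_star)
    if (identical(y_star, y)) {  # accept only on an exact match y* = y
      k <- k + 1
      accepted[[k]] <- theta_star
    }
  }
  do.call(rbind, accepted)  # one accepted draw per row
}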
ABC REJECTION (PRITCHARD ET AL.⁴)
Same as before, but comparing s(y) with s(y*) for "appropriate" summaries s(·) and a small tolerance ε > 0.
1. simulate from the prior: θ* ∼ π(θ)
2. simulate y* ∼ p(y|θ*), compute s(y*)
3. if ‖s(y*) − s(y)‖ < ε, store θ*. Go to 1 and repeat.
Samples are from π_ε(θ|s(y)) with
π_ε(θ|s(y)) ∝ ∫_Y p(y*|θ) π(θ) I_{A_{ε,y}}(y*) dy*,
A_{ε,y} = {y* ∈ Y : ‖s(y*) − s(y)‖ < ε}.
⁴ Pritchard et al. 1999, Molecular Biology and Evolution, 16:1791–1798.
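The corresponding R sketch, with the same hypothetical helpers as before, a user-supplied summary function s(), and a Euclidean distance between summaries (one common choice among several):

# ABC rejection: accept theta* when the simulated summaries fall within eps.
abc_rejection <- function(y, s, eps, n_accept = 100) {
  s_obs <- s(y)
  accepted <- vector("list", n_accept)
  k <- 0
  while (k < n_accept) {
    theta_star <- prior_sample()
    s_star <- s(model_sample(theta_star))
    if (sqrt(sum((s_star - s_obs)^2)) < eps) {  # Euclidean distance between summaries
      k <- k + 1
      accepted[[k]] <- theta_star
    }
  }
  do.call(rbind, accepted)
}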
EXAMPLE: g-AND-k DISTRIBUTIONS
Mixtures of Gaussians are often used to describe complex, nonstandard distributions.
a mixture of two Gaussians requires specifying 5 parameters;
apparently, it is sometimes challenging to estimate such parameters due to unidentifiability (Marin and Robert⁵);
simulating (sampling) from mixtures is an optimization problem (it can be computationally intensive).
⁵ Marin and Robert, Bayesian Core, Springer 2007.
g-and-k distributions only require 4 parameters, and it is fast to simulate from them.
g-and-k distributions have no closed-form likelihood, but we can simulate from them via their quantile function:
F⁻¹(x; A, B, g, k) = A + B · [1 + 0.8 · (1 − exp(−g·r(x))) / (1 + exp(−g·r(x)))] · (1 + r²(x))^k · r(x)
with r(x) the xth quantile of a standard Gaussian N(0, 1).
[Figure: histogram of n = 10,000 samples generated with A = 3, B = 1, g = 2, k = 0.5.]
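Since F⁻¹ is explicit, inverse-CDF sampling is immediate. A minimal R sketch (the constant 0.8 is the conventional fixed value of the additional c parameter, as in the formula above; rgk() is a hypothetical name):

# Simulate n draws from a g-and-k distribution via its quantile function.
rgk <- function(n, A, B, g, k) {
  z <- rnorm(n)  # r(x) with x ~ Uniform(0,1) is just a standard normal draw
  A + B * (1 + 0.8 * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}
hist(rgk(1e4, A = 3, B = 1, g = 2, k = 0.5), breaks = 50)  # a histogram like the one above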
I wrote an MCMC algorithm using ABC, with summaries the 20th, 40th, 60th and 80th empirical quantiles and the sample skewness:
s(y) = (q20, q40, q60, q80, skew(y)).
Again, at each MCMC iteration we simulate datasets y*, compute s(y*) and compare them with s(y).
I let the threshold ε decrease progressively during the algorithm.
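These summaries are straightforward to compute; a sketch follows (moments::skewness is one possible skewness estimator, an assumption on my part; any consistent estimator would do):

# The five summary statistics used in the ABC-MCMC run.
abc_summaries <- function(y) {
  c(quantile(y, probs = c(0.2, 0.4, 0.6, 0.8)),  # empirical quantiles
    moments::skewness(y))                        # sample skewness
}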
Red lines are true parameter values.
[Figure: ABC-MCMC trace plots for A, B, g and k (iterations on the horizontal axis, in units of 10⁴).]
It is evident that I reduced ε at iteration 10,000 and then again at iteration 20,000.
SUMMARY
We have discussed plug-and-play methods that only require the ability to simulate from the model.
How to choose the summaries s(·) is a delicate and open problem. However, a considerable literature is available.
Information-reduction methods are very general and in many cases are the only possible option for inference.
ABC and SL can be used for a short pilot run, to obtain a better starting value for θ to be used in more accurate methods. For example, Owen et al.⁶ use ABC to obtain a starting value for particle MCMC.
⁶ Owen et al. 2015. Likelihood free inference for Markov processes: a comparison. Statistical Applications in Genetics and Molecular Biology, 14(2), pp. 189–209.
Our suggestion, when dealing with a new model or a new dataset, is not to settle on a single methodology, but instead to try a battery of algorithms.
This requires method developers to share their code, to ease the reproducibility of methods.
This is still not common practice.
If you wish to review some of the presented concepts, slides for this presentation are available on SlideShare at http://goo.gl/gg0Cqq.
R code to reproduce the Ricker model results is at https://github.com/umbertopicchini/pomp-ricker/
MATLAB code for ABC and g-and-k distributions is at https://github.com/umbertopicchini/abc_g-and-k/
Feel free to write me or drop by⁷ for a chat.
⁷ umberto@maths.lth.se, office MH:321, Matematikcentrum
50. INTRODUCTION GOAL INCOMPLETE DATA STOCHASTIC MODELLING SYNTHETIC LIKELIHOODS ABC SUMMARY
If you wish to review some of the presented concepts, slides for this
presentation are available on SlideShare at
http://goo.gl/gg0Cqq.
R code to reproduce the Ricker model results is at
https://github.com/umbertopicchini/pomp-ricker/
MATLAB code for ABC and g-and-k distributions is at
https://github.com/umbertopicchini/abc_g-and-k/
Feel free to write me or drop by7
for a chat.
Thank You
REFERENCES ON SYNTHETIC LIKELIHOODS
1. Fasiolo, M., Pya, N. and Wood, S.N., 2016. A Comparison of
Inferential Methods for Highly Nonlinear State Space Models in
Ecology and Epidemiology. Statistical Science, 31(1), pp.96-118.
2. Meeds, E. and Welling, M., 2014. GPS-ABC: Gaussian process
surrogate approximate Bayesian computation. arXiv:1401.2838.
3. Price, L.F., Drovandi, C.C., Lee, A. and Nott, D.J., 2016. Bayesian
synthetic likelihood. http://eprints.qut.edu.au/92795/
4. Wood, S.N., 2010. Statistical inference for noisy nonlinear
ecological dynamic systems. Nature, 466(7310), pp.1102-1104.
(4) is where SL was first introduced. Our talk is largely based on (1).
(2) merges SL with ABC. (3) studies the performance of SL in a (pseudo-marginal) Bayesian setting.
SOFTWARE FOR SYNTHETIC LIKELIHOODS
R package pomp: http://kingaa.github.io/pomp
R package synlik:
https://mfasiolo.github.io/synlik_release/
ABC SOFTWARE
EasyABC, R package. Research article.
abc, R package. Research article.
abctools, R package. Research article. Focusses on tuning.
Lists with more options here and here.
Examples with implemented model simulators (useful to incorporate in your programs).
REVIEWS ON ABC
Fairly extensive but accessible reviews:
1. Sisson and Fan 2010
2. (with applications in ecology) Beaumont 2010
3. Marin et al. 2010
Simpler introductions:
1. Sunnåker et al. 2013
2. (with applications in ecology) Hartig et al. 2013
Review specific for dynamical models:
1. Jasra 2015
NON-REVIEWS ON ABC, SPECIFIC FOR DYNAMICAL
MODELS
1. SMC for Parameter estimation and model comparison:
Toni et al. 2009
2. Markov models: White et al. 2015
3. SMC: Sisson et al. 2007
4. SMC: Dean et al. 2014
5. SMC: Jasra et al. 2010
6. MCMC: Picchini 2013
MORE SPECIALISED ABC RESOURCES
selection of summary statistics: Fearnhead and Prangle
2012.
review on summary statistics selection: Blum et al. 2013
expectation-propagation ABC: Barthelme and Chopin 2012
Gaussian Processes ABC: Meeds and Welling 2014
ABC model choice: Pudlo et al 2015
BLOG POSTS AND SLIDES ON ABC
1. Christian P. Robert often blogs about ABC (and beyond:
it’s a fantastic blog!)
2. an intro to ABC by Darren J. Wilkinson
3. Two posts by Rasmus Bååth here and here
4. my own intro to ABC
5. Tons of slides at SlideShare.
VERIFY SUMMARIES DISTRIBUTION
The crucial assumption used in synthetic likelihoods is that s ∼ N(μ_θ, Σ_θ).
We could check that (see the appendix in Wood 2010 for details)
(s* − μ̂_θ)ᵀ Σ̂_θ⁻¹ (s* − μ̂_θ) ∼ χ²_d,   with d = dim(s).
That is, for the Nr simulated summaries, plot the ordered values of (s*_j − μ̂_θ)ᵀ Σ̂_θ⁻¹ (s*_j − μ̂_θ), j = 1, ..., Nr, against the quantiles of χ²_d.
For graphical purposes this can be done on the log scale.
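A sketch of this check in base R, where S is an Nr × d matrix of simulated summaries (one replicate per row; check_sl_gaussianity() is a hypothetical name):

# Chi-squared Q-Q check of the Gaussianity assumption for the summaries.
check_sl_gaussianity <- function(S) {
  mu_hat    <- colMeans(S)
  Sigma_hat <- cov(S)
  m2 <- mahalanobis(S, center = mu_hat, cov = Sigma_hat)  # squared Mahalanobis distances
  qq <- qchisq(ppoints(nrow(S)), df = ncol(S))            # chi-squared_d quantiles
  plot(log(qq), log(sort(m2)),
       xlab = "log theoretical quantiles", ylab = "log observed quantiles")
  abline(0, 1, lty = 2)  # points near this line support the assumption
}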
We could also:
separately check that each dimension of the simulated summaries is approximately normal (basically, produce d separate Q-Q plots);
do the same for the observed summaries.
The figure below shows the three types of quantile plots for the Ricker model when log(r) = 3.8, φ = 10, σ = 0.3.
Recall that for the Ricker model we have d = 13 summaries.
Figure: Upper left: log χ²_d quantiles against the log observed quantiles. Upper right: N(0, 1) quantiles, separately for the 13 simulated summaries. Lower left: N(0, 1) quantiles for the observed summaries.
JUSTIFICATION OF GAUSSIANITY
Assuming Gaussianity for the summaries s(·) can be justified via a standard Taylor expansion.
Say that f_θ(s) is the true (unknown) joint density of s. Expand log f_θ(s) around its mode μ_θ (the first-order term vanishes at the mode):
log f_θ(s) ≈ log f_θ(μ_θ) + (1/2) (s − μ_θ)ᵀ [∂² log f_θ / ∂s ∂sᵀ] (s − μ_θ)
hence
f_θ(s) ≈ const × exp{ −(1/2) (s − μ_θ)ᵀ [−∂² log f_θ / ∂s ∂sᵀ] (s − μ_θ) }
that is,
s ∼ N(μ_θ, [−∂² log f_θ / ∂s ∂sᵀ]⁻¹), approximately, when s ≈ μ_θ.
ASYMPTOTIC PROPERTIES FOR SYNTHETIC LIKELIHOODS
As the number of simulated statistics Nr → ∞:
the maximizer θ̂ of lik_s(θ) is a consistent estimator;
θ̂ is an unbiased estimator;
θ̂ is not in general Gaussian. It will be (approximately) Gaussian if Σ_θ depends weakly on θ or when d = dim(s) is large.