Bayesian inference for mixed-effects
models driven by SDEs and other
stochastic models: a scalable approach.
Umberto Picchini
Dept. Mathematical Sciences, Chalmers and Gothenburg University
@uPicchini
Statistics seminar at Maths dept., Bristol University, 1 April, 2022
A classical problem of interest in biomedicine is the analysis of
repeated measurements data.
For example, modelling repeated measurements of drug concentrations (pharmacokinetics/pharmacodynamics).
Here we have concentrations of theophylline across 12 subjects.
Tumor growth in mice 1.
[Figure: log volume (mm³) vs days; group 3.]
Modelling tumor growth on 8 mice (we compared between
different treatments).
1 P. and Forman (2019). Journal of the Royal Statistical Society: Series C.
Neuronal data:
Figure 1: Depolarization [mV] vs time [sec].
We may focus on what happens between spikes, the so-called inter-spike intervals (ISIs).
Inter-spikes-intervals data (ISIs):
[Figure: depolarization (mV) vs time (msec).]
Figure 2: Observations from 100 ISIs.
P, Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion
leaky integrate-and-fire neuronal model for a slowly fluctuating signal.
Neural Computation, 20(11), 2696-2714.
With mixed-effects models (aka random-effects) we fit
simultaneously discretely observed data from M “subjects”
(= units).
The reason to do this is to perform inference at the population
level and better account for all information at hand.
Assume for example that for some covariate X

    y^i_j = X^i_j(φ^i) + ε^i_j,    i = 1, ..., M;  j = 1, ..., n_i,

where y^i_j is observation j in unit i, and

    φ^i ∼ N(η, σ²_η)    (individual random effects).

The random effects have “population mean” η and “population variance” σ²_η.
It is typically of interest to estimate the population parameters (η, σ²_η), not the subject-specific φ^i.
So in this case each trajectory is guided by its own φ^i.
However, all trajectories have something in common, the shared parameters (η, σ²_η), since each φ^i ∼ N(η, σ²_η).
Mixed-effects methodology is now standard: about 40 years of literature are available.
It can turn tricky, though, to use this methodology when the data are observations from stochastic processes.
For example, when the mixed-effects model is driven by stochastic differential equations (SDEs).
There exist about 50 papers on fitting SDEs with mixed effects, but these always impose some constraint that makes the models not very general.
https://umbertopicchini.github.io/sdemem/
(this slide: courtesy of Susanne Ditlevsen)
The concentration of a drug in blood.
[Figure: observed drug concentrations vs time in minutes, four panels.]

1. Exponential decay:
    dC(t)/dt = −µC(t),    C(t) = C(0) e^{−µt}.

2. Exponential decay with noise:
    dC(t) = −µC(t) dt + σC(t) dW(t),    C(t) = C(0) exp(−(µ + σ²/2) t + σW(t)).

3. Different realizations of the same SDE:
    dC(t) = −µC(t) dt + σC(t) dW(t),    C(t) = C(0) exp(−(µ + σ²/2) t + σW(t)).
Stochastic differential equation mixed-effects models (SDEMEMs)
SDEMEMs: model structure
The state-space SDEMEM follows

    Y^i_{t_j} = h(X^i_{t_j}, ε^i_{t_j}),    ε^i_{t_j} | ξ ∼ indep p(ξ),    j = 1, ..., n_i,
    dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,    i = 1, ..., M,
    φ^i ∼ π(φ^i | η),
    dW^i_t ∼ iid N(0, dt).

φ^i and η are vectors of random and fixed (population) parameters.
• Example: Y^i_{t_j} = X^i_{t_j} + ε^i_{t_j}, but we are allowed to take h(·) nonlinear with non-additive errors.
• Latent diffusions X^i_t share a common functional form, but have individual parameters φ^i, and are driven by individual Brownian motions W^i_t.

    Y^i_t = h(X^i_t, ε^i_t),    ε^i_t | ξ ∼ indep p(ξ),    j = 1, ..., n_i,
    dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,    i = 1, ..., M,
    φ^i ∼ π(φ^i | η).

SDEMEMs are flexible. They allow explanation of three levels of variation:
• Intra-subject random variability, modelled by a diffusion process X^i_t.
• Variation between different units, taken into account according to the (assumed) distribution of the φ^i's.
• Residual variation, modelled via a measurement error with parameter ξ.
Goal: exact Bayesian inference for θ = [η, ξ].
What we want to do is:
produce (virtually) exact Bayesian inference for general, nonlinear SDEMEMs.
“General” means:
• the SDEs can be nonlinear in the states X_t;
• the error model for Y_t does not have to be linear in X_t;
• the error model does not have to be additive, i.e. it does not have to be of the type Y_t = F · X_t + ε_t;
• ε_t does not have to be Gaussian;
• the random effects φ^i can have any distribution.
What we come up with is essentially an instance of the pseudo-marginal method (Andrieu and Roberts, 2009), embedded into a Gibbs sampler with careful use of blocking strategies (and more...).
As sometimes happens, independent work similar to ours was carried out simultaneously in
Botha, I., Kohn, R. and Drovandi, C. (2021). Particle methods for stochastic differential equation mixed effects models. Bayesian Analysis, 16(2), 575-609.
Bayesian inference for SDEMEMs
The joint posterior

    Y^i_t = h(X^i_t, ε^i_t),    ε^i_t | ξ ∼ indep p(ξ),    j = 1, ..., n_i,
    dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,    i = 1, ..., M,
    φ^i ∼ π(φ^i | η),    i = 1, ..., M.

• observed data y = (Y^i_{1:n_i})_{i=1}^{M} across M individuals;
• latent x = (X^i_{1:n_i})_{i=1}^{M} at discrete time-points.
We have the joint posterior
    π(η, ξ, φ, x | y) ∝ π(η) π(ξ) π(φ | η) π(x | φ) π(y | x, ξ),
where (from now on assume n_i ≡ n for all units)
    π(φ | η) = ∏_{i=1}^{M} π(φ^i | η),
    π(x | φ) = ∏_{i=1}^{M} π(x^i_1) ∏_{j=2}^{n} π(x^i_j | x^i_{j−1}, φ^i)    (Markovianity),
    π(y | x, ξ) = ∏_{i=1}^{M} ∏_{j=1}^{n} π(y^i_j | x^i_j, ξ)    (conditional independence).
    π(η, ξ, φ, x | y) ∝ π(η) π(ξ) π(φ | η) π(x | φ) π(y | x, ξ).
While several components of the joint π(η, ξ, φ, x | y) may have tractable conditionals, sampling from such a joint posterior can still be a horrendous task → slow exploration of the parameter surface.
The reason is that the unknown parameters and x are highly correlated.
Hence a Gibbs sampler would mix very badly.
The best is, in fact, to sample from either of the following marginals:
    π(η, ξ, φ | y) = ∫ π(η, ξ, φ, x | y) dx
or
    π(η, ξ | y) = ∫∫ π(η, ξ, φ, x | y) dx dφ.
Marginal posterior over parameters and random effects
• By integrating x out, the resulting marginal is
    π(η, ξ, φ | y) ∝ π(η) π(ξ) ∏_{i=1}^{M} π(φ^i | η) π(y^i | ξ, φ^i).
• The data likelihood π(y^i | ξ, φ^i) for the generic i-th unit is
    π(y^i | ξ, φ^i) ∝ ∫ ∏_{j=1}^{n} π(y^i_j | x^i_j, ξ) × π(x^i_1) ∏_{j=2}^{n} π(x^i_j | x^i_{j−1}, φ^i) dx^i_{1:n}.
• Typical problem: the transition density π(x^i_j | x^i_{j−1}, ·) is unknown;
• the integral is generally intractable, but we can estimate it via Monte Carlo.
• For (very) simple cases, such as linear SDEs, we can apply the Kalman filter and obtain an exact solution.
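To make the linear case concrete, here is a minimal Python sketch of a Kalman filter computing the exact likelihood for a one-dimensional Ornstein-Uhlenbeck process observed with additive Gaussian noise. Parameter and function names are illustrative, and this is only a toy version of the "Kalman" baseline used later, not the code used in the paper.

```python
import numpy as np

def kalman_loglik_ou(y, dt, theta1, theta2, theta3, sigma_eps, m0=0.0, P0=10.0):
    """Exact log-likelihood of an OU process observed with Gaussian noise:
    dX_t = theta1*(theta2 - X_t) dt + theta3 dW_t,   Y_j = X_j + eps_j, eps_j ~ N(0, sigma_eps^2)."""
    # Exact OU transition: X_{t+dt} | X_t ~ N(theta2 + (X_t - theta2)*F, Q)
    F = np.exp(-theta1 * dt)
    Q = theta3**2 / (2.0 * theta1) * (1.0 - np.exp(-2.0 * theta1 * dt))
    m, P = m0, P0                      # prior mean/variance for the first latent state
    loglik = 0.0
    for j, yj in enumerate(y):
        if j > 0:                      # predict step
            m = theta2 + (m - theta2) * F
            P = F**2 * P + Q
        S = P + sigma_eps**2           # innovation variance
        loglik += -0.5 * (np.log(2 * np.pi * S) + (yj - m)**2 / S)
        K = P / S                      # Kalman gain, then update step
        m = m + K * (yj - m)
        P = (1.0 - K) * P
    return loglik

# toy usage
rng = np.random.default_rng(0)
y = rng.normal(size=50)
print(kalman_loglik_ou(y, dt=0.25, theta1=0.5, theta2=1.0, theta3=0.8, sigma_eps=0.3))
```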
We assume a generic nonlinear SDE.
• The transition density π(x^i_j | x^i_{j−1}, φ^i) is unknown;
• luckily, we can still approximate the likelihood integral unbiasedly;
• sequential Monte Carlo (SMC) can be used for the task;
• when π(x^i_j | x^i_{j−1}, φ^i) is unknown, we can still run a numerical discretization method with step size h > 0 and simulate from the approximation π_h(x^i_j | x^i_{j−1}, φ^i), e.g.
    x^i_{t+h} = x^i_t + α(x^i_t, φ^i) h + √β(x^i_t, φ^i) · u^i_t,    u^i_t ∼ iid N(0, h).
This is the Euler-Maruyama discretization scheme (it is possible to use more advanced schemes).
Hence x^i_{t+h} | x^i_t ∼ π_h(x^i_{t+h} | x^i_t, φ^i), which is clearly Gaussian.
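A minimal Euler-Maruyama simulator, sketched in Python; `alpha` and `beta` are the drift and squared diffusion coefficient as in the scheme above, and the N(0, h) variates u are returned explicitly because we will need to keep track of them later. Function and argument names are mine, not from the paper.

```python
import numpy as np

def euler_maruyama(x0, alpha, beta, phi, h, n_steps, u=None, rng=None):
    """Simulate x_{t+h} = x_t + alpha(x_t, phi)*h + sqrt(beta(x_t, phi))*u_t with u_t ~ N(0, h).
    If an array u of N(0, h) increments is supplied it is reused, which is what
    correlated pseudo-marginal schemes rely on."""
    rng = rng or np.random.default_rng()
    if u is None:
        u = rng.normal(scale=np.sqrt(h), size=n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = x[t] + alpha(x[t], phi) * h + np.sqrt(beta(x[t], phi)) * u[t]
    return x, u

# example: the OU drift/diffusion used later in the talk
alpha = lambda x, phi: phi[0] * (phi[1] - x)   # theta1*(theta2 - x)
beta = lambda x, phi: phi[2] ** 2              # theta3^2
path, u = euler_maruyama(x0=0.0, alpha=alpha, beta=beta, phi=(0.5, 1.0, 0.8), h=0.01, n_steps=100)
```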
We approximate the observed data likelihood as
    π_h(y^i | ξ, φ^i) ∝ ∫ ∏_{j=1}^{n} π(y^i_j | x^i_j(h), ξ) × π(x^i_1) ∏_{j=2}^{n} π_h(x^i_j | x^i_{j−1}, φ^i) dx^i_{1:n},
but now for simplicity we stop emphasizing the reference to h.
So we have the Monte Carlo approximation
    π(y^i | ξ, φ^i) = E[ ∏_{j=1}^{n} π(y^i_j | x^i_j, ξ) ] ≈ (1/N) ∑_{k=1}^{N} ∏_{j=1}^{n} π(y^i_j | x^i_{j,k}, ξ),
    x^i_{j,k} ∼ iid π_h(x^i_j | x^i_{j−1}, φ^i),    k = 1, ..., N,
and the last sampling can of course be performed numerically (say by Euler-Maruyama).
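A literal Python transcription of the Monte Carlo approximation above: average, over N latent paths simulated forward from the (e.g. Euler-Maruyama) dynamics, the product of observation densities. Function names and the toy model in the usage lines are illustrative; the next slides replace this naive estimator with SMC, which copes much better with longer series.

```python
import numpy as np
from scipy.stats import norm

def mc_loglik(y, simulate_path, obs_density, xi, phi, N, rng):
    """Plain Monte Carlo estimate of pi(y | xi, phi): average over N forward-simulated
    latent paths of prod_j pi(y_j | x_j, xi). Unbiased on the likelihood scale, but its
    variance grows quickly with the length of the series."""
    prods = np.empty(N)
    for k in range(N):
        x = simulate_path(phi, rng)                  # latent path x_1, ..., x_n
        prods[k] = np.prod(obs_density(y, x, xi))    # prod_j pi(y_j | x_j, xi)
    return np.log(np.mean(prods))

# toy usage: random-walk latent path, Gaussian measurement error
rng = np.random.default_rng(1)
sim = lambda phi, rng: phi + np.cumsum(rng.normal(scale=0.1, size=20))
dens = lambda y, x, xi: norm.pdf(y, loc=x, scale=xi)
y_obs = sim(1.0, rng) + rng.normal(scale=0.2, size=20)
print(mc_loglik(y_obs, sim, dens, xi=0.2, phi=1.0, N=500, rng=rng))
```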
The efficient way to produce Monte Carlo approximations for
nonlinear time-series observed with error is Sequential Monte
Carlo (aka particle filters).
With SMC the N Monte Carlo draws are called “particles”.
The secret to SMC is
• “propagate particles x^i_t forward”: x^i_t → x^i_{t+h},
• “weight” the particles proportionally to π(y|x),
• “resample particles according to their weight”. The last operation is essential to let particles track the observations.
I won’t get into details. But the simplest particle filter is the bootstrap filter (Gordon et al. 1993).²
² Useful intro from colleagues in Linköping and Uppsala: Naesseth, Lindsten, Schön. Elements of Sequential Monte Carlo. Foundations and Trends in Machine Learning, 12(3):307–392, 2019.
Estimating the observed data likelihood with the bootstrap filter
An unbiased, non-negative estimate of the data likelihood can be computed with the bootstrap filter (SMC) using N particles:
    π̂_{u^i}(y^i | ξ, φ^i) = (1/N^n) ∏_{t=1}^{n} ∑_{k=1}^{N} π(y^i_t | x^i_{t,k}, ξ),    i = 1, ..., M.
Recall, for particle k we have
    x^i_{t+h,k} = x^i_{t,k} + α(x^i_{t,k}, φ^i) h + √β(x^i_{t,k}, φ^i) · u^i_{t,k},    u^i_{t,k} ∼ iid N(0, h).
We will soon see that it is important to keep track of the apparently uninteresting u^i_{t,k} variates.
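A compact bootstrap-filter sketch in Python for the additive-Gaussian-error case Y_t = X_t + ε_t, with one Euler-Maruyama step per observation. It returns the log of the estimator above and consumes a pre-drawn array u of standard normals, used both to propagate the particles and, after mapping to uniforms, to resample them; this is one simple way to make the whole estimator a deterministic function of u, as the correlated scheme on the next slides requires. It is an illustrative sketch, not the PEPSDI implementation.

```python
import numpy as np
from scipy.stats import norm

def bootstrap_loglik(y, alpha, beta, phi, xi, x0, h, u):
    """Bootstrap particle filter estimate of log pi_hat_u(y | xi, phi).
    u has shape (n, N, 2): u[t, :, 0] are N(0,1) variates driving the Euler-Maruyama
    propagation at time t; u[t, :, 1] are mapped to uniforms for multinomial resampling."""
    n, N, _ = u.shape
    x = np.full(N, x0, dtype=float)
    loglik = 0.0
    for t in range(n):
        # propagate: x <- x + alpha(x, phi)*h + sqrt(beta(x, phi)*h) * z,  z ~ N(0, 1)
        x = x + alpha(x, phi) * h + np.sqrt(beta(x, phi) * h) * u[t, :, 0]
        # weight particles by the observation density pi(y_t | x_t, xi)
        w = norm.pdf(y[t], loc=x, scale=xi)
        loglik += np.log(np.mean(w) + 1e-300)
        # resample according to the normalised weights, reusing the stored variates
        cdf = np.cumsum(w / w.sum())
        idx = np.minimum(np.searchsorted(cdf, norm.cdf(u[t, :, 1])), N - 1)
        x = x[idx]
    return loglik

# toy usage with the OU drift/diffusion of the earlier sketch
rng = np.random.default_rng(2)
alpha = lambda x, phi: phi[0] * (phi[1] - x)
beta = lambda x, phi: phi[2] ** 2
y = 1.0 + 0.3 * rng.normal(size=25)
u = rng.normal(size=(25, 200, 2))          # n = 25 observations, N = 200 particles
print(bootstrap_loglik(y, alpha, beta, phi=(0.5, 1.0, 0.8), xi=0.3, x0=1.0, h=0.2, u=u))
```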
The “Blocked” Gibbs algorithm
Recall:
• random effects φ^i ∼ π(φ^i | η), i = 1, ..., M;
• population parameters η ∼ π(η);
• measurement error ξ ∼ π(ξ);
• SMC variates u^i ∼ g(u^i), i = 1, ..., M.
We found it very important to “block” the generation of the u variates:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),    i = 1, . . . , M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^{M} π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^{M} π(φ^i | η).
Once the u variables are sampled and accepted in step 1, we reuse them when computing π̂_{u^i}(y^i | ξ, φ^i) in step 2.
This gives better performance compared to generating new u in step 2.
In practice it is a Metropolis-Hastings within Gibbs
• Using the approximated likelihood π̂_u, we construct a Metropolis-Hastings within Gibbs algorithm:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),    i = 1, . . . , M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^{M} π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^{M} π(φ^i | η).
• With this scheme the acceptance probability in the first step is
    min{1, [π(φ^{i*} | ·) / π(φ^i | ·)] × [π̂_{u^{i*}}(y^i | φ^{i*}, ·) / π̂_{u^i}(y^i | φ^i, ·)] × [q(φ^i | φ^{i*}) / q(φ^{i*} | φ^i)]}.
• Often computationally expensive due to the many particles needed to keep the variance of π̂_{u^{i*}}(y^i | φ^{i*}, ·) small.
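For concreteness, a sketch of the step-1 update for one unit, written as standard (uncorrelated) pseudo-marginal MH with a Gaussian random-walk proposal; `loglik_fn` is any estimator of log π̂_u(y^i | ξ, φ^i), such as the bootstrap filter sketched earlier, and all names are illustrative.

```python
import numpy as np

def update_unit(phi_i, u_i, loglik_i, y_i, eta, xi, loglik_fn, log_prior_phi, rw_scale, rng):
    """One MH move targeting pi(phi_i, u_i | eta, xi, y_i)
    ∝ pi(phi_i | eta) * pi_hat_{u_i}(y_i | xi, phi_i) * g(u_i)."""
    phi_prop = phi_i + rw_scale * rng.normal(size=np.shape(phi_i))   # symmetric proposal
    u_prop = rng.normal(size=u_i.shape)                              # fresh variates: rho = 0
    loglik_prop = loglik_fn(y_i, phi_prop, xi, u_prop)
    log_alpha = (log_prior_phi(phi_prop, eta) - log_prior_phi(phi_i, eta)
                 + loglik_prop - loglik_i)                           # q terms cancel (symmetric)
    if np.log(rng.uniform()) < log_alpha:
        return phi_prop, u_prop, loglik_prop
    return phi_i, u_i, loglik_i
```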
CPMMH: correlating the likelihood approximations
Smart idea proposed by Deligiannidis et al. (2018): control instead the variance of the likelihood ratio.
• Let us consider the acceptance probability in step 1:
    min{1, [π(φ^{i*} | ·) / π(φ^i | ·)] × [π̂_{u^{i*}}(y^i | φ^{i*}, ·) / π̂_{u^i}(y^i | φ^i, ·)] × [q(φ^i | φ^{i*}) / q(φ^{i*} | φ^i)]}.
• The main idea in CPMMH is to induce a positive correlation between π̂_{u^{i*}}(y^i | φ^{i*}, ·) and π̂_{u^i}(y^i | φ^i, ·). This reduces the variance of the ratio while using fewer particles in the particle filter.
• Correlation is induced via a Crank–Nicolson proposal:
    u^{i*} = ρ · u^{i,(j−1)} + √(1 − ρ²) · ω,    ω ∼ N(0, I_d),    ρ ∈ (0.9, 0.999).
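The correlated update of the auxiliary variates is essentially a one-liner: marginally u* remains standard Gaussian, so g(u) is unchanged, while ρ close to 1 keeps u* close to the current u and hence the two likelihood estimates strongly correlated. A sketch:

```python
import numpy as np

def crank_nicolson(u, rho, rng):
    """Propose u* = rho*u + sqrt(1 - rho^2)*omega with omega ~ N(0, I);
    u* is still marginally N(0, I)."""
    return rho * u + np.sqrt(1.0 - rho ** 2) * rng.normal(size=u.shape)

rng = np.random.default_rng(3)
u = rng.normal(size=(25, 200, 2))        # the variates consumed by the bootstrap filter
u_star = crank_nicolson(u, rho=0.99, rng=rng)
```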
CPMMH: selecting number of particles
• For PMMH (no correlated particles) we selected the number of particles N such that the variance σ²_N of the log-likelihood satisfies σ²_N ≈ 2 at some fixed parameter value.³
• For CPMMH, N is selected such that σ²_N ≈ 2.16²/(1 − ρ²_l), where ρ_l is the estimated correlation between π̂_{u^i}(y^i | ξ, φ^i) and π̂_{u^{i*}}(y^i | ξ, φ^i).⁴
• A drawback with the CPMMH algorithm is that we have to store the random numbers u = (u^1, . . . , u^M)ᵀ in memory, which can be problematic if we have a very large number of particles N, many subjects, or long time-series.
³ Sherlock, Thiery, Roberts, Rosenthal (2015). AoS.
⁴ Choppala, Gunawan, Chen, M.-N. Tran, Kohn, 2016.
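A rough tuning loop for the rule above, sketched in Python: run the particle filter repeatedly at a fixed parameter value, estimate the variance of the log-likelihood, and increase N until the target is met (≈ 2 for PMMH, ≈ 2.16²/(1 − ρ_l²) for CPMMH). `loglik_fn(N)` is an assumed wrapper around one particle-filter run with N particles.

```python
import numpy as np

def choose_num_particles(loglik_fn, target_var, N0=10, reps=50, max_N=100_000):
    """Double N until the empirical variance of the log-likelihood estimator,
    evaluated at a fixed parameter value, drops below target_var."""
    N = N0
    while N <= max_N:
        lls = np.array([loglik_fn(N) for _ in range(reps)])
        var = lls.var(ddof=1)
        if var <= target_var:
            return N, var
        N *= 2
    raise RuntimeError("target variance not reached within max_N particles")

# target_var ~ 2 for PMMH; ~ 2.16**2 / (1 - rho_l**2) for CPMMH, with rho_l the
# estimated correlation between successive log-likelihood estimates.
```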
We want to show how to improve scalability for increasing M.
But first, some illustrative applications.
Applications
Ornstein-Uhlenbeck SDEMEM: model structure
Let us consider the following Ornstein-Uhlenbeck SDEMEM:

    Y^i_t = X^i_t + ε^i_t,    ε^i_t ∼ indep N(0, σ²_ε),    i = 1, ..., 40,
    dX^i_t = θ^i_1(θ^i_2 − X^i_t) dt + θ^i_3 dW^i_t.

• The random effects φ^i = (log θ^i_1, log θ^i_2, log θ^i_3) follow
    φ^i_j | η ∼ indep N(µ_j, τ_j^{−1}),    j = 1, . . . , 3,
  where η = (µ_1, µ_2, µ_3, τ_1, τ_2, τ_3).
• This induces a semi-conjugate prior on η. Thus, we have a tractable Gibbs step when updating η.
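A data-generating sketch for this OU SDEMEM, using the exact Gaussian transition of the OU process; the hyperparameter values below are illustrative placeholders, not the ones used in the simulation study.

```python
import numpy as np

def simulate_ou_sdemem(M=40, n=50, dt=0.25, mu=(-0.7, 2.3, -0.9),
                       tau=(4.0, 10.0, 4.0), sigma_eps=0.3, x0=0.0, seed=0):
    """Simulate M units from the OU SDEMEM using the exact OU transition:
    X_{t+dt} | X_t ~ N(th2 + (X_t - th2)*exp(-th1*dt), th3^2/(2*th1)*(1 - exp(-2*th1*dt)))."""
    rng = np.random.default_rng(seed)
    Y = np.empty((M, n))
    for i in range(M):
        th1, th2, th3 = np.exp(rng.normal(mu, 1.0 / np.sqrt(tau)))   # unit-specific random effects
        F = np.exp(-th1 * dt)
        Q = th3 ** 2 / (2.0 * th1) * (1.0 - np.exp(-2.0 * th1 * dt))
        x = x0
        for j in range(n):
            x = th2 + (x - th2) * F + np.sqrt(Q) * rng.normal()      # latent OU state
            Y[i, j] = x + sigma_eps * rng.normal()                   # noisy observation
    return Y

data = simulate_ou_sdemem()   # array of shape (40, 50)
```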
Ornstein-Uhlenbeck SDEMEM: simulated data
We have M = 40 individuals.
Figure 3: Simulated data from the OU-SDEMEM model.
Ornstein-Uhlenbeck SDEMEM: different inference methods
We compare the following MCMC methods: we always use the outlined Metropolis-within-Gibbs sampler, with the likelihood computed in several flavours:
• “Kalman”: Computing the data likelihood exactly with the
Kalman filter.
• “PMMH”: Estimating the data likelihood with the
bootstrap filter and no correlated likelihoods.
• “CPMMH-099”: Estimating the data likelihood with the
bootstrap filter with correlated likelihoods, with
correlation ρ = 0.99.
• “CPMMH-0999”: same as above, with correlation
ρ = 0.999.
Ornstein-Uhlenbeck SDEMEM: inference results for η
Figure 4: OU SDEMEM: marginal posterior distributions for η = (µ1, µ2, µ3, τ1, τ2, τ3). Almost overlapping lines are: Kalman, PMMH, CPMMH-099; vertical lines are ground truth.
Ornstein-Uhlenbeck SDEMEM: comparing efficiency
The ones below are all MH-within-Gibbs algorithms:
Algorithm ρ N CPU (min) mESS mESS/min Rel.
Kalman - - 1.23 488.51 396.37 5684.46
PMMH 0 3000 4076.94 450.13 0.11 1
CPMMH-099 0.99 100 200.92 418.22 2.09 19
CPMMH-0999 0.999 50 110.66 323.77 2.93 26.6
Figure 5: OU SDEMEM. Correlation ρ, number of particles N,
CPU time (minutes), minimum ESS (mESS), minimum ESS per
minute (mESS/min) and relative minimum ESS per minute (Rel.) as
compared to PMMH-naive. All results are based on 50k iterations of
each scheme, and are medians over 5 independent runs of each
algorithm on different data sets. We could only produce 5 runs due to
the very high computational cost of PMMH.
Tumor growth simulation study
This example is inspired by another publication: there, P. and
Forman analyzed real experimental data of tumor growth on
mice, using SDEMEMs.
However, here to illustrate the use of our inference method we
use a slightly simpler model.
Figure 6: Source http://www.nature.com/articles/srep04384
Tumor growth SDEMEM: model structure
Let us now consider the following SDEMEM:⁵

    Y^i_t = log V^i_t + ε^i_t,    ε^i_t ∼ indep N(0, σ²_e),
    dX^i_{1,t} = (β^i + (γ^i)²/2) X^i_{1,t} dt + γ^i X^i_{1,t} dW^i_{1,t},
    dX^i_{2,t} = (−δ^i + (ψ^i)²/2) X^i_{2,t} dt + ψ^i X^i_{2,t} dW^i_{2,t}.

• X^i_{1,t}: the volume of surviving tumor cells.
• X^i_{2,t}: the volume of cells “killed by a treatment”.
• V^i_t = X^i_{1,t} + X^i_{2,t}: the total tumor volume.
⁵ P. and Forman (2019). Bayesian inference for stochastic differential equation mixed effects models of a tumor xenography study. JRSS-C.
Tumor growth SDEMEM: random effects model
• The random effects φ^i = (log β^i, log γ^i, log δ^i, log ψ^i) follow
    φ^i_j | η ∼ indep N(µ_j, τ_j^{−1}),    j = 1, . . . , 4,
  where η = (µ_1, . . . , µ_4, τ_1, . . . , τ_4).
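A simulation sketch for this tumour SDEMEM: both SDEs are geometric Brownian motions (log X_1 grows at rate β, log X_2 decays at rate δ), so they can be stepped exactly. Initial volumes and hyperparameter values below are illustrative placeholders, not those used in the study.

```python
import numpy as np

def simulate_tumor_sdemem(M=10, n=20, dt=1.0, mu=(-1.5, -1.8, -2.2, -1.5),
                          tau=(20.0, 20.0, 20.0, 20.0), sigma_e=0.2,
                          x1_0=100.0, x2_0=50.0, seed=0):
    """Y_t = log(X1_t + X2_t) + eps_t, with GBM dynamics for X1 (surviving cells)
    and X2 (cells killed by the treatment); exact GBM transitions are used."""
    rng = np.random.default_rng(seed)
    Y = np.empty((M, n))
    for i in range(M):
        beta, gamma, delta, psi = np.exp(rng.normal(mu, 1.0 / np.sqrt(tau)))  # random effects
        x1, x2 = x1_0, x2_0
        for j in range(n):
            x1 *= np.exp(beta * dt + gamma * np.sqrt(dt) * rng.normal())      # exact GBM step
            x2 *= np.exp(-delta * dt + psi * np.sqrt(dt) * rng.normal())
            Y[i, j] = np.log(x1 + x2) + sigma_e * rng.normal()
    return Y

data = simulate_tumor_sdemem()   # array of shape (10, 20)
```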
Tumor growth SDEMEM: simulated data
We assume M = 10 subjects with n = 20 datapoints each.
Figure 7: Simulated data from the tumour growth model.
Tumor growth SDEMEM: different inference methods
We use the following inference methods:
• “PMMH”: Estimating the data likelihood with the
bootstrap filter and no correlation in the likelihoods.
• “CPMMH”: Estimating the data likelihood with the
bootstrap filter and inducing correlation in the
likelihoods, with ρ = 0.999.
The Kalman filter cannot be used here, due to the nonlinear observation equation Y_t = log(X_{1,t} + X_{2,t}) + ε_t.
Tumor growth SDEMEM: inference results for η
Figure 8: Marginal posterior distributions for µ_i and τ_i, i = 1, . . . , 4. The dotted line shows results from the LNA scheme, the solid line is from the CPMMH scheme and the dashed line is the PMMH scheme.
Tumor growth SDEMEM: comparing efficiency
Algorithm ρ N CPU (m) mESS mESS/m Rel.
PMMH 0 30 2963 2559 0.864 1
CPMMH 0.999 10 957 2311 2.415 3
Figure 9: Tumour model. Correlation ρ, number of particles N, CPU
time (in minutes m), minimum ESS (mESS), minimum ESS per minute
(mESS/m) and relative minimum ESS per minute (Rel.) as compared to
PMMH. All results are based on 500k iterations of each scheme.
Improving scalability for an increasing number of individuals
Obviously, when the number of individuals M increases, problems emerge...
Recall the blocked Gibbs steps:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),    i = 1, . . . , M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^{M} π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^{M} π(φ^i | η).
Steps 1 and 2 are the hard ones, because each requires M runs of a particle filter, one for each likelihood π̂_{u^i}(y^i | ξ, φ^i).
Moreover, step 2 involves the product of individual likelihoods. To keep the variance of the product low, many particles may be needed for the individual terms.
The “trick”
However, co-author Sebastian Persson had the intuition to borrow a trick from the Monolix⁶ software, which is specialised in inference for mixed-effects models.
Quite simply, consider a “perturbation” of the original SDEMEM, where we allow the constant parameter ξ (and possibly other fixed parameters) to vary slightly between subjects as
    ξ^i ∼ N(ξ_pop, δ),    i = 1, ..., M,
where ξ_pop is the original ξ to be inferred.
⁶ https://lixoft.com/products/monolix/
Gibbs for the unperturbed model:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),    i = 1, . . . , M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^{M} π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^{M} π(φ^i | η).
Now introduce
    ξ^i ∼ N(ξ_pop, δ),    i = 1, ..., M.
Gibbs for the perturbed model:
1. π(φ^i, ξ^i, u^i | η, ξ_pop, y^i) ∝ π(φ^i | η) π(ξ^i | ξ_pop) π̂_{u^i}(y^i | ξ^i, φ^i) g(u^i),    i = 1, . . . , M,
2. ξ_pop is now updated given the ξ^i alone: π(ξ_pop | ξ^1, ..., ξ^M) ∝ π(ξ_pop) ∏_{i=1}^{M} π(ξ^i | ξ_pop), which requires no particle filter runs,
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^{M} π(φ^i | η).
The expensive Step 2 has disappeared!
Notice that step 1 allows us to target each random effect separately from the others.
So controlling the variance of each individual likelihood π̂_{u^i}(y^i | ξ^i, φ^i) is much easier than controlling the variance of the “joint likelihood” ∏_{i=1}^{M} π̂_{u^i}(y^i | ξ, φ^i).
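A sketch of the quantity that the per-unit MH move evaluates in the perturbed scheme; note that it touches only unit i, so one particle filter run per unit suffices and no product over all M units appears. As before, `loglik_fn` and the prior functions are illustrative, and δ is treated as a variance.

```python
import numpy as np
from scipy.stats import norm

def log_target_unit(phi_i, xi_i, u_i, y_i, eta, xi_pop, delta, loglik_fn, log_prior_phi):
    """Unnormalised log target of (phi_i, xi_i, u_i) in step 1 of the perturbed Gibbs:
    log pi(phi_i | eta) + log pi(xi_i | xi_pop) + log pi_hat_{u_i}(y_i | xi_i, phi_i)."""
    return (log_prior_phi(phi_i, eta)
            + norm.logpdf(xi_i, loc=xi_pop, scale=np.sqrt(delta))
            + loglik_fn(y_i, phi_i, xi_i, u_i))
```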
Perturbed and non-perturbed Ornstein-Uhlenbeck
Gibbs on perturbed vs Gibbs on non-perturbed model (OU
model).
Another model (forgot which one)
Gibbs on perturbed vs Gibbs on non-perturbed model.
The perturbation variance
    ξ^i ∼ N(ξ_pop, δ),    i = 1, ..., M.
δ > 0 is a small tuning parameter specified by the user.
We set δ somewhat arbitrarily, but we found that for parameters having magnitude between 1 and 10 a value of δ = 0.01 worked well.
Try our PEPSDI package
Everything is coded in Julia for efficient inference at
https://github.com/cvijoviclab/PEPSDI
Includes:
• tutorials and notebooks on how to run the package;
• several adaptive MCMC samplers benchmarked (tl;dr best
one is Matti Vihola’s RAM sampler).
• “guided” particle filters better suited for informative
observations (low measurement error).
• nontrivial case studies are in the paper.
• it’s not just SDEs with mixed effects! Mixed-effects
stochastic kinetic models are implemented, and several
numerical integrators typical in systems biology are
supported (tau leaping, Gillespie).
Thanks to great co-authors!
(a) Marija Cvijovic, (b) Samuel Wiqvist, (c) Sebastian Persson, (d) Andrew Golightly, (e) Ashleigh McLean, (f) Niek Welkenhuysen, (g) Sviatlana Shashkova, (h) Patrick Reith, (i) Gregor Schmidt.
Thank you
@uPicchini
Appendix
CPMMH: updating step
• When correlating the particles, step 1 in the MH-within-Gibbs scheme becomes:
For i = 1, . . . , M:
(1a) Propose φ^{i*} ∼ q(· | φ^{i,(j−1)}). Draw ω ∼ N(0, I_d) and put u^{i*} = ρ u^{i,(j−1)} + √(1 − ρ²) ω.
(1b) Compute π̂_{u^{i*}}(y^i | ξ^{(j−1)}, φ^{i*}) by running the particle filter with u^{i*}, φ^{i*}, ξ^{(j−1)} and y^i.
(1c) With probability
    min{1, [π(φ^{i*} | ·) / π(φ^i | ·)] × [π̂_{u^{i*}}(y^i | φ^{i*}, ·) / π̂_{u^i}(y^i | φ^i, ·)] × [q(φ^i | φ^{i*}) / q(φ^{i*} | φ^i)]}
put φ^{i,(j)} = φ^{i*} and u^{i,(j)} = u^{i*}. Otherwise, keep the current values φ^{i,(j)} = φ^{i,(j−1)} and u^{i,(j)} = u^{i,(j−1)}.
Using stochastic modelling is important!
[This slide refers to the tumor-growth data]
And what if we produced inference using a deterministic model (ODEMEM) while the observations come from a stochastic model?
Here follows the estimation of the measurement error variance σ²_e (the truth is log σ²_e = −1.6).
The true value is massively overestimated by the ODE-based approach.
Application: neuronal data with informative observations
Figure 11: Depolarization [mV] vs time [sec].
We may focus on what happens between spikes, the so-called inter-spike intervals (ISIs).
Inter-spikes-intervals data (ISIs):
[Figure: depolarization (mV) vs time (msec).]
Figure 12: Observations from M = 100 ISIs.
P., Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion
leaky integrate-and-fire neuronal model for a slowly fluctuating signal.
Neural Computation, 20(11), 2696-2714.
So we have about 1.6 × 10⁵ measurements of membrane potential, across M = 100 units.
Membrane potential dynamics are assumed governed by an Ornstein-Uhlenbeck process, observed with error:

    Y^i_t = X^i_t + ε^i_t,    ε^i_t ∼ indep N(0, σ²_ε),    i = 1, ..., M,
    dX^i_t = (−λ^i X^i_t + ν^i) dt + σ^i dW^i_t.

• ν^i [mV/msec] is the electrical input into the neuron;
• 1/λ^i [msec] is the spontaneous voltage decay (in the absence of input).
In this example the data are informative: the measurement error term is negligible.
Contrary to intuition, having informative observations complicates things from the computational side.
Shortly: the “particles” propagated forward via the bootstrap filter will have a hard time, since π(y_t | x_t, ·) now has a very narrow support.
Hence many particles will receive a tiny weight (→ poorly approximated likelihood).
Solution: at time t, let the particles be “guided forward” to get close to the next datapoint y_{t+1}.
We used the guided scheme in Golightly, A. and Wilkinson, D. J. (2011). Interface Focus, 1(6), 807-820.
With the “guided” particles, N = 1 is sufficient to get good inference (not reported here).
Algorithm ρ N CPU (m) mESS mESS/m Rel.
Kalman - - 56 666 12.0 20.0
PMMH - 1 481 287 0.6 1.0
CPMMH-09 0.9 1 653 381 0.58 1.0
CPMMH-0999 0.999 1 655 326 0.50 0.8
Figure 13: Neuronal model. Correlation ρ, number of particles N,
CPU time (in minutes m), minimum ESS (mESS), minimum ESS per
minute (mESS/m), and relative minimum ESS per minute (Rel.) as
compared to PMMH. All results are based on 100k iterations of each
scheme.
Several adaptive MCMC samplers
We compare ESS and Wasserstein distance (w.r.t. the true posterior, when available) across several MCMC samplers.
More Related Content

Similar to Bayesian inference for mixed-effects models driven by SDEs and other stochastic models: a scalable approach

Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Umberto Picchini
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filternozomuhamada
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...Umberto Picchini
 
5. cem granger causality ecm
5. cem granger causality  ecm 5. cem granger causality  ecm
5. cem granger causality ecm Quang Hoang
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?Christian Robert
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsChristian Robert
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modulesChristian Robert
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsJulyan Arbel
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Frank Nielsen
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionMichael Stumpf
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...SYRTO Project
 
On estimating the integrated co volatility using
On estimating the integrated co volatility usingOn estimating the integrated co volatility using
On estimating the integrated co volatility usingkkislas
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Christian Robert
 

Similar to Bayesian inference for mixed-effects models driven by SDEs and other stochastic models: a scalable approach (20)

main
mainmain
main
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...
 
2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter2012 mdsp pr05 particle filter
2012 mdsp pr05 particle filter
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...
 
5. cem granger causality ecm
5. cem granger causality  ecm 5. cem granger causality  ecm
5. cem granger causality ecm
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
isi
isiisi
isi
 
intro
introintro
intro
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Multiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximationsMultiple estimators for Monte Carlo approximations
Multiple estimators for Monte Carlo approximations
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
Considerate Approaches to ABC Model Selection
Considerate Approaches to ABC Model SelectionConsiderate Approaches to ABC Model Selection
Considerate Approaches to ABC Model Selection
 
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
Spillover Dynamics for Systemic Risk Measurement Using Spatial Financial Time...
 
On estimating the integrated co volatility using
On estimating the integrated co volatility usingOn estimating the integrated co volatility using
On estimating the integrated co volatility using
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 

Recently uploaded

Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfrohankumarsinghrore1
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Silpa
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 

Recently uploaded (20)

Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 

Bayesian inference for mixed-effects models driven by SDEs and other stochastic models: a scalable approach

  • 1. Bayesian inference for mixed-effects models driven by SDEs and other stochastic models: a scalable approach. Umberto Picchini Dept. Mathematical Sciences, Chalmers and Gothenburg University 7@uPicchini Statistics seminar at Maths dept., Bristol University, 1 April, 2022 1
  • 2. 2
  • 3. A classical problem of interest in biomedicine is the analysis of repeated measurements data. For example modelling repeated measurements of drug concentrations (pharmacokinetics/pharmacodynamics) Here we have concentrations of theophylline across 12 subjects. 3
  • 4. Tumor growth in mice 1. 0 5 10 15 20 25 30 35 40 days 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 log volume (mm 3 ) group 3 Modelling tumor growth on 8 mice (we compared between different treatments). 1 P and Forman (2019). Journal of the Royal Statistical Society: Series C 4
  • 5. Neuronal data: 215 t was e de- ation, mulus ercial level d set each oten- brane es (if esent ticle. with Figure 1: Depolarization [mV] vs time [sec]. We may focus on what happens between spikes. So called inter-spikes-intervals data (ISIs). 5
  • 6. Inter-spikes-intervals data (ISIs): 0 50 100 150 200 250 300 350 Time (msec) 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 depolarization mV Figure 2: Observations from 100 ISIs. P, Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion leaky integrate-and-fire neuronal model for a slowly fluctuating signal. Neural Computation, 20(11), 2696-2714. 6
  • 7. With mixed-effects models (aka random-effects) we fit simultaneously discretely observed data from M “subjects” (= units). The reason to do this is to perform inference at the population level and better account for all information at hand. Assume for example that for some covariate X yi j |{z} observation j in unit i = Xi j(φi ) + i j, i = 1, ..., M; j = 1, ..., ni φi ∼ N(η, σ2 η), individual random effects The random effects have “population mean” η and “population variance σ2 η”. It’s typically of interest to estimate population parameters (η, σ2 η) not the subject-specific φi. 7
  • 8. So in this case each trajectory is guided by its own φi. However all trajectories have something in common, the shared parameters (η, σ2 η), since each φi ∼ N(η, σ2 η). 8
  • 9. Mixed-effect methodology is now standard. About 40 years of literature available. It could turn tricky though to use this methodology when data are observations from stochastic processes. For example, when mixed-effects modelling are driven by stochastic differential equations (SDEs). There exist about 50 papers on fitting SDEs with mixed effects, but these always consider some constraint that makes the models not very general. https://umbertopicchini.github.io/sdemem/ 9
  • 10. (this slide: courtesy of Susanne Ditlevsen)   The concentration of a drug in blood * * * * * * * * * * 0 20 40 60 80 100 120 0 20 40 60 80 100 time in minutes C12 concentration 1 Exponential decay dC(t) dt = −µC(t) C(t) = C(0)e−µt * * * * * * * * * * 0 20 40 60 80 100 120 0 20 40 60 80 100 time in minutes C12 concentration 2 Exponential decay with noise dC(t) = −µC(t)dt + σC(t)dW(t) C(t) = C(0) exp −(µ + 1 2 σ2 )t + σW(t) * * * * * * * * * * 0 20 40 60 80 100 120 0 20 40 60 80 100 time in minutes C12 concentration Different realizations dC(t) = −µC(t)dt + σC(t)dW(t) C(t) = C(0) exp −(µ + 1 2 σ2 )t + σW(t) * * * * * * * * * * 0 20 40 60 80 100 120 0 20 40 60 80 100 time in minutes C12 concentration 10
  • 12. SDEMEMs: model structure The state-space SDEMEM follows          Y i tj = h(Xi tj , i tj ) i t|ξ indep ∼ p(ξ), tj = 1, ..., ni dXi t = α(Xi t , φi ) dt + p β(Xi t , φi) dWi t , i = 1, ..., M φi ∼ π(φi |η) dWi t ∼iid N(0, dt) φi and η are vectors of random and fixed (population) parameters. • example: Y i tj = Xi tj + i tj , but we are allowed to take h(·) nonlinear with non-additive errors. • Latent diffusions Xi t share a common functional form, but have individual parameters φi , and are driven by individual Brownian motions Wi t . 11
  • 13.      Y i t = h(Xi t, i t) i t|ξ indep ∼ p(ξ), tj = 1, ..., ni dXi t = α(Xi t, φi) dt + p β(Xi t, φi) dWi t , i = 1, ..., M φi ∼ π(φi|η) SDEMEMs are flexible. Allow explanation of three levels of variation: • Intra-subject random variability modelled by a diffusion process Xi t. • Variation between different units is taken into account according to the (assumed) distribution of the φi’s. • Residual variation is modeled via a measurement error ξ. Goal: exact Bayesian inference for θ = [η, ξ]. 12
  • 14. What we want to do is: produce (virtually) exact Bayesian inference for general, nonlinear SDEMEMs. “General” means: • the SDEs can be nonlinear in the states Xt; • the error-model for Yt does not have to be linear in the Xt; • the error-model does not have to be additive, i.e. does not have to be of the type Yt = F · Xt + t • t does not have to be Gaussian distributed; • random effects φi can have any distribution. What we come up with is essentially an instance of the pseudomarginal method (Andrieu,Roberts 2009), embedded into a Gibbs sampler with careful use of blocking strategies (and more...). 13
  • 15. As it sometimes happen, independent work similar to ours was carried out simultaneously in Botha, I., Kohn, R., Drovandi, C. (2021). Particle methods for stochastic differential equation mixed effects models. Bayesian Analysis, 16(2), 575-609. 14
  • 17. The joint posterior      Y i t = h(Xi t , i t) i t|ξ indep ∼ p(ξ), tj = 1, ..., ni dXi t = α(Xi t , φi ) dt + p β(Xi t , φi) dWi t , i = 1, ..., M φi ∼ π(φi |η), i = 1, ..., M • observed data y = (Y i 1:ni )M i=1 across M individuals; • latent x = (Xi 1:ni )M i=1 at discrete time-points; We have the joint posterior π(η, ξ, φ, x|y) ∝ π(η)π(ξ)π(φ|η)π(x|φ)π(y|x, ξ), where (from now on assume ni ≡ n for all units). π(φ|η) = M Y i=1 π(φi |η), π(x|φ) = M Y i=1 π(xi 1) n Y j=2 π(xi j|xi j−1, φi ) | {z } Markovianity , π(y|x, ξ) = M Y i=1 n Y j=1 π(yi j|xi j, ξ) | {z } condit. independence . 15
  • 18. π(η, ξ, φ, x|y) ∝ π(η)π(ξ)π(φ|η)π(x|φ)π(y|x, ξ), while several components of the joint π(η, ξ, φ, x|y) may have tractable conditionals, sampling from such a joint posterior can still be an horrendous task.→ slow parameters surface exploration. Reason being that unknown parameters and x are highly correlated. Hence a Gibbs sampler would very badly mix. The best is, in fact, to sample from either of the following marginals π(η, ξ, φ|y) = Z π(η, ξ, φ, x|y)dx or π(η, ξ|y) = Z Z π(η, ξ, φ, x|y)dxdφ 16
  • 19. Marginal posterior over parameters and random effects • By integrating x out, the resulting marginal is π(η, ξ, φ|y) ∝ π(η)π(ξ) M Y i=1 π(φi |η)π(yi |ξ, φi ). • The data likelihood π(yi |ξ, φi ) for the generic i-th unit is π(yi |ξ, φi ) ∝ Z n Y j=1 π(yi j|xi j, ξ) × π(xi 1) n Y j=2 π(xi j|xi j−1, φi )dxi j • Typical problem: transitions density π(xi j|xi j−1, ·) unknown; • integral is generally intractable but we can estimate it via Monte Carlo. • For (very) simple cases, such as linear SDEs, we can apply the Kalman filter and obtain an exact solution. 17
  • 20. We assume a generic nonlinear SDE. • Transition density π(xi j|xi j−1, φi ) is unknown, • luckily we can still approximate the likelihood integral unbiasedly • sequential Monte Carlo (SMC) can be used for the task • when π(xi j|xi j−1, φi ) is unknown, we are still able to run a numerical discretization methods with step-size h 0 and simulate from the approximate πh(xi j|xi j−1, φi ), e.g. xi t+h = xi t + α(xi t, φi )h + q β(xi t, φi) · ui t ui t ∼iid N(0, h) this is the Euler-Maruyama discretization scheme (possible to use more advances schemes). Hence xi t+h|xi t ∼ πh(xi t+h|xi t, φi ) (which is clearly Gaussian). 18
  • 21. We approximate the observed data likelihood as πh(yi |ξ, φi ) ∝ Z n Y j=1 π(yi j|xi j(h), ξ) × π(xi 1) n Y j=2 πh(xi j|xi j−1, φi )dxi j but now for simplicity we stop emphasizing the reference to h. So we have the Monte-Carlo approximation π(yi |ξ, φi ) = E n Y j=1 π(yi j|xi j, ξ) ≈ 1 N N X k=1 n Y j=1 π(yi j|xi j,k, ξ), xi j,k ∼iid πh(xi j|xi j−1, φi ), (k = 1, ..., N) and the last sampling can of course be performed numerically (say by Euler-Maruyama). 19
  • 22. The efficient way to produce Monte Carlo approximations for nonlinear time-series observed with error is Sequential Monte Carlo (aka particle filters). With SMC the N Monte Carlo draws are called “particles”. The secret to SMC is • “propagate particles xi t forward”: xi t → xi t+h, • “weight” the particles proportionally to π(y|x), • “resample particles according to their weight”. The last operation is essential to let particles track the observations. I won’t get into details. But the simplest particle filter is the bootstrap filter (Gordon et al. 1993)2. 2 Useful intro from colleagues in Linköping and Uppsala: Naesseth, Lindsten, Schön. Elements of Sequential Monte Carlo. Foundations and Trends in Machine Learning, 12(3):307–392, 2019 20
  • 23. Estimating the observed data likelihood with the bootstrap filter An unbiased non-negative estimation of the data likelihood can be computed with the bootstrap filter SMC method using N particles: π̂ui (yi |ξ, φi ) = 1 Nn n Y t=1 N X k=1 π(yi t|xi t,k, ξ), i = 1, ..., M Recall, for particle k we have xi t+h,k = xi t,k + α(xi t,k, φi )h + q β(xi t,k, φi) · ui t,k ui t,k ∼iid N(0, h) We will soon see that it is important to keep track of the apparently uninteresting ui t,k variates. 21
  • 24. The “Blocked” Gibbs algorithm Recall: • random effects φi ∼ π(φi|η), i = 1, .., M • population parameters η ∼ π(η) • measurement error ξ ∼ π(ξ) • SMC variates: ui ∼ g(ui), i = 1, .., M. We found very important to “block” the generation of the u variates. 1. π(φi, ui|η, ξ, yi) ∝ π(φi|η)π̂ui (yi|ξ, φi)g(ui), i = 1, . . . , M, 2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) QM i=1 π̂ui (yi|ξ, φi), 3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) QM i=1 π(φi|η). Once the u variables are sampled and accepted in step 1, we reuse them when computing (in step 2) π̂ui (yi|ξ, φi). Better performance compared to generating new u in step 2. 22
  • 25. In practice it is a Metropolis-Hastings within Gibbs • Using the approximated likelihood π̂u, we construct a Metropolis-Hastings within Gibbs algorithm: 1. π(φi , ui |η, ξ, yi ) ∝ π(φi |η)π̂ui (yi |ξ, φi )g(ui ), i = 1, . . . , M, 2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) QM i=1 π̂ui (yi |ξ, φi ), 3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) QM i=1 π(φi |η). • With this scheme the acceptance probability in the first step is min 1 , π(φi∗|·) π(φi|·) × π̂ui∗ (yi|φi∗, ·) π̂ui (yi|φi, ·) × q(φi|φi∗) q(φi∗|φi) . • Often computer expensive due to the many particles needed to keep the variance of π̂ui∗ (yi|φi∗, ·) small. 23
  • 26. CPMMH: correlating the likelihood approximations Smart idea proposed by Deligiannidis et al. (2018): control instead the variance of the likelihood ratio. • Let us consider the acceptance probability in step 1 min 1 , π(φi∗ |·) π(φi|·) × π̂ui∗ (yi |φi∗ , ·) π̂ui (yi|φi, ·) × q(φi |φi∗ ) q(φi∗|φi) . • The main idea in CPMMH is to induce a positive correlation between π̂ui∗ (yi |φi∗ , ·) and π̂ui (yi |φi , ·). Which reduces the ratio variance while using fewer particles in the particle filter. • Correlation induced via Crank–Nicolson : ui∗ = ρ · ui,(j−1) + p 1 − ρ2 · ω, ω ∼ N(0, Id) ρ ∈ (0.9, 0.999) 24
  • 27. CPMMH: selecting number of particles • For PMMH (no correlated particles) we selected the number of particles N such that the variance of the log-likelihood σ2 N is σ2 N ≈ 2 at some fixed parameter value.3 • For CPMMH N is selected such that σ2 N ≈ 2.162/(1 − ρ2 l ) where ρl is the estimated correlation between π̂ui (yi|ξ, φi) and π̂ui∗ (yi|ξ, φi).4 • A drawback with the CPMMH algorithm is that we have to store the random numbers u = (u1, . . . , uM )T in memory, which can be problematic if we have a very large number of particles N, many subjects, or long time-series. 3 Sherlock, Thiery, Roberts, Rosenthal (2015). AoS. 4 Choppala, Gunawan, Chen, M.-N Tran, Kohn, 2016. 25
  • 28. We want to show how to improve scalability for increasing M. But first, some illustrative applications. 26
  • 30. Ornstein-Uhlenbeck SDEMEM: model structure Let us consider the following Ornstein-Uhlenbeck SDEMEM ( Y i t = Xi t + i t, i t indep ∼ N(0, σ2 ), i = 1, ..., 40 dXi t = θi 1(θi 2 − Xi t)dt + θi 3dWi t . • The random effects φi = (log θi 1, log θi 2, log θi 3) follow φi j|η indep ∼ N(µj, τ−1 j ), j = 1, . . . , 3, where η = (µ1, µ2, µ3, τ1, τ2, τ3) • This induces a semi-conjugate prior on η. Thus, we have a tractable Gibbs step when updating η. 27
  • 31. Ornstein-Uhlenbeck SDEMEM: simulated data We have M = 40 individuals. 0 2 4 6 8 10 Time 0 5 10 15 20 25 30 Figure 3: Simulated data from the OU-SDEMEM model. 28
  • 32. Ornstein-Uhlenbeck SDEMEM: different inference meth- ods We compare the following MCMC methods: we always use the outlined Metropolis-within-Gibbs sampler, with likelihood computed with several flavours: • “Kalman”: Computing the data likelihood exactly with the Kalman filter. • “PMMH”: Estimating the data likelihood with the bootstrap filter and no correlated likelihoods. • “CPMMH-099”: Estimating the data likelihood with the bootstrap filter with correlated likelihoods, with correlation ρ = 0.99. • “CPMMH-0999”: same as above, with correlation ρ = 0.999. 29
  • 33. Ornstein-Uhlenbeck SDEMEM: inference results for η −1.4 −1.2 −1.0 −0.8 −0.6 −0.4 −0.2 µ1 0 1 2 3 4 Density 1.8 2.0 2.2 2.4 2.6 2.8 µ2 0 1 2 3 4 5 6 Density −1.4 −1.2 −1.0 −0.8 −0.6 −0.4 −0.2 µ3 0 1 2 3 4 Density 2 4 6 8 10 12 τ1 0.0 0.1 0.2 0.3 0.4 Density 2 4 6 8 10 12 14 τ2 0.00 0.05 0.10 0.15 0.20 0.25 0.30 Density 2 4 6 8 10 τ3 0.0 0.1 0.2 0.3 0.4 0.5 Density Figure 4: OU SDEMEM: marginal posterior distributions for η = (µ1, µ2, µ3, τ1, τ2, τ3). Almost overlapping lines are: Kalman, PMMH, CPMMH-099, vertical lines are ground truth. 30
  • 34. Ornstein-Uhlenbeck SDEMEM: comparing efficiency The ones below are all MH-within-Gibbs algorithms: Algorithm ρ N CPU (min) mESS mESS/min Rel. Kalman - - 1.23 488.51 396.37 5684.46 PMMH 0 3000 4076.94 450.13 0.11 1 CPMMH-099 0.99 100 200.92 418.22 2.09 19 CPMMH-0999 0.999 50 110.66 323.77 2.93 26.6 Figure 5: OU SDEMEM. Correlation ρ, number of particles N, CPU time (minutes), minimum ESS (mESS), minimum ESS per minute (mESS/min) and relative minimum ESS per minute (Rel.) as compared to PMMH-naive. All results are based on 50k iterations of each scheme, and are medians over 5 independent runs of each algorithm on different data sets. We could only produce 5 runs due to the very high computational cost of PMMH. 31
  • 35. Tumor growth simulation study This example is inspired by another publication: there, P. and Forman analyzed real experimental data of tumor growth on mice, using SDEMEMs. However, here to illustrate the use of our inference method we use a slightly simpler model. 32
  • 36. Figure 6: Source http://www.nature.com/articles/srep04384 33
  • 37. 34
  • 38. Tumor growth SDEMEM: model structure Let us now consider the following SDEMEM 5      Y i t = log V i t + i t, i t indep ∼ N(0, σ2 e). dXi 1,t = βi + (γi)2/2 Xi 1,tdt + γiXi 1,tdWi 1,t, dXi 2,t = −δi + (ψi)2/2 Xi 2,tdt + ψiXi 2,tdWi 2,t. • Xi 1,t the volume of surviving tumor cells. • Xi 2,t the volume of cells “killed by a treatment”. • V i t = Xi 1,t + Xi 2,t the total tumor volume. 5 P Forman. (2019). Bayesian inference for stochastic differential equation mixed effects models of a tumor xenography study. JRSS-C. 35
  • 39. Tumor growth SDEMEM: random effects model • The random effects φi = (log βi, log γi, log δi, log ψi) follow φi j|η indep ∼ N(µj, τ−1 j ), j = 1, . . . , 4, where η = (µ1, . . . , µ4, τ1, . . . , τ4). 36
• 40. Tumor growth SDEMEM: simulated data
We assume M = 10 subjects with n = 20 datapoints each.
[Figure 7: Simulated data from the tumour growth model; x-axis is time.]
• 41. Tumor growth SDEMEM: different inference methods
We use the following inference methods:
• “PMMH”: estimating the data likelihood with the bootstrap filter, with no correlation between likelihood estimates.
• “CPMMH”: estimating the data likelihood with the bootstrap filter, inducing correlation between likelihood estimates, with ρ = 0.999.
The Kalman filter cannot be used here because of the nonlinear observation equation Y_t = log(X_{1,t} + X_{2,t}) + ε_t.
• 42. Tumor growth SDEMEM: inference results for η
[Figure 8: Marginal posterior distributions for µ_i and τ_i, i = 1, ..., 4. The dotted line shows results from the LNA scheme, the solid line from the CPMMH scheme and the dashed line from the PMMH scheme.]
• 43. Tumor growth SDEMEM: comparing efficiency

    Algorithm   ρ       N    CPU (m)   mESS   mESS/m   Rel.
    PMMH        0       30   2963      2559   0.864    1
    CPMMH       0.999   10   957       2311   2.415    3

Figure 9: Tumour model. Correlation ρ, number of particles N, CPU time (in minutes m), minimum ESS (mESS), minimum ESS per minute (mESS/m) and relative minimum ESS per minute (Rel.) compared to PMMH. All results are based on 500k iterations of each scheme.
• 45. Obviously, when the number of individuals M increases, problems emerge...
Recall the blocked Gibbs steps:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),   i = 1, ..., M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^M π(φ^i | η).
Steps 1 and 2 are the hard ones, because each requires M runs of a particle filter, one for each likelihood term π̂_{u^i}(y^i | ξ, φ^i).
Moreover, step 2 involves the product of the individual likelihood estimates. To keep the variance of this product low, many particles may be needed for the individual terms.
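A back-of-the-envelope variance argument (ours, not from the slides; it assumes the per-unit log-likelihood estimators are roughly independent and equally noisy) makes the cost of step 2 concrete: on the log scale the product becomes a sum, so

\[
\operatorname{Var}\Big(\sum_{i=1}^{M}\log \hat\pi_{u^i}(y^i\mid\xi,\phi^i)\Big)
= \sum_{i=1}^{M}\operatorname{Var}\big(\log \hat\pi_{u^i}(y^i\mid\xi,\phi^i)\big)
\approx M\,\sigma^2_{\mathrm{unit}}.
\]

Hitting the usual total-variance target of about 2 therefore forces each per-unit variance down to roughly 2/M, i.e. the number of particles per unit has to grow with M.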
• 46. The “trick”
Co-author Sebastian Persson had the intuition to borrow a trick from the Monolix^6 software, which is specialised in inference for mixed-effects models.
Quite simply, consider a “perturbation” of the original SDEMEM, where we allow the constant parameter ξ (and possibly other fixed parameters) to vary slightly between subjects as
    ξ^i ∼ N(ξ_pop, δ),   i = 1, ..., M,
where ξ_pop plays the role of the original ξ to be inferred.
^6 https://lixoft.com/products/monolix/
• 47. Gibbs for the unperturbed model:
1. π(φ^i, u^i | η, ξ, y^i) ∝ π(φ^i | η) π̂_{u^i}(y^i | ξ, φ^i) g(u^i),   i = 1, ..., M,
2. π(ξ | η, φ, y, u) = π(ξ | φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i | ξ, φ^i),
3. π(η | ξ, φ, y, u) = π(η | φ) ∝ π(η) ∏_{i=1}^M π(φ^i | η).
Now introduce ξ^i ∼ N(ξ_pop, δ), i = 1, ..., M.
Gibbs for the perturbed model:
1. π(φ^i, ξ^i, u^i | η, ξ_pop, y^i) ∝ π(φ^i | η) π(ξ^i | ξ_pop) π̂_{u^i}(y^i | ξ^i, φ^i) g(u^i),   i = 1, ..., M,
2. π(ξ_pop | ξ^1, ..., ξ^M) ∝ π(ξ_pop) ∏_{i=1}^M π(ξ^i | ξ_pop), a tractable update with no particle filters,
3. π(η | φ) ∝ π(η) ∏_{i=1}^M π(φ^i | η).
The expensive Step 2 of the unperturbed scheme, requiring M particle-filter runs for a single ξ proposal, has disappeared!
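To see why the new Step 2 is cheap, here is a sketch (ours; the actual priors and parameterisation used in the paper/PEPSDI may differ) of the update of ξ_pop when it is given a Normal prior N(m0, v0): with ξ^i | ξ_pop ∼ N(ξ_pop, δ), the full conditional is Gaussian and involves no particle filter at all.

    # Sketch only: conjugate Gaussian update for ξ_pop given the per-unit ξ^i.
    using Distributions

    function update_xi_pop(ξ_units::Vector{Float64}, δ, m0, v0)
        M = length(ξ_units)
        v_post = 1 / (1 / v0 + M / δ)                    # posterior variance
        m_post = v_post * (m0 / v0 + sum(ξ_units) / δ)   # posterior mean
        return rand(Normal(m_post, sqrt(v_post)))
    end

    # Illustrative call: 40 per-unit values scattered around 0.8, δ = 0.01.
    ξ_pop_draw = update_xi_pop(0.8 .+ 0.1 .* randn(40), 0.01, 0.0, 10.0)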
• 48. Notice that step 1 targets each unit separately from the others. So controlling the variance of each individual likelihood estimate π̂_{u^i}(y^i | ξ^i, φ^i) is much easier than controlling the variance of the “joint likelihood” ∏_{i=1}^M π̂_{u^i}(y^i | ξ, φ^i).
• 49. Perturbed and non-perturbed Ornstein-Uhlenbeck
Gibbs on the perturbed vs. the non-perturbed model (OU model).
• 50. Another model (forgot which one)
Gibbs on the perturbed vs. the non-perturbed model.
• 51. The perturbation variance
    ξ^i ∼ N(ξ_pop, δ),   i = 1, ..., M.
Here δ > 0 is a small tuning parameter specified by the user. We set δ somewhat arbitrarily, but we found that for parameters with magnitude between 1 and 10 a value of δ = 0.01 worked well.
• 52. Try our PEPSDI package
Everything is coded in Julia for efficient inference at https://github.com/cvijoviclab/PEPSDI
Includes:
• tutorials and notebooks on how to run the package;
• several adaptive MCMC samplers, benchmarked (tl;dr: the best one is Matti Vihola's RAM sampler);
• “guided” particle filters better suited for informative observations (low measurement error);
• nontrivial case studies (in the paper);
• it's not just SDEs with mixed effects! Mixed-effects stochastic kinetic models are implemented, and several simulators typical in systems biology are supported (tau-leaping, Gillespie).
• 53. Thanks to great co-authors!
Marija Cvijovic, Samuel Wiqvist, Sebastian Persson, Andrew Golightly, Ashleigh McLean, Niek Welkenhuysen, Sviatlana Shashkova, Patrick Reith, Gregor Schmidt.
• 57. CPMMH: selecting the number of particles
• For PMMH (no correlated particles) we selected the number of particles N such that the variance σ²_N of the log-likelihood estimator satisfies σ²_N ≈ 2 at some fixed parameter value.^7
• For CPMMH, N is selected such that σ²_N = 2.16²/(1 − ρ_l²), where ρ_l is the estimated correlation between π̂_{u^i}(y^i | ξ, φ^i) and π̂_{u^{i*}}(y^i | ξ, φ^i).^8
• A drawback of the CPMMH algorithm is that we have to store the random numbers u = (u^1, ..., u^M)^T in memory, which can be problematic with a very large number of particles N, many subjects, or long time series.
^7 Doucet et al. (2015); Sherlock et al. (2015).
^8 Tran et al. (2016).
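A rough way to operationalise these rules (our own heuristic, not taken from the slides): since the variance of the log-likelihood estimator scales roughly as 1/N, repeated pilot estimates at a fixed parameter value with N_pilot particles let us extrapolate the N that hits the target variance.

    # Sketch only: pick N from repeated pilot log-likelihood estimates.
    using Statistics

    function choose_N(loglik_pilot::Vector{Float64}, N_pilot::Int; ρ = 0.0)
        σ2_pilot = var(loglik_pilot)                     # variance at N_pilot particles
        target   = ρ == 0 ? 2.0 : 2.16^2 / (1 - ρ^2)     # PMMH vs CPMMH targets above
        return max(1, ceil(Int, N_pilot * σ2_pilot / target))
    end

Here ρ stands for ρ_l, the estimated correlation between successive log-likelihood estimates, which has to be estimated separately (e.g. from pairs of correlated pilot runs).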
• 58. CPMMH: updating step
• When correlating the particles, step 1 in the MH-within-Gibbs scheme becomes:
1: For i = 1, ..., M:
  (1a) Propose φ^{i*} ∼ q(· | φ^{i,(j−1)}). Draw ω ∼ N(0, I_d) and put u^{i*} = ρ u^{i,(j−1)} + √(1 − ρ²) ω.
  (1b) Compute π̂_{u^{i*}}(y^i | ξ^{(j−1)}, φ^{i*}) by running the particle filter with u^{i*}, φ^{i*}, ξ^{(j−1)} and y^i.
  (1c) With probability
       min{ 1, [π(φ^{i*} | ·) / π(φ^i | ·)] × [π̂_{u^{i*}}(y^i | φ^{i*}, ·) / π̂_{u^i}(y^i | φ^i, ·)] × [q(φ^i | φ^{i*}) / q(φ^{i*} | φ^i)] },
       set φ^{i,(j)} = φ^{i*} and u^{i,(j)} = u^{i*}. Otherwise, keep the current values φ^{i,(j)} = φ^{i,(j−1)} and u^{i,(j)} = u^{i,(j−1)}.
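In code, one such update for a single unit could look as follows (our sketch; loglik_hat stands for any user-supplied particle filter that consumes a fixed vector of random numbers u and returns a log-likelihood estimate, and a symmetric random-walk proposal is assumed so that the q-ratio cancels).

    # Sketch only: CPMMH update of (φ^i, u^i) for one unit; the cached
    # log-likelihood estimate `ll` is carried along so the filter is not
    # re-run for the current state.
    function cpmmh_unit_step(y, ξ, φ, u, ll, ρ; loglik_hat, logprior_φ, propose_φ)
        φ_prop = propose_φ(φ)                                     # symmetric proposal assumed
        u_prop = ρ .* u .+ sqrt(1 - ρ^2) .* randn(size(u)...)     # correlated refresh of u
        ll_prop = loglik_hat(y, ξ, φ_prop, u_prop)
        logα = (logprior_φ(φ_prop) - logprior_φ(φ)) + (ll_prop - ll)   # prior × likelihood ratios
        return log(rand()) < logα ? (φ_prop, u_prop, ll_prop) : (φ, u, ll)
    end

Accepting φ and u jointly is what keeps successive likelihood estimates correlated, and hence keeps the noise in the acceptance ratio small.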
• 59. Using stochastic modelling is important! [This slide refers to the tumor-growth data]
What if we produced inference using a deterministic model (ODEMEM) while the observations come from a stochastic model? Here follows the estimation of the measurement error variance σ²_e (truth is log σ²_e = −1.6). The true value is massively overestimated by the ODE-based approach.
• 60. Application: neuronal data with informative observations
[Figure 11: Depolarization [mV] vs time [sec].]
We may focus on what happens between spikes, the so-called inter-spike-interval data (ISIs).
• 61. Inter-spike-interval data (ISIs):
[Figure 12: Observations from M = 100 ISIs.]
P., Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion leaky integrate-and-fire neuronal model for a slowly fluctuating signal. Neural Computation, 20(11), 2696-2714.
• 62. So we have about 1.6 × 10^5 measurements of membrane potential, across M = 100 units.
Membrane potential dynamics are assumed to be governed by an Ornstein-Uhlenbeck process, observed with error:
    Y^i_t = X^i_t + ε^i_t,   ε^i_t ~ind N(0, σ²),   i = 1, ..., M,
    dX^i_t = (−λ^i X^i_t + ν^i) dt + σ^i dW^i_t.
• ν^i [mV/msec] is the electrical input into the neuron;
• 1/λ^i [msec] is the time constant of the spontaneous voltage decay (in the absence of input).
• 63. In this example the data are informative: the measurement error term is negligible.
Contrary to intuition, having informative observations complicates things on the computational side.
Shortly: the “particles” propagated forward by the bootstrap filter will have a hard time, since π(y_t | x_t, ·) now has very narrow support. Hence most particles receive a tiny weight (→ a poorly approximated likelihood).
Solution: at time t, let the particles be “guided forward” so that they land close to the next datapoint y_{t+1}. We used the guided scheme in Golightly, A. and Wilkinson, D. J. (2011). Interface Focus, 1(6), 807-820.
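The sketch below (ours) illustrates the idea for a single Euler step of the neuronal OU model with Gaussian measurement noise: the Euler Gaussian for X_{t+Δ} and the observation density for y_{t+Δ} are combined into a Gaussian proposal pulled towards the datapoint, and the importance weight corrects for the change of proposal. The actual scheme of Golightly and Wilkinson (2011) also handles intermediate sub-steps between observations; this is only the simplest case.

    # Sketch only: one-step guided proposal for dX = (-λX + ν)dt + σ dW,
    # observed as Y = X + ε with ε ~ N(0, σε²).
    using Distributions

    function guided_step(x, y, λ, ν, σ, σε, Δ)
        m  = x + (-λ * x + ν) * Δ                 # Euler mean of X_{t+Δ} given x
        v  = σ^2 * Δ                              # Euler variance
        mg = m + v * (y - m) / (v + σε^2)         # condition the pair (X_{t+Δ}, Y) on Y = y
        vg = v - v^2 / (v + σε^2)
        xnew = rand(Normal(mg, sqrt(vg)))
        # log importance weight: p(y | xnew) p(xnew | x) / q(xnew | x, y)
        logw = logpdf(Normal(xnew, σε), y) +
               logpdf(Normal(m, sqrt(v)), xnew) -
               logpdf(Normal(mg, sqrt(vg)), xnew)
        return xnew, logw
    end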
• 64. With the “guided” particles, N = 1 is sufficient to get good inference (not reported here).

    Algorithm    ρ       N   CPU (m)   mESS   mESS/m   Rel.
    Kalman       -       -   56        666    12.0     20.0
    PMMH         -       1   481       287    0.6      1.0
    CPMMH-09     0.9     1   653       381    0.58     1.0
    CPMMH-0999   0.999   1   655       326    0.50     0.8

Figure 13: Neuronal model. Correlation ρ, number of particles N, CPU time (in minutes m), minimum ESS (mESS), minimum ESS per minute (mESS/m), and relative minimum ESS per minute (Rel.) compared to PMMH. All results are based on 100k iterations of each scheme.
• 65. Several adaptive MCMC samplers
We compare ESS and the Wasserstein distance (with respect to the true posterior, when available) across several MCMC samplers.