An important and well-studied class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent state dynamics, (ii) the variability between individuals, and (iii) measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed-data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general and able to deal with a large class of nonlinear SDEMEMs [1]. In more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, A. T. McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms. Computational Statistics & Data Analysis. https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models. bioRxiv. doi:10.1101/2021.07.01.450748
1. Bayesian inference for mixed-effects models driven by SDEs and other stochastic models: a scalable approach
Umberto Picchini
Dept. Mathematical Sciences, Chalmers and Gothenburg University
@uPicchini
Statistics seminar at Maths dept., Bristol University, 1 April, 2022
3. A classical problem of interest in biomedicine is the analysis of repeated-measurements data.
For example, modelling repeated measurements of drug concentrations (pharmacokinetics/pharmacodynamics).
Here we have concentrations of theophylline across 12 subjects.
4. Tumor growth in mice¹
[Figure: log volume (mm³) vs days, group 3]
Modelling tumor growth on 8 mice (we compared between different treatments).
¹ P. and Forman (2019). Journal of the Royal Statistical Society: Series C.
5. Neuronal data:
[Figure 1: Depolarization [mV] vs time [sec].]
We may focus on what happens between spikes, the so-called inter-spike-intervals (ISIs) data.
6. Inter-spike-intervals data (ISIs):
[Figure 2: depolarization (mV) vs time (msec). Observations from 100 ISIs.]
P., Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion leaky integrate-and-fire neuronal model for a slowly fluctuating signal. Neural Computation, 20(11), 2696-2714.
7. With mixed-effects models (aka random-effects models) we simultaneously fit discretely observed data from M “subjects” (= units).
The reason to do this is to perform inference at the population level and better account for all information at hand.
Assume for example that, for some covariate X,

  y^i_j = X^i_j(φ^i) + ε^i_j,  i = 1, ..., M; j = 1, ..., n_i   (y^i_j is observation j in unit i)
  φ^i ∼ N(η, σ²_η),  individual random effects

The random effects have “population mean” η and “population variance” σ²_η.
It is typically of interest to estimate the population parameters (η, σ²_η), not the subject-specific φ^i.
8. So in this case each trajectory is guided by its own φ^i.
However, all trajectories have something in common: the shared parameters (η, σ²_η), since each φ^i ∼ N(η, σ²_η).
9. Mixed-effects methodology is now standard: about 40 years of literature are available.
It can turn tricky, though, to use this methodology when data are observations from stochastic processes, for example when mixed-effects models are driven by stochastic differential equations (SDEs).
There exist about 50 papers on fitting SDEs with mixed effects, but these always impose some constraint that makes the models not very general.
https://umbertopicchini.github.io/sdemem/
12. SDEMEMs: model structure
The state-space SDEMEM follows

  Y^i_{t_j} = h(X^i_{t_j}, ε^i_{t_j}),  ε^i_{t_j}|ξ ∼indep p(ξ),  j = 1, ..., n_i
  dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,  i = 1, ..., M
  φ^i ∼ π(φ^i|η),  dW^i_t ∼iid N(0, dt)

φ^i and η are vectors of random and fixed (population) parameters.
• Example: Y^i_{t_j} = X^i_{t_j} + ε^i_{t_j}, but we are allowed to take h(·) nonlinear with non-additive errors.
• Latent diffusions X^i_t share a common functional form, but have individual parameters φ^i, and are driven by individual Brownian motions W^i_t.
13.
  Y^i_t = h(X^i_t, ε^i_t),  ε^i_t|ξ ∼indep p(ξ),  j = 1, ..., n_i
  dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,  i = 1, ..., M
  φ^i ∼ π(φ^i|η)

SDEMEMs are flexible. They allow explanation of three levels of variation:
• intra-subject random variability, modelled by a diffusion process X^i_t;
• variation between different units, taken into account via the (assumed) distribution of the φ^i's;
• residual variation, modelled via a measurement error with parameter ξ.
Goal: exact Bayesian inference for θ = (η, ξ).
14. What we want to do: produce (virtually) exact Bayesian inference for general, nonlinear SDEMEMs.
“General” means:
• the SDEs can be nonlinear in the states X_t;
• the error model for Y_t does not have to be linear in X_t;
• the error model does not have to be additive, i.e. does not have to be of the type Y_t = F·X_t + ε_t;
• ε_t does not have to be Gaussian;
• random effects φ^i can have any distribution.
What we come up with is essentially an instance of the pseudo-marginal method (Andrieu and Roberts, 2009), embedded in a Gibbs sampler with careful use of blocking strategies (and more...).
15. As sometimes happens, independent work similar to ours was carried out simultaneously:
Botha, I., Kohn, R., Drovandi, C. (2021). Particle methods for stochastic differential equation mixed effects models. Bayesian Analysis, 16(2), 575-609.
17. The joint posterior
  Y^i_{t_j} = h(X^i_{t_j}, ε^i_{t_j}),  ε^i_{t_j}|ξ ∼indep p(ξ),  j = 1, ..., n_i
  dX^i_t = α(X^i_t, φ^i) dt + √β(X^i_t, φ^i) dW^i_t,  i = 1, ..., M
  φ^i ∼ π(φ^i|η),  i = 1, ..., M
• observed data y = (Y^i_{1:n_i}), i = 1, ..., M, across M individuals;
• latent states x = (X^i_{1:n_i}), i = 1, ..., M, at discrete time points.
We have the joint posterior
  π(η, ξ, φ, x|y) ∝ π(η)π(ξ)π(φ|η)π(x|φ)π(y|x, ξ),
where (from now on assume n_i ≡ n for all units)
  π(φ|η) = ∏_{i=1}^M π(φ^i|η),
  π(x|φ) = ∏_{i=1}^M π(x^i_1) ∏_{j=2}^n π(x^i_j|x^i_{j-1}, φ^i)   (Markovianity),
  π(y|x, ξ) = ∏_{i=1}^M ∏_{j=1}^n π(y^i_j|x^i_j, ξ)   (conditional independence).
18.
  π(η, ξ, φ, x|y) ∝ π(η)π(ξ)π(φ|η)π(x|φ)π(y|x, ξ)
While several components of the joint π(η, ξ, φ, x|y) may have tractable conditionals, sampling from such a joint posterior can still be a horrendous task → slow exploration of the parameter surface.
The reason is that the unknown parameters and x are highly correlated, hence a Gibbs sampler would mix very badly.
The best option is, in fact, to sample from either of the following marginals:
  π(η, ξ, φ|y) = ∫ π(η, ξ, φ, x|y) dx
or
  π(η, ξ|y) = ∫∫ π(η, ξ, φ, x|y) dx dφ
19. Marginal posterior over parameters and random effects
• By integrating x out, the resulting marginal is
  π(η, ξ, φ|y) ∝ π(η)π(ξ) ∏_{i=1}^M π(φ^i|η) π(y^i|ξ, φ^i).
• The data likelihood π(y^i|ξ, φ^i) for the generic i-th unit is
  π(y^i|ξ, φ^i) ∝ ∫ ∏_{j=1}^n π(y^i_j|x^i_j, ξ) × π(x^i_1) ∏_{j=2}^n π(x^i_j|x^i_{j-1}, φ^i) dx^i_{1:n}
• Typical problem: the transition density π(x^i_j|x^i_{j-1}, ·) is unknown;
• the integral is generally intractable, but we can estimate it via Monte Carlo;
• for (very) simple cases, such as linear SDEs, we can apply the Kalman filter and obtain an exact solution.
20. We assume a generic nonlinear SDE.
• The transition density π(x^i_j|x^i_{j-1}, φ^i) is unknown;
• luckily, we can still approximate the likelihood integral unbiasedly;
• sequential Monte Carlo (SMC) can be used for the task;
• when π(x^i_j|x^i_{j-1}, φ^i) is unknown, we are still able to run a numerical discretization method with step size h > 0 and simulate from the approximate π_h(x^i_j|x^i_{j-1}, φ^i), e.g.
  x^i_{t+h} = x^i_t + α(x^i_t, φ^i) h + √β(x^i_t, φ^i) · u^i_t,  u^i_t ∼iid N(0, h).
This is the Euler-Maruyama discretization scheme (it is possible to use more advanced schemes).
Hence x^i_{t+h}|x^i_t ∼ π_h(x^i_{t+h}|x^i_t, φ^i) (which is clearly Gaussian).
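The Euler-Maruyama step above takes a few lines of code. A minimal sketch (in Python, not the talk's Julia code), for a hypothetical scalar SDE with drift α(·) and squared diffusion coefficient β(·); the OU parameter values at the end are illustrative only:

```python
import numpy as np

def euler_maruyama(x0, alpha, beta, h, n_steps, rng):
    """Simulate one path of dX_t = alpha(X_t) dt + sqrt(beta(X_t)) dW_t
    on a grid of step size h, starting from x0 (Euler-Maruyama)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        u = rng.normal(0.0, np.sqrt(h))            # u_t ~ N(0, h)
        x[t + 1] = x[t] + alpha(x[t]) * h + np.sqrt(beta(x[t])) * u
    return x

# Hypothetical example: Ornstein-Uhlenbeck dX = th1*(th2 - X) dt + th3 dW
th1, th2, th3 = 1.0, 5.0, 0.5
rng = np.random.default_rng(1)
path = euler_maruyama(0.0, lambda x: th1 * (th2 - x),
                      lambda x: th3 ** 2, h=0.01, n_steps=1000, rng=rng)
```

After 10 time units the simulated OU path has reverted towards its mean level th2.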
21. We approximate the observed-data likelihood as
  π_h(y^i|ξ, φ^i) ∝ ∫ ∏_{j=1}^n π(y^i_j|x^i_j(h), ξ) × π(x^i_1) ∏_{j=2}^n π_h(x^i_j|x^i_{j-1}, φ^i) dx^i_{1:n}
but from now on, for simplicity, we stop emphasizing the reference to h.
So we have the Monte Carlo approximation
  π(y^i|ξ, φ^i) = E[ ∏_{j=1}^n π(y^i_j|x^i_j, ξ) ] ≈ (1/N) ∑_{k=1}^N ∏_{j=1}^n π(y^i_j|x^i_{j,k}, ξ),
  x^i_{j,k} ∼iid π_h(x^i_j|x^i_{j-1}, φ^i),  k = 1, ..., N,
and the sampling can of course be performed numerically (say, by Euler-Maruyama).
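The plain Monte Carlo estimator above simulates N independent latent paths and averages the product of observation densities along each path. A minimal sketch under hypothetical simplifying choices (scalar state, Gaussian N(0, sigma2) measurement error, one Euler-Maruyama step per observation interval, toy data):

```python
import numpy as np

def mc_likelihood(y, x0, alpha, beta, sigma2, h, N, rng):
    """Naive MC estimate of pi(y | xi, phi): average, over N iid
    forward-simulated latent paths, of prod_j pi(y_j | x_j, xi)."""
    x = np.full(N, x0, dtype=float)   # N independent paths, vectorized
    prods = np.ones(N)
    for yj in y:
        u = rng.normal(0.0, np.sqrt(h), N)
        x = x + alpha(x) * h + np.sqrt(beta(x)) * u
        # Gaussian observation density N(yj; x, sigma2), per path
        prods *= np.exp(-(yj - x) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return prods.mean()

rng = np.random.default_rng(0)
y_obs = np.array([0.1, 0.2, 0.15])    # toy data
est = mc_likelihood(y_obs, 0.0, lambda x: -x, lambda x: 0.5 + 0 * x,
                    sigma2=0.1, h=0.1, N=5000, rng=rng)
```

With long time series this naive estimator degenerates quickly, which is why the next slides switch to sequential Monte Carlo.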
22. The efficient way to produce Monte Carlo approximations for nonlinear time series observed with error is sequential Monte Carlo (SMC, aka particle filters).
With SMC the N Monte Carlo draws are called “particles”.
The secret to SMC is to
• “propagate particles x^i_t forward”: x^i_t → x^i_{t+h},
• “weight” the particles proportionally to π(y|x),
• “resample particles according to their weights”. The last operation is essential to let particles track the observations.
I won't get into details, but the simplest particle filter is the bootstrap filter (Gordon et al. 1993)².
² Useful intro from colleagues in Linköping and Uppsala: Naesseth, Lindsten, Schön. Elements of Sequential Monte Carlo. Foundations and Trends in Machine Learning, 12(3):307-392, 2019.
23. Estimating the observed-data likelihood with the bootstrap filter
An unbiased, non-negative estimate of the data likelihood can be computed with the bootstrap filter SMC method using N particles:
  π̂_{u^i}(y^i|ξ, φ^i) = (1/N^n) ∏_{t=1}^n ∑_{k=1}^N π(y^i_t|x^i_{t,k}, ξ),  i = 1, ..., M.
Recall, for particle k we have
  x^i_{t+h,k} = x^i_{t,k} + α(x^i_{t,k}, φ^i) h + √β(x^i_{t,k}, φ^i) · u^i_{t,k},  u^i_{t,k} ∼iid N(0, h).
We will soon see that it is important to keep track of the apparently uninteresting u^i_{t,k} variates.
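A compact sketch of this estimator: a bootstrap filter for a scalar state with Gaussian N(0, sigma2) measurement error and one Euler-Maruyama step per observation. All parameter choices below are hypothetical (the talk's actual implementation is the Julia code in PEPSDI):

```python
import numpy as np

def bootstrap_loglik(y, x0, alpha, beta, sigma2, h, N, rng):
    """Bootstrap particle filter: returns the log of the unbiased
    estimate prod_t [ (1/N) sum_k pi(y_t | x_{t,k}, xi) ]."""
    x = np.full(N, x0, dtype=float)
    loglik = 0.0
    for yt in y:
        # propagate: x <- x + alpha(x) h + sqrt(beta(x)) u,  u ~ N(0, h)
        u = rng.normal(0.0, np.sqrt(h), N)
        x = x + alpha(x) * h + np.sqrt(beta(x)) * u
        # weight by the observation density N(yt; x, sigma2)
        w = np.exp(-(yt - x) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        loglik += np.log(w.mean())
        # multinomial resampling, proportional to the weights
        x = rng.choice(x, size=N, p=w / w.sum())
    return loglik

rng = np.random.default_rng(3)
y_obs = np.array([0.0, 0.1, -0.1, 0.2])
ll = bootstrap_loglik(y_obs, 0.0, lambda x: -x, lambda x: 0.5 + 0 * x,
                      sigma2=0.1, h=0.1, N=500, rng=rng)
```

Note that all the Gaussian draws u are exactly the variates the slide says we must keep track of: fixing them fixes the likelihood estimate.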
24. The “blocked” Gibbs algorithm
Recall:
• random effects φ^i ∼ π(φ^i|η), i = 1, ..., M;
• population parameters η ∼ π(η);
• measurement error ξ ∼ π(ξ);
• SMC variates: u^i ∼ g(u^i), i = 1, ..., M.
We found it very important to “block” the generation of the u variates:
1. π(φ^i, u^i|η, ξ, y^i) ∝ π(φ^i|η) π̂_{u^i}(y^i|ξ, φ^i) g(u^i),  i = 1, ..., M,
2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i|ξ, φ^i),
3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) ∏_{i=1}^M π(φ^i|η).
Once the u variables are sampled and accepted in step 1, we reuse them when computing (in step 2) π̂_{u^i}(y^i|ξ, φ^i).
This gives better performance compared to generating new u in step 2.
25. In practice it is a Metropolis-Hastings within Gibbs
• Using the approximated likelihood π̂_u, we construct a Metropolis-Hastings within Gibbs algorithm:
1. π(φ^i, u^i|η, ξ, y^i) ∝ π(φ^i|η) π̂_{u^i}(y^i|ξ, φ^i) g(u^i),  i = 1, ..., M,
2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i|ξ, φ^i),
3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) ∏_{i=1}^M π(φ^i|η).
• With this scheme the acceptance probability in the first step is
  min{1, [π(φ^{i*}|·)/π(φ^i|·)] × [π̂_{u^{i*}}(y^i|φ^{i*}, ·)/π̂_{u^i}(y^i|φ^i, ·)] × [q(φ^i|φ^{i*})/q(φ^{i*}|φ^i)]}.
• Often computationally expensive, due to the many particles needed to keep the variance of π̂_{u^{i*}}(y^i|φ^{i*}, ·) small.
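In log space the accept/reject decision is just a comparison against a uniform draw. A minimal, hypothetical sketch (the log-posterior arguments bundle the log-prior and the estimated log-likelihood; not the talk's actual code):

```python
import numpy as np

def mh_accept(logpost_prop, logpost_curr, log_q_ratio, rng):
    """Accept with probability min(1, exp(logpost_prop - logpost_curr + log_q_ratio)),
    where log_q_ratio = log q(curr | prop) - log q(prop | curr)."""
    log_alpha = min(0.0, logpost_prop - logpost_curr + log_q_ratio)
    return np.log(rng.uniform()) < log_alpha

rng = np.random.default_rng(0)
always = mh_accept(0.0, -50.0, 0.0, rng)   # a much better proposal: accepted
# empirical acceptance rate for a proposal that is worse by 1 nat (~ exp(-1))
rate = np.mean([mh_accept(-1.0, 0.0, 0.0, rng) for _ in range(10_000)])
```

Working on the log scale avoids overflow when the estimated likelihoods span many orders of magnitude.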
26. CPMMH: correlating the likelihood approximations
A smart idea proposed by Deligiannidis et al. (2018): control instead the variance of the likelihood ratio.
• Let us consider the acceptance probability in step 1:
  min{1, [π(φ^{i*}|·)/π(φ^i|·)] × [π̂_{u^{i*}}(y^i|φ^{i*}, ·)/π̂_{u^i}(y^i|φ^i, ·)] × [q(φ^i|φ^{i*})/q(φ^{i*}|φ^i)]}.
• The main idea in CPMMH is to induce a positive correlation between π̂_{u^{i*}}(y^i|φ^{i*}, ·) and π̂_{u^i}(y^i|φ^i, ·), which reduces the variance of the ratio while using fewer particles in the particle filter.
• Correlation is induced via a Crank-Nicolson proposal:
  u^{i*} = ρ · u^{i,(j−1)} + √(1 − ρ²) · ω,  ω ∼ N(0, I_d),  ρ ∈ (0.9, 0.999).
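The Crank-Nicolson move preserves the N(0, 1) distribution of the auxiliary variates while correlating consecutive likelihood estimates. A quick numerical check (sample sizes and ρ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.99
u_old = rng.standard_normal(100_000)                 # u^{i,(j-1)}, iid N(0, 1)
omega = rng.standard_normal(100_000)                 # fresh innovation
u_new = rho * u_old + np.sqrt(1 - rho ** 2) * omega  # u^{i*}: marginally still
                                                     # N(0, 1), correlated with
                                                     # u_old at level rho
```

Because the proposal leaves N(0, 1) invariant, reusing u_new in the particle filter does not bias the pseudo-marginal target.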
27. CPMMH: selecting the number of particles
• For PMMH (no correlated particles) we selected the number of particles N such that the variance σ²_N of the log-likelihood satisfies σ²_N ≈ 2 at some fixed parameter value.³
• For CPMMH, N is selected such that σ²_N ≈ 2.16²/(1 − ρ_l²), where ρ_l is the estimated correlation between π̂_{u^i}(y^i|ξ, φ^i) and π̂_{u^{i*}}(y^i|ξ, φ^i).⁴
• A drawback of the CPMMH algorithm is that we have to store the random numbers u = (u^1, ..., u^M)ᵀ in memory, which can be problematic if we have a very large number of particles N, many subjects, or long time series.
³ Sherlock, Thiery, Roberts, Rosenthal (2015). AoS.
⁴ Choppala, Gunawan, Chen, Tran, Kohn (2016).
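For intuition, a quick calculation of how much extra variance the correlated scheme tolerates, and hence, roughly, how many fewer particles it needs (using that the variance of the log-likelihood estimate scales approximately like 1/N; the value of ρ_l is illustrative):

```python
rho_l = 0.99                            # hypothetical estimated correlation
var_pmmh = 2.0                          # target log-likelihood variance, plain PMMH
var_cpmmh = 2.16 ** 2 / (1 - rho_l ** 2)  # tolerated variance under CPMMH (~234)
factor = var_cpmmh / var_pmmh           # implied reduction factor in N (~117)
```

This back-of-the-envelope factor is consistent with the tables later in the talk, where CPMMH runs use 30-60 times fewer particles than PMMH.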
28. We want to show how to improve scalability for increasing M.
But first, some illustrative applications.
30. Ornstein-Uhlenbeck SDEMEM: model structure
Let us consider the following Ornstein-Uhlenbeck SDEMEM:
  Y^i_t = X^i_t + ε^i_t,  ε^i_t ∼indep N(0, σ²),  i = 1, ..., 40,
  dX^i_t = θ^i_1(θ^i_2 − X^i_t) dt + θ^i_3 dW^i_t.
• The random effects φ^i = (log θ^i_1, log θ^i_2, log θ^i_3) follow
  φ^i_j|η ∼indep N(μ_j, τ_j⁻¹),  j = 1, ..., 3,
where η = (μ_1, μ_2, μ_3, τ_1, τ_2, τ_3).
• This induces a semi-conjugate prior on η. Thus, we have a tractable Gibbs step when updating η.
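To make the hierarchy concrete, here is a sketch of how data from such an OU SDEMEM could be simulated; the values of μ, τ, σ and the observation grid are hypothetical (the study's actual settings are in [1]):

```python
import numpy as np

rng = np.random.default_rng(42)
M, n = 40, 50                       # individuals, observations per individual
h, n_sub = 0.01, 20                 # Euler step and sub-steps per observation gap
sigma = 0.3                         # measurement-error sd
mu = np.array([-0.7, 2.3, -0.9])    # population means of log(theta_1..3)
tau = np.array([4.0, 10.0, 4.0])    # population precisions

data = np.empty((M, n))
for i in range(M):
    # random effects: phi_j ~ N(mu_j, 1/tau_j); theta_j = exp(phi_j)
    th1, th2, th3 = np.exp(rng.normal(mu, 1.0 / np.sqrt(tau)))
    x = th2                         # start at the individual's reversion level
    for j in range(n):
        for _ in range(n_sub):      # dX = th1*(th2 - X) dt + th3 dW
            x += th1 * (th2 - x) * h + th3 * rng.normal(0.0, np.sqrt(h))
        data[i, j] = x + rng.normal(0.0, sigma)   # Y = X + eps
```

Each row of `data` is one individual's noisy trajectory, fluctuating around its own level exp(φ^i_2).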
31. Ornstein-Uhlenbeck SDEMEM: simulated data
We have M = 40 individuals.
[Figure 3: Simulated data from the OU-SDEMEM model.]
32. Ornstein-Uhlenbeck SDEMEM: different inference methods
We compare the following MCMC methods: we always use the outlined Metropolis-within-Gibbs sampler, with the likelihood computed in several flavours:
• “Kalman”: Computing the data likelihood exactly with the
Kalman filter.
• “PMMH”: Estimating the data likelihood with the
bootstrap filter and no correlated likelihoods.
• “CPMMH-099”: Estimating the data likelihood with the
bootstrap filter with correlated likelihoods, with
correlation ρ = 0.99.
• “CPMMH-0999”: same as above, with correlation
ρ = 0.999.
33. Ornstein-Uhlenbeck SDEMEM: inference results for η
[Figure 4: OU SDEMEM: marginal posterior distributions for η = (μ1, μ2, μ3, τ1, τ2, τ3). Almost overlapping lines are: Kalman, PMMH, CPMMH-099; vertical lines are ground truth.]
34. Ornstein-Uhlenbeck SDEMEM: comparing efficiency
The ones below are all MH-within-Gibbs algorithms:
Algorithm ρ N CPU (min) mESS mESS/min Rel.
Kalman - - 1.23 488.51 396.37 5684.46
PMMH 0 3000 4076.94 450.13 0.11 1
CPMMH-099 0.99 100 200.92 418.22 2.09 19
CPMMH-0999 0.999 50 110.66 323.77 2.93 26.6
Figure 5: OU SDEMEM. Correlation ρ, number of particles N,
CPU time (minutes), minimum ESS (mESS), minimum ESS per
minute (mESS/min) and relative minimum ESS per minute (Rel.) as
compared to PMMH-naive. All results are based on 50k iterations of
each scheme, and are medians over 5 independent runs of each
algorithm on different data sets. We could only produce 5 runs due to
the very high computational cost of PMMH.
35. Tumor growth simulation study
This example is inspired by another publication: there, P. and Forman analyzed real experimental data of tumor growth in mice, using SDEMEMs.
However, here, to illustrate the use of our inference method, we use a slightly simpler model.
38. Tumor growth SDEMEM: model structure
Let us now consider the following SDEMEM⁵:
  Y^i_t = log V^i_t + ε^i_t,  ε^i_t ∼indep N(0, σ²_e),
  dX^i_{1,t} = (β^i + (γ^i)²/2) X^i_{1,t} dt + γ^i X^i_{1,t} dW^i_{1,t},
  dX^i_{2,t} = −(δ^i + (ψ^i)²/2) X^i_{2,t} dt + ψ^i X^i_{2,t} dW^i_{2,t}.
• X^i_{1,t}: the volume of surviving tumor cells.
• X^i_{2,t}: the volume of cells “killed by a treatment”.
• V^i_t = X^i_{1,t} + X^i_{2,t}: the total tumor volume.
⁵ P. and Forman (2019). Bayesian inference for stochastic differential equation mixed effects models of a tumor xenography study. JRSS-C.
39. Tumor growth SDEMEM: random effects model
• The random effects φ^i = (log β^i, log γ^i, log δ^i, log ψ^i) follow
  φ^i_j|η ∼indep N(μ_j, τ_j⁻¹),  j = 1, ..., 4,
where η = (μ_1, ..., μ_4, τ_1, ..., τ_4).
40. Tumor growth SDEMEM: simulated data
We assume M = 10 subjects with n = 20 datapoints each.
[Figure 7: Simulated data from the tumour growth model.]
41. Tumor growth SDEMEM: different inference methods
We use the following inference methods:
• “PMMH”: estimating the data likelihood with the bootstrap filter and no correlation in the likelihoods.
• “CPMMH”: estimating the data likelihood with the bootstrap filter and inducing correlation in the likelihoods, with ρ = 0.999.
The Kalman filter cannot be used here, due to the nonlinear observation equation Y_t = log(X_{1,t} + X_{2,t}) + ε_t.
42. Tumor growth SDEMEM: inference results for η
[Figure 8: Marginal posterior distributions for μ_i and τ_i, i = 1, ..., 4. Dotted line shows results from the LNA scheme, solid line is from the CPMMH scheme and dashed line is the PMMH scheme.]
43. Tumor growth SDEMEM: comparing efficiency
Algorithm ρ N CPU (m) mESS mESS/m Rel.
PMMH 0 30 2963 2559 0.864 1
CPMMH 0.999 10 957 2311 2.415 3
Figure 9: Tumour model. Correlation ρ, number of particles N, CPU
time (in minutes m), minimum ESS (mESS), minimum ESS per minute
(mESS/m) and relative minimum ESS per minute (Rel.) as compared to
PMMH. All results are based on 500k iterations of each scheme.
45. Obviously, when the number of individuals M increases, problems emerge...
Recall the blocked Gibbs steps:
1. π(φ^i, u^i|η, ξ, y^i) ∝ π(φ^i|η) π̂_{u^i}(y^i|ξ, φ^i) g(u^i),  i = 1, ..., M,
2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i|ξ, φ^i),
3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) ∏_{i=1}^M π(φ^i|η).
Steps 1 and 2 are the hard ones, because each requires M runs of a particle filter, one for each likelihood π̂_{u^i}(y^i|ξ, φ^i).
Moreover, step 2 involves the product of individual likelihoods. To keep the variance of the product low, many particles may be needed for the individual terms.
46. The “trick”
However, co-author Sebastian Persson had the intuition to borrow a trick from the Monolix⁶ software, which is specialised in inference for mixed-effects models.
Quite simply, consider a “perturbation” of the original SDEMEM, where we allow the constant parameter ξ (and possibly other fixed parameters) to vary slightly between subjects as
  ξ^i ∼ N(ξ_pop, δ),  i = 1, ..., M,
where ξ_pop is the original ξ to be inferred.
⁶ https://lixoft.com/products/monolix/
47. Gibbs for the unperturbed model:
1. π(φ^i, u^i|η, ξ, y^i) ∝ π(φ^i|η) π̂_{u^i}(y^i|ξ, φ^i) g(u^i),  i = 1, ..., M,
2. π(ξ|η, φ, y, u) = π(ξ|φ, y, u) ∝ π(ξ) ∏_{i=1}^M π̂_{u^i}(y^i|ξ, φ^i),
3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) ∏_{i=1}^M π(φ^i|η).
Now introduce
  ξ^i ∼ N(ξ_pop, δ),  i = 1, ..., M.
Gibbs for the perturbed model:
1. π(φ^i, ξ^i, u^i|η, ξ_pop, y^i) ∝ π(φ^i|η) π(ξ^i|ξ_pop) π̂_{u^i}(y^i|ξ^i, φ^i) g(u^i),  i = 1, ..., M,
2. π(ξ_pop|ξ^1, ..., ξ^M) ∝ π(ξ_pop) ∏_{i=1}^M π(ξ^i|ξ_pop),
3. π(η|ξ, φ, y, u) = π(η|φ) ∝ π(η) ∏_{i=1}^M π(φ^i|η).
The expensive step 2 has disappeared: updating ξ_pop no longer requires running any particle filter!
48. Notice that step 1 allows targeting each random effect separately from the others.
So controlling the variance of each individual likelihood π̂_{u^i}(y^i|ξ^i, φ^i) is much easier than controlling the variance of the “joint likelihood” ∏_{i=1}^M π̂_{u^i}(y^i|ξ, φ^i).
49. Perturbed and non-perturbed Ornstein-Uhlenbeck
[Figure: Gibbs on the perturbed vs Gibbs on the non-perturbed model (OU model).]
50. Another model (forgot which one)
[Figure: Gibbs on the perturbed vs Gibbs on the non-perturbed model.]
51. The perturbation variance
  ξ^i ∼ N(ξ_pop, δ),  i = 1, ..., M
δ > 0 is a small tuning parameter specified by the user.
We set δ somewhat arbitrarily, but we found that for parameters having magnitude between 1 and 10 a value of δ = 0.01 worked well.
52. Try our PEPSDI package
Everything is coded in Julia for efficient inference at https://github.com/cvijoviclab/PEPSDI
It includes:
• tutorials and notebooks on how to run the package;
• several adaptive MCMC samplers, benchmarked (tl;dr: the best one is Matti Vihola's RAM sampler);
• “guided” particle filters better suited for informative observations (low measurement error);
• nontrivial case studies (in the paper);
• and it's not just SDEs with mixed effects! Mixed-effects stochastic kinetic models are implemented, and several numerical integrators typical in systems biology are supported (tau-leaping, Gillespie).
53. Thanks to great co-authors!
[Photos: (a) Marija Cvijovic, (b) Samuel Wiqvist, (c) Sebastian Persson, (d) Andrew Golightly, (e) Ashleigh McLean, (f) Niek Welkenhuysen, (g) Sviatlana Shashkova, (h) Patrick Reith, (i) Gregor Schmidt]
58. CPMMH: updating step
• When correlating the particles, step 1 in the MH-within-Gibbs scheme becomes:
For i = 1, ..., M:
(1a) Propose φ^{i*} ∼ q(·|φ^{i,(j−1)}). Draw ω ∼ N(0, I_d) and put u^{i*} = ρ u^{i,(j−1)} + √(1 − ρ²) ω.
(1b) Compute π̂_{u^{i*}}(y^i|ξ^{(j−1)}, φ^{i*}) by running the particle filter with u^{i*}, φ^{i*}, ξ^{(j−1)} and y^i.
(1c) With probability
  min{1, [π(φ^{i*}|·)/π(φ^i|·)] × [π̂_{u^{i*}}(y^i|φ^{i*}, ·)/π̂_{u^i}(y^i|φ^i, ·)] × [q(φ^i|φ^{i*})/q(φ^{i*}|φ^i)]}
put φ^{i,(j)} = φ^{i*} and u^{i,(j)} = u^{i*}. Otherwise, keep the current values φ^{i,(j)} = φ^{i,(j−1)} and u^{i,(j)} = u^{i,(j−1)}.
59. Using stochastic modelling is important!
[This slide refers to the tumor-growth data]
And what if we produced inference using a deterministic model (ODEMEM) while observations come from a stochastic model?
Here follows the estimation of the measurement-error variance σ²_e (the truth is log σ²_e = −1.6):
the true value is massively overestimated by the ODE-based approach.
60. Application: neuronal data with informative observations
[Figure 11: Depolarization [mV] vs time [sec].]
We may focus on what happens between spikes, the so-called inter-spike-intervals (ISIs) data.
61. Inter-spike-intervals data (ISIs):
[Figure 12: depolarization (mV) vs time (msec). Observations from M = 100 ISIs.]
P., Ditlevsen, De Gaetano and Lansky (2008). Parameters of the diffusion leaky integrate-and-fire neuronal model for a slowly fluctuating signal. Neural Computation, 20(11), 2696-2714.
62. So we have about 1.6 × 10⁵ measurements of membrane potential, across M = 100 units.
Membrane potential dynamics are assumed governed by an Ornstein-Uhlenbeck process, observed with error:
  Y^i_t = X^i_t + ε^i_t,  ε^i_t ∼indep N(0, σ²),  i = 1, ..., M,
  dX^i_t = (−λ^i X^i_t + ν^i) dt + σ^i dW^i_t.
• ν^i [mV/msec] is the electrical input into the neuron;
• 1/λ^i [msec] is the spontaneous voltage decay (in the absence of input).
63. In this example the data are informative: the measurement-error term is negligible.
Contrary to intuition, having informative observations complicates things on the computational side.
Shortly: the “particles” propagated forward via the bootstrap filter will have a hard time, since π(y_t|x_t, ·) now has a very narrow support.
Hence many particles will receive a tiny weight (→ poorly approximated likelihood).
Solution: at time t, let the particles be “guided forward” to get close to the next datapoint y_{t+1}.
We used the guided scheme in Golightly, A., Wilkinson, D. J. (2011). Interface Focus, 1(6), 807-820.
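For the linear-Gaussian case, the one-step "guided" idea reduces to exact conjugate algebra: propagate each particle from the Euler transition conditioned on the next observation, and weight by the Gaussian predictive density. A simplified sketch in that spirit, not the full Golightly-Wilkinson bridge construct; all parameter values are hypothetical:

```python
import numpy as np

def guided_step(x, y_next, alpha, beta, sigma2, h, rng):
    """One guided move for a vector of particles x. The Euler transition
    is N(m, v) with m = x + alpha(x) h, v = beta(x) h, and the observation
    is y ~ N(x', sigma2). Sample x' from the transition conditioned on
    y_next; the importance weight is the predictive density
    N(y_next; m, v + sigma2) (exact in this conjugate Gaussian case)."""
    m = x + alpha(x) * h
    v = beta(x) * h
    v_post = 1.0 / (1.0 / v + 1.0 / sigma2)      # conditioned variance
    m_post = v_post * (m / v + y_next / sigma2)  # conditioned mean
    x_new = rng.normal(m_post, np.sqrt(v_post))
    logw = (-0.5 * np.log(2 * np.pi * (v + sigma2))
            - 0.5 * (y_next - m) ** 2 / (v + sigma2))
    return x_new, logw

rng = np.random.default_rng(7)
x = np.zeros(100)                                # 100 particles at x = 0
x_new, logw = guided_step(x, 0.5, lambda x: -x, lambda x: 0.25 + 0 * x,
                          sigma2=1e-4, h=0.1, rng=rng)
```

With the tiny observation variance above, all propagated particles land essentially on top of the next observation y = 0.5, which is exactly the behaviour a blind bootstrap propagation fails to achieve.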
64. With the “guided” particles, N = 1 particle is sufficient to get good inference (not reported here).
Algorithm ρ N CPU (m) mESS mESS/m Rel.
Kalman - - 56 666 12.0 20.0
PMMH - 1 481 287 0.6 1.0
CPMMH-09 0.9 1 653 381 0.58 1.0
CPMMH-0999 0.999 1 655 326 0.50 0.8
Figure 13: Neuronal model. Correlation ρ, number of particles N,
CPU time (in minutes m), minimum ESS (mESS), minimum ESS per
minute (mESS/m), and relative minimum ESS per minute (Rel.) as
compared to PMMH. All results are based on 100k iterations of each
scheme.
65. Several adaptive MCMC samplers
We compare ESS and Wasserstein distance (w.r.t. the true posterior, when available) across several MCMC samplers.