1. Generic Importance Sampling via Optimal Control
for Stochastic Reaction Networks
Chiheb Ben Hammouda
Joint work with
Nadhir Ben Rached (University of Leeds, UK),
Raúl Tempone (RWTH Aachen, Germany; KAUST, KSA) and
Sophia Wiechert (RWTH Aachen, Germany)
Leiden/VU/Delft-Seminar
TU Delft, November 10, 2023
2. Main Ideas of the Talk
1 Design efficient Monte Carlo (MC) estimators for rare event
probabilities for a particular class of continuous-time Markov
chains, namely stochastic reaction networks (SRNs).
2 A generic path-dependent measure change is derived, based on a
connection between finding optimal importance sampling (IS)
parameters and a stochastic optimal control (SOC) formulation.
3 Address the curse of dimensionality when solving the SOC
problem:
(a) Learning-based approach:
C. Ben Hammouda et al. “Learning-based importance sampling via
stochastic optimal control for stochastic reaction networks”. In: Statistics
and Computing 33.3 (2023), p. 58.
(b) Markovian projection-based approach:
Chiheb Ben Hammouda et al. “Automated Importance Sampling via
Optimal Control for Stochastic Reaction Networks: A Markovian
Projection-based Approach”. In: arXiv preprint arXiv:2306.02660 (2023).
3. Outline
1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
4. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
5. Stochastic Reaction Networks (SRNs): Motivation
Deterministic models describe an average (macroscopic) behavior
and are only valid for large populations.
Species/agents of small population ⇒ stochastic effects.
⇒ Modeling based on Stochastic Reaction Networks (SRNs) using
Poisson processes.
Examples of SRN applications:
▸ Epidemics (Brauer et al. 2012; Anderson et al. 2015)
▸ Transcription and translation in genomics and virus kinetics (e.g.,
gene switch) (Hensel et al. 2009; Roberts et al. 2011)
▸ Manufacturing supply chain networks (Raghavan et al. 2002)
▸ (Bio)chemical reactions (e.g., Michaelis-Menten enzyme kinetics)
(Rao et al. 2003; Briat et al. 2015)
Example: Michaelis-Menten enzyme kinetics (enzyme E, substrate S,
enzyme-substrate complex C, product P):
E + S →[θ1] C,  C →[θ2] E + S,  C →[θ3] E + P.
10. Stochastic Reaction Networks (SRNs)
A stochastic reaction network (SRN) is a continuous-time Markov
chain, X(t), defined on a probability space (Ω, F, P),
X(t) = (X^(1)(t), ..., X^(d)(t)) : [0,T] × Ω → N^d,
where X^(i)(t) is the counting number of the i-th agent/species at time t.
It is described by J reaction channels, R_j := (ν_j, a_j), where
▸ ν_j ∈ Z^d: stoichiometric (state-change) vector.
▸ a_j : N^d → R_+: propensity (jump intensity) function; a_j(⋅) satisfies
P(X(t + ∆t) = x + ν_j | X(t) = x) = a_j(x) ∆t + o(∆t),  j = 1,...,J.
From the mass-action kinetics principle:
a_j(x) := θ_j ∏_{i=1}^d [x_i! / (x_i − α_{j,i})!] 1_{x_i ≥ α_{j,i}}
▸ θ_j: reaction rate of the j-th reaction.
▸ α_{j,i}: number of consumed molecules of the i-th species in reaction j.
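The mass-action propensity above is straightforward to evaluate directly. A minimal sketch (the function and variable names are ours; the Michaelis-Menten numbers anticipate the later example):

```python
import math

def propensity(x, theta_j, alpha_j):
    """Mass-action propensity a_j(x) = theta_j * prod_i x_i!/(x_i - alpha_{j,i})! * 1{x_i >= alpha_{j,i}}."""
    a = theta_j
    for xi, aji in zip(x, alpha_j):
        if xi < aji:
            return 0.0  # not enough molecules: the indicator kills the propensity
        # x_i!/(x_i - a_{j,i})! = x_i*(x_i-1)*...*(x_i-a_{j,i}+1),
        # the number of ways to pick the consumed molecules
        a *= math.factorial(xi) // math.factorial(xi - aji)
    return a

# Michaelis-Menten, state ordering (E, S, C, P):
# R1: E + S -> C consumes one E and one S, so alpha_1 = (1, 1, 0, 0).
x = (100, 100, 0, 0)
a1 = propensity(x, 0.001, (1, 1, 0, 0))   # theta1 * E * S = 0.001 * 100 * 100
```

For first-order reactions the product collapses to θ_j x_i, recovering, e.g., a_2(x) = θ_2 C below.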
11. SRNs Illustration: Michaelis-Menten Enzyme Kinetics
We are interested in the time evolution of X(t) = (E(t), S(t), C(t), P(t))^T, t ∈ [0,T].
Reaction j = 1: E + S →[θ1] C;  Reaction j = 2: C →[θ2] E + S;  Reaction j = 3: C →[θ3] E + P.
Stoichiometric vectors (state ordering E, S, C, P):
ν_1 = (−1, −1, 1, 0)^T,  ν_2 = (1, 1, −1, 0)^T,  ν_3 = (1, 0, −1, 1)^T.
Propensities:
a_1(x) = θ_1 E S,  a_2(x) = θ_2 C,  a_3(x) = θ_3 C.
12. Dynamics of SRNs
Kurtz's random time-change representation (Ethier et al. 2009):
X(t) = X(0) + ∑_{j=1}^J Y_j(∫_0^t a_j(X(s)) ds) ⋅ ν_j,  (1)
▸ {Y_j}_{1≤j≤J}: independent unit-rate Poisson processes.
▸ R_j(t) := Y_j(∫_0^t a_j(X(s)) ds): number of occurrences of the j-th reaction up to time t.
[Figure: 20 exact sample paths of the species G, S, E, V over [0, 20]; number of particles on a log scale.]
13. Simulation of SRNs
Pathwise exact:
▸ models the exact stochastic distribution of the process
▸ ⊖ computationally expensive
● Stochastic Simulation Algorithm (SSA) (Gillespie 1976)
● Modified Next Reaction Method (Anderson 2007)
Pathwise approximate:
▸ simulation on a discrete time grid
▸ ⊖ a bias is introduced, but faster
● Explicit tau-leap (TL) approximate scheme (Gillespie 2001)
● Split-step implicit tau-leap (Ben Hammouda et al. 2017)
[Figure: sample paths of the number of molecules of species A and B over [0, 100].]
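As a concrete reference point, a compact sketch of the SSA for the Michaelis-Menten example (the names and the pure-Python sampling are our choices, not the talk's code):

```python
import random

def ssa(x0, nu, propensities, T, rng=random.Random(0)):
    """Gillespie's Stochastic Simulation Algorithm: one exact path up to time T.
    nu[j] is the state-change vector of reaction j; propensities(x) returns (a_1(x),...,a_J(x))."""
    t, x = 0.0, list(x0)
    while True:
        a = propensities(x)
        a0 = sum(a)
        if a0 == 0.0:              # absorbing state: no reaction can fire
            return x
        t += rng.expovariate(a0)   # exponential waiting time with rate a0
        if t > T:
            return x
        # pick reaction j with probability a_j / a0
        u, acc = rng.random() * a0, 0.0
        for j, aj in enumerate(a):
            acc += aj
            if u <= acc:
                for i, v in enumerate(nu[j]):
                    x[i] += v
                break

# Michaelis-Menten (E, S, C, P), theta = (0.001, 0.005, 0.01):
nu = [(-1, -1, 1, 0), (1, 1, -1, 0), (1, 0, -1, 1)]
prop = lambda x: (0.001 * x[0] * x[1], 0.005 * x[2], 0.01 * x[2])
xT = ssa((100, 100, 0, 0), nu, prop, T=1.0)
```

Note that every firing applies one ν_j, so conservation laws such as E + C = const hold exactly along each path.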
14. The Explicit-TL Method
(Gillespie 2001; J. Aparicio 2001)
Based on Kurtz's random time-change representation
X(t) = X(0) + ∑_{j=1}^J Y_j(∫_0^t a_j(X(s)) ds) ⋅ ν_j,
where Y_j are independent unit-rate Poisson processes.
The explicit-TL method (forward Euler approximation):
▸ Assume the propensities, a_j(⋅), to be constant on small intervals.
▸ Let 0 = t_0 < t_1 < ... < t_N = T be a uniform grid with step size ∆t:
X̂^∆t_0 = x_0
X̂^∆t_n = max(0, X̂^∆t_{n−1} + ∑_{j=1}^J P_{n,j}(a_j(X̂^∆t_{n−1}) ⋅ ∆t) ν_j),  n = 1,...,N
▸ X̂^∆t_n is the TL approximation at time t_n; x_0 is the initial state.
▸ P_{n,j}(a_j(X̂^∆t_{n−1}) ⋅ ∆t) are conditionally independent Poisson random
variables with rate a_j(X̂^∆t_{n−1}) ∆t.
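The scheme above in a few lines of Python (a sketch; the helper Poisson sampler and all names are ours):

```python
import random

def poisson(lam, rng):
    """Sample Poisson(lam) by counting unit-rate exponential arrivals in [0, lam]
    (fine for the small per-step rates used here)."""
    t, k = rng.expovariate(1.0), 0
    while t < lam:
        k += 1
        t += rng.expovariate(1.0)
    return k

def tau_leap(x0, nu, propensities, T, N, rng=random.Random(1)):
    """Explicit tau-leap: freeze the propensities on each interval of length dt = T/N,
    fire Poisson(a_j * dt) copies of each reaction, and project negatives back to 0."""
    dt, x = T / N, list(x0)
    for _ in range(N):
        a = propensities(x)
        jumps = [poisson(aj * dt, rng) for aj in a]
        for j, k in enumerate(jumps):
            for i, v in enumerate(nu[j]):
                x[i] += k * v
        x = [max(0, xi) for xi in x]   # the max(0, .) of the scheme above
    return x

# Michaelis-Menten (E, S, C, P):
nu = [(-1, -1, 1, 0), (1, 1, -1, 0), (1, 0, -1, 1)]
prop = lambda x: (0.001 * x[0] * x[1], 0.005 * x[2], 0.01 * x[2])
xT = tau_leap((100, 100, 0, 0), nu, prop, T=1.0, N=100)
```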
15. Typical Computational Tasks in the Context of SRNs
Estimation of the expected value of a given functional, g, of the SRN
{X(t) : t ∈ [0,T]} at a certain time t, i.e., E[g(X(t))].
1 Example 1: Expected counting number of the i-th species, i.e., E[X^(i)(T)].
2 Example 2: Expected hitting times of X, i.e., E[τ_B], where
τ_B := inf{t ∈ R_+ : X(t) ∈ B}, B ⊆ N^d.
▸ E.g., the time of the sudden extinction of one of the species.
3 Example 3: Rare event probabilities, i.e.,
E[1_{X(T)∈B}] = P(X(T) ∈ B) ≪ 1, for a set B ⊆ N^d.
⚠ Rare events are very critical in many applications (e.g., number
of intensive care unit (ICU) beds during pandemics).
⇒ One needs to design efficient Monte Carlo (MC) methods for these tasks.
16. Monte Carlo (MC) Estimator
A MC estimator for E[g(X(T))] based on the TL approximate scheme is given by
M^∆t_M := (1/M) ∑_{j=1}^M g(X̂^∆t_{N,[j]}),
▸ X̂^∆t_{[j]}, j = 1,...,M, are iid sampled TL paths with step size ∆t.
The global error can be expressed as follows:
|E[g(X(T))] − M^∆t_M| ≤ |E[g(X(T))] − E[g(X̂^∆t_N)]| (bias) + |E[g(X̂^∆t_N)] − M^∆t_M| (statistical error).
▸ The bias is of order O(∆t) (Li 2007).
▸ The statistical error (by the Central Limit Theorem) is approximated by
C_α ⋅ √(Var[g(X̂^∆t_N)] / M), where C_α is the (1 − α/2)-quantile of the
standard normal distribution.
17. Illustration: Rare Events in SRNs
Recall: rare event probabilities
q := E[1_{X(T)∈B}] = P(X(T) ∈ B) ≪ 1, for a set B ⊆ N^d.
Example: Michaelis-Menten enzyme kinetics
[Figure: sample paths of enzyme (E), substrate (S), complex (C), and product (P) over [0,1]; threshold γ = 22.]
P(C(T = 1) > 22) ≈ 10^{−5}
18. Crude Monte Carlo (MC) Estimator
A MC estimator based on the TL approximate scheme is given by
q ≈ M^∆t_M := (1/M) ∑_{j=1}^M 1_{X̂^∆t_{N,[j]} > γ},
where X̂^∆t_{[j]}, j = 1,...,M, are iid sampled TL paths with step size ∆t.
[Figure: sample paths of enzyme (E), substrate (S), complex (C), and product (P); threshold γ = 22.]
The relative statistical error:
(C_α/q) √(Var[1_{X̂^∆t_N > γ}] / M) = (C_α/q) √(q(1−q)/M) ∝ √(1/(qM)).
19. Crude Monte Carlo (MC) Estimator
For a relative tolerance of order TOL, the crude MC estimator requires
M ≈ C_α² / (q × TOL²) paths; i.e., to achieve TOL = 5% for q ≈ 10^{−5},
we require M ∼ 2 ⋅ 10^8.
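The arithmetic behind these sample-size figures (a small helper; the function name is ours):

```python
from statistics import NormalDist

def crude_mc_samples(q, tol, alpha=0.05):
    """Number of crude MC samples so that the relative statistical error
    C_alpha * sqrt((1-q)/(q*M)) ~ C_alpha / sqrt(q*M) stays below tol."""
    c_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # (1 - alpha/2)-quantile, ~1.96 for alpha = 5%
    return c_alpha ** 2 / (q * tol ** 2)

M = crude_mc_samples(q=1e-5, tol=0.05)   # ~1.5e8, i.e. of order 2e8 as on the slide
```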
20. Importance Sampling (IS)
Let ρ̂_Z be the pdf of a new random variable Z, such that g ⋅ ρ_Y is dominated by ρ̂_Z:
ρ̂_Z(x) = 0 ⟹ g(x) ⋅ ρ_Y(x) = 0, for all x ∈ R.
Then, the quantity of interest can be rewritten as
E[g(Y)] = ∫_R g(x) ρ_Y(x) dx = ∫_R g(x) [ρ_Y(x)/ρ̂_Z(x)] ρ̂_Z(x) dx = E[L(Z) ⋅ g(Z)],
where L(x) := ρ_Y(x)/ρ̂_Z(x) is the likelihood factor.
Idea: Introduce a new probability measure (sampling the regions with the most
effect on the QoI), which keeps E[g(Y)] unchanged while reducing the variance:
small variance Var[L(Z) ⋅ g(Z)] → small number of MC samples → low computational effort.
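The idea in its simplest setting, a Gaussian tail probability (this toy is our illustration, not from the talk): sampling from N(γ, 1) instead of N(0, 1) puts mass on the rare set, and the likelihood factor exp(−γz + γ²/2) corrects the bias.

```python
import math, random

gamma, M = 4.0, 100_000

# Crude MC: almost no samples land in the rare set {Y > gamma}.
rng = random.Random(0)
crude = sum(rng.gauss(0.0, 1.0) > gamma for _ in range(M)) / M

# IS: sample Z ~ N(gamma, 1) and reweight by the likelihood ratio
# L(z) = phi(z)/phi(z - gamma) = exp(-gamma*z + gamma^2/2).
rng = random.Random(0)
acc = 0.0
for _ in range(M):
    z = rng.gauss(gamma, 1.0)
    if z > gamma:
        acc += math.exp(-gamma * z + gamma * gamma / 2.0)
is_est = acc / M

# Reference value: P(Y > 4) = 1 - Phi(4) ~ 3.17e-5
p_true = 0.5 * math.erfc(gamma / math.sqrt(2.0))
```

With the same budget M, the IS estimator's relative error is of order 1%, while the crude estimator sees only a handful of hits.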
21. Importance Sampling (IS) for SRNs
M^{∆t,IS}_M is an unbiased estimator (E[M^∆t_M] = E[M^{∆t,IS}_M]).
[Figures: sample paths of enzyme (E), substrate (S), complex (C), and product (P) under the standard TL scheme vs. under the IS-TL scheme; threshold γ = 22.]
Standard TL: M^∆t_M := (1/M) ∑_{j=1}^M 1_{X̂^∆t_{N,[j]} > γ}.
IS-TL: M^IS_M := (1/M) ∑_{i=1}^M L_i ⋅ 1_{X̄^{∆t,IS}_{[i],N} > γ}.
Question: How to choose the IS measure systematically to achieve
Var[L ⋅ 1_{X̄^{∆t,IS}_N > γ}] ≪ Var[1_{X̂^∆t_N > γ}]?
23. Aim and Setting
Design a computationally efficient MC estimator for
E[g(X(T))] using IS:
▸ We are interested in g(X(T)) = 1_{X(T)∈B} for a set B ⊆ N^d in
rare event applications: E[g(X(T))] = P(X(T) ∈ B) ≪ 1.
▸ {X(t) : t ∈ [0,T]} is an SRN.
Challenge
IS often requires insights into the given problem.
Solution
Propose a generic/systematic path-dependent measure change based
on a novel connection between finding optimal IS parameters and a
SOC formulation, corresponding to solving a variance-minimization
problem.
24. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
25. Introduction of the IS Scheme
Recall the TL approximate scheme for SRNs with step size ∆t:
X̂^∆t_{n+1} = max(0, X̂^∆t_n + ∑_{j=1}^J ν_j P_{n,j}(a_j(X̂^∆t_n) ∆t)),  n = 0,...,N − 1.
We introduce the following change of measure:²
P̄_{n,j} = P̄_{n,j}(δ^∆t_{n,j}(X̄^∆t_n) ∆t),  n = 0,...,N − 1, j = 1,...,J,
where δ^∆t_{n,j}(x) ∈ A_{x,j} is the control parameter at time step n,
under reaction j and in state x ∈ N^d, for the admissible set
A_{x,j} = {0} if a_j(x) = 0, and A_{x,j} = {y ∈ R : y > 0} otherwise.
Challenge: the number of IS parameters is exponential in the dimension d.
² A similar class of measure change was previously introduced in (Ben Hammouda
et al. 2020) to improve the MLMC estimator robustness and performance.
26. SOC Formulation for the IS Scheme
Aim: Find IS parameters which result in the lowest possible variance.
Value Function
Let u^∆t(⋅,⋅) be the value function which gives the optimal second
moment. For time step 0 ≤ n ≤ N and state x ∈ N^d:
u^∆t(n,x) := inf_{{δ^∆t_i}_{i=n,...,N−1} ∈ A^{N−n}} E[g²(X̄^∆t_N) (∏_{i=n}^{N−1} L_i(P̄_i, δ^∆t_i(X̄^∆t_i)))² | X̄^∆t_n = x],
where the product is the likelihood factor.
Notation:
▸ A = ⨉_{x∈N^d} ⨉_{j=1}^J A_{x,j} is the admissible set for the IS parameters.
▸ L_i(P̄_i, δ^∆t_i(X̄^∆t_i)) = exp(−∑_{j=1}^J (a_j(X̄^∆t_i) − δ^∆t_{i,j}(X̄^∆t_i)) ∆t) ⋅ ∏_{j=1}^J (a_j(X̄^∆t_i) / δ^∆t_{i,j}(X̄^∆t_i))^{P̄_{i,j}}
▸ (P̄_i)_j := P̄_{i,j} and (δ_i)_j := δ_{i,j}.
27. Dynamic Programming (DP) for IS Parameters
Theorem (Ben Hammouda et al. 2023a)
For x ∈ N^d and given step size ∆t > 0, the value function u^∆t(⋅,⋅) fulfills
the following dynamic programming relation for n = N − 1,...,0:
u^∆t(n,x) = inf_{δ^∆t_n(x) ∈ A_x} exp((−2 ∑_{j=1}^J a_j(x) + ∑_{j=1}^J δ^∆t_{n,j}(x)) ∆t)
× ∑_{p∈N^J} ∏_{j=1}^J [(∆t ⋅ δ^∆t_{n,j}(x))^{p_j} / p_j!] (a_j(x) / δ^∆t_{n,j}(x))^{2 p_j} ⋅ u^∆t(n + 1, max(0, x + ν ⋅ p))
for x ∈ N^d, A_x := ⨉_{j=1}^J A_{x,j}, and final condition u^∆t(N,x) = g²(x).
⚠ Solving the above minimization problem is challenging due to the infinite sum.
Notation: ν = (ν_1,...,ν_J) ∈ Z^{d×J}.
28. HJB Equations for IS Parameters
For x ∈ N^d, let the continuous-time value function ũ(⋅,x) : [0,T] → R
be the limit of the discrete value function u^∆t(⋅,x) as ∆t → 0.
Corollary (Ben Hammouda et al. 2023b)
For x ∈ N^d, the continuous-time value function ũ(t,x) fulfills the
Hamilton-Jacobi-Bellman (HJB) equations for t ∈ [0,T]:
ũ(T,x) = g²(x)
−dũ/dt(t,x) = inf_{δ(t,x) ∈ A_x} (−2 ∑_{j=1}^J a_j(x) + ∑_{j=1}^J δ_j(t,x)) ũ(t,x)
+ ∑_{j=1}^J (a_j(x)² / δ_j(t,x)) ũ(t, max(0, x + ν_j)),
where δ_j(t,x) := (δ(t,x))_j.
⊖ The computational cost to solve the HJB equations scales
exponentially with the dimension d.
29. Additional Notes on the HJB Equations
If ũ(t,x) > 0 for all x ∈ N^d and t ∈ [0,T], the HJB equations simplify to:
ũ(T,x) = g²(x)
dũ/dt(t,x) = −2 ∑_{j=1}^J a_j(x) (√(ũ(t,x) ũ(t, max(0, x + ν_j))) − ũ(t,x))
The corresponding near-optimal control is given by
δ̃_j(t,x) = a_j(x) √(ũ(t, max(0, x + ν_j)) / ũ(t,x))  (2)
For rare event probabilities, we approximate the observable g(x) = 1_{x_i > γ}
by a sigmoid
g̃(x) = 1 / (1 + exp(b − β x_i)),
with appropriately chosen parameters b ∈ R and β ∈ R.
⊖ The computational cost to solve the HJB equations scales
exponentially with the dimension d.
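For intuition, the simplified HJB can be integrated backward in time directly when d = 1. A sketch for a toy pure-birth SRN (the model, the truncation at xmax, and the sigmoid parameters are our choices for illustration):

```python
import math

# Toy 1-d SRN: a pure-birth process X -> X+1 with constant rate theta, started at 0
# (so X(T) ~ Poisson(theta*T)); observable ~ 1{x > gamma}, smoothed by the sigmoid
# g_tilde(x) = 1/(1 + exp(b - beta*x)) from the slide.
theta, T, gamma = 2.0, 1.0, 10
xmax, Nt = 60, 2000
b, beta = gamma + 0.5, 1.0
g2 = [1.0 / (1.0 + math.exp(b - beta * x)) ** 2 for x in range(xmax + 1)]

# Simplified HJB, explicit Euler backward from the terminal condition:
# du/dt(t,x) = -2 a(x) (sqrt(u(t,x) u(t,x+1)) - u(t,x)),  u(T,x) = g_tilde(x)^2.
dt = T / Nt
u = [g2[:]]                        # u[k] approximates u(T - k*dt, .)
for n in range(Nt):
    prev = u[-1]
    nxt = prev[:]                  # the boundary x = xmax is frozen (truncation)
    for x in range(xmax):
        nxt[x] = prev[x] + dt * 2.0 * theta * (math.sqrt(prev[x] * prev[x + 1]) - prev[x])
    u.append(nxt)
u0 = u[-1]                         # value function at t = 0

# Near-optimal IS control (2) at t = 0: delta(x) = a(x) sqrt(u(0,x+1)/u(0,x)),
# which exceeds a(x) below the threshold, pushing the process toward {x > gamma}.
delta0 = [theta * math.sqrt(u0[x + 1] / u0[x]) for x in range(xmax)]
```

This is exactly the step whose cost explodes with d: in d dimensions the state loop runs over a d-dimensional truncated lattice.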
32. Our Approaches to Address the Curse of Dimensionality
Optimal IS (control) parameters can be found from the HJB equations
using (2); ⊖ but this suffers from the curse of dimensionality.
Learning-based approach (Ben Hammouda et al. 2023a):
● use a parametrized ansatz function for the value function with parameter set β
● learn β by stochastic optimization
→ suitable when a relevant ansatz exists.
Markovian projection-based approach (Ben Hammouda et al. 2023b):
● reduce the dimension of the SRN by Markovian projection (potentially even to one)
● solve a significantly lower-dimensional HJB
→ suitable when the dimension after MP is very low.
Combined approaches are possible.
33. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
34. Learning-based Approach: Steps
1 Use an ansatz function, û(t,x;β), to approximate the value function
u^∆t(n,x) = inf_{{δ^∆t_i}_{i=n,...,N−1} ∈ A^{N−n}} E[g²(X̄^∆t_N) (∏_{i=n}^{N−1} L_i(P̄_i, δ^∆t_i(X̄^∆t_i)))² | X̄^∆t_n = x].
Illustration: For the observable g(X(T)) = 1_{X_i(T) > γ}, we use the ansatz
û(t,x;β) = 1 / (1 + e^{−(1−t)(⟨β^space, x⟩ + β^time) + b_0 − ⟨β_0, x⟩}),  t ∈ [0,1], x ∈ N^d
▸ learned parameters β = (β^space, β^time) ∈ R^{d+1}, and
▸ b_0 and β_0 are chosen to fit the final condition at time T (not learned).
Example sigmoid for d = 1: final fit (t = 1) for g(x) = 1_{x > 10} → b_0 = 14, β_0 = 1.33.
35. Learning-Based Approach: Steps
2 Learn/find the parameters β = (β^space, β^time) ∈ R^{d+1} which
minimize the second moment under IS:
inf_{β ∈ R^{d+1}} E[g²(X̄^{∆t,β}_N) ∏_{k=0}^{N−1} L²_k(P̄_k, δ̂^∆t(k, X̄^{∆t,β}_k; β))] =: C_{0,X}(δ̂^∆t_0,...,δ̂^∆t_{N−1}; β),
▸ (δ̂^∆t(n,x;β))_j = δ̂^∆t_j(n,x;β) = a_j(x) √(û((n+1)∆t/T, max(0, x + ν_j); β)) / √(û((n+1)∆t/T, x; β))
▸ {X̄^{∆t,β}_n}_{n=1,...,N} is an IS path generated with {δ̂^∆t(n,x;β)}_{n=1,...,N}.
→ Use stochastic optimization (e.g., stochastic gradient descent)
(Kingma et al. 2014).
⚠ We derived explicit pathwise derivatives.
37. Partial Derivatives of the Second Moment
Lemma (Ben Hammouda et al. 2023a)
The partial derivatives of the second moment C_{0,X}(δ̂^∆t_0,...,δ̂^∆t_{N−1}; β)
with respect to β_l, l = 1,...,(d + 1), are given by
∂/∂β_l E[g²(X̄^{∆t,β}_N) ∏_{k=0}^{N−1} L²_k(P̄_k, δ̂^∆t(k, X̄^{∆t,β}_k; β))]  (the expectand is =: R(X_0;β))
= E[R(X_0;β) (∑_{k=1}^{N−1} ∑_{j=1}^J (∆t − P̄_{k,j} / δ̂^∆t_j(k, X̄^{∆t,β}_k; β)) ⋅ ∂/∂β_l δ̂^∆t_j(k, X̄^{∆t,β}_k; β))],
where
▸ {X̄^{∆t,β}_n}_{n=1,...,N} is an IS path generated with {δ̂^∆t(n,x;β)}_{n=1,...,N},
▸ ∂/∂β_l δ̂^∆t_j(k, X̄; β) is found in closed form.
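A sanity check of the lemma's structure in a single-time-step toy (one reaction, ∆t = 1; our construction, not the paper's multi-step setting): for Poisson IS with rate δ and a smoothed observable, the pathwise-derivative expression agrees with a finite difference of the exactly summed second moment.

```python
import math

# Second moment C(delta) = E_delta[g(Y)^2 L(Y)^2], Y ~ Poisson(delta),
# with L(y) = exp(-(lam - delta)) * (lam/delta)^y. Differentiating L pathwise,
# dL/d delta = L * (1 - y/delta), gives the lemma's (dt - P/delta) factor with dt = 1.
lam, b, beta = 2.0, 15.5, 1.0
g = lambda y: 1.0 / (1.0 + math.exp(b - beta * y))
p = lambda y, r: math.exp(-r) * r ** y / math.factorial(y)
L = lambda y, delta: math.exp(-(lam - delta)) * (lam / delta) ** y

def C(delta, ymax=120):
    return sum(g(y) ** 2 * L(y, delta) ** 2 * p(y, delta) for y in range(ymax))

def dC(delta, ymax=120):   # lemma-style pathwise derivative, as an exact expectation
    return sum(g(y) ** 2 * L(y, delta) ** 2 * (1.0 - y / delta) * p(y, delta)
               for y in range(ymax))

delta, h = 10.0, 1e-5
fd = (C(delta + h) - C(delta - h)) / (2 * h)   # central finite difference
```

In the full scheme the same identity is applied per time step and per reaction, and the chain rule through δ̂(⋅,⋅;β) yields the gradient in β.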
38. Learning-Based Approach: Steps
4 IS sample paths: Use the optimal IS parameters (the output of
the stochastic optimization step)
δ̂_j(n,x;β*) = a_j(x) √(û((n+1)∆t/T, max(0, x + ν_j); β*)) / √(û((n+1)∆t/T, x; β*)),
0 ≤ n ≤ N − 1, x ∈ N^d, 1 ≤ j ≤ J,
to simulate M IS sample paths with their corresponding likelihood factors:
▸ X̄^{∆t,β*}_{[i],N}: the i-th IS sample path, 1 ≤ i ≤ M.
▸ L^{β*}_i: the corresponding likelihood factor, 1 ≤ i ≤ M.
5 Estimate E[g(X(T))] using the MC-IS estimator
μ^IS_{M,∆t} = (1/M) ∑_{i=1}^M L^{β*}_i ⋅ g(X̄^{∆t,β*}_{[i],N}).
39. Learning-based Approach: Illustration
[Workflow diagram: full-dimensional SRN X(t) ∈ R^d → parametrized ansatz for the value function → parameter learning for the value function (IS forward runs to derive gradients; parameter updates via stochastic optimization; IS sample paths for training) → optimal IS paths → efficient MC-IS estimator.]
40. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
41. Examples
Michaelis-Menten enzyme kinetics (d = 4, J = 3) (Rao et al. 2003):
E + S →[θ1] C,  C →[θ2] E + S,  C →[θ3] E + P,
▸ initial states X_0 = (E(0), S(0), C(0), P(0))^T = (100, 100, 0, 0)^T,
▸ θ = (0.001, 0.005, 0.01)^T,
▸ final time T = 1, and
▸ observable g(X(T)) = 1_{X_3(T) > 22} → P(X_3(T) > 22) ≈ 10^{−5}.
Enzymatic futile cycle model (d = 6, J = 6) (Kuwahara et al. 2008):
R1: S_1 + S_2 →[θ1] S_3,  R2: S_3 →[θ2] S_1 + S_2,  R3: S_3 →[θ3] S_1 + S_5,
R4: S_4 + S_5 →[θ4] S_6,  R5: S_6 →[θ5] S_4 + S_5,  R6: S_6 →[θ6] S_4 + S_2.
▸ initial states (S_1(0),...,S_6(0)) = (1, 50, 0, 1, 50, 0),
▸ θ_1 = θ_2 = θ_4 = θ_5 = 1, and θ_3 = θ_6 = 0.1,
▸ final time T = 2, and
▸ observable g(X(T)) = 1_{X_5(T) > 60} → P(X_5(T) > 60) ≈ 10^{−6}.
42. Learning-based IS Results:
Michaelis-Menten Enzyme Kinetics (d = 4, J = 3)
g(X(T)) = 1_{X_3(T) > 22} → P(X_3(T) > 22) ≈ 10^{−5}
[Figures: sample mean, squared coefficient of variation, learned parameter values (β^space_1,...,β^space_4, β^time), and kurtosis vs. optimizer steps, for the proposed approach and standard TL.]
Variance reduction by a factor of 4 × 10³ after a few iterations (∼5 iterations).
43. Learning-based IS Results:
Enzymatic Futile Cycle (d = 6, J = 6)
g(X(T)) = 1_{X_5(T) > 60} → P(X_5(T) > 60) ≈ 10^{−6}
[Figures: sample mean, squared coefficient of variation, learned parameter values (β^space_1,...,β^space_6, β^time), and kurtosis vs. optimizer steps, for the proposed approach and standard TL.]
Variance reduction by a factor of 50 after 43 iterations.
44. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
45. Our Approaches to Address the Curse of Dimensionality
Optimal IS (control) parameters can be found from the HJB equations
using (2); ⊖ but this suffers from the curse of dimensionality.
Learning-based approach (Ben Hammouda et al. 2023a):
● use a parametrized ansatz function for the value function with parameter set β
● learn β by stochastic optimization
→ suitable when a relevant ansatz exists.
Markovian projection-based approach (Ben Hammouda et al. 2023b):
● reduce the dimension of the SRN by Markovian projection (potentially even to one)
● solve a significantly lower-dimensional HJB
→ suitable when the dimension after MP is very low.
Combined approaches are possible.
46. MP-based Approach: Illustration
[Workflow diagram: full-dimensional SRN X(t) ∈ R^d → Markovian projection → projected SRN S̄(t) ∈ R^d̄, d̄ ≪ d → solve the (reduced-dimensional) d̄-dim HJB equations → projected IS controls → IS forward run for the (full-dimensional) d-dim SRN → efficient MC-IS estimator.]
47. Markovian Projection (MP): Motivation
Recall: an SRN X(t) is characterized by (Ethier et al. 2009)
X(t) = x_0 + ∑_{j=1}^J Y_j(∫_0^t a_j(X(s)) ds) ⋅ ν_j,  (3)
where Y_j : R_+ × Ω → N are independent unit-rate Poisson processes.
Let P be a projection onto a d̄-dimensional space (1 ≤ d̄ ≪ d),
P : R^d → R^d̄ : x ↦ P ⋅ x.
⚠ While X is Markovian, S(t) := P ⋅ X(t) is in general non-Markovian.
⇒ We want to construct a low-dimensional Markovian process that
mimics the evolution of S.
49. Markovian Projection: Illustration
Aim: Construct a low-dimensional Markovian process that mimics the
evolution of S(t) := P ⋅ X(t), where P is a projection onto a d̄-dimensional
space (1 ≤ d̄ ≪ d), i.e., P : R^d → R^d̄ : x ↦ P ⋅ x.
The choice of the projection depends on the QoI; e.g., for the observable
g(x) = 1_{x_i > γ}, a suitable projection is
P(x) = ⟨(0,...,0,1,0,...,0)^T, x⟩ = x_i  (the 1 in the i-th position).
Example: Michaelis-Menten enzyme kinetics
[Figures: 20 TL sample paths in d = 4 (enzyme, substrate, complex, product, with the Markovian projection overlaid) vs. 20 MP sample paths of the complex (C) in d̄ = 1.]
50. Markovian Projection for SRNs
For t ∈ [0,T], consider the projected process S(t) := P ⋅ X(t), where
X(t) follows X(t) = x_0 + ∑_{j=1}^J Y_j(∫_0^t a_j(X(s)) ds) ν_j.
Theorem (Ben Hammouda et al. 2023b)
For t ∈ [0,T], let S̄(t) be a d̄-dimensional stochastic process whose
dynamics are given by
S̄(t) = P(x_0) + ∑_{j=1}^J Ȳ_j(∫_0^t ā_j(τ, S̄(τ)) dτ) P(ν_j),  with P(ν_j) =: ν̄_j,
where Ȳ_j are independent unit-rate Poisson processes and ā_j are
characterized by
ā_j(t,s) := E[a_j(X(t)) | P(X(t)) = s, X(0) = x_0],  for 1 ≤ j ≤ J, s ∈ N^d̄.
Then, S(t)|_{X(0)=x_0} and S̄(t)|_{X(0)=x_0} have the same conditional
distribution for all t ∈ [0,T].
51. Propensities of the Projected Process
Under MP, the propensity becomes time-dependent:
ā_j(t,s) := E[a_j(X(t)) | P(X(t)) = s; X(0) = x_0],  for 1 ≤ j ≤ J, s ∈ N^d̄.
The index set of the projected propensities is (#J_MP ≤ J)
J_MP := {1 ≤ j ≤ J : P(ν_j) ≠ 0 and a_j(x) ≠ f(P(x)) for all f : R^d̄ → R}.
To approximate ā_j for j ∈ J_MP, we use discrete L² regression:
ā_j(⋅,⋅) = argmin_{h∈V} ∫_0^T E[(a_j(X(t)) − h(t, P(X(t))))²] dt
≈ argmin_{h∈V} (1/M) ∑_{m=1}^M (1/N) ∑_{n=0}^{N−1} (a_j(X̂^∆t_{[m],n}) − h(t_n, P(X̂^∆t_{[m],n})))²
▸ V := {h : [0,T] × R^d̄ → R : ∫_0^T E[h(t, P(X(t)))²] dt < ∞}
▸ {X̂^∆t_{[m]}}_{m=1}^M are M independent TL paths on a uniform time grid
0 = t_0 < t_1 < ... < t_N = T with step size ∆t.
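A sketch of the regression step for the Michaelis-Menten example with P(x) = C (the basis, a quadratic in (t, c) with c rescaled for conditioning, is our choice; the paper's basis may differ). Since a_1(x) = θ_1 E S is not a function of C alone, it is exactly the kind of propensity that needs projecting:

```python
import math, random

rng = random.Random(0)
theta = (0.001, 0.005, 0.01)
nu = [(-1, -1, 1, 0), (1, 1, -1, 0), (1, 0, -1, 1)]
prop = lambda x: (theta[0] * x[0] * x[1], theta[1] * x[2], theta[2] * x[2])

def poisson(lam):
    t, k = rng.expovariate(1.0), 0
    while t < lam:
        k, t = k + 1, t + rng.expovariate(1.0)
    return k

# Collect (t, C) -> a_1(X(t)) data along M tau-leap paths.
T, N, M = 1.0, 50, 200
dt = T / N
rows, ys = [], []
for _ in range(M):
    x = [100, 100, 0, 0]
    for n in range(N):
        t, c = n * dt, x[2]                        # P(x) = C, the complex count
        rows.append([1.0, t, c / 100.0, (c / 100.0) ** 2])
        ys.append(prop(x)[0])                      # a_1(X(t)), the regressand
        a = prop(x)
        for j in range(3):
            k = poisson(a[j] * dt)
            for i in range(4):
                x[i] += k * nu[j][i]
        x = [max(0, xi) for xi in x]

# Least squares: solve the normal equations A^T A b = A^T y (naive 4x4 elimination).
dim = 4
ata = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(dim)]
for i in range(dim):
    piv = ata[i][i]
    for j in range(dim):
        ata[i][j] /= piv
    aty[i] /= piv
    for k in range(dim):
        if k != i:
            f = ata[k][i]
            for j in range(dim):
                ata[k][j] -= f * ata[i][j]
            aty[k] -= f * aty[i]
b = aty
a1_bar = lambda t, c: b[0] + b[1] * t + b[2] * (c / 100.0) + b[3] * (c / 100.0) ** 2
```

`a1_bar(t, c)` then stands in for ā_1(t, s) when simulating the projected process and solving the reduced-dimensional HJB.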
54. Importance Sampling via Markovian Projection
[Workflow diagram: full-dimensional SRN X(t) ∈ R^d → Markovian projection → projected SRN S̄(t) ∈ R^d̄ → SOC formulation → solve the reduced-dimensional HJB equations → projected IS parameters → IS forward run for the full-dimensional SRN → IS sample paths → efficient MC-IS estimator.]
(1) Perform the MP by using an L² regression to derive ā_j(t,s).
55. Importance Sampling via Markovian Projection
full-dimensionalSRN
X(t) ∈ Rd
projectedSRN
S̄(t) ∈ R
¯
d
Markovian
Projection
solving
reduceddi-
mensionHJB
equations
projectedIS
parameters
ISforward
runforfull-
dimensional
SRN
EfficientMC-IS
estimator
IS sample paths
SOC formulation
2
○ For t ∈ [0,T], solve the reduced-dimensional HJB equations
corresponding to the MP process
ũ¯
d(T,s) = g̃2
(s), s ∈ N
¯
d
dũ¯
d
dt
(t,s) = −2
J
∑
j=1
āj(t,s)(
√
ũ¯
d(t,s)ũ¯
d(t,max(0,s + ν̄j)) − ũ¯
d(t,s)),s ∈ N
¯
d
.
36
56. Importance Sampling via Markovian Projection
full-dimensionalSRN
X(t) ∈ Rd
projectedSRN
S̄(t) ∈ R
¯
d
Markovian
Projection
solving
reduceddi-
mensionHJB
equations
projectedIS
parameters
ISforward
runforfull-
dimensional
SRN
EfficientMC-IS
estimator
IS sample paths
SOC formulation
3
○ Construct the MP-IS-MC estimator composed of IS-TL paths with
a uniform grid 0 = t0 ≤ t1 ≤ ⋅⋅⋅ ≤ tN = T and the IS controls
δ̄j(tn,x) = aj(x)
¿
Á
Á
Àũ¯
d (tn,max(0,P(x + νj)))
ũ¯
d (tn,P(x))
, for x ∈ Nd
,n = 0,...,N − 1.
36
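For d̄ = 1, the HJB system in step ② is a finite family of backward ODEs over a truncated state space {0, …, s_max}, which can be solved by simple time stepping. The sketch below is an illustration under assumptions made here, not prescribed by the slides: truncation at s_max, a small floor to keep the square-root ratios well defined at the indicator terminal condition, and explicit backward Euler stepping.

```python
import numpy as np

def solve_hjb_1d(abar, nubar, g2, T, N, s_max, floor=1e-12):
    """Backward time stepping for the reduced (dbar = 1) HJB ODE system
    du/dt(t,s) = -2 * sum_j abar_j(t,s) * (sqrt(u(t,s) u(t,s+nubar_j)) - u(t,s)),
    u(T,s) = g2(s)^ (squared observable), on the truncated space s in {0,...,s_max}."""
    dt = T / N
    s = np.arange(s_max + 1, dtype=float)
    U = np.empty((N + 1, s_max + 1))
    U[N] = np.maximum(g2(s), floor)     # floor keeps sqrt ratios well defined
    for n in range(N - 1, -1, -1):
        u = U[n + 1]
        t = (n + 1) * dt
        rhs = np.zeros_like(u)
        for j, nj in enumerate(nubar):
            idx = np.clip(np.arange(s_max + 1) + nj, 0, s_max)  # s + nubar_j, truncated
            rhs += abar[j](t, s) * (np.sqrt(u * u[idx]) - u)
        # du/dt = -2 * rhs, so one backward Euler step: u(t_n) = u(t_{n+1}) + 2*dt*rhs
        U[n] = np.maximum(u + 2.0 * dt * rhs, floor)
    return U

def is_control(U, n, x, nu_j, a_j, proj, s_max):
    """IS control  delta_j(t_n,x) = a_j(x) * sqrt(u(t_n, P(x+nu_j)) / u(t_n, P(x)))."""
    s_to = int(np.clip(proj(x + nu_j), 0, s_max))
    s_from = int(np.clip(proj(x), 0, s_max))
    return a_j(x) * np.sqrt(U[n, s_to] / U[n, s_from])
```

A useful sanity check: for a unit-rate pure-birth projected process started at s = 0 with terminal condition 1_{s ≥ 1}, the value function at the origin is (1 − e^{−T})², i.e. the squared rare-event probability, which the explicit solver reproduces.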
57. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
58. Examples

Michaelis-Menten enzyme kinetics (d = 4, J = 3) (Rao et al. 2003):

E + S →^{θ1} C,   C →^{θ2} E + S,   C →^{θ3} E + P,

▸ initial states X0 = (E(0), S(0), C(0), P(0))⊺ = (100, 100, 0, 0)⊺,
▸ θ = (0.001, 0.005, 0.01)⊺,
▸ final time T = 1, and
▸ observable g(X(T)) = 1_{X3(T)>22} → P(X3(T) > 22) ≈ 10⁻⁵

Goutsias's model of regulated transcription (d = 6, J = 10) (Goutsias 2005; Kang et al. 2013):

RNA →^{θ1} RNA + M,        M →^{θ2} ∅,
DNA⋅D →^{θ3} RNA + DNA⋅D,  RNA →^{θ4} ∅,
DNA + D →^{θ5} DNA⋅D,      DNA⋅D →^{θ6} DNA + D,
DNA⋅D + D →^{θ7} DNA⋅2D,   DNA⋅2D →^{θ8} DNA⋅D + D,
2⋅M →^{θ9} D,              D →^{θ10} 2⋅M,

▸ X0 = (M(0), D(0), RNA(0), DNA(0), DNA⋅D(0), DNA⋅2D(0)) = (2, 6, 0, 0, 2, 0),
▸ final time T = 1, and
▸ observable g(X(T)) = 1_{X2(T)>8} → P(X2(T) > 8) ≈ 10⁻³
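As a baseline, the Michaelis-Menten network above can be simulated with standard (non-IS) tau-leaping; since P(X3(T) > 22) ≈ 10⁻⁵, a moderately sized plain MC run essentially never observes the event, which is exactly why IS is needed. The script below is an illustrative sketch: the step size, sample size, and the clamping of negative states are choices made here, not taken from the slides.

```python
import numpy as np

# Michaelis-Menten enzyme kinetics, state x = (E, S, C, P)
theta = np.array([0.001, 0.005, 0.01])
nu = np.array([[-1.0, -1.0,  1.0, 0.0],   # E + S -> C
               [ 1.0,  1.0, -1.0, 0.0],   # C -> E + S
               [ 1.0,  0.0, -1.0, 1.0]])  # C -> E + P
props = [lambda x: theta[0] * x[:, 0] * x[:, 1],   # mass-action propensities
         lambda x: theta[1] * x[:, 2],
         lambda x: theta[2] * x[:, 2]]

def standard_tl_estimator(x0, T, N, M, seed=0):
    """Standard tau-leap MC estimate of P(X3(T) > 22), X3 being the third species C."""
    rng = np.random.default_rng(seed)
    dt = T / N
    x = np.tile(np.asarray(x0, float), (M, 1))
    for _ in range(N):
        dx = np.zeros_like(x)
        for a, nu_j in zip(props, nu):
            dx += rng.poisson(np.maximum(a(x), 0.0) * dt)[:, None] * nu_j
        x = np.maximum(x + dx, 0.0)       # clamp to keep states nonnegative
    return (x[:, 2] > 22).mean(), x

p_hat, x_T = standard_tl_estimator([100.0, 100.0, 0.0, 0.0], 1.0, 100, 1000)
# with the event probability ~1e-5, a 10^3-path standard estimator almost
# always returns 0 -- the relative error of plain MC blows up
```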
59. MP Results

(a) Michaelis-Menten enzyme kinetics (d = 4, J = 3, d̄ = 1)
(b) Goutsias's model of regulated transcription (d = 6, J = 10, d̄ = 1)

Figure 4.1: Relative occurrences of states at final time T with 10⁴ sample paths, comparing the TL estimate of P(X(T)) ∣_{X0=x0} and the MP estimate of S̄(T) ∣_{X0=x0}.
60. MP-IS Results: Michaelis-Menten enzyme kinetics (d = 4, J = 3, d̄ = 1)

g(X(T)) = 1_{X3(T)>22} → P(X3(T) > 22) ≈ 10⁻⁵

[Plots against step size ∆t = 2⁻², …, 2⁻¹²: left, the sample mean (≈ 10⁻⁵) for MP-IS and standard TL; right, the squared coefficient of variation, with MP-IS around 10⁻¹ versus up to 10⁶ for standard TL.]

Variance reduction of a factor 10⁶ for ∆t = 2⁻¹⁰.
61. MP-IS Results: Goutsias's model (d = 6, J = 10, d̄ = 1)

g(X(T)) = 1_{X2(T)>8} → P(X2(T) > 8) ≈ 10⁻³

[Plots against step size ∆t = 2⁻², …, 2⁻¹⁰: left, the sample mean (≈ 10⁻³) for MP-IS and standard TL; right, the squared coefficient of variation, with MP-IS around 10⁰ versus up to 10³ for standard TL.]

Variance reduction of a factor 500 for ∆t = 2⁻¹⁰.
62. Remark: Adaptive MP

There may exist examples where a projection to dimension d̄ = 1 is not sufficient to achieve the desired variance reduction.
In that case, one can adaptively increase the projection dimension d̄ = 1, 2, … until sufficient variance reduction is achieved.
This comes at an increased computational cost in the MP and in solving the projected HJB equations.
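The adaptive loop described above can be sketched as a simple driver. Everything below is a hypothetical skeleton assumed here for illustration: `run_mp_is` stands for the whole MP + HJB solve + IS forward run for one candidate projection, returning an estimate and its squared coefficient of variation.

```python
def adaptive_mp_is(projections, run_mp_is, cv2_tol):
    """Increase the projection dimension dbar until the squared coefficient of
    variation of the MP-IS estimator drops below cv2_tol.

    projections: candidate projections ordered by increasing dbar = 1, 2, ...
    run_mp_is:   callable (hypothetical) performing MP + HJB solve + IS run for
                 one projection, returning (estimate, squared coeff. of variation).
    """
    est, cv2, dbar = None, float("inf"), 0
    for dbar, proj in enumerate(projections, start=1):
        est, cv2 = run_mp_is(proj)
        if cv2 <= cv2_tol:
            break   # sufficient variance reduction reached; stop enlarging dbar
    return est, dbar, cv2
```

Each extra dimension enlarges the truncated state space of the projected HJB system, so the loop trades variance reduction against the cost noted in the remark.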
63. 1 Framework and Motivation
2 Optimal Path Dependent Importance Sampling (IS) via Stochastic
Optimal Control (SOC)
3 Address the Curse of Dimensionality: Learning-based Approach
Formulation
Numerical Experiments and Results
4 Address the Curse of Dimensionality: Markovian Projection
(MP)-based Approach
Formulation
Numerical Experiments and Results
5 Conclusions
64. Conclusion and Contributions
1 Design of efficient Monte Carlo (MC) estimators for rare event
probabilities for a particular class of continuous-time Markov
chains, namely stochastic reaction networks (SRNs).
2 Automated path dependent measure change is derived based on a
connection between finding optimal importance sampling (IS)
parameters and a stochastic optimal control (SOC) formulation.
3 Address the curse of dimensionality when solving the SOC
problem
(a) Learning-based approach for the value function and controls of
an approximate dynamic programming problem, via stochastic
optimization
(b) Markovian projection-based approach to solve a significantly
reduced-dimensional Hamilton-Jacobi-Bellman (HJB) equation.
4 Our analysis and numerical experiments in (Ben Hammouda et al.
2023b; Ben Hammouda et al. 2023a) show that the proposed
approaches substantially reduce the MC estimator variance, resulting
in a lower computational complexity in the rare event regime than
standard MC estimators.
65. Related References
Thank you for your attention!
[1] C. Ben Hammouda, N. Ben Rached, R. Tempone, S. Wiechert. Automated
Importance Sampling via Optimal Control for Stochastic Reaction
Networks: A Markovian Projection-based Approach. arXiv preprint
arXiv:2306.02660 (2023).
[2] C. Ben Hammouda, N. Ben Rached, R. Tempone, S. Wiechert.
Learning-based importance sampling via stochastic optimal control for
stochastic reaction networks. Statistics and Computing, 33, no. 3 (2023).
[3] C. Ben Hammouda, N. Ben Rached, R. Tempone. Importance sampling for
a robust and efficient multilevel Monte Carlo estimator for stochastic
reaction networks. Statistics and Computing, 30, no. 6 (2020).
[4] C. Ben Hammouda, A. Moraes, R. Tempone. Multilevel hybrid split-step
implicit tau-leap. Numerical Algorithms, 74, no. 2 (2017).
66. References I
[1] David F Anderson. “A modified next reaction method for simulating chemical
systems with time dependent propensities and delays”. In: The Journal of chemical
physics 127.21 (2007), p. 214107.
[2] David F Anderson and Thomas G Kurtz. Stochastic analysis of biochemical systems.
Springer, 2015.
[3] C. Ben Hammouda et al. “Learning-based importance sampling via stochastic
optimal control for stochastic reaction networks”. In: Statistics and Computing 33.3
(2023), p. 58.
[4] Chiheb Ben Hammouda, Alvaro Moraes, and Raúl Tempone. “Multilevel hybrid
split-step implicit tau-leap”. In: Numerical Algorithms 74.2 (2017), pp. 527–560.
[5] Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone. “Importance
sampling for a robust and efficient multilevel Monte Carlo estimator for stochastic
reaction networks”. In: Statistics and Computing 30.6 (2020), pp. 1665–1689.
[6] Chiheb Ben Hammouda et al. “Automated Importance Sampling via Optimal
Control for Stochastic Reaction Networks: A Markovian Projection-based Approach”.
In: arXiv preprint arXiv:2306.02660 (2023).
[7] Fred Brauer and Carlos Castillo-Chavez. Mathematical models in population
biology and epidemiology. Vol. 2. Springer, 2012.
67. References II
[8] Corentin Briat, Ankit Gupta, and Mustafa Khammash. “A Control Theory for
Stochastic Biomolecular Regulation”. In: SIAM Conference on Control Theory and
its Applications. SIAM. 2015.
[9] Stewart N Ethier and Thomas G Kurtz. Markov processes: characterization and
convergence. Vol. 282. John Wiley & Sons, 2009.
[10] D. T. Gillespie. “Approximate accelerated stochastic simulation of chemically
reacting systems”. In: Journal of Chemical Physics 115 (July 2001), pp. 1716–1733.
doi: 10.1063/1.1378322.
[12] Daniel T Gillespie. “A general method for numerically simulating the stochastic time
evolution of coupled chemical reactions”. In: Journal of computational physics 22.4
(1976), pp. 403–434.
[13] John Goutsias. “Quasiequilibrium approximation of fast reaction kinetics in
stochastic biochemical systems”. In: The Journal of chemical physics 122.18 (2005),
p. 184102.
[14] Sebastian C. Hensel, James B. Rawlings, and John Yin. "Stochastic Kinetic
Modeling of Vesicular Stomatitis Virus Intracellular Growth". In: Bulletin of
Mathematical Biology 71.7 (2009), pp. 1671–1692. issn: 0092-8240.
68. References III
[15] J. Aparicio and H. Solari. "Population dynamics: Poisson approximation and its
relation to the Langevin process". In: Physical Review Letters (2001), p. 4183.
[16] Hye-Won Kang and Thomas G Kurtz. "Separation of time-scales and model
reduction for stochastic reaction networks". In: The Annals of Applied Probability 23.2 (2013).
[17] Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”.
In: arXiv preprint arXiv:1412.6980 (2014).
[18] Hiroyuki Kuwahara and Ivan Mura. “An efficient and exact stochastic simulation
method to analyze rare events in biochemical systems”. In: The Journal of chemical
physics 129.16 (2008), 10B619.
[19] Tiejun Li. “Analysis of explicit tau-leaping schemes for simulating chemically
reacting systems”. In: Multiscale Modeling & Simulation 6.2 (2007), pp. 417–436.
[20] NR Srinivasa Raghavan and N Viswanadham. “Stochastic models for analysis of
supply chain networks”. In: Proceedings of the 2002 American Control Conference
(IEEE Cat. No. CH37301). Vol. 6. IEEE. 2002, pp. 4714–4719.
[21] Christopher V Rao and Adam P Arkin. “Stochastic chemical kinetics and the
quasi-steady-state assumption: Application to the Gillespie algorithm”. In: The
Journal of chemical physics 118.11 (2003), pp. 4999–5010.
[22] Elijah Roberts et al. “Noise contributions in an inducible genetic switch: a whole-cell
simulation study”. In: PLoS computational biology 7.3 (2011), e1002010.