Workshop: Numerical Analysis of Stochastic Partial Differential Equations (NASPDE), in Network Eurandom at Eindhoven University of Technology, May 16, 2023, about my recent works (i) "Numerical Smoothing with Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for Efficient Option Pricing" (link: https://doi.org/10.1080/14697688.2022.2135455), and (ii) "Multilevel Monte Carlo with Numerical Smoothing for Robust and Efficient Computation of Probabilities and Densities" (link: https://arxiv.org/abs/2003.05708).
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
talk_NASPDE.pdf
1. Analysis of Numerical Smoothing with Hierarchical
Approximations: Applications in
Probabilities/Densities Computation and Option Pricing
Chiheb Ben Hammouda
Christian Bayer Raúl Tempone
Center for Uncertainty
Quantification
Center for Uncertainty
Quantification
Center for Uncertainty Quantification Logo Lock-up
NASPDE Workshop, Eindhoven, Netherlands
May 16, 2023
2. Related Manuscripts to the Talk
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Numerical smoothing with hierarchical adaptive sparse grids and
quasi-Monte Carlo methods for efficient option pricing”. In:
Quantitative Finance 23.2 (2023), pp. 209–227.
Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Multilevel Monte Carlo with Numerical Smoothing for Robust
and Efficient Computation of Probabilities and Densities”. In:
arXiv preprint arXiv:2003.05708 (2022).
1
3. Outline
1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
1
4. 1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
5. Framework
Goal: Approximate efficiently E[g(X(T))]
Setting:
▸ Given a (smooth) φ ∶ Rd
→ R, the function g ∶ Rd
→ R:
☀ Indicator functions: g(x) = 1(φ(x)≥0) (probabilities, digital/barrier
options, sensitivities, . . . )
☀ Dirac Delta functions: g(x) = δ(φ(x)=0) (density estimation,
sensitivities, . . . )
▸ X: solution process of a d-dimensional system of SDEs,
approximated by X (via a discretization scheme with N time steps),
E.g., stochastic volatility model: E.g., the Heston model
dXt = µXtdt +
√
vtXtdWX
t
dvt = κ(θ − vt)dt + ξ
√
vtdWv
t ,
(WX
t ,Wv
t ): correlated Wiener processes with correlation ρ.
Challenge: High-dimensional, non-smooth integration problem
E[g(X
∆t
(T))] = ∫
Rd×N
G(z)ρd×N (z)dz
(1)
1 ...dz
(1)
N ...dz
(d)
1 ...dz
(d)
N ,
with G(⋅) maps N × d random inputs to g(X
∆t
(T)); and ρd×N (z):
joint density function of z. 2
6. Multilevel Monte Carlo (MLMC)
(Heinrich 2001; Kebaier et al. 2005; Giles 2008)
A hierarchy of nested meshes of [0,T] (sequence of finer discretizations).
h` ∶= K−`
∆t0: the time steps size for levels ` ≥ 0; K>1, K ∈ N. (h0 > ... > hL)
X` ∶= Xh`
: The approximate process generated using a step size of h`.
h0 h1 hL
.....
MLMC idea
E[g(X(T))] ≈ E[g(XL(T))] = E[g(X0(T))]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
+
L
∑
`=1
E[g(X`(T)) − g(X`−1(T))]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
(1)
Var[g(X0(T))] ≫ Var[g(X`(T)) − g(X`−1(T))] ↘ as ` ↗
M0 ≫ M` ↘ as ` ↗
MLMC estimator: ̂
QMLMC
∶=
L
∑
`=0
̂
Q`, (sample independently each term of (1) with MC)
̂
Q0 ∶=
1
M0
M0
∑
m0=1
g(X0(T;ωm0 )); ̂
Q` ∶=
1
M`
M`
∑
m`=1
(g(X`(T;ωm`
)) − g(X`−1(T;ωm`
))), 1 ≤ ` ≤ L
Compared to MC: MLMC reduces the variance of the deepest level using samples on
coarser (less expensive) levels. 3
7. Quadrature Methods
Naive Quadrature operator based on a Cartesian quadrature grid
E[g] = ∫
Rd
g(x)ρ(x)dx ≈
d
⊗
k=1
QNk
k [g] ∶=
N1
∑
i1=1
⋯
Nd
∑
id=1
wi1 ⋯wid
g (xi1 ,...,xid
)
" Caveat: Curse of dimension: i.e., total number of quadrature
points N = ∏d
k=1 Nk.
Solution:
1 Sparsification of the grid points to reduce computational work.
2 Dimension-adaptivity to detect important dimensions of the
integrand.
Notation:
{xik
,wik
}Nk
ik=1 are respectively the sets of quadrature points and
corresponding quadrature weights for the kth dimension, 1 ≤ k ≤ d.
QNk
k [.]: the univariate quadrature operator for the kth dimension.
4
8. Hierarchical Sparse grids: Construction
(Bungartz and Griebel 2004)
Let β = [β1,...,βd] ∈ Nd
+, m(β) ∶ N+ → N+ an increasing function,
1 1D quadrature operators: Q
m(βk)
k on m(βk) points, 1 ≤ k ≤ d.
2 Detail operator: ∆
m(βk)
k = Q
m(βk)
k − Q
m(βk−1)
k , Q
m(0)
k = 0.
3 Hierarchical surplus: ∆m(β)
= ⊗d
k=1 ∆
m(βk)
k .
4 Hierarchical sparse grid approximation: on an index set I ⊂ Nd
QI
d [g] = ∑
β∈I
∆m(β)
[g] (2)
5
9. Grids Construction
Tensor Product (TP) approach: I ∶= I` = {∣∣ β ∣∣∞≤ `; β ∈ Nd
+}.
Regular sparse grids (SG): I ∶ I` = {∣∣ β ∣∣1≤ ` + d − 1; β ∈ Nd
+}
Adaptive sparse grids (ASG) (Gerstner and Griebel 2003):
Adaptive and a posteriori construction of I = IASGQ
by profit rule
IASGQ
= {β ∈ Nd
+ ∶ Pβ ≥ T}, with Pβ =
∣∆Eβ∣
∆Wβ
:
▸ ∆Eβ = ∣Q
I∪{β}
d [g] − QI
d [g]∣;
▸ ∆Wβ = Work[Q
I∪{β}
d [g]] − Work[QI
d [g]].
Figure 1.1: 2-D Illustration (Chen 2018): Admissible index sets I (top) and
corresponding quadrature points (bottom). Left: TP; middle: SG; right: ASG .
6
10. Table 1: Complexity comparison of the different methods for approximating
E[g(X(T))] within a pre-selected error tolerance, TOL. Given the same initial
problem, and using a weak order one scheme, E.g, the Euler-Maruyama scheme.
Method General Complexity Optimal Complexity
MC O (TOL−3
) O (TOL−3
)
MLMC O (TOL−3+β
), 1
2
≤ β ≤ 1
O (TOL−2
)
Quasi-MC (QMC) O (TOL−1− 2
1+2δ ), 0 ≤ δ ≤ 1
2
O (TOL−2
)
ASGQ O (TOL−1− 2
p ), p > 0
O (TOL−1
)
Sufficient Regularity Conditions for Optimal Complexity:
▸ MLMC (Cliffe, Giles, Scheichl, and Teckentrup 2011; Giles 2015):
g is Lipschtiz ⇒ canonical complexity: O (TOL−2
).
▸ QMC (Dick, Kuo, and Sloan 2013):
1 g belongs to the d-dimensional weighted Sobolev space of functions
with square-integrable mixed (partial) first derivatives.
2 High anisotropy between the different dimensions.
▸ ASGQ (Chen 2018; Ernst, Sprungk, and Tamellini 2018):
p is related to the order of bounded weighted mixed (partial)
derivatives of g and the anisotropy between the different dimensions.
⇒ ASGQ Complexity: O (TOL−1− 2
p ) (O (TOL−1
) when p ≫ 1).
11. Our Proposed Solutions/Strategies
to Recover Optimal Complexities
1 Extracting the available hidden regularity in the problem:
▸ Bias-free mollification (Bayer, Ben Hammouda, and Tempone 2020)
by taking conditional expectations over subset of integration
variables in the context of rough stochastic volatility models.
/ Good choice of subset of integration variables not always trivial.
▸ Mapping the problem to frequency/Fourier space + parametric
smoothing (Bayer, Ben Hammouda, Papapantoleon, Samet, and
Tempone 2022).
" Fourier transform of the density function available/cheap to
compute.
▸ Numerical smoothing (Bayer, Ben Hammouda, and Tempone 2023;
Bayer, Hammouda, and Tempone 2022): Topic of this talk.
2 Dimension reduction techniques based on Richardson
extrapolation on the weak error.
3 Reducing the discontinuity dimension and creating
anisotropy between the different dimensions using Brownian
bridge/Haar wavelet construction.
8
12. 1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
13. Numerical Smoothing Steps
Motivating Example:
E[g(X
∆t
T )] =?
g ∶ Rd
→ R nonsmooth function: (E.g., g(x) = 1(φ(x)≥0) or δ(φ(x)=0))
X
∆t
T (∆t = T
N ) Euler discretization of d-dimensional SDE , E.g.,
dX
(i)
t = ai(Xt)dt + ∑d
j=1 bij(Xt)dW
(j)
t ,
where {W(j)
}d
j=1 are standard Brownian motions.
X
∆t
T = X
∆t
T (∆W
(1)
1 ,...,∆W
(1)
N ,...,∆W
(d)
1 ,...,∆W
(d)
N )
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
∶=∆W
.
The discontinuity is in (N × d)-dimensional space characterised by
φ(X
∆t
(T)) = 0.
9
14. Numerical Smoothing Steps
1 Identify hierarchical representation of integration variables ⇒
locate the discontinuity in a smaller dimensional space
(a) X
∆t
T (∆W) ≡ X
∆t
T (Z), Z = (Zi)dN
i=1 ∼ N(0,IdN ):
s.t. “Z1 ∶= (Z
(1)
1 ,...,Z
(d)
1 ) (coarse rdvs) substantially contribute
even for ∆t → 0”, through hierarchical path generation (Brownian
bridges / Haar wavelet construction)
⇒ Discontinuity in d-dimensional space instead of
(N × d)-dimensions.
Haar wavelet construction in one dimension
For i.i.d. standard normal rdvs Z1, Zn,k, n ∈ N0, k = 0,...,2n
− 1, we define
the (truncated) standard Brownian motion
WN
t ∶= Z1Ψ−1(t) +
N
∑
n=0
2n
−1
∑
k=0
Zn,kΨn,k(t).
with Ψ−1(⋅) and Ψn,k(⋅) are the antiderivatives of the Haar basis functions.
15. Numerical Smoothing Steps
1 Identify hierarchical representation of integration variables ⇒
locate the discontinuity in a smaller dimensional space
(b) Linear mapping using A: rotation matrix whose structure depends
on the function g.
Y = AZ1.
E.g., for an arithmetic basket option with payoff
g(x) = max(∑
d
i=1 cixi(T) − K,0), a suitable A is a rotation matrix,
with the first row leading to Y1 = ∑
d
i=1 Z
(i)
1 up to rescaling without
any constraint for the remaining rows (Gram-Schmidt procedure).
⇒ Discontinuity in 1-dimensional space instead of d-dimensions.
⇒ y∗
1 (y−1,z
(1)
−1 ,...,z
(d)
−1 ): the exact discontinuity location s.t
φ(X
∆t
(T)) = φ(X
∆t
(T;y∗
1 ;y−1,z
(1)
−1 ,...,z
(d)
−1 )) = 0. (3)
Notation
▸ x−j: vector of length d − 1 denoting all the variables other than xj
in x ∈ Rd
. 11
16. Numerical Smoothing Steps
2
E[g(X(T))] ≈ E[g(X
∆t
(T))]
= ∫
Rd×N
G(z)ρd×N (z)dz
(1)
1 ...dz
(1)
N ...dz
(d)
1 ...dz
(d)
N
= ∫
RdN−1
I(y−1,z
(1)
−1 ,...,z
(d)
−1 )ρd−1(y−1)dy−1ρdN−d(z
(1)
−1 ,...,z
(d)
−1 )dz
(1)
−1 ...dz
(d)
−1
= E[I(Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] ≈ E[I(Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )], (4)
3
I(y−1,z
(1)
−1 ,...,z
(d)
−1 ) = ∫
R
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1
= ∫
y∗
1
−∞
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1 + ∫
+∞
y∗
1
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1
≈ I(y−1,z
(1)
−1 ,...,z
(d)
−1 ) ∶=
Mlag
∑
k=0
ηkG(ζk (y∗
1),y−1,z
(1)
−1 ,...,z
(d)
−1 ), (5)
4 Compute the remaining (dN − 1)-integral in (4) by MLMC or QMC or ASGQ.
Notation
G maps N × d Gaussian random inputs to g(X
∆t
(T));
y∗
1 (y−1,z
(1)
−1 ,...,z
(d)
−1 ): the exact discontinuity location (see (3))
y∗
1(y−1,z
(1)
−1 ,...,z
(d)
−1 ): the approximated discontinuity location via root finding.
MLag: number of Laguerre quadrature points ζk ∈ R, and weights ηk;
ρd×N (z) = 1
(2π)d×N/2 e−1
2
zT z
.
12
17. Extending Numerical Smoothing for
Density Estimation
Goal: Approximate the density ρX at u, for a stochastic process X
ρX(u) = E[δ(X − u)], δ is the Dirac delta function.
" Without any smoothing techniques (regularization, kernel
density,. . . ) MC and MLMC fail due to the infinite variance caused
by the Dirac distribution function, δ(⋅).
Strategy in (Bayer, Hammouda, and Tempone 2022): Conditioning
with respect to Z−1
ρX(u) =
1
√
2π
E[exp(−(Y ∗
1 (u))
2
/2)∣
dY ∗
1
dx
(u)∣]
≈
1
√
2π
E
⎡
⎢
⎢
⎢
⎣
exp(−(Y
∗
1(u))
2
/2)
R
R
R
R
R
R
R
R
R
R
R
dY
∗
1
dx
(u)
R
R
R
R
R
R
R
R
R
R
R
⎤
⎥
⎥
⎥
⎦
, (6)
Y ∗
1 (x;Z−1): the exact singularity; Y
∗
1(x;Z−1): the approximated
singularity obtained by solving X
∆t
(T;Y
∗
(x),Z−1) = x.
18. Why not Kernel Density Techniques
in Multiple Dimensions?
Similar to approaches based on MLMC with parametric regularization (Giles,
Nagapetyan, and Ritter 2015) or QMC with kernel density techniques
(Ben Abdellah, L’Ecuyer, Owen, and Puchhammer 2021; L’Ecuyer, Puchhammer,
and Ben Abdellah 2022).
This class of approaches has a pointwise error that increases exponentially with
respect to the dimension of the state vector X.
For a d-dimensional problem, a kernel density estimator with a bandwidth matrix,
H = diag(h,...,h)
MSE ≈ c1M−1
h−d
+ c2h4
. (7)
M is the number of samples, and c1 and c2 are constants.
Our approach in high dimension: For u ∈ Rd
ρX(u) = E[δ(X − u)] = E[ρd (Y∗
(u))∣det(J(u))∣]
≈ E[ρd (Y
∗
(u))∣det(J(u))∣], (8)
where
▸ Y∗
(u;⋅): the exact discontinuity; Y
∗
(u;⋅): the approximated discontinuity.
▸ J is the Jacobian matrix, with Jij =
∂y∗
i
∂uj
; ρd(⋅) is the multivariate Gaussian density.
Exact conditioning with respect to the remaining Brownian bridge noise ⇒ the
smoothing error in our approach is insensitive to the dimension of the problem.
14
19. 1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
20. Multilevel Monte Carlo with Numerical Smoothing:
Estimator and Notation
Recall
▸
E[g(X(T))] ≈ E[g(X
∆t
(T))] = E[I(Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] ≈ E[I(Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )],
▸
I(y−1,z
(1)
−1 ,...,z
(d)
−1 ) = ∫
R
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1
= ∫
y∗
1
−∞
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1 + ∫
+∞
y∗
1
G(y1,y−1,z
(1)
−1 ,...,z
(d)
−1 )ρ1(y1)dy1
≈ I(y−1,z
(1)
−1 ,...,z
(d)
−1 ) ∶=
Mlag
∑
k=0
ηkG(ζk (y∗
1),y−1,z
(1)
−1 ,...,z
(d)
−1 ),
where y∗
1(y−1,z
(1)
−1 ,...,z
(d)
−1 ): the approximated discontinuity location via root finding.
I`:= I`(y`
−1,z
(1),`
−1 ,...,z
(d),`
−1 ): the level ` approximation of I in ̂
QMLMC
, computed
with step size ∆t`; MLag,` Laguerre points; TOLNewton,` as the tolerance of Newton
at level `.
̂
QMLMC
∶=
L
∑
`=L0
̂
Q`,
with
̂
QL0 ∶=
1
ML0
ML0
∑
mL0
=1
IL0,[mL0
]; ̂
Q` ∶=
1
M`
M`
∑
m`=1
(I`,[m`] − I`−1,[m`]), L0 + 1 ≤ ` ≤ L, (9)
21. Notation
X`(T) ∶= X`(T;(Z`
1,Z`
−1)).
For convenience, we denote X`(T) by X
N`
T and X
N`
k are the
Euler–Maruyama increments of X
N`
T for 0 ≤ k ≤ N` with
X
N`
T = X
N`
N`
.
Assumption 3.1
There are positive rdvs Cp with finite moments of all orders such that
∀N` ∈ N, ∀k1,...,kp ∈ {0,...,N` − 1} ∶
R
R
R
R
R
R
R
R
R
R
R
R
R
∂p
X
N`
T
∂X
N`
k1
⋯∂X
N`
kp
R
R
R
R
R
R
R
R
R
R
R
R
R
≤ Cp a.s.
Assumption 3.1 is fulfilled if the drift and diffusion coefficients are
smooth.
16
22. Assumption 3.2
For p ∈ N, there are positive rdvs Dp with finite moments of all orders
such that a
⎛
⎝
∂X
N`
T
∂y
(Z`
1,Z`
−1)
⎞
⎠
−p
≤ Cp a.s.
a
y ∶= z−1
In (Bayer, Ben Hammouda, and Tempone 2023), we show
sufficient conditions where this assumption is valid.
For instance, Assumption 3.2 is valid for
▸ one-dimensional SDEs with a linear or constant diffusion.
▸ multivariate SDEs with a linear drift and constant diffusion,
including the multivariate lognormal model (see (Bayer,
Siebenmorgen, and Tempone 2018)).
23. Multilevel Monte Carlo with Numerical Smoothing:
Variance Decay, Complexity and Robustness
Let g(x) = 1(φ(x)≥0) or δ (φ(x) = 0)
Theorem 3.3 ((Bayer, Hammouda, and Tempone 2022))
Under Assmuptions 3.1 and 3.2 and further regularity assumptions for the drift and
diffusion, using Euler–Maruyama, V` ∶= Var[I` − I`−1] = O (∆t1
` ), compared with
O (∆t
1/2
` ) for MLMC without smoothing.
" General MLMC Complexity: O (TOL
−2−max(0,γ−β
α
)
log (TOL)2×1{β=γ}
),
where α: weak rate; β: variance decay rate; γ : work growth rate.
Corollary 3.4 ((Bayer, Hammouda, and Tempone 2022))
Under Assmuptions 3.1 and 3.2 and further regularity assumptions for the drift and
diffusion, the complexity of MLMC combined with numerical smoothing is O (TOL−2
) up
to log terms, compared with O (TOL−2.5
) for MLMC without smoothing.
" Milstein scheme: we show that we obtain the canonical complexity (O (TOL−2
)).
Corollary 3.5 ((Bayer, Hammouda, and Tempone 2022))
Let κ` be the kurtosis of the random variable I` − I`−1, then under assmuptions 3.1 and
3.2 and further regularity assumptions for the drift and diffusion, we obtain κ` = O (1)
compared with O (∆t
−1/2
` ) for MLMC without smoothing.
24. Sketch of the Proof of Theorem 3.3: Notation and Goal
Notation
X`,X`−1: the coupled paths of the approximate process X, simulated with time
step sizes ∆t` and ∆t`−1, respectively.
W` and B`: coupling Wiener and related Brownian bridge processes at levels `
and ` − 1, respectively.
For t ∈ [0,T], e`(t;Y,B`) is defined as
(X` − X`−1)(t) = ∫
t
0
(a(X`(s)) − a(X`−1(s)))ds + ∫
t
0
(b(X`(s)) − b(X`−1(s)))dW`(s)
= ∫
t
0
(a(X`(s)) − a(X`−1(s)))ds + ∫
t
0
(b(X`(s)) − b(X`−1(s)))
Y
√
T
ds
+ ∫
t
0
(b(X`(s)) − b(X`−1(s)))dB`(s)
=∶ e`(t;Y,B`),
where a(X(s)) = a(X(tn)), b(X(s)) = b(X(tn)), for tn ≤ s < tn+1, on the time
grid 0 = t0 < t1 < ... < tN = T.
g̃δ: a C∞
mollified version of g, with δ > 0,
Goal: We want to show V` ∶= Var[I` − I`−1] ≤ E[(I` − I`−1)
2
] = O (∆t`).
19
25. Sketch of the Proof of Theorem 3.3: Step 1
For Euler–Maruyama scheme and p ≥ 1,
Under Global Lipschitzity of drift and diffusion coefficients
Assumption, we have (Kloeden and Platen 1992)
E[e2p
` (T)] = O (∆tp
` ). (10)
In (Bayer, Hammouda, and Tempone 2022), we prove (using the
Grönwall, Hölder, Burkholder-Davis-Gundy and Jensen
inequalities) that
E[(∂ye`)2p
(T)] = O (∆tp
` ). (11)
20
26. Sketch of the Proof of Theorem 3.3: Step 2
For δ > 0, we have
∆Iδ
` (B`) ∶= (I
δ
` − I
δ
`−1)(B`) ∶= ∫
R
(g̃δ(X`(T;y,B`)) − g̃δ(X`−1(T;y,B`)))ρ1(y)dy
= ∫
R
⎡
⎢
⎢
⎢
⎢
⎢
⎢
⎢
⎣
∫
1
0
g̃′
δ
⎛
⎜
⎜
⎜
⎜
⎝
X`−1(T;y,B`) + θe`(T;y,B`)
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
∶=z(θ;y,B`)
⎞
⎟
⎟
⎟
⎟
⎠
dθ
⎤
⎥
⎥
⎥
⎥
⎥
⎥
⎥
⎦
e`(T;y,B`) ρ1(y)dy, θ ∈ (0,1) (using Mean value theorem)
= ∫
1
0
[∫
R
∂yg̃δ(z(θ;y,B`))(∂yz(θ;y,B`))
−1
e`(T;y,B`) ρ1(y)dy]dθ (using ∂yg̃δ = g̃′
δ∂yz and Fubini’s theorem)
= −∫
1
0
[∫
R
e`(T;y,B`)g̃δ(z(θ;y,B`))(∂y ((∂yz(θ;y,B`))
−1
) − y (∂yz(θ;y,B`))
−1
)ρ1(y)dy]dθ
− ∫
1
0
[∫
R
∂ye`(T;y,B`)g̃δ(z(θ;y,B`))(∂yz(θ;y,B`))
−1
ρ1(y)dy]dθ. (12)
(Integration by parts and we prove that the boundary terms vanish to get the last terms)
Taking δ → 0 and applying the dominated convergence theorem to
(12):
∆I`(B`) ∶= (I` − I`−1)(B`)
= −∫
1
0
[∫
R
e`(T;y,B`)g(z(θ;y,B`))(∂y ((∂yz(θ;y,B`))
−1
) − y (∂yz(θ;y,B`))
−1
)ρ1(y)dy]dθ
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
(I)
−∫
1
0
[∫
R
∂ye`(T;y,B`)g(z(θ;y,B`))(∂yz(θ;y,B`))
−1
ρ1(y)dy]dθ
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
(II)
.
(13)
21
27. Sketch of the Proof of Theorem 3.3: Step 3
For the term (I) in (13), taking expectation with respect to the Brownian
bridge and using Hölder’s inequality twice (p,q,p1,q1 ∈ (1,+∞), 1
p + 1
q = 1
and 1
p1
+ 1
q1
= 1), result in
E [(I)
2
] ≤ (E [∣∣g(z(⋅;⋅,B`))(∂y ((∂yz(⋅;⋅,B`))
−1
) − Y (∂yz(⋅;⋅,B`))
−1
)∣∣
2q1
Lq
ρ1
([0,1]×R)
])
1/q1
× (E [∣∣e`(T;⋅,B`)∣∣
2p1
Lp
ρ1
(R)])
1/p1
= O (∆t`). (14)
Choosing p and p1 such that 2p1
p ≤ 1, and applying Jensen’s inequality:
(E [∣∣e`(T;⋅,B`)∣∣
2p1
Lp
ρ1
(R)])
1/p1
=
⎛
⎝
E
⎡
⎢
⎢
⎢
⎢
⎣
(∫
R
∣ep
` (T;y,B`)∣ρ1dy)
2p1
p
⎤
⎥
⎥
⎥
⎥
⎦
⎞
⎠
1/p1
≤ (E [∫
R
∣ep
` (T;y,B`)∣ρ1dy])
2
p
= O (∆t`) (using Fubini’s theorem and (10)). (15)
Using Assumptions 3.2 and 3.1, we show that
(E [∣∣g(z(⋅;⋅,B`))(∂y ((∂yz(⋅;⋅,B`))−1
) − Y (∂yz(⋅;⋅,B`))−1
)∣∣
2q1
Lq
ρ1
([0,1]×R)
])
1/q1
< ∞,
22
28. Sketch of the Proof of Theorem 3.3: Step 4
For the term (II) in (13), we redo same steps as for term (I)
E [(II)
2
] ≤ (E [∣∣g(z(⋅;⋅,B`))(∂yz(⋅;⋅,B`))
−1
∣∣
2q1
Lq
ρ1
([0,1]×R)
])
1/q1
× (E [∣∣∂ye`(T;⋅,B`)∣∣
2p1
Lp
ρ1
(R)])
1/p1
= O (∆t`) (16)
Using (11), we show (EB`
[∣∣∂ye`(T;⋅,B`)∣∣2p1
Lp
ρ1
(R)
])
1/p1
= O (∆t`).
Using Assumptions 3.2 and 3.1, we show that
(E [∣∣g(z(⋅;⋅,B`))(∂yz(⋅;⋅,B`))−1
∣∣
2q1
Lq
ρ1
([0,1]×R)
])
1/q1
< ∞,
23
29. Error Discussion for MLMC
̂
QMLMC
: the MLMC estimator
E[g(X(T)] − ̂
QMLMC
= E[g(X(T))] − E[g(X
∆tL
(T))]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error I: bias or weak error of O(∆tL)
+ E[IL (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] − E[IL (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error II: numerical smoothing error of O(M
−s/2
Lag,L
)+O(TOLNewton,L)
+ E[IL (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] − ̂
QMLMC
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error III: MLMC statistical error of O
⎛
⎝
√
∑L
`=L0
√
MLag,`+log(TOL−1
Newton,`
)
⎞
⎠
Notations
y∗
1: the approximated location of the non smoothness obtained by Newton
iteration ⇒ ∣y∗
1 − y∗
1∣ = TOLNewton
MLag is the number of points used by the Laguerre quadrature for the one
dimensional pre-integration step.
s > 0: For the parts of the domain separated by the discontinuity location,
derivatives of G with respect to y1 are bounded up to order s. 24
30. MLMC for Probability in the GBM model: Euler
0 2 4 6 8
-10
-8
-6
-4
-2
0 2 4 6 8
-15
-10
-5
0
0 2 4 6 8
2
4
6
8
10
0 2 4 6 8
102
103
kurtosis
(a) without numerical smoothing.
0 2 4 6 8
-20
-18
-16
-14
-12
-10
-8
0 2 4 6 8
-15
-10
-5
0
0 2 4 6 8
2
4
6
8
10
0 2 4 6 8
3
4
5
6
kurtosis
(b) With numerical smoothing.
Figure 3.1: MLMC with Euler–Maruyama scheme for probability computation
under the geometric Brownian motion (GBM): Variance, cost, L1
-distance
and kurtosis per level. P`: the numerical approximation of the QoI at level `.
25
31. MLMC for Probability in the GBM Model: Milstein
0 2 4 6 8
-15
-10
-5
0 2 4 6 8
-25
-20
-15
-10
-5
0
0 2 4 6 8
2
4
6
8
10
0 2 4 6 8
103
104
105
kurtosis
(a) without numerical smoothing.
0 2 4 6 8
-35
-30
-25
-20
0 2 4 6 8
-25
-20
-15
-10
-5
0
0 2 4 6 8
2
4
6
8
10
0 2 4 6 8
101
kurtosis
(b) With numerical smoothing.
Figure 3.2: MLMC with Milstein scheme for probability computation under
the geometric Brownian motion (GBM): Variance, cost, L1
-distance and
kurtosis per level. P`: the numerical approximation of the QoI at level `.
26
32. Probability Computation under the GBM Model:
Numerical Complexity Comparison
10-4
10-3
10-2
10-1
TOL
10-6
10-4
10-2
100
102
104
106
E[W]
MLMC without smoothing (Euler)
TOL-2.5
MLMC without smoothing (Milstein)
MLMC+ Numerical Smoothing (Euler)
TOL-2
log(TOL)2
MLMC+ Numerical Smoothing (Milstein)
TOL-2
Figure 3.3: Probability Computation under GBM: Comparison of the
numerical complexity of the different MLMC estimators.
27
33. 1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
34. Notations
The Haar basis functions, ψn,k, of L2
([0,1]) with support
[2−n
k,2−n
(k + 1)]:
ψ−1(t) ∶= 1[0,1](t); ψn,k(t) ∶= 2n/2
ψ (2n
t − k), n ∈ N0, k = 0,...,2n
− 1,
where ψ(⋅) is the Haar mother wavelet:
ψ(t) ∶=
⎧
⎪
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎪
⎩
1, 0 ≤ t < 1
2
,
−1, 1
2
≤ t < 1,
0, else,
Grid Dn ∶= {tn
` ∣ ` = 0,...,2n+1
} with tn
` ∶= `
2n+1 T.
For i.i.d. standard normal rdvs Z−1, Zn,k, n ∈ N0, k = 0,...,2n
− 1,
we define the (truncated) standard Brownian motion
WN
t ∶= Z−1Ψ−1(t) +
N
∑
n=0
2n
−1
∑
k=0
Zn,kΨn,k(t).
with Ψ−1(⋅) and Ψn,k(⋅) are the antiderivatives of the Haar basis
functions. 28
35. The Smoothness Theorem
We define the deterministic function HN
∶ R2N+1−1
→ R as
HN
(zN
) ∶= EZ−1
[g (XN
T (Z−1,zN
))], (17)
where ZN
∶= (Zn,k)n=0,...,N, k=0,...2n−1.
Theorem 4.1 ((Bayer, Ben Hammouda, and Tempone 2023))
Given Assumptions 3.1 and 3.2. Then, for any p ∈ N and indices
n1,...,np and k1,...,kp (satisfying 0 ≤ kj < 2nj ), the function HN
defined in (17) satisfies the following (with constants independent of
nj,kj)a
∂p
HN
∂zn1,k1 ⋯∂znp,kp
(zN
) = Ô
P (2− ∑
p
j=1 nj/2
).
In particular, HN
is of class C∞
.
a
The constants increase in p and N.
36. Sketch of the Proof
1 We consider a mollified version gδ of g and the corresponding function
HN
δ (defined by replacing g with gδ in (17)).
2 We prove that we can interchange the integration and differentiation,
which implies
∂HN
δ (zN
)
∂zn,k
= E [g′
δ (XN
T (Z−1,zN
))
∂XN
T (Z−1,zN
)
∂zn,k
].
3 Multiplying and dividing by
∂XN
T (Z−1,zN )
∂y (where y ∶= z−1) and
replacing the expectation by an integral w.r.t. the standard normal
density, we obtain
∂HN
δ (zN
)
∂zn,k
= ∫
R
∂gδ (XN
T (y,zN
))
∂y
(
∂XN
T
∂y
(y,zN
))
−1
∂XN
T
∂zn,k
(y,zN
)
1
√
2π
e− y2
2 dy.
(18)
4 We show that integration by parts is possible, and then we can discard
the mollified version to obtain the smoothness of HN
because
∂HN
(zN
)
∂zn,k
= −∫
R
g (XN
T (y,zN
))
∂
∂y
⎡
⎢
⎢
⎢
⎢
⎣
(
∂XN
T
∂y
(y,zN
))
−1
∂XN
T
∂zn,k
(y,zN
)
1
√
2π
e− y2
2
⎤
⎥
⎥
⎥
⎥
⎦
dy.
5 The proof relies on successively applying the technique above of
dividing by
∂XN
T
∂y and then integrating by parts.
30
37. Error Discussion for ASGQ
QASGQ
N : the ASGQ estimator
E[g(X(T)] − QASGQ
N = E[g(X(T))] − E[g(X
∆t
(T))]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error I: bias or weak error of O(∆t)
+ E[I (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] − E[I (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )]
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error II: numerical smoothing error of O(M
−s/2
Lag
)+O(TOLNewton)
+ E[I (Y−1,Z
(1)
−1 ,...,Z
(d)
−1 )] − QASGQ
N
´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶
Error III: ASGQ error of O(M
−p/2
ASGQ
)
, (19)
Notations
MASGQ: the number of quadrature points used by the ASGQ estimator, and p > 0.
y∗
1: the approximated location of the non smoothness obtained by Newton iteration
⇒ ∣y∗
1 − y∗
1∣ = TOLNewton
MLag is the number of points used by the Laguerre quadrature for the one
dimensional pre-integration step.
s > 0: For the parts of the domain separated by the discontinuity location,
derivatives of G with respect to y1 are bounded up to order s.
31
38. Work and Complexity Discussion for ASGQ
An optimal performance of ASGQ is given by
⎧
⎪
⎪
⎪
⎨
⎪
⎪
⎪
⎩
min
(MASGQ,MLag,TOLNewton)
WorkASGQ ∝ MASGQ × MLag × ∆t−1
s.t. Etotal,ASGQ = TOL.
(20)
Etotal, ASGQ ∶= E[g(X(T)] − QASGQ
N
= O (∆t) + O (M
−p/2
ASGQ) + O (M
−s/2
Lag ) + O (TOLNewton).
We show in (Bayer, Ben Hammouda, and Tempone 2023) that
under certain conditions for the regularity parameters s and p
(p,s ≫ 1)
▸ WorkASGQ = O (TOL−1
) (for the best case) compared to
WorkMC = O (TOL−3
) (for the best case of MC).
32
39. ASGQ Quadrature Error Convergence
101
102
103
104
MASGQ
10-3
10
-2
10
-1
100
101
Relative
Quadrature
Error
ASGQ without smoothing
ASGQ with numerical smoothing
(a)
101
102
103
104
MASGQ
10-3
10
-2
10-1
100
101
Relative
Quadrature
Error
ASGQ without smoothing
ASGQ with numerical smoothing
(b)
Figure 4.1: Digital option under the Heston model (dimension=16):
Comparison of the relative quadrature error convergence for the ASGQ
method with/out numerical smoothing. (a) Without Richardson
extrapolation (N = 8), (b) with Richardson extrapolation (Nfine level = 8).
33
40. Comparing ASGQ with MC
In (Bayer, Ben Hammouda, and Tempone 2023):
For the different considered examples under Heston or discretized GBM models:
ASGQ with numerical smoothing 10 - 100× faster in dim. around 20 than MC.
10
-3
10
-2
10
-1
Total Relative Error
10
-1
10
0
10
1
10
2
103
Computational
Work
MC (without smoothing, without Richardson extrapolation)
MC (without smoothing, with Richardson extrapolation)
ASGQ (with smoothing, without Richardson extrapolation)
ASGQ (with smoothing, with Richardson extrapolation)
Figure 4.2: Digital option under Heston: Computational work comparison for ASGQ with
numerical smoothing and MC with the different configurations in terms of the level of
Richardson extrapolation.
34
41. 1 Framework and Motivation
2 The Numerical Smoothing
3 Analysis of Multilevel Monte Carlo with Numerical Smoothing
4 Analysis of Adaptive Sparse Grids Quadrature with Numerical
Smoothing
5 Conclusions
42. Conclusions
Contributions in the context deterministic quadrature methods
1 A novel numerical smoothing technique, combined with ASGQ/QMC,
Hierarchical Brownian Bridge and Richardson extrapolation.
2 Our approach outperforms substantially the MC method for moderate/high
dimensions and for dynamics where discretization is needed.
3 We provide a regularity analysis for the smoothed integrand in the time stepping
setting, and a related error discussion of our approach.
Contributions in the context MLMC methods
1 The numerical smoothing approach is adapted to the MLMC context for efficient
probability computation, univariate/multivariate density estimation, and option
pricing.
2 Compared to the case without smoothing
☀ We significantly reduce the kurtosis at the deep levels of MLMC which improves
the robustness of the estimator.
☀ We improve the strong convergence rate (we recover the MLMC complexities
obtained for Lipschitz functionals) ⇒ improvement of MLMC complexity from
O (TOL−2.5
) to O (TOL−2
).
3 When estimating densities: Compared to the smoothing strategies based on
MLMC with parametric regularization as in (Giles, Nagapetyan, and Ritter 2015)
or QMC with kernel density techniques as in (Ben Abdellah, L’Ecuyer, Owen, and
Puchhammer 2021), the error of our approach does not increase exponentially
with respect to the dimension of state vector
35
43. Future Work and Extensions
1 Extend our techniques to efficiently compute
▸ Sensitivities (Financial Greeks): ∂
∂α
E[f(ω,α)].
▸ Risk quantities ⇒ nested expectations problems
E [g (E [f(X,Y )∣X])].
▸ Computing nonsmooth quantities (such as probabilities) of a
functional of a solution arising from a random PDE.
2 Combine the numerical smoothing technique with multilevel QMC
to profit from the good features of QMC and MLMC.
3 Combine the numerical smoothing technique with importance
sampling ideas in (Hammouda, Rached, and Tempone 2020;
Hammouda, Rached, Tempone, and Wiechert 2021) to improve
MLMC efficiency for rare events probabilities.
36
44. Related References
Thank you for your attention!
[1] C. Bayer, C. Ben Hammouda, R. Tempone. Numerical Smoothing with
Hierarchical Adaptive Sparse Grids and Quasi-Monte Carlo Methods for
Efficient Option Pricing. Quantitative Finance 23, no. 2 (2023): 209-227.
[2] C. Bayer, C. Ben Hammouda, R. Tempone. Multilevel Monte Carlo with
Numerical Smoothing for Robust and Efficient Computation of Probabilities
and Densities, arXiv:2003.05708 (2022).
[3] C. Bayer, C. Ben Hammouda, A. Papapantoleon, M. Samet, R. Tempone.
Optimal Damping with Hierarchical Adaptive Quadrature for Efficient
Fourier Pricing of Multi-Asset Options in Lévy Models, arXiv:2203.08196
(2022).
[4] C. Bayer, C. Ben Hammouda, R. Tempone. Hierarchical adaptive sparse
grids and quasi-Monte Carlo for option pricing under the rough Bergomi
model, Quantitative Finance, 2020.
37
45. References I
[1] Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Hierarchical adaptive sparse grids and quasi-Monte Carlo for
option pricing under the rough Bergomi model”. In: Quantitative
Finance 20.9 (2020), pp. 1457–1473.
[2] Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Numerical smoothing with hierarchical adaptive sparse grids
and quasi-Monte Carlo methods for efficient option pricing”. In:
Quantitative Finance 23.2 (2023), pp. 209–227.
[3] Christian Bayer, Chiheb Ben Hammouda, and Raúl Tempone.
“Multilevel Monte Carlo with Numerical Smoothing for Robust
and Efficient Computation of Probabilities and Densities”. In:
arXiv preprint arXiv:2003.05708 (2022).
[4] Christian Bayer, Markus Siebenmorgen, and Rául Tempone.
“Smoothing the payoff for efficient computation of basket option
pricing.”. In: Quantitative Finance 18.3 (2018), pp. 491–505.
38
46. References II
[5] Christian Bayer et al. “Optimal Damping with Hierarchical
Adaptive Quadrature for Efficient Fourier Pricing of Multi-Asset
Options in Lévy Models”. In: arXiv preprint arXiv:2203.08196
(2022).
[6] Amal Ben Abdellah et al. “Density estimation by randomized
quasi-Monte Carlo”. In: SIAM/ASA Journal on Uncertainty
Quantification 9.1 (2021), pp. 280–301.
[7] Hans-Joachim Bungartz and Michael Griebel. “Sparse grids”. In:
Acta numerica 13 (2004), pp. 147–269.
[8] Peng Chen. “Sparse quadrature for high-dimensional integration
with Gaussian measure”. In: ESAIM: Mathematical Modelling
and Numerical Analysis 52.2 (2018), pp. 631–657.
39
47. References III
[9] K Andrew Cliffe et al. “Multilevel Monte Carlo methods and
applications to elliptic PDEs with random coefficients”. In:
Computing and Visualization in Science 14.1 (2011), p. 3.
[10] Josef Dick, Frances Y Kuo, and Ian H Sloan. “High-dimensional
integration: the quasi-Monte Carlo way”. In: Acta Numerica 22
(2013), pp. 133–288.
[11] Oliver G Ernst, Bjorn Sprungk, and Lorenzo Tamellini.
“Convergence of sparse collocation for functions of countably
many Gaussian random variables (with application to elliptic
PDEs)”. In: SIAM Journal on Numerical Analysis 56.2 (2018),
pp. 877–905.
[12] Thomas Gerstner and Michael Griebel. “Dimension–adaptive
tensor–product quadrature”. In: Computing 71 (2003), pp. 65–87.
40
48. References IV
[13] Michael B Giles. “Multilevel Monte Carlo methods”. In: Acta
Numerica 24 (2015), pp. 259–328.
[14] Michael B Giles. “Multilevel Monte Carlo path simulation”. In:
Operations Research 56.3 (2008), pp. 607–617.
[15] Michael B Giles, Tigran Nagapetyan, and Klaus Ritter.
“Multilevel Monte Carlo approximation of distribution functions
and densities”. In: SIAM/ASA Journal on Uncertainty
Quantification 3.1 (2015), pp. 267–295.
[16] Chiheb Ben Hammouda, Nadhir Ben Rached, and Raúl Tempone.
“Importance sampling for a robust and efficient multilevel Monte
Carlo estimator for stochastic reaction networks”. In: Statistics
and Computing 30.6 (2020), pp. 1665–1689.
41
49. References V
[17] Chiheb Ben Hammouda et al. “Optimal Importance Sampling
via Stochastic Optimal Control for Stochastic Reaction
Networks”. In: arXiv preprint arXiv:2110.14335 (2021).
[18] Stefan Heinrich. “Multilevel monte carlo methods”. In:
International Conference on Large-Scale Scientific Computing.
Springer. 2001, pp. 58–67.
[19] Ahmed Kebaier et al. “Statistical Romberg extrapolation: a new
variance reduction method and applications to option pricing”.
In: The Annals of Applied Probability 15.4 (2005), pp. 2681–2705.
[20] Peter E Kloeden and Eckhard Platen. “Stochastic differential
equations”. In: Numerical solution of stochastic differential
equations. Springer, 1992, pp. 103–160.
42
50. References VI
[21] Pierre L’Ecuyer, Florian Puchhammer, and Amal Ben Abdellah.
“Monte Carlo and Quasi–Monte Carlo Density Estimation via
Conditioning”. In: INFORMS Journal on Computing (2022).
43