1. Think Like an Applied Mathematician and a Statistician: the Example of Cubature
Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology
hickernell@iit.edu mypages.iit.edu/~hickernell
Thanks to the Guaranteed Automatic Integration Library (GAIL) team and friends
Supported by NSF-DMS-1522687
Thanks for your kind invitation
2. Introduction
(Outline: Introduction · IID Monte Carlo · Low Discrepancy · Bayesian Cubature · Summary · References)

Amy (an applied mathematician) wants to compute

\[ \text{option price} = \int_{\mathbb{R}^d} \operatorname{payoff}(\mathbf{x})\, \underbrace{\frac{e^{-\mathbf{x}^T \Sigma^{-1} \mathbf{x}/2}}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}}_{\text{PDF of Brownian motion at } d \text{ times}}\, d\mathbf{x} \]

\[ \operatorname{payoff}(\mathbf{x}) = \max\Bigl( \frac{1}{d} \sum_{k=1}^{d} S_k(x_k) - K,\, 0 \Bigr)\, e^{-rT} \]

\[ S_k(x_k) = S_0\, e^{(r - \sigma^2/2)\, t_k + \sigma x_k} = \text{stock price at time } t_k = kT/d \]

with d, S_0, K, T, r, σ arbitrary, but known, and \Sigma = \bigl( (T/d) \min(k, l) \bigr)_{k,l=1}^{d}.
Sue (a statistician) wants to compute

\[ \text{Gaussian probability} = \int_{[\mathbf{a}, \mathbf{b}]} \frac{e^{-\mathbf{x}^T \Sigma^{-1} \mathbf{x}/2}}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\, d\mathbf{x} \]

with \mathbf{a}, \mathbf{b}, \Sigma arbitrary, but known.
6. Amy Suggests Rectangular Grids & Product Rules

\[ \int_0^1 f(x)\, dx - \frac{1}{m} \sum_{i=1}^{m} f\Bigl( \frac{2i-1}{2m} \Bigr) = O(m^{-2}), \]

so

\[ \int_{[0,1]^d} f(\mathbf{x})\, d\mathbf{x} - \frac{1}{m^d} \sum_{i_1=1}^{m} \cdots \sum_{i_d=1}^{m} f\Bigl( \frac{2i_1 - 1}{2m}, \ldots, \frac{2i_d - 1}{2m} \Bigr) = O(m^{-2}) = O(n^{-2/d}), \]

assuming partial derivatives of up to order 2 in each direction exist.

The computational cost is prohibitive for large dimension d:

| d              | 1 | 2  | 5     | 10    | 100    |
| m = 8, n = 8^d | 8 | 64 | 3.3E4 | 1.0E9 | 2.0E90 |

Product rules are a bad idea unless d is small.
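The cost blow-up in the table can be reproduced with a short sketch (Python with NumPy assumed; the integrand and m = 8 are illustrative choices, not from the talk):

```python
import itertools

import numpy as np

def product_midpoint(f, d, m):
    """Tensor-product midpoint rule on [0,1]^d: n = m^d nodes at
    ((2*i_1 - 1)/(2m), ..., (2*i_d - 1)/(2m)), each with weight 1/m^d."""
    nodes_1d = (2 * np.arange(1, m + 1) - 1) / (2 * m)
    total = 0.0
    for point in itertools.product(nodes_1d, repeat=d):  # n = m^d evaluations
        total += f(np.array(point))
    return total / m**d

# f(x) = prod_k 3 x_k^2 integrates to exactly 1 over [0,1]^d.
f = lambda x: np.prod(3 * x**2)
for d in (1, 2, 3):
    print(f"d={d}: n={8**d}, estimate={product_midpoint(f, d, 8):.6f}")
```

Already at d = 3 the rule needs 512 evaluations for the same O(m^{-2}) accuracy; at d = 10 it would need about 10^9.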
7. Sue Reads the News
[figure slide; image not recoverable]
9. Sue Suggests IID Monte Carlo

\[ \mu = \mathbb{E}[f(X)] = \int_{\mathbb{R}^d} f(\mathbf{x})\, \rho(\mathbf{x})\, d\mathbf{x} \approx \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}_i), \qquad \mathbf{x}_i \overset{\text{IID}}{\sim} \rho \]

\[ \mathbb{P}\bigl[ |\mu - \hat{\mu}_n| \le \text{err}_n \bigr] \approx 99\% \quad \text{for } \text{err}_n = \frac{2.58 \times 1.2\, \hat{\sigma}}{\sqrt{n}} \]

by the Central Limit Theorem (CLT), where \hat{\sigma}^2 is the sample variance. But the CLT is only an asymptotic result, and 1.2\hat{\sigma} may be an overly optimistic upper bound on σ.

A Berry–Esseen inequality, Cantelli's inequality, and an assumed upper bound on the kurtosis can be used to provide a rigorous error bound (H. et al., 2013; Jiang, 2016).
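A minimal sketch of this estimator with its CLT-style error estimate (Python with NumPy assumed; the 1.2 inflation factor is the heuristic from the slide, not a rigorous bound):

```python
import numpy as np

def iid_mc(f, sample, n):
    """IID Monte Carlo: mu_hat = mean of f over n IID draws, with the
    heuristic 99% error estimate err_n = 2.58 * 1.2 * sigma_hat / sqrt(n).
    This is a CLT-based estimate, not a guaranteed bound."""
    y = f(sample(n))
    mu_hat = y.mean()
    err_n = 2.58 * 1.2 * y.std(ddof=1) / np.sqrt(n)
    return mu_hat, err_n

rng = np.random.default_rng(7)
# Toy check: mu = E[X_1 + X_2] = 0 for X ~ N(0, I_2).
mu_hat, err_n = iid_mc(lambda x: x.sum(axis=1),
                       lambda n: rng.standard_normal((n, 2)),
                       10**5)
```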
10. Sue's Gaussian Probability

\[ \mu = \int_{[\mathbf{a}, \mathbf{b}]} \frac{\exp\bigl( -\tfrac{1}{2} \mathbf{t}^T \Sigma^{-1} \mathbf{t} \bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\, d\mathbf{t} \overset{\text{affine}}{=} \int_{[0,1]^d} f(\mathbf{x})\, d\mathbf{x} \]

For some typical choice of \mathbf{a}, \mathbf{b}, \Sigma with d = 3 and \mu \approx 0.6763:

| Rel. Error Tolerance | Method     | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID Affine | 7E−4         | 100%               | 1.5E6       | 1.8E−1             |
11. Amy Suggests a Variable Transformation

\[ \mu = \int_{[\mathbf{a}, \mathbf{b}]} \frac{\exp\bigl( -\tfrac{1}{2} \mathbf{t}^T \Sigma^{-1} \mathbf{t} \bigr)}{\sqrt{(2\pi)^d \det(\Sigma)}}\, d\mathbf{t} \overset{\text{Genz (1993)}}{=} \int_{[0,1]^{d-1}} f(\mathbf{x})\, d\mathbf{x} \]

| Rel. Error Tolerance | Method     | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID Affine | 7E−4         | 100%               | 1.5E6       | 1.8E−1             |
| 1E−2                 | IID Genz   | 4E−4         | 100%               | 8.1E4       | 1.9E−2             |
12. IID Monte Carlo Is Slow

| Rel. Error Tolerance | Method     | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID Affine | 7E−4         | 100%               | 1.5E6       | 1.8E−1             |
| 1E−2                 | IID Genz   | 4E−4         | 100%               | 8.1E4       | 1.9E−2             |
| 1E−3                 | IID Genz   | 7E−5         | 100%               | 2.0E6       | 3.8E−1             |
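For comparison, a crude IID estimate of Sue's Gaussian probability that skips the affine and Genz transforms entirely and just averages the box indicator (Python with NumPy assumed; the d = 1 parameters are illustrative):

```python
import numpy as np

def gaussian_box_prob_iid(a, b, Sigma, n, rng):
    """Crude IID Monte Carlo for P(a <= X <= b), X ~ N(0, Sigma):
    draw X = z L^T with L the Cholesky factor of Sigma and average
    the indicator of the box [a, b]."""
    L = np.linalg.cholesky(Sigma)
    x = rng.standard_normal((n, len(a))) @ L.T
    return np.all((x >= a) & (x <= b), axis=1).mean()

rng = np.random.default_rng(0)
# Sanity check in d = 1: P(-1 <= X <= 1) for X ~ N(0, 1) is about 0.6827.
p = gaussian_box_prob_iid(np.array([-1.0]), np.array([1.0]), np.eye(1), 10**5, rng)
```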
28. Amy Suggests More Even Sampling than IID

\[ \mu = \int_{[0,1]^d} f(\mathbf{x})\, d\mathbf{x} \approx \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}_i), \qquad \mathbf{x}_i \text{ Sobol' (Dick and Pillichshammer, 2010)} \]

Normally n should be a power of 2.

Assume f ∈ Hilbert space H with reproducing kernel K (H., 1998). Then

\[ \mu(f) - \hat{\mu}(f) = \langle \text{err-rep}, f \rangle = \cos(\text{err-rep}, f) \times \underbrace{\| \text{err-rep} \|_{H}}_{\text{discrepancy} \,=\, O(n^{-1+\epsilon})} \times \| f \|_{H} \]

\[ \| \text{err-rep} \|_{H}^2 = \int_{[0,1]^{2d}} K(\mathbf{x}, \mathbf{t})\, d\mathbf{x}\, d\mathbf{t} - \frac{2}{n} \sum_{i=1}^{n} \int_{[0,1]^d} K(\mathbf{x}_i, \mathbf{t})\, d\mathbf{t} + \frac{1}{n^2} \sum_{i,j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j) \]

Adaptive stopping criteria have been developed (H. and Jiménez Rugama, 2016; Jiménez Rugama and H., 2016; H. et al., 2017+).
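Sobol' points are available in SciPy (assumed here, version 1.7 or later); a sketch comparing against the exact answer for a simple integrand:

```python
import numpy as np
from scipy.stats import qmc  # assumes SciPy >= 1.7

def sobol_cubature(f, d, m):
    """Equal-weight cubature with n = 2^m (unscrambled) Sobol' points,
    approximating the integral of f over [0,1]^d."""
    x = qmc.Sobol(d=d, scramble=False).random_base2(m=m)  # n a power of 2
    return f(x).mean()

# f(x) = x_1 x_2 x_3 integrates to 2^-3 = 0.125 over [0,1]^3.
approx = sobol_cubature(lambda x: np.prod(x, axis=1), d=3, m=12)
```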
29. Sobol' Sampling Converges Faster

Same Gaussian probability example (d = 3, \mu \approx 0.6763):

| Rel. Error Tolerance | Method      | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID Affine  | 7E−4         | 100%               | 1.5E6       | 1.8E−1             |
| 1E−2                 | IID Genz    | 4E−4         | 100%               | 8.1E4       | 1.9E−2             |
| 1E−2                 | Sobol' Genz | 3E−4         | 100%               | 1.0E3       | 4.6E−3             |
| 1E−3                 | IID Genz    | 7E−5         | 100%               | 2.0E6       | 3.8E−1             |
| 1E−3                 | Sobol' Genz | 2E−4         | 100%               | 2.0E3       | 6.1E−3             |
44. Sue Randomizes Even Sampling

\[ \mu = \int_{[0,1]^d} f(\mathbf{x})\, d\mathbf{x} \approx \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}_i), \]

with \mathbf{x}_i scrambled Sobol' (Owen, 1997a; 1997b) or a shifted lattice (Cranley and Patterson, 1976). Normally n should be a power of 2.

\[ \mathbb{E}(\hat{\mu}) = \mu \ \text{(no bias)}, \qquad \operatorname{std}(\hat{\mu}) = O(n^{-1.5+\epsilon}) \ \text{for scrambled Sobol'}, \quad O(n^{-1+\epsilon}) \ \text{for shifted lattices} \]
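Because scrambling makes the estimator unbiased and random, independent scramblings give both an estimate and an error indicator (Python with SciPy 1.7+ assumed; the integrand and replicate count are illustrative):

```python
import numpy as np
from scipy.stats import qmc  # assumes SciPy >= 1.7

def scrambled_sobol_mc(f, d, m, reps=16, seed=1):
    """Average f over n = 2^m scrambled Sobol' points for `reps`
    independent scramblings; the spread across replicates gives a
    standard error for the (unbiased) combined estimate."""
    est = np.array([
        f(qmc.Sobol(d=d, scramble=True, seed=seed + r).random_base2(m=m)).mean()
        for r in range(reps)
    ])
    return est.mean(), est.std(ddof=1) / np.sqrt(reps)

# f(x) = x_1 x_2 x_3 integrates to 0.125 over [0,1]^3.
mu_hat, se = scrambled_sobol_mc(lambda x: np.prod(x, axis=1), d=3, m=10)
```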
46. Scrambled Sobol' Is Better

Same Gaussian probability example (d = 3, \mu \approx 0.6763):

| Rel. Error Tolerance | Method           | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID Affine       | 7E−4         | 100%               | 1.5E6       | 1.8E−1             |
| 1E−2                 | IID Genz         | 4E−4         | 100%               | 8.1E4       | 1.9E−2             |
| 1E−2                 | Sobol' Genz      | 3E−4         | 100%               | 1.0E3       | 4.6E−3             |
| 1E−2                 | Scr. Sobol' Genz | 6E−5         | 100%               | 1.0E3       | 5.0E−3             |
| 1E−3                 | IID Genz         | 7E−5         | 100%               | 2.0E6       | 3.8E−1             |
| 1E−3                 | Sobol' Genz      | 2E−4         | 100%               | 2.0E3       | 6.1E−3             |
| 1E−3                 | Scr. Sobol' Genz | 2E−5         | 100%               | 2.0E3       | 6.7E−3             |
| 1E−4                 | Scr. Sobol' Genz | 5E−7         | 100%               | 1.6E4       | 1.9E−2             |
47. Amy's Asian Option Pricing

\[ \text{fair price} = \int_{\mathbb{R}^d} e^{-rT} \max\Bigl( \frac{1}{d} \sum_{j=1}^{d} S_j - K,\, 0 \Bigr) \frac{e^{-\mathbf{z}^T \mathbf{z}/2}}{(2\pi)^{d/2}}\, d\mathbf{z} \approx \$13.12 \]

\[ S_j = S_0\, e^{(r - \sigma^2/2)\, jT/d + \sigma x_j} = \text{stock price at time } jT/d, \]

\[ \mathbf{x} = \mathsf{A}\mathbf{z}, \quad \mathsf{A}\mathsf{A}^T = \Sigma = \bigl( \min(i,j)\, T/d \bigr)_{i,j=1}^{d}, \quad \mathsf{A} = \sqrt{T/d} \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 1 \end{pmatrix} \]

| Abs. Error Tolerance | Method           | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID diff         | 2E−3         | 100%               | 6.1E7       | 3.2E1              |
| 1E−2                 | Scr. Sobol' diff | 3E−3         | 92%                | 6.6E4       | 1.2E−1             |
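A sketch of the IID "diff" pricer (Python with NumPy assumed; the parameter values below are illustrative guesses, not the ones behind the $13.12 figure):

```python
import numpy as np

def asian_call_iid(S0, K, T, r, sigma, d, n, rng):
    """IID Monte Carlo price of an arithmetic-mean Asian call, with the
    Brownian path built as x = A z using the lower-triangular
    A = sqrt(T/d) * tril(ones) ("diff" construction) from the slide."""
    A = np.sqrt(T / d) * np.tril(np.ones((d, d)))
    x = rng.standard_normal((n, d)) @ A.T          # Brownian motion at jT/d
    t = T * np.arange(1, d + 1) / d
    S = S0 * np.exp((r - sigma**2 / 2) * t + sigma * x)
    payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
    return payoff.mean()

rng = np.random.default_rng(42)
price = asian_call_iid(S0=100, K=100, T=1.0, r=0.05, sigma=0.5,
                       d=12, n=10**5, rng=rng)
```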
48. Sue Suggests Principal Component Analysis

Factor the covariance by its eigendecomposition instead of the "diff" construction:

\[ \mathbf{x} = \mathsf{A}\mathbf{z}, \quad \mathsf{A}\mathsf{A}^T = \Sigma = \bigl( \min(i,j)\, T/d \bigr)_{i,j=1}^{d} = \mathsf{V} \Lambda \mathsf{V}^T, \quad \mathsf{V}^T \mathsf{V} = \mathsf{I}, \quad \mathsf{A} = \mathsf{V} \sqrt{\Lambda} \]

| Abs. Error Tolerance | Method           | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID diff         | 2E−3         | 100%               | 6.1E7       | 3.2E1              |
| 1E−2                 | Scr. Sobol' diff | 3E−3         | 92%                | 6.6E4       | 1.2E−1             |
| 1E−2                 | Scr. Sobol' PCA  | 1E−3         | 100%               | 1.6E4       | 3.7E−2             |
49. Sue Suggests Control Variates

| Abs. Error Tolerance | Method                   | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID diff                 | 2E−3         | 100%               | 6.1E7       | 3.2E1              |
| 1E−2                 | Scr. Sobol' diff         | 3E−3         | 92%                | 6.6E4       | 1.2E−1             |
| 1E−2                 | Scr. Sobol' PCA          | 1E−3         | 100%               | 1.6E4       | 3.7E−2             |
| 1E−2                 | Scr. Sob. cont. var. PCA | 2E−3         | 96%                | 4.1E3       | 1.9E−2             |

The coefficient of the control variate for low discrepancy sampling is different from the one for IID Monte Carlo (H. et al., 2005; H. et al., 2017+).
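The PCA factor A = V √Λ can be built from an eigendecomposition of the Brownian-motion covariance (Python with NumPy assumed; T and d are illustrative):

```python
import numpy as np

def bm_pca_factor(T, d):
    """PCA factor A = V sqrt(Lambda) with A A^T = Sigma, where
    Sigma_{ij} = min(i, j) T / d is the Brownian-motion covariance.
    Columns are ordered so the largest-variance direction comes first."""
    idx = np.arange(1, d + 1)
    Sigma = np.minimum.outer(idx, idx) * T / d
    lam, V = np.linalg.eigh(Sigma)       # ascending eigenvalues
    order = np.argsort(lam)[::-1]        # descending: dominant mode first
    lam, V = lam[order], V[:, order]
    return V * np.sqrt(lam)              # scales column k by sqrt(lam_k)

A = bm_pca_factor(T=1.0, d=8)
```

The usual motivation is that the dominant eigen-directions then receive the best-distributed coordinates of the low discrepancy points.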
50. Sue Hypothesizes Random f

\[ \mu = \int_{\mathbb{R}^d} f(\mathbf{x})\, \rho(\mathbf{x})\, d\mathbf{x} \approx \hat{\mu}_n = \sum_{i=1}^{n} w_i f(\mathbf{x}_i) \]

Assume f \sim \mathcal{GP}(0, s^2 C_\theta) (Diaconis, 1988; O'Hagan, 1991; Ritter, 2000; Rasmussen and Ghahramani, 2003), and let

\[ c_0 = \int_{\mathbb{R}^d \times \mathbb{R}^d} C_\theta(\mathbf{x}, \mathbf{t})\, \rho(\mathbf{x}) \rho(\mathbf{t})\, d\mathbf{x}\, d\mathbf{t}, \quad \mathbf{c} = \Bigl( \int_{\mathbb{R}^d} C_\theta(\mathbf{x}_i, \mathbf{t})\, \rho(\mathbf{t})\, d\mathbf{t} \Bigr)_{i=1}^{n}, \quad \mathsf{C} = \bigl( C_\theta(\mathbf{x}_i, \mathbf{x}_j) \bigr)_{i,j=1}^{n}. \]

Choosing \mathbf{w} = (w_i)_{i=1}^{n} = \mathsf{C}^{-1} \mathbf{c} is optimal. Then

\[ \mu - \hat{\mu}_n = \underbrace{\frac{\mu - \hat{\mu}_n}{\sqrt{c_0 - \mathbf{c}^T \mathsf{C}^{-1} \mathbf{c}}\, \sqrt{\mathbf{y}^T \mathsf{C}^{-1} \mathbf{y}/n}}}_{\sim \mathcal{N}(0,1)} \times \sqrt{c_0 - \mathbf{c}^T \mathsf{C}^{-1} \mathbf{c}} \times \sqrt{\frac{\mathbf{y}^T \mathsf{C}^{-1} \mathbf{y}}{n}}, \]

where \mathbf{y} = \bigl( f(\mathbf{x}_i) \bigr)_{i=1}^{n}.
51. Amy Suggests Nice C

With the Gaussian-process model and optimal weights above,

\[ \mathbb{P}\bigl[ |\mu - \hat{\mu}_n| \le \text{err}_n \bigr] = 99\% \quad \text{for } \text{err}_n = 2.58 \sqrt{\frac{\bigl( c_0 - \mathbf{c}^T \mathsf{C}^{-1} \mathbf{c} \bigr)\, \mathbf{y}^T \mathsf{C}^{-1} \mathbf{y}}{n}}, \]

where \mathbf{y} = \bigl( f(\mathbf{x}_i) \bigr)_{i=1}^{n}. But θ needs to be inferred (by maximum likelihood). If \mathsf{C} is nice, then operations involving \mathsf{C} require only O(n \log n) operations.
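A one-dimensional sketch of these formulas (Python with NumPy assumed; the kernel C(x,t) = 1 + γ²(1 − |x − t|) is chosen here because c₀ and c have closed forms, and is not necessarily the kernel used in the talk):

```python
import numpy as np

def bayes_cubature_1d(f, x, gamma=1.0):
    """Bayesian cubature for the integral of f over [0,1] under
    f ~ GP(0, C) with C(x, t) = 1 + gamma^2 (1 - |x - t|), for which
    c0 = 1 + 2 gamma^2 / 3 and c_i = 1 + gamma^2 (1 - (x_i^2 + (1-x_i)^2)/2)."""
    n = len(x)
    C = 1 + gamma**2 * (1 - np.abs(x[:, None] - x[None, :]))
    c = 1 + gamma**2 * (1 - (x**2 + (1 - x) ** 2) / 2)
    c0 = 1 + 2 * gamma**2 / 3
    y = f(x)
    w = np.linalg.solve(C, c)                  # optimal weights w = C^{-1} c
    mu_hat = w @ y
    err_n = 2.58 * np.sqrt(max(c0 - c @ w, 0.0) * (y @ np.linalg.solve(C, y)) / n)
    return mu_hat, err_n

x = (2 * np.arange(1, 17) - 1) / 32            # 16 midpoints on [0, 1]
mu_hat, err_n = bayes_cubature_1d(np.cos, x)   # exact integral: sin(1)
```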
53. Bayesian Cubature Is Promising

Same Asian option example (fair price \approx \$13.12):

| Abs. Error Tolerance | Method                   | Median Error | Worst 10% Accuracy | Worst 10% n | Worst 10% Time (s) |
| 1E−2                 | IID diff                 | 2E−3         | 100%               | 6.1E7       | 3.2E1              |
| 1E−2                 | Scr. Sobol' diff         | 3E−3         | 92%                | 6.6E4       | 1.2E−1             |
| 1E−2                 | Scr. Sobol' PCA          | 1E−3         | 100%               | 1.6E4       | 3.7E−2             |
| 1E−2                 | Scr. Sob. cont. var. PCA | 2E−3         | 96%                | 4.1E3       | 1.9E−2             |
| 1E−2                 | Bayes. Latt. PCA         | 2E−3         | 100%               | 1.6E4       | 5.1E−2             |
54. Think Like Amy, Sue, or Sarah Ann

Combining ideas from applied mathematics and statistics, or statistics and applied mathematics, often leads to a better solution of your problem.

The cubature error may be represented as a trio identity, a product of three quantities (H., 2017+; Meng, 2017+):
- the size of the function,
- the quality of the sampling sites (design), and
- the confounding, which is typically of order one.

Automatically tuning the parameters in your algorithm to obtain the desired accuracy at modest cost is interesting and challenging (H. et al., 2013; H. and Jiménez Rugama, 2016; Jiang, 2016; Jiménez Rugama and H., 2016; H. et al., 2017+).
56. References I

Cools, R. and D. Nuyens (eds.) 2016. Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, Springer Proceedings in Mathematics and Statistics, vol. 163, Springer-Verlag, Berlin.
Cranley, R. and T. N. L. Patterson. 1976. Randomization of number theoretic methods for multiple integration, SIAM J. Numer. Anal. 13, 904–914.
Diaconis, P. 1988. Bayesian numerical analysis, Statistical decision theory and related topics IV, Papers from the 4th Purdue symp., West Lafayette, Indiana 1986, pp. 163–175.
Dick, J. and F. Pillichshammer. 2010. Digital nets and sequences: Discrepancy theory and quasi-Monte Carlo integration, Cambridge University Press, Cambridge.
Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities, Computing Science and Statistics 25, 400–405.
H., F. J. 1998. A generalized discrepancy and quadrature error bound, Math. Comp. 67, 299–322.
H., F. J. 2017+. Error analysis of quasi-Monte Carlo methods. Submitted for publication, arXiv:1702.01487.
H., F. J., L. Jiang, Y. Liu, and A. B. Owen. 2013. Guaranteed conservative fixed width confidence intervals via Monte Carlo sampling, Monte Carlo and quasi-Monte Carlo methods 2012, pp. 105–128.
57. References II

H., F. J. and Ll. A. Jiménez Rugama. 2016. Reliable adaptive cubature using digital sequences, Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, pp. 367–383. arXiv:1410.8615 [math.NA].
H., F. J., Ll. A. Jiménez Rugama, and D. Li. 2017+. Adaptive quasi-Monte Carlo methods. Submitted for publication, arXiv:1702.01491 [math.NA].
H., F. J., C. Lemieux, and A. B. Owen. 2005. Control variates for quasi-Monte Carlo, Statist. Sci. 20, 1–31.
Jiang, L. 2016. Guaranteed adaptive Monte Carlo methods for estimating means of random variables, Ph.D. thesis.
Jiménez Rugama, Ll. A. and F. J. H. 2016. Adaptive multidimensional integration based on rank-1 lattices, Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, pp. 407–422. arXiv:1411.1966.
Meng, X. 2017+. Statistical paradises and paradoxes in big data. In preparation.
O'Hagan, A. 1991. Bayes–Hermite quadrature, J. Statist. Plann. Inference 29, 245–260.
Owen, A. B. 1997a. Monte Carlo variance of scrambled net quadrature, SIAM J. Numer. Anal. 34, 1884–1910.
Owen, A. B. 1997b. Scrambled net variance for integrals of smooth functions, Ann. Stat. 25, 1541–1562.
Rasmussen, C. E. and Z. Ghahramani. 2003. Bayesian Monte Carlo, Advances in Neural Information Processing Systems, pp. 489–496.
58. References III

Ritter, K. 2000. Average-case analysis of numerical problems, Lecture Notes in Mathematics, vol. 1733, Springer-Verlag, Berlin.
59. Sue Suggests IID Monte Carlo

\[ \mu = \mathbb{E}[f(X)] = \int_{\mathbb{R}^d} f(\mathbf{x})\, \rho(\mathbf{x})\, d\mathbf{x} \approx \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(\mathbf{x}_i), \qquad \mathbf{x}_i \overset{\text{IID}}{\sim} \rho \]

\[ \mathbb{P}\bigl[ |\mu - \hat{\mu}_n| \le \text{err}_n \bigr] \ge 99\% \]

for err_n satisfying

\[ \Phi\bigl( -\sqrt{n}\, \text{err}_n / (1.2\, \hat{\sigma}_{n_\sigma}) \bigr) + \Delta_n\bigl( -\sqrt{n}\, \text{err}_n / (1.2\, \hat{\sigma}_{n_\sigma}),\, \kappa_{\max} \bigr) = 0.0025 \]

by the Berry–Esseen inequality, where \hat{\sigma}^2_{n_\sigma} is the sample variance using a sample independent of the one used to simulate the mean, and provided that \operatorname{kurt}(f(X)) \le \kappa_{\max}(n_\sigma) (H. et al., 2013; Jiang, 2016).
60. Amy Suggests More Even Sampling than IID

Assume f ∈ Hilbert space H with reproducing kernel K (H., 1998), so that

\[ \| \text{err-rep} \|_{H}^2 = \int_{[0,1]^{2d}} K(\mathbf{x}, \mathbf{t})\, d\mathbf{x}\, d\mathbf{t} - \frac{2}{n} \sum_{i=1}^{n} \int_{[0,1]^d} K(\mathbf{x}_i, \mathbf{t})\, d\mathbf{t} + \frac{1}{n^2} \sum_{i,j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j). \]

E.g.,

\[ K(\mathbf{x}, \mathbf{t}) = \prod_{k=1}^{d} \bigl( 1 + \gamma_k^2 \{ 1 - |x_k - t_k| \} \bigr), \qquad \| f \|_{H}^2 = \sum_{u \neq \emptyset} \biggl\| \frac{1}{\gamma_u} \frac{\partial^{|u|} f}{\partial \mathbf{x}_u} \Big|_{\mathbf{x}_{\bar{u}} = 1} \biggr\|_{L^2}^2, \qquad \gamma_u = \prod_{k \in u} \gamma_k. \]
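For this product kernel the squared discrepancy can be evaluated exactly, since each coordinate factor integrates in closed form (Python with NumPy assumed; the point sets and weights γ_k below are illustrative):

```python
import numpy as np

def kernel_discrepancy_sq(x, gamma):
    """||err-rep||_H^2 for K(x,t) = prod_k (1 + gamma_k^2 (1 - |x_k - t_k|)):
    (double integral) - (2/n)(single integrals) + (1/n^2)(kernel sums),
    using int_0^1 (1 - |s - t|) dt = 1 - (s^2 + (1-s)^2)/2 per coordinate."""
    n, d = x.shape
    g2 = np.asarray(gamma, dtype=float) ** 2
    term1 = np.prod(1 + g2 * 2 / 3)                                 # iint K
    ci = np.prod(1 + g2 * (1 - (x**2 + (1 - x) ** 2) / 2), axis=1)  # int K(x_i, .)
    Kij = np.prod(1 + g2 * (1 - np.abs(x[:, None, :] - x[None, :, :])), axis=2)
    return term1 - 2 / n * ci.sum() + Kij.sum() / n**2

def midpoint_grid(m, d=2):
    pts = (2 * np.arange(1, m + 1) - 1) / (2 * m)
    return np.stack(np.meshgrid(*[pts] * d), axis=-1).reshape(-1, d)

disc_coarse = kernel_discrepancy_sq(midpoint_grid(4), np.ones(2))   # n = 16
disc_fine = kernel_discrepancy_sq(midpoint_grid(16), np.ones(2))    # n = 256
```

Refining the point set drives the discrepancy, and hence the error bound ‖err-rep‖·‖f‖, toward zero.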
61. Amy Suggests More Even Sampling than IID

Let \bigl( \hat{f}(\mathbf{k}) \bigr)_{\mathbf{k}} denote the coefficients of the Fourier–Walsh expansion of f, and let \bigl( \omega(\mathbf{k}) \bigr)_{\mathbf{k}} be some weights. Then

\[ \mu - \hat{\mu}_n = \underbrace{\frac{-\sum_{\mathbf{0} \neq \mathbf{k} \in \text{dual}} \hat{f}(\mathbf{k})}{\bigl\| \bigl( \hat{f}(\mathbf{k})/\omega(\mathbf{k}) \bigr)_{\mathbf{k}} \bigr\|_2\, \bigl\| \bigl( \omega(\mathbf{k}) \bigr)_{\mathbf{0} \neq \mathbf{k} \in \text{dual}} \bigr\|_2}}_{\mathrm{ALN} \,\in\, [-1,1]} \times \underbrace{\bigl\| \bigl( \omega(\mathbf{k}) \bigr)_{\mathbf{0} \neq \mathbf{k} \in \text{dual}} \bigr\|_2}_{\mathrm{DSC}(\{\mathbf{x}_i\}_{i=1}^{n}) \,=\, O(n^{-1+\epsilon})} \times \underbrace{\bigl\| \bigl( \hat{f}(\mathbf{k})/\omega(\mathbf{k}) \bigr)_{\mathbf{k}} \bigr\|_2}_{\mathrm{VAR}(f)} \]
62. Amy Suggests More Even Sampling than IID

Assuming that the Fourier–Walsh coefficients \hat{f}(\mathbf{k}) do not decay erratically as \mathbf{k} \to \infty, the discrete transform \bigl( \tilde{f}_n(\mathbf{k}) \bigr)_{\mathbf{k}} may be used to bound the error reliably (H. and Jiménez Rugama, 2016; Jiménez Rugama and H., 2016; H. et al., 2017+):

\[ |\mu - \hat{\mu}_n| \le \text{err}_n := C(n) \sum_{\text{certain } \mathbf{k}} \bigl| \tilde{f}_n(\mathbf{k}) \bigr| \]
66. Maximum Likelihood Estimation of the Covariance Kernel

\[ f \sim \mathcal{GP}(0, s^2 C_\theta), \quad \mathsf{C}_\theta = \bigl( C_\theta(\mathbf{x}_i, \mathbf{x}_j) \bigr)_{i,j=1}^{n}, \quad \mathbf{y} = \bigl( f(\mathbf{x}_i) \bigr)_{i=1}^{n}, \quad \hat{\mu}_n = \mathbf{c}_{\hat{\theta}}^T \mathsf{C}_{\hat{\theta}}^{-1} \mathbf{y} \]

\[ \hat{\theta} = \operatorname*{argmin}_{\theta} \frac{\mathbf{y}^T \mathsf{C}_\theta^{-1} \mathbf{y}}{\bigl[ \det(\mathsf{C}_\theta^{-1}) \bigr]^{1/n}} \]

\[ \mathbb{P}\bigl[ |\mu - \hat{\mu}_n| \le \text{err}_n \bigr] = 99\% \quad \text{for } \text{err}_n = \frac{2.58}{\sqrt{n}} \sqrt{\bigl( c_{0,\hat{\theta}} - \mathbf{c}_{\hat{\theta}}^T \mathsf{C}_{\hat{\theta}}^{-1} \mathbf{c}_{\hat{\theta}} \bigr)\, \mathbf{y}^T \mathsf{C}_{\hat{\theta}}^{-1} \mathbf{y}} \]

There is a de-randomized interpretation of Bayesian cubature (H., 2017+): take f ∈ a Hilbert space with reproducing kernel C_θ and best interpolant \tilde{f}_{\mathbf{y}}. Then

\[ \hat{\theta} = \operatorname*{argmin}_{\theta} \frac{\mathbf{y}^T \mathsf{C}_\theta^{-1} \mathbf{y}}{\bigl[ \det(\mathsf{C}_\theta^{-1}) \bigr]^{1/n}} = \operatorname*{argmin}_{\theta} \operatorname{vol}\bigl\{ \mathbf{z} \in \mathbb{R}^n : \| \tilde{f}_{\mathbf{z}} \|_\theta \le \| \tilde{f}_{\mathbf{y}} \|_\theta \bigr\} \]

\[ |\mu - \hat{\mu}_n| \le \frac{2.58}{\sqrt{n}} \underbrace{\sqrt{c_{0,\hat{\theta}} - \mathbf{c}_{\hat{\theta}}^T \mathsf{C}_{\hat{\theta}}^{-1} \mathbf{c}_{\hat{\theta}}}}_{\| \text{error representer} \|_{\hat{\theta}}} \times \underbrace{\sqrt{\mathbf{y}^T \mathsf{C}_{\hat{\theta}}^{-1} \mathbf{y}}}_{\| \tilde{f}_{\mathbf{y}} \|_{\hat{\theta}}} \quad \text{if } \| f - \tilde{f}_{\mathbf{y}} \|_{\hat{\theta}} \le \frac{2.58\, \| \tilde{f}_{\mathbf{y}} \|_{\hat{\theta}}}{\sqrt{n}} \]
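A grid-search sketch of the MLE objective above (Python with NumPy assumed; the exponential kernel C_θ(x,t) = exp(−|x−t|/θ) and the data are illustrative stand-ins, not the kernel from the talk):

```python
import numpy as np

def mle_theta(y, x, thetas):
    """Minimize y^T C_theta^{-1} y / det(C_theta^{-1})^{1/n}, which equals
    y^T C_theta^{-1} y * det(C_theta)^{1/n}, over a grid of theta values,
    for the 1-d kernel C_theta(x, t) = exp(-|x - t| / theta)."""
    n = len(y)
    best_theta, best_obj = None, np.inf
    for th in thetas:
        C = np.exp(-np.abs(x[:, None] - x[None, :]) / th)
        _, logdet = np.linalg.slogdet(C)        # stable log-determinant
        obj = (y @ np.linalg.solve(C, y)) * np.exp(logdet / n)
        if obj < best_obj:
            best_theta, best_obj = th, obj
    return best_theta

rng = np.random.default_rng(5)
x = np.sort(rng.random(40))
y = np.sin(2 * np.pi * x)                       # smooth stand-in data
theta_hat = mle_theta(y, x, np.geomspace(0.01, 10.0, 30))
```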