The document discusses methods for efficiently and accurately estimating integrals, including Monte Carlo simulation, low-discrepancy sampling, and Bayesian cubature. It notes that product rules for estimating high-dimensional integrals become prohibitively expensive as dimension increases. Adaptive low-discrepancy sampling is proposed as a method that uses Sobol' or lattice points and normally doubles the number of points until a tolerance is reached.
The document describes various adaptive methods for numerical integration or cubature of functions, including Monte Carlo methods, low-discrepancy sampling, and Bayesian cubature. It discusses approaches to choose sample sizes and weights to guarantee the integral estimate is within a given tolerance of the true integral with high probability. Specific examples discussed include multidimensional Gaussian integrals and estimating Sobol' sensitivity indices.
The document discusses different perspectives on simulating the mean of a function, including deterministic, randomized, and Bayesian approaches. It summarizes Monte Carlo methods using the central limit theorem and Berry-Esseen inequality to estimate error bounds. Low-discrepancy sampling and cubature methods are described which use Fourier coefficients to bound integration errors. Bayesian cubature is outlined, which assumes the function is drawn from a Gaussian process prior to perform optimal quadrature. Maximum likelihood is used to estimate the kernel hyperparameters.
This document discusses error analysis for quasi-Monte Carlo methods. It introduces the trio error identity that decomposes the error into three terms: the variation of the integrand, the discrepancy of the sampling measure from the probability measure, and the alignment between the integrand and the difference between the measures. Several examples are provided to illustrate the identity, including integration over a reproducing kernel Hilbert space. The discrepancy term can be evaluated in O(n^2) operations and converges at different rates depending on the sampling method and properties of the integrand.
We will describe and analyze accurate and efficient numerical algorithms to interpolate and approximate the integral of multivariate functions. The algorithms can be applied when we are given function values at an arbitrarily positioned, usually small, existing sparse set of sample points, and additional samples are impossible or difficult (e.g., expensive) to obtain. The methods are based on local and global tensor-product sparse quasi-interpolation methods that are exact for a class of sparse multivariate orthogonal polynomials.
One of the central tasks in computational mathematics and statistics is to accurately approximate unknown target functions. This is typically done with the help of data — samples of the unknown functions. The emergence of Big Data presents both opportunities and challenges. On one hand, big data introduces more information about the unknowns and, in principle, allows us to create more accurate models. On the other hand, data storage and processing become highly challenging. In this talk, we present a set of sequential algorithms for function approximation in high dimensions with large data sets. The algorithms are of iterative nature and involve only vector operations. They use one data sample at each step and can handle dynamic/stream data. We present both the numerical algorithms, which are easy to implement, as well as rigorous analysis for their theoretical foundation.
This document describes an automatic Bayesian method for numerical integration. It begins by introducing the problem of multivariate integration and current approaches like Monte Carlo integration that have limitations. It then presents the Bayesian cubature algorithm which chooses sample points and weights to minimize the error in approximating an integral. This is done by modeling the integrand as a Gaussian process, deriving identities relating the error to properties of the covariance kernel, and estimating its hyperparameters. The kernel used is shift-invariant, allowing fast matrix computations. Simulation results show Bayesian cubature achieves high accuracy with fewer samples compared to other methods.
Multidimensional integrals may be approximated by weighted averages of integrand values. Quasi-Monte Carlo (QMC) methods are more accurate than simple Monte Carlo methods because they carefully choose where to evaluate the integrand. This tutorial focuses on how quickly QMC methods converge to the correct answer as the number of integrand values increases. The answer may depend on the smoothness of the integrand and the sophistication of the QMC method. QMC error analysis may assume the integrand belongs to a reproducing kernel Hilbert space or may assume that the integrand is an instance of a stochastic process with known covariance structure. These two approaches have interesting parallels. This tutorial also explores how the computational cost of achieving a good approximation to the integral depends on the dimension of the domain of the integrand. Finally, this tutorial explores methods for determining how many integrand values are needed to satisfy the error tolerance. Relevant software is described.
Markov chain Monte Carlo (MCMC) methods are popularly used in Bayesian computation. However, they need a large number of samples for convergence, which can become costly when the posterior distribution is expensive to evaluate. Deterministic sampling techniques such as Quasi-Monte Carlo (QMC) can be a useful alternative to MCMC, but the existing QMC methods are mainly developed only for sampling from unit hypercubes. Unfortunately, the posterior distributions can be highly correlated and nonlinear, making them occupy very little space inside a hypercube. Thus, most of the samples from QMC can get wasted. The QMC samples can be saved if they can be pulled towards the high probability regions of the posterior distribution using inverse probability transforms. But this can be done only when the distribution function is known, which is rarely the case in Bayesian problems. In this talk, I will discuss a deterministic sampling technique, known as minimum energy designs, which can directly sample from the posterior distributions.
1. The document discusses the author's research in three areas: graph-based clustering methods, approximate Bayesian computation (ABC), and Bayesian computation using empirical likelihood.
2. For graph-based clustering, the author presents asymptotic results for spectral clustering as the number of data points and bandwidth approach infinity.
3. For ABC, the author discusses sequential ABC algorithms and challenges of model choice and high-dimensional summary statistics. Machine learning methods are proposed to analyze simulated ABC data.
4. For empirical likelihood, the author proposes using it for Bayesian computation when the likelihood is intractable and simulation is infeasible, as it provides correct confidence intervals unlike composite likelihoods.
1. The document presents Plug-and-Play priors for Bayesian imaging using Langevin-based sampling methods.
2. It introduces the Bayesian framework for image restoration and discusses challenges in modeling the prior.
3. A Plug-and-Play approach is proposed that uses an implicit prior defined by a denoising network in conjunction with Langevin sampling, termed PnP-ULA. Experiments demonstrate its effectiveness on image deblurring and inpainting tasks.
Slides: On the Chi Square and Higher-Order Chi Distances for Approximating f-Divergences, by Frank Nielsen
Slides for the paper:
On the Chi Square and Higher-Order Chi Distances for Approximating f-Divergences
published in IEEE SPL:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6654274
This document discusses macrocanonical models for texture synthesis. It begins by introducing the goal of texture synthesis and providing a brief history. It then describes the parametric question of combining randomness and structure in images. Specifically, it discusses maximizing entropy under geometric constraints. The document goes on to discuss links to statistical physics, defining microcanonical and macrocanonical models. It focuses on studying the macrocanonical model, describing how to find optimal parameters through gradient descent and how to sample from the model using Langevin dynamics. The document provides examples of texture synthesis and compares results to other methods.
Maximum likelihood estimation of regularisation parameters in inverse problems, by Valentin De Bortoli
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
The document discusses achieving higher-order convergence for integration on R^N using quasi-Monte Carlo (QMC) rules. It describes the problem that when using tensor product QMC rules on truncated domains, the convergence rate scales with the dimension s as (α log N)^s N^(-α). The goal is to obtain a convergence rate independent of the dimension s. The document proposes using a multivariate decomposition method (MDM) to decompose an infinite-dimensional integral into a sum of finite-dimensional integrals, then applying QMC rules to each integral to achieve the desired higher-order convergence rate.
1) The document discusses proximal algorithms for solving inverse problems in probability spaces, where the goal is to estimate an unknown variable x given noisy measurements y.
2) It describes using Bayesian methods like maximum a posteriori (MAP) estimation and Markov chain Monte Carlo (MCMC) to account for uncertainty, where the posterior distribution p(x|y) is assumed to be log-concave.
3) Proximal algorithms like the unadjusted Langevin algorithm (ULA) and proximal ULA (MYULA) are proposed for sampling from the posterior in high dimensions when p(x|y) is not differentiable.
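As a rough illustration of the Langevin samplers listed above, here is a minimal unadjusted Langevin algorithm (ULA) sketch for a smooth, log-concave toy target; the Gaussian target and the step size are assumptions made only for illustration, not the imaging posterior of the talk.

```python
import numpy as np

def ula_sample(grad_log_post, x0, step, n_iter, rng):
    """Unadjusted Langevin algorithm: x_{k+1} = x_k + step * grad log p(x_k) + sqrt(2*step) * noise."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iter, x.size))
    for k in range(n_iter):
        noise = rng.standard_normal(x.size)
        x = x + step * grad_log_post(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Toy target (assumption): standard Gaussian posterior, so grad log p(x) = -x.
rng = np.random.default_rng(0)
chain = ula_sample(lambda x: -x, x0=np.zeros(2), step=0.05, n_iter=5000, rng=rng)
print(chain.mean(axis=0), chain.var(axis=0))  # should be near 0 and near 1 (up to discretization bias)
```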
This document provides an introduction to Approximate Bayesian Computation (ABC), a likelihood-free method for approximating posterior distributions when the likelihood function is unavailable or computationally intractable. It describes the ABC rejection sampling algorithm and key concepts like tolerance levels, distance functions, summary statistics, and improvements like ABC-MCMC and ABC-SMC. ABC is presented as an alternative to traditional Bayesian inference methods for models where direct likelihood evaluation is impossible or too expensive.
This document summarizes results on analyzing stochastic gradient descent (SGD) algorithms for minimizing convex functions. It shows that a continuous-time version of SGD (SGD-c) can strongly approximate the discrete-time version (SGD-d) under certain conditions. It also establishes that SGD achieves the minimax optimal convergence rate of O(t^-1/2) for α=1/2 by using an "averaging from the past" procedure, closing the gap between previous lower and upper bound results.
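A minimal sketch of SGD with averaging of past iterates (Polyak-Ruppert style) and a step size proportional to t^(-1/2), on a toy least-squares problem; the data and constants are assumptions chosen only to illustrate the averaging idea mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.1 * rng.standard_normal(1000)

x = np.zeros(5)
x_avg = np.zeros(5)
for t in range(1, 10001):
    i = rng.integers(len(b))               # pick one data point at random
    grad = (A[i] @ x - b[i]) * A[i]        # stochastic gradient of 0.5 * (a_i . x - b_i)^2
    x = x - (0.1 / np.sqrt(t)) * grad      # step size ~ t^(-1/2), i.e. alpha = 1/2
    x_avg += (x - x_avg) / t               # running average over past iterates
print(np.linalg.norm(x_avg - x_true))
```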
Lattice rules are one of the two main classes of methods for quasi-Monte Carlo (QMC) and randomized quasi-Monte Carlo (RQMC) integration. In this tutorial, we recall the definition and summarize the key properties of lattice rules. We discuss what classes of functions these rules are good to integrate, and how their parameters can be chosen in terms of variance bounds for these classes of functions. We consider integration lattices in the real space as well as in a polynomial space over the finite field F2. We provide various numerical examples of how these rules perform compared with standard Monte Carlo. Some examples involve high-dimensional integrals, others involve Markov chains. We also discuss software design for RQMC and what software is available.
Quantitative Propagation of Chaos for SGD in Wide Neural Networks, by Valentin De Bortoli
The document discusses quantitative analysis of stochastic gradient descent (SGD) for training wide neural networks. It presents two different regimes - a deterministic regime where the limiting dynamics is described by an ordinary differential equation, and a stochastic regime where the limiting dynamics is a stochastic differential equation. Experiments on MNIST classification show that the stochastic regime with larger step sizes exhibits better regularization properties. The analysis provides insights into the behavior of neural network training as the number of neurons becomes large.
This document describes fuzzy clustering and fuzzy c-means clustering. It begins by introducing fuzzy clustering and discussing how the cost function for k-means clustering can be modified to allow fuzzy membership. Specifically, it proposes using fuzzy membership values between 0 and 1 instead of the hard 0 or 1 membership of k-means. This modifies the cost function to include fuzzy membership values raised to a power m. Lagrange multipliers are then used to derive update equations for the fuzzy memberships and cluster centroids. The final equations assign membership based on the distance of a point to cluster centroids, and update centroids as the weighted mean of points based on their fuzzy memberships.
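A compact sketch of the fuzzy c-means updates just described: memberships between 0 and 1 raised to a power m, membership updates driven by distances to the centroids, and centroids computed as membership-weighted means. The synthetic data and parameter choices are assumptions made only for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, rng=None):
    rng = rng or np.random.default_rng(0)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)                 # fuzzy memberships in [0, 1], rows sum to 1
    for _ in range(n_iter):
        W = U ** m                                    # memberships raised to the fuzzifier power m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]  # centroids = membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))            # memberships from inverse relative distances
        U /= U.sum(axis=1, keepdims=True)
    return centroids, U

# Two synthetic blobs (assumption), clustered with c = 2.
X = np.vstack([np.random.default_rng(2).normal(0, 1, (50, 2)),
               np.random.default_rng(3).normal(5, 1, (50, 2))])
centroids, U = fuzzy_c_means(X, c=2)
print(centroids)
```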
Recently, there has been a surge in activity at the interface of optimal transport and statistics (with special emphasis on machine learning applications). The talk will summarize new results and challenges in this active area. For example, we will show how many of the most popular estimators in machine learning (such as the Lasso and SVMs) can be interpreted as games. This interpretation opens the door for new and potentially better estimators and algorithms, as well as questions about the underlying complexity of this new class of estimators.
(This talk is based on joint work with F. He, Y. Kang, K. Murthy, and F. Zhang)
Computational Information Geometry: A quick review (ICMS), by Frank Nielsen
From the workshop
Computational information geometry for image and signal processing
Sep 21, 2015 - Sep 25, 2015
ICMS, 15 South College Street, Edinburgh
http://www.icms.org.uk/workshop.php?id=343
The document discusses computing averages and provides examples of calculating average speed and estimating population proportions. It explains that averages can be used to estimate values for large populations by taking samples. Care must be taken with sampling to ensure respondents are chosen randomly and independently to minimize errors. Averages also come up in assessing financial risk by considering expectations as averages over infinite scenarios.
Natural Language Processing in R (rNLP), by fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
This document provides an overview of Markov chain Monte Carlo (MCMC) methods. It begins with motivations for using MCMC, such as computational difficulties that arise in models with latent variables like mixture models. It then discusses likelihood-based and Bayesian approaches, noting limitations of maximum likelihood methods. Conjugate priors are described that allow tractable Bayesian inference for some simple models. However, conjugate priors are not available for more complex models, motivating the use of MCMC methods which can approximate integrals and distributions of interest for more complex models.
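Since the summary above motivates MCMC without showing a sampler, here is a minimal random-walk Metropolis sketch; the Gaussian toy target and the proposal scale are assumptions for illustration and are not taken from the document.

```python
import numpy as np

def random_walk_metropolis(log_post, x0, scale, n_iter, rng):
    """Random-walk Metropolis: propose x' = x + scale * noise, accept with prob min(1, p(x')/p(x))."""
    x = np.array(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for k in range(n_iter):
        prop = x + scale * rng.standard_normal(x.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance step
            x, lp = prop, lp_prop
        chain[k] = x
    return chain

# Toy target (assumption): unnormalized standard Gaussian log-density.
rng = np.random.default_rng(0)
chain = random_walk_metropolis(lambda x: -0.5 * np.sum(x**2), np.zeros(2), 0.5, 10000, rng)
print(chain[2000:].mean(axis=0))   # discard burn-in, then estimate the posterior mean
```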
Monte Carlo methods use random sampling to solve problems numerically. They work by setting up probabilistic models and running simulations using random numbers. This allows approximating solutions to problems in physics, finance, optimization, and other fields. Examples include estimating pi by simulating dart throws, and using a "drunken wino" random walk simulation to approximate the solution to a partial differential equation on a grid. The accuracy of Monte Carlo methods increases with more simulation iterations, requiring truly random numbers for best results.
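The dart-throwing estimate of pi mentioned above takes only a few lines; this is a minimal sketch using pseudo-random numbers, with the usual n^(-1/2) decay of the error as the number of samples grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
pts = rng.random((n, 2))                        # uniform "darts" thrown into the unit square
inside = (pts ** 2).sum(axis=1) <= 1.0          # darts that land inside the quarter circle
pi_hat = 4.0 * inside.mean()                    # area ratio times 4 estimates pi
print(pi_hat, "error ~", abs(pi_hat - np.pi))   # error shrinks roughly like n**-0.5
```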
Applying Monte Carlo Simulation to Microsoft Project Schedules, by jimparkpmp
Jim Park presented on applying Monte Carlo simulation (MCS) to Microsoft Project schedules. MCS can model schedule uncertainty by using multi-point estimates rather than single point estimates. It runs simulations with randomly selected values from the estimates' distributions to calculate schedule outcomes. This helps determine higher probability finish dates compared to traditional PERT analysis. Garbage in leads to garbage out, so quality estimates tied to quantifiable risks are important. The presentation demonstrated MCS tools and their use to improve schedule confidence levels.
Monte Carlo simulation is a statistical technique that uses random numbers and probability to simulate real-world processes. It was developed in the 1940s by scientists working on nuclear weapons research. Monte Carlo simulation provides approximate solutions to problems by running simulations many times. It allows for sensitivity analysis and scenario analysis. Some examples include estimating pi by randomly generating points within a circle, and approximating integrals by treating the area under a curve as a target for random darts. The technique provides probabilistic results and allows modeling of correlated inputs.
High Dimensional Quasi Monte Carlo Method in Finance, by Marco Bianchetti
Monte Carlo simulation in finance has been traditionally focused on pricing derivatives. Actually nowadays market and counterparty risk measures, based on multi-dimensional multi-step Monte Carlo simulation, are very important tools for managing risk, both on the front office side (sensitivities, CVA) and on the risk management side (estimating risk and capital allocation). Furthermore, they are typically required for internal models and validated by regulators.
The daily production of prices and risk measures for large portfolios with multiple counterparties is a computationally intensive task, which requires a complex framework and an industrial approach. It is a typical high budget, high effort project in banks.
In this presentation we focus on the Monte Carlo simulation, showing that, despite some common wisdom, Quasi Monte Carlo techniques can be applied, under appropriate conditions, to successfully improve price and risk figures and to reduce the computational effort.
This work includes and extends our paper M. Bianchetti, S. Kucherenko and S. Scoleri, “Pricing and Risk Management with High-Dimensional Quasi Monte Carlo and Global Sensitivity Analysis”, Wilmott Journal, July 2015 (also available at http://ssrn.com/abstract=2592753).
- The document discusses methods for determining when to stop sampling in Monte Carlo integration to achieve a desired error tolerance.
- For independent and identically distributed (IID) sampling, the central limit theorem can be used to determine the necessary sample size based on the variance of the integrand.
- Quasi-Monte Carlo sampling can achieve faster convergence rates by using low-discrepancy point sets that more uniformly sample the domain. The error can be analyzed in the frequency domain based on the decay of the true Fourier coefficients.
- Bayesian cubature methods model the integrand as a Gaussian process, allowing inference of hyperparameters from sample points to improve integration accuracy.
The document discusses error analysis for quasi-Monte Carlo methods used for numerical integration. It introduces the concepts of reproducing kernel Hilbert spaces and mean square discrepancy to analyze integration error. Specifically, it shows that the mean square discrepancy of randomized low-discrepancy point sets can be computed in O(n) operations, whereas the standard discrepancy requires O(n^2) operations, making randomized quasi-Monte Carlo methods more efficient for high-dimensional integration problems.
This document discusses automatic Bayesian cubature for numerical integration. It begins with an introduction to multivariate integration and the challenges it poses. It then describes an automatic cubature algorithm that generates sample points and computes error bounds iteratively until a tolerance threshold is met. Next, it covers Bayesian cubature, which treats integrands as random functions to obtain probabilistic error bounds. It defines a Bayesian trio identity relating the integration error to discrepancies, variations, and alignments. The document concludes with discussions of future work.
This talk introduces the theory behind Bayesian Deep Learning, which has recently become a hot topic, along with its recent applications. It briefly explains the theory of Bayesian inference and then presents the theory and applications of Yarin Gal's Monte Carlo Dropout.
This document discusses Bayesian inference on mixtures models. It covers several key topics:
1. Density approximation and consistency results for mixtures as a way to approximate unknown distributions.
2. The "scarcity phenomenon" where the posterior probabilities of most component allocations in mixture models are zero, concentrating on just a few high probability allocations.
3. Challenges with Bayesian inference for mixtures, including identifiability issues, label switching, and complex combinatorial calculations required to integrate over all possible component allocations.
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd (Sri Ambati)
This document discusses consensus optimization and its applications to machine learning model fitting. Convex optimization problems can be solved effectively using interior point methods or customized algorithms. Model fitting is commonly formulated as regularized loss minimization, which is convex for many useful cases like linear regression. Consensus optimization allows distributed model fitting by splitting the data across nodes and coordinating local model parameters with consensus constraints. The alternating direction method of multipliers (ADMM) solves the consensus problem iteratively. Applications demonstrate distributed training of support vector machines and logistic regression models using ADMM consensus optimization.
The document discusses various methods for modeling input distributions in simulation models, including trace-driven simulation, empirical distributions, and fitting theoretical distributions to real data. It provides examples of several continuous and discrete probability distributions commonly used in simulation, including the exponential, normal, gamma, Weibull, binomial, and Poisson distributions. Key parameters and properties of each distribution are defined. Methods for selecting an appropriate input distribution based on summary statistics of real data are also presented.
In this work we discuss how to compute KLE with complexity O(k n log n), how to approximate large covariance matrices (in H-matrix format), how to use the Lanczos method.
We solve elliptic PDE with uncertain coefficients. We apply Karhunen-Loeve expansion to separate stochastic part from spatial part. The corresponding eigenvalue problem with covariance function is solved via the Hierarchical Matrix technique. We also demonstrate how low-rank tensor method can be applied for high-dimensional problems (e.g., to compute higher order statistical moments) . We provide explicit formulas to compute statistical moments of order k with linear complexity.
This document discusses using the sequence of iterates generated by inertial methods to minimize convex functions. It introduces inertial methods and how they can be used to generate sequences that converge to the minimum. While the last iterate is often used, sometimes averaging over iterates or using extrapolations like Aitken acceleration can provide better estimates of the minimum. Inertial methods allow for more exploration of the function space than gradient descent alone. The geometry of the function may provide opportunities to analyze the iterate sequence and obtain improved convergence estimates.
The document summarizes a presentation on minimizing tensor estimation error using alternating minimization. It begins with an introduction to tensor decompositions including CP, Tucker, and tensor train decompositions. It then discusses nonparametric tensor estimation using an alternating minimization method. The method iteratively updates components while holding other components fixed, achieving efficient computation. The analysis shows that after t iterations, the estimation error is bounded by the sum of a statistical error term and an optimization error term decaying exponentially in t. Real data analysis uses the method for multitask learning.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
A new implementation of k-MLE for mixture modelling of Wishart distributions, by Frank Nielsen
This document discusses a new implementation of k-MLE for mixture modelling of Wishart distributions. It begins with an overview of the Wishart distribution and its properties as an exponential family. It then describes the original k-MLE algorithm and how it can be adapted for Wishart distributions by using Hartigan and Wang's strategy instead of Lloyd's strategy to avoid empty clusters. The document also discusses approaches for initializing the clusters, such as k-means++, and proposes a heuristic to determine the number of clusters on-the-fly rather than fixing k.
The document discusses inertial algorithms for minimizing convex functions. It begins by introducing the gradient method and accelerated/inertial gradient method. It then reviews several classic approaches for analyzing the convergence of inertial algorithms, such as algebraic proofs, estimate sequences, and viewing the algorithm as a discretization of an ordinary differential equation (ODE). More recent approaches discussed include analyzing inertial algorithms as a combination of primal and mirror descent steps or using Bregman estimate sequences. The document raises questions about interpreting the difference between inertial algorithms and the heavy ball method from an ODE perspective. It also discusses a new direction of analyzing inertial algorithms by viewing them as numerical integration schemes approximating the solution to an ODE.
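As a concrete instance of the inertial (accelerated) gradient methods discussed above, the following is a minimal Nesterov-style sketch on a toy quadratic; the test problem, step size, and momentum schedule are assumptions for illustration, not taken from the document.

```python
import numpy as np

def nesterov_accelerated_gradient(grad, x0, lipschitz, n_iter):
    """Inertial gradient step: extrapolate with momentum, then take a gradient step at the extrapolated point."""
    x_prev = x = np.array(x0, dtype=float)
    for k in range(1, n_iter + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # inertial extrapolation
        x_prev = x
        x = y - grad(y) / lipschitz                # gradient step with step size 1/L
    return x

# Toy convex quadratic f(x) = 0.5 x^T A x - b^T x (an assumption for illustration).
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = np.linalg.solve(A, b)
x = nesterov_accelerated_gradient(lambda x: A @ x - b, np.zeros(3), lipschitz=100.0, n_iter=200)
print(np.linalg.norm(x - x_star))
```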
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
Maximizing Submodular Function over the Integer Lattice, by Tasuku Soma
The document describes generalizations of submodular function maximization and submodular cover problems from sets to integer lattices. It presents polynomial-time approximation algorithms for maximizing monotone diminishing return (DR) submodular functions subject to constraints like cardinality, polymatroid and knapsack on the integer lattice. It also presents an algorithm for the DR-submodular cover problem of minimizing cost subject to achieving a quality threshold. The results provide useful extensions of submodular optimization to settings that cannot be modeled as set functions.
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
1. Simulating the Mean Efficiently
and to a Given Tolerance
Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology
hickernell@iit.edu mypages.iit.edu/~hickernell
Thanks to Lan Jiang, Tony Jiménez Rugama, Jagadees Rathinavel,
and the rest of the Guaranteed Automatic Integration Library (GAIL) team
Supported by NSF-DMS-1522687
Thanks for your kind invitation
2-5. Introduction | IID Monte Carlo | Low Discrepancy Sampling | Bayesian Cubature | Examples | References
Estimating/Simulating/Computing an Integral

Gaussian probability = \int_{[a,b]} \frac{e^{-x^{T} \Sigma^{-1} x/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}} \, dx

option price = \int_{\mathbb{R}^d} \mathrm{payoff}(x) \, \underbrace{\frac{e^{-x^{T} \Sigma^{-1} x/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}}}_{\text{PDF of Brownian motion at } d \text{ times}} \, dx

Bayesian \hat{\beta}_j = \int_{\mathbb{R}^d} \beta_j \, \mathrm{prob}(\beta \mid \text{data}) \, d\beta
  = \frac{\int_{\mathbb{R}^d} \beta_j \, \mathrm{prob}(\text{data} \mid \beta) \, \mathrm{prob}_{\text{prior}}(\beta) \, d\beta}{\int_{\mathbb{R}^d} \mathrm{prob}(\text{data} \mid \beta) \, \mathrm{prob}_{\text{prior}}(\beta) \, d\beta}

Sobol' index_j = \frac{\int_{[0,1]^{2d}} \big(\mathrm{output}(x) - \mathrm{output}(x_j, x'_{-j})\big) \, \mathrm{output}(x') \, dx \, dx'}{\int_{[0,1]^d} \mathrm{output}(x)^2 \, dx - \big(\int_{[0,1]^d} \mathrm{output}(x) \, dx\big)^2}
6. Estimating/Simulating/Computing the Mean

\mu = \int_{\mathbb{R}^d} g(x) \, dx = \mathbb{E}[f(X)] = \int_{\mathbb{R}^d} f(x) \, \nu(dx) = {?}, \qquad \hat{\mu}_n = \sum_{i=1}^{n} w_i f(x_i)

How to choose \nu, \{x_i\}_{i=1}^{n}, and \{w_i\}_{i=1}^{n} to make |\mu - \hat{\mu}_n| small? (trio identity)
Given \varepsilon_a, how big must n be to guarantee |\mu - \hat{\mu}_n| \le \varepsilon_a? (adaptive cubature)
7. Product Rules Using Rectangular Grids

\mu = \int_{\mathbb{R}^d} f(x) \, \nu(dx) \approx \hat{\mu}_n = \sum_{i=1}^{n} w_i f(x_i)

If \int_0^1 f(x) \, dx - \sum_{i=1}^{m} w_i f(t_i) = O(m^{-r}), then

\int_{[0,1]^d} f(x) \, dx - \sum_{i_1=1}^{m} \cdots \sum_{i_d=1}^{m} w_{i_1} \cdots w_{i_d} \, f(t_{i_1}, \ldots, t_{i_d}) = O(m^{-r}) = O(n^{-r/d}),

assuming rth derivatives in each direction exist.
9. Product Rules Using Rectangular Grids (continued)

But the computational cost becomes prohibitive for large dimensions, d:

d          1    2     5        10       100
n = 8^d    8    64    3.3E4    1.0E9    2.0E90

Product rules are typically a bad idea unless d is small.
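To make the cost table concrete, here is a small sketch (not from the slides) of a tensor-product rule: a one-dimensional composite midpoint rule extended to d dimensions, so the number of nodes grows as n = m^d. The smooth test integrand below is an assumption chosen only for illustration.

```python
import itertools
import numpy as np

def product_midpoint(f, d, m):
    """Tensor-product midpoint rule on [0,1]^d using m nodes per coordinate, i.e. n = m**d nodes in total."""
    t = (np.arange(m) + 0.5) / m                   # 1-D midpoint nodes, each with weight 1/m
    total = 0.0
    for idx in itertools.product(range(m), repeat=d):
        total += f(t[list(idx)])
    return total / m**d

f = lambda x: np.exp(np.sum(x))                    # smooth test integrand (assumption); exact value is (e-1)**d
for d in (1, 2, 5):
    print(d, 8**d, product_midpoint(f, d, 8))      # the node count 8**d grows rapidly with d
```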
10. Monte Carlo Simulation in the News
Sampling with a computer can be fast. How big is our error?
12. Central Limit Theorem Stopping Rule for IID Monte Carlo

\mu = \int_{\mathbb{R}^d} f(x) \, \nu(dx) \approx \hat{\mu}_n = \frac{1}{n} \sum_{i=1}^{n} f(x_i), \qquad x_i \overset{\text{IID}}{\sim} \nu

\mu - \hat{\mu}_n = \underbrace{\frac{\mu - \hat{\mu}_n}{\mathrm{std}(f(X))/\sqrt{n}}}_{\mathrm{CNF} \sim (0,1)} \times \underbrace{\frac{1}{\sqrt{n}}}_{\mathrm{DSC}(\{x_i\})} \times \underbrace{\mathrm{std}(f(X))}_{\mathrm{VAR}(f)}

P[|\mu - \hat{\mu}_n| \le \mathrm{err}_n] \approx 99\% for \mathrm{err}_n = \frac{2.58 \times 1.2 \, \hat{\sigma}}{\sqrt{n}} by the Central Limit Theorem (CLT), where \hat{\sigma}^2 is the sample variance.
Central Limit Theorem Stopping Rule for IID Monte Carlo

$$\mu = \int_{\mathbb{R}^{d}} f(x)\,\nu(dx) \approx \hat{\mu}_{n} = \frac{1}{n}\sum_{i=1}^{n} f(x_{i}), \qquad x_{i} \overset{\text{IID}}{\sim} \nu$$

$$\mu - \hat{\mu}_{n} = \underbrace{\frac{\mu - \hat{\mu}_{n}}{\operatorname{std}(f(X))/\sqrt{n}}}_{\text{CNF}\,\approx\,N(0,1)} \times \underbrace{\frac{1}{\sqrt{n}}}_{\text{DSC}(\{x_{i}\})} \times \underbrace{\operatorname{std}(f(X))}_{\text{VAR}(f)}$$

$\mathbb{P}[|\mu - \hat{\mu}_{n}| \le \text{err}_{n}] \approx 99\%$ for $\text{err}_{n} = \dfrac{2.58 \times 1.2\,\hat{\sigma}}{\sqrt{n}}$ by the Central Limit Theorem (CLT), where $\hat{\sigma}^{2}$ is the sample variance. But the CLT is only an asymptotic result, and $1.2\hat{\sigma}$ may be an overly optimistic upper bound on $\sigma$.
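A minimal sketch of the two-stage logic behind such a stopping rule follows; it uses the CLT bound err_n = 2.58 x 1.2 sigma_hat / sqrt(n) from above, but the function names, pilot-sample size, and test problem are our own illustrative assumptions, not the GAIL implementation.

```python
# Minimal sketch (illustrative, not GAIL): a two-stage IID Monte Carlo rule in the
# spirit of the CLT error bound err_n = 2.58 * 1.2 * sigma_hat / sqrt(n).
import numpy as np

def clt_sample_size(f, sample, n_pilot=1024, eps_a=1e-3, inflate=1.2, z=2.58):
    """Estimate the variance from a pilot sample, then pick n so the CLT bound <= eps_a."""
    rng = np.random.default_rng(0)
    pilot = f(sample(rng, n_pilot))
    sigma_hat = pilot.std(ddof=1)
    n = int(np.ceil((z * inflate * sigma_hat / eps_a) ** 2))
    estimate = f(sample(rng, n)).mean()
    return estimate, n

# Example: mean of f(X) = ||X||^2 with X ~ N(0, I_3); the exact mean is 3.
f = lambda x: (x ** 2).sum(axis=1)
sample = lambda rng, n: rng.standard_normal((n, 3))
print(clt_sample_size(f, sample, eps_a=1e-2))
```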
Berry-Esseen Stopping Rule for IID Monte Carlo

$$\mu = \int_{\mathbb{R}^{d}} f(x)\,\nu(dx) \approx \hat{\mu}_{n} = \frac{1}{n}\sum_{i=1}^{n} f(x_{i}), \qquad x_{i} \overset{\text{IID}}{\sim} \nu,$$
with the same trio decomposition as above.

$\mathbb{P}[|\mu - \hat{\mu}_{n}| \le \text{err}_{n}] \ge 99\%$ for $\text{err}_{n}$ satisfying
$$\Phi\bigl(-\sqrt{n}\,\text{err}_{n}/(1.2\,\hat{\sigma}_{n_{\sigma}})\bigr) + \Delta_{n}\bigl(-\sqrt{n}\,\text{err}_{n}/(1.2\,\hat{\sigma}_{n_{\sigma}}),\, \kappa_{\max}\bigr) = 0.0025$$
by the Berry-Esseen Inequality, where $\hat{\sigma}^{2}_{n_{\sigma}}$ is the sample variance computed from a sample independent of the one used to simulate the mean, provided that $\operatorname{kurt}(f(X)) \le \kappa_{\max}(n_{\sigma})$ (H. et al., 2013; Jiang, 2016).
Adaptive Low Discrepancy Sampling Cubature

$$\mu = \int_{[0,1]^{d}} f(x)\,dx, \qquad \hat{\mu}_{n} = \frac{1}{n}\sum_{i=1}^{n} f(x_{i}), \quad x_{i} \text{ Sobol' or lattice points}$$

Normally $n$ should be a power of 2.
Let $\{\hat{f}(k)\}_{k}$ denote the coefficients of the Fourier (Walsh or complex exponential) expansion of $f$, and let $\{\omega(k)\}_{k}$ be some weights. Then

$$\mu - \hat{\mu}_{n} = -\sum_{0 \ne k \in \text{dual}} \hat{f}(k) = \underbrace{\frac{-\sum_{0 \ne k \in \text{dual}} \hat{f}(k)}{\bigl\|\{\hat{f}(k)/\omega(k)\}_{k}\bigr\|_{2}\,\bigl\|\{\omega(k)\}_{0 \ne k \in \text{dual}}\bigr\|_{2}}}_{\text{CNF}\,\in\,[-1,1]} \times \underbrace{\bigl\|\{\omega(k)\}_{0 \ne k \in \text{dual}}\bigr\|_{2}}_{\text{DSC}(\{x_{i}\}_{i=1}^{n}) = O(n^{-1+\epsilon})} \times \underbrace{\bigl\|\{\hat{f}(k)/\omega(k)\}_{k}\bigr\|_{2}}_{\text{VAR}(f)}$$

Assuming that the $\hat{f}(k)$ do not decay erratically as $k \to \infty$, the discrete transform coefficients $\{\tilde{f}_{n}(k)\}_{k}$ may be used to bound the error reliably (H. and Jiménez Rugama, 2016; Jiménez Rugama and H., 2016; H. et al., 2017+):

$$|\mu - \hat{\mu}_{n}| \le \text{err}_{n} := C(n) \sum_{\text{certain } k} \bigl|\tilde{f}_{n}(k)\bigr|$$
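For orientation, here is a deliberately simplified sketch of the doubling strategy: it doubles the number of scrambled Sobol' points until two successive estimates agree to within the tolerance. The published algorithms instead bound the error through the discrete Walsh/Fourier coefficients as above; the stopping criterion, names, and test integrand here are our own simplifications.

```python
# Minimal sketch (our simplification, not the published algorithm): double n = 2^m
# until two successive Sobol' estimates differ by less than eps_a.
import numpy as np
from scipy.stats import qmc

def adaptive_sobol(f, d, eps_a=1e-4, m_min=8, m_max=20, seed=7):
    est_prev = None
    for m in range(m_min, m_max + 1):
        # Same seed each time, so the first 2^(m-1) points coincide with the previous draw.
        x = qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m)
        est = f(x).mean()
        if est_prev is not None and abs(est - est_prev) < eps_a:
            return est, 2 ** m
        est_prev = est
    return est_prev, 2 ** m_max

# Example: a smooth illustrative integrand on [0,1]^3.
f = lambda x: np.cos(np.linalg.norm(x - 0.5, axis=1))
print(adaptive_sobol(f, d=3, eps_a=1e-5))
```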
Bayesian Cubature: f Is Random

$$\mu = \int_{\mathbb{R}^{d}} f(x)\,\nu(dx) \approx \hat{\mu}_{n} = \sum_{i=1}^{n} w_{i}\,f(x_{i})$$

Assume $f \sim \mathcal{GP}(0, s^{2} C_{\theta})$ (Diaconis, 1988; O'Hagan, 1991; Ritter, 2000; Rasmussen and Ghahramani, 2003), and define
$$c_{0} = \int_{\mathbb{R}^{d} \times \mathbb{R}^{d}} C_{\theta}(x, t)\,\nu(dx)\,\nu(dt), \qquad c = \Bigl(\int_{\mathbb{R}^{d}} C_{\theta}(x_{i}, t)\,\nu(dt)\Bigr)_{i=1}^{n}, \qquad C = \bigl(C_{\theta}(x_{i}, x_{j})\bigr)_{i,j=1}^{n}.$$

Choosing $w = (w_{i})_{i=1}^{n} = C^{-1} c$ is optimal, and with $y = (f(x_{i}))_{i=1}^{n}$,

$$\mu - \hat{\mu}_{n} = \underbrace{\frac{\mu - \hat{\mu}_{n}}{\sqrt{(c_{0} - c^{T}C^{-1}c)\,y^{T}C^{-1}y/n}}}_{\text{CNF}\,\sim\,N(0,1)} \times \underbrace{\sqrt{c_{0} - c^{T}C^{-1}c}}_{\text{DSC}} \times \underbrace{\sqrt{\frac{y^{T}C^{-1}y}{n}}}_{\text{VAR}(f)}$$

$$\mathbb{P}[|\mu - \hat{\mu}_{n}| \le \text{err}_{n}] = 99\% \quad \text{for } \text{err}_{n} = 2.58\,\sqrt{\frac{(c_{0} - c^{T}C^{-1}c)\,y^{T}C^{-1}y}{n}}.$$

But $\theta$ needs to be inferred (by MLE), and $C^{-1}$ typically requires $O(n^{3})$ operations.
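The formulas above can be exercised directly for the uniform measure on [0,1]^d with a kernel whose integrals are available in closed form. The sketch below uses the product kernel C_theta(x,t) = prod_j (1 + theta*B2({x_j - t_j})) with B2(u) = u^2 - u + 1/6, for which c0 = 1 and c = (1, ..., 1); the fixed theta, point set, and integrand are illustrative choices of ours (no MLE, no fast structured solves).

```python
# Minimal sketch (illustrative, not GAIL): Bayesian cubature with a Bernoulli-polynomial
# product kernel, whose integrals against the uniform measure on [0,1]^d are c0 = 1, c = 1.
import numpy as np
from scipy.stats import qmc

def bernoulli2(u):
    return u ** 2 - u + 1.0 / 6.0

def bayesian_cubature(f, x, theta=1.0):
    """Return (mu_hat, err_n) for mu = int_{[0,1]^d} f, given design points x."""
    n, d = x.shape
    diff = np.abs(x[:, None, :] - x[None, :, :])          # pairwise |x_i - x_j|
    C = np.prod(1.0 + theta * bernoulli2(diff), axis=2)   # Gram matrix, shape (n, n)
    y = f(x)
    c = np.ones(n)                                         # int C_theta(x_i, t) dt = 1
    c0 = 1.0                                               # double integral of the kernel
    Cinv_y = np.linalg.solve(C, y)
    Cinv_c = np.linalg.solve(C, c)
    mu_hat = c @ Cinv_y
    dsc2 = max(c0 - c @ Cinv_c, 0.0)                       # guard against tiny round-off
    err = 2.58 * np.sqrt(dsc2 * (y @ Cinv_y) / n)          # 99% half-width from the slide's formula
    return mu_hat, err

# Example with 256 scrambled Sobol' points in d = 2 and an illustrative integrand.
x = qmc.Sobol(d=2, scramble=True, seed=1).random_base2(8)
f = lambda t: np.exp(-np.sum(t ** 2, axis=1))
print(bayesian_cubature(f, x))
```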
Gaussian Probability

$$\mu = \int_{[a,b]} \frac{\exp\bigl(-\tfrac{1}{2} t^{T}\Sigma^{-1} t\bigr)}{\sqrt{(2\pi)^{d}\det(\Sigma)}}\,dt \overset{\text{Genz (1993)}}{=} \int_{[0,1]^{d-1}} f(x)\,dx$$

For a typical choice of $a$, $b$, $\Sigma$, with $d = 3$ and $\varepsilon_{a} = 0$, $\mu \approx 0.6763$:

    ε_r     Method             % Accuracy   Worst 10% n   Worst 10% Time (s)
    1E-2    IID Monte Carlo    100%         8.1E4         1.8E-2
            Sobol' Sampling    100%         1.0E3         5.1E-3
            Bayesian Lattice   100%         1.0E3         2.8E-3
    1E-3    IID Monte Carlo    100%         2.0E6         3.8E-1
            Sobol' Sampling    100%         2.0E3         7.7E-3
            Bayesian Lattice   100%         1.0E3         2.8E-3
    1E-4    Sobol' Sampling    100%         1.6E4         1.8E-2
            Bayesian Lattice   100%         8.2E3         1.4E-2

Bayesian lattice cubature uses a covariance kernel for which the Gram matrix $C$ is circulant, so operations involving $C$ require only $O(n \log n)$ work.
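For readers who want to reproduce the flavor of this example, here is a minimal sketch of a Genz-style transformation to [0,1]^(d-1) followed by Sobol' sampling; it is our own transcription of the idea, unverified against Genz (1993), and the covariance matrix and box below are arbitrary illustrative choices rather than the ones behind the table.

```python
# Minimal sketch (our transcription of the Genz-style sequential conditioning transform):
# write the Gaussian box probability as an integral over [0,1]^(d-1), then use Sobol' points.
import numpy as np
from scipy.stats import norm, qmc

def genz_integrand(u, a, b, Sigma):
    """Evaluate the transformed integrand at points u in [0,1]^(d-1)."""
    L = np.linalg.cholesky(Sigma)
    n, d = u.shape[0], len(a)
    y = np.zeros((n, d - 1))
    lo, hi = norm.cdf(a[0] / L[0, 0]), norm.cdf(b[0] / L[0, 0])
    f = np.full(n, hi - lo)
    for i in range(1, d):
        y[:, i - 1] = norm.ppf(lo + u[:, i - 1] * (hi - lo))   # condition on previous coordinates
        shift = y[:, :i] @ L[i, :i]
        lo = norm.cdf((a[i] - shift) / L[i, i])
        hi = norm.cdf((b[i] - shift) / L[i, i])
        f *= hi - lo
    return f

# Illustrative box probability in d = 3 (parameters are ours, not those of the table).
Sigma = np.array([[4.0, 2.0, 1.0], [2.0, 3.0, 1.5], [1.0, 1.5, 2.0]])
a, b = np.full(3, -2.0), np.full(3, 2.0)
u = qmc.Sobol(d=2, scramble=True, seed=3).random_base2(12)
print(genz_integrand(u, a, b, Sigma).mean())
```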
Asian Option Pricing

$$\text{fair price} = \int_{\mathbb{R}^{d}} e^{-rT} \max\Bigl(\frac{1}{d}\sum_{j=1}^{d} S_{j} - K,\, 0\Bigr) \frac{e^{-x^{T}\Sigma^{-1}x/2}}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,dx \approx \$13.12,$$
where $S_{j} = S_{0}\,e^{(r-\sigma^{2}/2)\,jT/d + \sigma x_{j}}$ is the stock price at time $jT/d$ and $\Sigma = \bigl(\min(i,j)\,T/d\bigr)_{i,j=1}^{d}$.

    ε_a = 1E-4   Method                                % Accuracy   Worst 10% n   Worst 10% Time (s)
                 Sobol' Sampling                       100%         2.1E6         4.3
                 Sobol' Sampling w/ control variates   97%          1.0E6         2.1

The coefficient of the control variate for low discrepancy sampling is different from the one for IID Monte Carlo (H. et al., 2005; H. et al., 2017+).
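A bare-bones version of this pricing setup, with parameters of our own choosing (so it does not reproduce the $13.12 figure or the table), might look as follows; there is no adaptive stopping or control variate here, just Sobol' sampling of the Brownian path.

```python
# Minimal sketch (illustrative parameters of our own choosing): price an arithmetic-mean
# Asian call by Sobol' sampling of the Brownian motion at d monitoring times.
import numpy as np
from scipy.stats import norm, qmc

S0, K, r, sigma, T, d = 100.0, 100.0, 0.05, 0.5, 1.0, 12
t = T * np.arange(1, d + 1) / d
Sigma = np.minimum.outer(t, t)        # covariance of Brownian motion at the d times
A = np.linalg.cholesky(Sigma)         # one of several possible path constructions

u = qmc.Sobol(d=d, scramble=True, seed=11).random_base2(16)
x = norm.ppf(u) @ A.T                                       # Brownian paths, shape (n, d)
S = S0 * np.exp((r - 0.5 * sigma ** 2) * t + sigma * x)     # stock prices at the d times
payoff = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)
print(payoff.mean())
```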
Summary

- The error in simulating the mean can be decomposed as a trio identity (Meng, 2017+; H., 2017+).
- Knowing when to stop a simulation of the mean is not trivial (H. et al., 2017+).
- The Berry-Esseen inequality can tell us when to stop an IID simulation.
- Fourier analysis can tell us when to stop a low discrepancy simulation.
- Bayesian cubature can tell us when to stop a simulation if you can afford the computational cost.
- All methods can be fooled by nasty functions, f.
- Relative error tolerances and problems involving functions of integrals can be handled (H. et al., 2017+).
- Our algorithms are implemented in the Guaranteed Automatic Integration Library (GAIL) (Choi et al., 2013-2015), which is under continuous development.
Upcoming SAMSI Quasi-Monte Carlo Program
References

Bratley, P., B. L. Fox, and H. Niederreiter. 1992. Implementation and tests of low-discrepancy sequences, ACM Trans. Model. Comput. Simul. 2, 195-213.

Choi, S.-C. T., Y. Ding, F. J. H., L. Jiang, Ll. A. Jiménez Rugama, X. Tong, Y. Zhang, and X. Zhou. 2013-2015. GAIL: Guaranteed Automatic Integration Library (versions 1.0-2.1).

Cools, R. and D. Nuyens (eds.) 2016. Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, Springer Proceedings in Mathematics and Statistics, vol. 163, Springer-Verlag, Berlin.

Diaconis, P. 1988. Bayesian numerical analysis, Statistical decision theory and related topics IV, Papers from the 4th Purdue symp., West Lafayette, Indiana 1986, pp. 163-175.

Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities, Computing Science and Statistics 25, 400-405.

H., F. J. 2017+. Error analysis of quasi-Monte Carlo methods, submitted for publication, arXiv:1702.01487.

H., F. J., L. Jiang, Y. Liu, and A. B. Owen. 2013. Guaranteed conservative fixed width confidence intervals via Monte Carlo sampling, Monte Carlo and quasi-Monte Carlo methods 2012, pp. 105-128.

H., F. J. and Ll. A. Jiménez Rugama. 2016. Reliable adaptive cubature using digital sequences, Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, pp. 367-383, arXiv:1410.8615 [math.NA].

H., F. J., Ll. A. Jiménez Rugama, and D. Li. 2017+. Adaptive quasi-Monte Carlo methods, submitted for publication, arXiv:1702.01491 [math.NA].

H., F. J., C. Lemieux, and A. B. Owen. 2005. Control variates for quasi-Monte Carlo, Statist. Sci. 20, 1-31.

Jiang, L. 2016. Guaranteed adaptive Monte Carlo methods for estimating means of random variables, Ph.D. Thesis.

Jiménez Rugama, Ll. A. and F. J. H. 2016. Adaptive multidimensional integration based on rank-1 lattices, Monte Carlo and quasi-Monte Carlo methods: MCQMC, Leuven, Belgium, April 2014, pp. 407-422, arXiv:1411.1966.

Meng, X. 2017+. Statistical paradises and paradoxes in big data, in preparation.

O'Hagan, A. 1991. Bayes-Hermite quadrature, J. Statist. Plann. Inference 29, 245-260.

Rasmussen, C. E. and Z. Ghahramani. 2003. Bayesian Monte Carlo, Advances in Neural Information Processing Systems, pp. 489-496.

Ritter, K. 2000. Average-case analysis of numerical problems, Lecture Notes in Mathematics, vol. 1733, Springer-Verlag, Berlin.

Sobol', I. M. 1990. On sensitivity estimation for nonlinear mathematical models, Matem. Mod. 2, no. 1, 112-118.

Sobol', I. M. 2001. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates, Math. Comput. Simul. 55, no. 1-3, 271-280.
Maximum Likelihood Estimation of the Covariance Kernel

$$f \sim \mathcal{GP}(0, s^{2} C_{\theta}), \qquad C_{\theta} = \bigl(C_{\theta}(x_{i}, x_{j})\bigr)_{i,j=1}^{n}, \qquad y = \bigl(f(x_{i})\bigr)_{i=1}^{n}, \qquad \hat{\mu}_{n} = c_{\hat{\theta}}^{T} C_{\hat{\theta}}^{-1} y$$

$$\hat{\theta} = \operatorname*{argmin}_{\theta} \frac{y^{T} C_{\theta}^{-1} y}{\bigl[\det(C_{\theta}^{-1})\bigr]^{1/n}}$$

$$\mathbb{P}[|\mu - \hat{\mu}_{n}| \le \text{err}_{n}] = 99\% \quad \text{for } \text{err}_{n} = \frac{2.58}{\sqrt{n}} \sqrt{\bigl(c_{0,\hat{\theta}} - c_{\hat{\theta}}^{T} C_{\hat{\theta}}^{-1} c_{\hat{\theta}}\bigr)\, y^{T} C_{\hat{\theta}}^{-1} y}$$

There is a de-randomized interpretation of Bayesian cubature (H., 2017+): take $f$ in the Hilbert space with reproducing kernel $C_{\theta}$ and let $\tilde{f}_{y}$ be the best interpolant of the data. Then

$$\hat{\theta} = \operatorname*{argmin}_{\theta} \frac{y^{T} C_{\theta}^{-1} y}{\bigl[\det(C_{\theta}^{-1})\bigr]^{1/n}} = \operatorname*{argmin}_{\theta} \operatorname{vol}\bigl\{ z \in \mathbb{R}^{n} : \|\tilde{f}_{z}\|_{\theta} \le \|\tilde{f}_{y}\|_{\theta} \bigr\}$$

$$|\mu - \hat{\mu}_{n}| \le \frac{2.58}{\sqrt{n}} \underbrace{\sqrt{c_{0,\hat{\theta}} - c_{\hat{\theta}}^{T} C_{\hat{\theta}}^{-1} c_{\hat{\theta}}}}_{\text{error representer, } \hat{\theta}} \times \underbrace{\sqrt{y^{T} C_{\hat{\theta}}^{-1} y}}_{\|\tilde{f}_{y}\|_{\hat{\theta}}} \qquad \text{if } \|f - \tilde{f}_{y}\|_{\hat{\theta}} \le \frac{2.58\,\|\tilde{f}_{y}\|_{\hat{\theta}}}{\sqrt{n}}$$
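As a final illustration, here is a minimal sketch of profiling the MLE criterion y^T C_theta^{-1} y * det(C_theta)^{1/n} over a grid of theta, reusing the Bernoulli-polynomial kernel from the earlier sketch; the grid, data, and point set are our own illustrative choices, and a practical implementation would exploit the fast structured solves mentioned above.

```python
# Minimal sketch (illustrative): grid search for the profile-MLE criterion
# L(theta) = (y^T C_theta^{-1} y) * det(C_theta)^(1/n), i.e. y^T C^{-1} y / [det(C^{-1})]^{1/n}.
import numpy as np
from scipy.stats import qmc

def bernoulli2(u):
    return u ** 2 - u + 1.0 / 6.0

def mle_objective(theta, x, y):
    n = len(y)
    diff = np.abs(x[:, None, :] - x[None, :, :])
    C = np.prod(1.0 + theta * bernoulli2(diff), axis=2)   # Gram matrix for this theta
    Cinv_y = np.linalg.solve(C, y)
    _, logdet = np.linalg.slogdet(C)
    return (y @ Cinv_y) * np.exp(logdet / n)

# Illustrative data: 128 scrambled Sobol' points in d = 2 and a smooth test function.
x = qmc.Sobol(d=2, scramble=True, seed=5).random_base2(7)
y = np.exp(-np.sum(x ** 2, axis=1))
thetas = np.logspace(-2, 2, 21)
theta_hat = min(thetas, key=lambda t: mle_objective(t, x, y))
print(theta_hat)
```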