SIAM Minisymposium on Guaranteed Numerical Algorithms

Fast Automatic Bayesian Cubature with Lattice Sampling

Jagadeeswaran R., Fred J. Hickernell
Department of Applied Mathematics, Illinois Institute of Technology, Chicago IL
jrathin1@iit.edu

Thanks to the GAIL team, to my adviser Prof. Fred J. Hickernell, and to my employer Wi-Tronix LLC for the conference support.

July 9, 2018
Numerical Integration

A fundamental problem in various fields, including finance, machine learning, and statistics:
$$\mu = \int_{\mathbb{R}^d} g(x)\,dx = \int_{[0,1]^d} f(x)\,dx = \mathbb{E}[f(X)], \quad \text{where } X \sim U[0,1]^d,$$
approximated by a cubature rule
$$\hat{\mu}_n := w_0 + \sum_{j=1}^n f(x_j)\,w_j$$
using points $\{x_j\}_{j=1}^n$ and associated weights $w_j$.

The goal of this work is to:
- Develop an automatic algorithm for integration
- Assume $f$ is a Gaussian process; the mean and covariance kernel need to be estimated
- MLE is expensive in general, so use points and a kernel for which MLE is cheap
- Use an extensible point set and an algorithm that allows more points to be added if needed
- Determine $n$ such that, given a tolerance $\varepsilon$, $|\mu - \hat{\mu}_n| \le \varepsilon$
Motivating Examples

Gaussian probability $= \displaystyle\int_{[a,b]} \frac{e^{-x^T \Sigma^{-1} x/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}}\, dx$  (Genz, 1993)

Option pricing $= \displaystyle\int_{\mathbb{R}^d} \text{payoff}(x)\, \underbrace{\frac{e^{-x^T \Sigma^{-1} x/2}}{(2\pi)^{d/2} |\Sigma|^{1/2}}}_{\text{PDF of Brownian motion at } d \text{ times}}\, dx$  (Glasserman, 2004)

where $\text{payoff}(x) = e^{-rT} \max\Bigl(\frac{1}{d}\sum_{k=1}^d S_k(x_k) - K,\, 0\Bigr)$ and
$S_j(x_j) = S_0\, e^{(r - \sigma^2/2)\, t_j + \sigma x_j}$ = stock price at time $t_j = jT/d$.

Keister integral $= \displaystyle\int_{\mathbb{R}^d} \cos(\lVert x \rVert)\, \exp(-\lVert x \rVert^2)\, dx$, $\quad d = 1, 2, \ldots$  (Keister, 1996)
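As a concrete illustration, the Keister integral can be rewritten over the unit cube before applying a cubature rule. The sketch below uses one standard change of variables, $x = z/\sqrt{2}$ with $z = \Phi^{-1}(u)$ componentwise; this is my illustration of the mapping, not necessarily the transformation used in the talk's experiments.

```python
import numpy as np
from scipy.stats import norm

def keister_on_unit_cube(u):
    """Keister integrand after substituting x = z / sqrt(2), z ~ N(0, I_d):
    the integral over R^d becomes pi^(d/2) * E[cos(||z|| / sqrt(2))],
    and z = Phi^{-1}(u) maps u ~ U[0,1]^d to z.
    u : (n, d) array of points in the open unit cube (0, 1)^d."""
    d = u.shape[1]
    z = norm.ppf(u)                                   # inverse normal CDF
    return np.pi ** (d / 2) * np.cos(np.linalg.norm(z, axis=1) / np.sqrt(2))
```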
Automatic cubature

1: procedure AutoCubature(f, ε)  ▷ integrate within the error tolerance
2:   n0 ← 2^8  ▷ start with a minimum number of points
3:   n ← n0, nI ← 0
4:   while true do  ▷ iterate until the error tolerance is met
5:     Generate {x_i}_{i=nI+1}^{n} and sample {f(x_i)}_{i=nI+1}^{n}
6:     Compute error bound err_n  ▷ err_n is a data-driven error bound
7:     if err_n ≤ ε then break
8:     end if
9:     nI ← n, n ← 2·nI
10:   end while
11:   Compute cubature weights {w_i}_{i=1}^n
12:   Compute approximate integral µ̂_n
13:   return µ̂_n  ▷ integral estimate
14: end procedure

Problem:
- How to choose {x_i}_{i=1}^n and {w_i}_{i=1}^n to make |µ − µ̂_n| small? What is err_n? (Bayesian posterior error)
- How to find n such that |µ − µ̂_n| ≤ err_n ≤ ε? (automatic cubature)
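A minimal Python sketch of this doubling loop follows. The helper names `gen_points` (an extensible point generator) and `err_bound` (the data-driven error bound developed below) are placeholders of this sketch, not identifiers from the talk's code.

```python
import numpy as np

def auto_cubature(f, eps, gen_points, err_bound, n0=2**8, n_max=2**23):
    """Sketch of AutoCubature: double n until the data-driven error bound
    err_n falls below the tolerance eps."""
    n, x, y = n0, None, np.empty(0)
    while True:
        x_new = gen_points(len(y), n)          # only the new points x_{nI+1..n}
        y = np.concatenate([y, f(x_new)])      # sample f at the new points
        x = x_new if x is None else np.vstack([x, x_new])
        err, mu_hat = err_bound(x, y)          # err_n and current estimate
        if err <= eps or n >= n_max:           # n_max guards against runaway
            return mu_hat, err, n
        n *= 2                                 # nI <- n, n <- 2 nI
```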
Bayesian posterior error to get an error bound

Assume a random $f \sim \mathcal{GP}(m, s^2 C_\theta)$, a Gaussian process with mean $m$ and covariance kernel $s^2 C_\theta$, where $C_\theta : [0,1]^d \times [0,1]^d \to \mathbb{R}$.

Define
$$c_0 = \int_{[0,1]^d \times [0,1]^d} C_\theta(x,t)\,dx\,dt, \quad c = \Bigl(\int_{[0,1]^d} C_\theta(x_i,t)\,dt\Bigr)_{i=1}^n, \quad C = \bigl(C_\theta(x_i,x_j)\bigr)_{i,j=1}^n.$$

Then
$$\mu - \hat{\mu}_n \,\big|\, y \;\sim\; \mathcal{N}\bigl(-w_0 + m(1 - \mathbf{1}^T C^{-1} c) + y^T(C^{-1} c - w),\;\; s^2 (c_0 - c^T C^{-1} c)\bigr),$$
where $y = (f(x_i))_{i=1}^n$. Moreover, $m$, $s$, and $\theta$ need to be inferred (by MLE).
$$\hat{\mu}_n = w_0 + \sum_{i=1}^n w_i f(x_i) = w_0 + w^T y$$

- In general, choosing $w_0 = m(1 - \mathbf{1}^T C^{-1} c)$ and $w = C^{-1} c$ makes the error unbiased.
- If $m = 0$ is fixed, choosing $w = C^{-1} c$ makes the error unbiased.

Diaconis (1988), O'Hagan (1991), Ritter (2000), Rasmussen (2003), and others.
Maximum likelihood estimation

Assume a random $f \sim \mathcal{GP}(m, s^2 C_\theta)$, a Gaussian process with mean $m$ and covariance kernel $s^2 C_\theta$, where $C_\theta : [0,1]^d \times [0,1]^d \to \mathbb{R}$. The log-likelihood given the data $y = (f(x_i))_{i=1}^n$ is
$$l(y \mid s, \theta) = \log\Bigl[\bigl((2\pi)^n \det(s^2 C)\bigr)^{-1/2} \exp\bigl(-\tfrac{1}{2}\, s^{-2} (y - m\mathbf{1})^T C^{-1} (y - m\mathbf{1})\bigr)\Bigr].$$

Maximizing with respect to $m$, then $s^2$, and finally $\theta$:
$$m_{\text{MLE}} = \frac{\mathbf{1}^T C^{-1} y}{\mathbf{1}^T C^{-1} \mathbf{1}} \quad \text{(explicit)},$$
$$s^2_{\text{MLE}} = \frac{1}{n}\, (y - m_{\text{MLE}}\mathbf{1})^T C^{-1} (y - m_{\text{MLE}}\mathbf{1}) \quad \text{(explicit)},$$
$$\theta_{\text{MLE}} = \operatorname*{argmin}_\theta \Bigl[\log(s_{\text{MLE}}) + \frac{1}{2n} \log(\det C)\Bigr] \quad \text{(numeric)}.$$

Why do we need $\theta_{\text{MLE}}$? So that the function space spanned by $C_\theta$ is customized to contain the integrand $f$.
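For a generic Gram matrix these estimates cost $O(n^3)$. A sketch of the explicit formulas via a Cholesky factorization (my illustration of the expensive route that the fast transform kernel later avoids):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def plug_in_mle(y, C):
    """Sketch: explicit MLEs m_MLE and s2_MLE for a general Gram matrix C,
    computed in O(n^3) with one Cholesky factorization."""
    n = len(y)
    one = np.ones(n)
    L = cho_factor(C)                    # C = L L^T
    Cinv_y = cho_solve(L, y)             # C^{-1} y
    Cinv_1 = cho_solve(L, one)           # C^{-1} 1
    m_mle = one @ Cinv_y / (one @ Cinv_1)
    resid = y - m_mle * one
    s2_mle = resid @ cho_solve(L, resid) / n
    return m_mle, s2_mle
```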
Stopping criterion: When to stop?

Since $\mu - \hat{\mu}_n \mid (f = y) \sim \mathcal{N}\bigl(0,\; s^2_{\text{MLE}} (c_0 - c^T C^{-1} c)\bigr)$, with $w_0 = m_{\text{MLE}}(1 - \mathbf{1}^T C^{-1} c)$ and $w^T = c^T C^{-1}$:

If $2.58\, s_{\text{MLE}} \sqrt{c_0 - c^T C^{-1} c} \le \varepsilon$, then $\mathbb{P}[|\mu - \hat{\mu}_n| \le \varepsilon] \ge 99\%$ (2.58 is the 99.5% quantile of the standard normal), where $c_0$, $c$, and $C$ depend on $\theta_{\text{MLE}}$.

Stopping criterion: $\quad n = \min\bigl\{ n_j \in \mathbb{Z}_{>0} : 2.58\, s_{\text{MLE}} \sqrt{c_0 - c^T C^{-1} c} \le \varepsilon \bigr\}$

Cubature rule: $\quad \hat{\mu}_n = w_0 + w^T y = m_{\text{MLE}}(1 - \mathbf{1}^T C^{-1} c) + c^T C^{-1} y$.

Problem: computing $C^{-1}$ typically requires $O(n^3)$ operations, which makes it impractical for large $n$.
Multivariate normal integration with the Matérn kernel

Problem: computation time (in seconds) increases rapidly, so we cannot use more than 4000 points in the cubature.
Fast transform kernel

Choose the kernel $C_\theta$ and the points $\{x_i\}_{i=1}^n$ so that the Gram matrix has the special structure
$$C = \bigl(C_\theta(x_i, x_j)\bigr)_{i,j=1}^n = (C_1, \ldots, C_n) = \frac{1}{n} V \Lambda V^H,$$
$$V := (v_1, \ldots, v_n)^T = (V_1, \ldots, V_n), \quad V_1 = v_1 = \mathbf{1}, \quad \Lambda = \operatorname{diag}(\lambda), \quad \lambda = (\lambda_1, \ldots, \lambda_n).$$

Then
$$V^T C_1 = V^T\, \tfrac{1}{n} \bar{V} \Lambda v_1 = \underbrace{\tfrac{1}{n} V^T \bar{V}}_{=\, I}\, \Lambda v_1 = \Lambda \mathbf{1} = (\lambda_1, \ldots, \lambda_n)^T = \lambda.$$

$C_\theta$ is a fast transform kernel if the transform $\hat{z} = V^T z$ can be computed in $O(n \log n)$ for arbitrary $z$. Using the fast transform,
$$m_{\text{MLE}} = \frac{\mathbf{1}^T C^{-1} y}{\mathbf{1}^T C^{-1} \mathbf{1}} = \frac{1}{n} \sum_{i=1}^n y_i,$$
$$s^2_{\text{MLE}} = \frac{1}{n}\, y^T \Bigl(C^{-1} - \frac{C^{-1} \mathbf{1} \mathbf{1}^T C^{-1}}{\mathbf{1}^T C^{-1} \mathbf{1}}\Bigr) y = \frac{1}{n^2} \sum_{i=2}^n \frac{|\hat{y}_i|^2}{\lambda_i}.$$
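A sketch of these $O(n \log n)$ formulas, under the assumption that the points are indexed in the natural lattice order so that $V^T z$ is the ordinary DFT (computable with an FFT); in van der Corput order the transform would be a reordered FFT.

```python
import numpy as np

def fast_mle(y, C1):
    """Sketch: MLEs when C = (1/n) V Lambda V^H is diagonalized by the DFT.
    y  : real function values at the n lattice points (natural order assumed)
    C1 : first column of the Gram matrix, C1[i] = C_theta(x_i, x_1)."""
    n = len(y)
    lam = np.fft.fft(C1).real          # lambda = V^T C_1 (real for symmetric C)
    y_hat = np.fft.fft(y)              # y_hat = V^T y
    m_mle = y.mean()                   # m_MLE = (1/n) sum y_i
    s2_mle = np.sum(np.abs(y_hat[1:]) ** 2 / lam[1:]) / n ** 2
    return m_mle, s2_mle
```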
Optimal shape parameter $\theta_{\text{MLE}}$

The maximum likelihood estimate of $\theta$ is made faster by using the properties of the fast transform kernel:
$$\theta_{\text{MLE}} = \operatorname*{argmin}_\theta \Bigl[2\log(s_{\text{MLE}}) + \frac{1}{n}\log(\det C)\Bigr] = \operatorname*{argmin}_\theta \Bigl[\log\Bigl(\sum_{i=2}^n \frac{|\hat{y}_i|^2}{\lambda_i}\Bigr) + \frac{1}{n}\sum_{i=1}^n \log(\lambda_i)\Bigr]$$
(up to constants independent of $\theta$), where
$$\hat{y} = (\hat{y}_i)_{i=1}^n = V^T y, \quad \lambda = (\lambda_i)_{i=1}^n = V^T C_1, \quad C_1 = \bigl(C(x_i, x_1)\bigr)_{i=1}^n.$$
Only $O(n \log n)$ operations are needed to compute $\hat{y}$ and $\lambda$, so each evaluation of the $\theta_{\text{MLE}}$ objective is cheap.
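A sketch of the resulting numerical optimization, assuming a single scalar shape parameter and a user-supplied `kernel_col(theta)` returning the Gram matrix's first column (both names are assumptions of this sketch). Note that $\hat{y}$ is computed once and reused for every candidate $\theta$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def theta_objective(theta, y_hat, kernel_col):
    """log(sum_{i>=2} |y_hat_i|^2 / lambda_i) + (1/n) sum_i log(lambda_i)."""
    lam = np.fft.fft(kernel_col(theta)).real     # lambda(theta) via one FFT
    n = len(y_hat)
    return (np.log(np.sum(np.abs(y_hat[1:]) ** 2 / lam[1:]))
            + np.log(lam).sum() / n)

# y_hat = np.fft.fft(y)  # computed once, reused for every theta
# result = minimize_scalar(theta_objective, bounds=(1e-3, 1.0),
#                          method='bounded', args=(y_hat, kernel_col))
```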
Computing the error bound err_n and $\hat{\mu}_n$ faster

Using the properties of the fast transform kernel, the error bound err_n can be computed faster:
$$\text{err}_n = 2.58\, s_{\text{MLE}} \sqrt{c_0 - c^T C^{-1} c} = 2.58 \sqrt{\frac{1}{n^2} \sum_{i=2}^n \frac{|\hat{y}_i|^2}{\lambda_i}}\; \sqrt{c_0 - \frac{1}{n} \sum_{i=1}^n \frac{|\hat{c}_i|^2}{\lambda_i}}.$$
Similarly, $\hat{\mu}_n$ can be computed faster:
$$\hat{\mu}_n = w_0 + w^T y = \Bigl[\frac{(1 - \mathbf{1}^T C^{-1} c)}{\mathbf{1}^T C^{-1} \mathbf{1}}\, C^{-1} \mathbf{1} + C^{-1} c\Bigr]^T y = \frac{\hat{y}_1}{n} + \frac{1}{n} \sum_{i=2}^n \frac{\bar{\hat{c}}_i\, \hat{y}_i}{\lambda_i},$$
where $\hat{y} = V^T y$, $\hat{c} = V^T c$, and $\lambda = V^T C_1$ with $C_1 = (C(x_i, x_1))_{i=1}^n$.

$O(n \log n)$ operations to compute err_n; $O(n)$ operations to compute $\hat{\mu}_n$ once the transforms are available.
Special shift-invariant covariance kernel

$$C_\theta(x,t) = \prod_{l=1}^d \Bigl[1 - \theta_l^r\, \frac{(2\pi\sqrt{-1})^r}{r!}\, B_r(|x_l - t_l|)\Bigr], \quad \theta \in (0,1]^d, \; r \in 2\mathbb{N},$$

where $B_r$ is the Bernoulli polynomial of degree $r$ (Olver et al., 2013):
$$B_r(x) = \frac{-r!}{(2\pi\sqrt{-1})^r} \sum_{k \ne 0,\, k=-\infty}^{\infty} \frac{e^{2\pi\sqrt{-1}\,kx}}{k^r}, \quad \begin{cases} 0 < x < 1 & \text{for } r = 1, \\ 0 \le x \le 1 & \text{for } r = 2, 3, \ldots \end{cases}$$

We call $C_\theta$ the Fourier kernel. This kernel has
$$c_0 = \int_{[0,1]^d \times [0,1]^d} C_\theta(x,t)\,dx\,dt = 1, \quad c = \Bigl(\int_{[0,1]^d} C_\theta(x_i,t)\,dt\Bigr)_{i=1}^n = \mathbf{1}.$$

This simplifies:
$$\hat{\mu}_n = \frac{\hat{y}_1}{n}, \quad \text{err}_n = 2.58 \sqrt{\frac{1}{n^2} \sum_{i=2}^n \frac{|\hat{y}_i|^2}{\lambda_i}\, \Bigl(1 - \frac{n}{\lambda_1}\Bigr)}.$$
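For $r = 2$, $B_2(u) = u^2 - u + \tfrac{1}{6}$ and $-(2\pi\sqrt{-1})^2/2! = 2\pi^2$, so each coordinate contributes $1 + 2\pi^2 \theta_l^2 B_2(|x_l - t_l|)$. A sketch of the Gram matrix's first column under that reading of the kernel (an assumption of this sketch):

```python
import numpy as np

def fourier_kernel_col(x, theta):
    """Sketch: first column C_1 of the Gram matrix for the r = 2 Fourier
    kernel, C(x, t) = prod_l [1 + 2 pi^2 theta_l^2 B_2(|x_l - t_l|)],
    with B_2(u) = u^2 - u + 1/6.
    x : (n, d) lattice points in [0,1)^d, x[0] the first point
    theta : (d,) shape parameters in (0, 1]."""
    u = np.abs(x - x[0])                         # |x_il - x_1l|
    b2 = u ** 2 - u + 1.0 / 6.0                  # Bernoulli polynomial B_2
    return np.prod(1.0 + 2.0 * np.pi ** 2 * theta ** 2 * b2, axis=1)
```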
Rank-1 lattice rules: a low discrepancy point set

Given the generating vector $h$, the construction of the $n$-point rank-1 lattice (Dick and Pillichshammer, 2010) is
$$L_{n,h} := \{x_i := h\,\phi(i-1) \bmod 1;\; i = 1, \ldots, n\} \quad (1)$$
where $h$, also called the generating vector, is a generalized Mahler integer (an $\infty$-digit expansion) (Hickernell and Niederreiter, 2003), and $\phi(i)$ is the van der Corput sequence in base 2. The lattice rule approximation is
$$\frac{1}{n} \sum_{k=1}^n f\Bigl(\Bigl\{\frac{kh}{n} + \Delta\Bigr\}_1\Bigr),$$
where $\{\cdot\}_1$ denotes the fractional part, i.e., the modulo-1 operator, and $\Delta$ is a random shift.

Extensible integration lattices: the number of points in the node set can be increased while retaining the existing points (Hickernell and Niederreiter, 2003).
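A sketch of this construction follows; the generating vector in the usage comment is a hypothetical example, not one from the talk.

```python
import numpy as np

def vdc(i):
    """Van der Corput sequence in base 2: bit-reverse the binary digits of i."""
    phi, denom = 0.0, 2.0
    while i > 0:
        phi += (i % 2) / denom
        i //= 2
        denom *= 2.0
    return phi

def lattice_points(n, h, shift):
    """Sketch: n shifted rank-1 lattice points x_i = {h phi(i-1) + shift} mod 1,
    generated in van der Corput (extensible) order."""
    return np.array([(h * vdc(i) + shift) % 1.0 for i in range(n)])

# usage sketch with a hypothetical generating vector:
# rng = np.random.default_rng(7)
# x = lattice_points(2**8, np.array([1, 433461]), rng.random(2))
```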
Iterative DFT

We can avoid recomputing the whole Fourier transform of the function values $y = (y_i = f(x_i))_{i=1}^n$ in every iteration. The discrete Fourier transform is defined as
$$\text{DFT}\{y\} := \hat{y} = \Bigl(\sum_{j=1}^n y_j\, e^{-\frac{2\pi\sqrt{-1}}{n}(j-1)(i-1)}\Bigr)_{i=1}^n.$$
Rearrange the sum into odd-indexed ($j = 2l - 1$) and even-indexed ($j = 2l$) terms:
$$\hat{y}_i = \underbrace{\sum_{l=1}^{n/2} y_{2l-1}\, e^{-\frac{2\pi\sqrt{-1}}{n/2}(l-1)(i-1)}}_{\text{half-size DFT of the odd-indexed part}} + e^{-\frac{2\pi\sqrt{-1}}{n}(i-1)} \underbrace{\sum_{l=1}^{n/2} y_{2l}\, e^{-\frac{2\pi\sqrt{-1}}{n/2}(l-1)(i-1)}}_{\text{half-size DFT of the even-indexed part}}.$$
We use this splitting together with the extensible point set to avoid recomputing the DFT of $y$ in every iteration: when $n$ doubles, the previously computed transform serves as one of the two half-size DFTs, and only the newly sampled values need a fresh transform.
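A sketch of the update when $n$ doubles, assuming the new points interleave with the old ones so that the old values form the untwiddled half of the radix-2 split; this ordering is an assumption of the sketch, not a statement about the GAIL implementation.

```python
import numpy as np

def dft_double(y_hat_old, y_new):
    """Combine the length-m DFT of the existing values with the DFT of the
    m newly sampled values into the length-2m DFT (one radix-2 butterfly)."""
    m = len(y_hat_old)
    y_hat_new = np.fft.fft(y_new)               # O(m log m): new half only
    twiddle = np.exp(-2j * np.pi * np.arange(m) / (2 * m))
    top = y_hat_old + twiddle * y_hat_new       # entries 1..m of the new DFT
    bot = y_hat_old - twiddle * y_hat_new       # entries m+1..2m (sign flips)
    return np.concatenate([top, bot])
```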
Cancellation error in err_n

$$\text{err}_n = 2.58 \sqrt{\Bigl(1 - \frac{n}{\lambda_1}\Bigr)\, \frac{1}{n^2} \sum_{i=2}^n \frac{|\hat{y}_i|^2}{\lambda_i}}\,; \quad \text{the term } 1 - \frac{n}{\lambda_1} \text{ causes cancellation error when } \lambda_1 \approx n.$$

Let $\tilde{C}(x,t) = C(x,t) - 1$; then $\tilde{C} = C - \mathbf{1}\mathbf{1}^T$ and $\tilde{C} = \frac{1}{n} V \tilde{\Lambda} V^H$, where $\tilde{\Lambda} = \operatorname{diag}(\tilde{\lambda}_1, \ldots, \tilde{\lambda}_n)$. To compute $(\tilde{\lambda}_i)_{i=1}^n = V^T \tilde{C}_1$:
$$\tilde{\lambda}_1 = \lambda_1 - n, \quad \tilde{\lambda}_j = \lambda_j, \;\; \forall\, j = 2, \ldots, n.$$

The vector $\tilde{C}_1 = \tilde{C}_1^{(d)}$ is computed iteratively:
$$\tilde{C}_1^{(1)} = \theta \bigl(B(x_{i1} - x_{11})\bigr)_{i=1}^n, \quad C_1^{(1)} = \mathbf{1} + \tilde{C}_1^{(1)},$$
$$\forall\, 1 < k \le d: \quad \tilde{C}_1^{(k)} = \theta\, C_1^{(k-1)} \odot \bigl(B(x_{ik} - x_{1k})\bigr)_{i=1}^n + \tilde{C}_1^{(k-1)}, \quad C_1^{(k)} = \mathbf{1} + \tilde{C}_1^{(k)},$$
where $\odot$ is elementwise multiplication. Then
$$1 - \frac{n}{\lambda_1} = 1 - \frac{n}{n + \tilde{\lambda}_1} = \frac{\tilde{\lambda}_1}{n + \tilde{\lambda}_1},$$
which avoids subtracting nearly equal quantities.
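A sketch of this dimension-by-dimension recursion, with a scalar $\theta$ as on the slide and $B(u)$ standing for the scaled Bernoulli factor (for $r = 2$, one consistent choice is $B(u) = 2\pi^2(u^2 - u + \tfrac{1}{6})$ with $\theta$ playing the role of $\theta^2$; both readings are assumptions of this sketch):

```python
import numpy as np

def centered_kernel_col(x, theta, B):
    """Sketch: build Ctilde_1 = C_1 - 1 dimension by dimension, so that
    lambda_tilde = fft(Ctilde_1) yields lambda_tilde_1 = lambda_1 - n directly,
    avoiding the cancellation in 1 - n/lambda_1."""
    n, d = x.shape
    Ct = theta * B(np.abs(x[:, 0] - x[0, 0]))            # Ctilde_1^(1)
    for k in range(1, d):
        # Ctilde^(k) = theta * C^(k-1) (elementwise*) B_k + Ctilde^(k-1),
        # with C^(k-1) = 1 + Ctilde^(k-1)
        Ct = theta * (1.0 + Ct) * B(np.abs(x[:, k] - x[0, k])) + Ct
    return Ct
```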
Computation time with the Fourier kernel

Figure: MVN, d = 2, r = 2, C1sin periodization.

Computation time increases linearly, so we can use more than 4000 points in the cubature, up to $2^{23}$ points on a computer with 16 GB RAM.
Error bound vs. actual error: automatic cubature

Figure: left: option pricing; right: MVN with various error bounds.

The figure shows that the automatic Bayesian cubature algorithm meets the error bound err_n.
- The option pricing example uses only the Baker transform, with Bernoulli order r = 2, 4.
- The MVN example uses the Baker, C1sin, and C2sin transforms, Bernoulli order r = 2, 4, and dimension d = 2, 3.
Error vs. time: automatic cubature

Figure: left: option pricing, ε = 10⁻²; right: MVN, ε = 10⁻⁴.

The figure shows that the automatic Bayesian cubature algorithm meets the specified ε in less than a few seconds.
- The option pricing example uses only the Baker transform, with Bernoulli order r = 2, 4.
- The MVN example uses the Baker, C1sin, and C2sin transforms, Bernoulli order r = 2, 4, and dimension d = 2, 3.
Multivariate normal probability

(Figure: results for the multivariate normal probability example.)
Summary

- Developed a general technique for fast transform kernels
- Developed a fast automatic Bayesian cubature with O(n log n) complexity
- Combines the advantages of a kernel method with the low computational cost of quasi-Monte Carlo
- Scalable based on the complexity of the integrand, i.e., the kernel order and lattice points can be chosen to suit the smoothness of the integrand
- Conditioning problems arise if the kernel C is very smooth

Source code: https://github.com/GailGithub/GAIL_Dev/tree/feature/BayesianCubature

More about Guaranteed Automatic Integration Algorithms (GAIL): http://gailgithub.github.io/GAIL_Dev/
Future work

- Choosing the kernel order r and the periodization transform automatically
- A deterministic interpretation of Bayesian cubature
- Broadening the choice of numerical examples
- Better handling of conditioning problems and numerical errors
- Sobol' point sets and the fast Walsh transform with smooth kernels ...
Future work: Sobol' points and the fast Walsh transform

- Using the generalized theory established for fast transform kernels, we could use other kernels with suitable point sets to achieve similar or better performance and accuracy.
- One such point set to consider in the future is Sobol' points, which, with an appropriate choice of smooth kernel, should lead to a fast Walsh transform.
Future work: more applications

Control variates: we would like to approximate a function of the form $f - \beta_1 g_1 - \cdots - \beta_p g_p$; then
$$f \sim \mathcal{N}\bigl(\beta_0 + \beta_1 g_1 + \cdots + \beta_p g_p,\; s^2 C\bigr).$$

Function approximation: consider
$$\int_{[0,1]^d} \underbrace{f(\phi(t))\, \frac{\partial\phi}{\partial t}}_{g(t)}\, dt, \quad \text{where } \frac{\partial\phi}{\partial t} \text{ is the Jacobian.}$$
Then, with $\phi(\psi(x)) = x$,
$$g(\psi(x)) = f\bigl(\underbrace{\phi(\psi(x))}_{x}\bigr)\, \frac{\partial\phi}{\partial t}(\psi(x)), \quad \text{so} \quad f(x) = g(\psi(x))\, \frac{1}{\frac{\partial\phi}{\partial t}(\psi(x))}.$$
Finally, the function approximation is
$$\tilde{f}(x) = \tilde{g}(\psi(x)), \quad \text{where } \tilde{g} = \sum_i w_i\, C(\cdot, x_i) \text{ is a kernel interpolant.}$$
Full Bayes parameter estimation

Consider $m, s^2$ as hyperparameters with prior $\rho_{m,s^2}(\xi, \lambda) \propto \frac{1}{\lambda}$. Then $\mu \mid (f = y)$ has a Student's t distribution with
$$\hat{\mu}_{\text{full}} \mid (f = y) = \frac{(1 - c^T C^{-1} \mathbf{1})\, \mathbf{1}^T C^{-1} y}{\mathbf{1}^T C^{-1} \mathbf{1}} + c^T C^{-1} y,$$
$$\hat{\sigma}^2_{\text{full}} \mid (f = y) = \frac{1}{n-1} \Bigl[\frac{(1 - c^T C^{-1} \mathbf{1})^2}{\mathbf{1}^T C^{-1} \mathbf{1}} + (c_0 - c^T C^{-1} c)\Bigr] \cdot y^T \Bigl(C^{-1} - \frac{C^{-1} \mathbf{1} \mathbf{1}^T C^{-1}}{\mathbf{1}^T C^{-1} \mathbf{1}}\Bigr) y.$$

So the stopping criterion for the full Bayes case is
$$n = \min\Bigl\{ n_j \in \mathbb{Z}_{>0} : \frac{t^2_{n_j - 1,\, 0.995}}{n_j - 1} \Bigl[\frac{(1 - c^T C^{-1} \mathbf{1})^2}{\mathbf{1}^T C^{-1} \mathbf{1}} + (c_0 - c^T C^{-1} c)\Bigr] \cdot y^T \Bigl(C^{-1} - \frac{C^{-1} \mathbf{1} \mathbf{1}^T C^{-1}}{\mathbf{1}^T C^{-1} \mathbf{1}}\Bigr) y \le \varepsilon^2 \Bigr\}.$$
Asserting the Gaussian process assumption

How can we check whether $m$, $s$, $\theta$, and $C$ are chosen well, so that $f$ looks like a draw from the Gaussian process?
$$f = (f(x_i))_{i=1}^n \sim \mathcal{N}(m\mathbf{1}, s^2 C), \quad \text{where } C = \frac{1}{n} V \Lambda V^H, \quad V^H V = nI.$$
Let $f' = \frac{1}{\sqrt{n}}\, \Lambda^{-1/2} V^H f$. Then
$$\mathbb{E}[f'] = \frac{1}{\sqrt{n}}\, \Lambda^{-1/2} V^H\, \mathbb{E}[f] = m \sqrt{\frac{n}{\lambda_1}}\, (1, 0, \ldots, 0)^T,$$
$$\operatorname{COV}[f'] = \frac{1}{n}\, \mathbb{E}\bigl[\Lambda^{-1/2} V^H (f - m\mathbf{1})(f - m\mathbf{1})^T V \Lambda^{-1/2}\bigr] = \frac{1}{n}\, \Lambda^{-1/2} V^H \Bigl(s^2\, \frac{1}{n} V \Lambda V^H\Bigr) V \Lambda^{-1/2} = s^2 I.$$
Thus $f' \sim \mathcal{N}(m' e_1, s^2 I)$, where $m' = m\sqrt{n/\lambda_1}$. If we can verify that the sample distribution of $f'$ is $\mathcal{N}(m' e_1, s^2 I)$, it supports our assumption.
Normal plots

Figure: normal plots; left: Keister, right: MVN.

The plots show that $f'$ is approximately Gaussian. Both examples use the C1sin transform, Bernoulli order r = 2, and dimension d = 2.
Periodization transforms

Baker's: $\tilde{f}(t) = f\bigl((1 - 2|t_j - \tfrac{1}{2}|)_{j=1}^d\bigr)$

$C^0$: $\tilde{f}(t) = f(\bar{g}_0(t)) \prod_{j=1}^d g'_0(t_j)$, $\quad g_0(t) = 3t^2 - 2t^3$, $\;\; g'_0(t) = 6t(1-t)$

$C^1$: $\tilde{f}(t) = f(\bar{g}_1(t)) \prod_{j=1}^d g'_1(t_j)$, $\quad g_1(t) = t^3(10 - 15t + 6t^2)$, $\;\; g'_1(t) = 30t^2(1-t)^2$

Sidi's $C^1$: $\tilde{f}(t) = f(\bar{\psi}_2(t)) \prod_{j=1}^d \psi'_2(t_j)$, $\quad \psi_2(t) = t - \frac{\sin(2\pi t)}{2\pi}$, $\;\; \psi'_2(t) = 1 - \cos(2\pi t)$

Sidi's $C^2$: $\tilde{f}(t) = f(\bar{\psi}_3(t)) \prod_{j=1}^d \psi'_3(t_j)$, $\quad \psi_3(t) = \frac{1}{16}\bigl(8 - 9\cos(\pi t) + \cos(3\pi t)\bigr)$, $\;\; \psi'_3(t) = \frac{1}{16}\bigl(9\pi\sin(\pi t) - 3\pi\sin(3\pi t)\bigr)$

(Bars denote componentwise application.)
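A sketch of two of these transforms, assuming the experiments' "C1sin" label refers to Sidi's $C^1$ transform (an assumption of this sketch):

```python
import numpy as np

def baker(t):
    """Baker's (tent) transform, applied componentwise to t in [0,1]^d."""
    return 1.0 - 2.0 * np.abs(t - 0.5)

def periodize_c1sin(f, t):
    """Sidi's C^1 transform: evaluate f(psi_2(t)) * prod_j psi_2'(t_j)."""
    psi = t - np.sin(2.0 * np.pi * t) / (2.0 * np.pi)
    dpsi = 1.0 - np.cos(2.0 * np.pi * t)
    return f(psi) * np.prod(dpsi, axis=-1)
```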
Keister integral

(Figure: results for the Keister integral example.)
References

Diaconis, P. 1988. Bayesian numerical analysis. Statistical Decision Theory and Related Topics IV, papers from the 4th Purdue Symposium, West Lafayette, Indiana, 1986, pp. 163–175.

Dick, J. and F. Pillichshammer. 2010. Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge.

Genz, A. 1993. Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics 25, 400–405.

Glasserman, P. 2004. Monte Carlo Methods in Financial Engineering. Applications of Mathematics, vol. 53, Springer-Verlag, New York.

Hickernell, F. J. and H. Niederreiter. 2003. The existence of good extensible rank-1 lattices. J. Complexity 19, 286–300.

Keister, B. D. 1996. Multidimensional quadrature algorithms. Computers in Physics 10, 119–122.

O'Hagan, A. 1991. Bayes-Hermite quadrature. J. Statist. Plann. Inference 29, 245–260.

Olver, F. W. J., D. W. Lozier, R. F. Boisvert, C. W. Clark, and A. B. O. Dalhuis. 2013. Digital Library of Mathematical Functions.

Rasmussen, C. E. 2003. Bayesian Monte Carlo. Advances in Neural Information Processing Systems, pp. 489–496.

Ritter, K. 2000. Average-Case Analysis of Numerical Problems. Lecture Notes in Mathematics, vol. 1733, Springer-Verlag, Berlin.