SAMSI Fall, 2018
Outline
• Motivation for emulation of simulators
• Design for simulator runs
• Some emulation strategies
• Gaussian process (GaSP) emulators
• Fitting GaSP emulators and the Robust R-package
• The differences between use of Gaussian processes in Spatial Statistics
and UQ
• Emulation in more complicated situations
– Functional emulators via Kronecker products
– Functional emulators via basis decomposition
– Functional emulators via parallel partial emulation
– Coupling emulators
Crucial assumption: I will be assuming the “black box” scenario, where all
we can do is run the simulator at various inputs; i.e., we do not have access
to the internal code.
Motivations for emulation (approximation)
of complex computer models
Simulators (complex computer models of processes) often take hours to
weeks for a single run. One often needs fast simulator approximations
(emulators, surrogate models, meta models, response surface
approximations, . . .) for Uncertainty Quantification analyses such as
• prediction of the simulator at unobserved values of the inputs
• optimization of the simulator over input values
• inverse problems (learning unknown simulator parameters from data)
• propagating uncertainty in inputs through the simulator
• data assimilation (predicting reality with a combination of simulator
output and observational data)
• assessing simulator bias and detecting suspect simulator components
• interfacing systems of simulators (or systems of simulators/stat models)
Statistical Design of Runs of the Simulator
(needed to create the emulator) McKay et al. (1979); Sacks et al. (1989); Welch
et al. (1992); Bates et al. (1996); Lim et al. (2002); Santner et al. (2003)
Notation: For these lectures, x ∈ X will denote the d-dimensional vector of
inputs to the simulator (computer model). These could be initial
conditions, control inputs, computer model parameters, ...
The simulator output is denoted by y^M(x).
Goal for design: Choose m points D = {x_1, . . . , x_m} at which the simulator
is to be evaluated, yielding y^D = (y^M(x_1), . . . , y^M(x_m))′. From these, the
emulator (approximation) to the simulator will be constructed.
– Folklore says that m should be at least 10d although many more runs
are often needed (but often not available).
Criterion: In general, should be problem specific. General purpose criteria
involve finding “space-filling” designs.
Most common space filling design: Maximin Latin Hypercube Design
• A Latin Hypercube Design (LHD) is a design in a grid whereby each sample
is the only one in each axis-aligned hyperplane containing it.
• A maximin LHD is an LHD that maximizes min_{i≠j} δ(x_i, x_j), where δ(·, ·) is
a distance on X.
Figure 1: Left: 47-point maximin LHD, d = 6, shown in a 2-d projection;
Right: 47-point “0-correlation” LHD, same 2-d projection.
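As a minimal numerical sketch of the maximin criterion (the function names are illustrative, and random search is only a crude stand-in for the optimization actually used to build such designs): generate many random LHDs on [0, 1]^d and keep the one with the largest minimum pairwise distance.

```python
import numpy as np

def latin_hypercube(m, d, rng):
    """One random LHD: each column is a random permutation of the m strata,
    jittered within each stratum, then scaled to [0, 1]."""
    perms = rng.permuted(np.tile(np.arange(m), (d, 1)), axis=1).T
    return (perms + rng.uniform(size=(m, d))) / m

def min_pairwise_distance(X):
    """The maximin criterion: min over i != j of ||x_i - x_j||."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    return dist.min()

def approx_maximin_lhd(m, d, n_tries=200, seed=0):
    """Keep the best of n_tries random LHDs (a crude maximin search)."""
    rng = np.random.default_rng(seed)
    best, best_crit = None, -np.inf
    for _ in range(n_tries):
        X = latin_hypercube(m, d, rng)
        crit = min_pairwise_distance(X)
        if crit > best_crit:
            best, best_crit = X, crit
    return best, best_crit
```

Each column of the result hits every one of the m axis-aligned strata exactly once, which is the defining LHD property.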
TITAN2D simulating pyroclastic flow on Montserrat (Bayarri et al. (2009);
Dalbey et al. (2012))
Inputs: Initial conditions: x1 = flow volume (V ); x2 = flow direction (ϕ);
Model parameters: x3 = basal friction (δbed); x4 = internal friction (δint).
Background of the application: The simulator, TITAN2D, for given inputs
V , ϕ, δbed, and δint, yields a description of the pyroclastic flow over a large
space-time grid of points. Each run of TITAN2D takes two hours.
Of primary interest is the maximum (over time) flow height,
y^M(V, ϕ, δ_bed, δ_int), at k spatial locations over the island.
• Flow heights y^M(V, ϕ, δ_bed, δ_int) > 1 m are deemed to be catastrophic.
The analysis begins
• by choosing a Latin hypercube design to select m = 2048 design points in the
feasible input region X = [10^5 m^3, 10^{9.5} m^3] × [0, 2π] × [5, 18] × [15, 35];
• running TITAN2D at these preliminary points, yielding the ‘data’

  y^D = (y^M(x_1), . . . , y^M(x_m))′, where y^M(x_i) = (y^M_1(x_i), y^M_2(x_i), . . . , y^M_k(x_i));

• constructing the emulator from y^D (in general, a matrix of size 2048 × 10^9).
Adaptive design: (Aslett et al. (1998); Lam and Notz (2008); Ranjan et al.
(2008); Cumming and Goldstein (2009); Gramacy and Lee (2009); Loeppky et al.
(2010); Spiller et al. (2014)).
[Plot: standard error (meters) versus initiation angle (radians).]
Figure 2: Standard error of the emulator of a function of interest in the volcano
problem. Red: original Latin hypercube design of 256 points. Blue: standard error
with 9 additional points chosen to maximize the resulting information.
Some Emulation Strategies
Recall the goal: We want to develop an emulator (approximation) of the
computer model that allows us to predict the computer model outcome
y^M(x^∗) at new inputs x^∗. This can be done by
• Regression: fine if the simulator output is smooth enough and
– one knows good basis functions to regress upon,
– or can find good basis functions by, e.g., PCA (or POD or EOF).
∗ This often works, but is often suboptimal (e.g., for TITAN2D).
• Polynomial chaos: statisticians question its effectiveness as a general
tool for emulation (it fails for TITAN2D and many other processes).
• Gaussian stochastic processes (GaSP’s), of a variety of types, typically
what are called separable GaSP’s.
• Combinations of the above (e.g., GaSP’s on coefficients of basis
expansions (Bayarri et al. (2007); Bowman and Woods (2016))).
An Aside: the Multivariate Normal Distribution
If Y = (Y1, Y2, . . . , Ym) has a multivariate normal distribution with mean
µ = (µ1, µ2, . . . , µm) and m × m positive definite covariance matrix Σ
having entries σi,j (notation: Y ∼ MV N(µ, Σ)), then
• each Y_i is marginally normally distributed, with
– mean E[Y_i] = µ_i,
– variance E[(Y_i − µ_i)^2] = σ_{i,i};
• σ_{i,j} = E[(Y_i − µ_i)(Y_j − µ_j)] is called the covariance between Y_i and Y_j;
• c_{i,j} = σ_{i,j}/√(σ_{i,i} σ_{j,j}) is called the correlation between Y_i and Y_j.
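A small numerical check of these relations (the covariance matrix and mean are arbitrary illustrative choices):

```python
import numpy as np

# An illustrative 3x3 positive definite covariance matrix Sigma and mean mu.
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 9.0, 2.1],
                  [0.8, 2.1, 1.0]])
mu = np.array([1.0, -2.0, 0.5])

# Correlation: c_ij = sigma_ij / sqrt(sigma_ii * sigma_jj).
sd = np.sqrt(np.diag(Sigma))
C = Sigma / np.outer(sd, sd)

# Sampling Y ~ MVN(mu, Sigma): each Y_i should have marginal mean mu_i
# and marginal variance sigma_ii, which we can verify empirically.
rng = np.random.default_rng(0)
Y = rng.multivariate_normal(mu, Sigma, size=100_000)
```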
GaSP Emulators
Model the (real-valued, for now) simulator output y^M(x) as an unknown
function via a Gaussian stochastic process:

  y^M(·) ∼ GaSP( µ(·), σ² c(·, ·) ),

with mean function µ(x), variance σ², and correlation function c(·, ·): for
any inputs {x_1, . . . , x_m} from X,

  (y^M(x_1), . . . , y^M(x_m)) ∼ MVN( (µ(x_1), . . . , µ(x_m)), σ² C ),   (1)

where C is the correlation matrix with (i, j) element c_{i,j} = c(x_i, x_j).
• This is a random distribution on functions of x.
• All we really need to know is the induced MVN distribution in (1) of
the function evaluated at a finite set of points.
It is common to choose the following forms for the mean and correlation:
• Model the unknown mean function via regression, as

  µ(x) = Ψ(x)θ ≡ Σ_{i=1}^{l} ψ_i(x)θ_i,

with Ψ(·) = (ψ_1(·), . . . , ψ_l(·)) a vector of specified basis functions and
the θ_i unknown (e.g., µ(V, ϕ, δ_bed, δ_int) = θ_1 + θ_2 V for TITAN2D).
• As the correlation function arising from the d-dimensional x, utilize the
separable power exponential family

  c(x, x^∗) = Π_{j=1}^{d} exp{−(|x_j − x_j^∗|/γ_j)^{α_j}};

– γ_j > 0 determines how fast the correlation decays to 0;
– α_j ∈ (0, 2] determines continuity, differentiability, . . .
  ∗ We set α_j = 1.9 (α_j = 2 can have numerical problems).
– The product form greatly speeds computation and allows stochastic
inputs to be handled easily.
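The separable family above translates directly into code — a dimension-by-dimension product of one-dimensional correlations (the function name is ours):

```python
import numpy as np

def sep_pow_exp_corr(X, gamma, alpha):
    """Separable power exponential correlation matrix:
    C[i, j] = prod_k exp(-(|X[i,k] - X[j,k]| / gamma[k]) ** alpha[k])."""
    X = np.asarray(X, dtype=float)
    C = np.ones((len(X), len(X)))
    for k in range(X.shape[1]):
        dk = np.abs(X[:, k, None] - X[None, :, k])   # pairwise |x_ik - x_jk|
        C *= np.exp(-(dk / gamma[k]) ** alpha[k])    # product over dimensions
    return C
```

By construction the result is symmetric with a unit diagonal, and each entry factors across the input dimensions.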
Example: Suppose d = 2 and m = 2, so we have two design inputs
x_1 = (x_{11}, x_{12}) and x_2 = (x_{21}, x_{22}). Writing
ρ = exp{−(|x_{11} − x_{21}|/γ_1)^{α_1}} · exp{−(|x_{12} − x_{22}|/γ_2)^{α_2}},
the correlation matrix for (y^M(x_1), y^M(x_2)) is then

  C = ( 1  ρ )
      ( ρ  1 ).

• As, say, γ_1 → ∞,

  C → ( 1   ρ_2 )
      ( ρ_2  1 ),  with ρ_2 = exp{−(|x_{12} − x_{22}|/γ_2)^{α_2}}.

Typically the emulator will then be constant in the first coordinate.
• If any γ_i → 0,

  C → ( 1  0 )
      ( 0  1 ),

which gives a terrible emulator.
• After obtaining the computer runs y^D at the design points, the
traditional strategy was to estimate the GaSP parameters by maximum
likelihood using this data, and then use the standard Kriging formulas
for the emulator predictive mean and variance.
– Maximum likelihood (least squares fit) is not a good idea here; too
often the range parameters end up being 0 or ∞.
Figure 3: GaSP mean and 90% confidence bands fit by maximum likelihood to a
damped sine wave (the red curve) for m=10 (left) and m=9 (right) points.
Advantages of the GaSP Emulator
• It is an interpolator of the simulator values y^M(x_i) at the design inputs x_i.
• It provides an assessment of the accuracy of the approximation, which
is quite reliable (in a conservative sense) when it is not crazy.
• The separable form properly allows very different fits to the various
inputs.
• The analysis stays within probability (Bayesian) calculus.
Disadvantages of the GaSP Emulator
• Maximum likelihood is very unreliable (fixable, as we will see).
• It requires inversion of an m × m matrix, requiring special techniques if
m is large (lots of research on this).
• It is a stationary process and, hence, not always suitable as an
emulator.
Improving on maximum likelihood (least squares)
estimation of the unknown GaSP parameters (Lopes (2011);
Ranjan and Karsten (2011); Roustant et al. (2012); Gu et al. (2016, 2018)).
Step 1. Deal with the crucial parameters (θ, σ²) via a fully Bayesian
analysis, using the objective prior π(θ, σ²) = 1/σ².
• Alas, dealing with the correlation parameters by fully Bayesian
methods is computationally intractable.
Step 2. Estimate γ = (γ_1, . . . , γ_d) as the mode γ̂ of its marginal posterior
distribution, obtained
• by integrating out θ and σ² with respect to their objective prior;
• multiplying the resulting integrated likelihood by the reference prior
for γ (the most popular objective Bayesian prior).
Step 3. The resulting GaSP emulator of y^M(x^∗) at a new input x^∗ is a
t-process, with mean function and covariance function in closed form.
The GaSP emulator ỹ^M(x^∗) of y^M(x^∗) at a new input x^∗ is a t-distribution

  ỹ^M(x^∗) ∼ T( µ̂(x^∗), σ̂² V(x^∗), m − l ),

where

  µ̂(x^∗) = Ψ(x^∗)θ̂ + C(x^∗) C^{−1} (y^D − Ψθ̂),
  σ̂² = (1/(m − l)) [ y^{D′} C^{−1} y^D − θ̂′ (Ψ′ C^{−1} Ψ) θ̂ ],
  V(x^∗) = 1 − C(x^∗) C^{−1} C(x^∗)′
           + (Ψ(x^∗) − C(x^∗) C^{−1} Ψ) (Ψ′ C^{−1} Ψ)^{−1} (Ψ(x^∗) − C(x^∗) C^{−1} Ψ)′,

with θ̂ = (Ψ′ C^{−1} Ψ)^{−1} Ψ′ C^{−1} y^D, Ψ = (ψ_j(x_i)), Ψ(x^∗) = (ψ_1(x^∗), . . . , ψ_l(x^∗)),
and C(x^∗) = (c(x_1, x^∗), . . . , c(x_m, x^∗)).

• Not the usual Kriging formula, because of the use of the posterior for (θ, σ²).
• This is an interpolator of the simulator values y^M(x_i) at the design inputs x_i.
• It provides an assessment of the accuracy of the approximation, also
incorporating the uncertainty arising from estimating θ and σ².
• The only potential computational challenge is computing C^{−1} if m is very
large.
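The closed-form expressions above can be transcribed almost line for line (the one-dimensional test problem, correlation parameters, and constant-mean basis below are illustrative choices, not the slides'):

```python
import numpy as np

def gasp_predict(x_star, X, yD, corr, basis):
    """Posterior-predictive t parameters at x_star: mean mu_hat,
    scale sigma2_hat * V(x_star), and degrees of freedom m - l."""
    m = len(X)
    C = np.array([[corr(xi, xj) for xj in X] for xi in X])
    Psi = np.array([basis(xi) for xi in X])            # m x l basis matrix
    l = Psi.shape[1]
    Ci = np.linalg.inv(C)
    psi_s = basis(x_star)                              # Psi(x*)
    c_s = np.array([corr(xi, x_star) for xi in X])     # C(x*)
    A = Psi.T @ Ci @ Psi
    theta_hat = np.linalg.solve(A, Psi.T @ Ci @ yD)    # GLS estimate of theta
    mu_hat = psi_s @ theta_hat + c_s @ Ci @ (yD - Psi @ theta_hat)
    sigma2_hat = (yD @ Ci @ yD - theta_hat @ A @ theta_hat) / (m - l)
    h = psi_s - c_s @ Ci @ Psi
    V = 1.0 - c_s @ Ci @ c_s + h @ np.linalg.solve(A, h)
    return mu_hat, sigma2_hat * V, m - l

# Illustrative 1-d setup: power exponential correlation, constant-mean basis.
corr = lambda a, b: np.exp(-(abs(a - b) / 0.3) ** 1.9)
basis = lambda x: np.array([1.0])
X = np.linspace(0.0, 1.0, 8)
yD = np.sin(2 * np.pi * X)
```

At a design point the predictive mean reproduces the simulator value and the predictive variance collapses to zero, which is the interpolation property claimed above.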
Figure 4: Mean of the emulator of TITAN2D, predicting ‘maximum flow
height’ at a location, as a function of flow volume and angle, for fixed δ_bed =
15 and δ_int = 27. Left: Plymouth; Right: Bramble Airport. Black points:
max-height simulator outputs at design points.
Details of Steps 1 and 3
• Note that

  (y^M(x_1), . . . , y^M(x_m), y^M(x^∗)) ∼ MVN( (µ(x_1), . . . , µ(x_m), µ(x^∗)), σ² C^∗ ),

where

  C^∗ = ( C        C(x^∗)′ )
        ( C(x^∗)   1       ).

• Multiplying this by π(θ, σ²) = 1/σ² gives the joint density of
(y^M(x_1), . . . , y^M(x_m), y^M(x^∗), θ, σ²).
• Compute the conditional density of (y^M(x^∗), θ, σ²) given
(y^M(x_1), . . . , y^M(x_m)).
• Integrate out θ and σ² to obtain the posterior predictive density
ỹ^M(x^∗) of the target y^M(x^∗).
Details of Versions of Step 2
Version 1. Finding the Marginal Maximum Likelihood Estimate
(MMLE) of the correlation parameters γ = (γ_1, . . . , γ_d):
• Starting with the likelihood L(θ, σ², γ) arising from

  (y^M(x_1), . . . , y^M(x_m)) ∼ MVN( (µ(x_1), . . . , µ(x_m)), σ² C ),

integrate out θ and σ², using the objective prior π(θ, σ²) = 1/σ²,
obtaining the marginal likelihood for γ

  L(γ) = ∫∫ L(θ, σ², γ) (1/σ²) dθ dσ²
       ∝ |C(γ)|^{−1/2} |X′ C(γ)^{−1} X|^{−1/2} (S²(γ))^{−(n−p)/2},

where S²(γ) = (Y − Xθ̂)′ C(γ)^{−1} (Y − Xθ̂) is the residual sum of squares
and θ̂ = (X′ C(γ)^{−1} X)^{−1} X′ C(γ)^{−1} Y is the least squares estimator of θ.
• The MMLE is the value of γ that maximizes L(γ).
Definition 0.1 (An aside: robust estimation.) Estimation of the
correlation parameters in the GaSP is called robust if the following two
situations do NOT happen (even approximately), where Ĉ is the estimated
correlation matrix:
(i) Ĉ = 1_n 1_n′;
(ii) Ĉ = I_n.
When Ĉ ≈ 1_n 1_n′, the correlation matrix is almost singular, leading to very
large computational errors in the GaSP predictive mean.
When Ĉ ≈ I_n, the GaSP predictive mean degenerates to the fitted
mean plus impulse functions, as shown in the next figures.
Example of the problem when Ĉ ≈ I_n: emulation of the function
y = 3x sin(5πx) + cos(7πx), graphed as the black solid curves (overlapping the
green curves in the left panel). The n = 12 input function values are the black
circles. The left panel is for α = 1.9 and the right panel for α = 1, for the power
exponential correlation function.
• The blue curves give the emulator mean from the MLE approach;
• the red curves (overlapping with green on left) give the emulator mean from
the MMLE approach;
• the green curves give the emulator mean from the posterior mode approach.
Here are three common ways of parameterizing the range parameters in the
power exponential correlation function:

  c_{β_l}(|x_{il} − x_{jl}|) = exp{−β_l |x_{il} − x_{jl}|^{α_l}},
  c_{γ_l}(|x_{il} − x_{jl}|) = exp{−(|x_{il} − x_{jl}|/γ_l)^{α_l}},
  c_{ξ_l}(|x_{il} − x_{jl}|) = exp{−exp(ξ_l) |x_{il} − x_{jl}|^{α_l}},

for l = 1, · · · , d.
Lemma 0.1 Robustness is lacking in either of the following two cases.
Case 1. If β̂_l = 0 for all 1 ≤ l ≤ d (equivalently γ̂_l = ∞ or ξ̂_l = −∞ in the
other parameterizations), then Ĉ = 1_m 1_m′.
Case 2. If any β̂_l = ∞ (equivalently γ̂_l = 0 or ξ̂_l = ∞), then Ĉ = I_m.
Version 2. Finding the Reference Posterior Mode (RPM) of the
correlation parameters:
The standard objective prior (the reference prior) for β (Paulo (2005)) is
π^R(β) ∝ |I^⋆(β)|^{1/2}, where, with l being the dimension of θ and d the
dimension of β,

  I^⋆(β) = ( (m − l)   tr W_1     tr W_2       · · ·   tr W_d
                       tr W_1²    tr W_1 W_2   · · ·   tr W_1 W_d
                                  ...                  ...
                                                       tr W_d² )

(a symmetric matrix; only the upper triangle is shown) and

  W_k = (∂C/∂β_k) C(β)^{−1} [ I_n − X (X′ C(β)^{−1} X)^{−1} X′ C(β)^{−1} ].

The posterior mode is then found by maximizing
• L(ψ^{−1}(β)) π^R(β) in the β parameterization, where β = ψ(γ);
• L(ψ^{−1}(exp(ξ))) π^R(exp(ξ)) exp(Σ_l ξ_l) in the ξ parameterization;
• L(γ) π^R(ψ(γ)) ψ′(γ) in the γ parameterization.
                      1_n a column of X                                  1_n not a column of X
                      some β_l → ∞        β_l → 0 for all l              some β_l → ∞        β_l → 0 for all l
Profile Lik           O(1)                O(γ_(1)^{−α/2})                O(1)                O(γ_(1)^{−α/2})
Marginal Lik          O(1)                O(1)                           O(1)                O(γ_(1)^{−α/2})
Post β, p = 1         O(e^{−βC})          O(1)                           O(β^{1/2} e^{−βC})  O(β^{−1/2})
        p ≥ 2         O(Π_{l∈E} e^{−β_l C_l})   O(β_(p)^{−(p−1)})       O((Π_{l∈E} β_l)^{1/2} Π_{l=1}^{p} e^{−β_l C_l})   O(β_(p)^{−(p−1/2)})
Post γ, p = 1         O(e^{−C/γ^α}/γ^{α+1})    O(γ^{−α−1})              O(e^{−C/γ^α}/γ^{α/2+1})   O(γ^{−α/2−1})
        p ≥ 2         O(Π_{l∈E} e^{−C_l/γ_l^α}/γ_l^{α+1})   O(Π_{l=1}^{p} γ_l^{−α−1} · γ_(1)^{(1−p)α})   O(Π_{l∈E} e^{−C_l/γ_l^α}/γ_l^{α/2+1})   O(Π_{l=1}^{p} γ_l^{−α−1} · γ_(1)^{(1/2−p)α})
Post ξ, p = 1         O(e^{−e^{ξ}C + ξ})  O(e^{ξ})                      O(e^{−e^{ξ}C + (3/2)ξ})   O(e^{ξ/2})
        p ≥ 2         O(Π_{l∈E} e^{−e^{ξ_l}C_l + ξ_l})   O(e^{Σ_{l=1}^{p−1} ξ_l} / e^{(p−2)ξ_(p)})   O(Π_{l∈E} e^{−e^{ξ_l}C_l + (3/2)ξ_l})   O(e^{Σ_{l=1}^{p−1} ξ_l} / e^{(p−1/2)ξ_(p)})

Tail behaviors of the profile likelihood (inserting the MLE’s of θ and σ²), the
marginal likelihood, and the posterior distributions for different parameterizations
of the power exponential correlation function, using the reference prior.
• Blue gives the cases where the tail behavior is constant, so that there is
danger of non-robustness (the MLE could be at ∞).
• Red gives the non-robust cases where the posterior goes to infinity in the tail.
• Thus, use the posterior mode in either the γ or ξ parameterization.
Another Improvement
One of the most frequently used Matérn correlation functions is

  c_l(d_l) = ( 1 + √5 d_l/γ_l + 5 d_l²/(3γ_l²) ) exp( −√5 d_l/γ_l ),

where d_l stands for any of the |x_{il} − x_{jl}|. Denoting d̃_l = d_l/γ_l, the
following properties can be established.
• When d̃_l → 0, c_l(d̃_l) ≈ 1 − C d̃_l², with C > 0 being a constant. This thus
behaves similarly to exp(−d̃_l²) ≈ 1 − d̃_l², which corresponds to the power
exponential correlation with α_l = 2, and thus has similar smoothness near
design points.
• When d̃_l → ∞, the dominant part of c_l(d̃_l) is exp(−√5 d̃_l), which matches
the power exponential correlation with α_l = 1. Thus the Matérn correlation
prevents the correlation from decreasing as quickly with distance as the
Gaussian correlation does. This can be of benefit in emulation, since some
inputs may have almost no effect on the computer model, corresponding to
near-constant correlations even for distant inputs.
We test the following five functions:
i. 1-dimensional Higdon function:
Y = sin(2πX/10) + 0.2 sin(2πX/2.5), where X ∈ [0, 10].
ii. 2-dimensional Lim function:
Y = (1/6)[(30 + 5X_1 sin(5X_1))(4 + exp(−5X_2)) − 100] + ǫ, where
X_i ∈ [0, 1], for i = 1, 2.
iii. 3-dimensional Pepelyshev function:
Y = 4(X_1 − 2 + 8X_2 − 8X_2²)² + (3 − 4X_2)² + 16√(X_3 + 1)(2X_3 − 1)²,
where X_i ∈ [0, 1], for i = 1, 2, 3.
iv. 4-dimensional Park function:
Y = (2/3) exp(X_1 + X_2) − X_4 sin(X_3) + X_3, where X_i ∈ [0, 1), for
i = 1, 2, 3, 4.
v. 5-dimensional Friedman function:
Y = 10 sin(πX_1X_2) + 20(X_3 − 0.5)² + 10X_4 + 5X_5, where X_i ∈ [0, 1],
for i = 1, 2, 3, 4, 5.
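The five test functions transcribe directly (the ǫ noise term of the Lim function is exposed as an optional argument; function names are ours):

```python
import numpy as np

def higdon(x):                     # 1-d, x in [0, 10]
    return np.sin(2 * np.pi * x / 10) + 0.2 * np.sin(2 * np.pi * x / 2.5)

def lim(x1, x2, eps=0.0):          # 2-d, x_i in [0, 1]; eps = optional noise
    return ((30 + 5 * x1 * np.sin(5 * x1)) * (4 + np.exp(-5 * x2)) - 100) / 6 + eps

def pepelyshev(x1, x2, x3):        # 3-d, x_i in [0, 1]
    return (4 * (x1 - 2 + 8 * x2 - 8 * x2 ** 2) ** 2 + (3 - 4 * x2) ** 2
            + 16 * np.sqrt(x3 + 1) * (2 * x3 - 1) ** 2)

def park(x1, x2, x3, x4):          # 4-d, x_i in [0, 1)
    return (2 / 3) * np.exp(x1 + x2) - x4 * np.sin(x3) + x3

def friedman(x1, x2, x3, x4, x5):  # 5-d, x_i in [0, 1]
    return (10 * np.sin(np.pi * x1 * x2) + 20 * (x3 - 0.5) ** 2
            + 10 * x4 + 5 * x5)
```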
Robust GaSP ξ Robust GaSP γ MLE DiceKriging
1-dim Higdon .00011 .00012 .00013 .00013
2-dim Lim .0064 .0080 .021 .0083
3-dim Pepelyshev .083 .15 3.5 .79
4-dim Park .00011 .00011 .033 .00063
5-dim Friedman .026 .038 4.7 .44
Table 1: Average MSE of the four estimation procedures for the five
experimental functions. The sample size is n = 20 for the Higdon function and
n = 10p for the others. Designs are generated by maximin LHD.
This suggests that it is optimal to use the posterior mode in the ξ
parameterization, with the Matérn correlation function.
The Jointly Robust Prior
Evaluation of the reference prior (especially computation of the needed
derivatives), and hence determination of the posterior mode, can be
somewhat costly if p is large.
An approximation to the reference prior that has the same tail behavior in
terms of robustness is

  π^{JR}(β) = ( Σ_{l=1}^{p} C_l β_l )^{0.2} exp( −b Σ_{l=1}^{p} C_l β_l ),

where b = n^{−1/p}(a + p) and C_l equals the mean of |x^D_{il} − x^D_{jl}|, for
1 ≤ i, j ≤ n, i ≠ j.
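A minimal sketch of evaluating this prior (assumptions: we take a = 0.2, matching the exponent shown above since the slide does not define a separately, and work on the log scale for numerical stability; function names are ours):

```python
import numpy as np

def mean_abs_distance(Xd):
    """C_l = mean of |x_il - x_jl| over i != j, one value per input dimension."""
    n, p = Xd.shape
    Cl = np.empty(p)
    for l in range(p):
        d = np.abs(Xd[:, l, None] - Xd[None, :, l])
        Cl[l] = d[~np.eye(n, dtype=bool)].mean()
    return Cl

def jointly_robust_log_prior(beta, Cl, n, a=0.2):
    """log pi_JR(beta) = a*log(sum_l C_l beta_l) - b * sum_l C_l beta_l,
    with b = n**(-1/p) * (a + p)."""
    beta, Cl = np.asarray(beta, float), np.asarray(Cl, float)
    p = len(beta)
    b = n ** (-1.0 / p) * (a + p)
    s = float(Cl @ beta)
    return a * np.log(s) - b * s
```

The exponential factor dominates for large β, so the log-prior decreases toward −∞ in that tail — the robustness-preserving behavior the slide describes.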
Software
All the above methodology has been implemented in
RobustGaSP
Robust Gaussian Stochastic Process Emulation in R
by Mengyang Gu and Jesus Palomo
• There are many choices of the correlation function (Matérn is the
default).
• Choice between the reference prior and its jointly robust approximation
(the approximation is the default).
• Inclusion of a nugget is possible, with the resulting prior changed
appropriately.
• There is also the capability of identifying and removing ‘inert’ inputs.
Differences between use of Gaussian processes in
Spatial Statistics and UQ
• Spatial statistics typically has only d = 2 or d = 3.
• Typically there are many fewer ‘observations’ in UQ, exacerbated by
the larger d.
– This makes estimation of the range parameters γ much more
difficult than in spatial statistics.
– But, luckily, the UQ design points are typically spread out, making
the α correlation parameters less important, compared with their
importance in spatial statistics.
• Spatial processes are often (though not always) smoother than UQ
processes.
• Instead of the product correlation structure, spatial statistics often uses
correlations such as

  c(x, y) = exp{ −|x − y|^α / γ }.
Emulation in more complicated situations
• Functional emulators via Kronecker products
• Functional emulators via basis decomposition
• Functional emulators via parallel partial emulation
• Coupling emulators
Functional emulators via Kronecker products
Example: A Vehicle Crash Model. Collision of a vehicle with a barrier
is implemented as a non-linear dynamic analysis code using a finite element
representation of the vehicle. The focus is on velocity changes of the
driver’s head, as in the following 30mph crash.
There are d = 2 inputs to the simulator: crash barrier type B and crash
vehicle velocity v.
Obvious approach – discretize: Sample the m output functions from
the simulator (arising from the design on input space) at a discrete number
n_T of time points, and use t as just another input to the emulator.
• Only feasible if the functions are fairly regular.
• There are now m × n_T inputs, so computing C^{−1} might be untenable.
– But Kronecker products come to the rescue.
Example: Suppose m = 2 and the original design inputs were x_1 and x_2. Also
suppose we discretize at t_1 and t_2. Then there are four modified inputs
{(x_1, t_1), (x_1, t_2), (x_2, t_1), (x_2, t_2)}. Writing u = e^{−|t_1−t_2|} and
v = e^{−|x_1−x_2|} (assuming the product exponential form and setting the
correlation parameters to 1 for simplicity), the correlation matrix is

  ( 1    u    v    vu )
  ( u    1    vu   v  )     ( 1  v )     ( 1  u )
  ( v    vu   1    u  )  =  ( v  1 )  ⊗  ( u  1 ),
  ( vu   v    u    1  )

where ⊗ denotes the Kronecker product of the two matrices.
In general, if C_D is the correlation matrix arising from the original
designed input to the GaSP, and C_T is the correlation matrix that arises
from the discretization of time, then using the combined input results in
the overall correlation matrix

  C = C_D ⊗ C_T.

The wonderful thing about Kronecker products is

  C^{−1} = (C_D ⊗ C_T)^{−1} = C_D^{−1} ⊗ C_T^{−1},
  |C| = |C_D ⊗ C_T| = |C_D|^{n_T} |C_T|^{m}.
Functional emulators via basis decomposition
Example: Consider a vehicle being driven over a road with two major
potholes.
– x = (x_1, . . . , x_7) is the vector of key vehicle characteristics;
– y^R(x; t) is the time-history curve of resulting forces.
A finite element PDE computer model of the vehicle being driven over the
road
– depends on x = (x_1, . . . , x_7) and unknown calibration parameters
u = (u_1, u_2);
– yields the time-history force curve y^M(x, u; t).
Parameter Type (label) Uncertainty
Damping 1 (force dissipation) Calibration (u1) 15%
Damping 2 (force dissipation) Calibration (u2) 15%
Bushing Stiffness (Voided) Unmeasured (x1) 15%
Bushing Stiffness (Non-Voided) Unmeasured (x2) 10%
Front rebound travel until Contact Unmeasured (x3) 5%
Front rebound bumper stiffness Unmeasured (x4) 8%
Sprung Mass Unmeasured (x5) 5%
Unsprung Mass Unmeasured (x6) 12%
Body Pitch Inertia Unmeasured (x7) 12%
Table 2: I/U Map: Vehicle characteristics (‘unmeasured’) and model cali-
bration inputs, and their prior uncertainty ranges.
Field Data: Seven runs of a given test vehicle over the same road
containing two potholes.
Denote the r-th field time-history curve by y^F_r(x^∗; t), r = 1, . . . , 7,
where x^∗ = (x^∗_1, . . . , x^∗_7) refers to the unknown vehicle characteristics of
the given test vehicle.
Model Data: The computer model of the vehicle was ‘run over the
potholes’ at 65 input values of z = (x, u) = (x_1, . . . , x_7, u_1, u_2); let
z_r = (x_r, u_r), r = 1, . . . , 65, denote the corresponding parameter
vectors, which were chosen by a Latin hypercube design over the input
uncertainty ranges.
Let y^M(z_r; t) denote the r-th computer model time-history curve,
r = 1, 2, . . . , 65.
Wavelet Representation of the Curves:
We view y^F_r(x^∗; t), y^M(z_r; t), and y^R(x^∗; t) as random functions, and must
ultimately perform a Bayesian analysis with these random functions.
To do this, we used a wavelet representation of the functions, as follows.
• Each curve was replaced with its values on a dyadic grid with
2^{12} = 4,096 points, so that the number of resolution levels associated
with the wavelet decomposition is L = 13 (counting the mean of each
curve as level 0).
• The R wavethresh package was used to obtain the decomposition, with
thresholding, at the fourth and higher levels, of coefficients whose
absolute value was below the 0.975 percentile of the absolute values of
the wavelet coefficients in that level.
• The union, over all curves (field and model), of the retained wavelet basis
elements was taken as the final basis, yielding a total of 289 basis elements,
ψ_i(t), i = 1, . . . , 289.
Thus the 65 model response curves and 7 field response curves are
represented as

  y^M(z_j; t) = Σ_{i=1}^{289} w^M_i(z_j) ψ_i(t), j = 1, . . . , 65,
  y^F_r(x^∗; t) = Σ_{i=1}^{289} w^F_{ir}(x^∗) ψ_i(t), r = 1, . . . , 7,

where the w^M_i(z_j) and w^F_{ir}(x^∗) are the coefficients computed through the
wavelet decomposition.
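The application used R's wavethresh; as a self-contained illustration of the same decompose–threshold–reconstruct idea, here is a minimal Haar-wavelet version in Python (the Haar basis, the test curve, and the choice of which levels to keep intact are our assumptions, not the settings used in the road-load analysis):

```python
import numpy as np

def haar_dwt(y):
    """Full Haar decomposition of a length-2^L signal: returns the single
    coarsest approximation coefficient and a coarsest-first list of detail bands."""
    details, a = [], y.astype(float)
    while len(a) > 1:
        d = (a[0::2] - a[1::2]) / np.sqrt(2)   # detail at current scale
        a = (a[0::2] + a[1::2]) / np.sqrt(2)   # approximation, next scale
        details.append(d)
    return a, details[::-1]

def haar_idwt(a, details):
    """Exact inverse of haar_dwt."""
    a = np.asarray(a, dtype=float)
    for d in details:
        up = np.empty(2 * len(a))
        up[0::2] = (a + d) / np.sqrt(2)
        up[1::2] = (a - d) / np.sqrt(2)
        a = up
    return a

# A synthetic 'time-history curve' on a dyadic grid of 2^12 points.
t = np.linspace(0.0, 1.0, 2 ** 12, endpoint=False)
y = np.sin(10 * np.pi * t) * np.exp(-3.0 * t)

a0, details = haar_dwt(y)
# Keep coarse levels intact; at the finer levels, hard-threshold each band at
# its own 0.975 absolute-value quantile (only the largest ~2.5% survive).
thresholded = []
for j, d in enumerate(details):
    if j < 7:
        thresholded.append(d)
    else:
        cut = np.quantile(np.abs(d), 0.975)
        thresholded.append(np.where(np.abs(d) >= cut, d, 0.0))
y_rec = haar_idwt(a0, thresholded)
```

Without thresholding, the round trip is lossless; with it, a smooth curve is represented by a few hundred coefficients rather than 4,096 — the compression that makes per-coefficient emulation feasible.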
[Figure panels: ‘First Field Run: Ch45: Pothole 1’ and ‘First Field Run
Reconstructed with wavelets: Ch45: Pothole 1’; ‘First Field Run: Ch45:
Pothole 2’ and ‘First Field Run Reconstructed with wavelets: Ch45:
Pothole 2’. Tension versus meters.]
Figure 6: The accuracy of the wavelet decomposition.
GaSP Approximation to each of the Model Wavelet Coefficient
Functions w^M_i(z):
Formally (and dropping the index i for convenience),

  w^M(·) ∼ GaSP( µ, (1/λ^M) c^M(·, ·) ),

with the usual separable correlation function, the parameters α_j and β_j
being estimated by maximum likelihood (recall there are 289 pairs of
them), as well as µ and λ^M.
The posterior of the i-th coefficient, given the GaSP parameters and model-run
data, at a new input z^∗, is

  w̃^M_i(z^∗) ∼ N( µ̂_i(z^∗), V̂_i(z^∗) ),

where µ̂_i(z^∗) and V̂_i(z^∗) are given by the usual Kriging expressions.
Thus the overall emulator is (assuming that the w̃^M_i are independent)

  ỹ^M(z^∗; t) ∼ N( Σ_{i=1}^{289} µ̂_i(z^∗) ψ_i(t), Σ_{i=1}^{289} V̂_i(z^∗) ψ_i²(t) ).
Bayesian comparison of model and field data: calibration/tuning;
estimation of bias; and development of tolerance bands for
prediction
For the i-th wavelet coefficient, the computer model is related to reality via

  w^R_i(x^∗) = w^M_i(x^∗, u^∗) + b_i(x^∗),

where (x^∗, u^∗) are the true (but unknown) values of the field vehicle inputs
and model calibration parameters, respectively. Hence b_i(x^∗) is here just an
unknown constant.
(As usual, there is inevitable confounding between u^∗ and the b_i, but prediction
is little affected.)
The replicated field data is modeled as

  w^F_{ir}(x^∗) = w^R_i(x^∗) + ǫ_{ir}
                = w^M_i(x^∗, u^∗) + b_i(x^∗) + ǫ_{ir}, r = 1, . . . , 7,

where the ǫ_{ir} are i.i.d. N(0, σ²_i) random errors.
The sufficient statistics, w̄^F_i(x^∗) = (1/7) Σ_{r=1}^{7} w^F_{ir}(x^∗) and
S²_i(x^∗) = Σ_{r=1}^{7} ( w^F_{ir}(x^∗) − w̄^F_i(x^∗) )², then have distributions

  w̄^F_i(x^∗) ∼ N( w^M_i(x^∗, u^∗) + b_i(x^∗), σ²_i/7 ),
  S²_i(x^∗)/σ²_i ∼ Chi-Square(6),

which we assume to be independent across i.
Prior Distributions:

  π(b, x^∗, u^∗, σ², τ²) =
    π(b | x^∗, u^∗, σ², τ²) π(τ² | x^∗, u^∗, σ²) π(σ² | x^∗, u^∗) π(x^∗, u^∗),

where b, σ², and τ² refer to the vectors of the b_i, σ²_i, and τ²_j; and, with σ̄²_j
denoting the average of the σ²_i at wavelet level j,

  π(b | x^∗, u^∗, σ², τ²) = Π_{i=1}^{289} N(b_i | 0, τ²_j)

(we did also try Cauchy and mixture priors, to some benefit),

  π(τ² | x^∗, u^∗, σ²) ∝ Π_{j=0}^{12} 1/(τ²_j + σ̄²_j/7),
  π(σ² | x^∗, u^∗) ∝ Π_{i=1}^{289} 1/σ²_i.
Finally, the I/U map is translated into

  π(x^∗, u^∗) = Π_{i=1}^{2} p(u^∗_i) Π_{i=1}^{7} p(x^∗_i),
  p(u^∗_i) = Uniform(u^∗_i | 0.125, 0.875), i = 1, 2,
  p(x^∗_i) ∝ N(x^∗_i | 0.5, 0.1111²) I_{(0.1667, 0.8333)}(x^∗_i), i = 1, 2, 3,
  p(x^∗_4) ∝ N(x^∗_4 | 0.5, 0.0641²) I_{(0.3077, 0.6923)}(x^∗_4),
  p(x^∗_i) ∝ N(x^∗_i | 0.5, 0.1176²) I_{(0.1471, 0.8529)}(x^∗_i), i = 5, 6,
  p(x^∗_7) ∝ N(x^∗_7 | 0.5, 0.1026²) I_{(0.1923, 0.8077)}(x^∗_7),

where I_A(u) is 1 if u ∈ A and 0 otherwise.
Posterior Distribution: Denote the available data by
D = { w̄^F_i, S²_i, µ̂_i(·), V̂_i(·) : i = 1, . . . , 289 }. The posterior distribution of
b, x^∗, u^∗, σ², τ², and w^{M∗} ≡ w^M(x^∗, u^∗) can be expressed as

  π(w^{M∗}, b, x^∗, u^∗, σ², τ² | D) = π(w^{M∗} | b, x^∗, u^∗, σ², τ², D)
    × π(b | x^∗, u^∗, σ², τ², D) π(x^∗, u^∗, σ², τ² | D),

where

  π(w^{M∗} | b, x^∗, u^∗, σ², τ², D) = Π_{i=1}^{289} N(w^{M∗}_i | µ̃_i, Ṽ_i),
  µ̃_i = [ V̂_i(x^∗, u^∗) / (V̂_i(x^∗, u^∗) + σ²_i/7) ] (w̄^F_i − b_i)
       + [ (σ²_i/7) / (V̂_i(x^∗, u^∗) + σ²_i/7) ] µ̂_i(x^∗, u^∗),
  Ṽ_i = V̂_i(x^∗, u^∗)(σ²_i/7) / (V̂_i(x^∗, u^∗) + σ²_i/7);

  π(b | x^∗, u^∗, σ², τ², D) = Π_{i=1}^{289} N(b_i | µ^∗_i, V^∗_i),
  µ^∗_i = τ²_j ( w̄^F_i − µ̂_i(x^∗, u^∗) ) / ( τ²_j + V̂_i(x^∗, u^∗) + σ²_i/7 ),
  V^∗_i = τ²_j ( V̂_i(x^∗, u^∗) + σ²_i/7 ) / ( τ²_j + V̂_i(x^∗, u^∗) + σ²_i/7 );
Computation: A Metropolis-Hastings-Gibbs sampling scheme was used,
the Metropolis steps being needed to sample from π(x^∗, u^∗, σ², τ² | D). The
proposals used were
• for the σ²_i: inverse Gamma distributions with shape 3 and scales 2/S²_i;
• for the τ²_j: local moves ∝ 1/τ²_j on (0.5 τ²_j^{(old)}, 2 τ²_j^{(old)});
• for x^∗ and u^∗, a mixture of prior and local moves:

  g_u(z) = Π_{i=1}^{9} { 0.5 Unif(z_i | T_i) + 0.5 Unif(z_i | T^∗_i) },

where T_i = (a_i, b_i) is the support of each prior and
T^∗_i = ( max{a_i, z_i^{(old)} − 0.05}, min{b_i, z_i^{(old)} + 0.05} ).
Computation was initially done by a standard Markov chain Monte
Carlo analysis:
• Closed-form full conditionals are available for b, and for the emulator
wavelet coefficients w^{M∗} ≡ w^M(x^∗, u^∗).
• Metropolis-Hastings steps were used for (x^∗, u^∗, σ², τ²); efficient
proposal distributions were available, so all seemed fine.
Shock: The original computation failed and could not be fixed using
traditional methods; the answers were also ‘wrong’!
• Problem: Some of the σ²_i (variances corresponding to certain wavelet
coefficients of the field data) got ‘stuck’ at very large values, with the
effect that the corresponding biases were estimated as near zero.
• Likely cause: Modeling the b_i as hierarchically normally distributed;
biases for many wavelet coefficients can be expected to be small, but
some are likely to be large.
Ideal solution: Improve the hierarchical models; for instance, one could
consider use of more robust models (e.g., Cauchy models) for the b_i.
Pragmatic solution: Cheat computationally, and only allow generation
of the σ²_i from the replicate information (i.e., from the
s²_i = Σ_{r=1}^{7} (w^F_{ir}(x^∗) − w̄^F_i)²), not allowing transference of information from
the b_i to the σ²_i.
• We call such cheating modularization, the idea being to not always
allow Bayesian updating to flow both ways between modules
(components) of a complex model.
• Another name given to this idea is cutting feedback (Best, Spiegelhalter,
. . .); related notions are inconsistent dependency networks (Heckerman)
and inconsistent Gibbs for missing-data problems (Gelman and others).
• In the road load analysis, the modularization approach gives very
similar answers to the improved-modeling approach.
Inference: Bias estimates, predictions, and associated accuracy
statements can all be constructed from the posterior sample

  { (w^{M∗})^{(h)}, b^{(h)}, x^{∗(h)}, u^{∗(h)}, (σ²)^{(h)} }, h = 1, . . . , N,

and an auxiliary sample ǫ^{(h)}, h = 1, . . . , N, from a multivariate normal
distribution with zero mean and diagonal covariance matrix Diag(σ²)^{(h)}.
• The posterior sample of bias curves is

  b^{(h)}(t) = Σ_{i=1}^{289} b^{(h)}_i ψ_i(t), h = 1, . . . , N.

• The posterior sample of bias-corrected predictions of reality is

  (y^R)^{(h)}(t) = Σ_{i=1}^{289} [ (w^{M∗}_i)^{(h)} + b^{(h)}_i ] ψ_i(t), h = 1, . . . , N.

• The posterior sample of individual (field) bias-corrected prediction curves is

  (y^F)^{(h)}(t) = Σ_{i=1}^{289} [ (w^{M∗}_i)^{(h)} + b^{(h)}_i + ǫ^{(h)}_i ] ψ_i(t), h = 1, . . . , N.
[Figure: PX, Ch45, Region 1 — bias-corrected prediction, individual curve.
Tension (N) vs. distance (m), showing model data, field data, the model
prediction, and 90% tolerance bounds.]
[Figure: PX, Ch45, Region 1 — nominal model prediction, individual curve.
Tension (N) vs. distance (m), showing model data, field data, the model
prediction, and 90% tolerance bounds.]
[Figure: four panels of bias-corrected predictions, individual curves
(Tension (N) vs. distance (m); field data, model prediction, and 90%
tolerance bounds): PX Ch45 Region 1, PX Ch60 Region 1, PX Ch45 Region 2,
and PX Ch60 Region 2.]
Figure 10: Multiplicative extrapolation of bias to Vehicle B.
Functional emulators via Parallel Partial emulation
Example: In the pyroclastic flow example, the full output of TITAN2D is
y^M(x) = (y^M_1(x), y^M_2(x), ..., y^M_k(x)), where each y^M_i(x) is the
pyroclastic flow height (and speed and direction) at one of the k space-time
grid points on which TITAN2D is run. This is a huge (discretized) function,
with k as large as 10^9. One realization of the function, looking only at
maximum flow height at 24,000 spatial locations, looks like this:
Movie Time
Determination of 1 m contours of maximum flow over time at k = 23,040
spatial locations, using m = 50 simulator runs at various inputs to develop
the emulator.
The Big Issue: this wildly varying function varies even more wildly over
the inputs, so it is virtually impossible to capture it with any of the
previous methods, or with any previous emulation method (statistical or
mathematical). So we have to trust to the magic of GaSPs and hope to get
lucky (you can't force UQ)!
Run the simulator at x^D = {x_1, ..., x_m}, yielding outputs
y^D = (y^M(x_1)′, ..., y^M(x_m)′)′ (a matrix of size up to 2048 × 10^9).
The simplest imaginable GaSP for the k-dimensional y^M(x): an independent
GaSP is assigned to each coordinate y^M_j(x), with
• prior mean functions of the regression form Ψ(x)θ_j, where Ψ(x) is a
common l-vector of given basis functions and the θ_j are differing
unknown regression coefficients;
• differing unknown prior variances σ^2_j;
• common estimated correlation parameters γ̂ (discussed later).
The mean function of the posterior GaSP for y^M_j(x^*) at a new input x^* is

  μ̂_j(x^*) = Ψ(x^*)θ̂_j + C(x^*)C^{-1}(y^D_j − Ψθ̂_j) ,

where y^D_j is the jth column of y^D and θ̂_j = (Ψ′C^{-1}Ψ)^{-1}Ψ′C^{-1}y^D_j, with
Ψ being the earlier-specified m × l design matrix, C the earlier-specified
m × m correlation matrix, Ψ(x^*) = (Ψ_1(x^*), ..., Ψ_l(x^*)), and
C(x^*) = (c(x_1, x^*), ..., c(x_m, x^*)).
This can be rewritten as

  μ̂_j(x^*) = Σ_{i=1}^{m} h_i(x^*) y^D_{ij} ,

where h_i(x^*) is the ith element of the m-vector

  h(x^*) = (Ψ(x^*) − C(x^*)C^{-1}Ψ)(Ψ′C^{-1}Ψ)^{-1}Ψ′C^{-1} + C(x^*)C^{-1} .

As Ψ and C (and the functions of them) can be pre-computed, computing
h(x^*) at a new x^* requires roughly m^2 numerical operations.
Finally, we can write the complete parallel partial posterior (PP) mean
vector (the emulator of the full simulator output at a new input x^*) as

  μ̂(x^*) = (μ̂_1(x^*), ..., μ̂_k(x^*)) = h(x^*) y^D .

• The overall computational cost is just O(mk) when k ≫ m.
  – It is crucially important to have differing θ_j and σ^2_j at each
    coordinate, but this comes with essentially no computational cost.
• Computation of all the PP emulator variances is O(m^2 k), but one
  rarely needs to compute all of them.
• The emulator is an interpolator: when x^* equals one of the runs x_i,
  the emulator returns the exact values from that computer run.
• As the emulator mean is just a weighted average of the actual simulator
  runs, it hopefully captures some of the dynamics of the process.
What happens if the assumptions are relaxed?
• If different coordinates are allowed different bases, the cost goes up to
  O([m^2 l + l^3]k). (Recall that the cost of the PP emulator was O(mk).)
  – For TITAN2D, m ≈ 2000, l = 4, and k ≈ 10^9 ⇒ O(10^16) computations,
    compared to O(10^12) for the PP emulator.
• If the correlation parameters γ_j are allowed to vary at each
  coordinate, the computational cost would be O(n m^3 k), because there
  would be a differing m × m correlation matrix C_j at each coordinate,
  and the inversion of C_j would need to be done n times in order to
  estimate γ_j.
  – For TITAN2D, n ≈ 150, m ≈ 2000, k ≈ 10^9 ⇒ O(10^21) computations.
• In either case the emulator would still be an interpolator, but it would
  no longer be a weighted average of the simulator runs.
Figure 11: The mean of the emulator of ‘maximum flow height over time’ from
TITAN2D, at 24,000 spatial locations over Montserrat and for new input values
V = 10^7.462, ϕ = 2.827, δ_bed = 11.111, and δ_int = 27.7373.
Figure 12: Variance of the emulator of ‘maximum flow height over time’ from
TITAN2D, at 24,000 spatial locations over Montserrat and for new input values
V = 10^7.462, ϕ = 2.827, δ_bed = 11.111, and δ_int = 27.7373.
Movie Time
Determination of 1 m contours of maximum flow over time at k = 23,040
spatial locations, using m = 50 simulator runs at various inputs to develop
the emulator.
The spatial ‘elephant in the room’
is the key (and clearly invalid) assumption that simulator output
values at all coordinates (e.g., space-time locations) are independent.
The usual attempted solution: introduce a second spatial process over the
output coordinates of y^M(x) = (y^M_1(x), y^M_2(x), ..., y^M_k(x)) to reflect
the clear dependence. Usual assumptions on this process:
• It is also a Gaussian process, with correlation function λ(i, j), leading to
  the k × k correlation matrix Λ.
• Because k is huge, the process must be chosen so that Λ is sparse (e.g., only
  allowing correlation with nearby points), to allow for the needed inversions of Λ.
• Separability with the GaSP over the input space is assumed, so that the
  covariance matrix of the joint Gaussian process is (letting σ denote the
  diagonal matrix of coordinate standard deviations)

  Σ = σΛσ ⊗ C , and thus Σ^{-1} = σ^{-1}Λ^{-1}σ^{-1} ⊗ C^{-1} .

The problem: it is difficult to add plausible spatial structure while keeping
the computation manageable when k is huge.
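The Kronecker inversion identity above, which is what makes separability computationally attractive, is easy to verify numerically. In this sketch, small random symmetric positive-definite matrices stand in for Λ and C, and the dimensions are illustrative.

```python
import numpy as np

# Numerical check of the separable-covariance identity from the slide:
#   Sigma = (sigma Lambda sigma) (x) C
#   ==>  Sigma^{-1} = (sigma^{-1} Lambda^{-1} sigma^{-1}) (x) C^{-1}.
# Small random SPD matrices stand in for the (huge) spatial and input pieces.
rng = np.random.default_rng(0)
k, m = 4, 3
A = rng.standard_normal((k, k)); Lam = A @ A.T + k * np.eye(k)  # SPD "spatial" correlation
B = rng.standard_normal((m, m)); C = B @ B.T + m * np.eye(m)    # SPD input correlation
sig = np.diag(rng.uniform(0.5, 2.0, k))                         # coordinate sds

Sigma = np.kron(sig @ Lam @ sig, C)
Sigma_inv = np.kron(np.linalg.inv(sig) @ np.linalg.inv(Lam) @ np.linalg.inv(sig),
                    np.linalg.inv(C))
assert np.allclose(Sigma @ Sigma_inv, np.eye(k * m))
```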
The Surprise: the spatial elephant can (mostly) be ignored, as the PP
emulator will give essentially the same answers. Indeed, for any spatial
structure Λ, the following can be shown:
• The emulator mean of y^M(x^*) = (y^M_1(x^*), ..., y^M_k(x^*)), at a new input
  x^*, is exactly the same as the PP emulator mean. (Intuition: it does not
  matter that the y^M_i(x^*) are spatially related, as they are all unknown.)
• The emulator variance at coordinate j is still σ̂^2_j V_j(x^*), with only σ̂^2_j
  depending on the spatial structure, and only in a minor way; thus one
  can just use the (slightly conservative) PP emulator variance.
The remaining little elephant: if one actually needs random draws from the
emulator, the PP emulator’s draws will be too rough (because the
coordinates are independent), which might be harmful in some applications.
• A relatively simple fix to obtain smoother draws is to divide the grid into
  squares of moderate size s (e.g., s = 4), have the squares be independent, but
  allow a dependent spatial process within each square.
• If Λ in each square is assigned the objective prior π(Λ) = |Λ|^{-s}, the mean
  and variance of the emulator will then be the same as the PP emulator.
Additional concerns with the assumptions for the PP emulator:
• The likelihood from which the correlation parameters γ are estimated
might be bad because of the assumption of independence of
coordinates.
– In practice, use of a joint spatial process seems to give worse results,
because of considerable numerical instabilities in the likelihood.
– Also, the estimates of γ should primarily be driven by the varying
simulator output over the inputs xi at each fixed location.
– The likelihood is almost certainly too concentrated but, as we are
only using it to obtain plug-in estimates, this is not a major concern.
• Assuming common values of the correlation parameters γ at all
coordinates is potentially problematic, as the simulator may have
very different levels of smoothness in different regions of the input space.
– One could utilize different γ in a few different regions, with minimal
additional cost, as in Gramacy and Lee (2008).
– Simulations (see later) indicate this is not a problem for TITAN2D.
Introduction of a nugget
Often certain inputs have very little effect on the simulator output, and
emulators that ignore such inputs can predict better. But, for
deterministic simulators, one must then introduce a ‘nugget’ (i.i.d.
Gaussian errors) into the GaSP model.
We simply let the correlation matrix be C + ξI, renormalized to be a
correlation matrix, with ξ unknown. The computations are then only
slightly more complicated.
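A minimal sketch of that renormalization, with an illustrative ξ: the diagonal of C + ξI is 1 + ξ, so dividing by 1 + ξ restores a unit diagonal while shrinking the off-diagonal correlations.

```python
import numpy as np

# Add a nugget: replace C by C + xi*I and renormalize so the diagonal is
# again 1, i.e. use (C + xi*I) / (1 + xi). The value of xi is illustrative.
def nugget_correlation(C, xi):
    m = C.shape[0]
    return (C + xi * np.eye(m)) / (1.0 + xi)

C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
Cn = nugget_correlation(C, xi=0.25)
assert np.allclose(np.diag(Cn), 1.0)     # still a correlation matrix
assert np.isclose(Cn[0, 1], 0.8 / 1.25)  # off-diagonals shrink toward 0
```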
Example: In TITAN2D, δ_int has only a minor effect, so we will investigate
• the full 4-input emulator,
• the 3-input emulator, with δ_int removed and a nugget inserted.
Emulator:              PP GaSP      PP GaSP      MS GaSP      LMC GaSP
Parameter estimation:  robust est.  robust est.  robust est.  DiceKriging
Inputs:                4            3 + nugget   3 + nugget   3 + nugget
Mean square error:     0.109        0.097        0.103        0.137
95% CI coverage:       0.926        0.950        0.924        0.909
95% CI length:         0.521        0.536        0.491        0.478
Time (s) using R:      50.0         28.1         31337.7      3407.6
Table 3: Performance of various emulators, developed from 50 simulator runs, of
max flow height over all spatial locations except the crater and non-flow areas.
• The first emulator uses all 4 inputs, while the remaining three emulators use
3 inputs (V, δ_bed, φ) and a nugget, all with the same regressor h(x) = (1, V).
• The LMC emulator uses coregionalization with an SVD output decomposition.
• Evaluations are based on n^* = 633 held-out inputs over k = 17,311 locations.
• The last row shows the computational times of the emulators, using R.
Determining Hazard Probabilities
Goal: Determine P_{H,T}(k), the probability, at location k, that the maximum
pyroclastic flow height exceeds H over the next T years.
Implementation:
• Perform statistical analysis of historical data to determine the posterior
distribution of the simulator inputs (V, δbed, φ).
• Draw 100,000 samples from this posterior, and evaluate the emulator
at these inputs to estimate the distribution Fk of maximum flow
heights at each location k.
• Assuming pyroclastic flows follow a stationary Poisson process, an
exact expression can be given, in terms of the Fk, for the probability
distribution of maximum flow heights over T years at location k.
• From these, determination of the P_{H,T}(k) is straightforward.
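One standard form such an exact expression can take, offered here purely as an illustration: if flows arrive as a stationary Poisson process with known rate λ per year and produce i.i.d. maximum heights with cdf F_k, then exceedances of H form a thinned Poisson process, giving P_{H,T}(k) = 1 − exp(−λT(1 − F_k(H))). The rate, horizon, threshold, and emulator draws below are toy values, not the Montserrat analysis.

```python
import math

# Hedged sketch: flows arrive as a stationary Poisson process with rate
# lam per year; maximum heights are i.i.d. with cdf F_k. Flows exceeding H
# then form a thinned Poisson process of rate lam*(1 - F_k(H)), so
#     P_{H,T}(k) = 1 - exp(-lam * T * (1 - F_k(H))).
# All numerical values below are illustrative.
def hazard_prob(heights, H, lam, T):
    Fk_H = sum(1 for h in heights if h <= H) / len(heights)  # empirical F_k(H)
    return 1.0 - math.exp(-lam * T * (1.0 - Fk_H))

emulated_heights = [0.1, 0.3, 0.6, 1.2, 2.5]   # toy draws from the emulator
p = hazard_prob(emulated_heights, H=1.0, lam=0.5, T=2.5)
assert 0.0 < p < 1.0
```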
Figure 13: For SVH, contours of the probabilities that the maximum flow
heights exceed 0.5 (left), 1 (center) and 2 (right) meters over the next T = 2.5
years at each location on SVH. The shaded area is Belham Valley, which is
still inhabited.
Coupling emulators (in closed form) to emulate
coupled computer models (Kyzyurova (2017)).
Coupled simulators:
• f^M(x) is the output of a simulator with input x.
  – Example: f^M(x) is TITAN2D.
• g^M(z) is a simulator with input z.
  – Example: g^M(z) is a computer model that determines the damage to a
    structure incurred by being hit by a pyroclastic flow with properties z.
• Of interest is g^M ∘ f^M(x) = g^M(f^M(x)), the coupled simulator
  computing the damage from a pyroclastic flow arising from inputs x.
The problem: it is usually difficult to directly link two simulators.
• The output of f^M will often not be in the form needed as input to g^M.
• It is difficult to determine a good design, in terms of the inputs x, for the
  coupled emulator.
• It may well be that many more runs of f^M are available than runs of g^M.
A solution: separately develop emulators f̃^M of f^M and g̃^M of g^M, and
couple the emulators.
• This is always possible by Monte Carlo (generate outputs from f̃^M and use
  them as inputs to g̃^M).
• For GaSPs, a closed-form mean and variance of the coupled emulator
  are available!
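The Monte Carlo route in the first bullet can be sketched in a few lines. The two ‘emulators’ here are illustrative stand-ins that return a Gaussian predictive mean and standard deviation, not fitted GaSPs.

```python
import numpy as np

# Couple two emulators by Monte Carlo: draw from the emulator of f at input
# x, feed each draw through the emulator of g, and summarize the results.
# Both "emulators" below are illustrative stand-ins.
rng = np.random.default_rng(1)

def f_emulator(x):   # predictive mean and sd of the emulator of f (toy)
    return np.sin(x), 0.05

def g_emulator(z):   # predictive mean and sd of the emulator of g (toy)
    return z ** 2, 0.02

def coupled_draws(x, n=10_000):
    mf, sf = f_emulator(x)
    z = rng.normal(mf, sf, size=n)   # draws from the emulator of f at x
    mg, sg = g_emulator(z)           # push each draw through the emulator of g
    return rng.normal(mg, sg)        # draws from the coupled emulator

draws = coupled_draws(1.2)
mean, var = draws.mean(), draws.var()   # Monte Carlo linked mean and variance
```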
Theorem. Suppose the GaSP for g^M has the linear mean function
h(z)′β = β_0 + β_1 z_b, and a product power correlation function with α_j = 2
for the inputs j ∈ {b, ..., d} that arise from f^M. For each j ∈ {b, ..., d}, let f^M_j be
an independent emulator of f_j, the function which gives rise to the value of
input j for g(·). Then the mean Eξ and variance Vξ of the linked emulator
ξ of the coupled simulator (g ∘ (f_b, ..., f_d))(u) are
  Eξ = β_0 + β_1 μ^*_{f_b}(u_b)
       + Σ_{i=1}^{m} a_i ∏_{j=1}^{b-1} exp(−(|u_j − z_ij|/δ_j)^{α_j}) ∏_{j=b}^{d} I^i_j ,

  Vξ = σ^2(1 + η) + β_0^2 + 2β_0β_1 μ^*_{f_b}(u_b)
       + β_1^2 (σ^{*2}_{f_b}(u_b) + (μ^*_{f_b}(u_b))^2) − (Eξ)^2
       + Σ_{k,l=1}^{m} (a_l a_k − σ^2 {C_z^{-1}}_{k,l})
         ∏_{j=1}^{b-1} exp(−[(|u_j − z_kj|/δ_j)^{α_j} + (|u_j − z_lj|/δ_j)^{α_j}]) ∏_{j=b}^{d} I1^{k,l}_j
       + 2 Σ_{i=1}^{m} a_i ∏_{j=1}^{b-1} exp(−(|u_j − z_ij|/δ_j)^{α_j})
         (β_0 I^i_b + β_1 I^{+i}_b) ∏_{j=b+1}^{d} I^i_j .
Figure 14: Top figures are functions f(x) and g(z) and their emulators. Bottom
left is the (closed form) coupled emulator of g ◦ f. Bottom right is the emulator of
the math coupled g ◦ f (constructed from the same inputs/outputs).
Other complex emulation scenarios
• Dynamic emulators (Conti et al. (2009); Liu and West (2009); Conti and
O’Hagan (2010); Reichert et al. (2011)).
• Emulating models with qualitative factors (Qian et al. (2008)).
• Nonstationary emulators (Gramacy and Lee (2008); Ba and Joseph
(2012)).
• Emulating multivariate output (Bayarri et al. (2009); Paulo et al.
(2012); Fricker et al. (2013); Overstall and Woods (2016)).
• Evaluating the quality of emulators (Bastos and O’Hagan (2009);
Overstall and Woods (2016)).
References
Aslett, R., R. J. Buck, S. G. Duvall, J. Sacks, and W. J. Welch (1998).
Circuit optimization via sequential computer experiments: Design of an
output buffer. Applied Statistics 47, 31–48.
Ba, S. and V. R. Joseph (2012). Composite Gaussian process models for
emulating expensive functions. The Annals of Applied Statistics 6(4),
1838–1860.
Bastos, L. S. and A. O’Hagan (2009). Diagnostics for Gaussian process
emulators. Technometrics 51(4), 425–438.
Bates, R. A., R. J. Buck, E. Riccomagno, and H. P. Wynn (1996).
Experimental design and observation for large systems (Disc: p95–111).
Journal of the Royal Statistical Society, Series B, Methodological 58,
77–94.
Bayarri, M. J., J. O. Berger, E. S. Calder, K. Dalbey, S. Lunagomez, A. K.
Patra, E. B. Pitman, E. T. Spiller, and R. L. Wolpert (2009). Using
statistical and computer models to quantify volcanic hazards.
Technometrics 51, 402–413.
Bayarri, M. J., J. O. Berger, G. García-Donato, F. Liu, J. Palomo,
R. Paulo, J. Sacks, J. Walsh, J. A. Cafeo, and R. Parthasarathy (2007).
Computer model validation with functional output. Annals of
Statistics 35, 1874–1906.
Bayarri, M. J., J. O. Berger, M. C. Kennedy, A. Kottas, R. Paulo, J. Sacks,
J. A. Cafeo, C. H. Lin, and J. Tu (2009). Predicting vehicle
crashworthiness: validation of computer models for functional and
hierarchical data. Journal of the American Statistical Association 104,
929–942.
Bowman, V. E. and D. C. Woods (2016). Emulation of multivariate
simulators using thin-plate splines with application to atmospheric
dispersion. SIAM/ASA Journal on Uncertainty Quantification 4(1),
1323–1344.
Conti, S., J. P. Gosling, J. Oakley, and A. O’Hagan (2009). Gaussian
process emulation of dynamic computer codes. Biometrika 96(3), 663–676.
Conti, S. and A. O’Hagan (2010). Bayesian emulation of complex
multi-output and dynamic computer models. Journal of statistical
planning and inference 140(3), 640–651.
Cumming, J. A. and M. Goldstein (2009). Small-sample Bayesian designs
for complex high-dimensional models based on information gained using
fast approximations. Technometrics 51, 377–388.
Dalbey, K., M. Jones, E. B. Pitman, E. S. Calder, M. Bursik, and A. K.
Patra (2012). Hazard risk analysis using computer models of physical
phenomena and surrogate statistical models. International Journal for
Uncertainty Quantification. To appear.
Fricker, T. E., J. E. Oakley, and N. M. Urban (2013). Multivariate Gaussian
process emulators with nonseparable covariance structures.
Technometrics 55(1), 47–56.
Gramacy, R. B. and H. K. H. Lee (2008). Bayesian treed Gaussian process
models with an application to computer modeling. Journal of the
American Statistical Association 103(483), 1119–1130.
Gramacy, R. B. and H. K. H. Lee (2009). Adaptive design and analysis of
supercomputer experiments. Technometrics 51, 130–145.
doi:10.1198/TECH.2009.0015.
Gu, M., J. Palomo, and J. O. Berger (2016). RobustGaSP: Robust
Gaussian Stochastic Process Emulation. R package version 0.5.4.
Gu, M., X. Wang, and J. O. Berger (2018). Robust Gaussian stochastic
process emulation. Annals of Statistics 46(6A), 3038–3066.
Kyzyurova, K. N. (2017). On Uncertainty Quantification for Systems of
Computer Models. Ph.D. thesis, Duke University.
Lam, C. Q. and W. I. Notz (2008). Sequential adaptive designs in
computer experiments for response surface model fit. Statistics and
Applications 66(9), 207–233.
Lim, Y. B., J. Sacks, W. Studden, and W. J. Welch (2002). Design and
analysis of computer experiments when the output is highly correlated
over the input space. Canadian Journal of Statistics 30(1), 109–126.
Liu, F. and M. West (2009). A dynamic modelling strategy for Bayesian
computer model emulation. Bayesian Analysis 4(2), 393–412.
Loeppky, J. L., L. M. Moore, and B. J. Williams (2010). Batch sequential
designs for computer experiments. Journal of Statistical Planning and
Inference 140(6), 1452–1464.
Lopes, D. (2011). Development and Implementation of Bayesian Computer
Model Emulators. Ph.D. dissertation, Duke University.
McKay, M. D., W. J. Conover, and R. J. Beckman (1979). A comparison of
three methods for selecting values of input variables in the analysis of
output from a computer code. Technometrics 21, 239–245.
Overstall, A. M. and D. C. Woods (2016). Multivariate emulation of
computer simulators: model selection and diagnostics with application to
a humanitarian relief model. Journal of the Royal Statistical Society:
Series C (Applied Statistics) 65(4), 483–505.
Paulo, R. (2005). Default priors for Gaussian processes. The Annals of
Statistics 33(2), 556–582.
Paulo, R., G. García-Donato, and J. Palomo (2012). Calibration of
computer models with multivariate output. Computational Statistics and
Data Analysis 56(12), 3959–3974.
Qian, P. Z. G., H. Wu, and C. F. J. Wu (2008). Gaussian process models
for computer experiments with qualitative and quantitative factors.
Technometrics 50(3), 383–396.
Ranjan, P., R. Haynes, and R. Karsten (2011). A computationally stable
approach to Gaussian process interpolation of deterministic computer
simulation data. Technometrics 53(4), 366–378.
Ranjan, P., D. Bingham, and G. Michailidis (2008). Sequential experiment
design for contour estimation from complex computer codes.
Technometrics 50(4), 527–541. Errata: Technometrics 53.
Reichert, P., G. White, M. J. Bayarri, and E. B. Pitman (2011).
Mechanism-based emulation of dynamic simulation models: Concept
and application in hydrology. Computational Statistics & Data
Analysis 55, 1638–1655.
Roustant, O., D. Ginsbourger, and Y. Deville (2012). DiceKriging,
DiceOptim: Two R packages for the analysis of computer experiments by
kriging-based metamodeling and optimization. Journal of Statistical
Software 51(1), 1–55.
Sacks, J., W. J. Welch, T. J. Mitchell, and H. P. Wynn (1989). Design and
analysis of computer experiments (C/R: p423–435). Statistical Science 4,
409–423.
Santner, T. J., B. Williams, and W. Notz (2003). The Design and Analysis
of Computer Experiments. Springer-Verlag.
Spiller, E. T., M. Bayarri, J. O. Berger, E. S. Calder, A. K. Patra, E. B.
Pitman, and R. L. Wolpert (2014). Automating emulator construction
for geophysical hazard maps. SIAM/ASA Journal on Uncertainty
Quantification 2(1), 126–152.
Welch, W. J., R. J. Buck, J. Sacks, H. P. Wynn, T. J. Mitchell, and M. D.
Morris (1992). Screening, predicting, and computer experiments.
Technometrics 34, 15–25.