Although we are often told not to do it, statistical scientists frequently predict the value of outcome measures of physical systems at input points far from the observed data. Since predictions are made in new regions of the input space, statistical theory cannot dictate optimal rules for measures of uncertainty associated with extrapolation. This talk presents several solutions based on simple principles. The solutions are illustrated via the analysis of data generated by dropping spheres of varying radii and masses from different heights. Some of the techniques apply to more complex physical systems. The efficacy of these techniques is demonstrated using data (experimental and simulated) of the level of complexity physical scientists frequently face. Scientists should tailor these techniques to fit the needs of a particular application.
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty for Extrapolation in Physical Systems - Aaron Danielson, May 14, 2019
4. DON’T EXTRAPOLATE?
• Extrapolation: estimate the value of a variable y* at input points x* beyond the range of observed data (x, y), using the associations between x and y learned from the observed data.
• But these associations may change across the input space. The physical system may operate differently at different points in the input space.
• Then the extrapolations are misleading.
• How do we quantify uncertainty in these cases?
5. COMPUTER EXPERIMENTS
• Scientific applications use mathematical models to
describe physical systems.
• Growth in computer power enables the study of
complex phenomena that otherwise might be too
time consuming or expensive to observe.
• To understand how inputs impact the system,
scientists vary simulation inputs and observe the
response. Conduct a computer experiment.
6. SOME GOALS FOR UQ
• Model Calibration:
• Use data from computer experiments.
• Use data from field experiments.
• Model the shared signal.
• Estimate parameters that govern the system with uncertainty.
• Model Validation:
• Demonstrate the model fits the data.
• Account for discrepancy between theory and empirical observations.
• Prediction (and … Extrapolation):
• Build a predictive model for the system with estimates of uncertainty.
• Hope the predictive model is “better” than using the code alone or field data alone.
7. ONE CONVENTIONAL APPROACH
(NO DISCREPANCY)
• Empirical data: (xf, yf)
• Simulation data: (xs, ts, ys)
• ts is needed to run the model.
• But, its value is not known in the field.
• Calibration Parameter: θ
• Simulator: η(x, t)
• Observation Error: ϵf ∼ N(0, 1/λf)
• View computer code as a single realization of a Gaussian Process (Sacks, Welch, Mitchell, & Wynn, 1989).
Ys(xs) = η(xs, ts)
Yf(xf) = η(xf, θ) + ϵf
[Figure: simulated output and field observations of y plotted against x on [0, 1].]
8. CONVENTIONAL APPROACH
(WITH DISCREPANCY)
• The discrepancy δ(xf) measures systematic deviation between theory and empirical observation.
• Conventional models posit δ(xf) as a realization of a Gaussian process.
• This is the Kennedy-O'Hagan (KOH) framework (Kennedy & O'Hagan, 2001) and (Higdon et al., 2004).
Ys(xs) = η(xs, ts)
Yf(xf) = η(xf, θ) + δ(xf) + ϵf
9. GAUSSIAN PROCESSES
(INFORMAL DESCRIPTIONS)
• A prior on the space of functions for the data.
• Regression that models observed data primarily via a covariance function rather than the mean function.
• A Gaussian process model is essentially a normal regression model with correlated errors.
• An N-dimensional version of kriging from spatial statistics, where spatial coordinates are replaced by a general input space.
• Can interpolate data and provide statistical uncertainty at un-sampled inputs.
• But computationally intensive when the number of observations is large. Disadvantages associated with GP discrepancy are discussed throughout this talk.
10. MORE ON GAUSSIAN PROCESSES
• Simple example. Ignore the calibration parameter.
• Model η(x) = μ + w(x) where:
• Mean zero: 𝔼[w(x)] = 0
• Variance: Var(w(x)) = σ²
• Correlation between data points: Corr(w(x), w(x′)) = ∏_{i=1}^{d} exp{−βi(xi − x′i)²}
• When β is large, the function wiggles around the data more extremely.
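The correlation function above is easy to sketch in code; the input points and β values below are illustrative choices, not values fit to the talk's data.

```python
import math

def sq_exp_corr(x1, x2, betas):
    """Product squared-exponential correlation between two input points."""
    return math.exp(-sum(b * (u - v) ** 2 for b, u, v in zip(betas, x1, x2)))

# Correlation decays with distance between inputs; a larger beta makes it
# decay faster, so the fitted function wiggles more between observations.
near = sq_exp_corr([0.1], [0.2], betas=[1.0])    # nearby points, high correlation
far = sq_exp_corr([0.1], [2.0], betas=[1.0])     # distant points, low correlation
rough = sq_exp_corr([0.1], [0.2], betas=[50.0])  # same distance, larger beta
```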
11. A SIMPLE EXAMPLE: BALLS
FALLING THROUGH THE AIR
• Drop balls from 3 different
locations.
• Record time and vertical
displacement.
• Use two sets as training data.
• Reserve last as a validation set.
• Look out below!
12. A SIMPLE EXAMPLE: BALLS
FALLING THROUGH THE AIR
• Simulation does not require complicated computation.
• Can fit using the standard model with or without a discrepancy.
Ys(xs) = (1/2) ts xs²
ts = acceleration due to gravity
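As a minimal sketch, the free-fall simulator is a one-line function; the names eta, x, and t below mirror the slides' notation.

```python
def eta(x, t):
    """Free-fall simulator: vertical displacement after fall time x
    under gravitational acceleration t (the calibration parameter)."""
    return 0.5 * t * x ** 2

# With t = 9.81 m/s^2, a 2-second drop covers 0.5 * 9.81 * 4 = 19.62 m.
y = eta(2.0, 9.81)
```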
13. BAYESIAN FRAMEWORK
• Combine likelihood L(ω | ys, xs, ts, yf, xf) and prior π(ω) to get the posterior distribution for ω.
• Prior specifications for parameters depend on the discrepancy and observation error used in the model.
• To predict, sample ω and draw from the distribution of y* given ω and the data (x*, ys, xs, ts, yf, xf).
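A toy Metropolis sampler illustrates the sample-then-predict idea on a stripped-down version of the problem: here ω is just the gravity parameter θ, the field data are synthetic, and the flat prior and Gaussian noise level are assumptions for illustration, not the talk's full model.

```python
import math, random

# Toy Metropolis sketch (assumed setup): infer gravity theta from synthetic
# drop data y = (1/2)*theta*x^2 + noise, with N(0, 0.5^2) observation
# error and a flat prior on theta.
rng = random.Random(1)
xs = [0.5, 1.0, 1.5, 2.0]                      # fall times
ys = [0.5 * 9.81 * x ** 2 + rng.gauss(0, 0.5) for x in xs]

def log_post(theta, sd=0.5):
    # log posterior up to a constant: Gaussian likelihood, flat prior
    return -sum((y - 0.5 * theta * x ** 2) ** 2 for x, y in zip(xs, ys)) / (2 * sd ** 2)

theta, samples = 5.0, []
for _ in range(5000):
    prop = theta + rng.gauss(0, 0.5)           # random-walk proposal
    if math.log(rng.random()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)

post_mean = sum(samples[1000:]) / len(samples[1000:])  # discard burn-in
```

For prediction, each retained draw of θ would then generate a draw of y* at a new fall time x*, propagating parameter uncertainty into the prediction.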
20. LIMITATIONS OF THE
CONVENTIONAL APPROACH
• The predictive variance is bounded:
var(ynew | xnew) ≤ (λη + λδ) / (λη λδ)
• λη and λδ are precision parameters associated with the covariance functions for η(⋅, ⋅) and δ(⋅).
21. DISCREPANCY HAS BOUNDED
VARIANCE
• The variance of the discrepancy does not depend on x:
var(δ(x*) | x*, ω) = (1/λδ) exp{−∑_{k=1}^{p} βδk |x*k − x*k|²} = 1/λδ.
• Compare this to OLS:
var(Y(x*) | x*, ω) = s² (1 + (x*)⊤(X⊤X)⁻¹x*).
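The contrast can be checked numerically; the precision λδ, noise level s², and design points below are arbitrary illustrative choices.

```python
# The GP discrepancy variance at any single point x* is the constant
# 1/lambda_delta; the OLS predictive variance grows with x*.
lambda_delta = 4.0
gp_var = 1.0 / lambda_delta                    # same at every x*

xs = [0.0, 0.25, 0.5, 0.75, 1.0]               # observed inputs, design [1, x]
n, sx, sxx = len(xs), sum(xs), sum(x * x for x in xs)
det = n * sxx - sx * sx                        # determinant of X^T X (2x2)

def ols_pred_var(x_star, s2=1.0):
    # s^2 * (1 + (1, x*) (X^T X)^{-1} (1, x*)^T) via the explicit 2x2 inverse
    q = (sxx - 2.0 * sx * x_star + n * x_star * x_star) / det
    return s2 * (1.0 + q)
```

Moving x* from 0.5 to 10 keeps the GP discrepancy variance fixed while the OLS predictive variance keeps growing.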
22. FUNKY BEHAVIOR FAR FROM THE
EMPIRICAL DATA
• The variance of the discrepancy converges to an upper bound as the distance between an input point and the empirical observations increases.
• The covariance terms converge to zero. In the limit, predicted points are independent of the empirical data … and the discrepancy:
lim_{∥x*−x∥→∞} cov(x*, x) = lim_{∥x*−x∥→∞} (1/λδ) exp{−∑_{k=1}^{p} βδk |x*k − xk|²} = 0.
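A quick numerical check of the decaying covariance, with illustrative βδ and λδ values:

```python
import math

def delta_cov(x_star, x, beta_delta, lambda_delta=1.0):
    """Covariance between the discrepancy at x* and at an observed input x."""
    return (1.0 / lambda_delta) * math.exp(
        -sum(b * (u - v) ** 2 for b, u, v in zip(beta_delta, x_star, x)))

# As x* moves away from the observed x, the covariance decays toward zero:
# far-field predictions become effectively independent of the field data.
close = delta_cov([1.0], [0.0], beta_delta=[1.0])
distant = delta_cov([100.0], [0.0], beta_delta=[1.0])
```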
23. WHAT TO DO?
• Linear models do not have this problem.
Uncertainty associated with the parameter values
leads to unbounded variance over the input space.
• Observation errors can be structured to enforce
unbounded prediction variance.
• Probability distributions with assumptions similar
to the constant coefficient of variation property
(CCV).
24. MODELS WITH LINEAR TERMS
• Setting δ(xf) = xf⊤βδ ensures the variance of the discrepancy is unbounded.
• And the association between the discrepancy and extrapolations does not disappear.
• Other choices with these properties include polynomials and splines.
• Can also use a model for η(⋅, ⋅) with unbounded variance.
26. OBSERVATION ERROR
• Take a cue from the literature on heteroskedasticity: changes in the variance as we traverse the input space.
• Model error as ϵ(x, τ) ∼ N(0, (g(x, τ) + 1)σ²) where g(x, τ) is a function of the input data x and a parameter τ.
• Some examples:
• g(xf, τ) = d(xf, x0)^τ where x0 is a measure of central tendency such as the spatial median, or the distance from k nearest neighbors.
• g(xf, τ) = ∥xf∥^τ where ∥⋅∥ is the Euclidean norm.
• g(xf, τ) = ∑_p |xf,p|^τp
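One of the listed choices, the Euclidean-norm form of g, can be sketched directly; σ² = 1 and the input points below are illustrative.

```python
import math

def g_norm(xf, tau):
    """g(xf, tau) = ||xf||^tau, a variance-inflation function of the input."""
    return math.sqrt(sum(v * v for v in xf)) ** tau

def error_var(xf, tau, sigma2=1.0):
    """Observation-error variance (g(x, tau) + 1) * sigma^2."""
    return (g_norm(xf, tau) + 1.0) * sigma2

# Variance grows as the input moves away from the origin, so prediction
# uncertainty keeps increasing in the extrapolation region.
at_origin = error_var([0.0, 0.0], tau=2.0)     # just sigma^2
far_out = error_var([3.0, 4.0], tau=2.0)       # norm 5 -> (25 + 1) * sigma^2
```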
28. CONSTANT COEFFICIENT OF
VARIATION (CCV)
• Classic assumption: κ = σ/μ
• Written differently: σ² ∝ μ²
• Some common distributions satisfy this: the lognormal, gamma, and exponential distributions.
• Can be applied to regression settings (Amemiya, 1973).
• The parameter κ can be estimated or set a priori. For example, κ² = 0.1 implies the variance is 10% of the mean squared.
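For the gamma distribution the CCV property is easy to verify: with shape a and scale s, the mean is a·s and the standard deviation is √a·s, so the coefficient of variation 1/√a does not depend on the mean. The shape and scale values below are illustrative.

```python
import math

def gamma_cv(shape, scale):
    """Coefficient of variation sigma/mu for a Gamma(shape, scale) variable."""
    mean = shape * scale               # mu = a * s
    sd = math.sqrt(shape) * scale      # sigma = sqrt(a) * s
    return sd / mean                   # = 1 / sqrt(a), free of the mean

# Holding the shape fixed, the CV is identical at every mean level,
# so sigma^2 is proportional to mu^2 as the CCV assumption requires.
cv_small = gamma_cv(shape=4.0, scale=0.5)      # mean 2
cv_large = gamma_cv(shape=4.0, scale=50.0)     # mean 200
```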
29. CCV ASSUMPTIONS
• Suppose a subject matter expert believes
• var(δ(xf)) ∝ 𝔼[δ(xf)]², or
• var(Yf(xf)) ∝ 𝔼[Yf(xf)]².
• Model the discrepancy with a CCV distribution.
• Assume δ(xf) ∼ N(μ, κμ²) or δ(xf) ∼ N(μ, exp{μ²/κ1}/κ2) where
• μ = xf⊤βδ.
• Assume Yf(xf) is lognormal, gamma, or exponential if it has support on the positive reals.
31. MORE ON CCV MODELS
GAMMA EXAMPLE
• Model Yf(xf) with 𝔼[η(xi, θ)] = (1/2)θxi² as:
p(yf(xf) | xf, θ, κ) ∝ yf^{(1/(2κ))θxf² − 1} exp{−yf/κ}
• Conditional mean: (1/2)θxf²
• Conditional variance: κ(1/2)θxf²
• No Gaussian process in this model.
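Reading the kernel above as a gamma density with shape θxf²/(2κ) and scale κ reproduces the stated moments; the θ, κ, and x values below are illustrative.

```python
import random

# Gamma reading of the kernel: shape = theta*x^2/(2*kappa), scale = kappa,
# so the mean is (1/2)*theta*x^2 and the variance kappa*(1/2)*theta*x^2,
# i.e. the variance grows with the mean in CCV style.
theta, kappa, x = 9.81, 0.05, 2.0
shape = theta * x ** 2 / (2.0 * kappa)
mean = shape * kappa                   # (1/2) * theta * x^2 = 19.62
var = shape * kappa ** 2               # kappa * mean

rng = random.Random(0)
draws = [rng.gammavariate(shape, kappa) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)  # should sit near 19.62
```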
33. COMPARISON OF STRATEGIES

| Simulator | Discrepancy | Observation Error | Discrepancy Var Bounded | Prediction Var Bounded | Discrepancy Dissipates |
| η(xf; θ)  | δ(xf)       | ϵ                 | Yes                     | Yes                    | Yes                    |
| η(xf; θ)  | δ(xf)       | ϵ(x, τ)           | Yes                     | No                     | Yes                    |
| η(xf; θ)  | xf⊤βδ       | ϵ                 | No                      | No                     | No                     |
| η(xf; θ)  | xf⊤βδ       | ϵ(x, τ)           | No                      | No                     | No                     |
| η(xf; θ)  | CCV         | ϵ                 | No                      | No                     | No                     |
34. DISCREPANCY WITH VARIABLES
EXTERNAL TO MODEL
• Besides fall times x and vertical distances y, we
observe additional information z, the radius and
density of each ball.
• Expand the models to incorporate this.
• Can use different functional forms:
Yf(xf) = η(x, θ) + δ(x, z) + ϵ
Yf(xf) = η(x, θ) + δ(x) + ϵ(z)
δ(x, z) = (x, z)⊤βδ
δ(x, z) = x⊤βδ + δ(z)
δ(x, z) = z⊤βδ
36. HOW MUCH DOES THIS HELP?
(GP DISCREPANCY BASED ON Z)
Yf(xf ) = η(x, θ) + δ(z) + ϵ(x)
37. HOW TO CHOOSE?
• The scientific application provides a natural choice.
• For the ball drop, we expect a positive bias. Variance should increase with time. Extra variables are important to the physical process.
• Elicitation of expert opinion.
• No universal answers; just strategies for consideration.
38. REFERENCES
• Amemiya, Takeshi. "Regression analysis when the variance of the dependent variable is proportional to the square
of its expectation." Journal of the American Statistical Association 68.344 (1973): 928-934.
• Fang, Zhide, and Douglas P. Wiens. "Robust extrapolation designs and weights for biased regression models with
heteroscedastic errors." Canadian Journal of Statistics 27.4 (1999): 751-770.
• Gelman, Andrew, et al. Bayesian data analysis. Chapman and Hall/CRC, 2013.
• Higdon, Dave, et al. "Combining field data and computer simulations for calibration and prediction." SIAM Journal
on Scientific Computing 26.2 (2004): 448-466.
• Higdon, Dave, et al. "Computer model calibration using high-dimensional output." Journal of the American
Statistical Association 103.482 (2008): 570-583.
• Kennedy, Marc C., and Anthony O'Hagan. "Bayesian calibration of computer models." Journal of the Royal
Statistical Society: Series B (Statistical Methodology) 63.3 (2001): 425-464.
• Khan, Rasul A. "A remark on estimating the mean of a normal distribution with known coefficient of variation."
Statistics 49.3 (2015): 705-710.
• Sacks, Jerome, et al. "Design and analysis of computer experiments." Statistical science (1989): 409-423.
• Zabarankin, Michael and Stan Uryasev. Statistical decision problems. Springer-Verlag New York, 2016.
41. PSEUDOLIKELIHOOD
APPROACH TO CCV
• Model empirical observations as:
p(yi(xi) | xi, y−i, x−i, θ, κ) ∝ yi^{𝔼[η(xi, θ) | y−i, x−i]/κ − 1} exp{−yi/κ}
• And, model the simulations as a Gaussian Process.
• Matrix Gamma distribution as an alternative.