MUMS Opening Workshop - Panel Discussion: Inexact Computer Model Calibration, Matthew Plumlee, August 21, 2018
1. Inexact computer model calibration
Matthew Plumlee
Industrial Engineering and Management Sciences
Northwestern University
Matthew Plumlee SAMSI, 2018 1 / 17
2. History Matching
In a case study on aligning oil models with real data:
[Craig, Goldstein, et al. (1997)] In many cases, a fully satisfactory match is not obtained and usually it is hard to judge whether this was due to a problem with the underlying model or due to an inadequate search over the space of possible inputs.
Key idea: Views the discrepancy itself as a random object (perhaps with some prior).
This is the first (but maybe not first?) use of
y^O(x_i) = y^M(x_i, u) + b(x_i) + ε_i.
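A minimal simulation of this observation model may help fix ideas; the model, discrepancy, and noise level below are hypothetical stand-ins, not from the oil case study:

```python
import numpy as np

rng = np.random.default_rng(0)

def y_M(x, u):
    # Hypothetical computer model: exponential decay with rate u.
    return np.exp(-u * x)

def b(x):
    # Hypothetical systematic discrepancy between model and reality.
    return 0.1 * np.sin(x)

x = np.linspace(0, 5, 20)            # design points x_i
u_true = 0.7                         # calibration input fed to the model
eps = rng.normal(0, 0.02, x.size)    # observation noise eps_i
y_O = y_M(x, u_true) + b(x) + eps    # y^O(x_i) = y^M(x_i, u) + b(x_i) + eps_i
```

History matching then asks which values of u make such observations plausible, treating b as random rather than zero.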
4. Kennedy and O’Hagan (2001)
[Kennedy and O'Hagan, 2001] Even when there is no parameter uncertainty, so that we know the true values of all the inputs required to make a particular prediction of the process being modelled, the predicted value will not equal the true value of the process.
Key idea: Modelling the discrepancy uncertainty with a prior.
The prior choice that was advocated was a Gaussian process, possibly because it fits nicely into computation.
Discussion article, so we have some comments:
Peter Diggle and Keith Beven had some concerns about Gaussian process priors, as they are complex and difficult to understand.
Many concerns about parameters.
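To make the advocated prior concrete, here is a sketch of drawing discrepancy functions b(·) from a Gaussian process prior; the squared-exponential kernel and its hyperparameters are illustrative assumptions, not the specific choices of Kennedy and O'Hagan:

```python
import numpy as np

def sq_exp_kernel(x1, x2, variance=0.05, lengthscale=1.0):
    # Squared-exponential covariance k(x, x') = sigma^2 exp(-(x-x')^2 / (2 l^2)).
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 50)
K = sq_exp_kernel(x, x) + 1e-10 * np.eye(x.size)  # jitter for numerical stability
L = np.linalg.cholesky(K)
b_samples = L @ rng.normal(size=(x.size, 3))      # three prior draws of b(.)
```

Each column of `b_samples` is one plausible discrepancy function under the prior, which is exactly the object the discussants found hard to interpret.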
6. Application issues and some abandonment
Farah et al. (2014): "if the BEA simulator were to be substantially biased, the form of the bias would affect the posterior distribution of the calibration parameters."
Gramacy et al. (2015): "identification issues known to plague [KO]-style calibration."
It seems odd to define the relationship
y^R(x) = y^M(x, u) + b(x)
without fixing u. After all, how could this be true for all u?
10. Back to the parameter controversy
Two philosophical remedies:
y^R(x) = y^M(x, u*) + b(x)   or   y^R(x) = y^M(x, u) + b(x, u).
Higdon, Kennedy et al. (2004) seem to imply some true u.
Loeppky, Bingham and Welch (2006) give some discussion and note "Even if one could estimate the [parameter] exactly, it may be the case that this value leads to a worse prediction of the process mean ..."
Bayarri et al. (2007) say that u should be in each term and should be u*.
However, it was left ambiguous what, exactly, u* ought to be, aside from the "true generating parameter".
15. An opinion
Parameters should be defined based on some well-defined "optimal" form of the discrepancy.
Some rationale:
There is no recovery of the true parameter: "Abandon all hope, ye who enter here."
If you change the model, the parameter should change.
It should not depend on the data collected.
It should not be based on a prior.
We need to make clear what we are estimating for users of our methods.
22. Opinion, cont.
My personal affinity lies with a weighted L2 definition
u* = arg min_u ∫_X (y^R(x) − y^M(x, u))^2 dµ(x).
Why?
Users understand this really well.
It gives flexibility on where the user finds the inputs the most important.
But almost any norm will do (Sobolev, RKHS, etc.), just make it clear!
Others have used it to define "tuning parameters".
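When y^R is known, u* can be approximated by discretizing the integral and minimizing over a grid. A minimal sketch, with hypothetical y^R and y^M and µ taken as the uniform measure on [0, 5]:

```python
import numpy as np

def y_R(x):
    return np.exp(-0.7 * x) + 0.1 * np.sin(x)   # hypothetical reality

def y_M(x, u):
    return np.exp(-u * x)                        # hypothetical model

x = np.linspace(0, 5, 400)        # quadrature grid over X
us = np.linspace(0.1, 2.0, 1000)  # candidate parameter values
# With mu uniform, the weighted L2 loss is proportional to the grid mean.
losses = [np.mean((y_R(x) - y_M(x, u)) ** 2) for u in us]
u_star = us[int(np.argmin(losses))]
```

Swapping the uniform weight for another µ (or another norm entirely) changes u*, which is exactly why the definition should be stated explicitly.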
27. L2 Calibration
Tuo and Wu (2015, Annals) criticize Kennedy and O'Hagan under the premise that the MAP is not a good estimator of the L2 minimizer.
Key idea: Take our data and come up with some estimate of the true function ŷ^R(·). Then
û_L2 = arg min_u ||ŷ^R − y^M(·, u)||_L2.
Under a litany of conditions, they show û_L2 behaves exactly as if our model were exactly correct!
This means that we just need to "re-center" our least squares confidence interval, keeping the same width.
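The two-step recipe can be sketched directly: estimate ŷ^R from noisy data, then minimize the L2 distance to the model. The polynomial smoother and all functions below are illustrative stand-ins for the nonparametric estimators Tuo and Wu actually analyze:

```python
import numpy as np

rng = np.random.default_rng(2)

def y_M(x, u):
    return np.exp(-u * x)                        # hypothetical model

# Hypothetical noisy observations of reality.
x_obs = np.linspace(0, 5, 60)
y_obs = np.exp(-0.7 * x_obs) + 0.1 * np.sin(x_obs) + rng.normal(0, 0.02, x_obs.size)

# Step 1: nonparametric estimate of y^R (here, a degree-8 polynomial smoother).
coef = np.polyfit(x_obs, y_obs, deg=8)
def y_R_hat(x):
    return np.polyval(coef, x)

# Step 2: minimize the (discretized) L2 distance between y_R_hat and the model.
x_grid = np.linspace(0, 5, 400)
us = np.linspace(0.1, 2.0, 1000)
fits = [np.mean((y_R_hat(x_grid) - y_M(x_grid, u)) ** 2) for u in us]
u_hat = us[int(np.argmin(fits))]
```

Note that u_hat targets the L2 minimizer, not any "true" physical parameter, which is the point of the criticism.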
33. Where are we again?
Trying to construct intervals with non-linear regression for inexact models.
1960-1995: Naïve approaches, in general, will not work unless your x_i's are exactly generated from µ or your model is exact.
1995-2013: History matching produces huge intervals. The Kennedy and O'Hagan method produces large intervals; the MAP is away from the L2 parameter and often does not cover it.
2013-Today: L2 and related approaches were proposed that basically work in large samples.
35. Orthogonal approach
Key idea:
Theorem (Plumlee, 2017)
Suppose that u* is the minimizer of the L2 loss, located in the interior of U ⊂ R^p. Then b(·) must be such that
∫_X ∇_u y^M(x, u*) (y^R(x) − y^M(x, u*)) dµ(x) = 0,   (*)
where y^R(x) − y^M(x, u*) = b(x).
Simply use KO with
y^R(x) = y^M(x, u) + b_u(x),
with orthogonal b_u.
Theorem (Plumlee and Joseph, 2018)
Under reasonable conditions, we can construct a prior on a Gaussian process such that (*) is met.
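The condition (*) is just the first-order condition of the L2 problem, and it can be verified numerically: at an interior minimizer, the model's sensitivity in u is orthogonal to the discrepancy under µ. A sketch with a hypothetical y^R and a model linear in u, so u* has a closed form:

```python
import numpy as np

def y_R(x):
    return 1.5 * x + np.sin(x)   # hypothetical reality

def y_M(x, u):
    return u * x                 # model linear in u

x = np.linspace(0, 5, 2000)      # grid approximating integration against uniform mu
# Minimizing mean((y_R - u*x)^2) over u gives u* = <y_R, x> / <x, x>.
u_star = np.mean(y_R(x) * x) / np.mean(x * x)

b = y_R(x) - y_M(x, u_star)      # discrepancy at the minimizer
grad = x                         # d y_M / du for this model
inner = np.mean(grad * b)        # discretized left-hand side of (*)
```

Here `inner` is zero up to floating-point error: the discrepancy at u* carries no information about u, which is what the orthogonal prior is built to respect.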
37. Frequency of coverage of the L2 minimizer
Type   Poor Exp   Rich Exp   Poor Obs   Rich Obs
HM     1.00       1.00       1.00       1.00
KO     1.00       1.00       1.00       1.00
L2     0.87       0.92       0.58       0.98
OGP    1.00       1.00       1.00       1.00

Average Interval Width
Type   Poor Exp   Rich Exp   Poor Obs   Rich Obs
HM     3.18       3.54       3.62       3.91
KO     2.64       2.61       2.67       2.88
L2     0.61       0.37       1.15       0.63
OGP    2.55       0.94       3.75       1.23
38. On discrepancy correction
One of the key features of Kennedy and O’Hagan was the concept of
bias correction.
That is, instead of using y^M(x, u*) as a predictor of the best model, why not go straight to y^M(x, u*) + b(x)?
Practitioners are sometimes unwilling to use the predictive discrepancy
function in making inferences because...
It rarely has the behavior they expect.
If their discrepancy is doing the heavy lifting, they want to fix their
model.
It is hard to communicate/transfer the model to colleagues.
So I argue the best model itself is something we want to conjecture on.
43. Cardiac example
[Figure: normalized current (0 to 0.14) vs. time (0 to 15), two panels: KO and OGP.]
Another side benefit: faster MCMC mixing.
KO: ESS about 6.5 / 1000 samples
OGP: ESS about 78.2 / 1000 samples
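For reference, ESS here is the usual autocorrelation-based effective sample size, ESS = n / (1 + 2 Σ ρ_k). A minimal implementation on a synthetic AR(1) chain (the cardiac numbers above come from the talk, not from this code):

```python
import numpy as np

def effective_sample_size(chain, max_lag=None):
    # ESS = n / (1 + 2 * sum of autocorrelations, truncated at the
    # first non-positive lag).
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    c = chain - chain.mean()
    var = np.mean(c * c)
    max_lag = max_lag or n // 3
    tau = 1.0
    for k in range(1, max_lag):
        rho = np.mean(c[:-k] * c[k:]) / var
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(3)
# AR(1) chain with strong correlation mimics slow MCMC mixing.
z = np.empty(1000)
z[0] = 0.0
for t in range(1, 1000):
    z[t] = 0.9 * z[t - 1] + rng.normal()
ess = effective_sample_size(z)   # far below 1000 for a sticky chain
```

A sticky chain like KO's posterior yields a small ESS per 1000 draws; better-identified posteriors such as OGP's mix faster and give a larger one.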
44. Some future comments
When working on calibration, we should focus on getting uncertainty right, not the asymptotic value.
Measured relative to some parameter based on y^R(·) and y^M(·, ·).
We should not claim that we are covering nature's parameter. This fact depends on the state-of-the-art science and nature.
"Right" by the inverse price-is-right rule.
We need to work from definitions of parameters, but L2 is not the only one.
What is that weight function?
What about L1 or other measures of difference?
Smoothness?
So many bells and whistles were designed for KO. What needs to be extended?
Functional responses
Emulator interface
Multi-model cases