1. Bayesian Uncertainty Analysis
for complex physical systems
modelled by computer simulators,
with application to tipping points
Michael Goldstein
Camila Caiado
Department of Mathematical Sciences, Durham University∗
∗ Thanks to Leverhulme for funding through the Durham Tipping Points project, to EPSRC for funding on the Managing Uncertainty for Complex Models consortium, to Ian Vernon for the Galaxy analysis, and to Jonty Rougier for Reification.
2. Tipping Points: Work Packages
WP1: Neoglacial climate
transitions in the
North Atlantic
WP3: Mathematical
Basis of
Tipping Points
WP4: Metaphor
and Agency
WP2: Financial
Crisis in the
Banking sector
WP5: Critical Transitions
3. WP3: The Mathematical Basis of Tipping Points
Objectives
• Assess the predictability of tipping events and the system’s behaviour after
such events
• Develop models for complex systems like paleoclimate reconstruction,
societal dynamics and healthcare monitoring
• Characterise tipping points
• Investigate deterministic and stochastic approaches to the modelling of
tipping points
Current Projects
• Multi-proxy paleoclimate reconstruction
• Live hospital monitoring
• Agent-based modelling of social dynamics
• Compartmental modelling of health and social issues
4. Managing uncertainty in complex models
There is a growing field of study relating to uncertainty quantification for
complex physical systems modelled by computer simulators.
Many people work on different aspects of such uncertainty analyses.
Great resource: the Managing Uncertainty in Complex Models website
http://www.mucm.ac.uk/ (for references, papers, toolkit, etc.)
[MUCM is a consortium of U. of Aston, Durham, LSE, Sheffield, Southampton with Basic Technology funding. Now mutating into MUCM community.]
The aims of this talk are
(i) to introduce and illustrate some of the basic principles in this area
(ii) to discuss how this methodology applies to the study of tipping points.
5. EXAMPLE: Understanding the Universe
Major advances in cosmology in the last 100 years (mainly thanks to Einstein)
• Universe began in hot dense state: The Big Bang
• Since then the Universe has been expanding rapidly
Cosmologists have spent much time and money researching the beginning, the
evolution, the current content, and the ultimate fate of the Universe.
We now know that the observable Universe is composed of billions of galaxies,
each made up of 10 million - 10 trillion stars.
How did these galaxies form?
6. Andromeda Galaxy and Hubble Deep Field View
• Andromeda Galaxy: closest large galaxy to our own Milky Way; contains 1
trillion stars.
• Hubble Deep Field: furthest image yet taken. Covers 2 millionths of the sky
but contains over 3000 galaxies.
7. Dark Matter and the Evolution of the Universe
Recent observations of galaxies have suggested that only 3 percent of the
entire energy content of the universe is the normal matter which forms stars,
planets and us.
A further 23 percent is ’Dark Matter’ (and the rest is Dark Energy).
Dark Matter cannot be ’seen’ as it does not give off light (or anything else).
However it does have mass and therefore affects stars and galaxies via gravity.
In order to study the effects of Dark Matter cosmologists try to model Galaxy
formation
• Inherently linked to amount of Dark Matter
• Of fundamental interest as tests cosmologists’ knowledge of a wide range
of complicated physical phenomena
8. Simulating Galaxy Evolution: Two Stage approach
The simulation is performed in two parts:
[1] First an N-Body simulation is run to determine the behaviour of fluctuations
of mass in the early Universe, and their subsequent growth into millions of
galaxy sized lumps of mass in the following 12 billion years.
[A very heavy simulation which takes 3 months, done on a supercomputer and
cannot be easily repeated.]
[2] These results on the behaviour of the massive lumps are then used by a
more detailed Galaxy Formation simulation (called GALFORM) which models
the far more complicated interactions of normal matter: gas cloud formation,
star formation and the effects of black holes at the centre of galaxies.
The first simulation is done on a volume of size (500 Mega-Parsec)³ or (1.63
billion light-years)³.
This volume is split into 512 sub-volumes which are independently simulated
using the second model, GALFORM. This simulation is run on up to 256 parallel
processors, and takes 20-30 minutes per sub-volume per processor.
9. Universe at < 100 million years
[Figure: simulated galaxy distribution in a 20 h−1 Mpc region of the ΛCDM Universe at z = 5.0, coloured by luminosity MB − 5 log h and colour B−V. From Benson, Frenk, Baugh, Cole & Lacey (2001).]
10. Universe at ∼ 1 billion years
[Figure: as above, the simulated ΛCDM Universe at z = 3.0. From Benson, Frenk, Baugh, Cole & Lacey (2001).]
11. Universe at ∼ 2 billion years
[Figure: as above, the simulated ΛCDM Universe at z = 2.0. From Benson, Frenk, Baugh, Cole & Lacey (2001).]
14. Galform: Inputs and Outputs
Outputs: Galform provides many outputs but we start by looking at the bj and
K luminosity functions
• bj luminosity function: the number of blue (i.e. young) galaxies of a certain
luminosity per unit volume
• K luminosity function: the number of red (i.e. old) galaxies of a certain
luminosity per unit volume
These outputs can be compared to observational data
Inputs: 17 input variables reduced to 8 after expert judgements. These include:
• vhotdisk: relative amount of energy in the form of gas blown out of a galaxy
due to star formation
• alphacool: regulates the effect the central black hole has in keeping large
galaxies ’hot’
• yield: the metal content of large galaxies
and five others: alphahot, stabledisk, epsilonStar, alphareheat and vhotburst
15. Observational Data: Galaxy Surveys
Earth at centre of image. Data taken by telescopes looking in two separate
directions. Galaxies observed up to a distance of 1.2 billion light years.
16. Galaxy Formation: Main Issues
Basic Questions
• Do we understand how galaxies form?
• Could the galaxies we observe have been formed in the presence of large
amounts of dark matter?
Fundamental Sources of Uncertainty
• We only observe the galaxies in our ‘local’ region of the Universe: it is
possible that they are not representative of the whole Universe.
• The output of the simulation is a ‘possible’ Universe which should have
similar properties to ours, but is not an exact copy.
• The output of the simulation is 512 different computer models for “slices” of
the universe which are exchangeable with each other and (hopefully) with
slices of our universe.
• We are uncertain which values of the input parameters should be used
when running the model
17. Computer simulators
A simulator f is a deterministic complex computer model for the physical
system. We denote the simulator as
y = f (x)
where x are uncertain model parameters, corresponding to unknown system
properties.
• We have n evaluations of the simulator at inputs X = (x1 , . . . , xn ).
• We denote the resulting evaluations as F = (f (x1 ), . . . , f (xn )).
18. How to relate models and physical systems?
• Basic ingredients:
x∗ : system properties (unknown)
y : system behaviour (influenced by x∗ )
z : partial observation of y (with error)
• Ideally, we would like to construct a deterministic model f , embodying the
laws of nature, which satisfies
y = f (x∗ )
• In practice, however, our actual model f is inadequate as:
(i) f simplifies the physics;
(ii) f approximates the solution of the physical equations.
• Further practical issues:
(i) x may be high dimensional
(ii) evaluating f (x) for any x may be VERY expensive.
19. The Best input
How does learning about f teach us about y ?
The simplest (and therefore most popular) way to relate uncertainty about the
simulator and the system is the so-called “Best Input Approach”.
We proceed as though there exists a value x∗ independent of the function f
such that the value of f ∗ = f (x∗ ) summarises all of the information that the
simulator conveys about the system.
Define the model discrepancy as ǫ = y − f ∗
Our assumption is that ǫ is independent of f, x∗ .
(Here, and onwards, all probabilistic statements relate to the uncertainty
judgements of the analyst.)
This formulation raises the question of why x∗ should correspond to “true”
system properties. We’ll come back to this later.
20. Representing beliefs about f using emulators
An emulator is a probabilistic belief specification for a deterministic function.
Our emulator for component i of f might be
fi(x) = Σj βij gij(x) + ui(x)
where B = {βij} are unknown scalars, gij are known deterministic functions
of x, and ui(x) is a weakly stationary stochastic process.
[A simple case is to suppose, for each x, that ui(x) is normal with constant
variance and that Corr(ui(x), ui(x′)) is a function of x − x′.]
Bg(x) expresses global variation in f . u(x) expresses local variation in f
The emulator expresses prior uncertainty judgements about the function.
These are modified by observation of F .
21. Emulator comments
We fit the emulator, fi(x) = Σj βij gij(x) + ui(x), given a collection of
model evaluations, using our favourite statistical tools - generalised least
squares, maximum likelihood, Bayes - with a generous helping of expert
judgement.
So, we need careful experimental design to choose which evaluations of the
model to make, and detailed diagnostics, to check emulator validity.
We have some useful backup tricks - for example, if we can only make a few
evaluations of our model, we may be able to make many evaluations of a
simpler approximate version of the model to get us started.
From the emulator, we may extract the mean, variance and covariance for the
function, at each input value x.
µi (x) = E(fi (x))
κi (x, x′ ) = Cov(fi (x), fi (x′ ))
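To make these ingredients concrete, here is a minimal numpy sketch of such an emulator: a generalised-least-squares fit of the global trend Bg(x), with a squared-exponential covariance for the local residual u(x), adjusted by the runs F. Everything here (function names, the toy one-dimensional 'simulator', length-scales) is illustrative, not taken from any application in these slides.

```python
import numpy as np

def sq_exp_cov(X1, X2, sigma2=1.0, length=0.5):
    """Squared-exponential covariance for the residual process u(x)."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    return sigma2 * np.exp(-d2 / length ** 2)

def fit_emulator(X, F, g, sigma2=1.0, length=0.5, nugget=1e-6):
    """Return emulator mean mu(x) and covariance kappa(x, x')
    after adjusting by the runs F = (f(x1), ..., f(xn))."""
    G = np.array([g(x) for x in X])                       # regression design matrix
    K = sq_exp_cov(X, X, sigma2, length) + nugget * np.eye(len(X))
    Ki = np.linalg.inv(K)
    beta = np.linalg.solve(G.T @ Ki @ G, G.T @ Ki @ F)    # GLS estimate of B
    resid = F - G @ beta
    def mu(x):
        x = np.atleast_2d(x)
        k = sq_exp_cov(x, X, sigma2, length)
        return np.array([g(xi) for xi in x]) @ beta + k @ Ki @ resid
    def kappa(x, xp):
        x, xp = np.atleast_2d(x), np.atleast_2d(xp)
        return (sq_exp_cov(x, xp, sigma2, length)
                - sq_exp_cov(x, X, sigma2, length) @ Ki @ sq_exp_cov(X, xp, sigma2, length))
    return mu, kappa

# Toy check on a cheap 1-D 'simulator': the emulator reproduces the runs
X = np.linspace(0.0, 1.0, 6)[:, None]
F = np.sin(3.0 * X[:, 0])
mu, kappa = fit_emulator(X, F, g=lambda x: np.array([1.0, x[0]]))
```

Note that kappa(x, x) shrinks to (almost) zero at the design points, reflecting that the deterministic simulator is known exactly where it has been run.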
22. Uncertainty analysis for complex models
Aim: to tackle problems arising from the uncertainties inherent in imperfect
computer models of highly complex physical systems, using a Bayesian
formulation. This involves
• prior probability distribution for best inputs x∗
• a probabilistic emulator for the computer function f
• a probabilistic discrepancy measure relating f(x∗) to the system y
• a likelihood function relating historical data z to y
This full probabilistic description provides a formal framework to synthesise
expert elicitation, historical data and a careful choice of simulator runs.
We may then use our collection of computer evaluations and historical
observations to analyse the physical process to
• determine values for simulator inputs (calibration; history matching);
• assess the future behaviour of the system (forecasting);
• “optimise” the performance of the system.
23. Bayes linear analysis
Within the Bayesian approach, we have two choices.
(i) Full Bayes analysis: complete joint probabilistic specification for all quantities
(ii) Bayes linear analysis, based only on expectation as a primitive, involving
prior specification of means, variances and covariances
Probability is the most common choice, but there are advantages in working
with expectations - the uncertainty specification is simpler, the analysis
(particularly for experimental design) is much faster and more straightforward
and there are careful foundations.
The statistical approach based around expectation is termed Bayes linear
analysis, based around these updating equations for mean and variance:
Ez [y] = E(y) + Cov(y, z)Var(z)−1 (z − E(z)),
Varz [y] = Var(y) − Cov(y, z)Var(z)−1 Cov(z, y)
Some of the examples that we will describe use Bayes linear methods.
(There are natural (but much more complicated) probabilistic counterparts.)
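The updating equations above translate directly into a few lines of linear algebra. A minimal sketch with toy numbers (not from any of the applications below): y is a scalar with prior variance 4, observed through z = y + e with measurement variance 1.

```python
import numpy as np

def bayes_linear_adjust(Ey, Ez, Vy, Vz, Cyz, z):
    """Bayes linear adjusted expectation and variance of y given observed z."""
    W = Cyz @ np.linalg.inv(Vz)      # Cov(y, z) Var(z)^-1
    Ez_y = Ey + W @ (z - Ez)         # E_z[y]
    Vz_y = Vy - W @ Cyz.T            # Var_z[y]
    return Ez_y, Vz_y

# Toy example: E(y) = 0, Var(y) = 4; z = y + e with Var(e) = 1,
# so Var(z) = 5 and Cov(y, z) = 4.  Observe z = 5.
Ey, Ez = np.array([0.0]), np.array([0.0])
Vy, Vz, Cyz = np.array([[4.0]]), np.array([[5.0]]), np.array([[4.0]])
Ez_y, Vz_y = bayes_linear_adjust(Ey, Ez, Vy, Vz, Cyz, np.array([5.0]))
# Ez_y = [4.0], Vz_y = [[0.8]]
```

The same function handles vector y and z unchanged, since only matrix products are involved.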
24. History matching
Model calibration aims to identify the best choices of input parameters x∗ ,
based on matching data z to the corresponding simulator outputs fh (x).
However
(i) we may not believe in a unique true input value for the model;
(ii) we may be unsure whether there are any good choices of input parameters
(iii) full probabilistic calibration analysis may be very difficult/non-robust.
A conceptually simple procedure is “history matching”, i.e. finding the
collection, C(z), of all input choices x for which you judge the match of the
model outputs fh (x) to observed data, z , to be acceptably small, taking into
account all of the uncertainties in the problem.
If C(z) is non-empty, then an analysis of its elements reveals the constraints
on the parameter space imposed by the data.
Further the model projections f (x) : x ∈ C(z) over future outcomes, reveal
the futures consistent with the model physics and the historical data.
If the data is informative for the parameter space, then C(z) will typically form
a tiny percentage of the original parameter space, so that even if we do wish to
calibrate the model, history matching is a useful prior step.
25. History matching by implausibility
We use an ‘implausibility measure’ I(x) based on a probabilistic metric (e.g.
the number of sd between z and fh(x)), where z = yh ⊕ e, yh = fh(x∗) ⊕ ǫ, for
observational error e and model discrepancy ǫ.
For example, if we are matching a single output, then we might choose
I(x) = (z − E(fh(x)))² / Var(z − E(fh(x)))
where Var(z − E(fh(x))) is the sum of measurement variance, Var(e),
structural discrepancy variance, Var(ǫ), and emulator variance, Var(fh(x)).
The implausibility calculation can be performed univariately, or by multivariate
calculation over sub-vectors. The implausibilities are then combined, such as
by using IM (x) = maxi I(i) (x), and can then be used to identify regions of x
with large IM (x) as implausible, i.e. unlikely to be good choices for x∗ .
With this information, we can then refocus our analysis on the ‘non-implausible’
regions of the input space, by (i) making more simulator runs & (ii) refitting our
emulator over such sub-regions and repeating the analysis. This process is a
form of iterative global search.
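The implausibility cut is easy to vectorise over a large set of candidate inputs. A minimal sketch with toy numbers (the variances below stand in for Var(e) + Var(ǫ) + emulator variance; none of the values come from a real application):

```python
import numpy as np

def max_implausibility(z, mu, total_var):
    """I_M(x) = max_i I_(i)(x), with I_(i)(x) = (z_i - E(f_i(x)))^2 / total variance."""
    I = (z[None, :] - mu) ** 2 / total_var[None, :]
    return I.max(axis=1)

z = np.array([1.0, 2.0])                    # observed outputs
mu = np.array([[1.1, 2.2],                  # emulator means at candidate input 1
               [4.0, 2.0]])                 # ... and at candidate input 2
total_var = np.array([0.01, 0.04])          # Var(e) + Var(eps) + Var(f_h(x))
IM = max_implausibility(z, mu, total_var)   # -> [1.0, 900.0]
keep = IM < 3.0 ** 2                        # a 3-sd cutoff, applied to I = (sd)^2
```

Candidate 1 is within one sd on both outputs and survives; candidate 2 is 30 sd off on the first output and is discarded.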
26. Back to Galaxy Formation
We want to history match the Galaxy Formation model Galform using the
emulation and implausibility techniques that we have outlined.
We want to reduce the volume of input parameter space as much as we can by
discarding all points that we are (reasonably) sure will not give an ’acceptable’
fit to the output data
We do this in stages, as follows:
• design a set of runs of the simulator within the input volume of interest
• choose a subset of the outputs for which we have system observations
• emulate these outputs
• calculate implausibility over the selected input volume
• discard all input points x that have implausibility greater than a certain cutoff
This process is then repeated. This is refocusing. As we are now in a reduced
input volume, outputs may be of simpler form and therefore easier to emulate.
As we have reduced the variation in the outputs arising from the most important
inputs, this also allows us to assess variation due to secondary inputs.
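The staged procedure above can be sketched end-to-end on a toy one-dimensional problem. Everything here is illustrative: the 'simulator' is a cheap function, and the 'emulator' is a crude nearest-run predictor whose variance grows with distance from the design points, but the refocusing logic is as described.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(x):                      # cheap stand-in for an expensive model
    return np.sin(3.0 * x)

z = simulator(0.6)                     # 'observed' data, generated at true input 0.6
var_e, var_eps = 0.01 ** 2, 0.02 ** 2  # measurement and discrepancy variances

space = np.linspace(0.0, 1.0, 2001)    # candidate input points
for wave in range(3):
    design = rng.choice(space, size=8, replace=False)   # runs for this wave
    runs = simulator(design)
    # crude 'emulator': predict by the nearest design point,
    # with standard deviation growing with distance to it
    dist = np.abs(space[:, None] - design[None, :])
    nearest = dist.argmin(axis=1)
    mu, var_emu = runs[nearest], (2.0 * dist.min(axis=1)) ** 2
    I = np.abs(z - mu) / np.sqrt(var_emu + var_e + var_eps)
    space = space[I < 3.0]             # refocus on the non-implausible region
```

Each wave removes candidate inputs that a nearby run has shown to be a poor match, while inputs far from every run (large emulator variance) are retained for later waves; the true input 0.6 is never ruled out.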
27. Galform analysis
Following the cosmologists’ own attempt to history match Galform, we chose to
run only the first 40 sub-volumes (out of 512) and examine their mean. The
simulator function fi(x) is now taken to be the mean of the luminosity outputs
over the first 40 sub-volumes.
Design: 1000 point Latin Hypercube design in key inputs
Outputs: Decided to choose 11 outputs from the luminosity functions as they
could be emulated accurately
Active Variables: For each output we choose 5 active variables xA , i.e. those
inputs which are the most important for explaining variation in the output.
We then emulate each of the 11 outputs univariately using:
fi(x) = Σj βij gij(xA) + ui(xA) + δi(x)
where now B = {βij} are unknown scalars, gij are monomials in xA of
order 3 or less, and ui(xA) is a Gaussian process. The nugget δi(x) models
the effects of the inactive variables as random noise.
28. 11 Output points Chosen
Outputs chosen to be informative enough to allow us to cut down the parameter
space, but simple enough to be emulated easily.
29. Model Discrepancy
Before calculating the implausibility we need to assess the Model Discrepancy
and Measurement error.
Model Discrepancy has three components:
• ΦE : Expert assessment of model discrepancy of full model with 17
parameters and using 512 sub-volumes
• Φ40 : Discrepancy term due to (i) choosing first 40 sub-volumes from full
512 sub-volumes, and (ii) need to extrapolate to our universe. Assess this
by repeating 100 runs but now choosing 40 random regions.
[More carefully, we may construct an exchangeable system of emulators to
fully account for this discrepancy.]
• Φ12 : As we have neglected 9 parameters (on expert advice) we need to
assess the effect of this (by running a Latin Hypercube design across all 17
parameters)
30. Measurement Error
Observational Errors composed of 4 parts:
• Normalisation Error: correlated vertical error on all luminosity output points
• Luminosity Zero Point Error: correlated horizontal error on all luminosity
points
• k + e Correction Error: Outputs have to be corrected for the fact that
galaxies are moving away from us at different speeds (light is red-shifted),
and for the fact that galaxies are seen in the past (as light takes millions of
years to reach us)
• Poisson Error: assumed Poisson process to describe galaxy production (not
very accurate assumption!)
31. Implausibility
We can now calculate the implausibility at any input parameter point x for
each output, given by:
I(i)(x) = |E(fi(x)) − zi|² / (Var(fi(x)) + MD + OE)
where E(fi(x)) and Var(fi(x)) are the emulator expectation and variance, zi
are the observed data, and MD and OE are the Model Discrepancy and
Observational Errors.
We can then combine the implausibilities across outputs by maximizing over
outputs, as IM (x) = maxi I(i) (x).
Alternatively, we can use a multivariate implausibility measure:
I2(x) = (E(f(x)) − z)ᵀ (Var(f(x)) + MD + OE)⁻¹ (E(f(x)) − z)
where Var(f(x)), MD and OE are now the multivariate emulator variance,
multivariate model discrepancy and multivariate observational errors
respectively.
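In code, the multivariate version is a single quadratic form. A toy sketch (hypothetical numbers) showing why the joint calculation matters: with independent outputs, a one-sd miss on each of two outputs scores 2, while strong positive correlation between the combined errors changes the score even though the marginal misses are identical.

```python
import numpy as np

def multivariate_implausibility(z, mu, V):
    """I2(x) = (E(f(x)) - z)^T V^-1 (E(f(x)) - z),
    where V = emulator variance + MD + OE (all q x q matrices)."""
    d = mu - z
    return float(d @ np.linalg.solve(V, d))

z = np.array([0.0, 0.0])
mu = np.array([1.0, 1.0])               # one sd off on each output
I_indep = multivariate_implausibility(z, mu, np.eye(2))        # -> 2.0
I_corr = multivariate_implausibility(z, mu,
                                     np.array([[1.0, 0.9],
                                               [0.9, 1.0]]))   # -> 2/1.9
```

With correlation 0.9 the jointly-shifted miss is much less surprising, so I2 drops from 2.0 to about 1.05.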
32. Summary of Results
We have completed Five Stages:
In later stages, we use a Multivariate Implausibility measure.
          No. Model Runs   No. Active Vars   Adjusted R2   Space Remaining
Stage 1        1000               5           0.58 - 0.90        8.0 %
Stage 2        1916               8           0.83 - 0.98        2.9 %
Stage 3        1487               8           0.79 - 0.99        1.2 %
Stage 4        1899              10           0.75 - 0.99        0.12 %
In wave 5, we evaluate many good fits to data, and we stop.
Some of these choices give simultaneous matches to data sets that the
Cosmologists have been unable to match before.
34. Linking models to reality: exchangeability
We must compensate for our lack of knowledge about the state of dark matter
over all of time and space (which the simulator requires). This is typical of initial
condition/forcing function uncertainty and is a large factor in the mismatch
between model and reality.
What Galform provides is 512 exchangeable computer models, f r(x) (one for
each description of dark matter). The exchangeability representation for
random functions allows us to express each function as
f r(x) = M(x) ⊕ Rr(x)
where M(x) is the mean function and the Rr(x) are the (uncorrelated,
exchangeable, mean-zero) residual functions.
If we consider dark matter in our universe to be exchangeable with the 512
individual simulations, then our emulator for Galform evaluated on the correct
dark matter configuration is
f∗(x) = (MB + RB∗)g(x) + Me(x) + Re∗(x)
We cannot evaluate this simulator (because we don’t know the appropriate dark
matter configuration) but we can emulate it, based on a detailed analysis of the
Galform experiments.
35. Reification
Consider both our inputs x and the simulator f as abstractions/simplifications
of real physical quantities and processes (through approximations in the
physics, solution methods, level of detail, and limitations of current
understanding) of a much more realistic simulator f∗, for which the real,
physical x∗ would be the best input, in the sense that (y − f∗(x∗)) would be
judged independent of (x∗, f∗).
We call f∗ the reified simulator (from reify: to treat an abstract concept as if it
were real). Our model f is informative for y because f is informative for a more
elaborate model f∗.
Suppose that our emulator for f is f(x) = Bg(x) + u(x).
Our simplest emulator for f∗ might be
f∗(x, w) = B∗g(x) + u∗(x) + u∗(x, w)
where we might model our judgements as B∗ = CB + Γ for known C and
uncertain Γ, correlate u(x) and u∗(x), but leave u∗(x, w), involving any
additional parameters w, uncorrelated.
Structured reification: systematic probabilistic modelling for all those aspects of
model deficiency whose effects we are prepared to consider explicitly.
39. Structural reification
Usually, we can think more carefully about the reasons for our model’s
inadequacy.
• Often we can imagine specific generalisations of f with extra model
components and parameters, giving a better model f′(x, v).
• Suppose f′(x, v0) = f(x). We might emulate f′ ‘on top’ of f, i.e. as
f′i(x, v) = fi(x) + Σk γik gik(x, v) + ui(x, v)
where gik(x, v0) = ui(x, v0) = 0.
• The reified emulator for f∗i(x, v, w) would then be
f∗i(x, v, w) = Σj β∗ij gij(x) + Σk γ∗ik gik(x, v) + u∗i(x) + u∗i(x, v) + u∗i(x, v, w)
where we now model the relationship between the coefficients in the three
emulators.
40. The Zickfeld et al (2004) model of the Atlantic
[Diagram: the Zickfeld four-box model of the Atlantic - surface compartments 1 (South), 2 (North) and 3 (Tropics), deep compartment 4, temperature forcings T1∗, T2∗, T3∗, freshwater forcings F1 and F2, and meridional overturning m.]
• Model parameters: T1∗, T2∗, T3∗, Γ, k (last two not shown above).
• Model outputs: steady state (SS) temperatures for compartments 1, 2 and 3,
SS salinity differences between compartments 1-2 and 2-3, SS overturning m,
and critical freshwater forcing F1crit.
41. One possible generalisation
An extra compartment at the ‘southern’ end denoting ‘Other oceans’.
[Diagram: as before, but with the southern box split into 1A and 1B (‘Other oceans’, with T5 = T5∗ and S5 = S0), and the overturning m split into fractions qm and (1 − q)m.]
• Two extra model parameters, T5∗ and q, with q = 0 returning us to the
original model.
• Same model outputs as before.
42. The system and the data
• Our system y is the Atlantic but, due to the very aggregate nature of our
model, it is more natural to use data from a much larger model (CLIMBER-2)
that has been carefully tuned to the Atlantic in a separate experiment. The
data from CLIMBER-2 comprise SS temperatures, SS salinity differences
and SS overturning. We write
y = f∗(x∗, v∗) + ǫ∗ and z = Hy.
• For the discrepancy,
Var(ǫ∗i) = 0.84²  for i = 1, 2, 3 [SS Temps]
Var(ǫ∗i) = 0.075² for i = 4, 5 [SS Salinity Differences]
Var(ǫ∗i) = 3.30²  for i = 6 [SS Overturning]
Var(ǫ∗i) = 0.044² for i = 7 [F1crit].
43. Some aspects of our emulation
• Our emulator for F1crit(x) is (×10⁻³)
87 − T2∗ + 68(T1∗ − T2∗) − 5(T3∗ − T1∗) + 14Γ + 21k + u7(x)
with Sd(u7(x)) = 21 [based on 30 evaluations of the simulator].
• Assess the difference between f′ and f, and between f∗ and f′, as
∆′ = E(Var(f′(x∗, v∗) − f(x∗)) | x∗, v∗)
∆∗ = E(Var(f∗(x∗, v∗, w∗) − f′(x∗, v∗)) | x∗, v∗, w∗)
For F1crit our modelling leads to the following assignments:
SD(f(x∗)) = 0.079, √∆′ = 0.033, √∆∗ = 0.066, SD(ǫ∗) = 0.044.
44. Our resulting predictions for F1crit
1. From 30 evaluations of f, plus the reified statistical modelling, our prior
prediction is
F1crit = 0.085 ± 0.224 (mean ± 2 std. dev.)
2. Using the CLIMBER-2 data z, the Bayes linear adjusted prediction is
F1crit = 0.119 ± 0.099
3. [Our conclusions are ‘reasonably insensitive’ to moderate tweaks to our
prior values, and our modelling survives ‘reasonable diagnostic testing’.]
Model Design: The effect of making many more evaluations of the current
simulator f would be to reduce uncertainty about F1crit by around 2%.
In comparison, the effect of constructing and evaluating the 5-compartment
model f′ would be to reduce uncertainty in F1crit by around 10%.
45. History matching for tipping points
The methods that we have described involve emulation of the computer
simulator, treated as a smooth function of the inputs.
If simulator output is a time series with more than one form of limiting behaviour
(e.g. collapse or non-collapse of the MOC), then often the input space divides
into regions Ri such that simulator output is smooth within each region, but
discontinuous across boundaries.
To identify qualitative behaviour consistent with the simulator and historical
observation, we may aim to eliminate all regions but one.
We may be able to do this by
(i) history matching based on early parts of the time series, if early time
simulator behaviour is smooth across regions, or
(ii) by introducing observations on additional outputs, which allow improved
history matching.
Here, we outline a general method when neither of these possibilities applies.
46. Illustration: History matching for Zickfeld’s model
For illustration, we apply the method to Zickfeld’s four-box model.
We run the model forward from the present day to equilibrium.
We want to identify whether the simulator suggests collapse, or not, given
current history.
There are many analyses that we could perform.
Although the model is fast to evaluate, we aim to produce methods that are
applicable for expensive models.
(In practice, we would use many evaluations of fast models to create informed
priors for the small number of evaluations of the slow models.)
Therefore, in this illustration,
(i) we use simulator ensembles of 25 runs per iteration and
(ii) we suppose that we collect and re-analyse data once every ten years.
47. Reminder: the Zickfeld model of the Atlantic
[Diagram: the four-box model as before - surface boxes 1 (South), 2 (North) and 3 (Tropics), deep box 4, temperature forcings T1∗, T2∗, T3∗, freshwater forcings F1 and F2, and meridional overturning m.]
• Model parameters: T1∗, T2∗, T3∗, Γ, k.
• Model outputs: temperatures for compartments 1, 2 and 3,
salinity differences between compartments 1-2 and 2-3, and overturning m.
48. Four-box model - Zickfeld et al (2004)
Ṫ1 = (m/V1)(T4 − T1) + λ1(T1∗ − T1)
Ṫ2 = (m/V2)(T3 − T2) + λ2(T2∗ − T2)
Ṫ3 = (m/V3)(T1 − T3) + λ3(T3∗ − T3)
Ṫ4 = (m/V4)(T2 − T4)
Ṡ1 = (m/V1)(S4 − S1) + (S0/V1)F1
Ṡ2 = (m/V2)(S3 − S2) + (S0/V2)F2
Ṡ3 = (m/V3)(S1 − S3) + (S0/V3)(F1 − F2)
Ṡ4 = (m/V4)(S2 − S4)
m = k [β(S2 − S1) − α(T2 − T1)]
λi = Γ/(c ρ0 zi), i = 1, . . . , 4
Variables of interest:
• ‘overturning’ m
• freshwater flux into the tropical box, F1
• thermal coupling constant Γ
We treat m as a function of F1 and Γ:
x = (F1, Γ)
m(t) = f(x, t), t ≥ 0
meq = m(t → ∞)
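The box-model equations can be integrated directly. A sketch using scipy, with made-up parameter values (the slides do not give Zickfeld's calibrated constants, so the volumes, forcings and coefficients below are purely illustrative) and signs exactly as displayed above:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative values only - NOT Zickfeld's calibrated parameters
V = np.array([1.0, 1.0, 1.0, 2.0])      # box volumes
Tstar = np.array([5.0, 0.0, 25.0])      # restoring temperatures T1*, T2*, T3*
lam = np.array([0.5, 0.5, 0.5, 0.0])    # lambda_i = Gamma/(c rho0 z_i); none on box 4
S0, k, alpha, beta = 35.0, 1.0, 0.2, 0.8
F1, F2 = 0.001, 0.001                   # freshwater forcings

def rhs(t, y):
    T, S = y[:4], y[4:]
    m = k * (beta * (S[1] - S[0]) - alpha * (T[1] - T[0]))
    dT = np.array([m / V[0] * (T[3] - T[0]) + lam[0] * (Tstar[0] - T[0]),
                   m / V[1] * (T[2] - T[1]) + lam[1] * (Tstar[1] - T[1]),
                   m / V[2] * (T[0] - T[2]) + lam[2] * (Tstar[2] - T[2]),
                   m / V[3] * (T[1] - T[3])])
    dS = np.array([m / V[0] * (S[3] - S[0]) + S0 * F1 / V[0],
                   m / V[1] * (S[2] - S[1]) + S0 * F2 / V[1],
                   m / V[2] * (S[0] - S[2]) + S0 * (F1 - F2) / V[2],
                   m / V[3] * (S[1] - S[3])])
    return np.concatenate([dT, dS])

y0 = np.array([5.0, 0.0, 25.0, 10.0, 35.0, 35.0, 35.0, 35.0])  # T1..T4, S1..S4
sol = solve_ivp(rhs, (0.0, 50.0), y0, rtol=1e-8, atol=1e-10)
T, S = sol.y[:4, -1], sol.y[4:, -1]
m_end = k * (beta * (S[1] - S[0]) - alpha * (T[1] - T[0]))  # overturning at t = 50
```

With the signs as displayed, the advective terms cancel in the total salt budget, leaving d/dt Σ Vi Si = 2 S0 F1 - a handy exact identity for checking the integration.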
50. Classification and emulation
Suppose the simulator output is a time series f (x, t)
Suppose there are two equilibrium regions for f , R1 , R2 , whose qualitative
behaviour is different (eg collapse/non-collapse).
We may proceed as follows.
(i) Choose a training sample X = (x1 , ..., xm ).
Run the simulator to equilibrium for each member of X .
Separate X into X1 and X2 , depending on the region for the equilibrium value.
(ii) Construct a probabilistic classifier, P(x ∈ R1 ), based on the training
sample.
Divide the input space into 3 regions, RX1 , RX2 , RX0 , depending on
whether the probabilistic classifier assigns high, low or medium probability that
x ∈ R1 , for each input choice, x.
(iii) Build emulators f1(x, t), f2(x, t) for
x ∈ RXi∗ = (RXi ∪ RX0), i = 1, 2, respectively.
[We may want/need to expand our training sample for this.]
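Step (ii) can be any probabilistic classifier. A minimal numpy sketch, using kernel-weighted voting over the training sample; the bandwidth, thresholds and toy training data are all illustrative, not the classifier actually used in this work.

```python
import numpy as np

def classifier_prob(x, X, labels, h=0.3):
    """Smoothed P(x in R1): kernel-weighted vote over the training sample."""
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))
    return float(w @ (labels == 1)) / float(w.sum())

def assign_region(p, lo=0.2, hi=0.8):
    """RX1 if confidently region 1, RX2 if confidently region 2, else RX0 ('grey')."""
    return 1 if p >= hi else (2 if p <= lo else 0)

# Toy training sample in a 2-D input space (think (Gamma, F1)):
# region 1 where the first coordinate is positive
X = np.array([[-1.0, 0.0], [-0.5, 0.0], [-0.2, 0.0],
              [0.2, 0.0], [0.5, 0.0], [1.0, 0.0]])
labels = np.array([2, 2, 2, 1, 1, 1])
regions = [assign_region(classifier_prob(np.array(x), X, labels))
           for x in ([1.0, 0.0], [0.0, 0.0], [-1.0, 0.0])]   # -> [1, 0, 2]
```

Points deep inside each region classify confidently; the point on the boundary gets probability 0.5 and falls into the grey region RX0, where both emulators will later be used.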
52. Classifying the equilibrium
[Figure: left panel shows the training sample in the (Γ, F1) plane; right panel shows the separation into Region 1, Region 2 and a grey area.]
LEFT: Equilibrium values for the 25 runs in the training sample.
Collapsed values in red.
53. Classifying the equilibrium
[Figure: as above - training sample (left) and region separation (right) in the (Γ, F1) plane.]
RIGHT: Probabilistic classifier gives the posterior probability for each region,
given the training sample. Light green area is where the posterior probability is
not near one for either region.
54. Emulated time series
[Figure: emulated overturning (Sv) time series, 0-600 years.]
In region 1, we emulate as f1(x, t) ∼ GP(a1 + a2t + a3t² + a4t³, Σ1) and
in region 2 as f2(x, t) ∼ GP(b1 + b2t + b3t², Σ2), where the ai and bj are
functions of x and Σi = si exp(−(x − x′)ᵀ Pi (x − x′)).
55. Emulated time series
[Figure: emulated overturning (Sv) time series, 0-600 years, under both emulators.]
For illustration, we choose as the value for x∗ the present-day values used in
the CLIMBER-2 model, and display the emulated future overturning time series
(mean and 2 sd bands) under emulators 1 (collapse) and 2 (no collapse).
57. Emulated time series
[Figure: emulated overturning (Sv) time series, 0-600 years, with synthetic observations.]
Continuing the illustration, we now add a ‘plausible’ synthetic observed data
series, by first adding correlated model discrepancy and then observational error.
58. Iterative history matching
(iv) History match observed historical data, using emulator fi over
RXi∗ , i = 1, 2.
[(i) We might choose to assess different structural discrepancies for the two
cases.
(ii) We might make some additional runs of f (x, t) just over historical time - if
this is much faster than running to equilibrium.]
Remove implausible values of x from RXi , based on history match using fi ,
and from RX0 if implausible given both matches.
(v) Recompute probabilistic classifier over reduced input space, re-assess
regions RX1 , RX2 , RX0 , and re-emulate f1 , f2 .
[We might want/need to resample f (x, t).]
(vi) Repeat the sequence:
sample function, classify, emulate, history match.
If we are only interested in qualitative behaviour, we stop when RX0 and one
of RX1 , RX2 is empty.
(vii) We may not be able to achieve this with current information, so we may
need to repeat this process every time period as new data comes in.
59. Emulated time series
[Figure: emulated overturning (Sv) time series, 0-600 years.]
Now observe the first 10 years of this data and history match using each
emulator. Resample 25 new values for x, re-evaluate the probabilistic classifier,
and re-assess each emulator.
60. Implausibility
[Figure: implausibility over the (Γ, F1) plane for each of the two emulators (left), and the resulting implausible/not-implausible classification at cutoff 3 (right).]
Left-hand side: implausibility plot using each of the two emulators.
[White line separates regions. White cross is x∗.]
61. Implausibility
[Figure: as above - implausibility and cutoff-3 classification for each of the two emulators.]
Right-hand side: implausible points in red, not implausible in blue.
[White line separates regions. White cross is x∗.]
62. Revised emulators
[Figure: the two revised emulators - overturning (Sv) against time (years), 0-600.]
The two revised emulators for f (x∗ ).
Note that the actual series for f (x∗ ) is now well within the appropriate
emulator.
Before 30 years, we can identify that the model and data are inconsistent with
collapse.
64. Concluding comments
We have discussed some of the many aspects of uncertainty analysis for
complex physical systems modelled by computer simulators. In particular:
• Multi-output emulation
• Structural analysis relating simulators to the world
• Iterative history matching
These methods are widely applicable. In particular, they open a variety of ways
of investigating tipping points systematically.
65. References
I. Vernon, M. Goldstein, R. Bower (2010). Galaxy Formation: a Bayesian
Uncertainty Analysis (with discussion), Bayesian Analysis, 5(4), 619-670.
M. Goldstein, J. C. Rougier (2009). Reified Bayesian modelling and
inference for physical systems (with discussion), Journal of Statistical
Planning and Inference, 139, 1221-1239.
J. Cumming, M. Goldstein (2009). Small Sample Bayesian Designs for
Complex High-Dimensional Models Based on Information Gained Using Fast
Approximations, Technometrics, 51, 377-388.
M. Goldstein (2006). Subjective Bayesian analysis: principles and practice,
Bayesian Analysis, 1, 403-420 (and ‘Rejoinder to discussion’, 465-472).
M. Goldstein, D. A. Wooff (2007). Bayes Linear Statistics: Theory and
Methods, Wiley.
M. C. Kennedy, A. O’Hagan (2001). Bayesian calibration of computer models
(with discussion), Journal of the Royal Statistical Society B, 63, 425-464.
J. J. Bissell, C. C. S. Caiado, M. Goldstein, B. Straughan (2013).
Compartmental modelling of social dynamics with generalised peer incidence,
Mathematical Models and Methods in Applied Sciences, to appear.