A crucial ingredient of a successful weather prediction system is its ability to combine observational data with the output of numerical weather prediction models to estimate the state of the atmosphere and the oceans. This problem of estimating the state of a high-dimensional chaotic system such as the atmosphere, given noisy and partial observations of it, is known as data assimilation in the context of the earth sciences. The main object of interest in these problems is the conditional distribution, called the posterior, of the state conditioned on the observations. Monte Carlo methods are the most commonly used techniques to study this posterior and to use it efficiently for prediction. I will give a general introduction to data assimilation problems and to Monte Carlo techniques, followed by a discussion of some commonly used Monte Carlo algorithms for data assimilation.
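For illustration only, here is a minimal sketch of the simplest such Monte Carlo algorithm, a bootstrap particle filter, on an invented one-dimensional toy model; the dynamics, noise levels, and particle count are assumptions for the example, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_step(x):
    # Toy nonlinear dynamics standing in for a weather model (illustrative).
    return x + 0.1 * np.sin(3.0 * x)

n_particles, n_steps = 500, 50
particles = rng.normal(0.0, 1.0, n_particles)   # samples from the prior
true_state = 0.5

for _ in range(n_steps):
    true_state = model_step(true_state)
    obs = true_state + rng.normal(0.0, 0.2)     # noisy partial observation

    # Forecast: push each sample through the model, adding model noise.
    particles = model_step(particles) + rng.normal(0.0, 0.05, n_particles)

    # Analysis: weight each sample by the observation likelihood (Gaussian).
    weights = np.exp(-0.5 * ((obs - particles) / 0.2) ** 2)
    weights /= weights.sum()

    # Resample so the ensemble represents the posterior.
    particles = rng.choice(particles, size=n_particles, p=weights)

print("posterior mean:", particles.mean(), "truth:", true_state)
```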
The document discusses three topics in data assimilation: sea ice modeling, the role of unstable subspaces, and the role of model error. It describes the challenges of assimilating data into sea ice models whose state space dimension changes due to adaptive meshes, and discusses using a fixed-dimensional state space defined by a supermesh to apply the Ensemble Kalman Filter to such models. It also summarizes the Kalman filter and explores the convergence and asymptotic properties of the Kalman filter estimates.
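For reference, the standard Kalman filter forecast and analysis steps, in the usual notation (state mean x, covariance P, linear model M, observation operator H, model and observation noise covariances Q and R), are:

```latex
\begin{aligned}
\text{forecast:} \quad & x^f_k = M\, x^a_{k-1}, \qquad P^f_k = M P^a_{k-1} M^\top + Q,\\
\text{gain:}     \quad & K_k = P^f_k H^\top \left(H P^f_k H^\top + R\right)^{-1},\\
\text{analysis:} \quad & x^a_k = x^f_k + K_k \left(y_k - H x^f_k\right), \qquad
                         P^a_k = (I - K_k H)\, P^f_k.
\end{aligned}
```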
Multiobjective Design of Micro- and Macrostructures.
"To craft and analyze algorithms that search for optimal structures is the subject of the research in the multiobjective optimization and decision analysis group, and in the talk, we will discuss approaches, their theoretical limits, as well as applications to challenging design problems across multiple scales."
Analyzing high-frequency time series is increasingly useful given the current explosion in the availability of such data in several application areas, including, but not limited to, climate, finance, health analytics, and transportation. This talk will give an overview of two statistical frameworks that could be useful for analyzing high-frequency financial time series, leading to quantification of financial risk: a distribution-free approach using penalized estimating functions for modeling inter-event durations, and an approximate Bayesian approach for modeling counts of events in regular intervals. A few other potentially useful lines of research in this area will also be introduced.
CARI presentation, by Mokhtar SELLAMI
A publish/subscribe approach for implementing GAG’s distributed collaborative business processes with high data availability
Maurice Tchoupé Tchendji and Joskel Ngoufo Tagueu
Naive computations involving a function of many variables suffer from the curse of dimensionality: the computational cost grows exponentially with the number of variables. One approach to bypassing the curse is to approximate the function as a sum of products of functions of one variable and compute in this format. When the variables are indices, a function of many variables is called a tensor, and this approach is to approximate and use the tensor in the (so-called) canonical tensor format. In this talk I will describe how such approximations can be used in numerical analysis and in machine learning.
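As a hedged illustration of computing in the canonical (CP) format, the sketch below stores a d-way tensor as d factor matrices and evaluates a single entry as a sum of R products, so storage grows linearly rather than exponentially in d; the sizes and rank are invented for the example.

```python
import numpy as np

d, n, R = 6, 10, 3                       # 6 variables, 10 grid points each, rank 3
rng = np.random.default_rng(1)
factors = [rng.standard_normal((n, R)) for _ in range(d)]  # O(d*n*R) storage

def cp_entry(index):
    """T[i1,...,id] = sum_r prod_k factors[k][ik, r] -- never forms the n**d tensor."""
    prod = np.ones(R)
    for k, ik in enumerate(index):
        prod *= factors[k][ik, :]
    return prod.sum()

print(cp_entry((0, 1, 2, 3, 4, 5)))      # one entry of a 10**6-entry tensor
```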
The Elaboration of Algorithm for Selection and Functions Distribution of Multifunctional Personnel, by ijtsrd
The work elaborates a model for the selection and functions distribution of multifunctional personnel, and contains a detailed description of an algorithm for the optimal selection and functions distribution of such personnel. The results of the algorithm are presented on different types of matrices of functional capabilities. Irakli Basheleishvili and Sergo Tsiramua, "The Elaboration of Algorithm for Selection and Functions Distribution of Multifunctional Personnel", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-1, Issue-5, August 2017. URL: http://www.ijtsrd.com/papers/ijtsrd2374.pdf http://www.ijtsrd.com/computer-science/other/2374/the-elaboration-of-algorithm-for-selectionand-functions--distribution-of-multifunctional-personnel/irakli-basheleishvili
Dictionary Learning for Massive Matrix Factorization, by Arthur Mensch
This document proposes a method for scaling up dictionary learning for massive matrix factorization. It presents an online algorithm that can handle datasets that are large in both dimensions (many samples and many features) by introducing subsampling. The key steps, illustrated in the sketch after this list, are:
1) Computing codes on random subsets of samples instead of full samples to reduce complexity from O(p) to O(s) where s is the subsample size.
2) Partially updating the surrogate functions used for dictionary updates instead of full updates to also achieve O(s) complexity.
3) Performing cautious dictionary updates, leaving values unchanged for unseen features, so that the minimization also runs in O(s) time.
Validation on fMRI and collaborative filtering datasets shows the method
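A minimal sketch of the subsampling idea behind steps 1-3, on invented data; the shapes, step size, and least-squares code step are simplifications for illustration, not the paper's exact surrogate updates.

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, s = 1000, 20, 50                   # features, atoms, subsample size
D = rng.standard_normal((p, k))          # dictionary

def partial_update(x):
    """One online step that touches only s of the p features (O(s) per step)."""
    rows = rng.choice(p, size=s, replace=False)          # random feature subset
    D_sub, x_sub = D[rows], x[rows]
    # Code computed from the subsampled rows only (step 1).
    alpha = np.linalg.lstsq(D_sub, x_sub, rcond=None)[0]
    # Cautious update: only the seen rows of D move; unseen rows stay (step 3).
    residual = x_sub - D_sub @ alpha
    D[rows] += 0.01 * np.outer(residual, alpha)

for _ in range(100):
    partial_update(rng.standard_normal(p))
```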
Visualization of multidimensional, multifactorial big data: big data is not merely large data, big data is complex data. We are training to decipher this complexity through data visualization.
Data visualization packages of R software: lattice and ggplot2.
Graphical Data-Mining Analysis With R Software
The document discusses using machine learning techniques like artificial neural networks, linear regression, and support vector regression to forecast daily oil production from an oil field. It analyzes production data from the Volve oil field in Norway using these three methods. All methods showed potential for production forecasting, though artificial neural networks performed best for one well. The performance of algorithms depends on the specific case, so each must be evaluated individually to select the best technique.
Winner of EY NextWave Data Science Challenge 2019, by Byung Eun Jeon
This document summarizes a presentation by Byung Eun Jeon and Hyunju Shim from the University of Hong Kong on their work for an EY data science challenge. It includes an agenda, methodology using deep learning algorithms like LSTM, findings from data analysis including patterns of citizen movement, opportunities to improve performance with more resources, and potential smart cities applications like a data-driven litter collection system.
A Predictive Stock Data Analysis with SVM-PCA Model
Divya Joseph and Vinai George Biju
HOV-kNN: A New Algorithm to Nearest Neighbor Search in Dynamic Space
Mohammad Reza Abbasifard, Hassan Naderi and Mohadese Mirjalili
A Survey on Mobile Malware: A War without End
Sonal Mohite and Prof. R. S. Sonar
An Efficient Design Tool to Detect Inconsistencies in UML Design Models
Mythili Thirugnanam and Sumathy Subramaniam
An Integrated Procedure for Resolving Portfolio Optimization Problems using Data Envelopment Analysis, Ant Colony Optimization and Gene Expression Programming
Chih-Ming Hsu
Emerging Technologies: LTE vs. WiMAX
Mohammad Arifin Rahman Khan and Md. Sadiq Iqbal
Introducing E-Maintenance 2.0
Abdessamad Mouzoune and Saoudi Taibi
Detection of Clones in Digital Images
Minati Mishra and Flt. Lt. Dr. M. C. Adhikary
The Significance of Genetic Algorithms in Search, Evolution, Optimization and Hybridization: A Short Review
The document discusses modeling systems at the end of Dennard scaling and approaches to modeling in a post-Dennard era. It covers the end of consistent CPU performance improvements and the rise of specialized computing such as GPUs, driven by deep learning. It also discusses using fewer bits in calculations, exploring uncertainties, and generating low-dimensional representations from complex models to help address the challenges of increased computing needs. Learning algorithms may help build emulators and surrogates of Earth system models to enable fit-for-purpose simulations.
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
This document provides an overview of machine learning concepts and techniques including linear regression, logistic regression, unsupervised learning, and k-means clustering. It discusses how machine learning involves using data to train models that can then be used to make predictions on new data. Key machine learning types covered are supervised learning (regression, classification), unsupervised learning (clustering), and reinforcement learning. Example machine learning applications are also mentioned such as spam filtering, recommender systems, and autonomous vehicles.
Environmental scivis via dynamic and thematic mapping, by Neale Misquitta
January 2010 presentation for an industry group regarding environmental scivis: scientific visualization using techniques such as dynamic and thematic graphing and mapping.
The document describes self-organized maps and includes two case studies on their applications. It outlines topics on self-organized maps including applications, architectures, and algorithms. It then describes two case studies, one on land use classification using ASTER satellite data and another on classification of Antarctic satellite imagery. The document concludes by providing references for more information on self-organized maps and neural networks.
This document discusses how machines can make decisions using machine learning approaches. It provides an overview of machine learning vocabulary and techniques including supervised learning methods like regression and classification. It also discusses unsupervised learning and examples of clustering emails. The document then demonstrates simple linear and logistic regression models to predict outputs given inputs. It discusses evaluating models through error measurement and mentions several other machine learning techniques. Finally, it provides an overview of neural networks including feedforward networks and different types like convolutional and recurrent neural networks.
Coordination in Situated Systems: Engineering MAS Environment in TuCSoN, by Andrea Omicini
Multi-agent systems (MAS) provide a well-founded approach to the engineering of situated systems, where governing the interaction of a multiplicity of autonomous, distributed components with the environment represents one of the most critical issues. By interpreting situatedness as a coordination issue, in this paper we describe the TuCSoN coordination architecture for situated MAS, and show how the corresponding TuCSoN coordination technology can be effectively used for engineering MAS environment.
[Talk @ IDCS 2014 – Calabria, Italy, 23/9/2014]
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The retrieval algorithms in remote sensing generally involve complex physical forward models that are nonlinear and computationally expensive to evaluate. Statistical emulation provides an alternative with cheap computation and can be used to calibrate model parameters and to improve the computational efficiency of the retrieval algorithms. We introduce a framework combining dimension reduction of the input and output spaces with Gaussian process emulation. Functional principal component analysis (FPCA) is chosen to reduce the output space of thousands of dimensions by orders of magnitude. In addition, instead of making restrictive assumptions about the correlation structure of the high-dimensional input space, we identify and exploit the most important directions of this space and thus construct a Gaussian process emulator with feasible computation. We will present preliminary results obtained from applying our method to OCO-2 data, and discuss how our framework can be generalized in distributed systems.
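A hedged sketch of such an emulation pipeline on synthetic data, with ordinary PCA standing in for FPCA and invented shapes; this is not the authors' actual OCO-2 setup.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(200, 8))            # forward-model input parameters
# 200 runs of a 2000-dimensional output, standing in for the expensive model.
Y = np.sin(X @ rng.standard_normal((8, 3))) @ rng.standard_normal((3, 2000))

pca = PCA(n_components=5).fit(Y)          # reduce the output space
scores = pca.transform(Y)

# One GP emulator per retained principal-component score.
gps = [GaussianProcessRegressor().fit(X, scores[:, j]) for j in range(5)]

def emulate(x_new):
    s = np.array([gp.predict(x_new.reshape(1, -1))[0] for gp in gps])
    return pca.inverse_transform(s.reshape(1, -1))[0]   # back to output space

print(emulate(rng.uniform(size=8)).shape)  # (2000,) at a fraction of the cost
```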
Stochastic optimization: from mirror descent to recent algorithms, by Seonho Park
The document discusses stochastic optimization algorithms. It begins with an introduction to stochastic optimization and online optimization settings. Then it covers Mirror Descent and its extension Composite Objective Mirror Descent (COMID). Recent algorithms for deep learning like Momentum, ADADELTA, and ADAM are also discussed. The document provides convergence analysis and empirical studies of these algorithms.
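For concreteness, a minimal sketch of mirror descent on the probability simplex with the entropy mirror map (the exponentiated-gradient update); the objective is an invented toy example.

```python
import numpy as np

A = np.random.default_rng(4).standard_normal((30, 5))

def grad(x):
    return 2 * A.T @ (A @ x)          # gradient of ||A x||^2 (toy objective)

x = np.full(5, 0.2)                   # start at the simplex center
eta = 0.1
for _ in range(200):
    # Entropy mirror map turns the step into a multiplicative update
    # followed by renormalization, so x stays on the simplex.
    x = x * np.exp(-eta * grad(x))
    x /= x.sum()

print(x)
```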
This document discusses estimating the inverse covariance matrix for compositional data, which represents relative abundance measurements that are constrained to sum to a constant. It introduces the concept of compositional data analysis and describes how relative abundance data can be modeled as a log-ratio transformation of absolute count data. It reviews existing approaches for sparse precision matrix estimation and proposes relaxing the constraints to account for the compositional nature of the data, in order to estimate a sparse inverse covariance specifically for compositional datasets.
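To illustrate the standard pipeline that the proposal relaxes, here is a hedged sketch combining a centered log-ratio transform with scikit-learn's graphical lasso on invented compositional data.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(5)
counts = rng.poisson(lam=20, size=(300, 10)) + 1      # pseudo-count avoids log(0)
comps = counts / counts.sum(axis=1, keepdims=True)    # compositions (rows sum to 1)

# Centered log-ratio transform removes the sum constraint before estimation.
clr = np.log(comps) - np.log(comps).mean(axis=1, keepdims=True)

model = GraphicalLasso(alpha=0.05).fit(clr)
precision = model.precision_                          # sparse inverse covariance
print((np.abs(precision) > 1e-6).sum(), "nonzero entries")
```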
This document discusses developing a theory of data analysis systems that integrates statistical methodology with the design of distributed data systems. It aims to balance tradeoffs between computational, transmission, and statistical costs when performing large-scale, distributed data analysis. As a proof of concept, it presents a toy example involving maximum likelihood estimation of parameters for a Gaussian process model using distributed spatial data. The example quantifies various costs associated with data access, transmission, and computation to jointly optimize the statistical analysis approach and data system design. Challenges include developing objective functions that can optimize both aspects simultaneously and approximating statistical costs like uncertainty.
Solving problems with graphs (La résolution de problèmes à l'aide de graphes), by Data2B
This document discusses how network science can be used to analyze and draw insights from different types of data. It describes how network science is the study of networks representing physical, biological, and social phenomena. It provides examples of how network science can be applied to geographic, temporal, social, and semantic network data. The document also discusses how network science combined with data science and machine learning techniques can enable machines to perform more human-like reasoning about ambiguous or uncertain concepts.
This document provides an overview of various classification techniques in data science, including linear discriminant analysis, logistic regression, probit regression, k-nearest neighbors, classification trees (CART), random forests, and techniques for double classification like uplift modeling. It discusses consistency of models and the risk of overfitting when the training sample size is small. Key classification algorithms like logistic regression and CART are explained in detail over multiple pages.
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. Sensing and computational components embedded in the physical world is termed as Cyber-Physical System (CPS). Current science of CPS is yet to effectively integrate citizen observations in CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework to move from observations to decision-making and actions in PCS systems consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with uncertainty, complexity, and dynamism that help translate observations into actions. Data driven approaches alone are not guaranteed to be able to synthesize PGMs reflecting real-world dependencies accurately. To overcome this limitation, we propose to empower PGMs using the declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities used in PCS event extraction, (b) Bayesian Network structure refinement using causal knowledge from Concept Net used in PCS event understanding, (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS) used in PCS event understanding, and the (d) transforming knowledge of goals and actions into a Markov Decision Process (MDP) model used in PCS action recommendation.
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
A new quantile-based fuzzy time series forecasting model, by Cemal Ardil
The document presents a new quantile-based fuzzy time series forecasting model. It begins by reviewing existing fuzzy time series forecasting methods and their applications. It then proposes a new method that bases forecasts on predicted future trends in the data using third-order fuzzy relationships. The method converts statistical quantiles into fuzzy quantiles using membership functions, and uses a fuzzy metric and trend forecast to calculate future values. The method is applied to TAIFEX index forecasting. Results show the proposed method performs better than other fuzzy time series methods in terms of complexity and forecasting accuracy.
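A hedged sketch of the quantile-to-fuzzy-set step, using triangular membership functions over quantile breakpoints; the details are illustrative, not the paper's exact construction.

```python
import numpy as np

series = np.random.default_rng(6).normal(size=500).cumsum()   # toy index series
qs = np.quantile(series, [0.0, 0.25, 0.5, 0.75, 1.0])         # quantile breakpoints

def triangular_membership(x, left, center, right):
    """Degree to which x belongs to the fuzzy set centered at a quantile."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (center - left) if x <= center else (right - x) / (right - center)

x = series[-1]
for i in range(1, len(qs) - 1):
    mu = triangular_membership(x, qs[i - 1], qs[i], qs[i + 1])
    print(f"membership in fuzzy quantile {i}: {mu:.2f}")
```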
1. The document discusses different approaches to knowledge representation and machine learning including first order logic, artificial neural networks, Bayesian networks, and reinforcement learning.
2. Artificial neural networks can represent complex functions by learning through backpropagation but lack interpretability, while Bayesian networks combine logic and learning from experience under uncertainty.
3. Reinforcement learning defines rewards and punishments to allow agents to discover optimal policies without being explicitly programmed through interactions with an environment.
This document discusses leveraging crowdsourcing techniques and consistency constraints to optimize the reconciliation of schema matching networks. It proposes:
1) Defining consistency constraints within schema matching networks and designing validation questions for crowdsourced workers.
2) Using consistency constraints to reduce reconciliation error rates and the monetary cost of asking additional validation questions.
3) Modeling a crowdsourcing process for schema matching networks that aims to minimize cost while maximizing accuracy through the application of consistency constraints.
Integrate fault tree analysis and fuzzy sets in quantitative risk assessment, by IAEME Publication
This document discusses integrating fault tree analysis and fuzzy sets in quantitative risk assessment. It proposes using fuzzy sets to make the probabilities in a fault tree analysis more precise by accounting for uncertainty. The document provides background on fault tree analysis and fuzzy set theory. It then presents a case study of applying fuzzy fault tree analysis to a flammable liquid storage tank system to evaluate the risk of overpressure in the tank.
Integrate fault tree analysis and fuzzy sets in quantitative risk assessment, by IAEME Publication
This document discusses integrating fault tree analysis and fuzzy sets for quantitative risk assessment. It presents a case study of applying fuzzy fault tree analysis to assess the risk of overpressure rupture in a flammable liquid storage tank. Fault tree analysis is used to model the relationships between failures that could lead to the top event. Boolean algebra is typically used to calculate failure probabilities but this introduces uncertainty. The document proposes using fuzzy sets to make the probabilities more precise by modeling vagueness and uncertainty. A fuzzy inference system is incorporated into the fault tree analysis. The results demonstrate that the fuzzy fault tree analysis model is better able to handle uncertainty in quantitative risk assessment compared to traditional fault tree analysis alone.
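For intuition about the probability algebra being fuzzified, here is a sketch of AND/OR gate calculations extended to triangular fuzzy numbers via component-wise arithmetic; the event names and probabilities are invented.

```python
import numpy as np

# Triangular fuzzy probabilities encoded as (low, mode, high) vectors.
pump_fail = np.array([0.01, 0.02, 0.04])
valve_fail = np.array([0.005, 0.01, 0.02])

def and_gate(p, q):
    """Both events must occur: probabilities multiply, component-wise."""
    return p * q

def or_gate(p, q):
    """Either event occurs: 1 - (1-p)(1-q), component-wise."""
    return 1.0 - (1.0 - p) * (1.0 - q)

top_event = or_gate(and_gate(pump_fail, valve_fail), valve_fail)
print("top event (low, mode, high):", top_event)
```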
This document presents a method for multi-sensor image fusion using temporal object detection from visible and infrared video frames. The method uses a Gaussian mixture model for background subtraction to detect foreground objects in each frame. An edge detection algorithm is then applied and the resulting edge maps are fused based on local differences to generate a fused output frame that emphasizes detected objects and preserves details from the visual frame. Experimental results demonstrate the fusion of daytime and low light visible and infrared frames. Future work will add object tracking capabilities to the system.
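A hedged sketch of that detection-then-fusion loop using OpenCV's Gaussian-mixture background subtractor and Canny edges; the fusion rule and thresholds below are simplified assumptions, not the paper's exact local-difference rule.

```python
import cv2
import numpy as np

# One Gaussian-mixture background subtractor per sensor stream.
bg_vis = cv2.createBackgroundSubtractorMOG2()
bg_ir = cv2.createBackgroundSubtractorMOG2()

def fuse_frames(gray_vis, gray_ir):
    """gray_vis, gray_ir: aligned 8-bit grayscale frames from the two sensors."""
    mask_vis = bg_vis.apply(gray_vis)          # foreground mask (visible stream)
    mask_ir = bg_ir.apply(gray_ir)             # foreground mask (infrared stream)

    edges_vis = cv2.Canny(gray_vis, 50, 150)   # edge map of each frame
    edges_ir = cv2.Canny(gray_ir, 50, 150)

    # Toy fusion rule: strongest edge per pixel, emphasized on detected objects.
    fused_edges = np.maximum(edges_vis, edges_ir)
    objects = cv2.bitwise_or(mask_vis, mask_ir)
    return cv2.addWeighted(fused_edges, 0.7, objects, 0.3, 0)
```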
Traffic flow modeling on road networks using Hamilton-Jacobi equations, by Guillaume Costeseque
This document discusses traffic flow modeling using Hamilton-Jacobi equations on road networks. It motivates the use of macroscopic traffic models based on conservation laws and Hamilton-Jacobi equations to describe traffic flow. These models capture traffic behavior at an aggregate level in terms of density, flow, and speed. The document outlines different orders of macroscopic traffic models, from first-order Lighthill-Whitham-Richards (LWR) models to higher-order models that account for additional traffic attributes. It also discusses the relationship between microscopic car-following models and the emergence of macroscopic behavior through homogenization.
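Concretely, the first-order LWR model evolves the vehicle density rho(x,t) by a scalar conservation law, and the cumulative vehicle count N (the Moskowitz function) satisfies the corresponding Hamilton-Jacobi equation:

```latex
\partial_t \rho + \partial_x \big(\rho\, v(\rho)\big) = 0,
\qquad
\rho = -\partial_x N, \quad \partial_t N = F\!\left(-\partial_x N\right),
```

where v(rho) is the speed-density relation and F(rho) = rho v(rho) is the flux given by the fundamental diagram.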
Computational model for artificial learning using formal concept analysis, by Aboul Ella Hassanien
The document presents a computational model for artificial learning using formal concept analysis. It proposes using formal concept analysis to describe the classification process and derive classification rules. The model was tested on several datasets and showed improved accuracy over support vector machines and classification and regression trees on most datasets based on various performance metrics. ROC curves were also generated to evaluate model performance. The proposed model aims to better understand and model the classification learning processes involved in human intelligence.
IRJET - Application of Linear Algebra in Machine Learning, by IRJET Journal
This document discusses the application of linear algebra concepts in machine learning. It begins with an introduction to linear algebra and key concepts like vectors, matrices, and linear transformations. It then provides an introduction to machine learning, including the different types of machine learning algorithms like supervised, unsupervised, and reinforcement learning. It discusses how machine learning is closely related to statistics and introduces some common statistical concepts. Finally, it discusses how linear algebra is widely used in machine learning algorithms like linear regression and support vector machines. Linear algebra allows machine learning models to represent data and map it to specific feature spaces.
Kandemir: Inferring Object Relevance From Gaze In Dynamic Scenes, by Kalle
As prototypes of data glasses having both data augmentation and gaze tracking capabilities are becoming available, it is now possible to develop proactive gaze-controlled user interfaces to display information about objects, people, and other entities in real-world setups. In order to decide which objects the augmented information should be about, and how saliently to augment, the system needs an estimate of the importance or relevance of the objects of the scene for the user at a given time. The estimates will be used to minimize distraction of the user, and for providing efficient spatial management of the augmented items. This work is a feasibility study on inferring the relevance of objects in dynamic scenes from gaze. We collected gaze data from subjects watching a video for a pre-defined task. The results show that a simple ordinal logistic regression model gives relevance rankings of scene objects with a promising accuracy.
Time alignment techniques for experimental sensor data, by IJCSES Journal
Experimental data is subject to data loss, which presents a challenge for representing the data with a proper time scale. Additionally, data from separate measurement systems need to be aligned in order to use the data cooperatively. Due to the need for accurate time alignment, various practical techniques are presented along with an illustrative example detailing each step of the time alignment procedure for actual experimental data from an Unmanned Aerial Vehicle (UAV). Some example MATLAB code is also provided.
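The paper's examples are in MATLAB; as a hedged Python illustration of the same steps, the sketch below resamples two loggers onto a common time base and estimates a residual clock offset (the signals, rates, and offset are invented).

```python
import numpy as np

def signal(t):
    """Common physical signal observed by both sensors (toy chirp)."""
    return np.sin(2 * np.pi * 0.2 * t ** 2)

rng = np.random.default_rng(7)
# Sensor A: nominal 100 Hz with 20% of samples randomly dropped.
t_a = np.sort(rng.choice(np.arange(0, 10, 0.01), size=800, replace=False))
# Sensor B: steady 50 Hz, but its clock is offset by 0.24 s.
t_b = np.arange(0, 10, 0.02)
sig_a = signal(t_a)
sig_b = signal(t_b - 0.24)

# Step 1: resample both records onto one uniform time base.
t_common = np.arange(0.3, 9.7, 0.02)
a = np.interp(t_common, t_a, sig_a)
b = np.interp(t_common, t_b, sig_b)

# Step 2: estimate the residual clock offset from the cross-correlation peak.
xc = np.correlate(a - a.mean(), b - b.mean(), mode="full")
lag = (np.argmax(xc) - (len(b) - 1)) * 0.02
print(f"estimated clock offset: {abs(lag):.2f} s")   # ~0.24
```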
The document discusses various techniques for classifying pictures using neural networks, including convolutional neural networks. It describes how convolutional neural networks can be used to classify images by breaking them into overlapping tiles, applying small neural networks to each tile, and pooling the results. The document also discusses using recurrent neural networks to classify videos by treating them as higher-dimensional tensors.
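A minimal NumPy sketch of that tile-then-pool pipeline, with one filter, stride 1, and 2x2 max pooling; the filter and sizes are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one small 'tile network' (a kernel) over every overlapping patch."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Downsample by keeping the strongest response in each size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.random.default_rng(8).random((28, 28))
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])              # toy learned filter
features = max_pool(np.maximum(conv2d(image, edge_filter), 0))  # conv -> ReLU -> pool
print(features.shape)   # (13, 13)
```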
Dimensionality reduction by matrix factorization using concept lattice in dat..., by eSAT Journals
Abstract: Concept lattices are an important technique that has become a standard in data analytics and knowledge presentation in many fields such as statistics, artificial intelligence, pattern recognition, machine learning, information theory, social networks, information retrieval systems, and software engineering. Formal concepts are adopted as the primitive notion. A concept is jointly defined as a pair consisting of the intension and the extension. FCA can handle huge amounts of data, and it generates concepts, rules, and data visualizations. Matrix factorization methods have recently received greater exposure, mainly as an unsupervised learning method for latent variable decomposition. In this paper a novel method is proposed to decompose such concepts by using Boolean Matrix Factorization for dimensionality reduction. This paper focuses on finding all the concepts and the object intersections. Keywords: data mining, formal concepts, lattice, matrix factorization, dimensionality reduction.
Similar to QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit Apte, Feb 26, 2018
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes have their advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms they can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
1) The document presents a statistical modeling approach called targeted smooth Bayesian causal forests (tsbcf) to smoothly estimate heterogeneous treatment effects over gestational age using observational data from early medical abortion regimens.
2) The tsbcf method extends Bayesian additive regression trees (BART) to estimate treatment effects that evolve smoothly over gestational age, while allowing for heterogeneous effects across patient subgroups.
3) The tsbcf analysis of early medical abortion regimen data found the simultaneous administration to be similarly effective overall to the interval administration, but identified some patient subgroups where effectiveness may vary more over gestational age.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
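In the usual two-group, two-period notation (groups g in {0,1}, periods t in {0,1}, treated group g = 1), the two estimators being compared can be written as:

```latex
\hat{\tau}_{\mathrm{DID}}
  = \big(\bar{Y}^{\,g=1}_{t=1} - \bar{Y}^{\,g=1}_{t=0}\big)
  - \big(\bar{Y}^{\,g=0}_{t=1} - \bar{Y}^{\,g=0}_{t=0}\big),
\qquad
\hat{\tau}_{\mathrm{LDV}}
  = \bar{Y}^{\,g=1}_{t=1}
  - \widehat{\mathbb{E}}\big[\,Y_{t=1} \mid Y_{t=0},\, g=0\,\big],
```

where the second conditional expectation is estimated by regressing the period-1 outcome on the lagged outcome among controls.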
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
This document discusses difference-in-differences (DiD) analysis, a quasi-experimental method used to estimate treatment effects. The author notes that while widely applicable, DiD relies on strong assumptions about the counterfactual. She recommends approaches like matching on observed variables between similar populations, thoughtfully specifying regression models to adjust for confounding factors, testing for parallel pre-treatment trends under different assumptions, and considering more complex models that allow for different types of changes over time. The overall message is that DiD requires careful consideration and testing of its underlying assumptions to draw valid causal conclusions.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women - hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tools which enables use of observational data to learn how clinicians balance competing priorities for treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
This document summarizes a simulation study evaluating causal inference methods for assessing the effects of opioid and gun policies. The study used real US state-level data to simulate the adoption of policies by some states and estimated the effects using different statistical models. It found that with fewer adopting states, type 1 error rates were too high, and most models lacked power. It recommends using cluster-robust standard errors and lagged outcomes to improve model performance. The study aims to help identify best practices for policy evaluation studies.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference from observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, TMLE maximizes a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored with respect to the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation is small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This document discusses various types of academic writing and provides tips for effective academic writing. It outlines common academic writing formats such as journal papers, books, and reports. It also lists writing necessities like having a clear purpose, understanding your audience, using proper grammar and being concise. The document cautions against plagiarism and not proofreading. It provides additional dos and don'ts for writing, such as using simple language and avoiding filler words. Overall, the key message is that academic writing requires selling your ideas effectively to the reader.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most noticeable technologies in recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and expect more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit Apte, Feb 26, 2018
1. Data assimilation Section 0:
Monte Carlo Techniques in Earth Sciences
Data assimilation
Amit Apte
International Centre for Theoretical Sciences (ICTS-TIFR)
Bangalore, India
SAMSI workshop, 26 Feb 2018
The movies shown earlier are from Philip Brohan:
https://vimeo.com/170761410
https://vimeo.com/170971015
2. Data assimilation Section 0:
Outline
*
1 An introduction to data assimilation
2 Mathematical basis of data assimilation
3 Sampling: numerical technique for approximating the posterior
* random images from google!
3. Data assimilation Section 1: An introduction to data assimilation
Outline
1 An introduction to data assimilation
2 Mathematical basis of data assimilation
3 Sampling: numerical technique for approximating the posterior
4. Data assimilation Section 1: An introduction to data assimilation
A few random(!) questions
When is the first total solar eclipse in India after 2100?
What will be the closest approach of Halley’s comet in 2060?
How many times in the next hour will a double pendulum reach the apogee? What will be the angle of a double pendulum after 5 min., 10 min., ...?
Breaking waves: which wave will reach you?
What will be the min/max temperatures in the five largest cities in India, tomorrow, day-after, over the next month?
What will be the major stock exchange indices tomorrow?
How many cars will enter the Golden Gate Bridge in the next 30 minutes?
Who will be the prime minister of India in 2020? In 2030?
How many nuclei from a given piece of U-235 will decay in the next 10 minutes? ...
5. Data assimilation Section 1: An introduction to data assimilation
Two essential ingredients for describing reality
Physical theories ←→ mathematical models
In order to understand this [an image of the ocean was shown], we first need to understand:
Fluid dynamics and thermodynamics
Ocean model ≡ an appropriate approximation and numerical implementation
“physical parameters”: bathymetry (depth of the ocean) and coastline; specific heat of water; etc.
external forcing: wind, temperature, and humidity of the atmosphere; inflow of river water
parametrization of “unresolved processes”
Even all of the above is NOT sufficient!
data assimilation: using the measurements from the ocean
6. Data assimilation Section 1: An introduction to data assimilation
Data, of course, provide a crucial link to reality
We have a large number of observations from satellites, ships, weather stations, etc., but they are:
not uniformly distributed in either space or time
quite sparse (e.g., far fewer in the southern hemisphere)
possibly dependent in a complicated way on the atmospheric conditions (satellite data)
Thus, the observations are insufficient to specify the model variables completely (and to describe the state in the physical theory).
→ an under-determined, ill-posed inverse problem
A note: this is the problem of studying a specific instance (or realization): this specific planet.
So the chain of interactions (physical theories ↔ models ↔ data) for complex systems such as the planet leads to:
7. Data assimilation Section 1: An introduction to data assimilation
What is data assimilation?
The art of optimally incorporating
partial and noisy observational data of a
chaotic, nonlinear, complex dynamical system with an
imperfect model (of the data and the system dynamics) to get an
estimate and the associated uncertainty for the system state
[Figure: schematic of the data assimilation cycle in state space and obs space: the true trajectory over time, observations obtained through the obs function h with obs error, the ensemble forecast, and the updated ensemble; arrows indicate the data assimilation process.]
8. Data assimilation Section 1: An introduction to data assimilation
Data assimilation is an estimation problem: estimation of the state, in time, repetitively.
Breaking waves: which wave will reach you? (insurance)
What will be the min/max temperatures in the five largest cities in India, tomorrow, day-after, over the next month? (planning)
What will be the average temperature in Bangalore, month by month, in 2050, or up to 2050? (design)
A few characteristics of data assimilation problems:
Good physical theories, but not necessarily good models
Systems are nonlinear and chaotic (usually deterministic)
Multiscale – temporal and spatial – dynamics
Observations of the system are
noisy
partial (sparse)
discrete in time
9. Data assimilation Section 1: An introduction to data assimilation
Main ingredients
A dynamical model: given the state x(t) ∈ R^d at any time t, it gives the state x(s) at any later time s > t: Lorenz-63, Lorenz-96, etc. (for synthetic data studies, d = 3 or d = 40, etc.) or general circulation models (for ocean / atmosphere / coupled models, d = 10^7 or d = 10^4)
Observations y_i ∈ R^p at times t_i, for i = 1, …, T (typically p ≪ d)
Observations are partial (with gaps), noisy, discrete in time
An observation operator h : R^d → R^p relates the model variables at time t to the observations at the same time: if the state were x(t), the observations without noise would be h(x(t))
Observational “errors”: need to account for both the difference between how the real system is represented in the model (representativeness error) and the instrumental uncertainty (noise)
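A minimal synthetic-data sketch of these ingredients in Python, assuming the Lorenz-63 model as m, an observation operator h that sees only the first component, and Gaussian noise; the integrator, parameter values, and noise level are all illustrative:

```python
import numpy as np

def lorenz63_step(x, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 model (illustrative integrator)."""
    dx = np.array([sigma * (x[1] - x[0]),
                   x[0] * (rho - x[2]) - x[1],
                   x[0] * x[1] - beta * x[2]])
    return x + dt * dx

def model(x, n_steps=25):
    """The dynamical model m: advance the state over one observation interval."""
    for _ in range(n_steps):
        x = lorenz63_step(x)
    return x

def h(x):
    """Observation operator: observe only the first component (p = 1 < d = 3)."""
    return x[:1]

rng = np.random.default_rng(0)
obs_noise_std = 1.0                     # instrumental noise level (illustrative)
x_true = np.array([1.0, 1.0, 1.0])      # the (unknown) true initial state
observations = []
for t in range(20):                     # T = 20 observation times
    x_true = model(x_true)
    observations.append(h(x_true) + obs_noise_std * rng.standard_normal(1))
```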
10. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
p(x)dx is the probability of a state x
p(x, y)dxdy is the joint probability of the state x and observation y
11. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
Probability densities like this in 10^x dimensions are difficult to represent.
CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1260349 and
By Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
12. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
But densities can be represented by “samples” (the dots below)
CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=1260349 and
By Bscan - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=25235145
13. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
p(x)dx is the probability of a state x
p(x, y)dxdy is the joint probability of the state x and observation y
Main concept that you need to remember - conditional probability
p(x|y) = p(x, y) / p(y)
14. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
If two random variables are dependent (for instance, correlated), information about one gives some information about the other.
[Figure: a joint density p(x, y) with the conditional slice at y = 3 highlighted; the mean of p(x | y = 3) is ≈ 1.0.]
15. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
p(x)dx is the probability of a state x
p(x, y)dxdy is the joint probability of the state x and observation y
Main concept that you need to remember - conditional probability
p(x|y) = p(x, y) / p(y)
But this can be written as
p(x, y) = p(x|y) p(y) = p(y|x) p(x)
This is a step away from Bayes’ theorem:
p(x|y) = p(y|x) p(x) / p(y)
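A tiny numerical check of these identities on a discrete joint distribution (the 2 × 3 table below is made up purely for illustration):

```python
import numpy as np

# A made-up joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.30, 0.10, 0.20]])

p_y = p_xy.sum(axis=0)                 # marginal p(y)
p_x = p_xy.sum(axis=1)                 # marginal p(x)

# Conditional via the definition: p(x|y) = p(x, y) / p(y)
p_x_given_y = p_xy / p_y

# Bayes' theorem: p(x|y) = p(y|x) p(x) / p(y)
p_y_given_x = p_xy / p_x[:, None]
bayes = p_y_given_x * p_x[:, None] / p_y

assert np.allclose(p_x_given_y, bayes)  # the two routes agree
```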
16. Data assimilation Section 1: An introduction to data assimilation
How do we represent uncertainty? Using probabilities!
If two random variables are dependent (for instance, correlated), information about one gives some information about the other.
[Figure: the same joint density with the conditional slice at y = 3; the mean of p(x | y = 3) is ≈ 1.0.]
That’s it: that is data assimilation!
17. Data assimilation Section 1: An introduction to data assimilation
So what is the big deal!? Ah... time
Unfortunately, the x and y in the previous slide are all time dependent... so we should really be watching a movie of the probability densities, rather than the images shown earlier!
18. Data assimilation Section 2: Mathematical basis of data assimilation
Outline
1 An introduction to data assimilation
2 Mathematical basis of data assimilation
3 Sampling: numerical technique for approximating the posterior
19. Data assimilation Section 2: Mathematical basis of data assimilation
Nonlinear filtering ≡ data assimilation
Consider a stochastic dynamical model
x_{t+1} = m(x_t) + ζ_t, with x_0 unknown
Thus we assume a probability density p_a(x_0) for the initial condition.
We will consider the problem of “estimating” the state x at some time t given observations at times 1, 2, …, N.
20. Data assimilation Section 2: Mathematical basis of data assimilation
Nonlinear filtering ≡ data assimilation
Consider a stochastic dynamical model
x_{t+1} = m(x_t) + ζ_t, with x_0 unknown
Thus we assume a probability density p_a(x_0) for the initial condition.
We will consider the problem of “estimating” the state x at some time t given observations at times 1, 2, …, N.
Smoothing: obtain a state estimate x_t for t < N using all the observations up to time N; in particular, determine x_0
Filtering: obtain a state estimate x_N using observations up to time N
Prediction: obtain a state estimate x_t for t > N (the time horizon of prediction is important)
21. Data assimilation Section 2: Mathematical basis of data assimilation
Nonlinear filtering ≡ data assimilation
Consider a stochastic dynamical model
x_{t+1} = m(x_t) + ζ_t, with x_0 unknown
Thus we assume a probability density p_a(x_0) for the initial condition.
We will consider the problem of “estimating” the state x at some time t given observations at times 1, 2, …, N.
In most applications in the earth sciences, data are collected “all the time,” so the most relevant problem is filtering.
Predictions are obtained by using the filtering solution as “initial conditions” for the appropriate PDE of interest (hence the common view that data assimilation is the problem of finding initial conditions).
22. Data assimilation Section 2: Mathematical basis of data assimilation
Or: data assimilation ≡ determination of the posterior, i.e. the conditional distribution given the observations
Observations y_t at time t depend on the state at that time:
y_t = h(x_t) + η_t, t = 1, …, N
h is called the observation operator; η_t is the observational noise. Eventually we will assume independence between η_t and ζ_t.
Probabilistic statement of the data assimilation problem: find the posterior distribution of the state conditioned on the observations
Smoothing: p(x_t | y_1, y_2, …, y_N) for t < N
Filtering: p(x_N | y_1, y_2, …, y_N)
Prediction: p(x_t | y_1, y_2, …, y_N) for t > N
23. Data assimilation Section 2: Mathematical basis of data assimilation
Two-step process for obtaining the filtering density
24. Data assimilation Section 2: Mathematical basis of data assimilation
Filtering density: obtained in a two step process
A notation: y_{1:t} = {y_1, y_2, …, y_t} and x_{1:t} = {x_1, x_2, …, x_t}
The first step is “prediction”
Suppose we have the probability p_a(x_{1:t} | y_{1:t}) of the states x_{1:t} up to time t conditioned on the observations y_{1:t} up to time t, and recall that x_{t+1} = m(x_t) + ζ_t (a Markov chain, with transition kernel p_m(x_{t+1} | x_t)).
→ Then the probability p_f(x_{1:t+1} | y_{1:t}) of the states x_{1:t+1} up to time t + 1 conditioned on the observations y_{1:t} up to time t is obtained by:
p_f(x_{1:t}, x_{t+1} | y_{1:t}) = p(x_{1:t} | y_{1:t}) · p(x_{t+1} | x_{1:t}, y_{1:t}) = p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t)
25. Data assimilation Section 2: Mathematical basis of data assimilation
Filtering density: obtained in a two step process
A notation: y_{1:t} = {y_1, y_2, …, y_t} and x_{1:t} = {x_1, x_2, …, x_t}
The next step is “update”
Given the above probability p_f(x_{1:t+1} | y_{1:t}) of the states x_{1:t+1} up to time t + 1 conditioned on the observations y_{1:t} up to time t, and recalling y_{t+1} = h(x_{t+1}) + η_{t+1},
→ Then the probability p_a(x_{1:t+1} | y_{1:t+1}) of the states x_{1:t+1} up to time t + 1 conditioned on the observations y_{1:t+1} up to time t + 1 is given by Bayes’ theorem:
p_a(x_{1:t+1} | y_{1:t}, y_{t+1}) = p(x_{1:t+1} | y_{1:t}) · p(y_{t+1} | x_{1:t+1}, y_{1:t}) / p(y_{t+1} | y_{1:t}) ∝ p_f(x_{1:t+1} | y_{1:t}) · p_η(y_{t+1} | x_{t+1})
26. Data assimilation Section 2: Mathematical basis of data assimilation
Filtering density satisfies a recursion relation
Putting together the two relations from the previous slide:
“prediction”: p_f(x_{1:t}, x_{t+1} | y_{1:t}) = p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t)
“update”: p_a(x_{1:t+1} | y_{1:t}, y_{t+1}) ∝ p_f(x_{1:t+1} | y_{1:t}) · p_η(y_{t+1} | x_{t+1})
we obtain the following recursive relation for the posterior distribution:
p_a(x_{1:t+1} | y_{1:t+1}) ∝ p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t) · p_η(y_{t+1} | x_{t+1})
where p_η(y_{t+1} | x_{t+1}) is the observational noise density and p_m(x_{t+1} | x_t) is the Markov transition kernel of the dynamical model.
27. Data assimilation Section 2: Mathematical basis of data assimilation
Two-step process for obtaining the filtering density
28. Data assimilation Section 2: Mathematical basis of data assimilation
Kalman filter: a “two moment” representation of the Gaussian posterior in the case of a linear model
Suppose the model is linear, m(x) = Mx, the observation operator is linear, h(x) = Hx, and the initial distribution for x_0 is Gaussian, as are the stochastic terms η_t in the observations and ζ_t in the dynamical model.
The Kalman filter gives a recursion relation for the mean and covariance: (x^a_t, C^a_t) for p_a(x_t | y_{1:t}) and (x^f_{t+1}, C^f_{t+1}) for p_f(x_{t+1} | y_{1:t}):
“Update step”: x^a_t = x^f_t + K (y_t − H x^f_t) and C^a_t = (I − K H) C^f_t
Here K = C^f_t H^T (H C^f_t H^T + R)^{−1} is the Kalman gain matrix, with R the observation-noise covariance
“Prediction step”: x^f_{t+1} = M x^a_t and C^f_{t+1} = M C^a_t M^T (plus the model-noise covariance Q when ζ_t is present)
29. Data assimilation Section 2: Mathematical basis of data assimilation
Computational hurdles
Recall the recursive formulae for the exact filter and the Kalman filter.
Exact filtering density:
p_a(x_{1:t+1} | y_{1:t+1}) ∝ p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t) · p_η(y_{t+1} | x_{t+1})
Kalman filter:
x^a_t = x^f_t + K (y_t − H x^f_t) and C^a_t = (I − K H) C^f_t
x^f_{t+1} = M x^a_t and C^f_{t+1} = M C^a_t M^T
Also recall: x ∈ R^d with d ∼ 10^6–10^7, and C is a d × d matrix. It is essentially impossible even to store or forecast the covariance matrix!!
Sampling methods provide (seemingly) efficient ways to approximate the above.
30. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Outline
1 An introduction to data assimilation
2 Mathematical basis of data assimilation
3 Sampling: numerical technique for approximating the posterior
31. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Basic idea of sampling a density f(x)
Suppose X_1, X_2, …, X_N are N independent, identically distributed (IID) random variables (RVs). For any function g(x), define the sample mean of g(x) to be
G_N = (1/N) Σ_{n=1}^{N} g(X_n)
Then
E[G_N] = (1/N) Σ_{n=1}^{N} E[g(X_n)] = E[g(X)]
and
var[G_N] = (1/N²) Σ_{n=1}^{N} var[g(X_n)] = (1/N) var[g(X)]
So, as N → ∞, var[G_N] → 0.
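A quick empirical check of this shrinking variance, estimating E[g(X)] for g(x) = x² with X standard normal, so the true value is 1 (the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
g = lambda x: x ** 2                 # true E[g(X)] = 1 for X ~ N(0, 1)

for N in [100, 10_000, 1_000_000]:
    samples = rng.standard_normal(N)
    G_N = g(samples).mean()          # the sample mean G_N
    print(N, G_N)                    # fluctuations around 1 shrink like 1/sqrt(N)
```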
32. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Sample mean approximates the mean
Recall that E[G_N] = E[g(X)] and, as N → ∞, var[G_N] → 0; thus
E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx ≈ (1/N) Σ_{n=1}^{N} g(X_n)
This is the basis for Monte Carlo integration and sampling methods.
For large enough N, we are guaranteed convergence! Justification: the law of large numbers,
P { lim_{N→∞} G_N = E[g(X)] } = 1.
What about the error for some given N? And how do we choose N if we fix an error tolerance?
33. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Errors are given by the Chebyshev inequality:
P ( |G_N − E[G_N]| ≥ (var[G_N]/δ)^{1/2} ) ≤ δ
But var[G_N] = var[g(X)]/N, which means:
the probability that the sample mean G_N and the exact mean of g(X) differ by more than (var[g(X)]/(δN))^{1/2} is no more than δ.
Two ways to decrease the error ≈ (var[g(X)]/(δN))^{1/2}:
increase the sample size N
decrease var[g(X)]
How can we decrease var[g(X)]? By a change of the probability distribution with respect to which we are taking the expectations! This is the basic idea of importance sampling.
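For example, to guarantee an error tolerance ε with confidence 1 − δ, set (var[g(X)]/(δN))^{1/2} ≤ ε and solve for N, which gives N ≥ var[g(X)]/(δ ε²); with var[g(X)] = 1, δ = 0.01, and ε = 0.01, this (rather pessimistic Chebyshev) bound requires N ≥ 10^6 samples.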
34. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Importance sampling: change of measure!
First, a sleight of hand: for any probability density p(x),
E_f[g(X)] = ∫ g(x) f(x) dx = ∫ (g(x) f(x) / p(x)) p(x) dx = E_p[ f(X) g(X) / p(X) ]
So now define ḡ(X) = f(X) g(X) / p(X). If we take all expectations with respect to the new probability density p(x),
var_p[ḡ(X)] = ∫ (f²(x) g²(x) / p²(x)) p(x) dx − (E_p[ḡ(X)])²
Check: the choice p(x) ∝ g(x) f(x) minimizes the variance!!
Not usable, since we do not know the normalization constant.
But the intuition is useful: choose p(x) to be as close to g(x) f(x) as possible.
35. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Importance sampling: weighted samples
Recall: for any probability density p(x),
E_f[g(X)] = ∫ g(x) f(x) dx = ∫ (g(x) f(x) / p(x)) p(x) dx = E_p[ f(X) g(X) / p(X) ]
If X_1, X_2, …, X_N are samples from p(X), then, to get the “correct” estimate of g(X), we need to define a weighted mean:
G_N = (1/N) Σ_{n=1}^{N} w_n g(X_n), with w_n = ?
Check: E[G_N] = E_f[g(X)] (the proof is essentially the calculation above).
Heuristics: choose p(x) to be as close to g(x) f(x) as possible, but easy to sample.
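A sketch of the weighted estimator, filling in the natural answer to the slide’s “w_n = ?” with w_n = f(X_n)/p(X_n), which follows from the identity above; the target f = N(0, 1), proposal p = N(0, 2), and test function are illustrative choices:

```python
import numpy as np

def normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
N = 100_000

# Target density f = N(0, 1); proposal density p = N(0, 2), easy to sample.
X = rng.normal(0.0, 2.0, size=N)                         # samples from p
w = normal_pdf(X, 0.0, 1.0) / normal_pdf(X, 0.0, 2.0)    # w_n = f(X_n) / p(X_n)

g = lambda x: x ** 2                  # E_f[g(X)] = 1 for f = N(0, 1)
G_N = np.mean(w * g(X))               # weighted sample mean
print(G_N)                            # ≈ 1
```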
36. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Computational opportunities
Recall the recursive formulae for the exact filter and the Kalman filter.
Particle filters: an importance sampling implementation of the following recursion:
p_a(x_{1:t+1} | y_{1:t+1}) ∝ p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t) · p_η(y_{t+1} | x_{t+1})
Ensemble Kalman filter: a Monte Carlo sampling version of the KF (with a slight (nonlinear) variation):
x^a_{n,t} = x^f_{n,t} + K (y_{n,t} − H x^f_{n,t}), n = 1, …, N, but not C^a_t = (I − K H) C^f_t
x^f_{n,t+1} = M x^a_{n,t}, n = 1, …, N, but not C^f_{t+1} = M C^a_t M^T
37. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
How do we get samples of functions of random variables?
If we have samples X_1, X_2, …, X_N from the distribution of X, how do we get samples from Z, which is a function of X, e.g. Z = h(X)?
Let Z_n = h(X_n). We need to show that these are indeed samples from the distribution of Z!
How do we approximate E[r(Z)] for some function r(Z)?
H_N = (1/N) Σ_{n=1}^{N} r(Z_n)
E[H_N] = (1/N) Σ_{n=1}^{N} E[r(Z_n)] = (1/N) Σ_{n=1}^{N} E[r(h(X_n))] = E[(r ∘ h)(X)]
Samples from the distribution of a function h of a random variable X are obtained by applying h to samples from the distribution of X.
38. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Particle filter: a “weighted sample” representation of the filtering recursion
p_a(x_{1:t+1} | y_{1:t+1}) ∝ p_a(x_{1:t} | y_{1:t}) · p_m(x_{t+1} | x_t) · p_η(y_{t+1} | x_{t+1})
Suppose we have a weighted sample {x^i_t, w^i_t}, i = 1, …, N, from p_a(x_t | y_{1:t}), i.e., we approximate p_a(x_t | y_{1:t}) ≈ Σ_{i=1}^{N} w^i_t δ(x_t − x^i_t).
If x^i_{t+1} is a sample from an importance sampling density q(x_{t+1} | x^i_t), then the weighted sample {x^i_{t+1}, w^i_{t+1}}, i = 1, …, N, approximates the posterior at time t + 1 if we choose
w^i_{t+1} ∝ w^i_t · p_m(x^i_{t+1} | x^i_t) · p_η(y_{t+1} | x^i_{t+1}) / q(x^i_{t+1} | x^i_t)
This is the main idea behind particle filtering.
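A bootstrap particle filter sketch, assuming the common choice q = p_m (so the weight update reduces to w ∝ w · p_η(y | x)), with resampling added to avoid weight degeneracy; the scalar nonlinear model, noise levels, and observations are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
obs_std, model_std = 1.0, 0.5
m = lambda x: 0.9 * x + 0.1 * np.sin(x)   # illustrative nonlinear model

particles = rng.normal(0.0, 3.0, N)       # samples from the prior p_a(x_0)
weights = np.full(N, 1.0 / N)

for y in [1.2, 0.7, 1.9]:                 # made-up observations
    # Proposal q = p_m: move particles through the stochastic model.
    particles = m(particles) + model_std * rng.standard_normal(N)
    # Weight update: w^i ∝ w^i · p_eta(y | x^i) for Gaussian obs error.
    weights *= np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < N / 2:
        idx = rng.choice(N, size=N, p=weights)
        particles, weights = particles[idx], np.full(N, 1.0 / N)

posterior_mean = np.sum(weights * particles)   # filtering estimate of x_t
```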
39. Data assimilation Section 3: Sampling: numerical technique for approximating the posterior
Summary
Data assimilation: the art of optimally incorporating
partial and noisy observational data of a
chaotic, nonlinear, complex dynamical system with an
imperfect model (of the data and the system dynamics) to get an
estimate and the associated uncertainty for the system state
Sampling methods (including importance sampling) provide efficient ways to approach high-dimensional data assimilation problems, with two particularly useful methods:
particle filtering (PF)
Ensemble Kalman filtering (EnKF)