Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, A Bird's-Eye View of Statistics for Remote Sensing Data - Noel Cressie, Aug 22, 2017

A Bird’s Eye View of Statistics for Remote Sensing Data
Noel Cressie
National Institute for Applied Statistics Research Australia
University of Wollongong
(ncressie@uow.edu.au)
2013-2017: Distinguished Visiting Scientist at
NASA’s Jet Propulsion Laboratory (JPL)
An earlier version of this talk was given at the March 2013 OCO-2 Science Team Meeting
Acknowledgement: Many discussions with JPL’s Amy Braverman, Hai Nguyen,
Jon Hobbs, Mike Gunson, and Mike Turmon

Satellite remote sensing
In science, which came first, the hypothesis (the chicken) or the data
(the egg)?
Data and a creative mind can lead to hypothesis generation. More
(and more) data should lead to better (and better) scientific
hypotheses, but we should not forget Fisher’s Design Principles
(blocking, randomisation, and replication).
In satellite remote sensing, we want to make use of unprecedented
data resources (i.e., big data) to reveal, quantify, and validate
scientific hypotheses (i.e., models) in the presence of multiple sources
of uncertainty.
Environmental Informatics: Uses tools from statistics, mathematics,
computing, and visualization to analyze environmental (e.g., remote
sensing) data.
Statistics: I propose that we use conditional probabilities to quantify
uncertainties.

NASA’s Earth observation satellites
(Credit: Illustration from NASA)

OCO-2
Throughout this talk, I shall use the example of remote sensing of
atmospheric CO2 using OCO-2 data (OCO-2 launched on July 2,
2014).
OCO-2 has sensitivity to near-surface CO2 and a small “footprint” of
1.1 × 2.25 km.
It has a repeat cycle of 16 days: Its geographic coverage is low
because of a narrow swath, but its spatial resolution is high.
There is no CO2 “thermometer”! Data are radiances taken from three
bands of the E-M spectrum: the strong CO2 band, the weak CO2
band, and the O2 A-band. These are called Level 1 data.
A forward model that relates radiances to atmospheric CO2 is used to
help solve the inverse problem of inferring the atmospheric state from
the observed radiances.

OCO-2 launch in July 2014
OCO-2 is NASA’s first mission completely devoted to measuring CO2 in
the atmosphere. Its measurements have high near-surface sensitivity and
very fine spatial resolution.
OCO-2 launch: 02:56AM (PDT), July 2, 2014
(Credit: Photo from NASA)

Astronomy picture of the day
(Image credit: Rick Baldridge)

OCO-2 in orbit
(Credit: Illustration from NASA)

Retrievals in the presence of uncertainties
Use conditional probabilities in a hierarchical statistical model to infer CO2
from a single footprint’s observed radiances, Y.
Data model (called “the forward model” by OCO-2):
Y = F(X; θ) + ε ,
where X is the atmospheric state, which includes CO2 values at a
range of geopotential heights, and ε ∼ Gau(0, Sε).
Process model (called “the prior” by OCO-2):
X ∼ Gau(Xα, Sα) .
Parameter model (largely absent from OCO-2):
θ is fixed (based on calibration).
(θ includes, e.g., forward-model parameter errors. It should also
include uncertainties in F and process-model uncertainties.)
Inference is usually on the state X: The predicted state, X̂, is called a
retrieval. (There is almost no effort put into inference on θ.)
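To make the data model and process model above concrete, here is a minimal sketch of a single-footprint retrieval under a strong simplifying assumption: the forward model is taken to be linear, F(X; θ) = KX, with a known matrix K. The matrix K, the function name, and the toy numbers are illustrative assumptions, not part of the OCO-2 algorithm. In this linear-Gaussian case the predictive distribution [X|Y, θ] is Gaussian, so its mode and mean coincide.

```python
import numpy as np

def linear_gaussian_retrieval(y, K, x_a, S_a, S_eps):
    """Predictive mean and covariance of X given Y when
    Y = K X + eps, eps ~ Gau(0, S_eps), and X ~ Gau(x_a, S_a).
    K is a hypothetical linear stand-in for the forward model F."""
    S_eps_inv = np.linalg.inv(S_eps)
    S_a_inv = np.linalg.inv(S_a)
    # Predictive covariance: (K' S_eps^{-1} K + S_a^{-1})^{-1}
    S_hat = np.linalg.inv(K.T @ S_eps_inv @ K + S_a_inv)
    # Predictive mean (equal to the mode here): the retrieval
    x_hat = x_a + S_hat @ K.T @ S_eps_inv @ (y - K @ x_a)
    return x_hat, S_hat

# Toy example with made-up numbers: two state elements, two "radiances"
K = np.eye(2)
x_a = np.zeros(2)
S_a = np.eye(2)
S_eps = np.eye(2)
y = np.array([2.0, 4.0])
x_hat, S_hat = linear_gaussian_retrieval(y, K, x_a, S_a, S_eps)
```

With equal prior and measurement-error variances, the retrieval sits halfway between the prior mean and the data, and S_hat quantifies the remaining uncertainty.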

Hierarchical statistical modeling (HM)
Step away from remote sensing for the moment to get a broader
perspective on uncertainty quantification in science.
Let Y be the data, X be the process (or state) of interest, and θ be
unknown parameters. (For example, in my research in spatio-temporal
statistics, Y might have dimension 10^6 to 10^9, X might be of the
same order, and θ might have dimension 10^2 to 10^4.)
[A|B] denotes the conditional distribution of generic quantity A, given
generic quantity B; and [B] denotes the distribution of B.
[A,B] denotes the joint distribution of A and B. Then
[A,B] = [A|B] · [B].

HM captures sources of uncertainty
Sources of uncertainty: the data, the process, and the parameters.
All uncertainties can be expressed through the joint distribution,
[Y , X, θ]. From the previous slide,
[Y , X, θ] = [Y , X|θ] · [θ]
= [Y |X, θ] · [X|θ] · [θ]
Data model: [Y |X, θ]
Process model: [X|θ]
Parameter model: [θ]
Inference is on X (and θ) through the posterior distribution,
[X, θ|Y ] = [Y |X, θ][X|θ][θ]/[Y ].
This is known as Bayes’ Rule, and it relies on knowing the normalizing
constant, [Y ]. An alternative strategy is to simulate from [X, θ|Y ].

Predictive distribution
As a concept, the predictive distribution is different from the posterior
distribution. In words, it is the conditional distribution of the process X
given the data Y . What about θ?
Three cases:
0 θ is known, and hence the parameter model is degenerate at θ. The
posterior distribution is [X|Y , θ] – since θ is known, this is also the
predictive distribution.
1 θ is fixed but unknown, and it is estimated from the data Y ; call the
estimate θ̂. The parameter model is assumed degenerate at θ̂, and the
(empirical) predictive distribution is [X|Y , θ̂].
2 θ is unknown, and its uncertainty is captured with the parameter model
[θ]. The posterior distribution is [X, θ|Y ], and the predictive
distribution is [X|Y ], after marginalizing over θ.
Case 0 is often unrealistic; Case 1 is called empirical hierarchical
modeling (EHM); and Case 2 is called Bayesian hierarchical modeling
(BHM).
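The difference between Case 1 (EHM) and Case 2 (BHM) can be seen in a scalar toy model. This is a sketch under assumed distributions that are not from the talk: Y|X ~ Gau(X, sigma2), X|θ ~ Gau(θ, tau2), and, for Case 2, θ ~ Gau(m, v). All function and variable names are hypothetical.

```python
def predictive_plug_in(y, theta_hat, sigma2, tau2):
    """Case 1 (EHM): [X | Y, theta_hat], with theta treated as known
    and set equal to its estimate theta_hat."""
    w = tau2 / (tau2 + sigma2)            # weight given to the datum
    mean = w * y + (1.0 - w) * theta_hat
    var = sigma2 * tau2 / (sigma2 + tau2)
    return mean, var

def predictive_marginal(y, m, v, sigma2, tau2):
    """Case 2 (BHM): [X | Y], after integrating theta ~ Gau(m, v) out.
    Marginally X ~ Gau(m, tau2 + v), so the same formula applies
    with an inflated process variance."""
    return predictive_plug_in(y, m, sigma2, tau2 + v)

# Toy numbers (illustrative only)
mean_ehm, var_ehm = predictive_plug_in(400.0, 395.0, 1.0, 4.0)
mean_bhm, var_bhm = predictive_marginal(400.0, 395.0, 5.0, 1.0, 4.0)
```

In this toy model the Case 2 predictive variance is never smaller than the Case 1 variance, and the two agree when v = 0; plugging in an estimate of θ ignores the uncertainty in θ.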

States of knowledge of θ
The three cases can be defined in terms of “states of knowledge” on θ.
Case 0: θ is known; often an unrealistic state of knowledge.
Case 1: θ is fixed but unknown; classical frequentist state of
knowledge of θ. This results in empirical hierarchical modeling
(EHM).
Case 2: θ is unknown and has probability distribution [θ]; Bayesian
state of knowledge of θ. This results in Bayesian hierarchical
modeling (BHM).

Inference on the state X
Since X is uncertain, it is modeled as a random quantity. Inference on
a random quantity is sometimes called “prediction.” This explains the
terminology, predictive distribution; for the three cases, it is:
Case 0: [X|Y , θ] = [Y |X, θ] · [X|θ]/[Y |θ],
Case 1: [X|Y , θ̂] = [Y |X, θ̂] · [X|θ̂]/[Y |θ̂],
Case 2: [X|Y ] = ∫ [Y |X, θ] · [X|θ] · [θ] dθ / [Y ],
and it is not always the same as the posterior distribution.
Inference on X should be based on the predictive distribution. This is
fundamental to Uncertainty Quantification (UQ)!
In remote sensing, each footprint has an X. Typically, the retrieved
state X̂ is the mode of the predictive distribution; to my knowledge,
only Cases 0 and 1 have been considered in the literature.
XCO2 is the average CO2 (in ppm) in the atmospheric column with
base given by the footprint; X̂CO2 is its prediction obtained from the
predictive distribution of X given Y.

One footprint: Radiance vector Y
Figure: ABO2, WCO2, and SCO2 data on 6 August, 2014 (first light)

Orbit geometry of OCO-2

X̂CO2 retrievals (August 1-17, 2015)

Predictive distribution, ctd
There is often too much information in [X|Y ], which is a (possibly
high-dimensional) probability distribution.
The predictive mean and predictive variance (equivalently, predictive
covariance for multivariate X) are often chosen as summaries of the
predictive distribution:
\[ E(X \mid Y) = \int X \, [X \mid Y] \, dX \]
\[ \mathrm{var}(X \mid Y) = \int X X^{T} \, [X \mid Y] \, dX - E(X \mid Y) \, E(X \mid Y)^{T} . \]

Predictive distribution, ctd
If X1, . . . , XK is a sample from the predictive distribution [X|Y ], then:
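\[ E(X \mid Y) \approx \frac{1}{K} \sum_{k=1}^{K} X_k \equiv \bar{X}_K , \]
and
\[ \mathrm{var}(X \mid Y) \approx \frac{1}{K} \sum_{k=1}^{K} X_k X_k^{T} - \bar{X}_K \bar{X}_K^{T} \equiv C_K . \]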
As mentioned above, the predictive mode:
\[ \mathrm{mode}(X \mid Y) \equiv \arg\max_{X} \, [X \mid Y] , \]
is a summary that is often chosen in satellite missions.
Twenty-first-century strategy: Learn how to sample X1, . . . , XK from
the predictive distribution, [X|Y ], and approximate any summary of it
for K large.
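The sample-based summaries on this slide can be sketched in a few lines. The sampler is the hard part and is not shown; here a bivariate Gaussian with made-up moments stands in for draws from [X|Y ], purely as an assumption so that the answer can be checked. The function name is hypothetical.

```python
import numpy as np

def predictive_summaries(samples):
    """Monte Carlo approximations of E(X|Y) and var(X|Y) from K draws.
    `samples` has shape (K, d): one draw from [X|Y] per row."""
    samples = np.asarray(samples, dtype=float)
    K = samples.shape[0]
    x_bar = samples.mean(axis=0)                          # sample mean
    # (1/K) sum_k X_k X_k^T  minus  x_bar x_bar^T
    C = samples.T @ samples / K - np.outer(x_bar, x_bar)
    return x_bar, C

# Stand-in for a predictive distribution (illustrative numbers only)
rng = np.random.default_rng(0)
true_mean = np.array([400.0, 1.0])
true_cov = np.array([[4.0, 1.0], [1.0, 2.0]])
draws = rng.multivariate_normal(true_mean, true_cov, size=200_000)
x_bar, C = predictive_summaries(draws)
```

Any other summary (quantiles, exceedance probabilities, the distribution of a function of X) can be approximated from the same draws, which is the appeal of the sampling strategy.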

Satellite remote sensing, revisited (1)
What is the process X we are really interested in?
X ≡ {X(x, y, h; t) : (x, y) ∈ Dg , h > 0, t ∈ Dt};
(x, y) = (lon, lat) on the geoid Dg ; h is geopotential height; t is a time
slice (e.g., a week) during a time period of interest Dt (e.g., a given
season).
X is a four-dimensional field of atmospheric CO2. That is, we are
interested in atmospheric CO2 everywhere and at any time.
Now {X(x, y, Psurf (x, y); t) : (x, y) ∈ Dg , t ∈ Dt} is the CO2 field at
Earth’s surface, where Psurf (x, y) denotes the surface pressure at
(x, y) ∈ Dg . Then define the surface flux,
\[ \Delta(x, y; t) \equiv \frac{\partial}{\partial t} X(x, y, P_{\mathrm{surf}}(x, y); t) , \]
and the CO2 surface-flux field,
XF ≡ {∆(x, y; t) : (x, y) ∈ Dg , t ∈ Dt}.

Our goal is to infer the fields X, or XCO2 (defined further on), or XF .
What are the data?
Y ≡ {Y(xi , yi ; ti ) : i = 1, · · · , n},
where Y(xi , yi ; ti ) is the vector of radiances for the i-th sounding,
(xi , yi ) ∈ Dg , and ti ∈ Dt, for i = 1, · · · , n.
Since XCO2 is derived from a column average of X (over h), it is a
three-dimensional (lon, lat, time) latent field.
OCO-2 creates a “contracted” dataset:
Ỹ ≡ {X̂CO2(xi , yi ; ti ) : i = 1, · · · , n},
where for i = 1, . . . , n, X̂CO2(xi , yi ; ti ) is the OCO-2 algorithm’s
estimated column-averaged CO2 at (xi , yi ) ∈ Dg and ti ∈ Dt. The
dataset Ỹ is central to the OCO-2 satellite’s retrieval protocol.

First light (8/6/14): Y(x1, y1; t1)
Figure: ABO2, WCO2, and SCO2 data on 6 August, 2014 (first light)

A contracted dataset of X̂CO2: Ỹ

At the next level, called Level 2, OCO-2 focuses on inferring not the
four-dimensional field X, but the contracted set of state values:
X̃ ≡ {XCO2(xi , yi ; ti ) : i = 1, · · · , n},
where formally the XCO2 field is:
\[ X_{\mathrm{CO2}}(x, y; t) \equiv \frac{1}{P_{\mathrm{surf}}(x, y)} \int_{0}^{P_{\mathrm{surf}}(x, y)} X(x, y, h; t) \, dh . \]
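As a numerical illustration of the column-average integral above, the following sketch applies the trapezoidal rule to a CO2 profile given on a grid of the vertical coordinate running from 0 to Psurf. It follows only the formula as written on this slide; it is not the operational OCO-2 calculation, and the function name, grid, and profile values are assumptions.

```python
import numpy as np

def column_average(x_profile, h_grid):
    """Approximate (1/P) * integral_0^P X(h) dh by the trapezoidal rule,
    where h_grid runs from 0 to P and x_profile holds X at those points."""
    x_profile = np.asarray(x_profile, dtype=float)
    h_grid = np.asarray(h_grid, dtype=float)
    widths = np.diff(h_grid)
    # Average of adjacent profile values times the width of each layer
    integral = np.sum(0.5 * (x_profile[1:] + x_profile[:-1]) * widths)
    return integral / (h_grid[-1] - h_grid[0])

# Illustrative profile: CO2 (ppm) varying linearly through the column
h_grid = np.linspace(0.0, 1000.0, 11)
profile = np.linspace(390.0, 410.0, 11)
xco2 = column_average(profile, h_grid)
```

The contraction from a whole profile to a single number per sounding is what turns the four-dimensional field X into the three-dimensional field XCO2.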
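If
\[ [\tilde{X} \mid \theta] = \prod_{i=1}^{n} [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta] \]
(i.e., if there is statistical independence within X̃), then
\[ [\tilde{X} \mid \tilde{Y}, \theta] \propto \prod_{i=1}^{n} \{ [\hat{X}_{\mathrm{CO2}}(x_i, y_i; t_i) \mid X_{\mathrm{CO2}}(x_i, y_i; t_i), \theta] \cdot [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta] \} , \]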
and it is appropriate that inference proceeds on a
sounding-by-sounding basis. Is the independence assumption
reasonable?

Because of atmospheric transport of CO2, the four-dimensional field
X is spatio-temporally dependent. Then the three-dimensional field,
XCO2 ≡ {XCO2(x, y; t) : (x, y) ∈ Dg , t ∈ Dt},
is also spatio-temporally dependent. Recall that X̃ consists of just n
values of XCO2.
Since XCO2 is spatio-temporally dependent, so too is X̃. Then
\[ [\tilde{X} \mid \theta] \neq \prod_{i=1}^{n} [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta] , \]
and hence inference on X, or XCO2, or X̃, should not proceed on a
sounding-by-sounding basis. But it does!
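A two-footprint toy calculation shows what is lost. This is a sketch under assumed Gaussian models that are not from the talk: the two XCO2 values have joint prior Gau(mu, Sigma) with correlated components, and each retrieved value equals the true value plus independent error with variance sigma2. The function names and numbers are hypothetical.

```python
import numpy as np

def joint_predictive(y, mu, Sigma, sigma2):
    """[X | Y] for Y = X + e, e ~ Gau(0, sigma2 * I), X ~ Gau(mu, Sigma),
    using all soundings at once."""
    n = len(y)
    gain = Sigma @ np.linalg.inv(Sigma + sigma2 * np.eye(n))
    mean = mu + gain @ (y - mu)
    cov = Sigma - gain @ Sigma
    return mean, cov

def sounding_by_sounding(y, mu, Sigma, sigma2):
    """Same calculation, but with the dependence between footprints
    ignored (off-diagonal entries of Sigma set to zero)."""
    return joint_predictive(y, mu, np.diag(np.diag(Sigma)), sigma2)

# Two neighbouring footprints with strongly correlated XCO2 (toy numbers)
mu = np.array([400.0, 400.0])
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
y = np.array([402.0, 401.0])
mean_joint, cov_joint = joint_predictive(y, mu, Sigma, 1.0)
mean_sbs, cov_sbs = sounding_by_sounding(y, mu, Sigma, 1.0)
```

In this example the joint predictive variance at the first footprint is about 0.37, compared with 0.5 when each sounding is treated on its own, because the neighbouring sounding carries information about it.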

Satellite Remote Sensing, Revisited (5)
Sounding-by-sounding retrievals are almost ubiquitous in satellite remote
sensing. Given this, how should Uncertainty Quantification proceed?
The data are Y ; OCO-2 “smooths” or “processes” them, and the
estimate of X̃ is
Ỹ = {X̂CO2(xi , yi ; ti ) : i = 1, · · · , n} .
I suggest we do something different. By contracting and marginalizing
[X|Y , θ], the predictive distribution, [X̃|Y , θ], can be obtained.
Consequently, the retrieval, X̂CO2(xi , yi ; ti ), should be obtained from
all Y , not just Y(xi , yi ; ti ). This is sometimes called a joint retrieval,
but I have never seen it actually done.

Satellite Remote Sensing, Revisited (6)
We want to make inference on the CO2 field X, or its surface
derivative, XF , the surface-flux field. Then the predictive distribution
for data Y is:
[X|Y , θ] ∝ [Y |X, θ] · [X|θ].
If we use data Ỹ (i.e., the X̂CO2 values), then we should build the
conditional probability models, [Ỹ |X, θ], [X|θ] (and eventually worry
about θ). Notice that the process model, [X|θ], has not changed.
We need to build the spatio-temporal process model [X|θ]. A
geostatistical model that involves atmospheric transport is one choice;
a spatio-temporal random effects (SRE) model is another choice.
Recall XCO2 and XF . Inference on XCO2 is called Level 3
estimation, and inference on XF is called Level 4 estimation.

Science goal: Improve carbon-cycle knowledge
(Credit: IPCC 5th Assessment Report, Figure 6.1)

Concluding remarks
The principal science objective of OCO-2 is to predict a global geographic
distribution of CO2 sources and sinks (i.e., XF ) at Earth’s surface.
Sources (e.g., fires and respiration, fossil fuels, freshwater outgassing,
volcanism) and sinks (e.g., oceans, photosynthesis, soils) and their
changes over time are not known at high enough spatial resolution to
develop mitigation strategies.
XCO2(s; t) is a measurement of column-averaged CO2 in ppm, for a
retrieval located at s on the geoid, at time t.
Data-assimilation schemes invert spatio-temporal measurements, Ỹ ,
of XCO2 to infer the flux process, XF , of Earth’s CO2 sources and
sinks.
Uncertainty is present at each level, which goes from energies measured by
the OCO-2 instrument (Level 1) all the way to inferences about surface
fluxes of CO2 (Level 4). The uncertainties need to be quantified in order
to obtain a science result (e.g., that an estimated source is real).

YouTube Video
FRK of CO₂ Satellite (OCO-2) Data: 2014-10-01 to 2017-03-01
Clint Shumack, Andrew Zammit Mangion, and Noel Cressie
University of Wollongong
https://www.youtube.com/watch?v=wEws67WXvkY
