Program on Mathematical and Statistical Methods for Climate and the Earth System Opening Workshop, A Bird's-Eye View of Statistics for Remote Sensing Data - Noel Cressie, Aug 22, 2017
Remote-sensing data offer unprecedented opportunities to address Earth-system-science challenges, such as understanding the relationship between the atmosphere and Earth's surface using physics, chemistry, biology, mathematics, and computing. Statistical methods have often been seen as a hybrid of the latter two, so that a lot of attention has been given to computing estimates but far less to quantifying the uncertainty of the estimates. In my "bird's-eye view," I shall give a way to look at the problem using conditional probability models and three states of knowledge. Examples will be given of analyzing remotely sensed data of a leading greenhouse gas, carbon dioxide.
1. A Bird’s Eye View of Statistics for Remote Sensing Data
Noel Cressie
National Institute for Applied Statistics Research Australia
University of Wollongong
(ncressie@uow.edu.au)
2013-2017: Distinguished Visiting Scientist at
NASA’s Jet Propulsion Laboratory (JPL)
An earlier version of this talk was given at the March 2013 OCO-2 Science Team Meeting
Acknowledgement: Many discussions with JPL’s Amy Braverman, Hai Nguyen,
Jon Hobbs, Mike Gunson, and Mike Turmon
Cressie (UOW) Statistics for Remote Sensing Data 1 / 28
2. Satellite remote sensing
In science, which came first, the hypothesis (the chicken) or the data
(the egg)?
Data and a creative mind can lead to hypothesis generation. More
(and more) data should lead to better (and better) scientific
hypotheses, but we should not forget Fisher’s Design Principles
(blocking, randomisation, and replication).
In satellite remote sensing, we want to make use of unprecedented
data resources (i.e., big data) to reveal, quantify, and validate
scientific hypotheses (i.e., models) in the presence of multiple sources
of uncertainty.
Environmental Informatics: Uses tools from statistics, mathematics,
computing, and visualization to analyze environmental (e.g., remote
sensing) data.
Statistics: I propose that we use conditional probabilities to quantify
uncertainties.
3. NASA’s Earth observation satellites
(Credit: Illustration from NASA)
4. OCO-2
Throughout this talk, I shall use the example of remote sensing of
atmospheric CO2 using OCO-2 data (OCO-2 launched on July 2,
2014).
OCO-2 has sensitivity to near-surface CO2 and a small “footprint” of
1.1 × 2.25 km.
It has a repeat cycle of 16 days: Its geographic coverage is low
because of a narrow swath, but its spatial resolution is high.
There is no CO2 “thermometer”! Data are radiances taken from three
bands of the E-M spectrum: the strong CO2 band, the weak CO2
band, and the O2 A-band. These are called Level 1 data.
A forward model that relates radiances to atmospheric CO2 is used to
help solve the inverse problem of inferring the atmospheric state from
the observed radiances.
5. OCO-2 launch in July 2014
OCO-2 is NASA’s first mission completely devoted to measuring CO2 in
the atmosphere. Its measurements have high near-surface sensitivity and
very fine spatial resolution.
OCO-2 launch: 02:56AM (PDT), July 2, 2014
(Credit: Photo from NASA)
6. Astronomy picture of the day
(Image credit: Rick Baldridge)
7. OCO-2 in orbit
(Credit: Illustration from NASA)
8. Retrievals in the presence of uncertainties
Use conditional probabilities in a hierarchical statistical model to infer CO2
from a single footprint’s observed radiances, Y.
Data model (called “the forward model” by OCO-2):
Y = F(X; θ) + ε ,
where X is the atmospheric state, which includes CO2 values at a
range of geopotential heights, and ε ∼ Gau(0, Sε).
Process model (called “the prior” by OCO-2):
X ∼ Gau(Xα, Sα) .
Parameter model (largely absent from OCO-2):
θ is fixed (based on calibration).
(θ includes, e.g., forward-model parameter errors. It should also
include uncertainties in F and process-model uncertainties.)
Inference is usually on the state X: The predicted state, X̂, is called a
retrieval. (There is almost no effort put into inference on θ.)
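To make the data model and the process model concrete, the sketch below assumes a linear forward model, F(X; θ) = KX. This is a simplification introduced here for illustration only (the real forward model is nonlinear), and the matrix K, the dimensions, and all numerical values are hypothetical. Under that assumption the predictive distribution [X|Y, θ] is Gaussian, so its mean and mode coincide and both are available in closed form.

```python
import numpy as np

def linear_gaussian_retrieval(y, K, x_alpha, S_alpha, S_eps):
    """Mean and covariance of [X | Y, theta] when Y = K X + eps.

    Assumes a linear forward model (a hypothetical simplification),
    X ~ Gau(x_alpha, S_alpha) and eps ~ Gau(0, S_eps).
    """
    S_eps_inv = np.linalg.inv(S_eps)
    S_alpha_inv = np.linalg.inv(S_alpha)
    # Predictive precision = data precision + prior precision.
    S_hat = np.linalg.inv(K.T @ S_eps_inv @ K + S_alpha_inv)
    x_hat = x_alpha + S_hat @ K.T @ S_eps_inv @ (y - K @ x_alpha)
    return x_hat, S_hat

rng = np.random.default_rng(0)
n_state, n_obs = 3, 8                        # toy dimensions
K = rng.normal(size=(n_obs, n_state))        # hypothetical forward-model matrix
x_alpha = np.full(n_state, 400.0)            # prior mean (ppm-like values)
S_alpha = 4.0 * np.eye(n_state)              # prior covariance
S_eps = 0.25 * np.eye(n_obs)                 # measurement-error covariance

# Simulate one footprint: a true state and its noisy radiance-like data.
x_true = rng.multivariate_normal(x_alpha, S_alpha)
y = K @ x_true + rng.multivariate_normal(np.zeros(n_obs), S_eps)

x_hat, S_hat = linear_gaussian_retrieval(y, K, x_alpha, S_alpha, S_eps)
print("retrieval:", x_hat)
print("predictive std. dev.:", np.sqrt(np.diag(S_hat)))
```

Note that θ (here K, S_eps, x_alpha, and S_alpha) is treated as fixed throughout, which is exactly the situation the slide describes.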
9. Hierarchical statistical modeling (HM)
Step away from remote sensing for the moment to get a broader
perspective on uncertainty quantification in science.
Let Y be the data, X be the process (or state) of interest, and θ be
unknown parameters. (For example, in my research in spatio-temporal
statistics, Y might have dimension 10^6 to 10^9, X might be of the
same order, and θ might have dimension 10^2 to 10^4.)
[A|B] denotes the conditional distribution of generic quantity A, given
generic quantity B; and [B] denotes the distribution of B.
[A,B] denotes the joint distribution of A and B. Then
[A,B] = [A|B] · [B].
10. HM captures sources of uncertainty
Sources of uncertainty: the data, the process, and the parameters.
All uncertainties can be expressed through the joint distribution,
[Y , X, θ]. From the previous slide,
[Y , X, θ] = [Y , X|θ] · [θ]
= [Y |X, θ] · [X|θ] · [θ]
Data model: [Y |X, θ]
Process model: [X|θ]
Parameter model: [θ]
Inference is on X (and θ) through the posterior distribution,
[X, θ|Y ] = [Y |X, θ][X|θ][θ]/[Y ].
This is known as Bayes’ Rule, and it relies on knowing the normalizing
constant, [Y]. An alternative strategy is to simulate from [X, θ|Y].
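As a small illustration of why the normalizing constant matters, the sketch below evaluates the three conditional models on a grid for a toy problem with scalar X and scalar θ, and normalizes by a numerically integrated [Y]. The Gaussian choices, the grid, and the datum are all assumptions made for this example and are not taken from the talk.

```python
import numpy as np

def gauss_pdf(z, mean, sd):
    """Gaussian density with the given mean and standard deviation."""
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

y_obs = 1.2                                   # one hypothetical datum
grid = np.linspace(-8.0, 8.0, 401)            # shared grid for X and theta
dx = grid[1] - grid[0]
X, TH = np.meshgrid(grid, grid, indexing="ij")   # X varies along axis 0

data_model = gauss_pdf(y_obs, X, 0.5)         # [Y | X, theta]
process_model = gauss_pdf(X, TH, 1.0)         # [X | theta]
parameter_model = gauss_pdf(TH, 0.0, 1.0)     # [theta]

joint = data_model * process_model * parameter_model   # [Y, X, theta] at y_obs
evidence = joint.sum() * dx * dx                        # [Y], by quadrature
posterior = joint / evidence                            # [X, theta | Y]

post_x = posterior.sum(axis=1) * dx                     # [X | Y]: theta integrated out
post_mean_x = (grid * post_x).sum() * dx
print("numerical [Y]:", evidence)
print("E(X | Y):", post_mean_x)
```

A grid like this needs a number of evaluations that grows exponentially with the dimension of (X, θ), which is why simulating from [X, θ|Y] is the practical route for realistic problems.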
11. Predictive distribution
As a concept, the predictive distribution is different from the posterior
distribution. In words, it is the conditional distribution of the process X
given the data Y . What about θ?
Three cases:
0 θ is known, and hence the parameter model is degenerate at θ. The
posterior distribution is [X|Y , θ] – since θ is known, this is also the
predictive distribution.
1 θ is fixed but unknown, and it is estimated from the data Y; call the
estimate θ̂. The parameter model is assumed degenerate at θ̂, and the
(empirical) predictive distribution is [X|Y, θ̂].
2 θ is unknown, and its uncertainty is captured with the parameter model
[θ]. The posterior distribution is [X, θ|Y ], and the predictive
distribution is [X|Y ], after marginalizing over θ.
Case 0 is often unrealistic; Case 1 is called empirical hierarchical
modeling (EHM); and Case 2 is called Bayesian hierarchical modeling
(BHM).
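The difference between the three cases can be seen in a toy model, sketched below under assumptions that are not in the talk: n scalar states X_i are independent Gau(θ, s2_x) given a scalar θ, each datum is Y_i = X_i + noise with variance s2_e, and in Case 2 the parameter model is θ ~ Gau(m0, s2_0). All function names and numbers are hypothetical. Each function returns the predictive mean and variance of X_1.

```python
from statistics import fmean

def predictive_case0(y1, theta, s2_x, s2_e):
    """Case 0: [X1 | Y, theta] with theta known. Returns (mean, variance)."""
    w = s2_x / (s2_x + s2_e)                  # weight given to the datum
    return w * y1 + (1 - w) * theta, w * s2_e

def predictive_case1(y, s2_x, s2_e):
    """Case 1 (EHM): plug in theta_hat = mean(Y), then proceed as in Case 0."""
    theta_hat = fmean(y)
    return predictive_case0(y[0], theta_hat, s2_x, s2_e)

def predictive_case2(y, m0, s2_0, s2_x, s2_e):
    """Case 2 (BHM): theta ~ Gau(m0, s2_0) is integrated out of [X1, theta | Y]."""
    n = len(y)
    s2_y = s2_x + s2_e                        # var(Y_i | theta)
    v_n = 1.0 / (1.0 / s2_0 + n / s2_y)       # var(theta | Y)
    m_n = v_n * (m0 / s2_0 + sum(y) / s2_y)   # E(theta | Y)
    w = s2_x / s2_y
    mean, var0 = predictive_case0(y[0], m_n, s2_x, s2_e)
    # The extra term carries the remaining uncertainty about theta.
    return mean, var0 + (1 - w) ** 2 * v_n

y = [401.3, 399.2, 400.8, 402.1, 398.9]       # hypothetical data
s2_x, s2_e = 1.0, 0.5

print("Case 0:", predictive_case0(y[0], 400.0, s2_x, s2_e))
print("Case 1:", predictive_case1(y, s2_x, s2_e))
print("Case 2:", predictive_case2(y, 400.0, 25.0, s2_x, s2_e))
```

In this toy setting Case 1 reports the same predictive variance as Case 0, because the plug-in step ignores the error in θ̂, whereas Case 2 reports a larger variance that accounts for it.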
12. States of knowledge of θ
The three cases can be defined in terms of “states of knowledge” on θ.
Case 0: θ is known; often an unrealistic state of knowledge.
Case 1: θ is fixed but unknown; classical frequentist state of
knowledge of θ. This results in empirical hierarchical modeling
(EHM).
Case 2: θ is unknown and has probability distribution [θ]; Bayesian
state of knowledge of θ. This results in Bayesian hierarchical
modeling (BHM).
13. Inference on the state X
Since X is uncertain, it is modeled as a random quantity. Inference on
a random quantity is sometimes called “prediction.” This explains the
terminology, predictive distribution; for the three cases, it is:
Case 0: [X|Y, θ] = [Y|X, θ] · [X|θ]/[Y|θ],
Case 1: [X|Y, θ̂] = [Y|X, θ̂] · [X|θ̂]/[Y|θ̂],
Case 2: [X|Y] = ∫ [Y|X, θ] · [X|θ] · [θ] dθ / [Y],
and it is not always the same as the posterior distribution.
Inference on X should be based on the predictive distribution. This is
fundamental to Uncertainty Quantification (UQ)!
In remote sensing, each footprint has an X. Typically, the retrieved
state X̂ is the mode of the predictive distribution; to my knowledge,
only Cases 0 and 1 have been considered in the literature.
XCO2 is the average CO2 (in ppm) in the atmospheric column with
base given by the footprint; X̂CO2 is its prediction obtained from the
predictive distribution of X given Y.
14. One footprint: Radiance vector Y
Figure: ABO2, WCO2, and SCO2 data on 6 August, 2014 (first light)
15. Orbit geometry of OCO-2
17. Predictive distribution, ctd
There is often too much information in [X|Y ], which is a (possibly
high-dimensional) probability distribution.
The predictive mean and predictive variance (equivalently, predictive
covariance for multivariate X) are often chosen as summaries of the
predictive distribution:
$$E(X \mid Y) = \int X \, [X \mid Y] \, dX$$
$$\mathrm{var}(X \mid Y) = \int X X^{T} \, [X \mid Y] \, dX - E(X \mid Y) \, E(X \mid Y)^{T}.$$
18. Predictive distribution, ctd
If X1, . . . , XK is a sample from the predictive distribution [X|Y ], then:
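$$E(X \mid Y) \approx \frac{1}{K} \sum_{k=1}^{K} X_k \equiv \bar{X}_K,$$
and
$$\mathrm{var}(X \mid Y) \approx \frac{1}{K} \sum_{k=1}^{K} X_k X_k^{T} - \bar{X}_K \bar{X}_K^{T} \equiv C_K.$$
As mentioned above, the predictive mode:
$$\mathrm{mode}(X \mid Y) \equiv \arg\max_{X} \, [X \mid Y],$$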
is a summary that is often chosen in satellite missions.
Twenty-first-century strategy: Learn how to sample X1, . . . , XK from
the predictive distribution, [X|Y ], and approximate any summary of it
for K large.
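The sample-based summaries can be sketched as follows. Because no real sampler is available here, the example uses a bivariate Gaussian with made-up mean and covariance as a stand-in for [X|Y]; in practice the draws would come from whatever simulation method (e.g., MCMC) targets the predictive distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the predictive distribution [X | Y] (hypothetical values).
mean_true = np.array([400.0, 2.0])
cov_true = np.array([[1.0, 0.3],
                     [0.3, 0.5]])

K = 50_000
samples = rng.multivariate_normal(mean_true, cov_true, size=K)  # row k is X_k

x_bar = samples.mean(axis=0)                                # sample mean of the X_k
C_K = samples.T @ samples / K - np.outer(x_bar, x_bar)      # sample covariance (divisor K)

# Any other summary follows the same pattern, e.g., a 95% predictive
# interval for the first component of X.
lower, upper = np.quantile(samples[:, 0], [0.025, 0.975])

print("predictive mean estimate:", x_bar)
print("predictive covariance estimate:\n", C_K)
print("95% interval, first component:", (lower, upper))
```

The same set of draws serves every summary, so the cost of sampling is paid once and any functional of the predictive distribution can then be approximated.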
19. Satellite remote sensing, revisited (1)
What is the process X we are really interested in?
X ≡ {X(x, y, h; t) : (x, y) ∈ Dg , h > 0, t ∈ Dt};
(x, y) = (lon, lat) on the geoid Dg ; h is geopotential height; t is a time
slice (e.g., a week) during a time period of interest Dt (e.g., a given
season).
X is a four-dimensional field of atmospheric CO2. That is, we are
interested in atmospheric CO2 everywhere and at any time.
Now {X(x, y, Psurf (x, y); t) : (x, y) ∈ Dg , t ∈ Dt} is the CO2 field at
Earth’s surface, where Psurf (x, y) denotes the surface pressure at
(x, y) ∈ Dg . Then define the surface flux,
$$\Delta(x, y; t) \equiv \frac{\partial}{\partial t} X(x, y, P_{\mathrm{surf}}(x, y); t),$$
and the CO2 surface-flux field,
XF ≡ {∆(x, y; t) : (x, y) ∈ Dg , t ∈ Dt}.
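On a discrete set of time slices, the time derivative that defines Δ can only be approximated. The sketch below uses finite differences at a single location; the weekly surface CO2 series is invented for illustration, and the helper name `surface_flux` is hypothetical.

```python
import numpy as np

def surface_flux(x_surface, dt):
    """Finite-difference approximation to dX/dt along the time axis (axis 0)."""
    return np.gradient(x_surface, dt, axis=0)

weeks = np.arange(6)                                   # six weekly time slices
# Hypothetical surface CO2 (ppm) at one (lon, lat): trend plus seasonal term.
x_surface = 400.0 + 0.05 * weeks + 0.5 * np.sin(2 * np.pi * weeks / 52)

delta = surface_flux(x_surface, dt=1.0)                # ppm per week
print(delta)
```

Differencing amplifies noise, so in a statistical treatment the uncertainty in X at the surface propagates into (and is typically magnified in) the uncertainty of Δ.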
20. Satellite remote sensing, revisited (2)
Our goal is to infer the fields X, or XCO2 (defined further on), or XF .
What are the data?
Y ≡ {Y(xi, yi; ti) : i = 1, ..., n},
where Y(xi, yi; ti) is the vector of radiances for the i-th sounding,
(xi, yi) ∈ Dg, and ti ∈ Dt, for i = 1, ..., n.
Since XCO2 is derived from a column average of X (over h), it is a
three-dimensional (lon, lat, time) latent field.
OCO-2 creates a “contracted” dataset:
Ỹ ≡ {X̂CO2(xi, yi; ti) : i = 1, ..., n},
where for i = 1, ..., n, X̂CO2(xi, yi; ti) is the OCO-2 algorithm’s
estimated column-averaged CO2 at (xi, yi) ∈ Dg and ti ∈ Dt. The
dataset Ỹ is central to the OCO-2 satellite’s retrieval protocol.
21. First light (8/6/14): Y(x1, y1; t1)
Figure: ABO2, WCO2, and SCO2 data on 6 August, 2014 (first light)
22. A contracted dataset of X̂CO2: Ỹ
23. Satellite remote sensing, revisited (3)
At the next level, called Level 2, OCO-2 focuses on inferring not the
four-dimensional field X, but the contracted set of state values:
X̃ ≡ {XCO2(xi, yi; ti) : i = 1, ..., n},
where formally the XCO2 field is:
$$X_{\mathrm{CO2}}(x, y; t) \equiv \frac{1}{P_{\mathrm{surf}}(x, y)} \int_{0}^{P_{\mathrm{surf}}(x, y)} X(x, y, h; t) \, dh.$$
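For a profile known only at a few vertical levels, the column average can be approximated by the trapezoidal rule, as in the sketch below. The levels and CO2 values are hypothetical, and this plain average is only a discretization of the formula above; it is not a description of how the OCO-2 algorithm weights the column.

```python
def column_average(levels, profile):
    """Trapezoidal approximation to (1 / P_surf) * integral_0^P_surf X dh.

    `levels` runs from 0 up to P_surf (last entry); `profile` holds the
    CO2 value at each level.
    """
    p_surf = levels[-1]
    integral = 0.0
    for k in range(1, len(levels)):
        thickness = levels[k] - levels[k - 1]
        integral += 0.5 * (profile[k] + profile[k - 1]) * thickness
    return integral / p_surf

levels = [0.0, 200.0, 500.0, 850.0, 1000.0]     # hypothetical vertical levels
profile = [395.0, 397.0, 399.5, 402.0, 404.0]   # hypothetical CO2 values (ppm)

xco2 = column_average(levels, profile)
print("column-averaged CO2 (ppm):", xco2)
```

The contraction from a whole profile to one number is what turns the four-dimensional field X into the three-dimensional field XCO2.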
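If $[\tilde{X} \mid \theta] = \prod_{i=1}^{n} [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta]$ (i.e., if there is statistical
independence within X̃), then
$$[\tilde{X} \mid \tilde{Y}, \theta] \propto \prod_{i=1}^{n} \left\{ [\hat{X}_{\mathrm{CO2}}(x_i, y_i; t_i) \mid X_{\mathrm{CO2}}(x_i, y_i; t_i), \theta] \cdot [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta] \right\},$$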
and it is appropriate that inference proceeds on a
sounding-by-sounding basis. Is the independence assumption
reasonable?
24. Satellite remote sensing, revisited (4)
Because of atmospheric transport of CO2, the four-dimensional field
X is spatio-temporally dependent. Then the three-dimensional field,
XCO2 ≡ {XCO2(x, y; t) : (x, y) ∈ Dg , t ∈ Dt},
is also spatio-temporally dependent. Recall that X̃ consists of just n
values of XCO2.
Since XCO2 is spatio-temporally dependent, so too is X̃. Then
$$[\tilde{X} \mid \theta] \neq \prod_{i=1}^{n} [X_{\mathrm{CO2}}(x_i, y_i; t_i) \mid \theta],$$
and hence inference on X, or XCO2, or X̃, should not proceed on a
sounding-by-sounding basis. But it does!
25. Satellite Remote Sensing, Revisited (5)
Sounding-by-sounding retrievals are almost ubiquitous in satellite remote
sensing. Given this, how should Uncertainty Quantification proceed?
The data are Y; OCO-2 “smooths” or “processes” them, and the
estimate of X̃ is
Ỹ = {X̂CO2(xi, yi; ti) : i = 1, ..., n}.
I suggest we do something different. By contracting and marginalizing
[X|Y, θ], the predictive distribution, [X̃|Y, θ], can be obtained.
Consequently, the retrieval, X̂CO2(xi, yi; ti), should be obtained from
all Y , not just Y(xi , yi ; ti ). This is sometimes called a joint retrieval,
but I have never seen it actually done.
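What a joint retrieval gains can be illustrated with a deliberately simple stand-in, sketched below. It assumes, purely for illustration, that each sounding yields one noisy scalar observation of its own XCO2 value (rather than a radiance vector), that soundings lie on a line, and that the process has an exponential covariance; none of these choices come from the talk, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
s = np.sort(rng.uniform(0.0, 10.0, size=n))        # hypothetical 1-D sounding locations
sigma2, rho, tau2, mu = 4.0, 2.0, 1.0, 400.0        # hypothetical parameters (theta)

# Spatially dependent process model for the n latent XCO2 values.
Sigma = sigma2 * np.exp(-np.abs(s[:, None] - s[None, :]) / rho)
x = rng.multivariate_normal(np.full(n, mu), Sigma)
y = x + rng.normal(scale=np.sqrt(tau2), size=n)     # one noisy datum per sounding

# Sounding-by-sounding: each value is predicted from its own datum only.
w = sigma2 / (sigma2 + tau2)
x_single = mu + w * (y - mu)
var_single = np.full(n, w * tau2)

# Joint retrieval: each value is predicted from all data through Sigma.
gain = Sigma @ np.linalg.inv(Sigma + tau2 * np.eye(n))
x_joint = mu + gain @ (y - mu)
var_joint = np.diag(Sigma - gain @ Sigma)

print("mean predictive variance, sounding-by-sounding:", var_single.mean())
print("mean predictive variance, joint:", var_joint.mean())
```

In this Gaussian toy the joint predictive variance at every sounding is no larger than the sounding-by-sounding one, because neighbouring data carry information about each state through the spatial dependence.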
26. Satellite Remote Sensing, Revisited (6)
We want to make inference on the CO2 field X, or its surface
derivative, XF , the surface-flux field. Then the predictive distribution
for data Y is:
[X|Y , θ] ∝ [Y |X, θ] · [X|θ].
If we use data Ỹ (i.e., the X̂CO2 values), then we should build the
conditional probability models, [Ỹ|X, θ], [X|θ] (and eventually worry
about θ). Notice that the process model, [X|θ], has not changed.
We need to build the spatio-temporal process model [X|θ]. A
geostatistical model that involves atmospheric transport is one choice;
a spatio-temporal random effects (SRE) model is another choice.
Recall XCO2 and XF . Inference on XCO2 is called Level 3
estimation, and inference on XF is called Level 4 estimation.
28. Concluding remarks
The principal science objective of OCO-2 is to predict a global geographic
distribution of CO2 sources and sinks (i.e., XF ) at Earth’s surface.
Sources (e.g., fires and respiration, fossil fuels, freshwater outgassing,
volcanism) and sinks (e.g., oceans, photosynthesis, soils) and their
changes over time are not known at high enough spatial resolution to
develop mitigation strategies.
XCO2(s; t) is a measurement of column-averaged CO2 in ppm, for a
retrieval located at s on the geoid, at time t.
Data-assimilation schemes invert spatio-temporal measurements, Ỹ,
of XCO2 to infer the flux process, XF, of Earth’s CO2 sources and
sinks.
Uncertainty is present at each level, which goes from energies measured by
the OCO-2 instrument (Level 1) all the way to inferences about surface
fluxes of CO2 (Level 4). The uncertainties need to be quantified in order
to obtain a science result (e.g., that an estimated source is real).
29. YouTube Video
FRK of CO₂ Satellite (OCO-2) Data: 2014-10-01 to 2017-03-01
Clint Shumack, Andrew Zammit Mangion, and Noel Cressie
University of Wollongong
https://www.youtube.com/watch?v=wEws67WXvkY