Machine Learning for
Scientific Applications
http://davidlary.info
David Lary
Need: Accounting for complex multi-variate context
which is often not fully described by theory
Monday, August 11, 14
Long Term Data Sets:
Uncertainty, Cross-Calibration,
Data Fusion & Machine Learning
Motivated by Data Assimilation
With examples from Land, Atmosphere & Ocean
Bias Detection
“Who may discern his errors, ....” Psalm 19:12
Why is it an issue?
• When multiple datasets are fused, bias is often an issue (very relevant for climate variables).
• Data assimilation is a least-squares method, or Best Linear Unbiased Estimator (BLUE), and so assumes that the errors in its inputs are unbiased.
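For a single scalar quantity, a BLUE analysis is an inverse-variance weighted combination of a background estimate and an observation. The sketch below uses a hypothetical helper `blue_combine` and made-up numbers. It shows what happens when the observation carries an uncorrected bias: part of the bias passes straight into the analysis, and the reported analysis variance does not change, so nothing in the estimator flags the problem.

```python
def blue_combine(x_b, var_b, y, var_o):
    """Scalar BLUE analysis of a background x_b and an observation y.

    Assumes unbiased, uncorrelated Gaussian errors with variances var_b and var_o.
    """
    k = var_b / (var_b + var_o)      # gain: weight given to the observation
    x_a = x_b + k * (y - x_b)        # analysis
    var_a = (1.0 - k) * var_b        # analysis error variance
    return x_a, var_a


truth = 10.0
obs_bias = 0.5  # hypothetical uncorrected instrument bias

unbiased = blue_combine(truth, 1.0, truth, 1.0)
biased = blue_combine(truth, 1.0, truth + obs_bias, 1.0)
print(unbiased)  # (10.0, 0.5)
print(biased)    # (10.25, 0.5): half the bias reaches the analysis, variance unchanged
```

The gain depends only on the assumed error variances. Because the bias is never part of that calculation, the estimator cannot detect or remove it.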
.... runs deeper still
• Instrument teams have a keen sense of faithfully reporting the data as it is, warts and all. They are naturally loath to correct biases empirically; they would like to understand the cause of biases and data issues theoretically, from first principles.
Because the Earth System is so complex, with many interacting processes, and the instruments are often complex too, this is not always possible.
Residual data issues can, and usually do, remain.
• Modelers know that data biases exist, but are very reticent to make changes to data products.
.... we therefore have a problem of closure.
The problem!
• Biases are ubiquitous, and not all of them can be explained theoretically. Yet we typically need to fuse multiple datasets to construct long-term time series and/or improve global coverage.
• If the biases are not corrected before data fusion, we introduce further problems, such as:
• spurious trends, leading to the possibility of unsuitable policy decisions;
• when assimilation is involved, suboptimal use of observations, non-physical structures in the analysis, biases in the assimilated fields, and extrapolation of biases due to multivariate background constraints.
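The spurious-trend problem is easy to reproduce. In this sketch all numbers are hypothetical. Two instruments measure a quantity that is constant in time, and the second instrument reads 0.2 units high. If the two records are spliced without a bias correction and a straight line is fitted, the fit shows a trend that comes entirely from the offset.

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


years = list(range(1990, 2010))
true_value = 3.0   # hypothetical quantity with no real trend
offset = 0.2       # hypothetical bias of the second instrument

# Instrument 1 covers 1990-1999, instrument 2 covers 2000-2009
spliced = [true_value if yr < 2000 else true_value + offset for yr in years]
print(round(ols_slope(years, spliced), 4))  # 0.015 units per year, although the true trend is zero
```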
A Further Problem
The instruments whose data we would like
to fuse are often not making coincident
measurements in time or space.
It is imperative to inter-compare observations in their appropriate context.
Integrate multiple satellite datasets
for applications
The comparison above shows the total ozone column observed by EP TOMS and Aura OMI. The high-resolution coverage that Aura OMI provides is clearly seen. The event shown is a tropopause fold over Texas.
An Example
Representativeness
[Figure: relative-frequency distributions of O3 v.m.r. (×10⁻⁶), all years, month 01, 1900 K < θ < 2300 K, equivalent latitude 79° to 90°, for Aura MLS O3 (23), CLAES v9 O3 (207), ISAMS v10 O3 (19), UARS MLS v5 183 GHz O3 (379), UARS MLS v5 205 GHz O3 (490), SAGE 2 v6.2 O3 (21) and SBUV v8 O3 (33).]
Geophysical Insights
Figure 2: N2O equivalent PV latitude versus potential temperature cross sections of (a) representativeness uncertainty (v.m.r.), (b) observational uncertainty
Bias is Spatially Dependent
[Figure: % bias (UARS MLS v5 183 GHz O3 − HALOE v19 O3) for January of all years, as a function of equivalent PV latitude (−75° to 75°) and potential temperature (250 K to 2500 K); color scale −30% to +30%.]
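A map like this can be built by binning coincident pairs from two instruments by equivalent PV latitude and potential temperature, then averaging the percent difference in each bin. The sketch below assumes co-located arrays are already available. The function name, the bin edges and the synthetic data are illustrative only. The synthetic data contain a +10% bias confined to the Northern Hemisphere.

```python
import numpy as np


def binned_percent_bias(eq_lat, theta, x_a, x_b, lat_edges, theta_edges):
    """Mean percent bias 100 * (x_a - x_b) / x_b in (equivalent latitude, theta) bins.

    Bins without any coincident pairs come back as NaN.
    """
    pct = 100.0 * (x_a - x_b) / x_b
    bins = [lat_edges, theta_edges]
    sums, _, _ = np.histogram2d(eq_lat, theta, bins=bins, weights=pct)
    counts, _, _ = np.histogram2d(eq_lat, theta, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


rng = np.random.default_rng(0)
n = 5000
eq_lat = rng.uniform(-75, 75, n)
theta = rng.uniform(400, 1500, n)
x_b = rng.uniform(2e-6, 8e-6, n)                  # "reference" instrument, v.m.r.
x_a = x_b * np.where(eq_lat > 0, 1.10, 1.00)      # synthetic +10 % bias in the NH only

grid = binned_percent_bias(eq_lat, theta, x_a, x_b,
                           lat_edges=np.arange(-75, 76, 15),
                           theta_edges=np.array([400, 700, 1000, 1500]))
print(np.round(grid, 1))  # rows: latitude bins, columns: theta bins
```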
So what can we do
about this?
.... we do not have a theoretical explanation
Machine Learning
for when our understanding is incomplete
... and that is quite often!
What is Machine Learning?
• Machine learning is a sub-field of artificial
intelligence that is concerned with the design and
development of algorithms that allow computers
to learn the behavior of data sets empirically.
• A major focus of machine-learning research is to
produce (induce) empirical models from data
automatically.
• This approach is usually adopted when adequate and complete theoretical models, which would be conceptually preferable, are not available.
What is Machine Learning?
The use of machine learning can actually help us to construct a more complete theoretical model, as it allows us to determine which factors are statistically capable of providing the data mappings we seek, e.g. the multivariate, non-linear, non-parametric mapping between satellite radiances and a suite of ocean products.
Machine Learning
Is for:
Regression
➡ Multivariate, non-linear, non-parametric
Classification
➡ Supervised and unsupervised
Machine Learning
Comes in Several Flavors, for example:
• Neural Networks
• Support Vector Machines
• Gaussian Process Models
• Decision Trees
• Random Forests
Machine Learning Regression
Training data: inputs $x_1, x_2, \ldots, x_n$ and output(s) $y$
$y = f(x_1, x_2, x_3, x_4, x_5, \ldots, x_n)$
Multivariate, non-linear, non-parametric
$n$ can be very large
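k-nearest-neighbour regression, one of the simplest non-parametric regressors, makes the idea concrete. It is chosen here for brevity and is not one of the methods the slides single out. The function `f` and the data are synthetic. The model only ever sees the training pairs (x, y) and never the form of `f`.

```python
import numpy as np


def knn_regress(X_train, y_train, X_query, k=5):
    """k-nearest-neighbour regression: average the outputs of the k closest training inputs."""
    # Pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def f(X):
    """A non-linear mapping that the model is never told about."""
    return np.sin(3 * X[:, 0]) * X[:, 1] + X[:, 2] ** 2


rng = np.random.default_rng(1)
X_train = rng.uniform(-1, 1, size=(2000, 3))   # inputs x1, x2, x3
y_train = f(X_train)
X_test = rng.uniform(-1, 1, size=(200, 3))
y_test = f(X_test)

pred = knn_regress(X_train, y_train, X_test, k=5)
rmse = np.sqrt(np.mean((pred - y_test) ** 2))
print(f"test RMSE = {rmse:.3f}, spread of y = {y_test.std():.3f}")
```

The distance matrix grows as n_query × n_train, which echoes the later remark that training sets are limited by memory.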
Machine Learning Supervised
Classification
Training data: inputs $x_1, x_2, \ldots, x_n$ and output(s) $y$ (the class labels)
Multivariate, non-linear, non-parametric
$n$ can be very large
Machine Learning
Unsupervised Classification
Multivariate, non-linear, non-parametric
$n$ can be very large
Training data: inputs $x_1, x_2, \ldots, x_n$ only (no outputs)
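k-means clustering is a classic example of unsupervised classification: it groups the input vectors purely by their similarity, without any labels. This is a minimal sketch on synthetic two-dimensional data. k-means is a familiar stand-in here and is not the self-organizing map used later in these slides.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Keep the old centroid if a cluster happens to lose all its points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment against the final centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1).argmin(axis=1)
    return labels, centroids


rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.5, size=(100, 2))
blob_b = rng.normal(5.0, 0.5, size=(100, 2))
X = np.vstack([blob_a, blob_b])          # no labels are supplied
labels, centroids = kmeans(X, k=2)
print(centroids)
```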
Neural Networks
In a neural network model, simple nodes (neurons) are connected together to form a network of nodes. Its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow.
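The weight-adjustment idea fits in a few lines of NumPy. In this sketch, a network with one hidden layer of tanh nodes learns y = sin(x) by gradient descent (back-propagation) on the mean squared error. The layer size, learning rate and number of steps are arbitrary choices for the illustration, not settings from the work described here.

```python
import numpy as np


def train_mlp(x, y, n_hidden=16, lr=0.02, n_steps=5000, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(n_steps):
        h = np.tanh(x @ W1 + b1)              # hidden-node activations
        out = h @ W2 + b2                      # linear output node(s)
        err = out - y
        losses.append(np.mean(err ** 2))
        g_out = 2.0 * err / err.size           # d(loss)/d(out)
        g_W2, g_b2 = h.T @ g_out, g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)  # back-propagate through tanh
        g_W1, g_b1 = x.T @ g_h, g_h.sum(axis=0)
        # Adjust the connection strengths (weights) down the gradient
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), losses


def predict(params, x):
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2


x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)
params, losses = train_mlp(x, y)
print(f"MSE before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```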
Support Vector Machines
Support vector machines (SVMs)
are a set of related supervised
learning methods used for
classification and regression.
Intuitively, an SVM model is a
representation of the training
examples as points in space,
mapped so that the examples of
the separate categories are
divided by a clear gap that is as
wide as possible.
Vladimir Vapnik
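A linear SVM can be trained with very little code. The sketch below uses the Pegasos stochastic sub-gradient method on the regularized hinge loss. This is one of several ways to fit an SVM, chosen here for brevity. Kernels, which give SVMs their non-linear power, are omitted. The data and parameter values are illustrative.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, n_iter=20000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels y must be +1 or -1. A constant feature is appended so that a bias
    term is learned; as a simplification it is regularized like the weights.
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xa))
        eta = 1.0 / (lam * t)                  # decreasing step size
        if y[i] * (Xa[i] @ w) < 1.0:           # point is inside the margin
            w = (1.0 - eta * lam) * w + eta * y[i] * Xa[i]
        else:
            w = (1.0 - eta * lam) * w
    return w


def svm_predict(w, X):
    return np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.concatenate([-np.ones(100), np.ones(100)])
w = train_linear_svm(X, y)
accuracy = np.mean(svm_predict(w, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Only points that fall inside the margin change the weights beyond the shrinkage step. This is the "clear gap that is as wide as possible" described above.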
Gaussian Process Models
Gaussian processes (GPs) (Rasmussen and Williams 2006) fit a multivariate Gaussian probability distribution to any set of regressors, allowing for analytic inference. As a principled Bayesian technique, GPs go beyond SVMs by supplying a full posterior distribution for the predictions, giving us both mean estimates and an indication of the uncertainty in them.
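GP regression with a squared-exponential kernel follows the standard Cholesky-based recipe (Rasmussen and Williams 2006, Algorithm 2.1). The sketch below fixes the hyper-parameters by hand rather than optimizing them, and it uses synthetic data. It shows the property described above: the predictive standard deviation is small near the training data and grows back to the prior value far from it.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance k(a, b) = signal_var * exp(-|a - b|^2 / (2 l^2))."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)


def gp_predict(X_train, y_train, X_test, length_scale=1.0, signal_var=1.0, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at X_test (hyper-parameters fixed)."""
    K = rbf_kernel(X_train, X_train, length_scale, signal_var) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, signal_var)
    L = np.linalg.cholesky(K)                                # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)                # prior variance minus explained part
    return mean, var


X_train = np.linspace(0.0, 5.0, 8).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
X_test = np.array([[X_train[3, 0]], [2.5], [20.0]])   # on a training point, in a gap, far away
mean, var = gp_predict(X_train, y_train, X_test)
for x, m, s2 in zip(X_test.ravel(), mean, var):
    print(f"x = {x:5.2f}: mean = {m:+.3f}, std = {np.sqrt(max(s2, 0.0)):.3f}")
```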
Random Forest
Random forests are an ensemble learning method for
classification (and regression) that operate by
constructing a multitude of decision trees, hence a forest. The approach was developed by Leo Breiman and Adele Cutler.
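The core mechanics of a random forest are bootstrap resampling, a random subset of features for each tree, and averaging the trees' predictions. These fit in a short sketch. To keep it brief, each "tree" here is a single-split stump. Breiman's random forests grow full trees and draw random features at every split, so treat this as a simplified illustration on synthetic data, not a faithful implementation.

```python
import numpy as np


def fit_stump(X, y, features):
    """Best single split over the given features, chosen by total squared error."""
    best = None
    for f in features:
        for thr in np.unique(X[:, f])[:-1]:   # every threshold leaves both sides non-empty
            left = X[:, f] <= thr
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, thr, yl.mean(), yr.mean())
    if best is None:                           # no possible split: predict the mean
        return (features[0], np.inf, y.mean(), y.mean())
    return best[1:]


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Bagged stumps, each grown on a bootstrap sample and a random subset of features."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), len(X))                            # bootstrap sample
        feats = rng.choice(X.shape[1], size=max_features, replace=False)  # random features
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest


def predict_forest(forest, X):
    """Average the predictions of all trees."""
    votes = [np.where(X[:, f] <= thr, left, right) for f, thr, left, right in forest]
    return np.mean(votes, axis=0)


rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(200, 2))                   # x1 matters, x2 is irrelevant
y = (X[:, 0] > 0.5).astype(float) + rng.normal(0, 0.1, 200)
forest = fit_forest(X, y)
X_new = np.array([[0.1, 0.5], [0.9, 0.5]])
print(predict_forest(forest, X_new))
```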
A key issue is training dataset size: the bigger, the better!
..... until we run out of memory
Variations in Stratospheric Cly
Between 1991 and the present
David Lary, Anne Douglass, Darryn Waugh,
Richard Stolarski, Paul Newman, Hamse Mussa
• Data can be biased,
maybe as a function of
many parameters.
• We may be observing a proxy for what we really want to know.
[Figure: report excerpt, "21st Century Ozone Layer", with two panels at 50 hPa, 80°S, October: (a) Cly (ppbv) versus year; (b) Cly − Cly(1980) (ppbv) versus year.]
Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively.
A large range of Cly in the model simulations
Constrained by a limited number of Cly observations
• We need to know the distribution of
inorganic chlorine (Cly) in the
stratosphere to:
• Attribute changes in stratospheric
ozone to changes in halogens.
• Assess the realism of chemistry-climate models.
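$\mathrm{Cl_y} = \mathrm{HCl} + \mathrm{ClONO_2} + \mathrm{ClO} + \mathrm{HOCl} + 2\,\mathrm{Cl_2O_2} + 2\,\mathrm{Cl_2}$
[Figure annotations on the observational records: "Long time-series", "Sporadic", "Long time-series", "Since 2004".]
Estimating Cly is hampered by a lack of observations.
Estimating Cly is hampered by inter-instrument biases.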
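The chlorine budget above is a simple atom count. The sketch below writes it as a hypothetical helper. It also shows why Cly is hard to estimate directly: every term must be known at the same place and time, whereas the observations of these species are limited.

```python
def total_inorganic_chlorine(hcl, clono2, clo, hocl, cl2o2, cl2):
    """Cly = HCl + ClONO2 + ClO + HOCl + 2 Cl2O2 + 2 Cl2 (all inputs in the same units).

    Cl2O2 and Cl2 carry two chlorine atoms each, hence the factors of 2.
    """
    return hcl + clono2 + clo + hocl + 2.0 * cl2o2 + 2.0 * cl2


# Hypothetical mixing ratios in ppbv, chosen only to illustrate the arithmetic
print(total_inorganic_chlorine(hcl=2.0, clono2=0.75, clo=0.125, hocl=0.0625,
                               cl2o2=0.25, cl2=0.0))  # 3.4375
```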
Using PDFs for Bias Detection
[Figure: relative-frequency distributions of HCl v.m.r. (×10⁻⁹) for January 2005, 460 K < θ < 590 K, equivalent latitude 49° to 61°: ACE v2.2 HCl (75), Aura MLS HCl (1544), HALOE v19 HCl (101). Label: HALOE − Aura HCl.]
http://www.pdfcentral.info/
If we now repeat this globally for all periods of overlap
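Comparing distributions rather than individual coincidences means the two instruments only need to sample the same region and period. The minimal sketch below assumes the measurements have already been selected into one space-time bin. The arrays, bin edges and the 0.1 ppbv offset are synthetic. It builds relative-frequency histograms on common bins and compares a robust statistic, the median. This is one simple choice; the slides do not specify which statistic was used.

```python
import numpy as np


def relative_frequency(values, edges):
    """Histogram normalized so that the bin counts sum to 1."""
    counts, _ = np.histogram(values, bins=edges)
    return counts / counts.sum()


def median_offset(a, b):
    """Difference of medians of two samples from the same space-time bin."""
    return np.median(a) - np.median(b)


rng = np.random.default_rng(3)
# Synthetic HCl v.m.r. from two instruments in one bin; instrument A reads 0.1 ppbv high
hcl_a = rng.normal(1.5e-9, 1e-10, 1500)
hcl_b = rng.normal(1.4e-9, 1e-10, 500)
edges = np.linspace(0.8e-9, 2.0e-9, 25)
pdf_a, pdf_b = relative_frequency(hcl_a, edges), relative_frequency(hcl_b, edges)
print(f"estimated offset: {median_offset(hcl_a, hcl_b):.2e}")
```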
HCl Inter-comparisons
[Figure: ATMOS HCl versus HALOE HCl (ppbv), showing the data, the 1:1 line and a weighted fit: slope = 1.05, intercept = 0.23 ppbv.]
[Figure: ACE v2.2 HCl versus HALOE HCl (ppbv), showing the data, the 1:1 line and a weighted fit: slope = 1.18, intercept = −0.050 ppbv.]
[Figure: MLS HCl versus HALOE HCl (ppbv), showing the data, the 1:1 line, a fit and a weighted fit: slope = 1.09, intercept = 0.070 ppbv.]
[Figure: left, MLS HCl versus HALOE HCl (ppbv): slope = 1.09, intercept = 0.070 ppbv; right, MLS HCl versus NN-adjusted HALOE HCl (ppbv): slope = 0.995, intercept = 0.0093 ppbv. Each panel shows the data, the 1:1 line and a weighted fit.]
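The slope and intercept quoted for each comparison come from a weighted straight-line fit. The minimal sketch below shows one common form: weighted least squares with weights 1/σ² on the y values. The slides do not say exactly which weighting was used, and the data here are synthetic. A perfect re-calibration would give a slope of 1 and an intercept of 0, which is what the NN-adjusted comparison approaches.

```python
import numpy as np


def weighted_linear_fit(x, y, sigma_y):
    """Weighted least-squares fit of y = slope * x + intercept, weights 1 / sigma_y**2.

    Only the uncertainty in y is used; methods that also account for errors in x
    (e.g. orthogonal distance regression) would differ when both are noisy.
    """
    sw = 1.0 / sigma_y                          # square root of the weights
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    slope, intercept = coef
    return slope, intercept


rng = np.random.default_rng(4)
x = rng.uniform(0.5, 3.5, 300)                  # reference instrument (ppbv), synthetic
sigma = rng.uniform(0.05, 0.3, 300)             # per-point uncertainty of the other instrument
y = 1.1 * x + 0.05 + rng.normal(0.0, sigma)     # synthetic: 10 % high plus a 0.05 ppbv offset
slope, intercept = weighted_linear_fit(x, y, sigma)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f} ppbv")
```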
Neurological algorithms
[Diagram: Inputs → Process → Outputs]
An example neural network
[Diagram: Inputs → Process → Outputs]
Objective design of neural networks using genetic algorithms
Re-calibration
using a Neural Network
[Figure: HCl neural-network outputs A versus targets T (v.m.r., ×10⁻⁹). Training: best linear fit A = 0.97 T + 5e-11, R = 0.98739. Validation: best linear fit A = 0.98 T + 2.9e-11, R = 0.99232. Each panel shows the data points, the best linear fit and the line A = T.]
Totally independent validation
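The training/validation check on this slide can be reproduced in outline. First, fit a mapping from one instrument (plus context variables) to the reference instrument on part of the overlap period. Then report R and the best linear fit A = mT + b on data withheld from fitting. To keep the sketch short, a multivariate linear least-squares map stands in for the neural network used here. All data and variable names are synthetic.

```python
import numpy as np


def fit_linear_map(X, t):
    """Least-squares linear map (with intercept) from standardized predictors X to targets t."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    A = np.column_stack([(X - mu) / sd, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return mu, sd, coef


def apply_linear_map(model, X):
    mu, sd, coef = model
    return np.column_stack([(X - mu) / sd, np.ones(len(X))]) @ coef


rng = np.random.default_rng(5)
n = 2000
hcl_a = rng.uniform(0.5, 3.5, n)            # instrument to be re-calibrated (ppbv), synthetic
lat = rng.uniform(-80, 80, n)               # additional predictors, synthetic
theta = rng.uniform(400, 1500, n)
# Synthetic reference instrument: the bias depends on HCl level, latitude and theta
hcl_ref = 1.05 * hcl_a + 0.001 * lat + 0.0001 * theta + rng.normal(0, 0.02, n)

X = np.column_stack([hcl_a, lat, theta])
train = rng.random(n) < 0.5                 # random split: half training, half validation
model = fit_linear_map(X[train], hcl_ref[train])
outputs = apply_linear_map(model, X[~train])
targets = hcl_ref[~train]
r = np.corrcoef(outputs, targets)[0, 1]
m, b = np.polyfit(targets, outputs, 1)      # A = m T + b, as on the slide's axes
print(f"validation: R = {r:.4f}, A = {m:.3f} T + {b:.3f}")
```

Only the validation half is used for the reported statistics, so the numbers say something about how the mapping generalizes rather than how well it memorized the training data.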
Long-term continuity
Applied neural network re-calibration to HALOE
[Figure: Cly (v.m.r., ×10⁻⁹) versus year, 1995 to 2005, at 525 K and 800 K; left panel: monthly average at 2°; right panel: monthly average at 61°, labelled October; reference curves labelled "2 Year Age" to "6 Year Age".]
Use neural networks to infer Cly from HCl, CH4, ϕpv, and θ.
Long-term continuity for Cly
[Report excerpt, "21st Century Ozone Layer", p. 6.26]
...ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower stratosphere in spring (LMDZrepro) simulates very low ozone.
CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the simulated Cly, e.g., E39C and SOCOL show a trend smaller than observed, whereas AMTRAC and UMETRAC show a trend larger than observed in extrapolar area weighted mean column ozone. However, other factors also contribute, e.g., biases in tropospheric ozone (Austin and Wilson, 2006).
The CCM evaluation discussed above and in Eyring et al. (2006) has guided the level of confidence we place on each model simulation. The CCMs vary in their skill in representing different processes and characteristics of the atmosphere. Because the focus here is on ozone recovery due to declining ODSs, we place importance on the models’ ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to evaluate these processes and CCMs shown with solid curves in Figures 6-7, 6-8, 6-10 and 6-12 to 6-14 are those that are in good agreement with the observations in Figure 6-7. However, these line styles should not be over-interpreted as both the ability of the CCMs to represent these processes as well as the relative importance of Cly, temperature, and transport vary between different regions and altitudes. Also, analyses of model dynamics in the Arctic, and differences in the chlorine budget/partitioning in these models, when available, might change this evaluation for some regions and altitudes.
Other uses of machine
learning
• Cross-calibration of vegetation indices from AVHRR, MODIS, SPOT and SeaWiFS
• Inferring CO2 fluxes from vegetation indices and surface temperature
• Inferring ocean pigment concentrations and other parameters
• Inferring drought stress and endophyte infection in cacao (cocoa)
• Learning the chaotically tumbling orbit of the Hubble Space Telescope
• Detecting online eBay fraud
• Acceleration of expensive code elements
Another application
dissolved organic carbon
Method used to estimate DOC      R
SeaWiFS bands, GP NL             0.99977
MODIS bands, GP NL               0.9997
All bands, GP NL                 0.99901
UV & SeaWiFS bands, GP NL        0.99899
All bands, NN                    0.95859
UV & SeaWiFS bands, NN           0.94609
MODIS bands, NN                  0.92585
SeaWiFS bands, NN                0.91653
[Figure: Taylor diagram (standard deviation, correlation coefficient and RMSD) comparing estimates labelled A to H.]
Gaussian Process Models
Relative Importance of the Inputs
Wavelength   Relative Importance
Rrs490       0.00087123   (most important)
Rrs555       0.011976
Rrs670       1.5876
Rrs510       9.8423
Rrs443       13.0898
Rrs412       20.2553      (least important)
The GPM hyper-parameters give an indication of the relative importance of the inputs. For the DOC estimate from the SeaWiFS bands, the most important inputs are those with the smallest values; the table is sorted in order of importance.
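This sketch assumes the listed hyper-parameters are the length scales of an automatic-relevance-determination (ARD) squared-exponential kernel; the slide does not state the parameterization. Under that assumption, the ranking follows from how the kernel uses them. A small length scale means the modelled output changes quickly as that input changes, so that input matters more. The code sorts the tabulated values and shows the effect of a small change in each band.

```python
import numpy as np


def ard_rbf(a, b, length_scales, signal_var=1.0):
    """ARD squared-exponential kernel: one length scale per input dimension."""
    z = (np.asarray(a) - np.asarray(b)) / np.asarray(length_scales)
    return signal_var * np.exp(-0.5 * np.sum(z ** 2))


# Values from the table, interpreted here as ARD length scales (an assumption)
hyper = {"Rrs490": 0.00087123, "Rrs555": 0.011976, "Rrs670": 1.5876,
         "Rrs510": 9.8423, "Rrs443": 13.0898, "Rrs412": 20.2553}
ranking = sorted(hyper, key=hyper.get)   # smallest length scale first
print(ranking)  # ['Rrs490', 'Rrs555', 'Rrs670', 'Rrs510', 'Rrs443', 'Rrs412']

# Covariance between two inputs that differ by 0.01 in a single band
ls = np.array(list(hyper.values()))
for d, name in enumerate(hyper):
    shift = np.zeros(len(ls))
    shift[d] = 0.01
    print(name, ard_rbf(np.zeros(len(ls)), shift, ls))
```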
[Figure: salinity versus a412 − a443, showing the data with polynomial (r² = 0.928), NN (r² = 0.933) and SVM (r² = 0.933) fits.]
Visibility
Variable   R
Td         -0.29
q          -0.26
T          -0.19
U          -0.18
RH          0.13
SLP         0.05
High Resolution Identification of
Dust Sources Using Machine
Learning and Remote Sensing Data
Annette Walker and David J. Lary
A42A-08
NRL High-resolution Dust Source Database
[Images: 20030820 NRL DEP and 20030820 MODIS True Color (Iran, Pakistan); 20110630 NRL MSG/RGB (Saudi Arabia)]
Approach and Methodology
• 10 years of DEP (2 yr MSG/RGB) imagery
• COAMPS 10 m wind overlays
• Surface weather plots
• ENVI (GIS-like software)
• NGDC topographical 10° × 10° tiles
• Overlay 0.25° grid or use Google Earth (GE)
• Dust source area entered into database (cursor location tool = 1 km precision)
• Cross-correlate land and water features using maps, atlases, Landsat images (detailed topographic, geographic, and geomorphic information, GE)
• Technical and governmental reports
NRL High-resolution Dust Source Database
Solid red and purple shapes identify dust source
areas located using DEP and MSG.
[Maps: SW Asia DSD and East Asia DSD; labels: Saudi Arabia, Mongolia]
Self-Organizing Map
Self-organizing maps (SOMs) are a
data visualization and unsupervised
classification technique invented by
Professor Teuvo Kohonen (Kohonen
1982; 1990) that reduce the
dimensions of data through the use
of self-organizing neural networks.
They help us address the issue that humans simply cannot visualize high-dimensional data.
Self-Organizing Map
SOMs reduce dimensionality by
producing a map that objectively plots
the similarities of the data by grouping
similar data items together.
SOMs learn to classify input vectors
according to how they are grouped in
the input space.
SOMs learn both the distribution and topology of the input vectors they are trained on. This approach allows SOMs to accomplish two things: reduce dimensions and display similarities.
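The training loop behind a SOM is short. In the sketch below, the grid size and the learning-rate and neighbourhood schedules are arbitrary choices. The 7-band "reflectance" vectors are synthetic, not MODIS data. Each input pulls its best-matching node, and that node's grid neighbours, toward itself. After training, each node index serves as a class label, so similar inputs end up in the same or neighbouring classes.

```python
import numpy as np


def train_som(X, rows=5, cols=5, n_iter=3000, lr0=0.5, seed=0):
    """Online Kohonen SOM on a rows x cols grid; returns one weight vector per node."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)   # start from data samples
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays towards zero
        sigma = sigma0 * (1.0 - frac) + 0.5         # neighbourhood radius shrinks
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.sum((W - x) ** 2, axis=1))        # best-matching unit
        g2 = np.sum((grid - grid[bmu]) ** 2, axis=1)         # squared grid distance to BMU
        h = np.exp(-g2 / (2.0 * sigma ** 2))                 # neighbourhood weights
        W += lr * h[:, None] * (x - W)                       # pull BMU and neighbours toward x
    return W


def som_classify(W, X):
    """SOM class of each input vector = index of its best-matching unit."""
    d = np.sum((X[:, None, :] - W[None, :, :]) ** 2, axis=-1)
    return np.argmin(d, axis=1)


rng = np.random.default_rng(6)
# Three synthetic "surface types", each a 7-band reflectance vector plus noise
centres = np.array([np.full(7, 0.1), np.full(7, 0.5), np.full(7, 0.9)])
X = np.vstack([c + rng.normal(0.0, 0.02, (200, 7)) for c in centres])
W = train_som(X)
classes = som_classify(W, X)
for k in range(3):
    print(f"surface type {k}: SOM classes {sorted(set(classes[200 * k:200 * (k + 1)].tolist()))}")
```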
Detecting Dust Sources
Self-Organizing Map Classification
7 bands of MODIS MCD43C3 bihemispherical reflectance
All 1000 classes mapped for North Africa
Libyan Dust Event: May 9, 2010 (8Z – 12Z)
Jabal al Akhdar (الجبل الأخضر, Al Ǧabal al 'Aḫḍar, English: Green Mountains)
A coastal mountain range with height 1.0-1.5 km.
Libyan Dust Event: May 9, 2010 (6Z – 8Z)
Plumes originate on the leeward side of Jabal al Akhdar, where drainage occurs along the slopes.
Corresponding SOM classes: 49, 93, 94
Chad: Bodélé Depression
Dust Event: March 16, 2010 (7Z – 12Z)
Located at the southern edge of the Sahara Desert in north-central Africa, the Bodélé Depression is the lowest point in Chad. Dust storms from the Bodélé Depression occur on average about 100 days per year. The Bodélé Depression is a single spot in the Sahara that provides most of the mineral dust to the Amazon forest.
Selected SOM Classes
Chad: Bodélé Depression
NRL MSG-RGB 20110109
The source area is not designated in the first pass of the MODIS reflectance and land surface classification.
1000 SOM Classes
Selected Classes with Class 137
Chad: Bodélé Depression
NRL MSG-RGB 20110109
Class 137 maps diatom sediment in the depression.
1000 SOM Classes
Selected Classes Without Class 137
Chad: Bodélé Depression
NRL MSG-RGB 20110109
1000 SOM Classes
Class 137 maps diatom sediment in the depression.
Solid black circles/ovals show plume source
Corresponding SOM Classes within open circles/ovals
Northern Sahara: 36, 40, 63, 100
Sahel: 147, 229, 230, 405
West Africa: Feb 2, 2011 13Z
Selected Classes for North Africa
(This involves 40 distinct classes)
Jan 1, 2006 True Color
Jan 1, 2006 NRL DEP
Sources along New Mexico/Texas border
The North American sources have a different spectral signature from those we saw in SW Asia.
Agricultural areas on the High Plains
Blue: desert areas
Sources in Arizona and Colorado
Apr 17, 2006 NRL DEP
Apr 17, 2006 True color
Selected Classes for North America (n=64)
All 1000 classes mapped for South America
Blue-colored SOM classes are concentrated in the Atacama and Salar de Uyuni deserts.
White areas are salt flats.
South America: Bolivia and Chile
July 18, 2010 MODIS Terra True Color
Selected SOM classes in the 200s, 300s, and 400s
• SOMs provide an effective mechanism for automating the identification of dust sources.
• Using SOMs lets us map dust sources globally at high resolution (1-10 km).
• SOMs saved time in finding dust sources compared with searching satellite imagery.
• This can be done in real time to track dynamically changing dust sources.
[Diagram: Model; Existing and New data]
• Personalized Health Care
• Proactive Health Care System
• Business Analytics
• Smart Logistics
• Disaster Response
• Fraud Detection
http://holistics3.com
[Diagram: Visualization, Decision Support, Machine Learning, Insight & Discovery]
Existing:
• Social Media
• Socioeconomic, Census
• News feeds
• Environmental
• Weather
• Satellite
• Sensors
• Health
• Economic
New:
• Business Analytics 2.0
• UAVs
• Hyperspectral Imaging
• Smart Dust
• Wearable Sensors
• Autonomous Cars
Simulation:
• Global Weather Models
• Economic Models
• Earthquake Models
GigaPop Pipe
TACC


Similar to Machine Learning for Scientific Applications
• 2013 ASPRS Track, Ozone Modeling for the Contiguous United States by Michael ...
• Land Cover and Land use Classifiction from Satellite Image Time Series Data u...
• Constructing a long time series of soil moisture using SMOS data with statist... (grssieee)
• CLIM Fall 2017 Course: Statistics for Climate Research, Analysis for Climate ...
• UNF Undergrad Physics (Nick Kypreos)
• Astronomaly at Scale: Searching for Anomalies Amongst 4 Million Galaxies (Sérgio Sacani)
• A comparison of classification techniques for seismic facies recognition (Pioneer Natural Resources)
• Data analysis03 timeasa-variable
• CCS2019-opological time-series analysis with delay-variant embedding
• Data Tactics Analytics Brown Bag (Aug 22, 2013)
• Porosity prediction from seismic using geostatistic (Melani Khairunisa)
• Self-organzing maps in Earth Observation Data Cube Analysis
• Andy Jarvis Parasid Near Real Time Monitoring Of Habitat Change Using A Neura...
• When The New Science Is In The Outliers
• Blue Waters Enabled Advances in the Fields of Atmospheric Science, Climate, a...
• CLIM: Transition Workshop - Investigating Precipitation Extremes in the US Gu... (The Statistical and Applied Mathematical Sciences Institute)
• The physics background of the BDE SC5 pilot cases
• 2 1 xie_solar_2016_pv_systems
• CLIM: Transition Workshop - Discussion of Statistics in Oceanography - Michae...
• Computational Training and Data Literacy for Domain Scientists

More from David Lary
• The West Africa-America Chamber of Commerce & Industries presents: Big Data &...
• The West Africa-America Chamber of Commerce & Industries presents: Sub sahara...
• The West Africa-America Chamber of Commerce & Industries presents:
• West Africa-America Chamber of Commerce & Industries: E mist
• Big Data & Machine Learning for Societal Benefit
• Why geni


Machine Learning for Scientific Applications

  • 1. Machine Learning for Scientific Applications http://davidlary.info David Lary Need: Accounting for complex multi-variate context which is often not fully described by theory Monday, August 11, 14
  • 2. Long Term Data Sets: Uncertainty, Cross-Calibration, Data Fusion & Machine Learning Motivated by Data Assimilation With examples from Land,Atmosphere & Ocean Monday, August 11, 14
  • 3. Bias Detection “Who may discern his errors, ....” Psalm 19:12 7 Monday, August 11, 14
  • 4. Why is it an issue? • With fusion of multiple datasets bias is often an issue (very relevant for climate variables). • Data assimilation is a least squares or a Best Linear Unbiased Estimator (BLUE) 8 Monday, August 11, 14
  • 5. .... runs deeper still • Instrument teams have a keen sense of faithfully reporting the data, as it is, warts and all.They are naturally loath to empirically correct biases; they would like to theoretically understand the cause of the bias and data issues from first principles. The Earth System is so complex, with many interacting processes, and often the instruments are also complex, this is not always possible. Residual data issues can, and usually do, remain. • Modelers know that data bias exist, but are very reticent to make changes to data products. .... we therefore have a problem of closure. 9 Monday, August 11, 14
  • 6. The problem! • Biases are ubiquitous, not all of them can be explained theoretically.Yet, we typically need to fuse multiple datasets to construct long-term time series and/or improve global coverage. • If the biases are not corrected before data fusion we introduce further problems, such as ... • spurious trends, leading to the possibility of unsuitable policy decisions. • when assimilation is involved, the suboptimal use of observations, non-physical structures in the analysis, biases in the assimilated fields, and extrapolation of biases due to multivariate background constraints. 10 Monday, August 11, 14
  • 7. A Further Problem The instruments whose data we would like to fuse are often not making coincident measurements in time or space. Imperative to inter-compare observations in their appropriate context. 11 Monday, August 11, 14
  • 8. Integrate multiple satellite datasets for applications The comparison above shows the total ozone column observed by EP TOMS and Aura OMI. The high resolution coverage that Aura OMI provides is clearly seen. In the particular event shown there is a tropopause fold event over Texas. 12 Monday, August 11, 14
  • 11. 0.5 1 1.5 2 2.5 3 3.5 4 x 10 6 0 0.02 0.04 0.06 0.08 0.1 0.12 O 3 v.m.r. RelativeFrequency All years 01 (1900 K< < 2300 K, 90o < el < 79o ) Aura MLS O3 (23) CLAES v9 O 3 (207) ISAMS v10 O3 (19) UARS MLS v5 183 GHz O 3 (379) UARS MLS v5 205 GHz O 3 (490) SAGE 2 v6.2 O3 (21) SBUV v8 O3 (33) 15 Monday, August 11, 14
  • 12. Geophysical Insights (a) (b) (c) (d) Figure 2: N2O Equivalent PV latitude - potential temperature cross sections of (a) representativeness uncertainty (v.m.r.), (b) observational uncertainty 16 Monday, August 11, 14
  • 13. Bias is Spatially Dependent −75 −60 −45 −30 −15 0 15 30 45 60 75 250 300 350 400 500 600 700 1000 1200 1500 2000 2500 Equivalent PV Latitude PotentialTemperature(K) % Bias (UARS MLS v5 183 GHz O 3 − HALOE v19 O 3 ) for January of all years −30 −20 −10 0 10 20 30 −75 −60 −45 −30 −15 0 15 30 45 60 75 250 300 350 400 500 600 700 1000 1200 1500 2000 2500 Equivalent PV Latitude PotentialTemperature(K) % Bias (UARS MLS v5 183 GHz O 3 − HALOE v19 O 3 ) for January of all years −30 −20 −10 0 10 20 30 17 Monday, August 11, 14
  • 14. So what can we do about this? .... we do not have a theoretical explanation 18 Monday, August 11, 14
  • 15. Machine Learning for when our understanding is incomplete 19 ... and that is quite often! Monday, August 11, 14
  • 16. What is Machine Learning? • Machine learning is a sub-field of artificial intelligence that is concerned with the design and development of algorithms that allow computers to learn the behavior of data sets empirically. • A major focus of machine-learning research is to produce (induce) empirical models from data automatically. • This approach is usually used because of the absence of adequate and complete theoretical models that are more desirable conceptually. 20 Monday, August 11, 14
  • 17. What is Machine Learning? The use of machine learning can actually help us to construct a more complete theoretical model, as it allows us to determine which factors are statistically capable of providing the data mappings we seek— e.g. the multi-variate, non-linear, non- parametric mapping between satellite radiances and a suite of ocean products. 21 Monday, August 11, 14
  • 18. Machine Learning Is for: Regression ➡ Multivariate, non-linear, non-parametric Classification ➡ Supervised and unsupervised 22 Monday, August 11, 14
  • 19. Machine Learning Comes in Several Flavors, for example: • Neural Networks • SupportVector Machines • Gaussian Process Models • Decision Trees • Random Forests 23 Monday, August 11, 14
  • 20. Machine Learning Regression x1 x2 x3 x4 x5 xn y Inputs Output(s) Inputs Inputs Inputs Inputs Inputs Inputs y = f (x1,x2,x3,x4 ,x5,…,xn ) Multivariate, non-linear, non-parametric n can be very large Training Data Monday, August 11, 14
  • 21. Machine Learning Supervised Classification x1 x2 x3 x4 x5 xn y Inputs Output(s) Inputs Inputs Inputs Inputs Inputs Inputs Multivariate, non-linear, non-parametric n can be very large Training Data Monday, August 11, 14
  • 22. Machine Learning Unsupervised Classification Multivariate, non-linear, non-parametric n can be very large x1 x2 x3 x4 x5 xn Inputs Inputs Inputs Inputs Inputs Inputs Inputs Training Data Monday, August 11, 14
  • 23. Neural Networks In a neural network model simple nodes (neurons), are connected together to form a network of nodes. Its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow. 27 Monday, August 11, 14
  • 24. SupportVector Machines Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. Intuitively, an SVM model is a representation of the training examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. VladimirVapnik 28 Monday, August 11, 14
  • 25. Gaussian Process Models Gaussian processes (GPs) (Rasmussen and Williams 2006) fit a multivariate Gaussian probability distribution to any set of regressors, allowing for analytic inference.As a principled Bayesian technique, GPs go beyond SVMs by allowing us to supply a full posterior distribution for our regressors, giving us both mean estimates as well as an indication of the uncertainty in them. 29 Monday, August 11, 14
  • 26. Random Forest Random forests are an ensemble learning method for classification (and regression) that operate by constructing a multitude of decision trees, hence a forest.The approach was developed by Leo Breiman and Adele Cutler. Monday, August 11, 14
  • 27. A key issue is training dataset size, the bigger the better! ..... until we run out of memory 31 Monday, August 11, 14
  • 28. Variations in Stratospheric Cly Between 1991 and the present David Lary, Anne Douglass, Darryn Waugh, Richard Stolarski, Paul Newman, Hamse Mussa • Data can be biased, maybe as a function of many parameters. • May be observing a proxy for what we really want to know.32 Monday, August 11, 14
  • 29. ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower strato- sphere in spring (LMDZrepro) simulates very low ozone. CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the sim- recovery due to declining ODSs, we place importance on the models’ ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to eval- uate these processes and CCMs shown with solid curves 21st CENTURY OZONE LAYER Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively. 50 hPa 80°S October 50 hPa 80°S October Cly –Cly (1980)(ppbv) Cly (ppbv) Year Year 33 Monday, August 11, 14
  • 30. ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower strato- sphere in spring (LMDZrepro) simulates very low ozone. CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the sim- recovery due to declining ODSs, we place importance on the models’ ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to eval- uate these processes and CCMs shown with solid curves 21st CENTURY OZONE LAYER Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively. 50 hPa 80°S October 50 hPa 80°S October Cly –Cly (1980)(ppbv) Cly (ppbv) Year Year A large range of Cly in the model simulations Constrained by a limited number of Cly observations 33 Monday, August 11, 14
  • 31. • We need to know the distribution of inorganic chlorine (Cly) in the stratosphere to: • Attribute changes in stratospheric ozone to changes in halogens. • Assess the realism of chemistry- climate models. 34 Monday, August 11, 14
  • 32. Cly=HCl+ClONO2+ClO+HOCl +2Cl2O2+2Cl2 Long time-series Sporadic Long time-series Since 2004 Estimating Cly is hampered by lack of observations Estimating Cly is hampered by inter-instrument biases 35 Monday, August 11, 14
  • 33. Using PDFs for Bias Detection 0.8 1 1.2 1.4 1.6 1.8 2 x 10 9 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 HCl v.m.r. RelativeFrequency 2005/01 (460 K< < 590 K, 49 o < el < 61 o ) ACE v2.2 HCl (75) Aura MLS HCl (1544) HALOE v19 HCl (101) http://www.pdfcentral.info/ HALOE -Aura HCl If we now repeat this globally for all periods of overlap 36 Monday, August 11, 14
  • 34. 0 1 2 3 4 0 1 2 3 4 HALOE HCl (ppbv) ATMOSHCl(ppbv) Slope = 1.05 Intercept = 0.23 ppbv Data 1:1 Weighted Fit HCl Inter-comparisons 37 Monday, August 11, 14
  • 35. 0 1 2 3 4 0 1 2 3 4 HALOE HCl (ppbv) ACEHClv2.2(ppbv) Slope = 1.18 Intercept = −0.050 ppbv Data 1:1 Weighted Fit HCl Inter-comparisons 37 Monday, August 11, 14
  • 36. 0 1 2 30 1 2 3 HALOE HCl (ppbv) MLSHCl(ppbv) Slope = 1.09 Intercept = 0.070 ppbv Data 1:1 Weighted Fit Fit HCl Inter-comparisons 37 Monday, August 11, 14
  • 37. 0 1 2 30 1 2 3 HALOE HCl (ppbv) MLSHCl(ppbv) Slope = 1.09 Intercept = 0.070 ppbv Data 1:1 Weighted Fit Fit 0 1 2 3 0 1 2 3 HALOE HCl (ppbv) NN adjusted MLSHCl(ppbv) Slope = 0.995 Iintercept = 0.0093 ppbv Data 1:1 Weighted Fit HCl Inter-comparisons 37 Monday, August 11, 14
  • 39. An example neural network Inputs Outputs Process 39 Monday, August 11, 14
  • 40. An example neural network Inputs Outputs Process 39 Objective design of neural networks using genetic algorithms Monday, August 11, 14
  • 41. An example neural network 40 Monday, August 11, 14
  • 42. Re-calibration using a Neural Network 0.5 1 1.5 2 2.5 3 3.5 x 10 9 0.5 1 1.5 2 2.5 3 3.5 x 10 9 Targets T OutputsA,LinearFit:A=(0.97)T+(5e11) HCl Training Outputs vs. Targets, R=0.98739 Training Data Points Best Linear Fit A = T 0.5 1 1.5 2 2.5 3 3.5 x 10 9 0.5 1 1.5 2 2.5 3 3.5 x 10 9 Targets T OutputsA,LinearFit:A=(0.98)T+(2.9e11) HCl Validation Outputs vs. Targets, R=0.99232 Validation Data Points Best Linear Fit A = T 41 Monday, August 11, 14
  • 43. Re-calibration using a Neural Network 0.5 1 1.5 2 2.5 3 3.5 x 10 9 0.5 1 1.5 2 2.5 3 3.5 x 10 9 Targets T OutputsA,LinearFit:A=(0.97)T+(5e11) HCl Training Outputs vs. Targets, R=0.98739 Training Data Points Best Linear Fit A = T 0.5 1 1.5 2 2.5 3 3.5 x 10 9 0.5 1 1.5 2 2.5 3 3.5 x 10 9 Targets T OutputsA,LinearFit:A=(0.98)T+(2.9e11) HCl Validation Outputs vs. Targets, R=0.99232 Validation Data Points Best Linear Fit A = T Totally independent validation 41 Monday, August 11, 14
  • 45. Long-term continuity Applied Neural Network Re-calibration to HALOE 42 Monday, August 11, 14
  • 46. 1995 2000 2005 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 9 Year Cly Monthly average 2 o 800 K 525 K 6 Year Age 5 Year Age 4 Year Age 3 Year Age 2 Year Age 1995 2000 2005 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 9 Year Cl y Monthly average 61 o 800 K 525 K 6 Year Age 5 Year Age 4 Year Age 3 Year Age 2 Year Age October Use neural networks to infer Cly from HCl, CH4, ϕpv, and θ. Long-term continuity for Cly 43 Monday, August 11, 14
  • 47. 1995 2000 2005 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 9 Year Cly Monthly average 2 o 800 K 525 K 6 Year Age 5 Year Age 4 Year Age 3 Year Age 2 Year Age 1995 2000 2005 0 0.5 1 1.5 2 2.5 3 3.5 4 x 10 9 Year Cl y Monthly average 61 o 800 K 525 K 6 Year Age 5 Year Age 4 Year Age 3 Year Age 2 Year Age October Use neural networks to infer Cly from HCl, CH4, ϕpv, and θ. Long-term continuity for Cly ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower strato- sphere in spring (LMDZrepro) simulates very low ozone. CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the sim- recovery due to declining ODSs, we place importance on the models’ ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to eval- uate these processes and CCMs shown with solid curves 21st CENTURY OZONE LAYER Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively. 50 hPa 80°S October 50 hPa 80°S October Cly –Cly (1980)(ppbv) Cly (ppbv) Year Year 43 Monday, August 11, 14
  • 48. Long-term continuity for Cly. [Figure: October monthly-average Cly vs. year at 2°, at 800 K and 525 K, with 2- to 6-year age curves.] Use neural networks to infer Cly from HCl, CH4, ϕpv, and θ. The slide is overlaid with this excerpt from the "21st Century Ozone Layer" chapter:
    …ozone reductions there (SOCOL and E39C), and the model with the largest cold bias in the Antarctic lower stratosphere in spring (LMDZrepro) simulates very low ozone. CCMs show a large range of ozone trends over the past 25 years (see left panels in Figure 3-26 of Chapter 3) and large differences from observations. Some of these differences may in part be related to differences in the simulated Cly, e.g., E39C and SOCOL show a trend smaller than observed, whereas AMTRAC and UMETRAC show a trend larger than observed in extrapolar area-weighted mean column ozone. However, other factors also contribute, e.g., biases in tropospheric ozone (Austin and Wilson, 2006).
    The CCM evaluation discussed above and in Eyring et al. (2006) has guided the level of confidence we place on each model simulation. The CCMs vary in their skill in representing different processes and characteristics of the atmosphere. Because the focus here is on ozone recovery due to declining ODSs, we place importance on the models' ability to correctly simulate stratospheric Cly as well as the representation of transport characteristics and polar temperatures. Therefore, more credence is given to those models that realistically simulate these processes. Figure 6-7 shows a subset of the diagnostics used to evaluate these processes, and CCMs shown with solid curves in Figures 6-7, 6-8, 6-10, and 6-12 to 6-14 are those that are in good agreement with the observations in Figure 6-7. However, these line styles should not be overinterpreted, as both the ability of the CCMs to represent these processes and the relative importance of Cly, temperature, and transport vary between different regions and altitudes. Also, analyses of model dynamics in the Arctic, and differences in the chlorine budget/partitioning in these models, when available, might change this evaluation for some regions and altitudes.
    Figure 6-8. October zonal mean values of total inorganic chlorine (Cly in ppb) at 50 hPa and 80°S from CCMs. Panel (a) shows Cly and panel (b) the difference in Cly from that in 1980. The symbols in (a) show estimates of Cly in the Antarctic lower stratosphere in spring from measurements from the UARS satellite in 1992 and the Aura satellite in 2005, yielding values around 3 ppb (Douglass et al., 1995; Santee et al., 1996) and around 3.3 ppb (see Figure 4-8), respectively. [Axes: (a) Cly (ppbv) and (b) Cly − Cly(1980) (ppbv) vs. year; 50 hPa, 80°S, October.]
  • 51. Other uses of machine learning:
    • Cross-calibration of vegetation indices from AVHRR, MODIS, SPOT, and SeaWiFS
    • Inferring CO2 fluxes from vegetation indices and surface temperature
    • Inferring ocean pigment concentrations and other parameters
    • Inferring drought stress and endophyte infection in cacao (cocoa)
    • Learning the chaotically tumbling orbit of the Hubble Space Telescope
    • Detecting online eBay fraud
    • Accelerating expensive code elements
  • 52. Another application: dissolved organic carbon (DOC)
  • 61. Method used to estimate DOC, ranked by correlation coefficient R (GP: Gaussian process; NN: neural network):
    Inputs | Method | R
    SeaWiFS bands | GP NL | 0.99977
    MODIS bands | GP NL | 0.9997
    All bands | GP NL | 0.99901
    UV & SeaWiFS bands | GP NL | 0.99899
    All bands | NN | 0.95859
    UV & SeaWiFS bands | NN | 0.94609
    MODIS bands | NN | 0.92585
    SeaWiFS bands | NN | 0.91653
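The R column is presumably the linear (Pearson) correlation between estimated and observed DOC. The sketch below computes that statistic under this assumption. The function name and the numbers in the usage lines are hypothetical, not the DOC data behind the table.

```python
import math


def pearson_r(estimated, observed):
    """Pearson linear correlation coefficient R between two equal-length sequences."""
    n = len(estimated)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    me = sum(estimated) / n
    mo = sum(observed) / n
    cov = sum((e - me) * (o - mo) for e, o in zip(estimated, observed))
    se = math.sqrt(sum((e - me) ** 2 for e in estimated))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (se * so)


observed = [1.2, 1.5, 2.1, 2.8, 3.3]   # hypothetical "observed" values
estimated = [1.1, 1.6, 2.0, 2.9, 3.2]  # hypothetical model estimates
print(f"R = {pearson_r(estimated, observed):.4f}")
```

R measures only linear association. A method with R close to 1 can still be biased, which is one reason a diagram that also shows standard deviation and RMS difference is useful alongside it.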
  • 62. Gaussian Process Models. [Figure: Taylor diagram of standard deviation, correlation coefficient, and RMSD for models A–H.]
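A Taylor diagram places each model at a radius equal to its standard deviation and at an angle of arccos(R) from the reference axis. The centered RMS difference E′ is then the distance to the reference point, because E′² = σ_m² + σ_r² − 2σ_mσ_rR (law of cosines). The sketch below computes these quantities and checks that identity. It assumes population (1/n) normalization. The function name and both data series are hypothetical.

```python
import math


def taylor_statistics(test_values, reference):
    """Statistics plotted on a Taylor diagram (population normalization, 1/n).

    Returns (sigma_test, sigma_ref, R, E), where E is the centered RMS difference.
    """
    n = len(reference)
    mt = sum(test_values) / n
    mr = sum(reference) / n
    dt = [t - mt for t in test_values]
    dr = [r - mr for r in reference]
    sigma_t = math.sqrt(sum(d * d for d in dt) / n)
    sigma_r = math.sqrt(sum(d * d for d in dr) / n)
    corr = sum(a * b for a, b in zip(dt, dr)) / (n * sigma_t * sigma_r)
    e_centered = math.sqrt(sum((a - b) ** 2 for a, b in zip(dt, dr)) / n)
    return sigma_t, sigma_r, corr, e_centered


reference = [2.0, 3.5, 1.0, 4.0, 5.5, 3.0]  # hypothetical observations
candidate = [2.2, 3.1, 1.4, 4.3, 5.0, 2.8]  # hypothetical model output
s_t, s_r, corr, e_centered = taylor_statistics(candidate, reference)

# Plot position: radius s_t, angle arccos(corr) from the reference axis.
angle_deg = math.degrees(math.acos(corr))
# The distance to the reference point from the law of cosines equals E.
e_from_geometry = math.sqrt(s_t ** 2 + s_r ** 2 - 2.0 * s_t * s_r * corr)
print(f"sigma={s_t:.3f} R={corr:.3f} E'={e_centered:.3f} "
      f"(geometry: {e_from_geometry:.3f}) angle={angle_deg:.1f} deg")
```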
  • 63. Relative Importance of the Inputs. The GPM hyperparameters give an indication of the relative importance of the inputs. For the DOC SeaWiFS bands, the best inputs are those with the smallest values; here they are sorted from most to least important:
    Wavelength | Relative importance
    Rrs490 | 0.00087123
    Rrs555 | 0.011976
    Rrs670 | 1.5876
    Rrs510 | 9.8423
    Rrs443 | 13.0898
    Rrs412 | 20.2553
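One common way GP hyperparameters encode input relevance is automatic relevance determination (ARD). Here the squared-exponential kernel has one length scale per input, and a small length scale means the output varies quickly along that input. The sketch below assumes, without confirmation from the slide, that the tabulated values are such length scales. It ranks the bands and shows the effect on the kernel. The kernel function and the 0.01 perturbation are illustrative assumptions.

```python
import math

# Values transcribed from the table above (DOC, SeaWiFS bands).
hyperparameters = {
    "Rrs412": 20.2553,
    "Rrs443": 13.0898,
    "Rrs490": 0.00087123,
    "Rrs510": 9.8423,
    "Rrs555": 0.011976,
    "Rrs670": 1.5876,
}


def ard_se_kernel(x1, x2, length_scales, signal_var=1.0):
    """Squared-exponential kernel with one length scale per input (ARD).

    A small length scale makes the covariance decay quickly along that input,
    i.e. the model output is sensitive to it.
    """
    s = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, length_scales))
    return signal_var * math.exp(-0.5 * s)


# Smaller hyperparameter -> more important (the ordering stated on the slide).
ranking = sorted(hyperparameters, key=hyperparameters.get)
print(ranking)  # ['Rrs490', 'Rrs555', 'Rrs670', 'Rrs510', 'Rrs443', 'Rrs412']

# The same hypothetical perturbation of 0.01 applied to two different inputs:
ls = [hyperparameters["Rrs490"], hyperparameters["Rrs412"]]
k_perturb_490 = ard_se_kernel([0.0, 0.0], [0.01, 0.0], ls)
k_perturb_412 = ard_se_kernel([0.0, 0.0], [0.0, 0.01], ls)
print(k_perturb_490, k_perturb_412)  # covariance collapses only along Rrs490
```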
  • 64. [Figure: salinity vs. a412−a443, showing the data with polynomial (r² = 0.928), NN (r² = 0.933), and SVM (r² = 0.933) fits.]
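The slide compares fits by their coefficient of determination, r² = 1 − SS_res/SS_tot. The sketch below shows how a polynomial least-squares fit and its r² can be computed. It assumes ordinary least squares and a quadratic, since the degree used on the slide is not stated. The helper names and the data in the usage lines are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ..., c_degree] via normal equations."""
    m = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, m))) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]        # hypothetical predictor values
ys = [30.0, 32.5, 33.1, 34.8, 35.2, 35.9]   # hypothetical responses
coeffs = polyfit(xs, ys, degree=2)
preds = [polyval(coeffs, x) for x in xs]
print(f"coefficients: {coeffs}, r^2 = {r_squared(ys, preds):.3f}")
```

Normal equations are adequate for a low-degree fit on a short, well-scaled range like this. For higher degrees, a QR-based solver is numerically safer.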
  • 66. High-Resolution Identification of Dust Sources Using Machine Learning and Remote Sensing Data. Annette Walker and David J. Lary (A42A-08)
  • 67. NRL High-Resolution Dust Source Database: Approach and Methodology
    • 10 years of DEP imagery (2 years of MSG/RGB)
    • COAMPS 10 m wind overlays
    • Surface weather plots
    • ENVI (GIS-like software)
    • NGDC topographical 10°×10° tiles
    • Overlay a 0.25° grid or use Google Earth (GE)
    • Dust source area entered into the database (cursor location tool, 1 km precision)
    • Cross-correlate land and water features using maps, atlases, Landsat images, and GE (detailed topographic, geographic, and geomorphic information)
    • Technical and governmental reports
    [Images: NRL DEP, 2003-08-20 (Iran, Pakistan); MODIS true color, 2003-08-20; NRL MSG/RGB, 2011-06-30 (Saudi Arabia).]
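The workflow overlays a 0.25° grid and records source locations with a cursor tool of about 1 km precision. The sketch below shows how a recorded position could be snapped to its 0.25° cell. The function name, the row/column convention (rows counted north from −90°, columns east from −180°), and the longitude range are assumptions, not the NRL database's actual scheme.

```python
import math


def grid_cell(lat, lon, res=0.25):
    """Return ((row, col), (centre_lat, centre_lon)) for a regular lat/lon grid.

    Hypothetical convention: rows count north from -90 deg, columns count east
    from -180 deg, and longitudes are expected in [-180, 180).
    """
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon < 180.0):
        raise ValueError("lat must be in [-90, 90] and lon in [-180, 180)")
    n_rows = int(round(180.0 / res))
    row = min(int(math.floor((lat + 90.0) / res)), n_rows - 1)  # lat = 90 goes in the top row
    col = int(math.floor((lon + 180.0) / res))
    centre = (-90.0 + (row + 0.5) * res, -180.0 + (col + 0.5) * res)
    return (row, col), centre


print(grid_cell(0.1, 0.1))  # ((360, 720), (0.125, 0.125))
```

At 0.25°, a cell spans roughly 28 km of latitude, so a position recorded to about 1 km falls in a single cell except very close to cell edges.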
  • 69. NRL High-Resolution Dust Source Database. Solid red and purple shapes identify dust source areas located using DEP and MSG. [Map panels: SW Asia DSD and East Asia DSD; labels: Mongolia, Saudi Arabia.]
  • 70. Self-Organizing Map. Self-organizing maps (SOMs), invented by Professor Teuvo Kohonen (Kohonen 1982; 1990), are a data visualization and unsupervised classification technique that reduces the dimensionality of data using self-organizing neural networks. They address the problem that humans cannot directly visualize high-dimensional data.
  • 71. Self-Organizing Map. SOMs reduce dimensionality by producing a map that objectively displays the similarities in the data by grouping similar data items together. SOMs learn to classify input vectors according to how they are grouped in the input space, and they learn both the distribution and the topology of the input vectors they are trained on. This lets SOMs accomplish two things: reducing dimensions and displaying similarities.
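The training idea above can be sketched with the classic online Kohonen rule. For each input, find the best-matching unit (BMU), then pull it and its map neighbors toward the input. Both the learning rate and the neighborhood radius shrink over time. This is a minimal sketch under those assumptions. The map size, schedules, and two-cluster data are hypothetical. A 1000-class map such as the one used for the MODIS reflectances would presumably just be a larger grid, with each pixel's class given by the index of its BMU.

```python
import math
import random


class SOM:
    """Minimal rectangular self-organizing map trained with the online Kohonen rule."""

    def __init__(self, rows, cols, dim, rng):
        self.rows, self.cols = rows, cols
        # One weight (prototype) vector per map node, randomly initialized in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]

    def bmu(self, x):
        """Index of the best-matching unit: the node whose weights are closest to x."""
        return min(range(len(self.weights)),
                   key=lambda k: sum((w - xi) ** 2 for w, xi in zip(self.weights[k], x)))

    def train(self, data, epochs, rng, lr0=0.5, sigma0=None):
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2.0
        total, step = epochs * len(data), 0
        for _ in range(epochs):
            order = list(data)
            rng.shuffle(order)
            for x in order:
                frac = step / total
                lr = lr0 * (1.0 - frac)              # learning rate decays linearly to 0
                sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks to 0.5
                br, bc = divmod(self.bmu(x), self.cols)
                for k, w in enumerate(self.weights):
                    r, c = divmod(k, self.cols)
                    d2 = (r - br) ** 2 + (c - bc) ** 2          # squared distance on the map grid
                    h = math.exp(-d2 / (2.0 * sigma * sigma))   # Gaussian neighborhood weight
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])          # pull node toward the input
                step += 1

    def quantization_error(self, data):
        """Mean distance from each input to the weight vector of its BMU."""
        return sum(math.dist(self.weights[self.bmu(x)], x) for x in data) / len(data)


rng = random.Random(42)
# Two hypothetical, well-separated clusters in a 3-band "reflectance" space.
cluster_a = [[rng.gauss(0.2, 0.05) for _ in range(3)] for _ in range(50)]
cluster_b = [[rng.gauss(0.8, 0.05) for _ in range(3)] for _ in range(50)]
data = cluster_a + cluster_b

som = SOM(rows=4, cols=4, dim=3, rng=rng)
qe_before = som.quantization_error(data)
som.train(data, epochs=20, rng=rng)
qe_after = som.quantization_error(data)

# Each input's "class" is the index of its BMU; similar inputs share nearby classes.
classes_a = {som.bmu(x) for x in cluster_a}
classes_b = {som.bmu(x) for x in cluster_b}
print(f"QE {qe_before:.3f} -> {qe_after:.3f}", sorted(classes_a), sorted(classes_b))
```

No labels are used at any point. The separation of the two clusters onto different map nodes emerges from the data alone, which is what makes the SOM an unsupervised classifier.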
  • 73. Self-Organizing Map Classification: 7 bands of MODIS MCD43C3 bihemispherical reflectance
  • 74. All 1000 classes mapped for North Africa
  • 75. Libyan Dust Event: May 9, 2010 (8Z–12Z). Jabal al Akhdar (الجبل الأخضر, Al Ǧabal al 'Aḫḍar; English: Green Mountains) is a coastal mountain range 1.0–1.5 km high.
  • 84. Libyan Dust Event: May 9, 2010 (6Z–8Z). Plumes originate on the leeward side of Al Jabal al Akhdar, where drainage occurs along the slopes. Corresponding SOM classes: 49, 93, 94.
  • 85. Chad: Bodélé Depression Dust Event: March 16, 2010 (7Z–12Z). The Bodélé Depression, located at the southern edge of the Sahara Desert in north-central Africa, is the lowest point in Chad. Dust storms from the Bodélé Depression occur on average about 100 days per year. This single spot in the Sahara supplies most of the mineral dust that reaches the Amazon rainforest.
  • 95. Selected SOM Classes, Chad: Bodélé Depression. The source area is not designated in the first pass of the MODIS reflectance and land-surface classification. [Panels: NRL MSG-RGB, 2011-01-09; 1000 SOM classes.]
  • 96. Selected Classes with Class 137, Chad: Bodélé Depression. Class 137 maps the diatom sediment in the depression. [Panels: NRL MSG-RGB, 2011-01-09; 1000 SOM classes.]
  • 97. Selected Classes without Class 137, Chad: Bodélé Depression. Class 137 maps the diatom sediment in the depression. [Panels: NRL MSG-RGB, 2011-01-09; 1000 SOM classes.]
  • 98. West Africa: Feb 2, 2011, 13Z. Solid black circles/ovals show plume sources; the corresponding SOM classes are within the open circles/ovals. Northern Sahara: classes 36, 40, 63, 100. Sahel: classes 147, 229, 230, 405.
  • 99. Selected classes for North Africa (40 distinct classes)
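The North Africa slides describe dust sources as sets of selected SOM classes. Examples include the Northern Sahara and Sahel classes and class 137 for the Bodélé diatom sediment. Assuming each pixel already carries its SOM class number, a source mask reduces to a set-membership test. The class numbers below are quoted from the slides. The function name and the tiny 3×4 class map are hypothetical.

```python
def source_mask(class_map, selected_classes):
    """True where a pixel's SOM class number is in the selected set."""
    selected = set(selected_classes)
    return [[cls in selected for cls in row] for row in class_map]


northern_sahara = {36, 40, 63, 100}
sahel = {147, 229, 230, 405}
class_map = [            # made-up class numbers for a 3x4 patch of pixels
    [36, 12, 147, 999],
    [500, 40, 230, 7],
    [63, 63, 405, 137],
]

mask = source_mask(class_map, northern_sahara | sahel)
mask_with_137 = source_mask(class_map, northern_sahara | sahel | {137})
for row in mask:
    print("".join("#" if flagged else "." for flagged in row))
# #.#.
# .##.
# ###.
```

Adding or removing one class, as on the slides with and without class 137, changes only the pixels carrying that class. This makes it straightforward to test which classes correspond to a given source type.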
  • 100. Sources along the New Mexico/Texas border. The North American sources have a different spectral signature from those seen in SW Asia. Labels: agricultural land on the High Plains; blue desert areas. [Panels: true color and NRL DEP, Jan 1, 2006.]
  • 101. Sources in Arizona and Colorado. [Panels: NRL DEP and true color, Apr 17, 2006.]
  • 102. Selected classes for North America (n = 64)
  • 103. All 1000 classes mapped for South America
  • 104. All 1000 classes mapped for South America. Blue-colored SOM classes are concentrated in the Atacama Desert and the Salar de Uyuni; white areas are salt flats.
  • 105. South America: Bolivia and Chile, July 18, 2010 (MODIS Terra true color)
  • 106. South America: Bolivia and Chile, July 18, 2010 (MODIS Terra true color), with selected SOM classes in the 200s, 300s, and 400s
  • 107.
    • SOMs provide an effective mechanism for automating the identification of dust sources.
    • Using SOMs lets us map dust sources globally at high resolution (1–10 km).
    • SOMs save time in finding dust sources compared with searching satellite imagery.
    • This can be done in real time, so that dust-source maps can change dynamically.
  • 109. [Diagram: existing model vs. new model.] Applications:
    • Personalized Health Care
    • Proactive Health Care System
    • Business Analytics
    • Smart Logistics
    • Disaster Response
    • Fraud Detection
    http://holistics3.com