2. Satellite Observations Meteorological Analyses Population Density In-situ observationsSocial Media
Combined Using
Machine Learning
to Provide a
High-Resolution
Global Products
Combined with Electronic Health Records to provide:
1. Real time personal health alerts
2. Physician Decision Support Tools
3. Logistical Planning for Emergency Rooms
4. Improved Policy Decisions
Next Generation of High Speed Networks to Facilitate the
Next Generation of Proactive Smart Health Care Applications
Veteran’s Administration
Country’s Largest Health Care Provider
Requires ultra-low latency gigabit to the end user
Local cloud computing coupled with widely distributed national and global sensor networks
Multiple global high-resolution datasets
Prof. David Lary
Monday, August 11, 14
3. Next Generation of High Speed Low latency Networks to Facilitate the
Next Generation of Smart Fire Detection & Water Conservation Applications
Requires ultra-low latency wireless gigabit for very-high resolution hyperspectral video imagery for real time flight control of aerial vehicles
11 drought-ridden western and central states have just
been declared as primary natural disaster areas seriously
threatening US food security. Further, every year between
$1 and $2 billion dollars are spent on fire suppression
costs alone.
A fleet of low cost aerial vehicles working together
autonomously utilizing uncompressed very-high
resolution hyperspectral video imagery.The geo-tagged
imagery is streamed using high-speed low-latency wireless
networks to communicate to a powerful cloud computing
cluster running machine learning and image processing
algorithms for real time direction of the optimal flight
patterns, and the delivery of early warning for timely
interventions.
Fire: Appropriate preemptive fire prevention can lead to
massive savings in fire control costs, loss of life, and
property damage.
Agriculture: Appropriate and timely early warning of
crop infestations, infections and/or water stress can
prevent massive avoidable losses.
Prof. David Lary
20 lb Airborne 385 channel
hyperspectral imaging system
Monday, August 11, 14
4. Next Generation of High Speed Networks to Facilitate the
Next Generation of Smart Water Management Applications
With Drought Disaster Declarations in 11 western and central
states, smart water management is now more critical than ever
for sustainable water conservation and US Food Security.
Coupling high resolution remote sensing from satellites, with
machine learning, and the next generation of high speed low
latency networks is facilitating the next generation of smart
water management systems. These systems will benefit
individual home owners, farmers, corporate campuses, golf
courses, etc. and allow optimum monitoring and control of
irrigation using mobile devices.
Sports fields
blown valves lead to flooding
uneven irrigation
Prof. David Lary
Monday, August 11, 14
6. Unprecedented levels of air pollution in Singapore and Malaysia in June led to respiratory illnesses, school closings, and
grounded aircraft. This year it was so bad that in some affected areas there was a 100 percent rise in the number of asthma
cases, and the government of Malaysia distributed gas masks.
MODIS Aqua July 21, 2013.
David Lary
Monday, August 11, 14
9. This is a BigData Problem of
Great Societal Relevance
• Collecting data in real time from national and
global networks requires bandwidth.
• With the next generation of wearable sensors and
the internet of things this data volume will
rapidly increase.
• A variety of applications enabled by BigData,
higher bandwidth and cloud processing.
• Future finer granularity and two way
communication will dramatically increase the size
of the data bringing air quality to the micro scale,
just like weather data.
Time Taken
10 Mbps 20 Mbps 50 Mbps 1 Gbps
40 TB training data
4 Gb update
185 days 93 days 37 days 1 day 21 hours
54m 27m 11m 32s
Monday, August 11, 14
12. Using Multiple Big Datasets and Machine Learning
to Produce a New Global Particulate Dataset
A Technology Challenge Case Study
David Lary
Hanson Center for Space Science
University of Texas at Dallas
Monday, August 11, 14
17. How?
Used around 40 different BigData sets from satellites, meteorology,
demographics, scraped web-sites and social media to estimate PM2.5. Plot
below shows the average of 5,935 days from August 1, 1997 to the present.
Monday, August 11, 14
18. Which Platform?
Requirements:
1. Large persistent storage for multiple BigData sets, 100TB+ (otherwise
before have had time to process the massive datasets the scratch space
time limit has expired)
2. High Bandwidth connections
3. Ability to harvest social media (e.g. twitter) and scrape web sites for data
4. High level language with wide range of optimized toolboxes, matlab
5. Algorithms capable of dealing with massive non-linear, non-parametric,
non-Gaussian multivariate datasets (13,000+ variables)
6. Easy to make use of multiple GPUs and CPUs
7. Ability to schedule tasks at precise times and time intervals to automate
workflows (in this case tasks executed at intervals of 5 minutes, 15 minutes,
1 hour, 3 hours, 1 day)
Monday, August 11, 14
23. Hourly Measurements from 55 countries and more
than 8,000 measurement sites from 1997-present
A lot of measurements,
but notice the large gaps!
Monday, August 11, 14
24. Gaps are inevitable because of the
infrastructure and cost associated with
making the measurements.
Hourly Measurements from 55 countries and more than
8,000 measurement sites from 1997-present
Monday, August 11, 14
25. Challenge 1: Obtaining the in-situ PM2.5 data
Real time data from:
1. EPA AirNow data for USA and Canada
2. EEA data for Europe
3. Tasmania and Australia
4. Israel
5. Russia
6. Asia and Latin America by scraping http://aqicn.org/map/
7. Harvesting social media (twitter feeds from US Embassies)
Relative low bandwidth from multiple sites every 5 minutes
Monday, August 11, 14
26. Challenge 2: (Easier)
Obtaining the Satellite & Meteorological Data
Real time data from:
1. Multiple satellites MODIS Terra, MODIS Aqua, SeaWIFS, VIIRS NPP etc
2. Global Meteorological Analyses
High bandwidth from few sites every 1 to 3 hours
Monday, August 11, 14
27. Challenge 3:
Combine multiple BigData Sets with Machine Learning
Large member machine learning ensemble using massively parallel computing
to produce PM2.5 data product
Algorithms capable of dealing with massive non-linear, non-parametric, non-
Gaussian multivariate datasets (13,000+ variables)
Drastically reduced development time by using a high level language (Matlab)
that can easily exploit parallel execution using both multiple CPUs and GPUs.
Massively parallel every 3 hours
High level language which can readily use CPUs and GPUs
Monday, August 11, 14
28. Challenge 4:
Continual Performance Improvement
Currently on around 400th version of system.
Have been making continuous improvements in:
1. Coverage of in-situ training data set
2. Inclusion of new satellite sensors
3. Additional BigData sets that help improve fidelity of the non-linear, non-
parametric, non-Gaussian multivariate machine learning fits
4. Using many alternative machine learning strategies
5. Estimate uncertainties.
6. This requires frequent reprocessing of the entire multi-year record from
1997-present
Persistent massive data storage, much more
than usual scratch space at HPC centers
Monday, August 11, 14
35. Lake Eyre after a long dry period is a
region of high PM2.5 abundance.
Lake Eyre after heavy rains is a region of
lower PM2.5 abundance than usual
Monday, August 11, 14
38. Key System Requirements:
Not always available on current HPC systems
Requirements:
1. Large persistent storage for multiple BigData sets, 100TB+ (otherwise
before have had time to process the massive datasets the scratch space
time limit has expired)
2. High Bandwidth connections
3. Ability to harvest social media (e.g. twitter) and scrape web sites for data
4. High level language with wide range of optimized toolboxes, matlab
5. Algorithms capable of dealing with massive non-linear, non-parametric,
non-Gaussian multivariate datasets (13,000+ variables)
6. Easy to make use of multiple GPUs and CPUs
7. Ability to schedule tasks at precise times and time intervals to automate
workflows (in this case tasks executed at intervals of 5 minutes, 15 minutes,
1 hour, 3 hours, 1 day)
Monday, August 11, 14
39. THRIVETimely Health indicators using Remote sensing &
Innovation for the Vitality of the Environment
Prevention is better than cure
David.Lary@utdallas.edu
Address key societal issues
Monday, August 11, 14
40. 41
VA Decision Support Tools
More Than 40 Data Products from In-situ Observations, NASA Earth Observations, Earth System
Models, Population Density & Emission Inventories
Personalized Alerts Dr. Watson
Staffing & Resource
Management
Machine Learning
Daily Global Air
Quality Estimates
NASA Earth
Observation Data
NASA Earth System
Model Products
Population Density and
Other Related Products
ER Admissions
All ICD Codes
All Prescriptions
Machine
Learning
Machine
Learning
THRIVE Medical
Environment Analytics
Engine
Monday, August 11, 14
42. Environmental Impact
Have the environmental conditions
for any day, anywhere on the planet
from 1997-present as a context and
simulate the likely health outcomes
Midday
Monday, August 11, 14
43. Environmental Impact
Have the environmental conditions
for any day, anywhere on the planet
from 1997-present as a context and
simulate the likely health outcomes
Midday
Monday, August 11, 14
44. June 27, 2012 July 21, 2012 September 14, 2012July 5, 2011
Set training in any time or place and retrieve the
actual environmental conditions, visibility, weather,
air borne particulates and simulate the health
outcomes. ALL driven by data.
Monday, August 11, 14
45. Water
The Motive
Do well with the Details by
embracing the Big Picture
Prof. David Lary
+1 (972) 489-2059
http://davidlary.info
david.lary@utdallas.edu
Monday, August 11, 14
46. Western U.S. Drought Prompts
Disaster Declarations In 11 States
By MICHELLE RINDELS 01/16/14 07:51 PM ET EST
LAS VEGAS (AP) — Federal officials have designated portions of 11 drought-
ridden western and central states as primary natural disaster areas,
highlighting the financial strain the lack of rain is likely to bring to farmers in
those regions.
The announcement by the U.S. Department of Agriculture on Wednesday
included counties in Colorado, New Mexico, Nevada, Kansas, Texas, Utah,
Arkansas, Hawaii, Idaho, Oklahoma and California.
Rancher Ralph Miller, 79, checks on one of many “stock tanks” of water that are
receding due to the severe drought. “I’d say it’s just about as bad as it can get.”
Barnhart, Texas
Monday, August 11, 14
47. “Water is the new oil”
Jim Rogers, chief executive of Duke Energy
... and many others
Water crisis in California, Texas threatens
US food security
Western water scarcity issues becoming more severe
Western Farm Press, Jun. 5, 2012
University of Texas at Austin
California and Texas produced agricultural products worth $56 billion
in 2007, accounting for much of the nation's food production. They
also account for half of all groundwater depletion in the U.S.,
mainly as a result of irrigating crops.
The nation’s food supply may be vulnerable to rapid groundwater
depletion from irrigated agriculture, according to a new study by
researchers at The University of Texas at Austin and elsewhere.
http://westernfarmpress.com/irrigation/water-crisis-california-texas-threatens-us-food-security
Monday, August 11, 14
48. Since 1980 the population
of Texas has more than
doubled, but the reservoir
capacity has remained
almost unchanged.
During 2011the reservoir
levels were the lowest
during Sep-Dec that they
have been since 1990.
In 2014 we are starting out
with lower levels than 2013.
Monday, August 11, 14
49. Smarter irrigation control is invaluable!
If we can use existing
infrastructure it is even better!
.... from farm, to corporate campus, to golf course, to your back yard.
Monday, August 11, 14
50. When great societal need meets
appropriate scalable solution
there is much societal and
economic benefit to be
gained
Monday, August 11, 14
53. DATE: 12-Feb-2010
DOC NO: 0115056
ISSUE: 02
DMC DATA PRODUCT MANUAL
STATUS: FINAL
Page 16 of 127
Figure 4: Detector and channel layout of the SLIM-6-22 imager
Imager Bank 0
Channel 6
Green
Channel 5
Red
Channel 4
NIR
Imager Bank 1
Channel 1
NIR
Channel 2
Red
Channel 3
Green
Pixel 1
Pixel 14436
Pixel 14436
Pixel 1
Monday, August 11, 14
54. 20 lb Airborne hyperspectral imaging system
385 channels between 400-1,700 nm
Hyperspectral data cube
Monday, August 11, 14
60. J S Famiglietti, and M Rodell Science 2013;340:1300-1301
an accuracy of 1.5 cm equivalent water height.
Because GRACE measures changes in
total water storage, it integrates the impacts
of natural climate fluctuations, global change,
and human water use, including groundwater
extraction, which in many parts of the world
is unmeasured and unmanaged. GRACE-
derived rates of groundwater losses in the
world’s major aquifer systems (4–6) under-
score the critical need to improve monitor-
ing and regulation of groundwater systems
before they run dry.
Regional flooding and drought are driven
by the surplus or deficit of water in a river
basin or an aquifer, yet few hydrologic
observing networks yield sufficient data for
comprehensive monitoring of changes in
the total amount of water stored in a region.
GRACE observations have helped to fill
this gap. They have been used to character-
ize regional flood potential (8) and to assess
water storage deficits in the U.S. Drought
Monitor (9) and are included in annual State
of the Climate reports (10). As an integrated
measure of all surface and groundwater stor-
age changes, GRACE data implicitly contain
a record of seasonal to interannual water stor-
age variations that can likely be exploited to
key tools for predicting future water avail-
ability, difficult to validate. Low-resolution
GRACE data, when combined with higher-
resolution model simulations, provide an
independent constraint on simulated water
balances, while also adding spatial detail to
GRACE’s low-resolution perspective (11).
They are widely used to evaluate land surface
models used by weather and climate forecast-
ing centers around the world (12).
Evapotranspiration is a key factor in
interbasin water allocations, yet because it
disperses into the atmosphere in the vapor
phase, it confounds standard measurement
techniques. The ability of GRACE to weigh
changes in water stored in an entire river
basin allows evapotranspiration to be esti-
mated in a water balance framework (13).
Transboundary water availability issues
require sharing hydrologic data across politi-
cal boundaries. However, national hydrolog-
ical records are often withheld for political,
socioeconomic, and defense purposes, com-
plicating regional water management discus-
sions. Several studies have used GRACE data
to circumvent international data denial prac-
tices, including in those involving lakes (14),
river basins (6), and aquifers (4, 6). Likewise,
higher spatial (<50,000 km ) and tempor
(weekly or biweekly) resolution, for exam
ple through novel orbital configurations, s
that smaller river basins and aquifers can b
observed directly.The availability of GRAC
data at these finer scales, at which most pla
ning decisions are made, would likely ensu
their broader use in water management.
The GRACE-FO mission is on sched
ule for a 2017 launch, but a next-generatio
improved GRACE mission is still und
design and as yet unconfirmed. Given i
demonstrated contributions to date and th
potential for much more, a future without
GRACE mission in orbit would be an unfo
tunate and unnecessarily risky backward ste
for regional water management.
References
1. P. J. Durack et al., Science 336, 455 (2012).
2. K. E. Trenberth, Clim. Res. 47, 123 (2011).
3. I. M. Held, B. J. Soden, J. Clim. 19, 5686 (2006).
4. V. M. Tiwari, J. Wahr, S. Swenson, Geophys. Res. Lett. 36,
L18401 (2009).
5. B. R. Scanlon et al., Proc. Natl. Acad. Sci. U.S.A. 109,
9320 (2012).
6. K. A. Voss et al., Water Resour. Res. 49, 904 (2013).
7. B. D. Tapley et al., Science 305, 503 (2004).
8. J. T. Reager, J. S. Famiglietti, Geophys. Res. Lett. 36,
L23402 (2009).
9. R. Houborg et al., Water Resour. Res. 48, W07525 (2012
10. J. Blunden, D. S. Arndt, Eds., Bull. Am. Meteorol. Soc. 9
S1 (2012).
11. B. F. Zaitchik et al., J. Hydrometeorol. 9, 535 (2008).
12. S. C. Swenson, P. C. D. Milly, Water Resour. Res. 42,
W03201 (2006).
13. G. Ramillien et al., Water Resour. Res. 42, W10403 (2006
14. S. Swenson, J. Wahr, J. Hydrol. 370, 163 (2009).
15. J. S. Famiglietti, Abstract GC31D-01, fall meeting, AGU
San Francisco, 3 to 7 December 2012.
Supplementary Materials
Mixed picture. Between 2003 and 2012, GRACE data show water losses in agricultural regions such as Cali-
fornia’s Central Valley (1) ( 1.5 ± 0.1 cm/year) and the Southern High Plains Aquifer (2) ( 2.5 ± 0.2 cm/
year), caused by overreliance on groundwater to supply irrigation water. Regions where groundwater is being
depleted as a result of prolonged drought include Houston (3) ( 2.3 ± 0.6 cm/year), Alabama (4) ( 2.1 ±
0.8 cm/year), and the Mid-Atlantic states (5) ( 1.8 ± 0.6 cm/year). Water storage is increasing in the flood-
prone Upper Missouri River basin (6) (2.5 ± 0.2 cm/year). See fig. S1 for monthly time series for all hot spots.
Data from (15) and from GRACE data release CSR RL05.
Monday, August 11, 14
61. Summary
• Vegetation Index is dependent on amount of
irrigation
• Regular (weekly) remote sensing inspection
could allow us to:
• Appropriate irrigation zones
• Help identify regions of over watering
• Help identify any burst pipes/valves
• Optimize irrigation patterns
• Automate sprinkler system controls
• Progressively more benefit as a specific
history of the plots/site is built up
Monday, August 11, 14
63. FUTURE Water Management
Water Mgmt.
“Smart-GRID*”
Delivery
Models
Basin
Geodata
Water/Crop
Status & Forecast
Water Need
Status & Forecast
Water
Agric +Others Use
& Forecast
*Dept. of Energy 2013Alfonso Torres
Monday, August 11, 14
64. FUTURE Water Management
Why Agriculture? ~80% water use US (USDA 2013)
Challenges: Climate change, Drought, Population, non-ag water uses.
Water Use Efficiency: ~50% US (USDA 2004)
Water Mgmt.
“Smart-GRID*”
Delivery
Models
Basin
Geodata
Water/Crop
Status & Forecast
Water Need
Status & Forecast
Water
Agric +Others
Status & Forecast
Monday, August 11, 14
65. CURRENT Water Management
Current Water
Mgmt.
Delivery
Models
Basin
Geodata
Water/Crop
Status & Forecast
Water
Status & Forecast
Water
Agric +Others
Status & Forecast
Alfonso Torres
Monday, August 11, 14
67. CWMIS Case Example: Water Use vs. Delivery
TOP: crop water use vs. water delivery (ac-ft).
BOTTOM: water use difference (ac-ft)
Typically save at least 10%
Can be done on a field by field, campus by campus, home by home,
or golf course by golf course basis or for an entire basin.
Alfonso Torres
Monday, August 11, 14
68. Culex tarsalis
West Nile Virus
The same data infrastructure can also
be used to help combat West Nile Virus
by identifying breeding sites.
Monday, August 11, 14
69. P. vivax is carried by the female Anopheles mosquito
Monday, August 11, 14
70. Plasmodium vivax is a protozoal parasite and a human
pathogen. The most frequent and widely distributed
cause of recurring (Benign tertian) malaria, P. vivax is
one of the six species of malaria parasites that
commonly infect humans.[1] It is less virulent than
Plasmodium falciparum, the deadliest of the six, but
vivax malaria can lead to severe disease and death.[2]
[3] P. vivax is carried by the female Anopheles mosquito,
since it is only the female of the species that bite.
Plasmodium vivax
Plasmodium falciparum http://www.worldmalariareport.org/
Monday, August 11, 14
71. Seasonal climatic suitability for malaria transmission (CSMT)
Climatic conditions are considered to be suitable for transmission when the monthly precipitation
accumulation is at least 80 mm, the monthly mean temperature is between 18°C and 32°C and the
monthly relative humidity is at least 60%. These thresholds are based on a consensus of the
literature. In practice, the optimal and limiting conditions for transmission are dependent on the
particular species of the parasite and vector.
Commentary: Web-based climate information resources for malaria control in Africa
Emily K Grover-Kopec, M Benno Blumenthal, Pietro Ceccato, Tufa Dinku, Judy A Omumbo and Stephen J Connor*
Malaria Journal 2006, 5:38 doi:10.1186/1475-2875-5-38
Monday, August 11, 14
73. 0 500 1,000 Km
Vectorial Capacity
In Zones with Malaria Epidemic Potential
05 August - 12 August 2013
VCAP Values
0
0 - 2
2 - 4
4 - 6
6 - 8
8 - 10
10 - 15
15 - 20
> 20
Country Boundaries
Monday, August 11, 14
74. Satellite imagery can be used to track mosquito habitats.
High-resolution (5 m) satellite images can identify
very small water bodies, wetlands and other
malaria-relevant land-cover types.
Of the 225 million annual reported
cases of the disease, 212 million of
these occur in Africa. Of the
800,000 Malaria-related deaths
each year, 90% of these fatalities
occur in sub-Saharan Africa.
http://www.itweb.co.za/index.php?option=com_content&view=article&id=52695
Monday, August 11, 14
75. US IgniteEnables
Innovative “Big Data” Application Development for an
Application Development Hub
!!
Don Hicks
Monday, August 11, 14