Organizations around the world are facing a "data tsunami" as next-generation sensors produce enormous volumes of Earth observation data. Come learn how NASA is leveraging AWS to work efficiently with data and computing resources at massive scale. NASA is transforming its Earth Science EOSDIS (Earth Observing System Data and Information System) program by moving data processing and archiving to the cloud. NASA anticipates that its data archives will grow from 16 PB today to over 400 PB by 2023 and 1 exabyte by 2030, and it is moving to the cloud to scale its operations for this new paradigm. Learn More: https://aws.amazon.com/government-education/
2. Why does AWS care about open data?
Sharing data on AWS makes it accessible to
a large and growing community of
researchers, entrepreneurs, and enterprises
who use the AWS cloud.
3. “…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated Heavy Lifting
4. NEXRAD on AWS
Done in collaboration with Unidata
and NOAA’s National Centers for
Environmental Information
Climate Corporation cut two weeks
out of an analysis pipeline
Increased NEXRAD usage 2.3✕
A weather data company stopped
storing their own NEXRAD archive,
freeing up revenue to build new
products.
Hundreds of terabytes of high-resolution RADAR data available on AWS.
10. NASA’s Earth Science Data Systems Program
Actively manages NASA’s earth science
data as a national asset (Satellite,
Airborne and Field).
Develops capabilities optimized to
support rigorous science investigations.
Processes (and reprocesses)
instrument data to create high quality
long-term earth science data records.
http://go.nasa.gov/2mMd5g1
11. Earth Science Open Data Policy
NASA Earth Science data are free and open to all
users for any purpose as quickly as practical after
instrument checkout and calibration
12. Earth Observing System Data and Information System (EOSDIS)
[Diagram: EOSDIS data flow]
Stages: Capture and Clean, Process, Archive, Transform*, Distribute
Users: Applications, Education, Research
*Subset, reformat, reproject
13. SIPS DAAC
Distributed Active Archive Centers (DAACs), collocated with centers of
science discipline expertise, archive and distribute standard data
products produced by Science Investigator-led Processing Systems
(SIPS).
DAACs:
- ASF DAAC: SAR Products, Sea Ice, Polar Processes
- PO.DAAC: Ocean Circulation, Air-Sea Interactions
- NSIDC DAAC: Cryosphere, Polar Processes
- LPDAAC: Land Processes and Features
- GHRC: Hydrological Cycle and Severe Weather
- ORNL: Biogeochemical Dynamics, EOS Land Validation
- ASDC: Radiation Budget, Clouds, Aerosols, Tropo Composition
- LAADS/MODAPS: Atmosphere
- OB.DAAC: Ocean Biology and Biogeochemistry
- SEDAC: Human Interactions in Global Change
- CDDIS: Crustal Dynamics, Solid Earth
- GES DISC: Atmos Composition & Dynamics, Global Modeling, Hydrology, Radiance
SIPS:
- NCAR, U. of Co.: MOPITT
- JPL: MLS, TES, SNPP Sounder
- U. of Wisc.: SNPP Atmosphere
- GHRC: AMSR-U, LIS
- GSFC: SNPP, MODIS, OMI, OBPG
15. Lightning fast, always available
- 95% of queries complete in <1s
- 99.98% uptime (last 365d)
Big Data Ready
- 34K collections
- 367 million files indexed
- Prepared to scale 1B+ records
Standards-focused
Community-focused
Internationally Recognized
Common Metadata Repository
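To put the 99.98% uptime figure in concrete terms, the short calculation below converts it into the downtime it implies over the stated 365-day window. This is only arithmetic on the slide's numbers; the function name is illustrative.

```python
HOURS_PER_DAY = 24


def allowed_downtime_hours(uptime_fraction: float, window_days: int) -> float:
    """Hours of downtime implied by an uptime fraction over a window of days."""
    return (1.0 - uptime_fraction) * window_days * HOURS_PER_DAY


# Figures from the slide: 99.98% uptime over the last 365 days.
downtime = allowed_downtime_hours(0.9998, 365)
print(f"{downtime:.2f} hours (~{downtime * 60:.0f} minutes) over 365 days")
```

In other words, 99.98% uptime corresponds to roughly 1.75 hours of total unavailability in a year.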
16. Starting AWS Migration
Since September 2016, EOSDIS has migrated
two of its core systems, Common Metadata
Repository (CMR) and Earthdata Search, into
the Amazon Cloud to immense success.
• One year migration effort
• Supported by NASA CIO
• Over 500K queries per day
• Open Source
• Open Access API
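As a rough illustration of what an open access API means in practice, the sketch below builds a CMR collection search URL. The endpoint root and the `keyword` and `page_size` parameters are assumptions taken from CMR's public search interface rather than from these slides, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumption: public CMR search root; not stated in the slides.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def cmr_collections_url(keyword: str, page_size: int = 10) -> str:
    """Build a URL for a keyword search over CMR collections (JSON output)."""
    params = {"keyword": keyword, "page_size": page_size}
    return f"{CMR_SEARCH_ROOT}/collections.json?{urlencode(params)}"


url = cmr_collections_url("sea ice", page_size=5)
print(url)
```

The resulting URL could be fetched with any HTTP client; no request is made here so the example stays self-contained.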
17. Data Centric End Users
https://search.earthdata.nasa.gov
Imagery Centric End Users
https://worldview.earthdata.nasa.gov
DEMO
18. Total EOSDIS Data Archive Volume (Petabytes)
2000-2017 (through 1/31/17)
20. What could a future data system architecture look like?
EOSDIS works well, but can we do better?
• Can we evolve NASA archives to better support interdisciplinary
Earth science researchers?
• What system architecture(s) will allow our holdings to become
interactive and easier to use for research and commercial users?
• Can we afford additional functionality?
• How will data from multiple agencies, international partners and the
private sector be combined to study the earth as a system?
• GOES-R, CubeSats, Copernicus…
21. [Figure: mission timeline; labels recovered from the graphic are listed below]
Legend: (Pre)Formulation, Implementation, Primary Ops, Extended Ops
Missions:
- Landsat 9 (2020)
- NISAR (2022)
- SWOT (2021)
- TEMPO (2018)
- JPSS-2 (NOAA) OMPS-Limb (2018)
- GRACE-FO (2) (2018)
- ICESat-2 (2018)
- CYGNSS (2016)
- ISS
- SORCE (2017), TCTE (NOAA)
- NISTAR, EPIC (2019) (NOAA’S DSCOVR)
- QuikSCAT (2017)
- EO-1 (2017)
- Landsat 7 (USGS) (~2022)
- Terra (>2021)
- Aqua (>2022)
- CloudSat (~2018)
- CALIPSO (>2022)
- Aura (>2022)
- SMAP (>2022)
- Suomi NPP (NOAA) (>2022)
- Landsat 8 (USGS) (>2022)
- GPM (>2022)
- OCO-2 (>2022)
- GRACE (2) (2018)
- OSTM/Jason 2 (>2022) (NOAA)
- Sentinel-6A/B (2020, 2025)
- MAIA (~2021)
- TROPICS (~2021)
- GeoCarb (~2022)
Earth Science Instruments on ISS:
- CATS (2020)
- LIS (2017)
- SAGE III (2017)
- TSIS-1 (2018)
- ECOSTRESS (2018)
- GEDI (2019)
- TSIS-2 (2020)
InVEST/Cubesats:
- MiRaTA (2017)
- RAVAN (2016)
- IceCube (2017)
- HARP (2017)
- TEMPEST-D (2018)
- RainCube (2018*)
- CubeRRT (2018*)
- CIRiS (2018*)
- CIRAS (2018*)
- LMPC (----)
*Target date, not yet manifested
5 Years from Today:
- 16 new instruments and missions.
- New missions and measurements from Decadal Survey.
- User expectations continue to evolve.
22. So Our Archive Is Slated to Grow Substantially...
You are here
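The growth figures quoted in the abstract (16 PB today, over 400 PB by 2023, 1 exabyte by 2030) imply a compound annual growth rate that can be estimated as follows. This is a sketch under two assumptions not stated in the deck: that "today" means 2017, and that 1 exabyte is counted as 1000 PB.

```python
def annual_growth_rate(start_pb: float, end_pb: float, years: int) -> float:
    """Compound annual growth rate needed to go from start_pb to end_pb."""
    return (end_pb / start_pb) ** (1.0 / years) - 1.0


BASE_YEAR = 2017        # assumption: "today" in the abstract
PB_PER_EXABYTE = 1000   # assumption: decimal units

rate_to_2023 = annual_growth_rate(16, 400, 2023 - BASE_YEAR)
rate_2023_to_2030 = annual_growth_rate(400, PB_PER_EXABYTE, 2030 - 2023)

print(f"{BASE_YEAR}-2023: {rate_to_2023:.0%} per year")
print(f"2023-2030: {rate_2023_to_2030:.0%} per year")
```

Under these assumptions the archive would need to grow by roughly 70% per year through 2023 and considerably more slowly afterwards, which is consistent with a sharp bend in the volume curve.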
24. Conceptual “Data Close to Compute”
EOSDIS Applications & Services: Application and service layer using AWS compute, storage (S3, S3-IA, Glacier), and cloud-native technologies
Non-ESDIS / Public Applications & Services: Science community brings algorithms to the data. Support for NASA & non-NASA
Components: Archive, Catalog, Search, Ingest, Access, Analytics, Processing, Application, Data
Large Volume Data Storage: Centralized mission observation & model datasets stored in auto-graduated AWS object storage (S3, S3-IA, Glacier)
Scalable Compute: Provision, access, and terminate dynamically based on need. Cost by use
Cloud Native Compute: Cloud vendor service software stacks and microservices easing deployment of user-based applications
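The "auto graduated" storage idea (S3, then S3-IA, then Glacier) is commonly expressed as an S3 lifecycle rule. The sketch below builds such a rule as a plain dictionary in the shape S3 lifecycle configurations use; the prefix, rule ID, and day thresholds are hypothetical values chosen for illustration, not EOSDIS settings.

```python
import json


def tiering_config(prefix: str, days_to_ia: int, days_to_glacier: int) -> dict:
    """Build a lifecycle configuration that ages objects from S3 to S3-IA to Glacier."""
    if not 0 < days_to_ia < days_to_glacier:
        raise ValueError("expected 0 < days_to_ia < days_to_glacier")
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",  # hypothetical naming scheme
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
        ],
    }
    return {"Rules": [rule]}


# Hypothetical thresholds: infrequent access after 30 days, Glacier after a year.
config = tiering_config("granules/", days_to_ia=30, days_to_glacier=365)
print(json.dumps(config, indent=2))
```

A dictionary like this could be passed to boto3's `put_bucket_lifecycle_configuration` as the `LifecycleConfiguration` argument; the call is omitted here so the example runs without AWS credentials.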
25. Cloud Benefits for Data Systems (EOSDIS)
Cost-Effective: Only pay for the compute and
storage needed. Easy to separate and fund by
project/function.
Scalable Performance: Data Close to compute,
auto-scaling, elastic load balancing, scale up or
down based on need.
Flexibility: Maximum flexibility selecting
operating systems, compute (CPU),
programming languages, databases, and more
based on mission need.
Accessibility: Centralized, redundant, enterprise-level holdings
in the cloud allow for additional, more effective ways of
accessing the extremely large datasets of new missions.
Deployment Speed: With an established cloud
platform, development is faster; access to
compute, storage, and IT services requires only
funding, significantly simplifying procurement and
provisioning.
26. Decision Considerations
High-level decision considerations for moving individual project prototypes and capabilities
into operations in AWS (commercial cloud)
01 Cost: Is AWS (commercial cloud) affordable?
02 IT Security: Are NASA IT Security compliance and tactical operations achievable in AWS (commercial cloud)?
03 Performance: Is performance equal to or better than current on-premises solutions?
04 Operational: Can we operate “operationally” in AWS (commercial cloud), in both technical and business terms?
29. What is Cumulus?
Lightweight cloud-native framework for data ingest, archive,
distribution and management
Goals
- Provide core DAAC functionality in a configurable manner
- Data acquisition
- Data ingest (Validation, Preprocessing)
- Metadata harvesting, creation, publication into the catalog
- Data archiving and distribution
- Metrics publication
- Enable DAACs to help each other with re-usable,
compatible containers (e.g. widely applicable GIS
components or sub-setters)
- Enable DAAC-specific customizations
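The goals above describe a configurable chain of steps with room for DAAC-specific additions. The sketch below illustrates that shape only; it is not Cumulus code or its API, and every name in it (`Granule`, `run_pipeline`, the step functions) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Granule:
    """Hypothetical stand-in for one data file moving through the workflow."""
    granule_id: str
    size_bytes: int
    metadata: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Step = Callable[[Granule], Granule]


def validate(g: Granule) -> Granule:
    # Ingest-time check: reject obviously empty files.
    if g.size_bytes <= 0:
        raise ValueError(f"{g.granule_id}: empty file")
    g.log.append("validated")
    return g


def harvest_metadata(g: Granule) -> Granule:
    # Record the fields a catalog entry would need (here, just the ID).
    g.metadata["granule_id"] = g.granule_id
    g.log.append("metadata harvested")
    return g


def archive(g: Granule) -> Granule:
    g.log.append("archived")
    return g


def publish_metrics(g: Granule) -> Granule:
    g.log.append("metrics published")
    return g


def run_pipeline(g: Granule, steps: List[Step]) -> Granule:
    """Apply each configured step in order."""
    for step in steps:
        g = step(g)
    return g


CORE_STEPS: List[Step] = [validate, harvest_metadata, archive, publish_metrics]


def tag_discipline(g: Granule) -> Granule:
    # Example of a DAAC-specific customization slotted into the core steps.
    g.metadata["discipline"] = "cryosphere"
    g.log.append("discipline tagged")
    return g


custom_steps = CORE_STEPS[:2] + [tag_discipline] + CORE_STEPS[2:]
result = run_pipeline(Granule("example-granule-001", 1024), custom_steps)
print(result.log)
```

Keeping each step as an independent function with a shared signature is one way to make components reusable across DAACs while still allowing local additions.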
35. NISAR Quick Facts
“The NASA-ISRO Synthetic Aperture
Radar (NISAR) mission is a joint project
between NASA and ISRO to co-develop and
launch a dual frequency synthetic aperture
radar satellite. The satellite will be the first
radar imaging satellite to use dual frequency
and it is planned to be used for remote
sensing to observe and understand natural
processes of the Earth.”
https://en.wikipedia.org/wiki/NISAR_(satellite)
Key Scientific Objectives:
• Understand the response of ice sheets to climate change and the interaction of sea ice and climate
• Understand the dynamics of carbon storage and uptake in wooded, agricultural, wetland, and permafrost systems
• Determine the likelihood of earthquakes, volcanic eruptions, and landslides
Payload:
L-band (24-centimeter wavelength)
polarimetric SAR (NASA)
S-band (12-centimeter wavelength)
polarimetric SAR (ISRO)
Launch: 2021-ish from India
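The payload wavelengths above can be converted to approximate radar frequencies with f = c / λ. The sketch below uses the nominal 24 cm and 12 cm values from the slide, so the results are approximate band positions rather than the instruments' exact center frequencies.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def frequency_ghz(wavelength_cm: float) -> float:
    """Frequency in GHz for a given free-space wavelength in centimeters."""
    wavelength_m = wavelength_cm / 100.0
    return SPEED_OF_LIGHT_M_PER_S / wavelength_m / 1e9


l_band = frequency_ghz(24)  # NASA L-band SAR, nominal wavelength from the slide
s_band = frequency_ghz(12)  # ISRO S-band SAR, nominal wavelength from the slide

print(f"L-band: {l_band:.2f} GHz")
print(f"S-band: {s_band:.2f} GHz")
```

The results come out near 1.25 GHz and 2.5 GHz, which sit inside the conventional L-band (1 to 2 GHz) and S-band (2 to 4 GHz) ranges.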
36. GRFN Project Overview
The GRFN project will aid in
preparing for the large data
volumes expected from the
NISAR mission using ESA’s
Sentinel-1 data as a NISAR
surrogate.
NISAR SDS
37. GRFN Technical Approach
● Leveraging AWS Cloud
● JPL SDS and ASF DAAC working collaboratively to
understand and react to NISAR impacts
● Investigating seamless data delivery, bulk
reprocessing scenarios, and on-demand
processing
● Engaging and encouraging science community
to take advantage of cloud-based compute
capabilities
● Engaging SWOT DAAC as appropriate to provide
lessons-learned and guidance
● Proving out use cases and cost models and
discovering pain points prior to launch
39. Summary
• EOSDIS has been operational for > 20 years
• In just the past 5 years, 14 new missions have been added
• Future missions (e.g., SWOT, NISAR) will generate
significantly greater data volumes, driving exploration of new
strategies for data processing, storage and distribution
• Prototypes are helping to guide decisions on future approaches;
this involves significant collaboration with stakeholders
• Results are promising, but significant work remains, and
both technical and business operations issues are being
addressed