Evolving NASA’s Data and Information
Systems for Earth Science
Dr. Rahul Ramachandran
Manager | Inter-Agency Implementation and Advanced Concepts Team (IMPACT)
Senior Research Scientist | Earth Science Branch (ST11)
Marshall Space Flight Center/NASA
Huntsville, Alabama 35812, USA
HPC User Forum, April 1-3 2019 Santa Fe, NM
Presentation Roadmap
• Overview
• What, Why and How?
• Data Discovery
• Metadata
• Data Use
• Cloud Infrastructure
• Analytics
*roadmap by Ben Davis from the Noun Project
Overview
Earth Science – NASA’s Strategic Goal
This ability to observe our planet comprehensively matters to
each of us, on a daily level. Earth
information—for use in Internet maps, daily weather
forecasts, land use planning, transportation
efficiency, and agricultural productivity, to name a few—is
central to our lives, providing substantial contributions to
our economies, our national security, and our personal
safety. It helps ensure we are a
thriving society. - NRC, 2018
NASA’s Strategic Goal 1.1: “Understand
The Sun, Earth, Solar System, And
Universe.”
Earth Science Data System Program
The Earth Science Data System Program is an essential component of the Earth
Science Division and is responsible for:
● Actively managing NASA’s Earth science data (Satellite, Airborne, and Field).
● Developing unique data system capabilities optimized to support rigorous science investigations and interdisciplinary
research.
● Processing (and reprocessing) instrument data to create high quality long-term Earth science data records.
● Upholding NASA’s policy of full and open sharing of all data, tools, and ancillary information for all users.
● Engaging members of the Earth science community in the evolution of data systems.
The Earth Science Data and Information System (ESDIS) project at GSFC maintains and operates a data and
information system for NASA's Science Mission Directorate (SMD) and its Earth Science Division (ESD) to
support multidisciplinary research in Earth science and public data access.
Earth Observing System Data and Information System (EOSDIS)
[Diagram: EOSDIS data flow. Stage labels: capture and clean, data downlink, process, archive, subset, distribute. User communities: Research, Applications, Education.]
Earth Observing System Data and Information System (EOSDIS)
EOSDIS is managed by the Earth Science Data and Information System (ESDIS) Project at GSFC and includes the following
major core components:
Science Investigator-led Processing Systems (SIPS)
• Perform forward processing of standard data products and reprocess data to incorporate algorithm improvements
Distributed Active Archive Centers (DAACs)
• Co-located with centers of science discipline expertise; archive and distribute standard data products produced by the SIPS and others
Earthdata and Core Services
• Allows users to search, discover, visualize, refine, and access NASA Earth Observation data. Includes networking and security
*Red highlight indicates EOSDIS boundary.
Data are Produced and Managed by Science Discipline Experts
DAACs
• ASF DAAC: SAR Products, Sea Ice, Polar Processes
• PO.DAAC: Ocean Circulation, Air-Sea Interactions
• NSIDC DAAC: Cryosphere, Polar Processes
• LPDAAC: Land Processes and Features
• GHRC: Hydrological Cycle and Severe Weather
• ASDC: Radiation Budget, Clouds, Aerosols, Tropo Composition
• LAADS/MODAPS: Atmosphere
• OB.DAAC: Ocean Biology and Biogeochemistry
• SEDAC: Human Interactions in Global Change
• CDDIS: Crustal Dynamics, Solid Earth
• ORNL: Biogeochemical Dynamics
SIPS
• NCAR, U. of Co.: MOPITT
• JPL: MLS, TES, SNPP Sounder
• U. of Wisc.: SNPP Atmosphere
• GHRC: AMSR-U, LIS
• GSFC: SNPP, MODIS, OMI, OBPG
Extensive Data Collection
Started in the 1990s, EOSDIS today has 11,000+ data types (collections)
Land
• Cover & Usage
• Surface temperature
• Soil moisture
• Surface topography
Ocean
• Surface temperature
• Surface wind fields & heat flux
• Surface topography
• Ocean color
Atmosphere
• Winds & Precipitation
• Aerosols & Clouds
• Temperature & Humidity
• Solar radiation
Human Dimensions
• Population & Land Use
• Human & Environmental Health
• Ecosystems
Cryosphere
• Sea/Land Ice & Snow Cover
Data Sources
Type: Example Missions
• Satellite/on-orbit Missions: Terra, Aqua, Aura, Suomi-NPP, SORCE, GPM, GRACE, CloudSat, CALIPSO, etc.
• Airborne Missions: IceBridge, Earth Ventures (5+ missions), UAVSAR, etc.
• In Situ Measurement Missions: Field campaigns on land (e.g., LBA-ECO) and in the ocean (e.g., SPURS)
• Applications support: Near-real time creation and distribution of selected products for applications communities
• Earth Science Research support: Research products from efforts like MEaSUREs. This also includes data from older, heritage missions (prior to EOS Program) that the DAACs rescue, e.g., Nimbus, SeaSat
NASA Earth Science Data and Information Policy
In effect since the early 1990s, full and open sharing of data from satellites, sub-orbital platforms and field
campaigns with all users as soon as such data become available.
• No period of exclusive access. Following a post-launch checkout period, all data will be made available
to the user community. Any variation in access will result solely from user capability, equipment, and
connectivity.
• Make available all NASA-generated standard products along with the source code for algorithm
software, coefficients, and ancillary data used to generate these products.
• Non-discriminatory data access so that all users will be treated equally.
• Ensure that all data required for Earth system science research are archived. Include easily accessible
information about data holdings - quality assessments, supporting relevant information, and guidance for
locating and obtaining data.
• Interagency cooperation - sharing of data from satellites and other sources, mutual validation and
calibration data, and consolidation of duplicative capabilities and functions.
• Collect metrics to assess efficacy of data systems and services, and assess user satisfaction.
Science Investigator-led Processing Systems (SIPS) Requirements
• Perform forward processing and produce standard data products from mission instrument data; includes:
• Generation of Level 1, Level 2 and/or Level 3 granules
• Associated granule-level metadata
• Associated browse products (as appropriate)
• Deliver all standard data products to the assigned DAAC so they can be available within 48 hours of the
observation.
• Produce or enable production of near-real time data products to meet delivery (latency) goals.
• Reprocess standard data products to reflect algorithm improvements and to ensure consistent time series.
• Deliver documentation to the assigned DAAC for distribution.
• Assist the DAAC with information related to scientific content, format and product generation history of SIPS
products.
• Deliver each version of data production source code used in production of the Standard Products.
• Participate with DAAC in collection of items for long-term preservation
Role of DAACs
DAACs were selected and established based on the Earth Science discipline expertise and heritage of their host
organizations.
• Provide unique support and expert services to their user communities
• Provide data and services to the research community for comprehensive, cross-discipline studies needed to understand Earth as an
interrelated system
• Ensure safe stewardship of NASA’s data
DAAC Functions
• Ingest Level 0 data from EDOS
• Ingest higher level products produced by SIPS
• Perform processing of higher level products in some cases
• Archive assigned data sets
• Export metadata to the Common Metadata Repository (CMR)
• Provide user interfaces, tools, and services
• Distribute data to users (primarily electronically)
• Provide metrics data to ESDIS Metrics System (EMS)
Procedures for Archiving
No matter what type of data (on orbit, aircraft, in situ, etc.), DAAC staff follow these procedures for handling datasets
Planning
• Collaborate with mission teams, data producers, and ESDIS to develop Interproject Agreements (IPAs), Interface Control Documents (ICDs), and Operations Agreements (OAs)
• Support data producers with the creation of Data Management Plans (DMPs)
Acquire and/or Produce Data
• Advise data producers on data formats, structure, and delivery methods
• Establish automated processes for the transfer of data into DAAC data systems
• Develop or integrate, test, verify, and run data production code (for applicable data)
Preserve Data
• Ensure redundant online disk and tape data storage
• Establish file management (e.g. duplicate file detection) and file integrity (e.g. checksum verification)
Describe Data
• Create collection-level metadata and, when necessary, employ software to extract file-level metadata for the purposes of preservation, discovery and usage
• Export metadata to NASA’s CMR for inter-mission and inter-sensor data discovery
• Develop user documentation and supplemental information
• Create DOIs and data citations for proper attribution of data and data creators
Distribute Data
• Provide discovery through the DAAC Web site, with direct HTTPS access
• Support automated data transfer to users through subscriptions and APIs
• Facilitate search, visualization, and customization through NASA Earthdata Search
• Develop specialized portals and data services per mission and user needs
Support Data
• Assist user communities with the selection and usage of data
• Create “How To” guides, FAQs, and other resources to address specific user needs
• Work with user communities to identify needed improvements for data and tools
• Provide outreach and education to broaden the user community
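The "Preserve Data" step mentions duplicate file detection and checksum verification. A minimal sketch of both follows, assuming SHA-256 digests recorded at ingest; the slides do not say which hash a DAAC actually uses, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_integrity(path, expected_digest):
    """True when the file still matches the digest recorded at ingest."""
    return sha256_of(path) == expected_digest.lower()


def find_duplicates(paths):
    """Group files by digest and keep only groups with more than one file."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(Path(path).name)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}
```

Reading in chunks keeps memory use flat for large granules; grouping by digest rather than by file name catches copies that were renamed on delivery.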
Levels of Service
To be cost efficient, not every dataset gets the same level of service
Preservation involves ensuring long-term protection of:
• Bits
• Readability
• Understandability
• Usability
• Reproducibility of Results
• Discoverability and Accessibility
NASA’s Preservation Content Specification for Earth Science Data
1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets
2. Science Data Products: Raw instrument data, Level 0 through Level 4 data products and
associated metadata
3. Science Data Product Documentation: Structure and format with definitions of all
parameters and metadata fields; algorithm theoretical basis; processing history and product
version history; quality assessment information
4. Mission Data Calibration: Instrument/sensor calibration method (in operation) and data;
calibration software used to generate lookup tables; instrument and platform events and
maneuvers
5. Science Data Product Software: Product generation software and software documentation
6. Science Data Product Algorithm Input: Any ancillary data or other data sets used in
generation or calibration of the data or derived product; ancillary data description and
documentation
7. Science Data Product Validation: Records, publications and data sets
8. Science Data Software Tools: product access (reader) tools.
The ESDIS Project has supported this specification as an addition to ISO 19165, “Geographic Information - Preservation of digital data and metadata”. We submitted 19165-2 as the part specific to Earth observation data.
Core Services
• Earth Science Data Holdings: Open APIs; Free Data Download; DAAC specific tools
• Open Service APIs: CMR: Metadata Catalog (Search Engine)*; User Login; GIBS: Global Imagery Browse Services*
• End User Web Clients: Earthdata.nasa.gov; Earthdata Search: data access/discovery*; Worldview: imagery*
• Metrics!
Legend: Federated resources; EOSDIS core tools; *open source software
Icons all by Dinosoft Labes from thenounproject.com (CC 3.0)
Common Metadata Repository
Provides a single source of unified, high-quality, and reliable Earth Science metadata with a high performance ingest and
search architecture for submission and discovery of all EOSDIS data sets.
Lightning fast, always available
- 95% queries complete in <1s
- 99.98% uptime (last 365d)
Big Data Ready
- 34K collections
- 367 million files indexed
- Prepared to scale 1B+ records
Standards-focused
- ISO-19115 metadata
- OpenSearch/OGC CSW
- REST based APIs
Community-focused
- Developer’s portal
- Active Developer’s forum
- Ecosystem of supported tools
- Open source codebase on GitHub
Internationally Recognized
- Provides the backbone of the Committee on Earth Observation Satellites International Directory Network (CEOS IDN)
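Since the CMR is exposed through REST-based APIs, a keyword search for collections can be sketched as below. This is an illustration and not the official client: the endpoint root, the `keyword` and `page_size` parameters, and the `feed`/`entry` response layout are assumptions based on the public CMR search API and should be checked against its documentation.

```python
from urllib.parse import urlencode

# Assumed public CMR search root; verify against the CMR API documentation.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def collection_search_url(keyword, page_size=10):
    """Build a collection keyword-search URL that asks for JSON results."""
    query = urlencode({"keyword": keyword, "page_size": page_size})
    return f"{CMR_SEARCH_ROOT}/collections.json?{query}"


def collection_titles(response):
    """Extract (short_name, title) pairs from a parsed JSON response.

    Assumes the results sit under response["feed"]["entry"].
    """
    entries = response.get("feed", {}).get("entry", [])
    return [(entry.get("short_name"), entry.get("title")) for entry in entries]


if __name__ == "__main__":
    # Fetching is left to the reader, e.g. with urllib.request.urlopen(url).
    print(collection_search_url("NDVI"))
```

Keeping URL construction separate from response parsing lets both be exercised without network access.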
Data Discovery and Access
• Earthdata Search facilitates discovery of data across all 12 DAACs
(search.earthdata.nasa.gov)
• Data is online and can be readily downloaded
Full-Resolution Browse Capabilities
• Global Imagery Browse Services (GIBS) aids in discovery and access
• GIBS software is open source; anyone can develop clients; Worldview is NASA’s client
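Because anyone can develop a GIBS client, the simplest client is one that builds a tile URL. The sketch below assumes the REST-style WMTS pattern from the public GIBS documentation (root URL, `default` style segment, path order); the layer name and tile indices in the usage line are only examples.

```python
from datetime import date

# Assumed WMTS root for the geographic (EPSG:4326) projection.
GIBS_WMTS_ROOT = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best"


def gibs_tile_url(layer, day, matrix_set, zoom, row, col, ext="jpg"):
    """Build a REST-style WMTS tile URL for one layer, date and tile index."""
    return (
        f"{GIBS_WMTS_ROOT}/{layer}/default/{day.isoformat()}/"
        f"{matrix_set}/{zoom}/{row}/{col}.{ext}"
    )


if __name__ == "__main__":
    print(
        gibs_tile_url(
            "MODIS_Terra_CorrectedReflectance_TrueColor",
            date(2012, 7, 9),
            "250m",
            6,
            13,
            36,
        )
    )
```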
Near-Real Time Data Support
• More than 380 unique datasets available within 3 hours of observation to serve a growing
applications community
• In FY17, EOSDIS distributed over 1 Petabyte of data (123+ million files) to over 370,000 users
Distribution Volume (GBs) of NRT Products
• AIRS/AMSU-A: 27,765.25 (3%)
• AMSR2: 850.53 (0%)
• MISR: …
• MLS: 15,588.87
• MODIS - Aqua: 395,928.65
• MODIS - Terra: 455,312.83 (50%)
• OMI: 2,769.82
• VIIRS: 11,633.67
• Other: 121.26
Distribution of NRT Products (files)
• AIRS/AMSU-A: 3,027,266 (3%)
• AMSR2: 485,070 (0%)
• MISR: 38,048 (0%)
• MLS: 61,466,749 (50%)
• MODIS - Aqua: 32,416,963 (26%)
• MODIS - Terra: 21,712,972 (18%)
• OMI: 117,362 (0%)
• VIIRS: 3,598,043 (3%)
• Other: 317,446 (0%)
Four Week LANCE-Wide Latency and Distribution Trend for Level 0, 1, & 2 Products (8 July - 4 August, 2018)
[Line chart: latency in hours (y axis, 0 to 10) by date (x axis, 07/08/18 to 08/04/18), one series per instrument-mission plus the latency requirement.]
Average Latency (Min)
LANCE Instrument-Mission | 4-Wk | Curr-Wk
AIRS-AQUA | 91 | 91
AMSR2-GCOM-W1 | 71 | 70
MISR-TERRA | 66 | 68
MLS-AURA | 65 | 87
MODIS-AQUA | 91 | 93
MODIS-TERRA | 64 | 66
MOPITT-TERRA | 48 | 50
OMI-AURA | 75 | 75
OMPS-SNPP | 136 | 136
VIIRS-JPSS1 | 131 | 126
VIIRS-SNPP | 166 | 166
Distribution (4-Wk)
LANCE Instrument-Mission | Distribution Vol (GB) | # of Files Distributed
AIRS-AQUA | 1,317.99 | 132,608
AMSR2-GCOM-W1 | 154.37 | 30,782
MISR-TERRA | 0.27 | 3,251
MLS-AURA | 234.50 | 922,200
MODIS-AQUA | 35,599.97 | 1,983,230
MODIS-TERRA | 26,782.46 | 1,343,567
MOPITT-TERRA | 0.00 | 0
OMI-AURA | 119.62 | 7,703
OMPS-SNPP | 0.48 | 394
VIIRS-JPSS1 | 29.94 | 13,750
VIIRS-SNPP | 2,206.89 | 536,919
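The four-week average latencies above can be compared against the near-real-time target with a few lines of arithmetic. The sketch assumes the requirement is 180 minutes, taken from the "within 3 hours of observation" statement; the chart's actual "Latency Requirement" series is not legible in this transcript.

```python
# Assumption: the requirement is 3 hours, per "within 3 hours of observation".
LATENCY_REQUIREMENT_MIN = 3 * 60

# Four-week average latency in minutes, copied from the LANCE table.
FOUR_WEEK_AVG_LATENCY_MIN = {
    "AIRS-AQUA": 91,
    "AMSR2-GCOM-W1": 71,
    "MISR-TERRA": 66,
    "MLS-AURA": 65,
    "MODIS-AQUA": 91,
    "MODIS-TERRA": 64,
    "MOPITT-TERRA": 48,
    "OMI-AURA": 75,
    "OMPS-SNPP": 136,
    "VIIRS-JPSS1": 131,
    "VIIRS-SNPP": 166,
}


def latency_margin(latencies, requirement=LATENCY_REQUIREMENT_MIN):
    """Minutes of headroom per instrument (negative means over the limit)."""
    return {name: requirement - minutes for name, minutes in latencies.items()}


def over_requirement(latencies, requirement=LATENCY_REQUIREMENT_MIN):
    """Names of instruments whose average latency exceeds the requirement."""
    return sorted(name for name, m in latencies.items() if m > requirement)


if __name__ == "__main__":
    margins = latency_margin(FOUR_WEEK_AVG_LATENCY_MIN)
    tightest = min(margins, key=margins.get)
    print(f"Over requirement: {over_requirement(FOUR_WEEK_AVG_LATENCY_MIN)}")
    print(f"Tightest margin: {tightest} ({margins[tightest]} min)")
```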
Data Analysis Tools
Users can discover, analyze and visualize hundreds of products using techniques customized for their science discipline
Land Processes DAAC (USGS) - AppEEARS
Goddard DAAC (GES DISC) - GIOVANNI
User Support and Services
• Each DAAC has a user
services group to address
users’ questions about data
• Frequently Asked Questions
(FAQ) Pages
• User Forums for information
exchange
• Webinars and tutorials
• Data recipes
• Specific documentation
NASA’s Earth Science Data System in 2018
• American Customer Satisfaction Index (ACSI) survey score of 79 from over 4,000 respondents
• EOSDIS delivered over 1.6 billion data products to over 4.1 million users from around the world
• Easy access to and discovery of over 12,500 unique data products
• 33,000 data collections and over 380 million data granules in the Common Metadata Repository (CMR), where 95% of granule searches complete in less than 1 second
• Over 330,000 users have registered with EOSDIS to date
• EOSDIS currently has over 27 petabytes of accessible Earth science data
• EOSDIS also delivers near-real-time products in under 3 hours from observation
Source: https://earthdata.nasa.gov/about/system-performance
Data Discovery
Metadata
User Growth
New, easy to use software, tools, services and data formats have
exposed EO data to an ever growing user base.
Includes 2 user types:
Local Users
• Very knowledgeable about the specific scientific context within which data were collected
• Don’t require as much contextual information to find and use relevant data
• Includes:
• Domain specific research scientists
• Principal investigators who originally collected the data
Global Users
• Leverage data for research and applications beyond the data’s original intended use
• Includes:
• Scientists conducting research across siloed domain environments
• Users from the applications and decision making communities
• Data scientists using data in innovative new ways
Where Do Data and Users Come Together?
• For local users -> local data centers
• For global users -> Centralized, or
aggregated catalogs
• Provides a single discovery point for data
from multiple sources
• Brings together metadata from different
data centers into an aggregated catalog
and presents the metadata in a unified user
interface
• NASA’s aggregated catalog for Earth
observation data is the Common
Metadata Repository (CMR) and the
unified user interface is the Earthdata
Search client.
Metadata In Aggregated Catalogs
Metadata sets the stage for data -
• Metadata makes it possible to search for data
• Metadata limits and focuses attention to the relevant information about a dataset
• Metadata helps a user understand whether data is relevant to a given research problem
When metadata isn’t at its best, users can’t –
• Find the right data
• Understand the data
1. Tag by Guilhem from the Noun Project
2. Big data by ProSymbols from the Noun Project
3. users by Marksu Desu from the Noun Project
Image Credit: https://earthobservatory.nasa.gov/images/696/spring-vegetation-in-north-america
When Metadata Doesn’t Work…
• Conducting a faceted search for ‘NDVI’ in
Earthdata Search returns 14 datasets
• NDVI, or the Normalized Difference
Vegetation Index, is key for many
applications based research questions
• MODIS datasets are missing from the search
results:
• MODIS is a key instrument for calculating
NDVI
• MODIS Level 3 vegetation indices datasets
which include NDVI as a vegetation layer are
missing
How Can We Assess Metadata Quality?
• Metadata needs to be informative to both subject matter experts and applications
based, or global, users
• High quality metadata helps both user groups
• However, creating and maintaining high quality metadata can cause metadata
friction for data centers
• NASA has established the Analysis and Review of CMR (ARC) team to define and
assess metadata quality for Earth observation data and to lower metadata friction
for data centers by:
• Creating a metadata quality framework to assess metadata quality consistently and rigorously
• Leveraging automated and manual checks to assess this quality
• Building a team of reviewers with backgrounds in Earth system science, Atmospheric science,
remote sensing and informatics
• Defining a priority matrix to help lower metadata friction for data centers
ARC Metadata Quality Review Process
• Leverages metadata curation framework and ARC priority matrix
• Process followed for each collection level metadata record and one randomly selected granule/file level record
ARC Priority Matrix
Priority Categorization: Justification
• Red = High Priority Issues: High priority issues emphasize several characteristics of metadata quality including completeness, accuracy and accessibility. Issues flagged as red are required to be addressed by the data provider.
• Yellow = Medium Priority Issues: Medium priority issues emphasize consistency and completeness. Data providers are strongly encouraged to address yellow flagged issues. If a yellow flagged issue is not addressed, the data provider will be asked to provide a justification as to why.
• Blue = Low Priority Issues: Low priority issues also focus on completeness, consistency and accuracy. Any additional information that may be provided to make the metadata more robust or complete is categorized as blue.
• Green = No Issue: Elements flagged green are free of issues. Green flagged elements require no action on behalf of the data provider.
Top Metadata Quality Issues
• Broken URLs:
• Data access URLs that
do not conform to NASA
requirements (ftp vs
https)
• No URLs to essential
data documentation
• No data access URLs
provided at all
Top Metadata Issues
• DOIs and Collection
State are new concepts
that were recently
added
• Slow adoption of new
concepts by data centers
despite creating DOIs
for data
Top Metadata Issues
• Data format information
not widely adopted by
data centers
• Not viewed as an
information priority in
the past but important to
users
Top Metadata Issues
• Abstracts are
particularly problematic
o Too lengthy
o Non-existent
o Not specific enough to
describe data
o Too technical for a
global user
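Several of the issues listed above lend themselves to the automated checks that ARC pairs with manual review. The sketch below is hypothetical: the record field names, the abstract length cut-off, and the mapping of each issue to a red or yellow flag are illustrative choices, not ARC's actual rules.

```python
from urllib.parse import urlparse

RED, YELLOW, GREEN = "red", "yellow", "green"
MAX_ABSTRACT_CHARS = 2000  # hypothetical cut-off for "too lengthy"


def review_collection(record):
    """Flag a collection-level record (a plain dict) field by field."""
    findings = {}

    # Data access URLs must exist and use https rather than ftp.
    urls = record.get("data_access_urls") or []
    if not urls or any(urlparse(url).scheme != "https" for url in urls):
        findings["data_access_urls"] = RED
    else:
        findings["data_access_urls"] = GREEN

    # Abstracts must exist and should not be overly long.
    abstract = (record.get("abstract") or "").strip()
    if not abstract:
        findings["abstract"] = RED
    elif len(abstract) > MAX_ABSTRACT_CHARS:
        findings["abstract"] = YELLOW
    else:
        findings["abstract"] = GREEN

    # Newer concepts such as DOI and data format are flagged when absent.
    for field in ("doi", "data_format"):
        findings[field] = GREEN if record.get(field) else YELLOW
    return findings
```

Checks like "not specific enough" or "too technical for a global user" are judgment calls, which is why the process keeps human reviewers in the loop.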
Metadata Improvement To Date
Sample of improved metadata quality to date (3 of 12 data centers)
All data centers are actively participating in the improvement effort
Lessons Learned
1. Leveraging a metadata quality framework
operationally requires communication, compromise
and reiteration
2. The metadata curation process is not a “Do-it-
once-right-and-forget-about-it” activity but should
instead be viewed as an iterative process
3. Curating metadata within an aggregated catalog
may require an organizational mindset change
• Need to consider not just local users’ needs and local metadata needs when curating metadata
Learning by Becris from the Noun Project
Data Use
Cloud Infrastructure
Analytics
Current EOSDIS Architecture
Open data policies drive system architecture
Discipline specific support and tools (DAAC data)
Optimized for archive, search and distribution
Easily add new data products
Supports millions of users – High ACSI score
Uneven service and performance
Significant interface coordination
Limited on-demand product generation
Fragmentation – duplication of services, software and
storage
NASA Earth Science Missions: Present through 2023 (chart dated 02.22.19)
[Fleet chart. Legend: (Pre)Formulation, Implementation, Primary Ops, Extended Ops. * Launch date TBD]
• ISS Instruments: LIS (2020), SAGE III (2020), TSIS-1 (2018), OCO-3 (2019), ECOSTRESS (2019), GEDI (2020), CLARREO-PF (2020), EMIT (TBD)
• InVEST/CubeSats: RAVAN (2016), RainCube (2018), TEMPEST-D (2018), CubeRRT (2018), CSIM (2018), CIRiS (2019), HARP (2019), SNoOPI*, HyTI*, CTIM*, TACOS*
• JPSS-2 Instruments: OMPS-Limb (2019)
• Missions: Landsat 9 (2020), PACE (2022), NI-SAR (2021), SWOT (2021), TEMPO (2018), GRACE-FO (2) (2023), ICESat-2 (2021), CYGNSS (8) (2019), NISTAR, EPIC (DSCOVR / NOAA) (2019), Landsat 7 (USGS) (~2022), Terra (>2021), Aqua (>2022), CloudSat (~2018), CALIPSO (>2022), Aura (>2022), SMAP (>2022), Suomi NPP (NOAA) (>2022), Landsat 8 (USGS) (>2022), GPM (>2022), OCO-2 (>2022), Sentinel-6A/B (2020, 2025), MAIA (~2021), GeoCARB (~2021), TROPICS (12) (~2021), SORCE, TCTE (NOAA) (2017), OSTM/Jason-2 (NOAA) (>2022), PREFIRE (2) (TBD)
EOSDIS Data System Evolution
Petabytes
The cloud offers benefits such as the ability to analyze data at scale, to analyze multiple data sets together easily, and to avoid lengthy, expensive moves of large data sets, allowing scientists to work on data “in place”
The current architecture
will not be cost effective as
the annual ingest rate
increases from 4 to
50PB/year
EOSDIS is developing open
source cloud native
software for reuse across
the agency, throughout the
government and for any
other user.
Conceptual Cloud Architecture - 2021
Open Science = Open Data + Open Source Software + Open Services
Discipline specific support and tools (All data)
Infrastructure specialization by DAACs
Processing next to data for anyone
Optimized for multidisciplinary research
Clear integration path for ACCESS, AIST…
Supports external distribution
Develop coordination and documentation
Cost management
Security, ITAR, SBU, EAR99
Business processes and skillsets
Vendor lock-in
Conceptual ‘data close to compute’
The operational model of consolidating data—allowing users to compute
on the data in place with a platform of common tools—is natural to cloud;
it is a cost-effective way to leverage cloud and could be applicable to
many businesses and missions
Large volume data storage: Centralized mission observation
and model datasets stored in auto graduated AWS object
storage (Amazon S3, Amazon S3 IA, Amazon Glacier)
Scalable compute: Provision, access, and terminate
dynamically based on need. Cost by use
Cloud Native Compute: Cloud vendor service software stacks
and microservices easing deployment of user based
applications
EOSDIS applications and services: Application and service
layer using AWS compute, storage (Amazon S3, Amazon S3
IA, Amazon Glacier), and cloud native technologies
Non-EOSDIS/public applications and services: Science
community brings algorithms to the data. Support for NASA
and non-NASA
Bring customers to the data
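The "auto graduated" object storage described above (Amazon S3, then S3 IA, then Glacier) is typically expressed as a bucket lifecycle rule. The sketch below only builds the rule as a dictionary, in the shape accepted by S3 lifecycle configuration; the prefix and the day thresholds are illustrative assumptions, not EOSDIS settings.

```python
import json


def lifecycle_rule(prefix, days_to_ia=30, days_to_glacier=365):
    """Build one lifecycle rule that tiers objects under a key prefix.

    The day thresholds are placeholders chosen for illustration.
    """
    if days_to_glacier <= days_to_ia:
        raise ValueError("Glacier transition must come after the IA transition")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_to_ia, "StorageClass": "STANDARD_IA"},
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
        ],
    }


if __name__ == "__main__":
    # "modis/" is a hypothetical key prefix.
    configuration = {"Rules": [lifecycle_rule("modis/")]}
    print(json.dumps(configuration, indent=2))
```

Applying the configuration to a bucket is left out on purpose; it would need AWS credentials and an SDK call.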
So we made this thing.
Lightweight, cloud-native framework for data ingest, archive, distribution and
management
Goals
- Provide core DAAC functionality in a configurable manner
- Enable DAACs to help each other with re-usable, compatible containers
(e.g. data retrieval, metadata extraction, metrics delivery)
- Enable DAAC-specific customizations
What is Cumulus?
Cumulus Major System Components
A lightweight framework consisting of:
• Tasks: discrete actions in a workflow, invoked as a Lambda function or EC2 service; a common protocol supports chaining
• Orchestration engine (AWS Step Functions): controls invocation of tasks in a workflow
• Database: stores status, logs, and other system state information
• Workflow file(s): define the ingest, processing, publication, and archive operations (JSON)
• Dashboard: create and execute workflows, monitor the system
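To make the idea of chained tasks under AWS Step Functions concrete, here is a minimal sketch that builds a state machine definition in Amazon States Language. It is not the Cumulus workflow schema; the task names and Lambda ARNs are hypothetical placeholders.

```python
import json


def chained_state_machine(tasks, comment="Illustrative ingest workflow"):
    """Chain tasks in order; `tasks` is a list of (state_name, resource_arn)."""
    if not tasks:
        raise ValueError("at least one task is required")
    states = {}
    for index, (name, arn) in enumerate(tasks):
        state = {"Type": "Task", "Resource": arn}
        if index + 1 < len(tasks):
            state["Next"] = tasks[index + 1][0]  # hand off to the next task
        else:
            state["End"] = True  # last task terminates the workflow
        states[name] = state
    return {"Comment": comment, "StartAt": tasks[0][0], "States": states}


if __name__ == "__main__":
    # Hypothetical task names and account ID, for illustration only.
    arn = "arn:aws:lambda:us-east-1:123456789012:function:{}"
    definition = chained_state_machine(
        [
            ("DiscoverGranules", arn.format("discover-granules")),
            ("ExtractMetadata", arn.format("extract-metadata")),
            ("PublishMetadata", arn.format("publish-metadata")),
        ]
    )
    print(json.dumps(definition, indent=2))
```

Each state names its successor through `Next`, which is what lets small single-purpose tasks be recombined into DAAC-specific workflows.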
Workshop- “Enabling Analytics in the Cloud for
Earth Science Data”
• The NASA Earth Science Data Systems Program sponsored a workshop
on 21-23 February 2018 where the participants discussed the
convergence of Big Data, Cloud Computing, and Analytics.
• These discussions clustered around these themes
• Strategic Alignment of Cloud Effort
• Reference Architecture for Cloud Analytics
• Analytics Optimized Data Stores
• Reuse of Cloud Analytics Related Services
• Wider Deep Learning Adoption
Analytics-Optimized Data
• Identified “Analytics Optimized Data Stores” (AODS) as a solution to Big Data Volume and
Variety challenges
• AODS are data stored so as to minimize the need for data wrangling for a large user community and to enable fast access
• AODS are optimized, cost-effective storage structures enabling iterative queries relevant to
users
• Building AODS to enable new analytic tools and services is viewed as a best path forward
Conceptual Architecture for Supporting Analytics
Multi-Mission Algorithm and Analysis Platform (MAAP)
Outcomes
• Enable researchers to easily discover, process, visualize, and analyze large volumes of data from
both ESA and NASA
• Provide tools and the supporting infrastructure needed to enable data comparison, analysis,
evaluation, and generation
Deliverables
• Establish NASA MAAP data infrastructure including the use of CMR, MMT and Cumulus Workflow
• Ingest high priority datasets for pilot MAAP and curate metadata to high quality standards
Vision: The goal of the MAAP is to establish
a collaboration framework between ESA
and NASA to share data, science algorithms
and compute resources in order to support
scientific research conducted by NASA and
ESA scientists.
MAAP Layered Architecture (Pilot)
Data Expedition: Exploring data for Gap Winds
• Tehuantepecer: a gap wind triggered by a synoptic scale high pressure system over the Great Plains of North America that pushes air through a narrow gap in the Sierra Madre mountain range
CCMP data coverage of three Central American sites: (A) Tehuantepec, Mexico, (B) Papagayo, Costa Rica and (C) Panama, for March 27, 2008
• Strong gap winds produce associated cold water ocean regions
by triggering intense vertical mixing of the ocean, a phenomenon
referred to as ocean upwelling.
Exploring data for Gap Winds
• Seasonal distribution of these events
Exploring data for Gap Winds
• Heat map for the month showing Gap Winds
• Three major events can be seen for the month
Exploring data for Gap Winds
• Verify results by looking at browse images for individual days
Thank you!
Kevin Murphy ESDS Program PE, NASA HQ
Andy Mitchell ESDIS Project Manager, NASA GSFC
IMPACT Team, NASA MSFC
Data Access Centralized Reusable Capabilities
• Earthdata: The EOSDIS website https://earthdata.nasa.gov will increase visibility to the
interdisciplinary use of data and demonstrate how data are used.
• High Performance Data Search and Discovery
• Common Metadata Repository (CMR): Provide sub-second search and discovery services
across the Sentinel and other EOSDIS holdings.
• Earthdata Search Client: Data search and order tool https://search.earthdata.nasa.gov
• Imagery and Data Visualization Tools
• Global Imagery Browse Services (GIBS): full resolution imagery in a community
standards-based set of imagery services
• Worldview: highly responsive interface to explore GIBS imagery and download the
underlying data granules https://earthdata.nasa.gov/labs/worldview/
• Giovanni: Quick-start exploratory data visualization and analysis tool
• Near Real-Time Capabilities: Provided by “LANCE” (Land Atmosphere Near real-time Capability for EOS), which produces products within 3 hours of observation. Near real-time capabilities are co-located with the standard science production facilities.
• EOSDIS Metrics System (EMS): collects and reports on data ingest, archive, and distribution
metrics across EOSDIS
• Earthdata Infrastructure (EDI DevOps): platform for requirement management, code
development, testing and deployment to operations
• User Support Tool (UST): user relationship management and issue resolution (Kayako)
• Earthdata Log-in (User Registration System): provides a centralized and simplified mechanism
for user registration and account management for all EOSDIS system components.
Digital Object Identifiers/Citations
• Authors using data in
publications should cite and
reference the datasets
properly
• DOIs are assigned to
EOSDIS datasets by the
ESDIS Project
• Associated with each DOI is
a common landing page
that provides details about
datasets and shows how to
cite them
[Bar chart: Status of Digital Object Identifiers created with the ESDIS by Data Providers as of July 31, 2018. X axis: Data Distributor; y axis: Number of DOIs (0 to 1200).]
Reserved: Created by ESDIS but not registered with the EZID
Registered: Created by ESDIS and also registered with the EZID
Total DOIs Registered: ~10,000

Evolving NASA’s Data and Information Systems for Earth Science

  • 1. Evolving NASA’s Data and Information Systems for Earth Science Dr. Rahul Ramachandran Manager | Inter-Agency Implementation and Advanced Concepts Team (IMPACT) Senior Research Scientist | Earth Science Branch (ST11) Marshall Space Flight Center/NASA Huntsville, Alabama 35812, USA HPC User Forum, April 1-3 2019 Santa Fe, NM
  • 2. Presentation Roadmap • Overview • What, Why and How? • Data Discovery • Metadata • Data Use • Cloud Infrastructure • Analytics *roadmap by Ben Davis from the Noun Project
  • 4. Earth Science – NASA’s Strategic Goal This ability to observe our planet comprehensively matters to each of us, on a daily level. Earth information—for use in Internet maps, daily weather forecasts, land use planning, transportation efficiency, and agricultural productivity, to name a few—is central to our lives, providing substantial contributions to our economies, our national security, and our personal safety. It helps ensure we are a thriving society. - NRC, 2018 NASA’s Strategic Goal 1.1: “Understand The Sun, Earth, Solar System, And Universe.”
  • 5. Earth Science Data System Program The Earth Science Data System Program is an essential component of the Earth Science Division and is responsible for: ● Actively managing NASA’s Earth science data (Satellite, Airborne, and Field). ● Developing unique data system capabilities optimized to support rigorous science investigations and interdisciplinary research. ● Processing (and reprocessing) instrument data to create high quality long-term Earth science data records. ● Upholding NASA’s policy of full and open sharing of all data, tools, and ancillary information for all users. ● Engaging members of the Earth science community in the evolution of data systems. The Earth Science Data and Information System (ESDIS) project at GSFC maintains and operates a data and information system for NASA's Science Mission Directorate (SMD) and its Earth Science Division (ESD) to support multidisciplinary research in Earth science and public data access.
  • 6. Earth Observing System Data and Information System (EOSDIS) Stages: data downlink; capture and clean; process; archive; subset; distribute. Users: Research, Education, Applications.
  • 7. Earth Observing System Data and Information System (EOSDIS) EOSDIS is managed by the Earth Science Data and Information System (ESDIS) Project at GSFC and includes the following major core components: • Science Investigator-led Processing Systems (SIPS): perform forward processing of standard data products and reprocess data to incorporate algorithm improvements • Distributed Active Archive Centers (DAACs): co-located with centers of science discipline expertise; archive and distribute standard data products produced by the SIPS and others • Earthdata and Core Services: allow users to search, discover, visualize, refine, and access NASA Earth Observation data; include networking and security *Red highlight indicates EOSDIS boundary.
  • 8. Data are Produced and Managed by Science Discipline Experts DAACs: • ASF DAAC: SAR Products, Sea Ice, Polar Processes • PO.DAAC: Ocean Circulation, Air-Sea Interactions • NSIDC DAAC: Cryosphere, Polar Processes • LPDAAC: Land Processes and Features • GHRC: Hydrological Cycle and Severe Weather • ASDC: Radiation Budget, Clouds, Aerosols, Tropo Composition • LAADS/MODAPS: Atmosphere • OB.DAAC: Ocean Biology and Biogeochemistry • SEDAC: Human Interactions in Global Change • CDDIS: Crustal Dynamics, Solid Earth • ORNL: Biogeochemical Dynamics SIPS: • NCAR, U. of Co.: MOPITT • JPL: MLS, TES, SNPP Sounder • U. of Wisc.: SNPP Atmosphere • GHRC: AMSR-U, LIS • GSFC: SNPP, MODIS, OMI, OBPG
  • 9. Extensive Data Collection Started in the 1990s, EOSDIS today has 11,000+ data types (collections) • Land: Cover & Usage; Surface temperature; Soil moisture; Surface topography • Ocean: Surface temperature; Surface wind fields & heat flux; Surface topography; Ocean color • Atmosphere: Winds & Precipitation; Aerosols & Clouds; Temperature & Humidity; Solar radiation • Human Dimensions: Population & Land Use; Human & Environmental Health; Ecosystems • Cryosphere: Sea/Land Ice & Snow Cover
  • 10. Data Sources (Type: Example Missions) • Satellite/on-orbit Missions: Terra, Aqua, Aura, Suomi-NPP, SORCE, GPM, GRACE, CloudSat, CALIPSO, etc. • Airborne Missions: IceBridge, Earth Ventures (5+ missions), UAVSAR, etc. • In Situ Measurement Missions: Field campaigns on land (e.g., LBA-ECO) and in the ocean (e.g., SPURS) • Applications support: Near-real time creation and distribution of selected products for applications communities • Earth Science Research support: Research products from efforts like MEaSUREs. This also includes data from older, heritage missions (prior to EOS Program) that the DAACs rescue (e.g., Nimbus, SeaSat)
  • 11. NASA Earth Science Data and Information Policy In effect since the early 1990s, full and open sharing of data from satellites, sub-orbital platforms and field campaigns with all users as soon as such data become available. • No period of exclusive access. Following a post-launch checkout period, all data will be made available to the user community. Any variation in access will result solely from user capability, equipment, and connectivity. • Make available all NASA-generated standard products along with the source code for algorithm software, coefficients, and ancillary data used to generate these products. • Non-discriminatory data access so that all users will be treated equally. • Ensure that all data required for Earth system science research are archived. Include easily accessible information about data holdings - quality assessments, supporting relevant information, and guidance for locating and obtaining data. • Interagency cooperation - sharing of data from satellites and other sources, mutual validation and calibration data, and consolidation of duplicative capabilities and functions. • Collect metrics to assess efficacy of data systems and services, and assess user satisfaction.
  • 12. Science Investigator-led Processing Systems (SIPS) Requirements • Perform forward processing and produce standard data products from mission instrument data; includes: • Generation of Level 1, Level 2 and/or Level 3 granules • Associated granule-level metadata • Associated browse products (as appropriate) • Deliver all standard data products to the assigned DAAC so they can be available within 48 hours of the observation. • Produce or enable production of near-real time data products to meet delivery (latency) goals. • Reprocess standard data products to reflect algorithm improvements and to ensure consistent time series. • Deliver documentation to the assigned DAAC for distribution. • Assist the DAAC with information related to scientific content, format and product generation history of SIPS products. • Deliver each version of data production source code used in production of the Standard Products. • Participate with DAAC in collection of items for long-term preservation
  • 13. Role of DAACs DAACs were selected and established based on the Earth Science discipline expertise and heritage of their host organizations. • Provide unique support and expert services to their user communities • Provide data and services to the research community for comprehensive, cross-discipline studies needed to understand Earth as an interrelated system • Ensure safe stewardship of NASA’s data DAAC Functions: • Ingest Level 0 data from EDOS • Ingest higher level products produced by SIPS • Perform processing of higher level products in some cases • Archive assigned data sets • Export metadata to the Common Metadata Repository (CMR) • Provide user interfaces, tools, and services • Distribute data to users (primarily electronically) • Provide metrics data to ESDIS Metrics System (EMS)
  • 14. Procedures for Archiving No matter what type of data (on orbit, aircraft, in situ, etc.) DAAC Staff follow these procedures for handling datasets Planning • Collaborate with mission teams, data producers, and ESDIS to develop Interproject Agreements (IPAs), Interface Control Documents (ICDs), and Operations Agreements (OAs) • Support data producers with the creation of Data Management Plans (DMPs) Acquire and/or Produce Data • Advise data producers on data formats, structure, and delivery methods • Establish automated processes for the transfer of data into DAAC data systems • Develop or integrate, test, verify, and run data production code (for applicable data) Preserve Data • Ensure redundant online disk and tape data storage • Establish file management (e.g. duplicate file detection) and file integrity (e.g. checksum verification) Describe Data • Create collection-level metadata and, when necessary, employ software to extract file-level metadata for the purposes of preservation, discovery and usage • Export metadata to NASA’s CMR for inter-mission and inter-sensor data discovery • Develop user documentation and supplemental information • Create DOIs and data citations for proper attribution of data and data creators Distribute Data • Provide discovery through the DAAC Web site, with direct HTTPS access • Support automated data transfer to users through subscriptions and APIs • Facilitate search, visualization, and customization through NASA Earthdata Search • Develop specialized portals and data services per mission and user needs Support Data • Assist user communities with the selection and usage of data • Create “How To” guides, FAQs, and other resources to address specific user needs • Work with user communities to identify needed improvements for data and tools • Provide outreach and education to broaden the user community
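The "Preserve Data" step above names two file-level safeguards: duplicate file detection and checksum verification. A minimal sketch of both follows, using only the Python standard library. The function names, the choice of SHA-256, and the sample file names are illustrative assumptions, not a description of any DAAC's actual tooling.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex, algorithm="sha256"):
    """File integrity check: compare a file against a recorded checksum."""
    return file_checksum(path, algorithm) == expected_hex.lower()


def find_duplicates(paths, algorithm="sha256"):
    """Duplicate detection: group files whose contents hash identically."""
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_checksum(path, algorithm)].append(Path(path))
    return [group for group in by_digest.values() if len(group) > 1]


if __name__ == "__main__":
    # Hypothetical files, created only for this demonstration.
    with tempfile.TemporaryDirectory() as workdir:
        first = Path(workdir) / "granule_a.dat"
        second = Path(workdir) / "granule_b.dat"
        first.write_bytes(b"same bytes")
        second.write_bytes(b"same bytes")
        recorded = file_checksum(first)
        print(verify_checksum(second, recorded))
        print(len(find_duplicates([first, second])))
```

Reading in chunks keeps memory use flat for large granules; an archive would normally record the checksum at ingest and re-verify it on a schedule.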
  • 15. Levels of Service To be cost efficient, not every dataset gets the same level of service
  • 16. Preservation involves ensuring long-term protection of… PRESERVE Bits Understandability Usability Reproducibility of Results Readability Discoverability and Accessibility
  • 17. NASA’s Preservation Content Specification for Earth Science Data 1. Preflight/Pre-Operations: Instrument/Sensor characteristics including pre-flight/pre-operations performance measurements; calibration method; radiometric and spectral response; noise characteristics; detector offsets 2. Science Data Products: Raw instrument data, Level 0 through Level 4 data products and associated metadata 3. Science Data Product Documentation: Structure and format with definitions of all parameters and metadata fields; algorithm theoretical basis; processing history and product version history; quality assessment information 4. Mission Data Calibration: Instrument/sensor calibration method (in operation) and data; calibration software used to generate lookup tables; instrument and platform events and maneuvers 5. Science Data Product Software: Product generation software and software documentation 6. Science Data Product Algorithm Input: Any ancillary data or other data sets used in generation or calibration of the data or derived product; ancillary data description and documentation 7. Science Data Product Validation: Records, publications and data sets 8. Science Data Software Tools: product access (reader) tools. ESDIS Project has supported this as an addition to ISO 19165, “Geographic Information - Preservation of digital data and metadata”. We submitted 19165-2 as specific to Earth Observation Data
  • 18. Core Services • Earth Science Data Holdings: Open APIs; Free Data Download; DAAC specific tools • Open Service APIs: CMR: Metadata Catalog (Search Engine)*; User Login; GIBS: Global Imagery Browse Services* • End User Web Clients: Earthdata.nasa.gov; Earthdata Search: data access/discovery*; Worldview: imagery* • Metrics • Legend: Federated resources; EOSDIS core tools; *open source software • Icons: all by Dinosoft Labes from thenounproject.com (CC 3.0)
  • 19. Common Metadata Repository Provides a single source of unified, high-quality, and reliable Earth Science metadata with a high performance ingest and search architecture for submission and discovery of all EOSDIS data sets. • Lightning fast, always available: 95% of queries complete in <1s; 99.98% uptime (last 365d) • Big Data Ready: 34K collections; 367 million files indexed; prepared to scale to 1B+ records • Standards-focused: ISO-19115 metadata; OpenSearch/OGC CSW; REST based APIs • Community-focused: Developer’s portal; active Developer’s forum; ecosystem of supported tools; open source codebase on GitHub • Internationally Recognized: Provides the backbone of the Committee on Earth Observation Satellites International Directory Network (CEOS IDN)
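The slide mentions REST based APIs without showing a request. The sketch below only builds a collection-search URL and sends nothing over the network. The endpoint root and the `keyword` and `page_size` parameters are assumptions taken from the publicly documented CMR search API rather than from this presentation, so check them against the current CMR documentation before relying on them. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed public CMR search root; not stated anywhere in the slides.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def build_collection_search_url(keyword, page_size=10, response_format="json"):
    """Build (but do not send) a keyword search URL for CMR collections."""
    query = urlencode({"keyword": keyword, "page_size": page_size})
    return f"{CMR_SEARCH_ROOT}/collections.{response_format}?{query}"


if __name__ == "__main__":
    # Example keyword only; any free-text search term could be used.
    print(build_collection_search_url("NDVI"))
```

Keeping URL construction separate from the HTTP call makes the query easy to inspect and to test offline.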
  • 20. Data Discovery and Access • Earthdata Search facilitates discovery of data across all 12 DAACs (search.earthdata.nasa.gov) • Data is online and can be readily downloaded
  • 21. Full-Resolution Browse Capabilities • Global Imagery Browse System (GIBS) aids in discovery and access • GIBS software is open source; anyone can develop clients; Worldview is NASA’s client
  • 22. Near-Real Time Data Support • More than 380 unique datasets available within 3 hours of observation to serve a growing applications community • In FY17, EOSDIS distributed over 1 Petabyte of data (123+ million files) to over 370,000 users
    [Pie chart] Distribution Volume (GBs) of NRT Products: AIRS/AMSU-A 27,765.25 (3%); AMSR2 850.53 (0%); MISR …; MLS 15,588.87; MODIS - Aqua 395,928.65; MODIS - Terra 455,312.83 (50%); OMI 2,769.82; VIIRS 11,633.67; Other1 121.26
    [Pie chart] Distribution of NRT Products: AIRS/AMSU-A 3,027,266 (3%); AMSR2 485,070 (0%); MISR 38,048 (0%); MLS 61,466,749 (50%); MODIS - Aqua 32,416,963 (26%); MODIS - Terra 21,712,972 (18%); OMI 117,362 (0%); VIIRS 3,598,043 (3%); Other1 317,446 (0%)
    [Line chart] Four Week LANCE-Wide Latency and Distribution Trend for Level 0, 1, & 2 Products (8 July - 4 August, 2018): Latency (Hours) by Date, 07/08/18 to 08/04/18, one series per LANCE instrument-mission plus the Latency Requirement
    Average Latency (Min) by LANCE Instrument-Mission, 4-Wk / Curr-Wk: AIRS-AQUA 91 / 91; AMSR2-GCOM-W1 71 / 70; MISR-TERRA 66 / 68; MLS-AURA 65 / 87; MODIS-AQUA 91 / 93; MODIS-TERRA 64 / 66; MOPITT-TERRA 48 / 50; OMI-AURA 75 / 75; OMPS-SNPP 136 / 136; VIIRS-JPSS1 131 / 126; VIIRS-SNPP 166 / 166
    4-Wk Distribution by LANCE Instrument-Mission, Distribution Vol (GB) / # of Files Distributed: AIRS-AQUA 1,317.99 / 132,608; AMSR2-GCOM-W1 154.37 / 30,782; MISR-TERRA 0.27 / 3,251; MLS-AURA 234.50 / 922,200; MODIS-AQUA 35,599.97 / 1,983,230; MODIS-TERRA 26,782.46 / 1,343,567; MOPITT-TERRA 0.00 / 0; OMI-AURA 119.62 / 7,703; OMPS-SNPP 0.48 / 394; VIIRS-JPSS1 29.94 / 13,750; VIIRS-SNPP 2,206.89 / 536,919
  • 23. Data Analysis Tools Users can discover, analyze and visualize hundreds of products using techniques customized for their science discipline • Land Processes DAAC (USGS) - AppEEARS • Goddard DAAC (GES DISC) - GIOVANNI
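As a worked check of the latency figures on this slide, the sketch below compares the four-week average latency for each LANCE instrument-mission with a 3-hour threshold. The latency values are copied from the slide's table. Treating the chart's "Latency Requirement" line as exactly 180 minutes is an assumption drawn from the "within 3 hours of observation" bullet, and the function name is illustrative.

```python
# Assumption: the requirement equals the "within 3 hours" figure on the slide.
LATENCY_REQUIREMENT_MIN = 3 * 60

# Four-week average latency in minutes, as listed on the slide.
FOUR_WEEK_AVG_LATENCY_MIN = {
    "AIRS-AQUA": 91,
    "AMSR2-GCOM-W1": 71,
    "MISR-TERRA": 66,
    "MLS-AURA": 65,
    "MODIS-AQUA": 91,
    "MODIS-TERRA": 64,
    "MOPITT-TERRA": 48,
    "OMI-AURA": 75,
    "OMPS-SNPP": 136,
    "VIIRS-JPSS1": 131,
    "VIIRS-SNPP": 166,
}


def latency_margins(latency_by_instrument, requirement_min=LATENCY_REQUIREMENT_MIN):
    """Return minutes of headroom per instrument (negative means a miss)."""
    return {
        name: requirement_min - latency
        for name, latency in latency_by_instrument.items()
    }


if __name__ == "__main__":
    margins = latency_margins(FOUR_WEEK_AVG_LATENCY_MIN)
    for name, margin in sorted(margins.items(), key=lambda item: item[1]):
        print(f"{name}: {margin} min of headroom")
```

Under that assumption, every listed instrument-mission averages below the threshold, with VIIRS-SNPP closest to it.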
  • 24. User Support and Services • Each DAAC has a user services group to address users’ questions about data • Frequently Asked Questions (FAQ) Pages • User Forums for information exchange • Webinars and tutorials • Data recipes • Specific documentation
  • 25. NASA’s Earth Science Data System in 2018 • American Customer Satisfaction Index (ACSI) survey score of 79 from over 4,000 respondents • EOSDIS delivered over 1.6 Billion data products to over 4.1 Million users from around the world • Easy access and discovery of data to over 12,500 unique data products • 33,000 Data Collections in the Common Metadata Repository (CMR) and over 380 Million data granules, of which 95% of granule searches complete in less than 1 second • Over 330,000 users have registered with EOSDIS to date • EOSDIS currently has over 27 Petabytes of accessible Earth science data • EOSDIS also delivers near-real-time products in under 3 hours from observation • https://earthdata.nasa.gov/about/system-performance
  • 27. User Growth New, easy to use software, tools, services and data formats have exposed EO data to an ever growing user base. Includes 2 user types: • Local Users: Very knowledgeable about the specific scientific context within which data were collected; don’t require as much contextual information to find and use relevant data; includes domain specific research scientists and principal investigators who originally collected the data • Global Users: Leverage data for research and applications beyond the data’s original intended use; includes scientists conducting research across siloed domain environments, users from the applications and decision making communities, and data scientists using data in innovative new ways
  • 28. Where Do Data and Users Come Together? • For local users -> local data centers • For global users -> Centralized, or aggregated catalogs • Provides a single discovery point for data from multiple sources • Brings together metadata from different data centers into an aggregated catalog and presents the metadata in a unified user interface • NASA’s aggregated catalog for Earth observation data is the Common Metadata Repository (CMR) and the unified user interface is the Earthdata Search client.
  • 29. Metadata In Aggregated Catalogs Metadata sets the stage for data - • Metadata makes it possible to search for data • Metadata limits and focuses attention to the relevant information about a dataset • Metadata helps a user understand whether data is relevant to a given research problem When metadata isn’t at its best, users can’t – • Find the right data • Understand the data 1. Tag by Guilhem from the Noun Project 2. Big data by ProSymbols from the Noun Project 3. users by Marksu Desu from the Noun Project
  • 30. When Metadata Doesn’t Work… • Conducting a faceted search for ‘NDVI’ in Earthdata Search returns 14 datasets • NDVI, or the Normalized Difference Vegetation Index, is key for many applications-based research questions • MODIS datasets are missing from the search results: • MODIS is a key instrument for calculating NDVI • MODIS Level 3 vegetation indices datasets which include NDVI as a vegetation layer are missing Image Credit: https://earthobservatory.nasa.gov/images/696/spring-vegetation-in-north-america
  • 31. How Can We Assess Metadata Quality? • Metadata needs to be informative to both subject matter experts and applications based, or global, users • High quality metadata helps both user groups • However, creating and maintaining high quality metadata can cause metadata friction for data centers • NASA has established the Analysis and Review of CMR (ARC) team to define and assess metadata quality for Earth observation data and to lower metadata friction for data centers by: • Creating a metadata quality framework to assess metadata quality consistently and rigorously • Leveraging automated and manual checks to assess this quality • Building a team of reviewers with backgrounds in Earth system science, Atmospheric science, remote sensing and informatics • Defining a priority matrix to help lower metadata friction for data centers
  • 32. ARC Metadata Quality Review Process • Leverages metadata curation framework and ARC priority matrix • Process followed for each collection level metadata record and one randomly selected granule/file level record
  • 33. ARC Priority Matrix (Priority Categorization: Justification) • Red = High Priority Issues: High priority issues emphasize several characteristics of metadata quality including completeness, accuracy and accessibility. Issues flagged as red are required to be addressed by the data provider. • Yellow = Medium Priority Issues: Medium priority issues emphasize consistency and completeness. Data providers are strongly encouraged to address yellow flagged issues. If a yellow flagged issue is not addressed, the data provider will be asked to provide a justification as to why. • Blue = Low Priority Issues: Low priority issues also focus on completeness, consistency and accuracy. Any additional information that may be provided to make the metadata more robust or complete is categorized as blue. • Green = No Issue: Elements flagged green are free of issues. Green flagged elements require no action on behalf of the data provider.
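One way to picture how the priority matrix drives follow-up is to encode the four colors and summarize a reviewed record. This is a minimal sketch under stated assumptions: the `Priority` enum, the `summarize_review` helper, and the metadata element names in the example are hypothetical and do not reflect the ARC team's actual tooling. Only the color-to-action rules come from the slide.

```python
from collections import Counter
from enum import Enum


class Priority(Enum):
    RED = "high"      # required to be addressed by the data provider
    YELLOW = "medium" # address, or provide a justification
    BLUE = "low"      # optional additions that make metadata more robust
    GREEN = "none"    # no action needed


def summarize_review(flags):
    """Summarize a reviewed record given {element name: Priority}."""
    counts = Counter(priority.name for priority in flags.values())
    return {
        "counts": {p.name: counts.get(p.name, 0) for p in Priority},
        "must_fix": sorted(n for n, p in flags.items() if p is Priority.RED),
        "fix_or_justify": sorted(
            n for n, p in flags.items() if p is Priority.YELLOW
        ),
    }


if __name__ == "__main__":
    # Hypothetical element names, for illustration only.
    review = {
        "Abstract": Priority.YELLOW,
        "DataFormat": Priority.BLUE,
        "RelatedUrls": Priority.RED,
        "ShortName": Priority.GREEN,
    }
    print(summarize_review(review))
```

Separating "must fix" from "fix or justify" mirrors the slide's distinction between red and yellow flags.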
  • 34. Top Metadata Quality Issues • Broken URLs: • Data access URLs that do not conform to NASA requirements (ftp vs https) • No URLs to essential data documentation • No data access URLs provided at all
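Some of the URL issues listed above lend themselves to automated checks. The sketch below covers the three that can be tested offline: a missing data access URL, a data access URL that is not https, and missing documentation URLs. Detecting genuinely broken links would need a network request and is left out. The record layout (`data_access_urls`, `documentation_urls`) and the function name are hypothetical, not CMR's metadata schema.

```python
from urllib.parse import urlparse


def check_urls(record):
    """Return a list of URL issues found in a hypothetical metadata record."""
    issues = []
    access_urls = record.get("data_access_urls", [])
    if not access_urls:
        issues.append("no data access URL provided")
    for url in access_urls:
        scheme = urlparse(url).scheme.lower()
        if scheme != "https":
            issues.append(
                f"data access URL is not https ({scheme or 'no scheme'}): {url}"
            )
    if not record.get("documentation_urls"):
        issues.append("no URL to data documentation")
    return issues


if __name__ == "__main__":
    # Placeholder URLs on a reserved example domain.
    sample = {
        "data_access_urls": ["ftp://example.com/data/"],
        "documentation_urls": [],
    }
    for issue in check_urls(sample):
        print(issue)
```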
  • 35. Top Metadata Issues • DOIs and Collection State are new concepts that were recently added • Slow adoption of new concepts by data centers despite creating DOIs for data
  • 36. Top Metadata Issues • Data format information not widely adopted by data centers • Not viewed as an information priority in the past but important to users
  • 37. Top Metadata Issues • Abstracts are particularly problematic o Too lengthy o Non-existent o Not specific enough to describe data o Too technical for a global user
  • 38. Metadata Improvement To Date Sample of improved metadata quality to date (3 of 12 data centers) All data centers are actively participating in the improvement effort
  • 39. Lessons Learned 1. Leveraging a metadata quality framework operationally requires communication, compromise and reiteration 2. The metadata curation process is not a “Do-it-once-right-and-forget-about-it” activity but should instead be viewed as an iterative process 3. Curating metadata within an aggregated catalog may require an organizational mindset change • Need to consider not just local users’ needs and local metadata needs when curating metadata Icon credit: Learning by Becris from the Noun Project
  • 41. Current EOSDIS Architecture • Open data policies drive system architecture • Discipline specific support and tools (DAAC data) • Optimized for archive, search and distribution • Easily add new data products • Supports millions of users – High ACSI score • Uneven service and performance • Significant interface coordination • Limited on-demand product generation • Fragmentation – duplication of services, software and storage
  • 42. NASA Earth Science Missions: Present through 2023 (02.22.19) • Mission phases: (Pre)Formulation, Implementation, Primary Ops, Extended Ops • ISS Instruments: LIS (2020), SAGE III (2020), TSIS-1 (2018), OCO-3 (2019), ECOSTRESS (2019), GEDI (2020), CLARREO-PF (2020), EMIT (TBD) • InVEST/CubeSats: RAVAN (2016), RainCube (2018), TEMPEST-D (2018), CubeRRT (2018), CSIM (2018), CIRiS (2019), HARP (2019), SNoOPI*, HyTI*, CTIM*, TACOS* (* Launch date TBD) • JPSS-2 Instruments: OMPS-Limb (2019) • Missions: Landsat 9 (2020), PACE (2022), NI-SAR (2021), SWOT (2021), TEMPO (2018), GRACE-FO (2) (2023), ICESat-2 (2021), CYGNSS (8) (2019), NISTAR, EPIC (DSCOVR / NOAA) (2019), Landsat 7 (USGS) (~2022), Terra (>2021), Aqua (>2022), CloudSat (~2018), CALIPSO (>2022), Aura (>2022), SMAP (>2022), Suomi NPP (NOAA) (>2022), Landsat 8 (USGS) (>2022), GPM (>2022), OCO-2 (>2022), Sentinel-6A/B (2020, 2025), MAIA (~2021), GeoCARB (~2021), TROPICS (12) (~2021), SORCE, TCTE (NOAA) (2017), OSTM/Jason-2 (NOAA) (>2022), PREFIRE (2) (TBD)
  • 43. EOSDIS Data System Evolution • Cloud offers benefits like the ability to analyze data at scale, analyze multiple data sets together easily, and avoid lengthy, expensive moves of large data sets, allowing scientists to work on data “in place” • The current architecture will not be cost effective as the annual ingest rate increases from 4 to 50 PB/year • EOSDIS is developing open source cloud native software for reuse across the agency, throughout the government and for any other user. [Chart axis: Petabytes]
  • 44. Conceptual Cloud Architecture - 2021 Open Science = Open Data + Open Source Software + Open Services • Discipline specific support and tools (All data) • Infrastructure specialization by DAACs • Processing next to data for anyone • Optimized for multidisciplinary research • Clear integration path for ACCESS, AIST… • Supports external distribution • Develop coordination and documentation • Cost management • Security, ITAR, SBU, EAR99 • Business processes and skillsets • Vendor lock-in
  • 45. Conceptual ‘data close to compute’ The operational model of consolidating data, allowing users to compute on the data in place with a platform of common tools, is natural to cloud; it is a cost-effective way to leverage cloud and could be applicable to many businesses and missions • Large volume data storage: Centralized mission observation and model datasets stored in auto graduated AWS object storage (Amazon S3, Amazon S3 IA, Amazon Glacier) • Scalable compute: Provision, access, and terminate dynamically based on need. Cost by use • Cloud Native Compute: Cloud vendor service software stacks and microservices easing deployment of user based applications • EOSDIS applications and services: Application and service layer using AWS compute, storage (Amazon S3, Amazon S3 IA, Amazon Glacier), and cloud native technologies • Non-EOSDIS/public applications and services: Science community brings algorithms to the data. Support for NASA and non-NASA • Bring customers to the data
  • 46. So we made this thing.
  • 47. What is Cumulus? Lightweight, cloud-native framework for data ingest, archive, distribution and management • Goals: Provide core DAAC functionality in a configurable manner; enable DAACs to help each other with re-usable, compatible containers (e.g. data retrieval, metadata extraction, metrics delivery); enable DAAC-specific customizations
  • 48. Cumulus Major System Components A lightweight framework consisting of: • Tasks: a discrete action in a workflow, invoked as a Lambda function or EC2 service; a common protocol supports chaining • Orchestration: engine (AWS Step Functions) that controls invocation of tasks in a workflow • Database: stores status, logs, and other system state information • Workflow(s): file(s) that define the ingest, processing, publication, and archive operations (JSON) • Dashboard: create and execute workflows, monitor system
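The component list above can be illustrated with a toy, in-process analogue. A JSON document names the steps, each task is a plain function that takes and returns a message dictionary (standing in for the common protocol that supports chaining), a loop plays the role of the orchestration engine, and a list stands in for the status database. This is not the Cumulus API and not its real workflow schema. Every name here is hypothetical, and the real system runs tasks as Lambda functions or EC2 services under AWS Step Functions.

```python
import json

# Hypothetical workflow definition; the real Cumulus JSON schema differs.
WORKFLOW_JSON = """
{
  "name": "toy_ingest",
  "steps": ["discover", "extract_metadata", "publish"]
}
"""


def discover(message):
    """Task: pretend to find new files for a collection."""
    return {**message, "files": ["granule_001.h5", "granule_002.h5"]}


def extract_metadata(message):
    """Task: derive simple metadata from the discovered files."""
    return {**message, "metadata": {"file_count": len(message["files"])}}


def publish(message):
    """Task: mark the message as published."""
    return {**message, "published": True}


# Registry mapping step names in the workflow file to task functions.
TASKS = {
    "discover": discover,
    "extract_metadata": extract_metadata,
    "publish": publish,
}


def run_workflow(definition, message):
    """Orchestrate tasks in order, chaining each output into the next input."""
    status_log = []  # stands in for the status/log database
    for step in definition["steps"]:
        message = TASKS[step](message)
        status_log.append({"step": step, "status": "completed"})
    return message, status_log


if __name__ == "__main__":
    result, log = run_workflow(json.loads(WORKFLOW_JSON), {"collection": "demo"})
    print(result)
    print(log)
```

Because every task shares the same input and output shape, steps can be reordered or swapped by editing the workflow file alone, which is the property the "common protocol supports chaining" bullet points to.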
  • 49.
  • 50. Workshop- “Enabling Analytics in the Cloud for Earth Science Data” • The NASA Earth Science Data Systems Program sponsored a workshop on 21-23 February 2018 where the participants discussed the convergence of Big Data, Cloud Computing, and Analytics. • These discussions clustered around these themes • Strategic Alignment of Cloud Effort • Reference Architecture for Cloud Analytics • Analytics Optimized Data Stores • Reuse of Cloud Analytics Related Services • Wider Deep Learning Adoption
  • 51. Analytics-Optimized Data • Identified “Analytics Optimized Data Stores” (AODS) as a solution to Big Data Volume and Variety challenges • AODS store data in a form that minimizes the need for data wrangling and enables fast access for a large user community • AODS are optimized, cost-effective storage structures enabling iterative queries relevant to users • Building AODS to enable new analytic tools and services is viewed as a best path forward
  • 52. Conceptual Architecture for Supporting Analytics
  • 53. Multi-Mission Algorithm and Analysis Platform (MAAP) Outcomes • Enable researchers to easily discover, process, visualize, and analyze large volumes of data from both ESA and NASA • Provide tools and the supporting infrastructure needed to enable data comparison, analysis, evaluation, and generation Deliverables • Establish NASA MAAP data infrastructure including the use of CMR, MMT and Cumulus Workflow • Ingest high priority datasets for pilot MAAP and curate metadata to high quality standards Vision: The goal of the MAAP is to establish a collaboration framework between ESA and NASA to share data, science algorithms and compute resources in order to support scientific research conducted by NASA and ESA scientists.
  • 55. Data Expedition: Exploring data for Gap Winds • Tehuantepecer: gap wind triggered by a synoptic scale high pressure system over the Great Plains of North America that pushes air through a narrow Sierra Madre mountain range gap • Strong gap winds produce associated cold water ocean regions by triggering intense vertical mixing of the ocean, a phenomenon referred to as ocean upwelling. Figure: CCMP data coverage of three Central American sites: (A) Tehuantepec, Mexico, (B) Papagayo, Costa Rica and (C) Panama, for March 27, 2008
  • 56. Exploring data for Gap Winds Seasonal distribution of these events
  • 57. Exploring data for Gap Winds Heat map for the month showing Gap Winds Three major events can be seen for the month
  • 58. Exploring data for Gap Winds Verify results by looking at browse images for individual days
  • 59. Thank you! Kevin Murphy ESDS Program PE, NASA HQ Andy Mitchell ESDIS Project Manager, NASA GSFC IMPACT Team, NASA MSFC
  • 60. Data Access Centralized Reusable Capabilities • Earthdata: The EOSDIS website https://earthdata.nasa.gov will increase visibility to the interdisciplinary use of data and demonstrate how data are used. • High Performance Data Search and Discovery • Common Metadata Repository (CMR): Provide sub-second search and discovery services across the Sentinel and other EOSDIS holdings. • Earthdata Search Client: Data search and order tool https://search.earthdata.nasa.gov • Imagery and Data Visualization Tools • Global Imagery Browse Services (GIBS): full resolution imagery in a community standards-based set of imagery services • Worldview: highly responsive interface to explore GIBS imagery and download the underlying data granules https://earthdata.nasa.gov/labs/worldview/ • Giovanni: Quick-start exploratory data visualization and analysis tool • Near Real-Time Capabilities: Provided by “LANCE” (Land Atmosphere Near real-time Capability for EOS), which produces products within 3 hours of observation. Near real-time capabilities are co-located with the standard science production facilities. • EOSDIS Metrics System (EMS): collects and reports on data ingest, archive, and distribution metrics across EOSDIS • Earthdata Infrastructure (EDI DevOps): platform for requirement management, code development, testing and deployment to operations • User Support Tool (UST): user relationship management and issue resolution (Kayako) • Earthdata Log-in (User Registration System): provides a centralized and simplified mechanism for user registration and account management for all EOSDIS system components.
  • 61. Digital Object Identifiers/Citations • Authors using data in publications should cite and reference the datasets properly • DOIs are assigned to EOSDIS datasets by the ESDIS Project • Associated with each DOI is a common landing page that provides details about datasets and shows how to cite them [Bar chart] Status of Digital Object Identifiers created with the ESDIS by Data Providers as of July 31, 2018: Number of DOIs by Data Distributor; Reserved = created by ESDIS but not registered with the EZID; Registered = created by ESDIS and also registered with the EZID; Total DOIs Registered ~10,000