SlideShare a Scribd company logo
1
NASA’s Big Data
Challenges in
Climate Science
Tsengdar Lee, Ph.D.
High-end Computing Program Manager
NASA Headquarters
Presented at IEEE Big Data 2014 Workshop
October 29, 2014
2
3
7-km GEOS-5 Nature Run
Global Tropical Cyclones
The GEOS-5 Nature Run must
produce realistic tropical cyclone
activity to be viable for tropical
observing system simulation
experiments. This includes
realistic frequency and intra-
seasonal variability across the
global basins, as well as
intensities typically observed in
nature. This GEOS-5 Nature
Run successfully reproduces
typical tropical cyclone activity in
all basins including a large
number of weak tropical storms
as well as major hurricanes and
typhoons.
In this short period from September 7-12, 2006 during the GEOS-5 7-km
Nature Run, two hurricanes spin through the east Pacific basin while a
major Atlantic hurricane develops in the Gulf of Mexico making landfall
along the US Gulf coast as a category 3 hurricane with winds (color
shading) in excess of 110 mph.
Tropical Storm
winds 39-73 mph
Hurricane winds
74-111 mph
Major Hurricane
winds 112+ mph
4
Projected Data Holding
!!By 2020 it is estimated that all climate
data holdings, including simulation,
observation, and reanalysis sources, will
grow to hundreds of exabytes in a
worldwide-federated network [CKD
Workshop, 2011 and CCDC Workshop,
2011].
!! CCDC Workshop, International Workshop
on Climate Change Data Challenges, June
2011,
http://www.wikiprogress.org/index.php/
Event:International_Workshop_on_Climat
e_Change_Data_Challenges.
!! CKD Workshop, Climate Knowledge
Discovery Workshop, March 2011, DKRZ,
Hamburg, Germany,
https://redmine.dkrz.de/collaboration/
projects/ckd-workshop/wiki/
CKD_2011_Hamburg.
!! Climate Data Challenges in the 21st Century, Jonathan
T. Overpeck, et al. Science 331, 700 (2011); DOI:
10.1126/science.1197869
4
Credit: LLNL/Dean Williams
5
Turning Observations into Knowledge
and Decision Products
6
6
Spacecraft
Data Acquisition
Ground
Stations
Polar Ground Stations
Tracking & Data
Relay Satellite
(TDRS)
Distribution, Access,
Interoperability & Reuse
Research
Education
Value-Added
Providers
Interagency
Data Centers
International
Partners
Use in Earth &
Space Models
Benchmarking
DSS
TECHNOLOGY
Flight Operations, Data
Capture, Initial
Processing & Backup
Archive
Data
Transport to
DAACs
NASA
Integrated
Services
Network
(NISN)
Mission
Services
Data
Processing &
Mission
Control
WWW
IP
Internet
Science
Teams
MEASURE
AIST, CMAC
ACCESS
Science Data
Processing, Data
Mgmt., Data Archive &
Distribution
Measurement
Teams
Science
Data Systems
(DAACs, NSSDC)
Data Acquisition to Data Access
7
Scientific IT Requirements
!!Scientists and engineers often computing services to
perform data analysis, theory verification, and
predictions
!!Often move large volume of data to and from data
centers and to and from compute centers
!!Often need to communicate, collaborate, and share
data with external (e.g. university) investigators
!!Often require high speed connections and high speed
computing platforms beyond business administration
requirements
!!Often require local disk storage and visualization HW
and SW.
8
Typical Data Analysis and Data
Processing Work Loads
•! A scientist or engineer queries a metadata server for the
data and orders the data from a data center.
•! The data center fulfills the order by preparing
(subsetting, resampling, averaging etc.) the data and
puts the result on a FTP server.
•! After receiving a notification from the data center, the
investigator goes to the FTP server and fetches the data.
•! Data is transmitted to the investigator’s institution and
stored on a local storage.
•! The investigator processes and analyzes the data locally
using local computing resources.
•! Some of the processed data will have to transmitted back
to the data center.
9
Before Big Data…
Analogy and Challenges
!!Challenges:
•! Stewardship
•! Curation
•! Indexing
•! Cataloging
•! Searching
•! Ordering
•! Subsetting
•! Provenance
•! Lineage
•! Data Mining
•! Dissemination
!!Analogy:
10
Gearing up for Big Data Analytics
•! Traditional data center focuses on data archive, access and distribution
o! Scientists typically order and download specific data sets to a local
machine to perform analysis
o! With large amount of observational and modeling data,
downloading to local machine is becoming inefficient
o! Data centers are starting to provide additional services for data
analysis
•! NASA computing and computational science program is building “data
analytics platforms” using “Climate Analytics as a Service” (CAaaS)
such as NASA Earth Exchange (NEX) and Observation for Model
Intercomparison Project (Obs4MIPs) using Earth System Grid
Federation (ESGF)
•! Build on iROD, SciDB, Hadoop file system, Map Reduce,
Apache OODT, Apache Open Climate technologies
•! Enabled by a rule based data management system
•! Current research focuses on how to manage data movement from
the archives to the analytical platforms
11
Functional Architecture
12
Challenges in Big Data Analytics
!!Challenges:
•! Remote and local
data visualization
•! Server side
processing
•! Distributed data
analysis
•! Data on-boarding
•! ETL
•! High speed network
•! Data management
•! Data storage
13
Open Source Strategy
14
14
High Performance Computing, Cloud Computing,
Storage, Networks, and Data Centers
IT Security, Grid Utilities, Cloud Computing OS
Batch Scheduling, Help Desk
Science Hardware Foundation
IT Middleware / Services
Elements of numerical models, data services
(dynamical cores, matrix solvers, subsetting etc.)
Science Elements
Science frameworks / services (ESMF, POOMA,
SWMF, Curator, Workflow), Data Management
Complete models, data analyses,
OSSEs, sensor webs, virtual observatories
Science Applications
Science Architectures
Science Discovery
Observations Modeling
Theory
Science Infosystem
15
Future Directions and Challenges
!! Scale with “Big Data” produced by higher resolution
models, satellites, and instruments
!! Expand server-side functionality
!! Server-side processing through WPS (climate indexes, custom algorithms);
GIS mapping services (for climate change impact studies at regional and
local scale); Facilitate model to observations inter-comparison
!! Expand direct client access capabilities
!! Increased support for OPeNDAP based access; Track provenance of complex
processing workflows for reproducibility and repeatability; Controlled
Vocabularies
!! Package VMs for Cloud deployment
!! Instantiate ESGF nodes on demand for short lifetime projects; Environment
with elastic allocation of back-end storage and computing resources
16
Research Opportunities
NASA Research Opportunities in Space and Earth Sciences (ROSES)
solicitation include:
1.! Data systems, data management, access, and data processing
•! Making Earth System data records for Use in Research
Environments (MEaSUREs)
•! Advancing Collaborative Connections for Earth System Science
(ACCESS)
2.! Advanced Information System Technology
3.! Modeling and Data Assimilation Research:
•! Earth science Modeling and Assimilation Program (MAP)
•! Atmospheric Composition: Modeling and Analysis
•! Heliophysics, Astrophysics Theory Programs
4.! Computational Modeling Algorithms and Cyberinfrastructure (CMAC)
Program
5.! ROSES Solicitation Web site (enter “ROSES” in the keywords
field):
http://nspires.nasaprs.com/external/solicitations/solicitations.do?
method=open&stack=push
17
Thank You!
Tsengdar Lee, Ph.D.
High-end Computing Program Manager
Weather Focus Area Program Scientist
NASA Headquarters
tsengdar.lee@nasa.gov

More Related Content

Similar to IEEE_BigData2014-Lee.pdf

Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
Ian Foster
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
aceas13tern
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
Amazon Web Services
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
Robert Grossman
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
Ian Foster
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
University of Washington
 
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
lleonardSlideShare
 
Ben Evans SPEDDEXES 2014
Ben Evans SPEDDEXES 2014Ben Evans SPEDDEXES 2014
Ben Evans SPEDDEXES 2014
aceas13tern
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
Ilkay Altintas, Ph.D.
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
The Statistical and Applied Mathematical Sciences Institute
 
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
The Statistical and Applied Mathematical Sciences Institute
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
Ravi Madduri
 
Data analytics space
Data analytics  spaceData analytics  space
Data analytics space
M S Prasad
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing dataWorld Agroforestry (ICRAF)
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
Ian Foster
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
KGMGROUP
 

Similar to IEEE_BigData2014-Lee.pdf (20)

Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
 
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
Data‐intensive hydrologic modeling: A Cloud strategy for integrating PIHM, GI...
 
Ben Evans SPEDDEXES 2014
Ben Evans SPEDDEXES 2014Ben Evans SPEDDEXES 2014
Ben Evans SPEDDEXES 2014
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
 
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
CLIM Program: Remote Sensing Workshop, Satellites and Stovepipes - Jay Morris...
 
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
CLIM Program: Remote Sensing Workshop, Distributed Access and Analysis: NASA ...
 
re:Invent 2013-foster-madduri
re:Invent 2013-foster-maddurire:Invent 2013-foster-madduri
re:Invent 2013-foster-madduri
 
Data analytics space
Data analytics  spaceData analytics  space
Data analytics space
 
Research methods group accelarating impact by sharing data
Research methods group  accelarating impact by sharing dataResearch methods group  accelarating impact by sharing data
Research methods group accelarating impact by sharing data
 
The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
 

Recently uploaded

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 

Recently uploaded (20)

Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 

IEEE_BigData2014-Lee.pdf

  • 1. 1 NASA’s Big Data Challenges in Climate Science Tsengdar Lee, Ph.D. High-end Computing Program Manager NASA Headquarters Presented at IEEE Big Data 2014 Workshop October 29, 2014
  • 2. 2
  • 3. 3 7-km GEOS-5 Nature Run Global Tropical Cyclones The GEOS-5 Nature Run must produce realistic tropical cyclone activity to be viable for tropical observing system simulation experiments. This includes realistic frequency and intra- seasonal variability across the global basins, as well as intensities typically observed in nature. This GEOS-5 Nature Run successfully reproduces typical tropical cyclone activity in all basins including a large number of weak tropical storms as well as major hurricanes and typhoons. In this short period from September 7-12, 2006 during the GEOS-5 7-km Nature Run, two hurricanes spin through the east Pacific basin while a major Atlantic hurricane develops in the Gulf of Mexico making landfall along the US Gulf coast as a category 3 hurricane with winds (color shading) in excess of 110 mph. Tropical Storm winds 39-73 mph Hurricane winds 74-111 mph Major Hurricane winds 112+ mph
  • 4. 4 Projected Data Holding !!By 2020 it is estimated that all climate data holdings, including simulation, observation, and reanalysis sources, will grow to hundreds of exabytes in a worldwide-federated network [CKD Workshop, 2011 and CCDC Workshop, 2011]. !! CCDC Workshop, International Workshop on Climate Change Data Challenges, June 2011, http://www.wikiprogress.org/index.php/ Event:International_Workshop_on_Climat e_Change_Data_Challenges. !! CKD Workshop, Climate Knowledge Discovery Workshop, March 2011, DKRZ, Hamburg, Germany, https://redmine.dkrz.de/collaboration/ projects/ckd-workshop/wiki/ CKD_2011_Hamburg. !! Climate Data Challenges in the 21st Century, Jonathan T. Overpeck, et al. Science 331, 700 (2011); DOI: 10.1126/science.1197869 4 Credit: LLNL/Dean Williams
  • 5. 5 Turning Observations into Knowledge and Decision Products
  • 6. 6 6 Spacecraft Data Acquisition Ground Stations Polar Ground Stations Tracking & Data Relay Satellite (TDRS) Distribution, Access, Interoperability & Reuse Research Education Value-Added Providers Interagency Data Centers International Partners Use in Earth & Space Models Benchmarking DSS TECHNOLOGY Flight Operations, Data Capture, Initial Processing & Backup Archive Data Transport to DAACs NASA Integrated Services Network (NISN) Mission Services Data Processing & Mission Control WWW IP Internet Science Teams MEASURE AIST, CMAC ACCESS Science Data Processing, Data Mgmt., Data Archive & Distribution Measurement Teams Science Data Systems (DAACs, NSSDC) Data Acquisition to Data Access
  • 7. 7 Scientific IT Requirements !!Scientists and engineers often computing services to perform data analysis, theory verification, and predictions !!Often move large volume of data to and from data centers and to and from compute centers !!Often need to communicate, collaborate, and share data with external (e.g. university) investigators !!Often require high speed connections and high speed computing platforms beyond business administration requirements !!Often require local disk storage and visualization HW and SW.
  • 8. 8 Typical Data Analysis and Data Processing Work Loads •! A scientist or engineer queries a metadata server for the data and orders the data from a data center. •! The data center fulfills the order by preparing (subsetting, resampling, averaging etc.) the data and puts the result on a FTP server. •! After receiving a notification from the data center, the investigator goes to the FTP server and fetches the data. •! Data is transmitted to the investigator’s institution and stored on a local storage. •! The investigator processes and analyzes the data locally using local computing resources. •! Some of the processed data will have to transmitted back to the data center.
  • 9. 9 Before Big Data… Analogy and Challenges !!Challenges: •! Stewardship •! Curation •! Indexing •! Cataloging •! Searching •! Ordering •! Subsetting •! Provenance •! Lineage •! Data Mining •! Dissemination !!Analogy:
  • 10. 10 Gearing up for Big Data Analytics •! Traditional data center focuses on data archive, access and distribution o! Scientists typically order and download specific data sets to a local machine to perform analysis o! With large amount of observational and modeling data, downloading to local machine is becoming inefficient o! Data centers are starting to provide additional services for data analysis •! NASA computing and computational science program is building “data analytics platforms” using “Climate Analytics as a Service” (CAaaS) such as NASA Earth Exchange (NEX) and Observation for Model Intercomparison Project (Obs4MIPs) using Earth System Grid Federation (ESGF) •! Build on iROD, SciDB, Hadoop file system, Map Reduce, Apache OODT, Apache Open Climate technologies •! Enabled by a rule based data management system •! Current research focuses on how to manage data movement from the archives to the analytical platforms
  • 12. 12 Challenges in Big Data Analytics !!Challenges: •! Remote and local data visualization •! Server side processing •! Distributed data analysis •! Data on-boarding •! ETL •! High speed network •! Data management •! Data storage
  • 14. 14 14 High Performance Computing, Cloud Computing, Storage, Networks, and Data Centers IT Security, Grid Utilities, Cloud Computing OS Batch Scheduling, Help Desk Science Hardware Foundation IT Middleware / Services Elements of numerical models, data services (dynamical cores, matrix solvers, subsetting etc.) Science Elements Science frameworks / services (ESMF, POOMA, SWMF, Curator, Workflow), Data Management Complete models, data analyses, OSSEs, sensor webs, virtual observatories Science Applications Science Architectures Science Discovery Observations Modeling Theory Science Infosystem
  • 15. 15 Future Directions and Challenges !! Scale with “Big Data” produced by higher resolution models, satellites, and instruments !! Expand server-side functionality !! Server-side processing through WPS (climate indexes, custom algorithms); GIS mapping services (for climate change impact studies at regional and local scale); Facilitate model to observations inter-comparison !! Expand direct client access capabilities !! Increased support for OPeNDAP based access; Track provenance of complex processing workflows for reproducibility and repeatability; Controlled Vocabularies !! Package VMs for Cloud deployment !! Instantiate ESGF nodes on demand for short lifetime projects; Environment with elastic allocation of back-end storage and computing resources
  • 16. 16 Research Opportunities NASA Research Opportunities in Space and Earth Sciences (ROSES) solicitation include: 1.! Data systems, data management, access, and data processing •! Making Earth System data records for Use in Research Environments (MEaSUREs) •! Advancing Collaborative Connections for Earth System Science (ACCESS) 2.! Advanced Information System Technology 3.! Modeling and Data Assimilation Research: •! Earth science Modeling and Assimilation Program (MAP) •! Atmospheric Composition: Modeling and Analysis •! Heliophysics, Astrophysics Theory Programs 4.! Computational Modeling Algorithms and Cyberinfrastructure (CMAC) Program 5.! ROSES Solicitation Web site (enter “ROSES” in the keywords field): http://nspires.nasaprs.com/external/solicitations/solicitations.do? method=open&stack=push
  • 17. 17 Thank You! Tsengdar Lee, Ph.D. High-end Computing Program Manager Weather Focus Area Program Scientist NASA Headquarters tsengdar.lee@nasa.gov