SlideShare a Scribd company logo
1 of 24
Download to read offline
THE EARTH SYSTEM GRID FEDERATION
AS A TESTBED FOR
GLOBAL DISTRIBUTED DATA ANALYTICS
WORKSHOP ON
REMOTE SENSING, UNCERTAINTY QUANTIFICATION, AND THEORY OF DATA SYSTEMS
CALTECH, PASADENA (CA)
FEBRUARY 2018
LUCA CINQUINI
JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY
© 2018. ALL RIGHTS RESERVED.
JPL UNLIMITED RELEASE CLEARANCE NUMBER: #18-0460
Abstract
“Talk about the Earth System Grid Federation (perhaps
the world's most successful example of a federated,
distributed data system)”

“Comment on what analytical capabilities are currently
available through the ESGF”

Amy Braverman
PART 1
INTRODUCTION TO THE
EARTH SYSTEM GRID FEDERATION
ESGF Overview
ESGF is an international collaboration of climate centers working together to manage
and provide access to climate data - models and observations

Started more than a decade ago, now the world premier technology infrastructure in
support of climate science

Spanning several tens of institutions in Europe, North America, Australia and Asia

Funding from DOE, NASA (U.S.), Copernicus (EU), NCI (Australia), CRIM (Canada)

Winner of the 2017 “R&D 100 Award” - prestigious conference that every year
recognizes the top 100 most innovate products in software, science and technology
System Architecture
ESGF is a system of distributed and federated Nodes that host data and services

Distributed: data and metadata are published, stored and served from multiple
Nodes across the world

Federated: Nodes interoperate because of the adoption of common services,
protocols and APIs, and the establishment of mutual trust relationships

A client (browser or program) can start from any Node in the federation and
discover, download and analyze data from multiple locations as if they were
stored in a single central archive
Current Data Holdings
ESGF hosts some of the most prominent data collections for
climate change research:

CMIP3, CMIP5 (“Coupled Model Inter-Comparison Project“):
output of global climate models used for periodic IPCC
assessment reports on climate change

CORDEX (“COrdinated Regional climate Downscaling
EXperiment"): output of regional climate models, grouped by
domain (N. America, Europe, Antarctica, etc.)

Obs4MIPs (“Observations for Model Inter-Comparison”):
observational data from NASA, ESA, etc. formatted to look like
climate model output

Ana4MIPS (“ReAnalysis for Model Inter-Comparison”): re-
analysis data formatted like model output

Many other MIPs: TAMIP, GeoMIP, DCMIP, …

The World Climate Research Program (WCRP) has recommended
that ESGF infrastructure be supported operationally, and that all
future MIPs follow the CMIP5 process and standards (October ’12)
HadCM3
CORDEX
Obs4MIPs
Future Data Holdings
ESGF is preparing for a massive increase in its data holdings (10x in the
next 3 years):

CMIP6 - starting in early 2018 and terminating in mid 2019

Models runs by ~30 modeling centers around the world

Approximately 25-40 PB of uncompressed primary output,
replicated at 4 sites

Supporting CMIP6 will require enhanced scalability and new
services for metadata, errata, persistent identifiers

ESGF is holding a series of “Data and Services” challenges
leading to a formal release of CMIP6 data on June 1st, 2018

Obs4MIPs - expected ~200 more data collections from NASA, NOAA
and European agencies

Dozens of additional MIPs expected to leverage ESGF infrastructure
PART 2
ESGF DATA ACCESS
AND ANALYTICAL CAPABILITIES
Federated Search
ESGF features a state-of-the-art federated search based on Apache Solr,
including advanced features as distributed searches and replication

Metadata are stored on separate catalogs at multiple sites, yet a search
initiated at any site is able to find results throughout the federation

Each site runs at least 2 Solr instances: one master Solr to publish
metadata (“write”) and one slave Solr to search (“read”)

Optionally, a site can choose to replicate some or all of the remote
catalogs to improve search performance
INDEX NODE
Solr
Master
(Write)
Solr
Slave
(Read)
INDEX NODE
Solr
Master
(Write)
Solr
Slave
(Read)
aggregations
files
datasets
INDEX NODE
Solr
Master
(Write)
Solr
Slave
(Read)
Solr
Replica
(Read)
Solr
Replica
(Read)
Search API
ESGF exposes a RESTful API to query its distributed
metadata catalogs:

Available to any HTTP client, including the front-
end web portals

The HTTP client can start the query from any
index node, results will span the whole federation

Query syntax supports both free text and
climate-specific keywords (aka facets)

Results returned as XML or JSON, possibly
paginated
Query for daily humidity data from a specific CMIP5 model:

https://esgf-node.jpl.nasa.gov/esg-search/search/?
project=CMIP5&variable=hus&model=CCSM4&time_frequency=day

See: https://www.earthsystemcog.org/projects/cog/esgf_search_restful_api
Authentication & Authorization
ESGF features a federated authentication and
authorization model:

Authentication: users can register at any Node,
they are assigned an OpenID which they can
use to authenticate anywhere in the federation

Authorization: each Node has complete control
over local resources by establishing policies that
match a set of resources to the required group
membership for specific operations - policy
tuple: (resource, group, operation)

Despite our best efforts, security remains an
obstacle when users want to access data

Upcoming security improvements:

OpenID 2.0 —> OAuth2.0 and OpenID Connect

Group membership —> free data download for
major data collections

MyProxy —> SLCS server to request an X.509
certificate via HTTP for some operations (data
publishing and data processing)
ESGF Node A
registration >
authentication >
ESGF Node B
data access >
user identity >
Data
data access policy
ESGF Node C
user attributes >
Group Membership
Database
Data Download
ESGF supports several methods to download data from the system:

Using a browser to initiate a single file download via HTTP

Using a browser to generate a wget script to download a large number of files via HTTP

Using a browser to initiate a Globus download through GridFTP

Using OpenDAP API to select specific variables, sub-set by space and time

Using esgfpy client to search and download full files

Currently all methods require authentication via OpenID or X.509 cert + group membership
Search Results Data Cart Globus Download
Remote Computing
ESGF is working at enabling server-side
computing i.e. moving the computation to the
data

Motivated by ever larger size of data archives -
impossible to download to a central location

Problem is made even more complex by the
distributed nature of the archives

Compute architecture under development is
based on the client-server paradigm:

Each ESGF Node deploys a Compute
processing server

A client (program or web portal) makes HTTP
request to one server which optionally
retrieves data from other servers through
OpenDAP

Currently operations can be performed at one
Node only
ESGF Node
Compute Engine
WPS Request
GET or POST >
UV-CDAT
implementation
Compute Client
Compute Server Side API
Data
EDAS
implementation
Ophidia
implementation
< WPS Response
ESGF Node
Data
Compute Engine
Compute Client Side API
Remote Computing: Server Side
ESGF defined a server-side computing API that conforms to the OGC/WPS standard:

GET: http://hostname/wps?service=wps&request=getcapabilities

GET: http://hostname/wps?service=wps&request=describeprocess&identifier=grid

GET: http://hostname/wps?
service=wps&request=execute&identifier=grid&datainputs=dataset=OCO2;algorithm=simple_averaging

POST: supports more complex requests with XML payload

3 different back-end implementations of server-side API:

UV-CDAT (LLNL): subset, aggregate, regrid, min, max, supports curvilinear grids

EDAS (NCCS/GSFC): parallelized (Spark) subset, aggregate, regrid, ensemble (mul, diff, min, max, ave, sum),
and reduction (mul, diff, min, max, ave, sum, rms), supports composition of canonical operations into workflows

Ophidia (CMCC): subset and reduction (max,min)

Django application packaged as Docker container brokers the request to available back-end implementations

Status: test servers deployed at LLNL, NCCS, currently being integrated with ESGF software stack

Currently no provision for running custom analytics

GitHub repository: https://github.com/ESGF/esgf-compute-wps 

API spec: https://docs.google.com/document/d/1GSLwSJPUCfs-ZrYCG1n7BcWyldvb4pT9nYD3zkPCwao
Example WPS Requests
HTTP GET:

https://aims2.llnl.gov/wps?service=WPS&request=Execute&datainputs=“[variable=[{"id":"cct|3eb2568e64913-0","uri":"http://
aims3.llnl.gov/thredds/dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/
cct_Amon_CMCC-CM_decadal2005_r1i2p1_200511-201512.nc"},{"id":"cct|3eb2568e64913-1","uri":"http://aims3.llnl.gov/thredds/
dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC-
CM_decadal2005_r1i2p1_201601-202512.nc"},{"id":"cct|3eb2568e64913-2","uri":"http://aims3.llnl.gov/thredds/dodsC/
cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC-
CM_decadal2005_r1i2p1_202601-203512.nc"}];domain=[{"id":"f27f4caba6fb1","lat":
{"start":"-89.427084176","end":"89.427084176","step":1,"crs":"values"},"lon":{"start":"0.0","end":"359.25","step":
1,"crs":"values"},"time":{"start":"15.0","end":11002.5,"step":
1,"crs":"values"}}];operation=[{"name":"CDAT.max","result":"0a08c78e900a3","domain":"f27f4caba6fb1","input":
["3eb2568e64913-0","3eb2568e64913-1","3eb2568e64913-2"],"axes":"time”}];]”

HTTP POST:

<wps:Execute xmlns:wps="http://www.opengis.net/wps/1.0.0" service="WPS" version="1.0.0" xmlns:xsi="http://www.w3.org/2001/
XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0/
wpsExecute_request.xsd"><wps:Identifier>CDAT.max</wps:Identifier><wps:DataInputs><wps:Input><ows:Identifier
xmlns:ows="http://www.opengis.net/ows/1.1">operation</
ows:Identifier><wps:Data><wps:ComplexData>[{"name":"CDAT.max","result":"0a08c78e900a3","domain":"f27f4caba6fb1","input":
["3eb2568e64913-0","3eb2568e64913-1","3eb2568e64913-2"],"axes":"time"}]</wps:ComplexData></wps:Data></
wps:Input><wps:Input><ows:Identifier xmlns:ows="http://www.opengis.net/ows/1.1">domain</
ows:Identifier><wps:Data><wps:ComplexData>[{"id":"f27f4caba6fb1","lat":{"start":"-89.427084176","end":"89.427084176","step":
1,"crs":"values"},"lon":{"start":"0.0","end":"359.25","step":1,"crs":"values"},"time":{"start":"15.0","end":11002.5,"step":
1,"crs":"values"}}]</wps:ComplexData></wps:Data></wps:Input><wps:Input><ows:Identifier xmlns:ows="http://www.opengis.net/
ows/1.1">variable</ows:Identifier><wps:Data><wps:ComplexData>[{"id":"cct|3eb2568e64913-0","uri":"http://aims3.llnl.gov/thredds/
dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC-
CM_decadal2005_r1i2p1_200511-201512.nc"},{"id":"cct|3eb2568e64913-1","uri":"http://aims3.llnl.gov/thredds/dodsC/
cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC-
CM_decadal2005_r1i2p1_201601-202512.nc"},{"id":"cct|3eb2568e64913-2","uri":"http://aims3.llnl.gov/thredds/dodsC/
cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC-
CM_decadal2005_r1i2p1_202601-203512.nc"}]</wps:ComplexData></wps:Data></wps:Input></wps:DataInputs></wps:Execute>
Remote Computing: Client Side
ESGF developed a client-side
Python library to facilitate the
process of creating, sending and
monitoring compute requests

Several examples available as
Jupyter notebooks

Requires authentication:

User first logs onto web portal
with OpenID, password to
obtain an authorization token

Authorization token in used as
part of client toolkit

GitHub repository: https://
github.com/ESGF/esgf-compute-
api
Remote Computing: Web Client
Also available: web user interface
to formulate and send requests
to the remote Compute engine

Integrated with standard ESGF UI
PART 3
USING ESGF AS A TESTBED
FOR ANALYTICAL DATA PROCESSING
Client-Driven Architecture
A local client access all data via ESGF APIs (search, download, subset)

All processing executed locally (laptop, in-premise, cloud)

Still takes advantage of a distributed federated archive, standard data/metadata
formats, programmatic access APIs

Possibly no access control for read-only operations

Can be executed right now
ClientAnalytics
ESGF Node
DataESGF APIs
ESGF Node
DataESGF APIs
Server-Side Computation
In the near future (~12 months), ESGF will support server-side computation

Analytics can be executed close to the data, but must conform to the WPS API

Must use access control to authorize execution on remote servers

Open questions:

How to deploy custom algorithms at each Node ?

Maybe use a limited number of “friendly” Nodes

How to execute workflows that span multiple Nodes ?
Client
ESGF Node
Data
ESGF
Compute
Server-Side
API
Analytics
ESGF Compute
Client-Side
API ESGF Node
Data
ESGF
Compute
Server-Side
API
Analytics
Containerization
ESGF is working on a new system architecture based on Docker containers

A “container” is a lightweight package that includes a program executable + all its
dependencies + just enough OS to run it

In the future, all ESGF data services will be deployed and operated as Docker
containers
ESGF Database Node
ESGF Front-End Node
ESGF User Interface Node ESGF Index Node ESGF Identity Provider
Node ESGF Data NodeESGF Data NodeESGF Data Node
POSTGRES DATA
Apache httpd
CoG
(mod_wsgi)
esgf-search webapp
(Tomcat)
Dashboard UI
(Tomcat)
Postgres
SOLR CATALOGS
Stores local metadata
catalogs and replicas,
enables searching
web accessible UI for
human users
Front-End
network proxy
User authentication,
issues temporary
credentials
Data storage and access
Dashboard Information
Provider
Dashboard stats API
(Tomcat)
THREDDS CATALOGS
Solr
(Jetty)Solr
(Jetty)Solr
(Jetty)Solr
(Jetty)
esgf-idp
(Tomcat)
SLCS + OAuth servers
Threads Data Server
(Tomcat)
esgf access control filters
Openid Relying Party
(Tomcat)
ESGF Publisher Client
ESGF OAuth Client
Containerization
ESGF/Docker software stack 1.4 as deployed on an AWS cluster of 5 EC2 instances
Containerization
In the next few years, we can expect most ESGF sites to use Docker to operate at least some of
their services - possibly the full ESGF stack 

Docker engine at each site can be used to execute custom analytics as self-contained Docker
containers

No complex software installation at each site

Direct read-only access to local data - no data level access control

Open questions:

How do you orchestrate containers running at distributed sites ?

How do you authorize containers to run at each site ?
Client
ESGF Node
Data
Docker Container
Analytics
Docker Engine
ESGF Node
Data
Docker Container
Analytics
Docker Engine
QUESTIONS ?

More Related Content

What's hot

OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3Robert Grossman
 
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Laurent Lefort
 
OpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography Facility
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pRobert Grossman
 
2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATTRudolf Husar
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefRobert Grossman
 
2005-01-08 MANE-VU Status Report on CATT and FASTNET
2005-01-08 MANE-VU Status Report on CATT and FASTNET2005-01-08 MANE-VU Status Report on CATT and FASTNET
2005-01-08 MANE-VU Status Report on CATT and FASTNETRudolf Husar
 
FMI Open Data Interface and Usage
FMI Open Data Interface and UsageFMI Open Data Interface and Usage
FMI Open Data Interface and UsageRoope Tervo
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 
Open Weather Data as Part of Big Data
Open Weather Data as Part of Big DataOpen Weather Data as Part of Big Data
Open Weather Data as Part of Big DataRoope Tervo
 
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...Globus
 
FMI Open Data Interface and Data Models
FMI Open Data Interface and Data ModelsFMI Open Data Interface and Data Models
FMI Open Data Interface and Data ModelsRoope Tervo
 
Aus plots escience-brasil
Aus plots escience-brasilAus plots escience-brasil
Aus plots escience-brasilbensparrowau
 

What's hot (20)

OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
Rangeland hydrology and erosion model
Rangeland hydrology and erosion modelRangeland hydrology and erosion model
Rangeland hydrology and erosion model
 
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
Semantically-Enabling the Web of Things: The W3C Semantic Sensor Network Onto...
 
OpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences Data
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT2003-11-02 Combined Aerosol Trajectory Tool, CATT
2003-11-02 Combined Aerosol Trajectory Tool, CATT
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
Fast Cat Mv3
Fast Cat Mv3Fast Cat Mv3
Fast Cat Mv3
 
2005-01-08 MANE-VU Status Report on CATT and FASTNET
2005-01-08 MANE-VU Status Report on CATT and FASTNET2005-01-08 MANE-VU Status Report on CATT and FASTNET
2005-01-08 MANE-VU Status Report on CATT and FASTNET
 
FMI Open Data Interface and Usage
FMI Open Data Interface and UsageFMI Open Data Interface and Usage
FMI Open Data Interface and Usage
 
Pilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOTPilot Project for HDF5 Metadata Structures for SWOT
Pilot Project for HDF5 Metadata Structures for SWOT
 
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case StudiesWorking with HDF and netCDF Data in ArcGIS: Tools and Case Studies
Working with HDF and netCDF Data in ArcGIS: Tools and Case Studies
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
ICESat-2 Metadata and Status
ICESat-2 Metadata and StatusICESat-2 Metadata and Status
ICESat-2 Metadata and Status
 
Open Weather Data as Part of Big Data
Open Weather Data as Part of Big DataOpen Weather Data as Part of Big Data
Open Weather Data as Part of Big Data
 
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
Real-Time Analysis of Streaming Synchotron Data: SCinet SC19 Technology Chall...
 
FMI Open Data Interface and Data Models
FMI Open Data Interface and Data ModelsFMI Open Data Interface and Data Models
FMI Open Data Interface and Data Models
 
Aus plots escience-brasil
Aus plots escience-brasilAus plots escience-brasil
Aus plots escience-brasil
 
SPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth ObservationSPD and KEA: HDF5 based file formats for Earth Observation
SPD and KEA: HDF5 based file formats for Earth Observation
 

Similar to ESGF: Earth System Grid Federation

The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionIan Foster
 
First online hangout SC5 - Big Data Europe first pilot-presentation-hangout
First online hangout SC5 - Big Data Europe  first pilot-presentation-hangoutFirst online hangout SC5 - Big Data Europe  first pilot-presentation-hangout
First online hangout SC5 - Big Data Europe first pilot-presentation-hangoutBigData_Europe
 
Building the Next Generation Earth System Grid Federation (ESGF2)
Building the Next Generation Earth System Grid Federation (ESGF2)Building the Next Generation Earth System Grid Federation (ESGF2)
Building the Next Generation Earth System Grid Federation (ESGF2)Globus
 
AGU_Iguassu_Brazil_AUG
AGU_Iguassu_Brazil_AUGAGU_Iguassu_Brazil_AUG
AGU_Iguassu_Brazil_AUGJordan Alpert
 
Scientific
Scientific Scientific
Scientific marpierc
 
Weather and Climate Visualization software
Weather and Climate Visualization softwareWeather and Climate Visualization software
Weather and Climate Visualization softwareRahul Gupta
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...DataWorks Summit/Hadoop Summit
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryIan Foster
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Globus
 
2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdfLevLafayette1
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...Amazon Web Services
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008Ian Foster
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsARDC
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project OverviewFloris Sluiter
 
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshop
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshopDay 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshop
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshopICIMOD
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEuropeBigData_Europe
 

Similar to ESGF: Earth System Grid Federation (20)

The Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, EvolutionThe Earth System Grid Federation: Origins, Current State, Evolution
The Earth System Grid Federation: Origins, Current State, Evolution
 
First online hangout SC5 - Big Data Europe first pilot-presentation-hangout
First online hangout SC5 - Big Data Europe  first pilot-presentation-hangoutFirst online hangout SC5 - Big Data Europe  first pilot-presentation-hangout
First online hangout SC5 - Big Data Europe first pilot-presentation-hangout
 
Building the Next Generation Earth System Grid Federation (ESGF2)
Building the Next Generation Earth System Grid Federation (ESGF2)Building the Next Generation Earth System Grid Federation (ESGF2)
Building the Next Generation Earth System Grid Federation (ESGF2)
 
AGU_Iguassu_Brazil_AUG
AGU_Iguassu_Brazil_AUGAGU_Iguassu_Brazil_AUG
AGU_Iguassu_Brazil_AUG
 
Scientific
Scientific Scientific
Scientific
 
Weather and Climate Visualization software
Weather and Climate Visualization softwareWeather and Climate Visualization software
Weather and Climate Visualization software
 
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
A Data Lake and a Data Lab to Optimize Operations and Safety within a nuclear...
 
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific DiscoveryA Global Research Data Platform: How Globus Services Enable Scientific Discovery
A Global Research Data Platform: How Globus Services Enable Scientific Discovery
 
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
Gladier: The Globus Architecture for Data Intensive Experimental Research (AP...
 
2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
 
Grid Projects In The US July 2008
Grid Projects In The US July 2008Grid Projects In The US July 2008
Grid Projects In The US July 2008
 
Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)Overview of the Data Processing Error Analysis System (DPEAS)
Overview of the Data Processing Error Analysis System (DPEAS)
 
TransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR UpdateTransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR Update
 
How to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collectionsHow to use NCI's national repository of big spatial data collections
How to use NCI's national repository of big spatial data collections
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project Overview
 
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshop
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshopDay 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshop
Day 1 sanjay jayanarayanan, iitm, india, arrcc-carissa workshop
 
ICWE2017 BigDataEurope
ICWE2017 BigDataEuropeICWE2017 BigDataEurope
ICWE2017 BigDataEurope
 
Telegraph Cq English
Telegraph Cq EnglishTelegraph Cq English
Telegraph Cq English
 

More from The Statistical and Applied Mathematical Sciences Institute

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 

Recently uploaded

Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 

Recently uploaded (20)

Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 

ESGF: Earth System Grid Federation

  • 1. THE EARTH SYSTEM GRID FEDERATION AS A TESTBED FOR GLOBAL DISTRIBUTED DATA ANALYTICS WORKSHOP ON REMOTE SENSING, UNCERTAINTY QUANTIFICATION, AND THEORY OF DATA SYSTEMS CALTECH, PASADENA (CA) FEBRUARY 2018 LUCA CINQUINI JET PROPULSION LABORATORY, CALIFORNIA INSTITUTE OF TECHNOLOGY © 2018. ALL RIGHTS RESERVED. JPL UNLIMITED RELEASE CLEARANCE NUMBER: #18-0460
  • 2. Abstract “Talk about the Earth System Grid Federation (perhaps the world's most successful example of a federated, distributed data system)” “Comment on what analytical capabilities are currently available through the ESGF” Amy Braverman
  • 3. PART 1 INTRODUCTION TO THE EARTH SYSTEM GRID FEDERATION
  • 4. ESGF Overview ESGF is an international collaboration of climate centers working together to manage and provide access to climate data - models and observations Started more than a decade ago, now the world premier technology infrastructure in support of climate science Spanning several tens of institutions in Europe, North America, Australia and Asia Funding from DOE, NASA (U.S.), Copernicus (EU), NCI (Australia), CRIM (Canada) Winner of the 2017 “R&D 100 Award” - prestigious conference that every year recognizes the top 100 most innovate products in software, science and technology
  • 5. System Architecture ESGF is a system of distributed and federated Nodes that host data and services Distributed: data and metadata are published, stored and served from multiple Nodes across the world Federated: Nodes interoperate because of the adoption of common services, protocols and APIs, and the establishment of mutual trust relationships A client (browser or program) can start from any Node in the federation and discover, download and analyze data from multiple locations as if they were stored in a single central archive
  • 6. Current Data Holdings ESGF hosts some of the most prominent data collections for climate change research: CMIP3, CMIP5 (“Coupled Model Inter-Comparison Project“): output of global climate models used for periodic IPCC assessment reports on climate change CORDEX (“COrdinated Regional climate Downscaling EXperiment"): output of regional climate models, grouped by domain (N. America, Europe, Antarctica, etc.) Obs4MIPs (“Observations for Model Inter-Comparison”): observational data from NASA, ESA, etc. formatted to look like climate model output Ana4MIPS (“ReAnalysis for Model Inter-Comparison”): re- analysis data formatted like model output Many other MIPs: TAMIP, GeoMIP, DCMIP, … The World Climate Research Program (WCRP) has recommended that ESGF infrastructure be supported operationally, and that all future MIPs follow the CMIP5 process and standards (October ’12) HadCM3 CORDEX Obs4MIPs
  • 7. Future Data Holdings ESGF is preparing for a massive increase in its data holdings (10x in the next 3 years): CMIP6 - starting in early 2018 and terminating in mid 2019 Models runs by ~30 modeling centers around the world Approximately 25-40 PB of uncompressed primary output, replicated at 4 sites Supporting CMIP6 will require enhanced scalability and new services for metadata, errata, persistent identifiers ESGF is holding a series of “Data and Services” challenges leading to a formal release of CMIP6 data on June 1st, 2018 Obs4MIPs - expected ~200 more data collections from NASA, NOAA and European agencies Dozens of additional MIPs expected to leverage ESGF infrastructure
  • 8. PART 2 ESGF DATA ACCESS AND ANALYTICAL CAPABILITIES
  • 9. Federated Search ESGF features a state-of-the-art federated search based on Apache Solr, including advanced features as distributed searches and replication Metadata are stored on separate catalogs at multiple sites, yet a search initiated at any site is able to find results throughout the federation Each site runs at least 2 Solr instances: one master Solr to publish metadata (“write”) and one slave Solr to search (“read”) Optionally, a site can choose to replicate some or all of the remote catalogs to improve search performance INDEX NODE Solr Master (Write) Solr Slave (Read) INDEX NODE Solr Master (Write) Solr Slave (Read) aggregations files datasets INDEX NODE Solr Master (Write) Solr Slave (Read) Solr Replica (Read) Solr Replica (Read)
  • 10. Search API ESGF exposes a RESTful API to query its distributed metadata catalogs: Available to any HTTP client, including the front- end web portals The HTTP client can start the query from any index node, results will span the whole federation Query syntax supports both free text and climate-specific keywords (aka facets) Results returned as XML or JSON, possibly paginated Query for daily humidity data from a specific CMIP5 model: https://esgf-node.jpl.nasa.gov/esg-search/search/? project=CMIP5&variable=hus&model=CCSM4&time_frequency=day See: https://www.earthsystemcog.org/projects/cog/esgf_search_restful_api
  • 11. Authentication & Authorization ESGF features a federated authentication and authorization model: Authentication: users can register at any Node, they are assigned an OpenID which they can use to authenticate anywhere in the federation Authorization: each Node has complete control over local resources by establishing policies that match a set of resources to the required group membership for specific operations - policy tuple: (resource, group, operation) Despite our best efforts, security remains an obstacle when users want to access data Upcoming security improvements: OpenID 2.0 —> OAuth2.0 and OpenID Connect Group membership —> free data download for major data collections MyProxy —> SLCS server to request an X.509 certificate via HTTP for some operations (data publishing and data processing) ESGF Node A registration > authentication > ESGF Node B data access > user identity > Data data access policy ESGF Node C user attributes > Group Membership Database
  • 12. Data Download ESGF supports several methods to download data from the system: Using a browser to initiate a single file download via HTTP Using a browser to generate a wget script to download a large number of files via HTTP Using a browser to initiate a Globus download through GridFTP Using OpenDAP API to select specific variables, sub-set by space and time Using esgfpy client to search and download full files Currently all methods require authentication via OpenID or X.509 cert + group membership Search Results Data Cart Globus Download
  • 13. Remote Computing ESGF is working at enabling server-side computing i.e. moving the computation to the data Motivated by ever larger size of data archives - impossible to download to a central location Problem is made even more complex by the distributed nature of the archives Compute architecture under development is based on the client-server paradigm: Each ESGF Node deploys a Compute processing server A client (program or web portal) makes HTTP request to one server which optionally retrieves data from other servers through OpenDAP Currently operations can be performed at one Node only ESGF Node Compute Engine WPS Request GET or POST > UV-CDAT implementation Compute Client Compute Server Side API Data EDAS implementation Ophidia implementation < WPS Response ESGF Node Data Compute Engine Compute Client Side API
  • 14. Remote Computing: Server Side ESGF defined a server-side computing API that conforms to the OGC/WPS standard: GET: http://hostname/wps?service=wps&request=getcapabilities GET: http://hostname/wps?service=wps&request=describeprocess&identifier=grid GET: http://hostname/wps? service=wps&request=execute&identifier=grid&datainputs=dataset=OCO2;algorithm=simple_averaging POST: supports more complex requests with XML payload 3 different back-end implementations of server-side API: UV-CDAT (LLNL): subset, aggregate, regrid, min, max, supports curvilinear grids EDAS (NCCS/GSFC): parallelized (Spark) subset, aggregate, regrid, ensemble (mul, diff, min, max, ave, sum), and reduction (mul, diff, min, max, ave, sum, rms), supports composition of canonical operations into workflows Ophidia (CMCC): subset and reduction (max,min) Django application packaged as Docker container brokers the request to available back-end implementations Status: test servers deployed at LLNL, NCCS, currently being integrated with ESGF software stack Currently no provision for running custom analytics GitHub repository: https://github.com/ESGF/esgf-compute-wps API spec: https://docs.google.com/document/d/1GSLwSJPUCfs-ZrYCG1n7BcWyldvb4pT9nYD3zkPCwao
  • 15. Example WPS Requests HTTP GET: https://aims2.llnl.gov/wps?service=WPS&request=Execute&datainputs=“[variable=[{"id":"cct|3eb2568e64913-0","uri":"http:// aims3.llnl.gov/thredds/dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/ cct_Amon_CMCC-CM_decadal2005_r1i2p1_200511-201512.nc"},{"id":"cct|3eb2568e64913-1","uri":"http://aims3.llnl.gov/thredds/ dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC- CM_decadal2005_r1i2p1_201601-202512.nc"},{"id":"cct|3eb2568e64913-2","uri":"http://aims3.llnl.gov/thredds/dodsC/ cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC- CM_decadal2005_r1i2p1_202601-203512.nc"}];domain=[{"id":"f27f4caba6fb1","lat": {"start":"-89.427084176","end":"89.427084176","step":1,"crs":"values"},"lon":{"start":"0.0","end":"359.25","step": 1,"crs":"values"},"time":{"start":"15.0","end":11002.5,"step": 1,"crs":"values"}}];operation=[{"name":"CDAT.max","result":"0a08c78e900a3","domain":"f27f4caba6fb1","input": ["3eb2568e64913-0","3eb2568e64913-1","3eb2568e64913-2"],"axes":"time”}];]” HTTP POST: <wps:Execute xmlns:wps="http://www.opengis.net/wps/1.0.0" service="WPS" version="1.0.0" xmlns:xsi="http://www.w3.org/2001/ XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0/ wpsExecute_request.xsd"><wps:Identifier>CDAT.max</wps:Identifier><wps:DataInputs><wps:Input><ows:Identifier xmlns:ows="http://www.opengis.net/ows/1.1">operation</ ows:Identifier><wps:Data><wps:ComplexData>[{"name":"CDAT.max","result":"0a08c78e900a3","domain":"f27f4caba6fb1","input": ["3eb2568e64913-0","3eb2568e64913-1","3eb2568e64913-2"],"axes":"time"}]</wps:ComplexData></wps:Data></ wps:Input><wps:Input><ows:Identifier xmlns:ows="http://www.opengis.net/ows/1.1">domain</ ows:Identifier><wps:Data><wps:ComplexData>[{"id":"f27f4caba6fb1","lat":{"start":"-89.427084176","end":"89.427084176","step": 1,"crs":"values"},"lon":{"start":"0.0","end":"359.25","step":1,"crs":"values"},"time":{"start":"15.0","end":11002.5,"step": 1,"crs":"values"}}]</wps:ComplexData></wps:Data></wps:Input><wps:Input><ows:Identifier xmlns:ows="http://www.opengis.net/ ows/1.1">variable</ows:Identifier><wps:Data><wps:ComplexData>[{"id":"cct|3eb2568e64913-0","uri":"http://aims3.llnl.gov/thredds/ dodsC/cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC- CM_decadal2005_r1i2p1_200511-201512.nc"},{"id":"cct|3eb2568e64913-1","uri":"http://aims3.llnl.gov/thredds/dodsC/ cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC- CM_decadal2005_r1i2p1_201601-202512.nc"},{"id":"cct|3eb2568e64913-2","uri":"http://aims3.llnl.gov/thredds/dodsC/ cmip5_css02_data/cmip5/output1/CMCC/CMCC-CM/decadal2005/mon/atmos/Amon/r1i2p1/cct/1/cct_Amon_CMCC- CM_decadal2005_r1i2p1_202601-203512.nc"}]</wps:ComplexData></wps:Data></wps:Input></wps:DataInputs></wps:Execute>
  • 16. Remote Computing: Client Side ESGF developed a client-side Python library to facilitate the process of creating, sending and monitoring compute requests Several examples available as Jupyter notebooks Requires authentication: User first logs onto web portal with OpenID, password to obtain an authorization token Authorization token in used as part of client toolkit GitHub repository: https:// github.com/ESGF/esgf-compute- api
  • 17. Remote Computing: Web Client Also available: web user interface to formulate and send requests to the remote Compute engine Integrated with standard ESGF UI
  • 18. PART 3 USING ESGF AS A TESTBED FOR ANALYTICAL DATA PROCESSING
  • 19. Client-Driven Architecture A local client access all data via ESGF APIs (search, download, subset) All processing executed locally (laptop, in-premise, cloud) Still takes advantage of a distributed federated archive, standard data/metadata formats, programmatic access APIs Possibly no access control for read-only operations Can be executed right now ClientAnalytics ESGF Node DataESGF APIs ESGF Node DataESGF APIs
  • 20. Server-Side Computation In the near future (~12 months), ESGF will support server-side computation Analytics can be executed close to the data, but must conform to the WPS API Must use access control to authorize execution on remote servers Open questions: How to deploy custom algorithms at each Node ? Maybe use a limited number of “friendly” Nodes How to execute workflows that span multiple Nodes ? Client ESGF Node Data ESGF Compute Server-Side API Analytics ESGF Compute Client-Side API ESGF Node Data ESGF Compute Server-Side API Analytics
  • 21. Containerization ESGF is working on a new system architecture based on Docker containers A “container” is a lightweight package that includes a program executable + all its dependencies + just enough OS to run it In the future, all ESGF data services will be deployed and operated as Docker containers ESGF Database Node ESGF Front-End Node ESGF User Interface Node ESGF Index Node ESGF Identity Provider Node ESGF Data NodeESGF Data NodeESGF Data Node POSTGRES DATA Apache httpd CoG (mod_wsgi) esgf-search webapp (Tomcat) Dashboard UI (Tomcat) Postgres SOLR CATALOGS Stores local metadata catalogs and replicas, enables searching web accessible UI for human users Front-End network proxy User authentication, issues temporary credentials Data storage and access Dashboard Information Provider Dashboard stats API (Tomcat) THREDDS CATALOGS Solr (Jetty)Solr (Jetty)Solr (Jetty)Solr (Jetty) esgf-idp (Tomcat) SLCS + OAuth servers Threads Data Server (Tomcat) esgf access control filters Openid Relying Party (Tomcat) ESGF Publisher Client ESGF OAuth Client
  • 22. Containerization ESGF/Docker software stack 1.4 as deployed on an AWS cluster of 5 EC2 instances
  • 23. Containerization In the next few years, we can expect most ESGF sites to use Docker to operate at least some of their services - possibly the full ESGF stack Docker engine at each site can be used to execute custom analytics as self-contained Docker containers No complex software installation at each site Direct read-only access to local data - no data level access control Open questions: How do you orchestrate containers running at distributed sites ? How do you authorize containers to run at each site ? Client ESGF Node Data Docker Container Analytics Docker Engine ESGF Node Data Docker Container Analytics Docker Engine