PUBLIC SECTOR SUMMIT
Washington, DC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Community Tools for Analysis of Earth Science Data in the Cloud
Kevin Jorissen
Session 301582
Rich Signell
Research Oceanographer
U.S. Geological Survey
Pangeo: A community platform for Big Data geoscience
Rich Signell
Research Oceanographer
U.S. Geological Survey
Ryan Abernathey (Columbia)
Joe Hamman (NCAR)
Matthew Rocklin (Anaconda->NVIDIA)
Jacob Tomlinson (UK Met Office)
Scott Henderson (UW)
and the rest of the Pangeo Community!
U.S. Geological Survey Sediment Transport Modeling
~200 TB of coastal ocean model output data in 4D (T, Z, Y, X) NetCDF files
Data is stored in CF-Compliant NetCDF Files
Big Data in the Geosciences
[Chart: growth in geoscience data volumes from 1994 to 2018; labeled values: 1 GB, 500 GB, 40 TB, 2 PB, 150 PB]
Traditional Model Data Analysis
Model Data Analysis of the Future
(available now!)
Pangeo is a Community
Pangeo is a core software stack
Pangeo HPC Architecture
Pangeo Cloud Architecture
Matthew Rocklin’s blog post on HDF
The Zarr Format
• Developed by the genomics community to address problems with NetCDF/HDF on cloud storage
• Simple format, clear specification
• Each chunk is stored as a separate binary object
• Lightweight global and variable metadata stored as JSON
• Groups, filters, compression using Blosc
• Free, open-source software
• Read/write in Python using Xarray
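The bullets above describe Zarr's layout: small JSON metadata documents plus one binary object per chunk. A minimal stdlib-only sketch of that idea follows, assuming a plain Python dict stands in for an object store such as S3 and using zlib instead of Blosc so it runs without extra packages. The helpers `write_chunked` and `read_chunk` are hypothetical, and the metadata keys are only loosely modeled on Zarr v2's `.zarray`; this is not the zarr library's API.

```python
import json
import struct
import zlib
from itertools import product


def write_chunked(store, name, data, shape, chunks):
    """Hypothetical helper: store a 2-D float64 array Zarr-style.

    `data` is a flat, row-major list; `store` is a dict standing in for an
    object store (each key would be one S3 object). zlib replaces Blosc here.
    """
    ny, nx = shape
    cy, cx = chunks
    # Lightweight variable metadata as JSON (loosely modeled on Zarr v2 .zarray).
    meta = {
        "zarr_format": 2,
        "shape": [ny, nx],
        "chunks": [cy, cx],
        "dtype": "<f8",
        "compressor": {"id": "zlib", "level": 1},
        "fill_value": 0.0,
        "order": "C",
        "filters": None,
    }
    store[f"{name}/.zarray"] = json.dumps(meta).encode()
    n_cy = -(-ny // cy)  # number of chunks along y (ceiling division)
    n_cx = -(-nx // cx)  # number of chunks along x
    for iy, ix in product(range(n_cy), range(n_cx)):
        block = []
        for y in range(iy * cy, (iy + 1) * cy):
            for x in range(ix * cx, (ix + 1) * cx):
                # Edge chunks are padded with the fill value to full chunk size.
                inside = y < ny and x < nx
                block.append(data[y * nx + x] if inside else meta["fill_value"])
        raw = struct.pack(f"<{len(block)}d", *block)
        # One compressed binary object per chunk, keyed by its chunk index.
        store[f"{name}/{iy}.{ix}"] = zlib.compress(raw, 1)


def read_chunk(store, name, iy, ix):
    """Fetch and decode a single chunk using only the JSON metadata."""
    meta = json.loads(store[f"{name}/.zarray"])
    cy, cx = meta["chunks"]
    values = struct.unpack(f"<{cy * cx}d", zlib.decompress(store[f"{name}/{iy}.{ix}"]))
    return [list(values[r * cx:(r + 1) * cx]) for r in range(cy)]


store = {}
write_chunked(store, "temp", [float(i) for i in range(20)], shape=(5, 4), chunks=(2, 2))
print(sorted(store))
print(read_chunk(store, "temp", 0, 1))  # [[2.0, 3.0], [6.0, 7.0]]
```

Because a reader needs only the small JSON document to locate any chunk, it can request just the objects a computation touches, and it can request them in parallel.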
Zarr is community-driven
NOAA’s Big Data Project
One month of National Water Model (NWM) forcing and output is 15 TB.
The NWM is part of the Big Data Project, with data being pushed to the cloud:
• Forecast data: s3://noaa-nwm-pds
• 25-year reanalysis: s3://nwm-archive
$25K in research credits from Amazon to explore using Pangeo with National Water Model data
FUSE-mounted NetCDF/HDF is slow
Cloud-friendly Zarr is fast
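One commonly cited reason for the speed difference on these two slides: with Zarr, a client can work out from the JSON metadata alone which chunk objects a selection needs and fetch them concurrently, whereas reading HDF through a FUSE mount tends to require many small reads to walk the file's internal metadata first. The sketch below shows only the indexing step, assuming a regular chunk grid and Zarr v2-style dot-separated chunk keys; `chunks_for_selection` and the chunk sizes in the example are hypothetical, not taken from the USGS or NWM datasets.

```python
from itertools import product


def chunks_for_selection(selection, chunks):
    """Hypothetical helper: list the chunk keys a rectangular selection touches.

    selection: one half-open (start, stop) index range per dimension.
    chunks:    chunk length along each dimension (regular grid assumed).
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunks):
        if stop <= start:
            return []  # an empty selection needs no chunks at all
        # First and last chunk index overlapped along this dimension.
        per_dim.append(range(start // size, (stop - 1) // size + 1))
    # Cartesian product of chunk indices -> keys like "3.0.1.2".
    return [".".join(map(str, idx)) for idx in product(*per_dim)]


# 4-D (T, Z, Y, X) variable with hypothetical chunks of 24 time steps,
# one vertical level, and 256 x 256 horizontal points.
chunk_shape = (24, 1, 256, 256)

# A 720-step time series at a single grid point touches 30 chunk objects...
series = chunks_for_selection([(0, 720), (0, 1), (100, 101), (200, 201)], chunk_shape)
print(len(series), series[0], series[-1])  # 30 0.0.0.0 29.0.0.0

# ...while one 1000 x 1000 surface map at a single time touches 16.
surface = chunks_for_selection([(0, 1), (0, 1), (0, 1000), (0, 1000)], chunk_shape)
print(len(surface))  # 16
```

The example also exposes a trade-off: the chunk shape chosen when the data is written determines how many objects each access pattern must fetch, so time-series and map-style analyses may favor different chunking.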
Pangeo is not just for model data...
Pangeo is not just for geoscience data
Pangeo is not just for big data
Pangeo on AWS
• Kubernetes cluster deployed with Amazon Elastic Container Service for Kubernetes (Amazon EKS)
• Three classes of k8s node pools:
  • Core pool: JupyterHub, web proxy (small)
  • Jupyter pool: autoscaling pool for single-user sessions
  • Dask pool: autoscaling pool for Dask workers on preemptible (e.g., spot) instances
• Pangeo installed with a Helm chart
• Custom environments built with repo2docker at https://github.com/pangeo-data/pangeo-stacks
• Full deployment instructions at pangeo.io
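Dask workers in the autoscaling pool typically evaluate a computation as many independent per-chunk tasks whose small results are then combined. The stdlib sketch below illustrates that map-reduce pattern for a global mean, with a `ThreadPoolExecutor` standing in for the Dask pool; it is not Dask's API, and `chunk_partial` and `chunked_mean` are hypothetical names. Because each map task depends only on its own chunk, a task lost with a preempted spot worker can simply be rerun, which is what makes preemptible instances workable for this pool (Dask's distributed scheduler recomputes lost results for you).

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_partial(chunk):
    """Map step: reduce one chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def chunked_mean(chunks, max_workers=4):
    """Reduce step: combine per-chunk partials into a global mean.

    The thread pool stands in for a pool of Dask workers; every map task is
    independent, so any one of them could be rerun if its worker disappeared.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_partial, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no data in any chunk")
    return total / count


print(chunked_mean([[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]))  # 3.5
```

Carrying (sum, count) pairs rather than per-chunk means keeps the reduction exact when chunks have different sizes, as edge chunks often do.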
Overcoming Barriers to Adoption
• Concerns about cost: changing institutional computing models, research credits, waiving egress charges for research
• New skills required: AWS workshops, hackathons, institutional road shows
• Data formats and data standardization: benchmarking, blogging
Pangeo is a Movement!
• Visit us: pangeo.io
• Follow us: medium.com/pangeo
• Chat with us: gitter.im/pangeo-data
• Join and help us: github.com/pangeo-data
Thank you!
Rich Signell
rsignell@usgs.gov