© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Chris Stoner
Alaska Satellite Facility
Anthony Arendt
University of Washington
Session: 194329
Transitioning Geoscience Research
to the Cloud
Jed Sundwall
Amazon
Why does AWS care about open data?
Many of our public sector
customers are required to make
their data available to the public.
Sharing data on AWS makes it accessible to a large and growing community of
researchers, entrepreneurs, and enterprises who use the AWS Cloud.
Many of our commercial sector
customers rely on access to open
data to develop their products.
The AWS Open Data program
makes more data more
available to more people.
https://opendata.aws
Earth Observation, Life Sciences & Genomics, Machine Learning
https://registry.opendata.aws
Chris Stoner
Alaska Satellite Facility
Session: 194329
AWS and the Alaska Satellite Facility
Opportunities and Challenges
Alaska Satellite Facility
NASA Distributed Active Archive Center (DAAC)
• Ingest, archive, and distribute Synthetic Aperture Radar (SAR) data
• On-prem footprint ~6 PB
• Spinning disk, available for immediate download
• No cost to the user
NASA ESDIS Distributed Data System
Challenge: We have a big mission on the horizon...
• NASA-ISRO SAR (NISAR) Mission
• 20+ GB per file
• 150 PB archive
• 50 Gbps incoming rate
• On-prem architecture won’t scale
• Cost and time to scale up
• Build and maintain enough storage
• Data movement of 100s of petabytes
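To put the NISAR figures above in perspective, the following is a back-of-envelope sketch only. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a link that sustains the full 50 Gbps, neither of which is stated on the slide.

```python
# Back-of-envelope transfer times for the NISAR figures quoted on the slide.
# Assumptions (not from the slides): decimal units and a fully sustained link.

LINK_GBPS = 50            # incoming rate from the slide, in gigabits per second
FILE_BYTES = 20e9         # "20+ GB per file", taken here as exactly 20 GB
ARCHIVE_BYTES = 150e15    # "150 PB archive"


def transfer_seconds(n_bytes: float, link_gbps: float = LINK_GBPS) -> float:
    """Seconds needed to move n_bytes over a link of link_gbps gigabits per second."""
    bytes_per_second = link_gbps * 1e9 / 8
    return n_bytes / bytes_per_second


file_seconds = transfer_seconds(FILE_BYTES)
archive_days = transfer_seconds(ARCHIVE_BYTES) / 86_400

print(f"One 20 GB file: {file_seconds:.1f} s")
print(f"Full 150 PB archive: {archive_days:.0f} days")
```

Under these assumptions a single file moves in about 3.2 seconds, while relocating the whole archive would occupy the link for roughly 278 days, which is one way to read the "data movement of 100s of petabytes" concern.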
JPL/NISAR Homepage
Why AWS?
• Pay only for what you use
• Quick iterations
• Large sets of compute available cheaply
• 100s or 1000s of nodes
• Scaling in and out
• Ruled out on-prem
• Maintain extra capacity
Opportunity: Going to the cloud is a journey…
AWS Storage versus Web Object Store
• In the beginning, storage was the driver
• Shared storage for fewer data moves
• Less duplication of data
AWS Storage *and* Web Object Store
• Best of both worlds
• Boto3 and Web Object Store (WOS)
• Buckets and Objects
• Code reuse
• Leverage Both
• Transparent to user
• Flexibility
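One way the "buckets and objects" code reuse above might look is sketched here. This is not ASF's implementation: `ObjectStore`, `S3Store`, `InMemoryStore`, and `copy_object` are hypothetical names, the in-memory class merely stands in for a second backend (the WOS API is not shown on the slide and is not reproduced here), and the bucket and key names are made up. Only the Boto3 calls `put_object` and `get_object` are real library calls.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """Minimal bucket/object interface shared by every backend."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...

    def get(self, bucket: str, key: str) -> bytes: ...


class S3Store:
    """Amazon S3 backend built on Boto3 (imported lazily; needs AWS credentials)."""

    def __init__(self) -> None:
        import boto3  # third-party; only required when this backend is used

        self._client = boto3.client("s3")

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=bucket, Key=key, Body=data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._client.get_object(Bucket=bucket, Key=key)["Body"].read()


class InMemoryStore:
    """Hypothetical stand-in for a second backend such as an on-prem object store."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def copy_object(src: ObjectStore, dst: ObjectStore, bucket: str, key: str) -> int:
    """Copy one object between backends and return the number of bytes copied."""
    data = src.get(bucket, key)
    dst.put(bucket, key, data)
    return len(data)


on_prem = InMemoryStore()
cloud = InMemoryStore()  # swap in S3Store() when AWS credentials are available
on_prem.put("sar-granules", "scene-001.zip", b"example bytes")
copied = copy_object(on_prem, cloud, "sar-granules", "scene-001.zip")
print(copied)
```

Because calling code depends only on the two-method interface, the choice of backend stays transparent to the user, which is the flexibility the slide points to.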
Learn by Doing
• “Lift and Shift” project
• Cloud from scratch project
• Hybrid architecture
Mindset Shift
• Traditional DAAC
• Ownership of large on-prem footprint
• Data Stewards
• Where do I offer value now?
• Cloud system design and operation
• Still data stewards
• Culture Changes
• Changing core competencies
• DevOps
Sean Gallup/Getty Images
Policy Changes
• Early architecture
• Highly controlled environment
• No access to AWS dashboard or
services directly
• What do we really need
• Cost visibility across all DAACs
• Organizations
• Access to FedRAMP/OCIO approved services
• Blacklist/whitelist policies
• Antideficiency Act
• Egress mitigation system
Photo Credit: Ryan Clements
Multi-temperature, Hybrid Storage
• Amazon Glacier is cheap, cold storage
• Distribution is slower
• Can be costly to serve
• Amazon S3 Standard-Infrequent Access (S3 Standard-IA)
• Best of both worlds
• Not suitable for Hot data
• Amazon S3 or on-prem Edge
• Aggressive roll-off
• Hot data
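The multi-temperature idea above can be sketched as a simple roll-off rule. The thresholds (30 and 365 days) and the function name `storage_tier` are hypothetical; the slide gives no numbers, so treat this as an illustration of the tiering logic rather than ASF's actual policy.

```python
from datetime import date

# Hypothetical roll-off thresholds, in days since the last download.
HOT_DAYS = 30
WARM_DAYS = 365


def storage_tier(last_download: date, today: date) -> str:
    """Return 'hot', 'warm', or 'cold' for a product based on download recency."""
    idle_days = (today - last_download).days
    if idle_days <= HOT_DAYS:
        return "hot"    # Amazon S3 or on-prem edge
    if idle_days <= WARM_DAYS:
        return "warm"   # Amazon S3 infrequent access
    return "cold"       # Amazon Glacier


today = date(2018, 12, 1)
last_downloads = [date(2018, 11, 20), date(2018, 3, 1), date(2016, 1, 1)]
tiers = [storage_tier(d, today) for d in last_downloads]
print(tiers)
```

A shorter `HOT_DAYS` value corresponds to the "aggressive roll-off" mentioned for hot data.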
Predictive Analytics
• Determine during ingest
• Ever distributed?
• Where to store it?
• Amazon Machine Learning
• Model on active missions
• Previous user behavior
• > 90% accuracy for some products
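The ingest-time decision above might be sketched as follows. This deliberately swaps Amazon Machine Learning for a plain smoothed frequency estimate so the example stays self-contained; the product-type labels, the 0.5 threshold, and all function names are hypothetical, and the history is made-up illustrative data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fit_distribution_counts(history: Iterable[Tuple[str, bool]]) -> Dict[str, List[int]]:
    """Count [distributed, total] per product type from past user behavior."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for product_type, was_distributed in history:
        counts[product_type][0] += int(was_distributed)
        counts[product_type][1] += 1
    return dict(counts)


def probability_distributed(counts: Dict[str, List[int]], product_type: str) -> float:
    """Laplace-smoothed estimate that a new product of this type is ever downloaded."""
    distributed, total = counts.get(product_type, (0, 0))
    return (distributed + 1) / (total + 2)


def ingest_destination(counts: Dict[str, List[int]], product_type: str,
                       threshold: float = 0.5) -> str:
    """Pick a storage target at ingest: 's3' if likely to be served, else 'glacier'."""
    likely = probability_distributed(counts, product_type) >= threshold
    return "s3" if likely else "glacier"


# Illustrative history only: (product type, was it ever distributed?)
history = [("RTC", True)] * 9 + [("RTC", False)] + [("RAW", False)] * 8
counts = fit_distribution_counts(history)

for product_type in ("RTC", "RAW", "UNSEEN"):
    print(product_type, ingest_destination(counts, product_type))
```

With this toy history, the frequently downloaded type goes to S3 and the never-downloaded type to Glacier. A type with no history gets a probability of 0.5 and therefore defaults to S3, a design choice that favors availability over storage cost.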
End User Analytics
• Provide cloud tools
• Learn just enough
cloud technology
• Own AWS Account
• Process any NASA
data in cloud storage
• Access to entire archive
• Download time before processing
• Cloud opportunity
• Barrier to entry
NASA/ESDIS
Example: Sentinel-1 RTC
• Radiometric Terrain Correction (RTC)
• Often base product for SAR research
• Download the file (~5 GB)
• Process locally 1 at a time
• Takes ~5 hours per product on a local machine
• CloudFormation Template
• Fetch in-region
• Process in parallel
• Takes ~20 minutes per product on a decent EC2 machine
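The per-product timings above can be turned into a rough wall-clock comparison. This sketch assumes one product per node at a time and ignores download, queueing, and instance start-up time; the batch size of 1,000 products and the 100-node fleet are hypothetical.

```python
import math

LOCAL_HOURS_PER_PRODUCT = 5      # "~5 hours per product on a local machine"
CLOUD_MINUTES_PER_PRODUCT = 20   # "~20 minutes per product" on an EC2 machine


def local_hours(n_products: int) -> int:
    """Wall-clock hours when products are processed one at a time locally."""
    return n_products * LOCAL_HOURS_PER_PRODUCT


def cloud_minutes(n_products: int, n_nodes: int) -> int:
    """Wall-clock minutes when n_nodes each process one product at a time."""
    batches = math.ceil(n_products / n_nodes)
    return batches * CLOUD_MINUTES_PER_PRODUCT


print(local_hours(1000), "hours locally")
print(cloud_minutes(1000, 100), "minutes on 100 nodes")
```

Under these assumptions, 1,000 products take 5,000 hours serially on a local machine versus 200 minutes across 100 nodes.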
Cloud Tools
• AWS CloudFormation Templates
• Contains instructions for AWS to
create a processing pipeline
– Sentinel-1 RTC
Includes:
• Amazon Machine Image
• IAM roles/policies
• Input/Output buckets
• CloudWatch alarms
• Pipeline Master
• SNS topics
• SQS queue
• AutoScaling group
– Spot Market
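For readers who have not seen a CloudFormation template, the skeleton below shows the general shape of one, built as a Python dictionary and serialized to JSON. It is not the Sentinel-1 RTC template: the logical names are hypothetical, and the machine image, IAM roles, CloudWatch alarms, and Auto Scaling group listed above are omitted to keep the sketch short.

```python
import json

# Hypothetical skeleton only; a real pipeline template would also declare the
# AMI-backed launch configuration, IAM roles, alarms, and Auto Scaling group.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch of a queue-driven processing pipeline (hypothetical names)",
    "Resources": {
        "InputBucket": {"Type": "AWS::S3::Bucket"},
        "OutputBucket": {"Type": "AWS::S3::Bucket"},
        "JobQueue": {"Type": "AWS::SQS::Queue"},
        "NotifyTopic": {"Type": "AWS::SNS::Topic"},
    },
    "Outputs": {
        # Ref on an SQS queue resolves to the queue URL.
        "JobQueueUrl": {"Value": {"Ref": "JobQueue"}},
    },
}

template_body = json.dumps(template, indent=2)
print(template_body)
```

The point of shipping such a template is that an end user can create the same set of resources in their own AWS account without assembling them by hand.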
Benefit to End User
• Time
• Don’t download!
• Process next to storage
• Scale
• 100s/1000s nodes
• Cheap using Spot
• Iteration
• What compute node
works best
• What processing flow
gives me what I want
Thank you!
Chris Stoner, ASF
cstoner5@alaska.edu
Anthony Arendt
eScience Institute / Applied Physics Laboratory
University of Washington
Session: 194329
Transitioning Geoscience Research
to the Cloud
Complex alpine hydrology, meteorology and
ecosystems
NASA’s High Mountain Asia Team
• 3 year award from NASA Earth Sciences division
• 14 teams funded, 90 individual researchers, students,
technicians
• goal: to advance understanding of processes driving
changes in climate and the cryosphere in the High Mountain
Asia region
[Figure panels: Glacier velocities; Regional water balance; Downstream impacts]
Data Sharing Challenges
High volume (hundreds of terabytes)
Multidimensional (lat, long, elevation, time, variables)
Multiple versions (different parameter combinations)
Different formats / lack of data standards
Different pre / post publication usage constraints
Data Sharing using NASA Supercomputers
Advantages
• accessible to NASA federal employees
• conforms with NASA security standards
Disadvantages
• long approval process for non-NASA scientists
• limited customizability
• learning curve for users not familiar with command line
AWS Tools to Enhance Scientific
Collaboration
Data Storage and Access
Data Storage and Access: Architecture
Direct Connection to Data in a GIS
Data Access from a Python Script
Multiple Services Linked in a Single Application
Data Storage and Access on AWS: Advantages
Centralized location for all datasets
On-the-fly reprojection
Data available in multiple formats
Spatial and temporal subsetting
Adherence to community data / metadata standards
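As an illustration of what "spatial and temporal subsetting" means in practice, the sketch below filters point observations by a bounding box and a date range. It is a local, in-memory stand-in: the slides do not show how the hosted service implements subsetting, and the `Observation` record, the `subset` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    lat: float
    lon: float
    day: date
    value: float


def subset(observations: Sequence[Observation],
           bbox: Tuple[float, float, float, float],
           start: date, end: date) -> List[Observation]:
    """Keep observations inside bbox (min_lon, min_lat, max_lon, max_lat)
    and within [start, end]; all bounds are inclusive."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        o for o in observations
        if min_lat <= o.lat <= max_lat
        and min_lon <= o.lon <= max_lon
        and start <= o.day <= end
    ]


# Made-up sample points for illustration.
points = [
    Observation(28.0, 86.9, date(2017, 6, 1), 1.2),
    Observation(35.9, 76.5, date(2017, 6, 1), 0.8),
    Observation(28.1, 86.8, date(2015, 1, 1), 2.4),
]
selected = subset(points, (86.0, 27.0, 88.0, 29.0), date(2017, 1, 1), date(2017, 12, 31))
print(selected)
```

Performing this selection on the server side means a collaborator downloads only the region and period of interest instead of the full dataset.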
Data Pre-Processing and Analysis
Pangeo Data (https://pangeo-data.github.io)
The Pangeo Platform; source: Abernathey et al. (2017), “Pangeo: An Open Source Big Data Climate Science Platform,” NSF award 1740648.
Migrating Pangeo Architecture to AWS
JupyterHub: A multi-user server that manages multiple instances of single-user Jupyter notebooks
Kubernetes: Automated deployment, scaling and management of containerized applications
Amazon Elastic Container Service for Kubernetes (Amazon EKS)
Pangeo Deployed on AWS
Experimental Deployment (http://pangeo.pydata.org)
Comparison of Computing Architectures
[Figure panels: Current model; Future model]
Shifting Scientific Culture Towards
Cloud Adoption
Building Trust and Communication Across Teams
Adaptive leadership: Individuals and small groups are empowered to make decisions.
In-person meetings: Professional facilitators guide scientists through structures that maximize engagement across all levels.
Team web resources: Using WordPress on AWS, we post datasets, team documents and tutorials.
Hackweeks at the eScience Institute
geohackweek.github.io
oceanhackweek.github.io
• interactive tutorials
• project “hacks”
• code sharing on GitHub
• reproducibility
JupyterHub on AWS for Tutorials and Projects
Collecting Metrics on Hackweek Success
Anthony Arendt
Research Scientist
University of Washington
arendta@uw.edu
@aaarendt
aaarendt
Chris Stoner
Science Specialist
Alaska Satellite Facility
cstoner5@alaska.edu
Thank you!

Transitioning Geoscience Research to the Cloud: Opportunities and Challenges

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Chris Stoner Alaska Satellite Facility Anthony Arendt University of Washington Session: 194329 Transitioning Geoscience Research to the Cloud Jed Sundwall Amazon
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Why does AWS care about open data? Many of our public sector customers are required to make their data available to the public. Sharing data on AWS makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises who use the AWS Cloud. Many of our commercial sector customers rely on access to open data to develop their products.
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. The AWS Open Data program makes more data more available to more people. https://opendata.aws
  • 4. Earth Observation Life Sciences & Genomics Machine Learning https://registry.opendata.aws
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Chris Stoner Alaska Satellite Facility Session: 194329 AWS and the Alaska Satellite Facility Opportunities and Challenges
  • 6. NASA Distributed Active Archive Center (DAAC) • Ingest, archive, and distribute Synthetic Aperture Radar (SAR) data • On-prem footprint ~6 PB • Spinning disk, available for immediate download • No cost to the user Alaska Satellite Facility © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 7. NASA ESDIS Distributed Data System © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. Challenge We have a big mission on the horizon... © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 9. • NASA-ISRO SAR (NISAR) Mission • 20+ GB per file • 150 PB archive • 50 Gbps incoming rate • On-prem architecture won’t scale • Cost and time to scale up • Build and maintain enough storage • Data movement of 100s of petabytes © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. JPL/NISAR Homepage
  • 10. Why AWS? • Pay only for what you use • Quick iterations • Large sets of compute available cheaply • 100s or 1000s of nodes • Scaling in and out • Ruled out on-prem • Maintain extra capacity © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 11. Opportunity Going to the cloud is a journey… © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 12. AWS Storage versus Web Object Store • In the beginning, storage was the driver • Shared storage for fewer data moves • Less duplication of data © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 13. AWS Storage *and* Web Object Store • Best of both worlds • Boto3 and Web Object Store (WOS) • Buckets and Objects • Code reuse • Leverage Both • Transparent to user • Flexibility © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 14. Learn by Doing • “Lift and Shift” project • Cloud from scratch project • Hybrid architecture © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 15. Mindset Shift • Traditional DAAC • Ownership of large on-prem footprint • Data Stewards • Where do I offer value now? • Cloud system design and operation • Still data stewards • Culture Changes • Changing core competencies • DevOps Sean Gallup/Getty Images © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 16. Policy Changes • Early architecture • Highly controlled environment • No access to AWS dashboard or services directly • What do we really need • Cost visibility across all DAACs • Organizations • Access to FedRamp/OCIO approved services • Blacklist/WhiteList Policies • Anti-deficiency Act • Egress mitigation system © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Photo Credit: Ryan Clements
  • 17. Multi-temperature, Hybrid Storage • Amazon Glacier is cheap, cold storage • Distribution is slower • Can be costly to serve • Amazon S3 infrequent access • Best of both worlds • Not suitable for Hot data • Amazon S3 or on-prem Edge • Aggressive roll-off • Hot data © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 18. Predictive Analytics • Determine during ingest • Ever distributed? • Where to store it? • Amazon Machine Learning • Model on active missions • Previous user behavior • > 90% accuracy for some Products © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 19. End User Analytics • Provide cloud tools • Learn just enough cloud technology • Own AWS Account • Process any NASA data in cloud storage • Access to entire archive • Download time before processing • Cloud opportunity • Barrier to entry © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. NASA/ESDIS
  • 20. • Radiometric Terrain Correction (RTC) • Often base product for SAR Research • Download the file (~5 GB) • Process locally 1 at a time • Takes ~5 hours per product on a local machine Example: Sentinel-1 RTC • CloudFormation Template • Fetch in-region • Process in parallel • Takes ~20 minutes per product on a decent EC2 machine © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 21. Cloud Tools • AWS CloudFormation templates • Contain instructions for AWS to create a processing pipeline • Sentinel-1 RTC includes: Amazon Machine Image; IAM roles/policies; input/output buckets; CloudWatch alarms; pipeline master; SNS topics; SQS queue; Auto Scaling group (Spot Market)
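To make the idea of a template concrete, the following sketch builds a small CloudFormation-style document covering a subset of the listed components (buckets, queue, topic). It is not the ASF Sentinel-1 RTC template: the function name, the logical resource names, and the visibility timeout are hypothetical, and the AMI, IAM roles, CloudWatch alarms, and Auto Scaling group are omitted for brevity.

```python
import json


def build_pipeline_template(name="rtc"):
    """Return a minimal CloudFormation-style template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Sketch of a {name} processing pipeline (illustrative only)",
        "Resources": {
            # Buckets for products to process and for finished results.
            "InputBucket": {"Type": "AWS::S3::Bucket"},
            "OutputBucket": {"Type": "AWS::S3::Bucket"},
            # Work queue that processing nodes would poll for jobs.
            "JobQueue": {
                "Type": "AWS::SQS::Queue",
                "Properties": {"VisibilityTimeout": 3600},
            },
            # Topic for status notifications.
            "StatusTopic": {"Type": "AWS::SNS::Topic"},
        },
        "Outputs": {
            "JobQueueUrl": {"Value": {"Ref": "JobQueue"}},
        },
    }


template_json = json.dumps(build_pipeline_template(), indent=2)
```

Keeping the pipeline in one template is what lets an end user create the whole stack in their own account in a single step.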
  • 22. Benefit to End User • Time: don’t download! Process next to storage • Scale: 100s/1000s of nodes; cheap using Spot • Iteration: which compute node works best; which processing flow gives me what I want
  • 23. Thank you! Chris Stoner, ASF (cstoner5@alaska.edu)
  • 24. Anthony Arendt, eScience Institute / Applied Physics Laboratory, University of Washington • Session: 194329 • Transitioning Geoscience Research to the Cloud
  • 25. Complex alpine hydrology, meteorology and ecosystems
  • 26. NASA’s High Mountain Asia Team • 3-year award from NASA Earth Sciences division • 14 teams funded; 90 individual researchers, students, technicians • Goal: to advance understanding of processes driving changes in climate and the cryosphere in the High Mountain Asia region
  • 27. Glacier velocities • Regional water balance • Downstream impacts
  • 28. Data Sharing Challenges • High volume (hundreds of terabytes) • Multidimensional (lat, long, elevation, time, variables) • Multiple versions (different parameter combinations) • Different formats / lack of data standards • Different pre / post publication usage constraints
  • 29. Data Sharing using NASA Supercomputers • Advantages: accessible to NASA federal employees; conforms with NASA security standards • Disadvantages: long approval process for non-NASA scientists; limited customizability; learning curve for users not familiar with the command line
  • 30. AWS Tools to Enhance Scientific Collaboration
  • 31. Data Storage and Access
  • 32. Data Storage and Access: Architecture
  • 33. Direct Connection to Data in a GIS
  • 34. Data Access from a Python Script
  • 35. Multiple Services Linked in a Single Application
  • 36. Data Storage and Access on AWS: Advantages • Centralized location for all datasets • On-the-fly reprojection • Data available in multiple formats • Spatial and temporal subsetting • Adherence to community data / metadata standards
  • 37. Data Pre-Processing and Analysis
  • 38. Pangeo Data (https://pangeo-data.github.io) • The Pangeo Platform; source: Abernathey et al. (2017), “Pangeo: An Open Source Big Data Climate Science Platform”, NSF award 1740648.
  • 39. Migrating Pangeo Architecture to AWS • JupyterHub: a multi-user server that manages multiple instances of single-user Jupyter notebooks • Automated deployment, scaling and management of containerized applications • Amazon Elastic Container Service for Kubernetes
  • 40. Pangeo Deployed on AWS
  • 41. Experimental Deployment (http://pangeo.pydata.org)
  • 42. Comparison of Computing Architectures • Future model • Current model
  • 43. Shifting Scientific Culture Towards Cloud Adoption
  • 44. Building Trust and Communication Across Teams • Adaptive leadership: individuals and small groups are empowered to make decisions. • In-person meetings: professional facilitators guide scientists through structures that maximize engagement across all levels. • Team web resources: using WordPress on AWS, we post datasets, team documents and tutorials.
  • 45. Hackweeks at the eScience Institute • geohackweek.github.io • oceanhackweek.github.io • interactive tutorials • project “hacks” • code sharing on GitHub • reproducibility
  • 46. JupyterHub on AWS for Tutorials and Projects
  • 47. Collecting Metrics on Hackweek Success
  • 48. Anthony Arendt, Research Scientist, University of Washington (arendta@uw.edu, @aaarendt, aaarendt) • Chris Stoner, Science Specialist, Alaska Satellite Facility (cstoner5@alaska.edu)
  • 49. Thank you!