Transitioning Geoscience Research to the Cloud: Opportunities and Challenges
1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Chris Stoner
Alaska Satellite Facility
Anthony Arendt
University of Washington
Session: 194329
Transitioning Geoscience Research
to the Cloud
Jed Sundwall
Amazon
2.
Why does AWS care about open data?
Many of our public sector
customers are required to make
their data available to the public.
Sharing data on AWS makes it accessible to a large and growing community of
researchers, entrepreneurs, and enterprises who use the AWS Cloud.
Many of our commercial sector
customers rely on access to open
data to develop their products.
3.
The AWS Open Data program
makes more data more
available to more people.
https://opendata.aws
5.
Chris Stoner
Alaska Satellite Facility
Session: 194329
AWS and the Alaska Satellite Facility
Opportunities and Challenges
6. Alaska Satellite Facility
• NASA Distributed Active Archive Center (DAAC)
• Ingest, archive, and distribute Synthetic Aperture Radar (SAR) data
• On-prem footprint ~6 PB
• Spinning disk, available for immediate download
• No cost to the user
7. NASA ESDIS Distributed Data System
8. Challenge We have a big mission on the horizon...
9. • NASA-ISRO SAR (NISAR) Mission
• 20+ GB per file
• 150 PB archive
• 50 Gbps incoming rate
• On-prem architecture won’t scale
• Cost and time to scale up
• Build and maintain enough storage
• Data movement of 100s of petabytes
JPL/NISAR Homepage
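To put the NISAR figures above in perspective, the short calculation below is a rough sketch. It assumes decimal units (1 PB = 10^15 bytes), a link sustained at the quoted 50 Gbps with no protocol overhead, and exactly 20 GB per file (the slide says 20+ GB, so the file count is an upper bound). The function names are illustrative and not part of any ASF tooling.

```python
# Back-of-envelope sizing for the NISAR archive figures quoted on the slide.
# Assumptions (not from the slides): decimal units, no transfer overhead.
PB = 10**15  # bytes in a petabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)


def transfer_days(archive_bytes, rate_bits_per_s):
    """Days needed to move archive_bytes at a sustained bit rate."""
    seconds = archive_bytes * 8 / rate_bits_per_s
    return seconds / 86_400


def file_count(archive_bytes, file_bytes):
    """Number of whole files of file_bytes that fit in archive_bytes."""
    return archive_bytes // file_bytes


days = transfer_days(150 * PB, 50 * 10**9)
files = file_count(150 * PB, 20 * GB)
print(f"Moving 150 PB at 50 Gbps: about {days:.0f} days")
print(f"Files at 20 GB each: at most {files:,}")
```

Under these assumptions, moving the full archive once would take roughly 278 days of continuous transfer, which illustrates why data movement of hundreds of petabytes is listed as a scaling concern.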
10. Why AWS?
• Pay only for what you use
• Quick iterations
• Large sets of compute available cheaply
• 100s or 1000s of nodes
• Scaling in and out
• Ruled out on-prem
• Would need to maintain extra capacity
11. Opportunity Going to the cloud is a journey…
12. AWS Storage versus Web Object Store
• In the beginning, storage was the driver
• Shared storage for fewer data moves
• Less duplication of data
13. AWS Storage *and* Web Object Store
• Best of both worlds
• Boto3 and Web Object Store (WOS)
• Buckets and Objects
• Code reuse
• Leverage Both
• Transparent to user
• Flexibility
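One way to read "code reuse" and "transparent to user" is a thin storage interface with interchangeable backends. The sketch below is an assumption about how that could look, not ASF's actual code. `S3Store` wraps a client created with `boto3.client("s3")` and uses only its `put_object` and `get_object` calls. `InMemoryStore` is a hypothetical stand-in for an on-prem object store, since the WOS API is not described in the slides.

```python
from typing import Dict, Protocol, Tuple


class ObjectStore(Protocol):
    """Minimal bucket/object interface shared by all backends (illustrative)."""

    def put(self, bucket: str, key: str, data: bytes) -> None: ...
    def get(self, bucket: str, key: str) -> bytes: ...


class S3Store:
    """Backend for Amazon S3; expects a client from boto3.client("s3")."""

    def __init__(self, client) -> None:
        self._client = client

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=bucket, Key=key, Body=data)

    def get(self, bucket: str, key: str) -> bytes:
        return self._client.get_object(Bucket=bucket, Key=key)["Body"].read()


class InMemoryStore:
    """Hypothetical stand-in for an on-prem object store."""

    def __init__(self) -> None:
        self._objects: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]


def copy_object(src: ObjectStore, dst: ObjectStore, bucket: str, key: str) -> None:
    """Copy one object between any two backends without backend-specific code."""
    dst.put(bucket, key, src.get(bucket, key))


# Usage with two in-memory stores; swapping in S3Store needs no other changes.
on_prem, cloud = InMemoryStore(), InMemoryStore()
on_prem.put("granules", "scene-001.zip", b"example bytes")
copy_object(on_prem, cloud, "granules", "scene-001.zip")
```

The bucket and key names are made up for the example. Calling code depends only on `put` and `get`, so it does not need to know which backend holds a given object.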
14. Learn by Doing
• “Lift and Shift” project
• Cloud from scratch project
• Hybrid architecture
15. Mindset Shift
• Traditional DAAC
• Ownership of large on-prem footprint
• Data Stewards
• Where do I offer value now?
• Cloud system design and operation
• Still data stewards
• Culture Changes
• Changing core competencies
• DevOps
Sean Gallup/Getty Images
16. Policy Changes
• Early architecture
• Highly controlled environment
• No access to AWS dashboard or services directly
• What do we really need?
• Cost visibility across all DAACs
• Organizations
• Access to FedRAMP/OCIO approved services
• Blacklist/whitelist policies
• Antideficiency Act
• Egress mitigation system
Photo Credit: Ryan Clements
17. Multi-temperature, Hybrid Storage
• Amazon Glacier is cheap, cold storage
• Distribution is slower
• Can be costly to serve
• Amazon S3 infrequent access
• Best of both worlds
• Not suitable for Hot data
• Amazon S3 or on-prem Edge
• Aggressive roll-off
• Hot data
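The hot, warm, and cold tiers above map naturally onto an S3 lifecycle configuration. The sketch below only builds the configuration dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The 30 and 90 day thresholds, the rule ID, and the bucket name in the comment are assumptions for illustration, not values from the talk, and the validation reflects S3's minimum transition ages as commonly documented, which should be checked against current AWS documentation.

```python
def lifecycle_rules(warm_after_days=30, cold_after_days=90, prefix=""):
    """Build an S3 lifecycle configuration: Standard -> Standard-IA -> Glacier.

    Thresholds are illustrative assumptions, not values from the slides.
    """
    # Assumed S3 constraints: at least 30 days before moving to Standard-IA,
    # and at least 30 more days before moving on to Glacier.
    if warm_after_days < 30 or cold_after_days < warm_after_days + 30:
        raise ValueError("transition ages are below the assumed S3 minimums")
    return {
        "Rules": [
            {
                "ID": "hot-warm-cold",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_rules()
# With credentials configured, the dict could be applied roughly like this
# (bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-archive-bucket", LifecycleConfiguration=config)
```

An "aggressive roll-off" policy would correspond to choosing the smallest thresholds that the storage classes allow.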
18. Predictive Analytics
• Determine during ingest
• Ever distributed?
• Where to store it?
• Amazon Machine Learning
• Model on active missions
• Previous user behavior
• > 90% accuracy for some products
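The ingest-time decision described above can be sketched with a much simpler stand-in than Amazon Machine Learning: estimate, per product type, how often past products were downloaded, then pick a storage tier from that rate. Everything here is an assumption for illustration, including the frequency-based estimator, the thresholds, and the example history. The product type labels are used only as sample keys.

```python
from collections import defaultdict


def download_rates(history):
    """Fraction of past products of each type that were ever downloaded.

    history: iterable of (product_type, was_downloaded) pairs.
    """
    seen = defaultdict(int)
    hits = defaultdict(int)
    for product_type, was_downloaded in history:
        seen[product_type] += 1
        hits[product_type] += int(was_downloaded)
    return {ptype: hits[ptype] / seen[ptype] for ptype in seen}


def choose_tier(p_download, hot=0.5, warm=0.1):
    """Map a predicted download probability to a storage tier (assumed cutoffs)."""
    if p_download >= hot:
        return "S3 Standard"
    if p_download >= warm:
        return "S3 Infrequent Access"
    return "Glacier"


# Made-up history of previous user behavior.
history = [
    ("GRD", True), ("GRD", True),
    ("SLC", True), ("SLC", False), ("SLC", False), ("SLC", False),
    ("RAW", False), ("RAW", False),
]
rates = download_rates(history)
tiers = {ptype: choose_tier(rate) for ptype, rate in rates.items()}
print(tiers)
```

A trained model would replace `download_rates`, while the tier selection step could stay the same.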
19. End User Analytics
• Provide cloud tools
• Learn just enough cloud technology
• Own AWS Account
• Process any NASA data in cloud storage
• Access to entire archive
• Download time before processing
• Cloud opportunity
• Barrier to entry
NASA/ESDIS
20. Example: Sentinel-1 RTC
• Radiometric Terrain Correction (RTC)
• Often base product for SAR Research
• Download the file (~5 GB)
• Process locally 1 at a time
• Takes ~5 hours per product on a local machine
• CloudFormation Template
• Fetch in-region
• Process in parallel
• Takes ~20 minutes per product on a decent EC2 machine
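The per-product times above can be turned into a rough wall-clock comparison. The sketch below assumes one product per node at a time, no startup or queueing overhead, and no download time for the local case. The batch of 1000 products and the 100 nodes are arbitrary example values, not figures from the talk.

```python
import math


def serial_hours(n_products, hours_each=5.0):
    """Wall-clock hours to process products one at a time on a local machine."""
    return n_products * hours_each


def parallel_hours(n_products, n_nodes, minutes_each=20.0):
    """Wall-clock hours when each node processes one product at a time."""
    batches = math.ceil(n_products / n_nodes)
    return batches * minutes_each / 60.0


per_product_speedup = (5.0 * 60.0) / 20.0  # 5 hours versus 20 minutes
local = serial_hours(1000)
cloud = parallel_hours(1000, n_nodes=100)
print(f"Per-product speedup: {per_product_speedup:.0f}x")
print(f"1000 products locally: {local:.0f} h; on 100 nodes: {cloud:.1f} h")
```

With these example values the per-product speedup is 15x, and the batch drops from 5000 hours of serial local processing to a little over 3 hours of wall-clock time in the cloud.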
21. Cloud Tools
• AWS CloudFormation Templates
• Contains instructions for AWS to
create a processing pipeline
– Sentinel-1 RTC
Includes:
• Amazon Machine Image
• IAM roles/policies
• Input/Output buckets
• CloudWatch alarms
• Pipeline Master
• SNS topics
• SQS queue
• AutoScaling group
– Spot Market
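To make "instructions for AWS to create a processing pipeline" concrete, the sketch below builds a minimal CloudFormation template as a Python dictionary and serializes it to JSON. It is not the ASF Sentinel-1 RTC template: it declares only input and output buckets, a queue, and a topic, and omits the machine image, IAM roles, alarms, and Auto Scaling group. The logical resource names and the description text are hypothetical.

```python
import json


def minimal_pipeline_template():
    """Return a small CloudFormation template with a subset of pipeline resources."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative subset of a processing pipeline (sketch only)",
        "Resources": {
            # Logical names below are hypothetical.
            "InputBucket": {"Type": "AWS::S3::Bucket"},
            "OutputBucket": {"Type": "AWS::S3::Bucket"},
            "JobQueue": {"Type": "AWS::SQS::Queue"},
            "NotifyTopic": {"Type": "AWS::SNS::Topic"},
        },
    }


template_json = json.dumps(minimal_pipeline_template(), indent=2)
print(template_json)
```

The resulting JSON could be saved to a file and used to create a stack. A full pipeline template would add the remaining resources from the list above.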
22. Benefit to End User
• Time
• Don’t download!
• Process next to storage
• Scale
• 100s/1000s nodes
• Cheap using Spot
• Iteration
• What compute node works best
• What processing flow gives me what I want
23.
Thank you!
Chris Stoner, ASF
cstoner5@alaska.edu
24.
Anthony Arendt
eScience Institute / Applied Physics Laboratory
University of Washington
Session: 194329
Transitioning Geoscience Research
to the Cloud
25.
Complex alpine hydrology, meteorology and
ecosystems
26.
NASA’s High Mountain Asia Team
• 3 year award from NASA Earth Sciences division
• 14 teams funded, 90 individual researchers, students,
technicians
• goal: to advance understanding of processes driving
changes in climate and the cryosphere in the High Mountain
Asia region
28.
Data Sharing Challenges
High volume (hundreds of terabytes)
Multidimensional (lat, long, elevation, time, variables)
Multiple versions (different parameter combinations)
Different formats / lack of data standards
Different pre / post publication usage constraints
29.
Data Sharing using NASA Supercomputers
Advantages
• accessible to NASA federal employees
• conforms with NASA security standards
Disadvantages
• long approval process for non-NASA scientists
• limited customizability
• learning curve for users not familiar with command line
30.
AWS Tools to Enhance Scientific
Collaboration
31.
Data Storage and Access
32.
Data Storage and Access: Architecture
33.
Direct Connection to Data in a GIS
34.
Data Access from a Python Script
35.
Multiple Services Linked in a Single Application
36.
Data Storage and Access on AWS: Advantages
Centralized location for all datasets
On-the-fly reprojection
Data available in multiple formats
Spatial and temporal subsetting
Adherence to community data / metadata standards
37.
Data Pre-Processing and Analysis
38.
Pangeo Data (https://pangeo-data.github.io)
The Pangeo Platform; source: Abernathey et al. (2017), “Pangeo: An Open Source Big Data Climate Science Platform,” NSF award 1740648.
39.
Migrating Pangeo Architecture to AWS
JupyterHub: A multi-user server that manages multiple instances of single-user Jupyter notebooks
Kubernetes: Automated deployment, scaling and management of containerized applications
Amazon Elastic Container Service for Kubernetes
40.
Pangeo Deployed on AWS
41.
Experimental Deployment (http://pangeo.pydata.org)
42.
Comparison of Computing Architectures
Future model / Current model
43.
Shifting Scientific Culture Towards
Cloud Adoption
44.
Building Trust and Communication Across Teams
• Adaptive leadership: Individuals and small groups are empowered to make decisions.
• In-person meetings: Professional facilitators guide scientists through structures that maximize engagement across all levels.
• Team web resources: Using WordPress on AWS we post datasets, team documents and tutorials.
45.
Hackweeks at the eScience Institute
geohackweek.github.io
oceanhackweek.github.io
• interactive tutorials
• project “hacks”
• code sharing on GitHub
• reproducibility
46.
JupyterHub on AWS for Tutorials and Projects
47.
Collecting Metrics on Hackweek Success
48.
Anthony Arendt
Research Scientist
University of Washington
arendta@uw.edu
@aaarendt
aaarendt
Chris Stoner
Science Specialist
Alaska Satellite Facility
cstoner5@alaska.edu
49.
Thank you!