© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Dan Pilone
CTO - Element 84, Inc.
Earth and Space on AWS
Processing and Streaming GOES-16
Data with AWS Managed Services
Organizations that leverage data
will devour ones that can’t.
There are billions of dollars' worth of
publicly funded data waiting to be used.
Data is a national asset.
Earth Scientists are estimated
to spend about 60% of their
time preparing data for use.
They spend about 30% doing science.
NASA is on track to put 100s of
PBs of newly captured data in the
cloud - available for use, for free.
If you have the resources, you can figure
out how to harness big data.
What happens if you don't?
We still hear from partners and clients that they:
Can’t find data
Don’t know what data exists
Can’t figure out how
to get to data
Can’t use big
data effectively
Our Approach to Projects
Better understand
problems through data
Build solutions to
affect those problems
Measure how we did with
metrics and analytics
Where did this leave us?
We set out some guidelines
• Make the discovery easy and interactive even on low bandwidth,
low resolution, low processing power devices
• Be highly configurable - both in terms of data and processing
• Be as close to $0 as possible when not actively saving the world
• Be able to scale up to actually save the world
• Stop putting data in things.*
We want this to be your personal shopper for data…
Overall Data Flow
• On Prem Data Provider (e.g. PDS, NASA, NOAA, USGS, etc.) (OPeNDAP, WxS, etc.)
• DISCOVERY: Overview UI, Discovery UI
• PROCESSING: Processing Engine
• Archive of Convenience
• Pangeo, Jupyter Notebook, etc.
Screenshot of water vapor around Maria
How All This Works
How this works
First had to address
the discovery problem
Client Details
• Static site hosted on S3 +
CloudFront
• Uses HLS video streams (M3U8 playlists,
2-second chunks) created by
Amazon Elastic Transcoder (ETS)
• Client-side JavaScript for
time to frame mapping
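The time-to-frame mapping can be sketched as follows. This is a minimal illustration in Python rather than the client's actual JavaScript; the frame interval and video frame rate are assumed values that the talk does not state, and the function names are hypothetical. Only the 2-second chunk length comes from the slide.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumed values for illustration only; the talk does not state them.
FRAME_INTERVAL = timedelta(minutes=15)  # assumed time between satellite frames
FRAMES_PER_SECOND = 30                  # assumed video frame rate
SEGMENT_SECONDS = 2                     # 2-second HLS chunks, per the slide


def time_to_frame(archive_start: datetime, observed: datetime) -> int:
    """Return the index of the video frame that covers `observed`."""
    if observed < archive_start:
        raise ValueError("time precedes the start of the archive")
    # timedelta // timedelta yields an int count of whole intervals.
    return (observed - archive_start) // FRAME_INTERVAL


def frame_to_playback(frame: int) -> Tuple[int, float]:
    """Return (HLS segment index, seconds offset inside that segment)."""
    seconds = frame / FRAMES_PER_SECOND
    segment = int(seconds // SEGMENT_SECONDS)
    return segment, seconds - segment * SEGMENT_SECONDS
```

With a mapping like this, scrubbing to a timestamp only requires fetching the one segment that contains the target frame.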
Some numbers…
GOES-16 Full Disk archive is roughly 20 TB.
Our full-archive videos in multiple resolutions are:
• 3072p: 12 GB
• 1920p: 5.3 GB
• 1080p: 1.8 GB
• 640p: 540 MB
All videos are rendered by Amazon Elastic Transcoder (ETS) and prepped
for HLS distribution; DASH output is also possible.
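To put these sizes in perspective, the short calculation below compares each video rendition with the source archive. It is a rough sketch that reads "roughly 20 TBs" as decimal terabytes, which is an assumption; the names are illustrative.

```python
ARCHIVE_BYTES = 20e12  # "roughly 20 TBs", read as decimal terabytes (assumption)

# Video sizes per resolution, as listed on the slide.
VIDEO_BYTES = {
    "3072p": 12e9,
    "1920p": 5.3e9,
    "1080p": 1.8e9,
    "640p": 540e6,
}


def reduction_factor(resolution: str) -> float:
    """How many times smaller the video is than the full archive."""
    return ARCHIVE_BYTES / VIDEO_BYTES[resolution]


if __name__ == "__main__":
    for name in VIDEO_BYTES:
        print(f"{name}: about {reduction_factor(name):,.0f}x smaller than the archive")
```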
Two buckets are created in region:
1. PUBLIC ACCESS
• index.html overview file
• Video snippet (via ETS)
• Metadata file
• Jupyter notebook
2. IN-REGION ACCESS ONLY
• Actual data Archive of Convenience (e.g. Zarr archive)
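One common way to make the first bucket publicly readable is a bucket policy that allows anonymous `s3:GetObject`. The sketch below only builds that policy document; the bucket name is a placeholder, and the talk does not say how either bucket's access was actually configured, so the in-region restriction on the second bucket is deliberately not shown.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Return a JSON bucket policy allowing anonymous GetObject on every key."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # "example-overview-bucket" is a hypothetical name.
    print(public_read_policy("example-overview-bucket"))
```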
Frame Processing Details
Given a start and end time, the system:
1. Triggers a Lambda function that distributes
GOES-16 netCDF files (Partition Key
Space) into input chunks.
2. Submits a Batch array job, launching a fleet
of Spot instances. Each Spot instance takes
a partition of .nc files, builds them into Zarr
datasets, and pushes them to a common
Zarr sink on S3.
3. Cleans up any scratch data.
4. Sends a notification email.
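Steps 1 and 2 can be sketched roughly as below. This is an illustration under stated assumptions, not the talk's implementation: the key layout `<product>/<year>/<day-of-year>/<hour>/` is assumed for the public GOES-16 bucket, the product, queue, and job names are placeholders, and round-robin partitioning is just one possible strategy. `AWS_BATCH_JOB_ARRAY_INDEX` is the environment variable AWS Batch sets in each child of an array job.

```python
import os
from datetime import datetime, timedelta
from typing import Dict, List


def hourly_prefixes(product: str, start: datetime, end: datetime) -> List[str]:
    """List <product>/<year>/<day-of-year>/<hour>/ prefixes covering [start, end]."""
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current <= end:
        prefixes.append(f"{product}/{current:%Y/%j/%H}/")
        current += timedelta(hours=1)
    return prefixes


def partition(keys: List[str], partitions: int) -> List[List[str]]:
    """Split keys round-robin into at most `partitions` non-empty chunks."""
    chunks = [keys[i::partitions] for i in range(partitions)]
    return [chunk for chunk in chunks if chunk]


def array_job_request(chunk_count: int) -> Dict[str, object]:
    """Build keyword arguments for boto3's Batch submit_job (names are placeholders)."""
    return {
        "jobName": "zarr-bundle-build",
        "jobQueue": "example-spot-queue",
        "jobDefinition": "example-nc-to-zarr",
        # Batch array jobs need a size of at least 2.
        "arrayProperties": {"size": chunk_count},
    }


def my_chunk(chunks: List[List[str]]) -> List[str]:
    """Inside a child job, pick the chunk that matches this job's array index."""
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return chunks[index]
```

Each child job then processes only its own chunk of .nc files, so the fleet size follows directly from the number of chunks.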
Why Zarr?
• Zarr is an open format for n-dimensional arrays of data
along with metadata
• Flexible storage system making it usable locally as well as
optimized for cloud access (chunking in any dimension)
• Fully parallelized read and write capability
• Flexible compression capabilities
• No access infrastructure necessary
• Compatible with Pangeo*
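The "no access infrastructure" point follows from how Zarr stores chunks: in the Zarr v2 layout each chunk is a separate object whose key is the chunk's grid indices joined by dots, so a reader can work out which objects to fetch from the requested slice alone. A minimal sketch of that index arithmetic, with a hypothetical function name and example shapes:

```python
from itertools import product
from typing import List, Sequence, Tuple


def chunk_keys(
    selection: Sequence[Tuple[int, int]],
    chunk_shape: Sequence[int],
) -> List[str]:
    """Return Zarr v2 chunk keys touched by half-open index ranges, one per dimension."""
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        if stop <= start:
            return []  # empty selection touches no chunks
        # First and last chunk index along this dimension, inclusive.
        per_dim.append(range(start // size, (stop - 1) // size + 1))
    return [".".join(map(str, idx)) for idx in product(*per_dim)]
```

For example, reading rows 900 to 1099 of one time step from an array chunked as (1, 1000, 1000) touches two chunk objects, which can be fetched in parallel with plain S3 GETs.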
Archive of Convenience Data Organization
• Under the root is a group of frames
containing groups of datasets
• Datasets represent everything the end
user wants to know about the selected
observation:
• General data
• Band specific data for selected
bands
• Metadata stored in attributes
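One reading of this organization, expressed as Zarr v2 object keys, is sketched below. The frame and dataset names are hypothetical, and the exact nesting used in the talk's archive is not specified; the `.zgroup`, `.zarray`, and `.zattrs` file names are the standard Zarr v2 metadata objects.

```python
from typing import Dict, List


def archive_keys(frames: Dict[str, List[str]]) -> List[str]:
    """List the Zarr v2 metadata objects for a root -> frame -> dataset hierarchy."""
    keys = [".zgroup"]  # root group marker
    for frame, datasets in sorted(frames.items()):
        keys.append(f"{frame}/.zgroup")  # one group per frame
        for dataset in datasets:
            keys.append(f"{frame}/{dataset}/.zarray")  # shape, chunks, dtype
            keys.append(f"{frame}/{dataset}/.zattrs")  # metadata stored as attributes
    return keys


if __name__ == "__main__":
    # Hypothetical frame and dataset names.
    for key in archive_keys({"frame_0001": ["general", "band_13"]}):
        print(key)
```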
Demo!
Demo?
Lots of room for optimization
• Current bottlenecks are:
• Data movement within AWS
• Batch group spin up time
• Chunk size and compression
need tuning
• Local caching of hot netCDFs
• Smarter archive creation
[Chart: staging time %, split among netCDF access, data processing, and data movement to archive (values shown: 30%, 15%, 55%)]
This is just the beginning!
• Additional products and
providing data bundles
• Additional output formats
• Optimizing bundle build time
• Local caching
• Horizontal scaling
• Zarr tuning
• Time-lapse video generation
• Additional bands for video scrubbing
• Additional processing in the
workflows
• GPU based video filters
• Python for frame compute
• ML models for image detection
• Overlays and annotations
• Subframe rendering
• Common projection for
heterogeneous products
Summary
• We leveraged AWS EC2/Spot/ECS and ETS to make ~20 TBs of
AWS Public Dataset GOES-16 imagery visually navigable at varying
levels of bandwidth.
• We can apply this approach to lots and lots of data products
• We’ve leveraged AWS Batch (ECS & Spot) to parallelize creation of
data bundles into ephemeral Archives of Convenience
• Users get convenient, highly elastic access to data that suits their
needs, in their preferred format.
• All of this costs $0 when not in active use but scales horizontally as
big as budget allows.
We want to hear from you!
• Data producers
• Do you have data that you want to make available?
• Data consumers
• What formats do you want the data available in?
• What information would you like to know?
• How do you want to find and subset the data?
• Scientists
• Help us not break the data! Algorithms, reviews, etc.
• What data should we be using for what?
Thank you!
Dan Pilone // dan@element84.com
@e84news
E84 GOES-16 Demo is available at:
https://labs.element84.com/goes16

Introduzione a Amazon Elastic Container Service
 

How Element 84 Raises the Bar on Streaming Satellite Data

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Dan Pilone CTO - Element 84, Inc. Earth and Space on AWS Processing and Streaming GOES-16 Data with AWS Managed Services
  • 2. Organizations that leverage data will devour ones that can’t.
  • 3. There are billions of dollars worth of funded public data waiting to be used. Data is a national asset.
  • 4. Earth Scientists are estimated to spend about 60% of their time preparing data for use. They spend about 30% doing science.
  • 5. NASA is on track to put 100s of PBs of newly captured data in the cloud - available for use, for free.
  • 6. If you have the resources, you can figure out how to harness big data. What happens if you don't?
  • 7. We still hear from partners and clients that they:
      • Can’t find data
      • Don’t know what data exists
      • Can’t figure out how to get to data
      • Can’t use big data effectively
  • 8. Our Approach to Projects
      • Better understand problems through data
      • Build solutions to affect those problems
      • Measure how we did with metrics and analytics
  • 9. Where did this leave us?
  • 10. We set out some guidelines
      • Make the discovery easy and interactive even on low bandwidth, low resolution, low processing power devices
      • Be highly configurable - both in terms of data and processing
      • Be as close to $0 as possible when not actively saving the world
      • Be able to scale up to actually save the world
      • Stop putting data in things.*
  • 11. We want this to be your personal shopper for data…
  • 12. Overall Data Flow [diagram labels: On Prem Data Provider (e.g. PDS, NASA, NOAA, USGS, etc.; OPeNDAP, WxS, etc.); Discovery; Processing; Processing Engine; Archive of Convenience; Overview UI; Discovery UI; Pangeo, Jupyter Notebook, etc.]
  • 17. [Screenshot of water vapor around Maria] How All This Works
  • 18. How this works: first had to address the discovery problem
  • 24. Client Details
      • Static site hosted on S3 + CloudFront
      • Uses HLS video streams with M3U 2s chunks created by AWS Elastic Transcoder Service
      • Client-side JavaScript for time to frame mapping
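The time-to-frame mapping mentioned on this slide can be sketched as below. This is only an illustration, not the demo's actual client code (which the slide says is JavaScript): it assumes one video frame per observation, a fixed observation cadence, a constant frame rate, and 2-second HLS segments. The function name `time_to_frame` and the cadence, frame rate, and start time used in the example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_to_frame(t, start, cadence, fps=30.0, segment_seconds=2.0):
    """Map an observation time to (frame index, playback offset in s, HLS segment index).

    Assumes one video frame per observation and a fixed cadence (both assumptions).
    """
    if t < start:
        raise ValueError("requested time precedes the first frame")
    frame = (t - start) // cadence            # whole cadence steps since the first frame
    offset = frame / fps                      # seconds into the rendered video
    segment = int(offset // segment_seconds)  # which 2-second chunk holds that frame
    return frame, offset, segment


# Hypothetical values: archive starting at midnight UTC, one frame every 15 minutes.
start = datetime(2017, 9, 20, 0, 0, tzinfo=timezone.utc)
frame, offset, segment = time_to_frame(
    start + timedelta(hours=16), start, cadence=timedelta(minutes=15)
)
```

With a mapping like this, a client only needs to fetch the one segment that contains the requested time, which is what keeps scrubbing cheap on low-bandwidth devices.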
  • 25. Some numbers… GOES-16 Full Disk Archive is roughly 20 TBs. Our full archive videos in multiple resolutions are:
      • 3072p: 12 GBs
      • 1920p: 5.3 GBs
      • 1080p: 1.8 GBs
      • 640p: 540 MBs
    All videos are rendered by AWS Elastic Transcoder Service and prepped for HLS distribution, but can also do DASH.
  • 27. Two Buckets are created in region
      • Bucket 1 (PUBLIC ACCESS): Index.html overview file
          • Video Snippet (via ETS)
          • Metadata File
          • Jupyter Notebook
      • Bucket 2 (IN-REGION ACCESS ONLY): Actual data, Archive of Convenience (e.g. Zarr archive)
  • 29. Frame Processing Details. Given a start and end time:
      1. Triggers a Lambda function that distributes GOES-16 netCDF files (Partition Key Space) into input chunks.
      2. Submits a Batch array job, launching a fleet of Spot instances. Each Spot instance takes a partition of .nc files, builds them into Zarr datasets, and pushes them to a common S3 sink Zarr.
      3. Cleans up any scratch data.
      4. Sends a notification email.
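Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Element 84's implementation: the helper names (`partition_keys`, `build_array_job_request`, `keys_for_this_worker`) and the job, queue, and key names are hypothetical. The sketch only builds the request dictionary, which has the shape accepted by the AWS Batch `SubmitJob` API (for example via boto3's `submit_job(**request)`), and relies on the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that Batch sets for each child of an array job.

```python
import os


def partition_keys(keys, chunk_size):
    """Split the list of netCDF object keys into fixed-size input chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [keys[i:i + chunk_size] for i in range(0, len(keys), chunk_size)]


def build_array_job_request(job_name, job_queue, job_definition, n_partitions):
    """Build SubmitJob parameters; one array child per partition."""
    request = {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
    }
    # Batch array jobs need a size of at least 2; a single partition runs as a plain job.
    if n_partitions >= 2:
        request["arrayProperties"] = {"size": n_partitions}
    return request


def keys_for_this_worker(partitions, environ=os.environ):
    """Inside a worker: pick this child's partition from its array index."""
    index = int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return partitions[index]


# Hypothetical object keys and resource names, for illustration only.
keys = [f"frame-{i}.nc" for i in range(5)]
partitions = partition_keys(keys, chunk_size=2)
request = build_array_job_request(
    "zarr-build", "spot-queue", "zarr-builder", len(partitions)
)
```

How the partition list reaches the workers (a manifest object in S3, job parameters, and so on) is left open here; the slides do not say.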
  • 30. Why Zarr?
      • Zarr is an open format for n-dimensional arrays of data along with metadata
      • Flexible storage system making it usable locally as well as optimized for cloud access (chunking in any dimension)
      • Fully parallelized read and write capability
      • Flexible compression capabilities
      • No access infrastructure necessary
      • Compatible with Pangeo*
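To make the chunking point concrete, here is a small pure-Python sketch (it does not use the zarr library) of which chunk objects a read would touch. It assumes the Zarr v2 convention of naming each chunk by its grid indices joined with "."; the chunk shape and the selection in the example are made up for illustration.

```python
from itertools import product


def chunk_keys(selection, chunks, separator="."):
    """Return the chunk keys that a hyper-rectangular selection overlaps.

    selection: one (start, stop) pair per dimension, stop exclusive.
    chunks: chunk length along each dimension.
    """
    ranges = []
    for (start, stop), size in zip(selection, chunks):
        if stop <= start:
            raise ValueError("empty selection")
        # First and last chunk index touched along this dimension.
        ranges.append(range(start // size, (stop - 1) // size + 1))
    return [separator.join(str(i) for i in idx) for idx in product(*ranges)]


# Hypothetical layout: (time, y, x) with one time step per chunk and 1024 x 1024 tiles.
touched = chunk_keys([(0, 1), (1000, 3000), (0, 500)], chunks=(1, 1024, 1024))
```

Because each chunk is an independent object, a reader can fetch just these keys straight from object storage, and separate workers can write different chunks in parallel, without any server in front of the data.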
  • 31. Archive of Convenience Data Organization
      • Under the root is a group of frames containing groups of datasets
      • Datasets represent everything the end user wants to know about the selected observation:
          • General data
          • Band specific data for selected bands
          • Metadata stored in attributes
  • 32. Demo! Demo?
  • 33. Lots of room for optimization
      • Current bottlenecks are:
          • Data movement within AWS
          • Batch group spin up time
      • Chunk size and compression need tuning
      • Local caching of hot netCDFs
      • Smarter archive creation
    [Chart: staging time %, with categories netCDF access, data processing, and data movement to archive; values shown: 30%, 15%, 55%]
  • 34. This is just the beginning!
      • Additional products and providing data bundles
      • Additional output formats
      • Optimizing bundle build time
      • Local caching
      • Horizontal scaling
      • Zarr tuning
      • Time-lapse video generation
      • Additional bands for video scrubbing
      • Additional processing in the workflows
      • GPU based video filters
      • Python for frame compute
      • ML models for image detection
      • Overlays and annotations
      • Subframe rendering
      • Common projection for heterogeneous products
  • 35. Summary
      • We leveraged AWS EC2/Spot/ECS and ETS to make ~20 TBs of AWS Public Dataset GOES-16 imagery visually navigable at varying levels of bandwidth.
      • We can apply this approach to lots and lots of data products
      • We’ve leveraged AWS Batch (ECS & Spot) to parallelize creation of data bundles into ephemeral Archives of Convenience
      • Users get convenient, highly elastic access to data that suits their needs, in their preferred format.
      • All of this costs $0 when not in active use but scales horizontally as big as budget allows.
  • 36. Want to hear from you!
      • Data producers
          • Do you have data that you want to make available?
      • Data consumers
          • What formats do you want the data available in?
          • What information would you like to know?
          • How do you want to find and subset the data?
      • Scientists
          • Help us not break the data! Algorithms, reviews, etc.
          • What data should we be using for what?
  • 38. Thank you! Dan Pilone // dan@element84.com // @e84news. E84 GOES-16 Demo is available at: https://labs.element84.com/goes16