©2018, Element84, Inc. All rights reserved.
Dan Pilone
CTO - Element 84, Inc.
Earth and Space on AWS
Processing and Streaming GOES-16 Data
with AWS Managed Services
Organizations that leverage data
will devour ones that can’t.
There are billions of dollars' worth of funded
public data waiting to be used.
Data is a national asset.
Earth Scientists are estimated
to spend about 60% of their
time preparing data for use.
They spend about 30% doing science.
NASA is on track to put 100s of PBs
of newly captured data in the cloud -
available for use, for free.
If you have the resources, you can figure out
how to harness big data.
What happens if you don't?
We still hear from partners and clients that they:
• Can’t find data
• Don’t know what data exists
• Can’t figure out how to get to data
• Can’t use big data effectively
Our Approach to Projects
• Better understand problems through data
• Build solutions to affect those problems
• Measure how we did with metrics and analytics
Where did this leave us?
We set out some guidelines
• Make the discovery easy and interactive even on low
bandwidth, low resolution, low processing power devices
• Be highly configurable - both in terms of data and processing
• Be as close to $0 as possible when not actively saving the world
• Be able to scale up to actually save the world
• Stop putting data in things.*
We want this to be your personal shopper for data…
Overall Data Flow
[Diagram. Labels: (e.g. PDS, NASA, NOAA, USGS, etc.); On Prem Data Provider (OPeNDAP, WxS, etc.); Discovery; Processing; Processing Engine; Archive of Convenience; Overview UI; Discovery UI; Pangeo, Jupyter Notebook, etc.]
How All This Works
Screenshot of water vapor around Maria
Need imagery / graphic. I’ll speak to content. Needs to show imagery generation, ETS for video generation, user interaction with static web client, Lambda invocation, Batch processing of frame conversion, S3 delivery, Jupyter Notebook access
How this works
First had to address
the discovery problem
Client Details
• Static site hosted on S3 + CloudFront
• Uses HLS video streams with M3U8 playlists and 2-second chunks created by AWS Elastic Transcoder Service
• Client-side JavaScript for time-to-frame mapping
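The time-to-frame mapping can be sketched roughly as follows. The deployed client does this in JavaScript; the Python below only illustrates the idea, assuming a constant video frame rate and one video frame per observation. The frame rate, observation interval, archive start time, and all function names are hypothetical and not taken from the slides.

```python
from datetime import datetime, timedelta

# Assumptions for illustration only (none of these values come from the slides).
FRAMES_PER_SECOND = 30                        # assumed video frame rate
OBSERVATION_INTERVAL = timedelta(minutes=15)  # assumed time between observations
ARCHIVE_START = datetime(2017, 9, 18, 0, 0)   # assumed timestamp of frame 0


def time_to_frame(playback_seconds: float) -> int:
    """Map a video playback position (seconds) to a zero-based frame index."""
    if playback_seconds < 0:
        raise ValueError("playback position cannot be negative")
    return int(playback_seconds * FRAMES_PER_SECOND)


def frame_to_timestamp(frame_index: int) -> datetime:
    """Map a frame index to the observation time that frame depicts."""
    return ARCHIVE_START + frame_index * OBSERVATION_INTERVAL


def playback_to_timestamp(playback_seconds: float) -> datetime:
    """Combine both steps: playback position to observation time."""
    return frame_to_timestamp(time_to_frame(playback_seconds))
```

With a mapping like this, scrubbing the video gives the client an observation time it can pass on to a processing request, without the client ever touching the underlying data.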
Some numbers…
GOES-16 Full Disk Archive is roughly 20 TB.
All videos are rendered by AWS Elastic Transcoder Service and prepped for HLS distribution, but DASH is also an option.
Our full archive videos in multiple resolutions are:
• 3072p: 12 GB
• 1920p: 5.3 GB
• 1080p: 1.8 GB
• 640p: 540 MB
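To put these numbers in perspective, the short calculation below compares each video size with the roughly 20 TB archive. It is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1000 GB); the function and variable names are illustrative only.

```python
# Sizes as quoted on the slide; decimal units (1 TB = 1000 GB) are an assumption.
ARCHIVE_GB = 20 * 1000
VIDEO_SIZES_GB = {"3072p": 12.0, "1920p": 5.3, "1080p": 1.8, "640p": 0.54}


def reduction_factor(video_gb: float) -> float:
    """How many times smaller a video is than the full archive."""
    return ARCHIVE_GB / video_gb


for resolution, size_gb in VIDEO_SIZES_GB.items():
    factor = reduction_factor(size_gb)
    print(f"{resolution}: about {factor:,.0f}x smaller than the archive")
```

Under these assumptions even the largest video is more than a thousand times smaller than the source archive, which is what makes browsing practical on low-bandwidth devices.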
Two buckets are created in region
1. PUBLIC ACCESS
• Index.html overview file
• Video snippet (via ETS)
• Metadata file
• Jupyter Notebook
2. IN-REGION ACCESS ONLY
• Actual data Archive of Convenience (e.g. Zarr archive)
Frame Processing Details
Given a start and end time:
1. Triggers a Lambda function that distributes GOES-16 netCDF files (Partition Key Space) into input chunks.
2. Submits a Batch array job, launching a fleet of Spot instances. Each Spot instance takes a partition of .nc files, builds them into Zarr datasets, and pushes them to a common S3 sink Zarr.
3. Cleans up any scratch data.
4. Sends a notification email.
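The partitioning in steps 1 and 2 can be sketched as below. The slides do not say how partitions are handed to the workers, so this shows one common pattern rather than the actual implementation: split the list of netCDF keys into near-equal chunks, then let each array-job child pick its chunk using the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable that AWS Batch sets for array jobs. The function names and the example keys are hypothetical.

```python
import os
from typing import List, Sequence


def partition_keys(keys: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Split keys into num_partitions contiguous, near-equal chunks."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(len(keys), num_partitions)
    chunks: List[List[str]] = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional key.
        size = base + (1 if i < extra else 0)
        chunks.append(list(keys[start:start + size]))
        start += size
    return chunks


def keys_for_this_child(keys: Sequence[str], num_partitions: int) -> List[str]:
    """Return the chunk assigned to the current array-job child.

    Defaulting the index to 0 lets the sketch run outside AWS Batch.
    """
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return partition_keys(keys, num_partitions)[index]
```

Because every child derives its own chunk from the same inputs, no coordination between workers is needed beyond agreeing on the key list and the array size.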
Why Zarr?
• Zarr is an open format for n-dimensional arrays of data along
with metadata
• Flexible storage system making it usable locally as well as
optimized for cloud access (chunking in any dimension)
• Fully parallelized read and write capability
• Flexible compression capabilities
• No access infrastructure necessary
• Compatible with Pangeo*
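Why chunking matters for cloud access can be illustrated without any Zarr dependency. The sketch below is plain Python, not the Zarr API: given a chunk shape and a requested slice, it computes which chunks of the chunk grid a reader would need to fetch. The function name and the example shapes are assumptions for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple


def chunks_for_selection(
    selection: Sequence[Tuple[int, int]],
    chunk_shape: Sequence[int],
) -> List[Tuple[int, ...]]:
    """Return grid indices of every chunk overlapping a half-open selection.

    selection holds one (start, stop) pair per dimension.
    """
    per_dimension = []
    for (start, stop), size in zip(selection, chunk_shape):
        if not 0 <= start < stop:
            raise ValueError("each selection needs 0 <= start < stop")
        first = start // size        # chunk holding the first selected index
        last = (stop - 1) // size    # chunk holding the last selected index
        per_dimension.append(range(first, last + 1))
    return list(product(*per_dimension))


# One time step, rows 0..1023, columns 512..1023 of a (time, y, x) array
# stored in 1 x 512 x 512 chunks touches only two chunks.
needed = chunks_for_selection([(0, 1), (0, 1024), (512, 1024)], (1, 512, 512))
```

Each needed chunk is an independent object in the store, so a reader fetches only those objects, and separate readers or writers can work on different chunks in parallel.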
Archive of Convenience Data Organization
• Under the root is a group of frames
containing groups of datasets
• Datasets represent everything the end user
wants to know about selected observation:
• General data
• Band specific data for selected bands
• Metadata stored in attributes
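The hierarchy described above can be pictured as a set of paths under the archive root. The sketch below is only an illustration: the group name `frames`, the dataset names `general` and `band_NN`, and the frame identifier format are all hypothetical, since the slides do not give the actual naming.

```python
from typing import Iterable, List


def dataset_paths(frame_ids: Iterable[str], bands: Iterable[int]) -> List[str]:
    """List the dataset paths for each frame: general data plus selected bands."""
    selected_bands = list(bands)
    paths: List[str] = []
    for frame_id in frame_ids:
        # Hypothetical layout: root / frames / <frame id> / <dataset>
        paths.append(f"frames/{frame_id}/general")
        for band in selected_bands:
            paths.append(f"frames/{frame_id}/band_{band:02d}")
    return paths


example = dataset_paths(["2017-09-18T1200Z"], [8, 13])
```

In a layout like this, metadata would live in the attributes of each group or dataset rather than in separate files.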
Demo!
Demo?
Lots of room for optimization
• Current bottlenecks are:
• Data movement within AWS
• Batch group spin up time
• Chunk size and compression
need tuning
• Local caching of hot netCDFs
• Smarter archive creation
[Chart: STAGING TIME %. Values: 30%, 15%, 55%. Legend: netCDF access, data processing, data movement to archive]
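The staging-time chart suggests where optimization pays off most. The sketch below estimates the overall effect of speeding up a single stage. It assumes the chart values map to the legend entries in the order listed (30% netCDF access, 15% data processing, 55% data movement to archive); the extracted slide does not confirm that pairing, and the function names are illustrative.

```python
from typing import Dict

# Assumed pairing of chart values to legend entries (not confirmed by the slide).
STAGING_SHARES: Dict[str, float] = {
    "netCDF access": 0.30,
    "data processing": 0.15,
    "data movement to archive": 0.55,
}


def remaining_time_fraction(shares: Dict[str, float], stage: str, speedup: float) -> float:
    """Fraction of the original staging time left after speeding up one stage."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if stage not in shares:
        raise KeyError(stage)
    return sum(
        share / speedup if name == stage else share
        for name, share in shares.items()
    )


def overall_speedup(shares: Dict[str, float], stage: str, speedup: float) -> float:
    """Overall speedup from accelerating a single stage."""
    return 1.0 / remaining_time_fraction(shares, stage, speedup)
```

Under the assumed pairing, making data movement to the archive twice as fast would leave about 72.5% of the original staging time, while doubling the speed of data processing alone would leave about 92.5%.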
This is just the beginning!
• Additional products and providing
data bundles
• Additional output formats
• Optimizing bundle build time
• Local caching
• Horizontal scaling
• Zarr tuning
• Time-lapse video generation
• Additional bands for video scrubbing
• Additional processing in the workflows
• GPU based video filters
• Python for frame compute
• ML models for image detection
• Overlays and annotations
• Subframe rendering
• Common projection for
heterogeneous products
Summary
• We leveraged AWS EC2/Spot/ECS and ETS to make ~20 TBs of AWS Public
Dataset GOES-16 imagery visually navigable at varying levels of bandwidth.
• We can apply this approach to lots and lots of data products
• We’ve leveraged AWS Batch (ECS & Spot) to parallelize creation of data
bundles into ephemeral Archives of Convenience
• Users get convenient, highly elastic access to data that suits their needs, in
their preferred format.
• All of this costs $0 when not in active use but scales horizontally as big as
budget allows.
Want to hear from you!
• Data producers
• Do you have data that you want to make available?
• Data consumers
• What formats do you want the data available in?
• What information would you like to know?
• How do you want to find and subset the data?
• Scientists
• Help us not break the data! Algorithms, reviews, etc.
• What data should we be using for what?
Thank you!
Dan Pilone // dan@element84.com
@e84news
E84 GOES-16 Demo is available at:
https://labs.element84.com/goes16

AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data with AWS Managed Services
