This document summarizes an event promoting open data on AWS. It introduces open data initiatives like Landsat and NEXRAD imagery available on S3. The document highlights how AWS tools like S3, EC2, and EMR can be used to access, analyze and visualize open data at scale. It discusses how AWS aims to eliminate the "undifferentiated heavy lifting" of data preparation by pre-processing datasets and notifying subscribers of updates.
Making Earth observation data available by using Amazon S3 is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the scale and performance of Amazon S3 lets earth scientists, researchers, startups, and GIS professionals gather and analyse planetary-scale data without worrying about limitations of bandwidth, storage, memory, or processing power. Co-presented with the support of the Australian Geoscience Data Cube collaboration, DigitalGlobe’s Geospatial Big Data Platform, and the developer of the popular ObservedEarth mobile app.
Speakers:
Craig Lawton, Public Sector Solutions Architect, Amazon Web Services
Lachlan Hurst, Observed Earth
Matt Paget, Senior Experimental Scientist, CSIRO
Dan Getman, Digital Globe
AWS re:Invent 2016: Visualizing Big Data Insights with Amazon QuickSight (BDM... - Amazon Web Services
Amazon QuickSight is a fast BI service that makes it easy for you to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. QuickSight is built to harness the power and scalability of the cloud, so you can easily run analysis on large datasets and support hundreds of thousands of users. In this session, we’ll demonstrate how you can easily get started with Amazon QuickSight: uploading files, connecting to S3 and Redshift, and creating analyses from visualizations that are optimized based on the underlying data. Once we’ve built our analysis and dashboard, we’ll show you how easy it is to share it with colleagues and stakeholders in just a few seconds. And with SPICE – QuickSight’s in-memory calculation engine – you can go from data to insights faster than ever.
Working with big volumes of data is a complicated task, but it's even harder if you have to do everything in real time and try to figure it all out yourself. Over the past decades, many open-source projects have helped solve problems within the data analytics lifecycle around ingestion, storage, processing, and visualisation of data. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time analytics and data visualisation decision-making problems with open source at scale with the power of Amazon Web Services. It also dives into a demo that uses source code from AWS Labs to visualise live data streams at scale.
Olivier Klein, Solutions Architect, Amazon Web Services, Greater China
Innovation is based on many components – a great idea, creativity, persistence, the right data, and technology tools. Amazon Web Services has been an engine of innovation for the start-up community, and it’s now being used to power innovative solutions for big societal problems. As government data becomes more widely available, more people can use AWS computing and big data analytics tools to tackle problems that were, until recently, exclusively the domain of government projects. Scientists, developers, and curious citizens are more equipped than ever to find forward-thinking and entirely new solutions for some of the world’s biggest challenges. These opportunities for innovation are improving citizen services and creating opportunities for a new class of civic tech entrepreneur. This session will highlight real examples of open data enabling transformative innovation at the national and local levels. You will hear about GIS Open Data, NASA’s Citizen Science program, how the new Landsat AWS Public Data Set has supported the development of new applications for citizens and government, and other examples of how open data has helped drive citizen engagement.
Steve Sofian, Solution Architect, Amazon Web Services, WWPS, ASEAN
Understanding AWS Managed Database and Analytics Services | AWS Public Sector... - Amazon Web Services
The world is creating more data in more ways than ever before. The average internet user in 2017 generates 1.5GB of data per day, with the rate doubling every 18 months. A single autonomous vehicle can generate 4TB per day. Each smart manufacturing plant generates 1PB per day. Storing, managing, and analyzing this data requires integrated database and analytic services that provide reliability and security at scale. AWS offers a range of managed data services that let customers focus on making data useful, including Amazon Aurora, RDS, DynamoDB, Redshift, Spectrum, ElastiCache, Kinesis, EMR, Elasticsearch Service, and Glue. In this session, we discuss these services, share our vision for innovation, and show how our customers use these services today. Learn More: https://aws.amazon.com/government-education/
#EarthOnAWS: How the Cloud Is Transforming Earth Observation | AWS Public Sec... - Amazon Web Services
Making earth observation data available in the cloud is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the cloud lets earth scientists, researchers, startups, and GIS professionals gather and analyze earth observation data without worrying about limitations of bandwidth, storage, memory, or processing power. Join us and learn how earth science data projects are becoming more scalable, agile, and efficient with AWS on-demand IT infrastructure.
Antoine Genereux takes us on a detailed overview of the Database solutions available on the AWS Cloud, addressing the needs and requirements of customers at all levels. He also discusses Business Intelligence and Analytics solutions.
Organizations around the world are facing a "data tsunami" as next-generation sensors produce enormous volumes of Earth observation data. Come learn how NASA is leveraging AWS to efficiently work with data and computing resources at massive scales. NASA is transforming its Earth Sciences EOSDIS (Earth Observing System Data Information System) program by moving data processing and archiving to the cloud. NASA anticipates that their Data Archives will grow from 16PB today to over 400PB by 2023 and 1 Exabyte by 2030, and they are moving to the cloud in order to scale their operations for this new paradigm. Learn More: https://aws.amazon.com/government-education/
Join us for a series of introductory and technical sessions on AWS Big Data solutions. Gain a thorough understanding of what Amazon Web Services offers across the big data lifecycle and learn architectural best practices for applying those solutions to your projects.
We will kick off this technical seminar in the morning with an introduction to the AWS Big Data platform, including a discussion of popular use cases and reference architectures. In the afternoon, we will deep dive into Machine Learning and Streaming Analytics. We will then walk everyone through building your first Big Data application with AWS.
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ... - Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Speaker:
Paul Armstrong, Solutions Architect, Amazon Web Services
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks - Amazon Web Services
Learning Objectives:
- Learn the different options available to stream data from IoT sensors to AWS (one ingestion option is sketched just after this list)
- Understand how to architect an analytics solution using AWS services to ingest and process IoT data
- Take away best practices for building IoT applications with scalability, cost-effectiveness, and security
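For illustration, here is a minimal sketch of one of those streaming options: writing a sensor reading into an Amazon Kinesis stream with boto3. The stream name and record fields are hypothetical, and credentials/region configuration is assumed.

```python
import json
import boto3

kinesis = boto3.client("kinesis")

reading = {"device_id": "sensor-42", "temperature_c": 21.7, "ts": 1718000000}

kinesis.put_record(
    StreamName="sensor-stream",          # hypothetical stream name
    Data=json.dumps(reading).encode(),   # Kinesis expects bytes
    PartitionKey=reading["device_id"],   # keeps one device's readings ordered
)
```

In practice, devices would more often publish to AWS IoT Core over MQTT with an IoT rule forwarding messages into Kinesis; the direct put above simply shows the shape of the ingestion call.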
AWS Compute Overview: Servers, Containers, Serverless, and Batch | AWS Public... - Amazon Web Services
The AWS Compute platform has expanded its EC2 instance types to include FPGA and new GPU instances. There are also other ways to run workloads in AWS, including Lambda (serverless), ECS (managed Docker), and AWS Batch (batch computing). This session will cover the newest instance types in EC2 and review AWS Lambda, ECS, and Batch. Learn More: https://aws.amazon.com/government-education/
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303) - Amazon Web Services
For discovery-phase research, life sciences companies have to support infrastructure that processes millions to billions of transactions. The advent of a data lake to accomplish such a task is showing itself to be a stable and productive data platform pattern to meet the goal. We discuss how to build a data lake on AWS, using services and techniques such as AWS CloudFormation, Amazon EC2, Amazon S3, IAM, and AWS Lambda. We also review a reference architecture from Amgen that uses a data lake to aid in their Life Science Research.
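As a rough illustration of the storage layer such a data lake starts from (a hedged sketch, not Amgen's actual setup), the snippet below creates an S3 bucket and writes a record under a date-partitioned prefix; all names are hypothetical and boto3 credentials are assumed.

```python
import boto3

s3 = boto3.client("s3")

bucket = "example-research-data-lake"  # hypothetical bucket name
# In regions other than us-east-1, add a CreateBucketConfiguration argument.
s3.create_bucket(Bucket=bucket)

# Date-partitioned prefixes let query engines (Athena, EMR, Redshift Spectrum)
# prune partitions instead of scanning the whole lake.
s3.put_object(
    Bucket=bucket,
    Key="raw/assays/dt=2016-11-30/batch-0001.json",
    Body=b'{"assay_id": 1, "result": 0.87}',
)
```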
Businesses are generating more data than ever before.
Real-time data analytics requires IT infrastructure that often needs to be scaled up quickly, and running an on-premises environment in this setting has its limitations.
Organisations often require a massive amount of IT resources to analyse their data and the upfront capital cost can deter them from embarking on these projects.
What’s needed is scalable, agile and secure cloud-based infrastructure at the lowest possible cost so they can spin up servers that support their data analysis projects exactly when they are required. This infrastructure must enable them to create proof-of-concepts quickly and cheaply – to fail fast and move on.
AWS re:Invent 2016: How Fulfillment by Amazon (FBA) and Scopely Improved Resu... - Amazon Web Services
We’ll share an overview of leveraging serverless architectures to support high performance data intensive applications. Fulfillment by Amazon (FBA) built the Seller Inventory Authority Platform (IAP) using Amazon DynamoDB Streams, AWS Lambda functions, Amazon Elasticsearch Service, and Amazon Redshift to improve results and reduce costs. Scopely will share how they used a flexible logging system built on Kinesis, Lambda, and Amazon Elasticsearch to provide high-fidelity reporting on hotkeys in Memcached and DynamoDB, and drastically reduce the incidence of hotkeys. Both of these customers are using managed services and serverless architecture to build scalable systems that can meet the projected business growth without a corresponding increase in operational costs.
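As an illustration of the DynamoDB Streams plus Lambda pattern mentioned above (a generic sketch with hypothetical field names, not FBA's actual code), a stream-triggered handler might look like this:

```python
def handler(event, context):
    """AWS Lambda handler invoked with a batch of DynamoDB stream records."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # New images arrive in DynamoDB attribute-value format,
            # e.g. {"sku": {"S": "ABC-123"}, "qty": {"N": "7"}}.
            image = record["dynamodb"].get("NewImage", {})
            sku = image.get("sku", {}).get("S")
            qty = int(image.get("qty", {}).get("N", "0"))
            print(f"inventory update: sku={sku} qty={qty}")
    # Returning normally tells the event source mapping the batch succeeded.
    return {"processed": len(event["Records"])}
```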
AWS re:Invent 2016: Earth on AWS—Next-Generation Open Data Platforms (STG203) - Amazon Web Services
Making earth observation data available by using Amazon S3 is accelerating scientific discovery and enabling the creation of new products. Attend and learn how the scale and performance of Amazon S3 lets earth scientists, researchers, startups, and GIS professionals gather and analyze planetary-scale data without worrying about limitations of bandwidth, storage, memory, or processing power. Learn how AWS is being used to combine satellite imagery, social data, and telemetry data to produce new products and services. Learn also how Amazon S3 provides much more than storage, and how an open geospatial data lake on Amazon S3 can be used as the basis for planetary-scale applications built with Amazon EMR, Amazon API Gateway, and AWS Lambda. As part of this talk, AWS customer Digital Globe demonstrates how they use open data stored in S3 to distribute high-resolution satellite imagery to their customers around the world.
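To make the open-data idea concrete, the sketch below lists a few objects from a public Landsat bucket anonymously with boto3; "landsat-pds" was the public Landsat 8 bucket around the time of this talk, so treat the bucket and prefix as assumptions that may have changed since.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Unsigned requests allow anonymous reads of public buckets.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(Bucket="landsat-pds", Prefix="c1/L8/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```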
AWS provides a broad platform of managed services to help you build, secure, and seamlessly scale end-to-end Big Data applications quickly and with ease. Want to get ramped up on how to use Amazon's big data web services? Want to learn when to use which service? Want to write your first big data application on AWS? Join us in this session as we discuss reference architectures, design patterns, and best practices for pulling together various AWS services to meet your big data challenges.
AWS Summit 2013 | India - Big Data Analytics, Abhishek Sinha - Amazon Web Services
The volume, velocity, and variety of data have changed drastically in the last decade. Everything generates data today, from your customers on social networks to the instances running your web applications. The tools to support collecting, storing, organizing, analyzing, and sharing data are all available in a couple of clicks with Amazon Web Services. Attend this session to learn how Big Data in the cloud can help you easily unlock business opportunities hidden in your data today.
AWS-powered services for analytics can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches that will allow you to transform your data into a valuable corporate asset. In this session, AWS will provide an overview of the different AWS services available for your data analytics needs. You can combine these blocks to build data flows that will extend your organization’s agility, ability to derive more insights and value from its data, and capability to adopt more sophisticated analytics tools and processes as your needs evolve. In the second part of the session, Paddy Power Betfair’s Data team will discuss the adoption and large scale operation of a broad range of AWS services that make up PPB’s scalable, mixed workload, multi-brand data platform. The data capabilities developed by PPB and powered by AWS were implemented to enable low-latency, high-volume and near real-time advanced analytics use cases, in the highly regulated and fast-paced betting industry. This was only possible through a focus on automation, innovation and continuous improvement.
Stream Data Analytics with Amazon Kinesis Firehose & Redshift - AWS August We... - Amazon Web Services
Evolving your analytics from batch processing to real-time processing can have a major business impact, but ingesting streaming data into your data warehouse requires building complex streaming data pipelines. Amazon Kinesis Firehose solves this problem by making it easy to ingest streaming data into Amazon Redshift so that you can use existing analytics and business intelligence tools to extract information in near real time and respond promptly. In this webinar, we will dive deep into using Amazon Kinesis Firehose to load streaming data into Amazon Redshift reliably, scalably, and cost-effectively; a minimal sending sketch follows this description. Join us to:
- Understand the basics of ingesting streaming data from sources such as mobile devices, servers, and websites with Amazon Kinesis Firehose
- Get a closer look at how to automate delivery of streaming data to Amazon Redshift reliably using Amazon Kinesis Firehose
- Learn techniques to detect, troubleshoot, and avoid data loading problems
Who should attend: developers, data analysts, data engineers, architects
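The sending side of that pipeline is a single API call. Below is a minimal, hypothetical sketch (stream name and event fields are invented; the Redshift COPY configuration lives on the delivery stream itself, not in this code):

```python
import json
import boto3

firehose = boto3.client("firehose")

event = {"user_id": 123, "action": "click", "ts": "2016-08-01T12:00:00Z"}

firehose.put_record(
    DeliveryStreamName="clickstream-to-redshift",  # hypothetical stream name
    # Firehose concatenates records, so a trailing newline keeps rows separable.
    Record={"Data": (json.dumps(event) + "\n").encode()},
)
```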
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing - Amazon Web Services
AWS Batch is a fully-managed service that enables developers, scientists, and engineers to easily and efficiently run batch computing workloads of any scale on AWS. AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads. With AWS Batch, there is no need to install or manage batch computing software, allowing you to focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services and features, such as Amazon EC2, Spot Instances, and AWS Lambda. AWS Batch reduces operational complexities, saving time and reducing costs. In this session, Principal Product Managers Jamie Kinney and Dougal Ballantyne describe the core concepts behind AWS Batch and details of how the service functions. The presentation concludes with relevant use cases and sample code.
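For a sense of the developer experience, submitting a job with boto3 looks roughly like this; the queue, job definition, and command are hypothetical and must be created beforehand:

```python
import boto3

batch = boto3.client("batch")

resp = batch.submit_job(
    jobName="render-frame-0001",
    jobQueue="default-queue",      # hypothetical, maps to a compute environment
    jobDefinition="render-job:1",  # hypothetical name:revision
    containerOverrides={
        "command": ["python", "render.py", "--frame", "1"],
        "environment": [{"name": "OUTPUT_BUCKET", "value": "example-bucket"}],
    },
)
print("submitted job", resp["jobId"])
```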
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and... - Amazon Web Services
Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Your data has value for multiple business functions in your organization. Shorten your time to analytics and make faster, better decisions based on data.
In this session you will learn how you can access your data from a myriad of tools, such as multiple EMR clusters, Athena, and Redshift.
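As one example of that tool choice, querying data in S3 with Athena from boto3 takes a couple of calls; the database, table, and result-bucket names below are hypothetical:

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT action, COUNT(*) FROM events GROUP BY action",
    QueryExecutionContext={"Database": "analytics"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print("query id:", resp["QueryExecutionId"])
# Poll get_query_execution() until the state is SUCCEEDED, then read rows
# with get_query_results() or directly from the S3 output location.
```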
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205) - Amazon Web Services
Join us for this general session where AWS big data experts present an in-depth look at the current state of big data. Learn about the latest big data trends and industry use cases. Hear how other organizations are using the AWS big data platform to innovate and remain competitive. Take a look at some of the most recent AWS big data announcements, as we kick off the Big Data re:Source Mini Con.
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014 - Amazon Web Services
Leveraging big data and high performance computing (HPC) solutions enables your organization to make smarter and faster decisions that influence strategy, increase productivity, and ultimately grow your business. We kick off the Big Data and HPC track with the latest advancements in data analytics, databases, storage, and HPC at AWS. Hear customer success stories and discover how to put data to work in your own organization.
This overview presentation discusses big data challenges and provides an overview of the AWS Big Data Platform by covering:
- How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
- Reference architectures for popular use cases, including connected devices (IoT), log streaming, real-time intelligence, and analytics.
- The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR), and Redshift.
- The latest relational database engine, Amazon Aurora: a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
Created by: Rahul Pathak,
Sr. Manager of Software Development
Hybrid as a Stepping Stone: It’s Not All or Nothing for Your Cloud Transforma... - Amazon Web Services
The implementation of highly scalable, easy-to-deploy technology is transforming the public sector, but it's not a one-size-fits-all approach. Organizations begin their cloud adoption journeys in many ways. Some start with pilot projects and others jump into mission-critical programs, but they are all starting with an existing infrastructure. Adopting cloud doesn't mean scrapping it all and starting over. This session explores how organizations can extend their existing IT platforms into the cloud to enable hybrid capabilities that support every phase of their transformation. Learn More: https://aws.amazon.com/government-education/
This presentation talks about how cloud computing is big data's best friend and how AWS Cloud components fit in to complete your big data lifecycle.
Agenda:
- How fast is big data actually growing?
- How the cloud has the potential to become big data's best friend
- A tour of the big data lifecycle
- How AWS Cloud components fit into this lifecycle
- A case study of our log analytics tool, Cloudlytics, built on a big data implementation on the AWS Cloud
Companies, from startups to enterprises across the globe, are looking to migrate data warehousing to the cloud to increase performance and lower costs. Data engineers, data analysts, and developers also need to access and consume this important data. The landscape is constantly evolving and there are many solutions available for enterprises of all sizes. In this workshop, we dive deep into architectural patterns, use cases, and best practices when designing an enterprise data warehouse in the cloud. We also address key issues such as data governance and democratization. At the end of this workshop, you’ll be equipped to design and implement a cloud enterprise data warehouse platform that provides the most benefit for your enterprise, data consumers, and customers.
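One recurring pattern in such designs is bulk-loading staged S3 data into the warehouse. Below is a minimal sketch issuing a Redshift COPY through the Redshift Data API; the cluster, table, bucket, and role names are entirely hypothetical:

```python
import boto3

rsd = boto3.client("redshift-data")

copy_sql = """
COPY sales
FROM 's3://example-dw-staging/sales/2016-11/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-load'
FORMAT AS JSON 'auto';
"""

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dw",
    DbUser="loader",
    Sql=copy_sql,
)
```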
How to use big data to improve an e-commerce (EC) platform is a hot topic. In this session, we will discuss some big data case studies in retail and e-commerce, and introduce how to create a recommendation service with Amazon Machine Learning.
Slides: Proven Strategies for Hybrid Cloud Computing with Mainframes — From A... - DATAVERSITY
Mainframes continue to perform mission-critical transaction processing and contain massive amounts of core business data. But digital transformation initiatives and cloud computing have created both opportunities and challenges for unlocking and utilizing this data. Qlik and AWS will share some of the proven strategies from successful customer deployments across a range of different mainframe to cloud use cases, including legacy application modernization, data analytics, and data migrations.
In this presentation, you will learn how to:
• Replicate very large volumes of mainframe data in real-time to the cloud
• Automate the creation of analytics-ready data lakes and data warehouses
• Achieve a 30% reduction in cost of compute
What is innovation? How can cloud computing help you innovate? How can you make your applications smarter and predictive? How can you interpret data and anticipate trends? This talk covers AWS artificial intelligence solutions (Machine Learning, Rekognition, Polly) and serverless services (Lambda, Step Functions).
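As a taste of those AI services, a single Rekognition call can label an image stored in S3; the bucket and key below are hypothetical and boto3 credentials are assumed:

```python
import boto3

rekognition = boto3.client("rekognition")

resp = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "example-bucket", "Name": "photos/cat.jpg"}},
    MaxLabels=5,
)
for label in resp["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```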
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture, which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake (a short sketch of the pattern follows the speaker credit below).
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
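For illustration of the lake house pattern the session describes (a hedged sketch, not the session's code; every identifier is hypothetical), Redshift can mount a Glue Data Catalog database as an external schema and join it with local tables:

```python
import boto3

rsd = boto3.client("redshift-data")

statements = [
    # Map a Glue Data Catalog database onto a Redshift schema (Spectrum).
    """CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
       FROM DATA CATALOG DATABASE 'analytics'
       IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role';""",
    # Join open-format data in S3 (lake.events) with a warehouse table.
    """SELECT u.segment, COUNT(*)
       FROM lake.events e JOIN users u ON u.id = e.user_id
       GROUP BY u.segment;""",
]

for sql in statements:
    rsd.execute_statement(ClusterIdentifier="analytics-cluster",
                          Database="dw", DbUser="analyst", Sql=sql)
```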
Understanding AWS Managed Databases and Analytic Services - AWS Innovate Otta... - Amazon Web Services
• Overview of database services to elevate your applications, analytic services to engage your data, and migration services to help you reach database freedom.
• Survey of how Canadian and other organizations are using the cloud to make data scalable, reliable, and secure.
How to Build Forecasting Services Using ML and Deep Learning Algorithms... - Amazon Web Services
Forecasting is an important process for a great many companies and is used in various areas to try to accurately predict the growth and distribution of a product, the resources required on production lines, financial presentations, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a temporal component and then use an algorithm that, based on the type of data analyzed, produces an accurate forecast.
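A small illustration of that pre-processing step (pandas only, with a hypothetical CSV layout; the session's own algorithm is not shown here):

```python
import pandas as pd

df = pd.read_csv("demand.csv", parse_dates=["timestamp"])  # hypothetical file

series = (
    df.set_index("timestamp")["units_sold"]
      .resample("D").sum()   # regularize to one observation per day
      .interpolate()         # fill gaps left by days with no data
)

# A naive seasonal baseline to compare any model against:
# predict next week as a repeat of last week.
forecast = series.tail(7).to_numpy()
print(forecast)
```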
Big Data for Startups: How to Create Big Data Applications in a Serverless... - Amazon Web Services
The variety and quantity of data created every day is accelerating ever faster and represents a unique opportunity to innovate and create new startups.
Managing large amounts of data can nevertheless seem complex: building large-scale big data clusters looks like an investment accessible only to established companies. But the elasticity of the cloud and, in particular, serverless services let us break through these limits.
We will therefore see how to develop big data applications rapidly, without worrying about infrastructure, dedicating all our resources instead to developing the ideas behind innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. In that period we learned how changing our approach to application development let us greatly increase agility and release velocity and, ultimately, build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to Spend Up to 90% Less with Containers and Spot Instances - Amazon Web Services
The use of containers keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can all take advantage of Spot Instances, leading to average savings of 70% compared with On-Demand Instances. In this session we will look at the characteristics of Spot Instances and how they can easily be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
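Before committing a workload to Spot, it is common to check recent Spot prices. A minimal boto3 sketch (instance type and filters are just examples):

```python
import boto3

ec2 = boto3.client("ec2")

resp = ec2.describe_spot_price_history(
    InstanceTypes=["m5.large"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=5,
)
for price in resp["SpotPriceHistory"]:
    print(price["AvailabilityZone"], price["SpotPrice"])
```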
In recent months, many customers have been asking us how to monetise Open APIs, simplify Fintech integrations, and accelerate adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a strategic approach
Q&A
Make Your Startup's Market Offering Unique with Machine Learning Services... - Amazon Web Services
To create value and build a differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and build the differentiating elements of your own offering.
Focusing on machine learning technologies, we will see how to choose among the artificial intelligence services offered by AWS and, partly through a demo, how to build custom machine learning models using SageMaker Studio.
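As a rough sketch of the programmatic side of that demo (hypothetical image URI, role ARN, and S3 paths; not the session's actual notebook), a custom training container can be run with the SageMaker Python SDK:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-trainer:latest",
    role="arn:aws:iam::123456789012:role/example-sagemaker-role",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/models/",
    sagemaker_session=session,
)

# Each channel appears under /opt/ml/input/data/<channel> in the container.
estimator.fit({"train": "s3://example-bucket/datasets/train/"})
```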
OpsWorks Configuration Management: Automate the Management and Deployment of... - Amazon Web Services
With the traditional approach to IT, implementing DevOps techniques was difficult for many years; until now they have often involved manual activities, occasionally causing application downtime and interrupting users' work. With the advent of the cloud, DevOps techniques are within everyone's reach at low cost for any kind of workload, guaranteeing greater system reliability and bringing significant improvements to business continuity.
AWS provides AWS OpsWorks as a configuration management tool that aims to automate and simplify the management and deployment of EC2 instances by means of Chef and Puppet workloads.
Discover how to use AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to Support Your Windows Workloads - Amazon Web Services
Do you want to know the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session we will discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and running Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis based on artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore what AWS services make possible when applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are organizing a free virtual event next Wednesday, October 14th, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, exploiting the full potential of the AWS cloud while protecting existing VMware investments.
Build Your First Serverless Ledger-Based App with QLDB and NodeJS - Amazon Web Services
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in banking transactions, or to track the flow of their products through the supply chain.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log, but which are complex and costly tools to manage.
Amazon QLDB removes the need to build custom, complex systems by providing a fully managed serverless ledger database.
In this session we will see how to build a complete serverless application that uses QLDB's capabilities.
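The session uses the Node.js driver; for a flavour of the same flow in Python, here is a hedged sketch with the pyqldb driver (the ledger, table, and fields are hypothetical):

```python
from pyqldb.driver.qldb_driver import QldbDriver

driver = QldbDriver(ledger_name="example-ledger")  # hypothetical ledger

def record_debit(txn):
    # execute_lambda runs this function as a single ACID ledger transaction.
    txn.execute_statement(
        "INSERT INTO transactions ?",
        {"account": "IT60X0542811101", "amount": -25.0, "type": "debit"},
    )

driver.execute_lambda(record_debit)
```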
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for offering end users an excellent user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dig into several scenarios and see how AppSync can help address these use cases by building modern APIs with real-time and offline data update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
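For readers unfamiliar with GraphQL calls, an AppSync API with API-key auth can be queried like any GraphQL endpoint; the endpoint, key, and schema below are hypothetical:

```python
import requests

endpoint = "https://example.appsync-api.eu-west-1.amazonaws.com/graphql"
query = """
query LatestScores {
  latestScores { match homeGoals awayGoals }
}
"""

resp = requests.post(
    endpoint,
    json={"query": query},
    headers={"x-api-key": "da2-exampleapikey"},  # AppSync API-key auth header
)
print(resp.json())
```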
Oracle Databases and VMware Cloud™ on AWS: Myths to Debunk - Amazon Web Services
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, and performance risks can be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical steps to ease and streamline the migration of Oracle workloads while accelerating the transformation toward the cloud; they go deeper into the architecture and show how to exploit the full potential of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies the management of Docker containers through an orchestration layer controlling deployment and the related lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, the aspects they look at in a new TV, and their TV buying preferences.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
JMeter webinar - integration with InfluxDB and Grafana - RTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
UiPath Test Automation using UiPath Test Suite series, part 3 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
- Introduction to UI automation
- UI automation sample
- Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Search and Society: Reimagining Information Access for Radical Futures (Bhaskar Mitra)
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Kubernetes & AI - Beauty and the Beast!?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I asked myself, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of the infrastructure requirements and technologies that could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios (a minimal Python sketch follows below).
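For a taste of the Python binding mentioned above, here is a minimal sketch, assuming the pypowsybl package: it loads a bundled IEEE 14-bus example network and runs an AC power flow. The function names follow the public pypowsybl documentation, but verify them against your installed version.

```python
# Minimal sketch: load a bundled example network and run an AC power flow
# with pypowsybl. Function names per the pypowsybl docs; verify your version.
import pypowsybl as pp

network = pp.network.create_ieee14()       # bundled IEEE 14-bus test network
results = pp.loadflow.run_ac(network)      # run an AC power flow
print(results[0].status)                   # convergence status of the main component

buses = network.get_buses()                # pandas DataFrame of bus results
print(buses[["v_mag", "v_angle"]].head())  # computed voltage magnitudes and angles
```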
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
7. Our goals for this event
• Show off amazing work being done by our customers
• Provide opportunities for you to network
• Highlight the diversity of work made possible by Earth observation data
• Learn about your priorities and needs
8. New whitepaper
We have just published a new AWS whitepaper, Minimizing Variable Costs for Shared Data (November 2015). Download it at: http://bit.ly/s3-requester-pays-open-data
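The whitepaper's download link suggests it covers S3 Requester Pays buckets, where the reader of the data pays the transfer costs instead of the data owner. As a minimal sketch, assuming a hypothetical Requester Pays bucket and key, this is how such a read looks with boto3:

```python
# Minimal sketch: reading from an S3 Requester Pays bucket with boto3.
# Bucket and key are placeholders, not real objects.
import boto3

s3 = boto3.client("s3")
resp = s3.get_object(
    Bucket="example-shared-data",   # hypothetical Requester Pays bucket
    Key="datasets/sample.csv",      # hypothetical object key
    RequestPayer="requester",       # requester accepts the data-transfer charges
)
body = resp["Body"].read()
print(len(body), "bytes downloaded")
```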
10. Why does AWS care about open data?
Open data is data that can be used by anyone, for any purpose, for free. Many of our customers rely on quality open data as much as they rely on our computing, storage, and other web services.
11. Data on AWS
Amazon Web Services provides a comprehensive toolkit for gathering, storing, analyzing, and working with data at any scale.
• Amazon Elastic MapReduce (Amazon EMR) provides the Apache Hadoop analytics framework as an easy-to-use managed service.
• Amazon S3 lets you store and retrieve any amount of data, at any time, from anywhere on the web.
• Amazon DynamoDB is a fully managed NoSQL database service that makes it cost-effective to store and retrieve any amount of data.
12. 1-click deployment to launch in multiple regions around the world, with pay-as-you-go pricing.
Advanced Analytics | Data Integration | Analysis & Visualization
http://bit.ly/awsAnalytics
13. The power of open data in the cloud
Making data open on AWS enables more innovation by making data available for rapid access to our flexible and low-cost computing resources.
[Diagram: an Amazon S3 bucket feeding Amazon EMR, Amazon EC2, AWS Lambda, Amazon Redshift, and Amazon DynamoDB]
14. The power of open data in the cloud (the same text and diagram repeated: an Amazon S3 bucket feeding Amazon EMR, Amazon EC2, AWS Lambda, Amazon Redshift, and Amazon DynamoDB)
16. History of Innovation
AWS has been continually expanding its services to support virtually any cloud workload, now offering more than 40 services.
[Timeline, 2006-2015: Amazon S3, Amazon SQS, Amazon EC2, Amazon SimpleDB, Amazon EBS, Amazon CloudFront, Elastic Load Balancing, Auto Scaling, Amazon VPC, Amazon RDS, Amazon SNS, AWS Identity and Access Management, Amazon Route 53, Amazon SES, AWS Elastic Beanstalk, AWS CloudFormation, Amazon ElastiCache, AWS Direct Connect, AWS GovCloud, AWS Storage Gateway, Amazon DynamoDB, Amazon CloudSearch, Amazon SWF, Amazon Glacier, Amazon Redshift, AWS Data Pipeline, Amazon Elastic Transcoder, AWS OpsWorks, AWS CloudHSM, Amazon AppStream, AWS CloudTrail, Amazon WorkSpaces, Amazon Kinesis, Amazon ECS, Amazon Lambda, AWS Config, AWS CodeDeploy, Amazon RDS for Aurora, AWS KMS, Amazon Cognito, Amazon WorkDocs, AWS Directory Service, Amazon Mobile Analytics, Amazon EFS, Amazon WorkMail, Amazon Machine Learning]
17. AWS has announced price reductions 49* times since our inception in 2006. Recent price drops included:
• Amazon ElastiCache reduced prices for cache nodes by an average of 34% (March 26, 2014)
• Amazon S3 reduced prices for Standard and Reduced Redundancy Storage by an average of 51% (March 26, 2014)
• Amazon Route 53 lowered prices for both standard queries and latency-based routing queries by 20% (July 31, 2014)
* As of June 2015
18. Open data as a platform
[Diagram: an open data stack spanning Data Creation, Data Enrichment, and Sensemaking, with labels including data at rest (object storage), basic APIs, complex APIs, data catalogs, focused data dashboards, predictive modeling, visualizations, consumer applications, algorithmic policy, data-driven journalism, and a lower cost of knowledge]
20. An Amazonian approach to open data
Two ideas that inform how we approach public data sets:
• Work backwards from the customer
• Eliminate undifferentiated heavy lifting
21. Working Backwards
• Think of data sets as products
• Seek out valuable data by listening to customer needs
• Consider real-world use cases for the data
• Consider the size of the user community or market opportunity
22. Undifferentiated heavy lifting
“…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.”
– Data Driven, by DJ Patil and Hilary Mason
23. Undifferentiated heavy lifting (the same quote repeated)
We ask: How can we get rid of that 80%?
24. Public datasets on AWS
To enable more innovation, AWS hosts a selection of datasets that anyone can access for free. Data in our public datasets is available for rapid access to our flexible and low-cost computing resources.
• Earth Science: Landsat on AWS
• Life Sciences: 1000 Genomes Project
• Internet Science: Common Crawl Corpus
26. NEXRAD on AWS
The Next Generation Weather Radar (NEXRAD) is a network of 160 high-resolution Doppler radar sites that detects precipitation and atmospheric movement and disseminates data in 5-minute intervals from each site.
It has traditionally been time consuming and expensive to acquire, store, and analyze NEXRAD data. Accessing the full historical archive has been impossible.
27. NEXRAD on AWS
[Diagram: NEXRAD sites stream real-time data chunks into a public Amazon S3 bucket; Amazon EC2 assembles them into volume scan files stored in a second public Amazon S3 bucket, forming a continuously updated archive]
With NEXRAD on AWS, we provide an archive of individual volume scan files and real-time chunks as objects in Amazon S3. This allows the data to be accessed programmatically via a RESTful interface and quickly deployed to any of our products for analysis and processing.
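To make that programmatic access concrete, here is a minimal sketch that lists a few archive objects anonymously with boto3. The bucket name and the year/month/day/site key layout reflect how the public NEXRAD Level II archive is commonly documented, but treat both as assumptions to verify:

```python
# Minimal sketch: anonymously listing NEXRAD Level II archive objects on S3.
# Bucket name and key layout (YYYY/MM/DD/SITE/) are assumptions to verify.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
resp = s3.list_objects_v2(
    Bucket="noaa-nexrad-level2",
    Prefix="2015/05/15/KTLX/",   # one day of scans from one radar site
    MaxKeys=5,
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```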
28. NEXRAD on AWS
Our collaborators, including Unidata, The Weather Company, NOAA, Climate Corporation, and CartoDB, have provided early use cases and tutorials on how to use this data in the cloud.
A wide range of users are interested in using NEXRAD on AWS for longitudinal analysis, to study and visualize specific weather events, and to develop new products.
More info at http://aws.amazon.com/public-data-sets/nexrad
30. Landsat on AWS
We have committed to make up to 1 petabyte of Landsat imagery readily available as objects on Amazon S3. All Landsat 8 scenes from 2015 are available, along with a selection of cloud-free scenes from 2013 and 2014. All new Landsat 8 scenes are made available each day (~700 per day), often within hours of production.
31. The Traditional Approach
Data is most commonly accessed via a web interface and downloaded on premises before being loaded into a web server.
All bands are downloaded in a .tar archive, even if you only need a few bands.
Data acquisition is time consuming and inherently redundant. Analysis is limited by the user's access to bandwidth, storage, memory, and processing power.
32. Landsat on AWS
Landsat on AWS makes each band of each scene readily available as an object on Amazon S3. Data can be accessed programmatically via HTTP and quickly deployed to any of our products for analysis and processing. Users do not need to worry about local storage and have access to virtually unlimited computing power on demand.
[Diagram: USGS .tar archives are unpacked into per-band .tiff objects in s3://landsat-pds, read directly by Amazon EC2]
33. Undifferentiated heavy lifting
We use GDAL to add “internal tiling” to each Landsat on AWS tiff, which allows developers to use HTTP range GETs to access specific portions of each scene. This lets people access only the data they need, when they need it; a windowed-read sketch follows below.
[Diagram: a standard tiff object stores pixels as full-width strips read in sequence, while an internally tiled tiff object stores them as small square tiles that can be fetched independently]
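Here is the windowed-read sketch: a minimal example, assuming the rasterio package and an illustrative (not verified) scene URL, that reads one window from an internally tiled GeoTIFF over HTTP so that only the tiles covering that window are transferred:

```python
# Minimal sketch: windowed read from an internally tiled GeoTIFF over HTTP.
# Only the tiles covering the requested window are fetched via range GETs.
# The URL is illustrative; substitute a real key from s3://landsat-pds.
import rasterio
from rasterio.windows import Window

URL = ("https://landsat-pds.s3.amazonaws.com/"
       "L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B4.TIF")

with rasterio.open(URL) as src:            # GDAL's /vsicurl/ handles range requests
    window = Window(col_off=2048, row_off=2048, width=512, height=512)
    band = src.read(1, window=window)      # numpy array for just that window
    print(band.shape, band.dtype)
```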
36. Landsat on AWS
In the first 150 days (19 Mar – 16 Aug 2015):
• Over 200,000 scenes available
• Over 500 million hits globally
[Image: frequency of scene requests by path/row; white: ~100 requests, orange: >300k requests. Visualization by Drew Bollinger, Development Seed]
38. New SNS topic for Landsat on AWS
You can now subscribe to a publicly available Amazon Simple Notification Service (Amazon SNS) topic to be notified whenever a new batch of Landsat scenes is available at s3://landsat-pds:
arn:aws:sns:us-west-2:274514004127:OpenObjectAddL8
HELLO! WELCOME! I’m Jed Sundwall, and I lead AWS’s open data program. Each of you has received an email from me about this event, so it’s nice to meet you in person. I can’t express how excited we are to have you all here. This is the first big event the AWS Open Data team has put on, and it’s very cool to see the response.
I want to take this time to tell you about why we’re here today, explain our open data program, and give you a few updates on our NEXRAD and Landsat Public Data Sets. Let’s go.
I love emoji.
6. First off, let’s thank our sponsors for making today possible. There are plenty of folks from Esri, DigitalGlobe, and Development Seed here, so please seek them out and thank them if you can. And I want to give a special thanks to Chief for sharing their venue with us. I’ve been a big fan of the team at Chief for years. They are fully committed to helping government take full advantage of technology, which is why they were the first people I reached out to when planning this event.
So why are we here? First off, we wanted to show off the state of the art of Earth data analysis, which is being done by our customers who you’ll hear from today. We also wanted to provide a venue for leaders at the Department of Interior and NOAA to connect with those of you who use their data. We also want this to be a networking event for all of you. You’ll notice a lot of breaks throughout the day. Those are there to give you ample opportunities to chat between sessions. We’ve also told all of the speakers that they get 30 minutes to present *and* take questions. We want this to be a very conversational event. One of the neat things about AWS is that we have over a million customers who use our services to do a wide variety of work. Hopefully you’ll get to learn about how someone outside of your field is working with Earth observation today. And last, but not least, we want to learn from you, particularly about what services AWS could provide to make your lives easier. Our product roadmap is largely based upon customer feedback, so please give us feedback! You should have a survey, so please fill that out, and feel free to cast your vote for your most wanted data on the window throughout the day.
8. Hot off the presses! In addition to this being our first event, we’ve also recently published our first Whitepaper, which is about understanding and mitigating the costs of using Amazon S3 to share data. Just email me for a copy or you can download it at this link.
OK. Let’s talk about why AWS has an open data program.
We use a very basic definition of open data. The beauty of open data – if it’s good – is that it gives people something to do with our computing and storage resources.
We have very significant customers who rely on open data for their work. Their products wouldn’t exist or wouldn’t be very useful without the availability of high quality, license-free, government information.
Open data matters to our customers, so it matters to us, and our goal, as a program, is to continually make more data more available to more people.
AWS provides a comprehensive toolkit for working with data at any scale. I’m not going to get into describing our products today, but I simply want to make the point that our platform is designed to store and work with virtually any amount of data using our on-demand computing resources.
Through the AWS Marketplace we also provide a catalog of data analytics, integration, and visualization applications that you can quickly deploy in the cloud with pay-as-you-go pricing.
The value of open data on AWS is very basic: once the data is on S3, it becomes easier to work with using any of our computing and database resources.
This reduces the cost – in terms of dollars and time – of product development, of analysis, and of scientific discovery.
This is tremendously empowering for researchers, product developers, and analysts because it allows them to take their algorithms to the data rather than spending time and money to find, download, and store data in their own computing environments. When data is made available in the cloud, people can pay only for the computing resources they need to analyze the data and don’t have to worry about downloading and paying to store the data themselves.
We have over a million customers, including many, many partners focused on the public sector. These are just part of a large and growing ecosystem of users who are advancing the state of the art of what can be done with data in the cloud. Of course, you’re going to hear from them today. They are the focus of this event.
Benefits are also realized as we continue to develop new services and features of our platform. We now offer more than 40 services to support virtually any kind of application in the cloud. As we add services, new applications of data become possible.
And finally, new possibilities are created as we lower prices. As of June this year, we’ve dropped our prices 49 times since AWS got started in 2006. The on-demand and pay-as-you-go nature of the cloud removes barriers for people to get started, and we continually pass whatever savings we can on to our customers to make it easier for them to access the IT resources they need.
Talk about temperature data.
Talk about lowering the cost of knowledge.
Talk about the great grain robbery. 10MM tons of grain sold to Russia in 1972.
Naturally, we have products that can help. LOLOLOLOL. Seriously, we do have products that help with creating data, storing it, sharing it, analyzing it, and making it available through applications. Where the open data team focuses is the bottom left there, which is the point from which data is shared. What can we do to ensure that shared data can realize its value? There are two very Amazonian ideas that guide us.
These concepts inform the way we think about a lot of our products and they can be applied to open data as well.
Working backwards from the customer is how we determine what to focus our energies on. A common practice throughout Amazon is to write a hypothetical press release as a first step when considering a new product. It challenges us to write in few words, and in plain language, why what we’re working on would matter and who it would matter to. If we can’t write a plausible press release, our efforts may be better spent elsewhere, or we need to try harder. The same thing can apply when prioritizing data sets. There is a lot of data out there, and it’s hard to tell which is most valuable. The working backwards approach can help with that.
Undifferentiated heavy lifting is all of the chores that keep you from focusing on what you’re good at and what your mission is. You’re not better at it than anyone else and it distracts from what you’ve actually set out to do. Eliminating it is one of the core benefits of the cloud: you can access as much storage and computing power as you need without ordering gear, setting it up, securing it, cooling it, or worrying about running out of capacity. You just get what you need when you need it. The same thing applies to data. I love this quote from Data Driven (which is free on the Kindle, btw) by DJ Patil and Hilary Mason.
What we ask ourselves is what we can do to eliminate all of the work that gets in the way of actually analyzing data. That might mean making it analysis ready, but we’ve learned that we can make a huge difference by simply eliminating the need for people to download and store copies of it.
Our Public Data Sets Program is our lab where we can test out these theories and identify best practices for sharing data in the cloud. We select data sets by listening to our customers and when we foresee opportunities to make an impact on emerging sectors, as we did when we started hosting the 1000 Genomes data set several years ago. Many of you are here today because of our Landsat and NEXRAD Public Data Sets.
We launched NEXRAD on AWS last month.
My DC-based counterpart, Ariel Gold, has managed the NEXRAD project throughout the year and she deserves a round of applause for this groundbreaking project – I don’t think it’s possible to overstate how hard she worked on this.
The Next Generation Weather Radar (NEXRAD) is a network of 160 high-resolution Doppler radar sites that detects precipitation and atmospheric movement and disseminates data in approximately 5-minute intervals from each site. NEXRAD enables severe storm prediction and is used by researchers and commercial enterprises to study and address the impact of weather across multiple sectors.
Traditionally, there has been no full historical archive available on demand to NOAA or external users, and you had to wait 12 to 24 hours for new volume scan files.
A real-time feed of NEXRAD data and a full historical archive of original resolution (Level II) NEXRAD data, from June 1991 to present, is now freely available on Amazon S3 for anyone to use. This is the first time the full NEXRAD Level II archive has been accessible to the public on demand. Now anyone can use the data on-demand in the cloud without worrying about storage costs and download time. We are making NEXRAD data available as part of our research agreement with the US National Oceanic and Atmospheric Administration (NOAA) to enable new product development and analysis. Jeff de La Beaujardière is going to talk about NOAA’s work to make its data more widely available later today.
NEXRAD on AWS is new, but we already know of one customer who was able to complete a project *two weeks* in advance because of the easy access to the data in the cloud. If you go to this URL, you’ll find documentation on how to access the data along with tutorials from Climate Corporation and CartoDB.
Landsat!
Just under a year ago, we announced that we would make up to 1PB of Landsat data available on AWS over a two year period as a contribution to the White House’s Climate Data Initiative. We launched Landsat on AWS on March 19th this year.
We had heard from multiple customers that we should look harder at Landsat, so we did. We spoke with a lot of customers who use Landsat data and learned that there was a lot we could do to help people work with it. One of the first things we learned was that acquiring data could be incredibly time consuming, particularly when trying to work with a lot of data. One big reason for this is that scenes are made available as 1GB .tar archives. To work with a scene, a researcher needs to download all 12 bands of that scene’s data even if they only need three or four bands. That means that 60-70% of bandwidth (and therefore time spent downloading) is wasted.
We subscribe to a feed of Landsat scenes from USGS to acquire all new scenes as they’re created. We then unpack the tarballs and use GDAL to apply some lossless compression and data optimization for each GeoTIFF of each band, which is then made available as a stand-alone object on S3. This means that researchers can access only the data they need when they need it.
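As a rough sketch of that per-band optimization step, the GDAL Python bindings can rewrite a band with internal tiling and lossless compression. The creation options shown are plausible choices rather than the exact settings of the Landsat on AWS pipeline, and the filenames are placeholders:

```python
# Minimal sketch: rewriting a band GeoTIFF with internal tiling and lossless
# compression via GDAL. Options are approximate; filenames are placeholders.
from osgeo import gdal

gdal.UseExceptions()
gdal.Translate(
    "LC80390352015167LGN00_B4_tiled.TIF",  # output: web-optimized band object
    "LC80390352015167LGN00_B4.TIF",        # input: band unpacked from the USGS .tar
    creationOptions=[
        "TILED=YES",         # internal tiling enables HTTP range reads
        "BLOCKXSIZE=512",
        "BLOCKYSIZE=512",
        "COMPRESS=DEFLATE",  # lossless compression
    ],
)
```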
Internal tiling!
By lowering the cost of accessing Landsat data, we’ve enabled more people to work with it in unexpected ways. I made the visualizations you see here on snapsat.org, which is a web app created by a group of students just a few weeks after we launched Landsat on AWS. It’s a remarkable web app that provides a very novel interface to browse the planet and create Landsat composites in seconds. I love it.
The Snapsat students were able to build it because they were able to access Landsat data without needing to download or copy it, which they wouldn’t have had the time or resources to do. If you have data to share, share it via URLs!
Landsat on AWS gets used a lot. Within the first 150 days, data was requested from the bucket over 500MM times. We don’t know how good that is because we’ve never released a comparable data set like this, but we know that a lot of people are accessing a lot of the data on purpose. As you’ll learn today, it’s being used for real work. It exceeded all of our expectations and it’s been amazing to learn about what people are doing with it.
Remember this thing? This is what it looks like in action. One copy of the data, made available near computing resources, and optimized for analysis. In very little time we’ve seen a lot of activity come out of it, and we couldn’t be happier about it. We’re here today to discover what’s next.
There’s one more thing. We now have a public SNS topic that you can use to subscribe to notifications whenever a new batch of scenes are available. You can use this to automatically trigger analysis of data from within AWS. I don’t expect you to copy it down, so just email me to get it.