© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Jed Sundwall
Global Open Data Lead, Amazon Web Services
Dave Rocamora
Solutions Architect, Amazon Web Services
194328
AWS Public Datasets: Learnings from
Staging Petabytes of Data for Analysis
in AWS
Why does AWS care about Open Data?
Many AWS customers supply data
to the public to accelerate research
and product development.
Sharing data on AWS makes it accessible to a large and growing community of
researchers, entrepreneurs, and enterprises who use the AWS Cloud.
Many AWS customers use data
shared on AWS to create new
products and services.
The AWS Open Data program expands
access to data by staging it for
analysis in the cloud.
https://opendata.aws
AWS Public Datasets
https://registry.opendata.aws
Earth Observation
Life Sciences & Genomics
Machine Learning
https://registry.opendata.aws
Traditional Data Acquisition
“…data must be organized, well-documented, consistently formatted,
and error free. Cleaning the data is often the most taxing part of data
science, and is frequently 80% of the work.”
— Data Driven by DJ Patil and Hilary Mason
Undifferentiated Heavy Lifting
Data Acquisition in the Cloud
Amazon S3
Amazon EC2
Amazon Athena
Amazon EMR
Amazon Redshift
AWS Glue
AWS Data Pipeline
Advantages of sharing data in the cloud
Global community of users
Faster pace of research
Lower cost of research
New services and tools
What makes a dataset
successful?
It is treated like a product.
It is optimized for the cloud.
[Figure: "Userbase" charted across a range from "Raw" to "Highly processed"]
Distributing Data
Not accessed
Necessary
Unnecessary (wasted)
Distributing Data - Traditional
Not accessed
Necessary
Unnecessary (wasted)
Distributing Data – Prepared data on S3
Not accessed
Necessary
Unnecessary (wasted)
Staging data for analysis
Amazon S3 allows programmatic
and precise access to data at
planetary scale.
Landsat on AWS uses Cloud
Optimized GeoTIFFs that allow
users to get only the data they
need when they need it.
cogeo.org
[Diagram labels: USGS, .tar, AWS Lambda, Cloud Optimized GeoTIFFs, s3://landsat-pds]
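Getting "only the data they need" from a Cloud Optimized GeoTIFF rests on HTTP range requests: a reader asks for a byte span of the object rather than the whole file. The following is a minimal sketch of that idea, not the Landsat on AWS tooling itself. The object URL, the helper name `build_range_request`, and the 16 KiB header size are all assumptions made for illustration.

```python
from urllib.request import Request

# Hypothetical URL for illustration only; substitute a real COG object.
SCENE_URL = "https://example.com/landsat/scene_B4.TIF"


def build_range_request(url, offset, length):
    """Return a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return Request(url, headers={"Range": f"bytes={offset}-{last_byte}"})


# Ask only for the first 16 KiB, where a COG is expected to keep its header
# and tile index (the 16 KiB figure is an assumption, not a format guarantee).
header_request = build_range_request(SCENE_URL, 0, 16 * 1024)
print(header_request.get_header("Range"))
```

Passing the request to `urllib.request.urlopen` would perform the fetch; a server that honors the range replies with status 206 and only those bytes.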
Graph by Drew Bollinger (@drewbo19) at Development Seed
Landsat on AWS
Patterns
S3 Key Index
External Index
Internal Index
Example: GOES-16 Key Naming
s3://noaa-goes16/ABI-L1b-RadF/2018/149/14/
OR_
ABI-L1b-RadF-M3C14_
G16_
s20181491430465_
e20181491441232_
c20181491441300.nc
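A key laid out this way can act as its own index: a client can build the prefix for a given hour and read timestamps straight from the object name. The sketch below assumes the path segments are year, day of year, and hour, and that the `s`, `e`, and `c` fields are start, end, and creation times written as `YYYYDDDHHMMSS` plus one digit for tenths of a second. Those readings are assumptions, as the slide does not spell them out. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# The example key from the slide, joined into one string (bucket name omitted).
KEY = (
    "ABI-L1b-RadF/2018/149/14/"
    "OR_ABI-L1b-RadF-M3C14_G16_"
    "s20181491430465_e20181491441232_c20181491441300.nc"
)

KEY_PATTERN = re.compile(
    r"(?P<product>[^/]+)/(?P<year>\d{4})/(?P<day_of_year>\d{3})/(?P<hour>\d{2})/"
    r"(?P<environment>[A-Z]{2})_(?P<dataset>.+?)_(?P<satellite>G\d{2})_"
    r"s(?P<start>\d{14})_e(?P<end>\d{14})_c(?P<created>\d{14})\.nc$"
)


def parse_stamp(stamp):
    """Parse YYYYDDDHHMMSS plus a trailing tenths-of-a-second digit (assumed)."""
    whole_seconds = datetime.strptime(stamp[:13], "%Y%j%H%M%S")
    return whole_seconds + timedelta(milliseconds=100 * int(stamp[13]))


def parse_goes_key(key):
    """Split a key into named fields, converting the three timestamps."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unrecognized key layout: {key}")
    fields = match.groupdict()
    for name in ("start", "end", "created"):
        fields[name] = parse_stamp(fields[name])
    return fields


def hour_prefix(product, when):
    """Build the product/year/day-of-year/hour/ prefix for listing one hour."""
    return f"{product}/{when:%Y}/{when:%j}/{when:%H}/"


fields = parse_goes_key(KEY)
print(fields["satellite"], fields["start"].isoformat())
print(hour_prefix("ABI-L1b-RadF", fields["start"]))
```

Listing objects under the prefix returned by `hour_prefix` narrows a search to one product and one hour without any separate catalog.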
Example: IRS 990 CSV as External Index
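With an external index, a small CSV sits beside the data and maps searchable attributes to object keys, so a client filters the index first and fetches only the matching objects. This is a minimal sketch of the pattern. The column names (`EIN`, `TAX_PERIOD`, `OBJECT_ID`), the `<OBJECT_ID>_public.xml` key layout, and the sample rows are assumptions made up for illustration, not values taken from the slide or the real index.

```python
import csv
import io

# Made-up index rows for illustration; not real filings.
SAMPLE_INDEX = """EIN,TAX_PERIOD,OBJECT_ID
000000001,201612,EXAMPLE0001
000000002,201612,EXAMPLE0002
000000001,201712,EXAMPLE0003
"""


def keys_for_ein(index_file, ein):
    """Return the object keys of every index row filed under `ein`."""
    reader = csv.DictReader(index_file)
    # Assumed key layout: one XML object per filing, named by its OBJECT_ID.
    return [f"{row['OBJECT_ID']}_public.xml" for row in reader if row["EIN"] == ein]


keys = keys_for_ein(io.StringIO(SAMPLE_INDEX), "000000001")
print(keys)
```

In practice the index itself would be read from the bucket; an in-memory string stands in for it here so the sketch runs without network access.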
What makes a dataset
successful?
It is treated like a product.
It is optimized for the cloud.
There is a community around it.
[Figure: "Userbase" charted across "Raw", "Accessible", "Documented", and "Trustworthy"]
Landsat on AWS community
http://cogeo.org
landsat-tiler from Mapbox
https://viewer.remotepixel.ca/
What makes a dataset
successful?
It is treated like a product.
It is optimized for the cloud.
There is a community around it.
Please complete the session survey in
the summit mobile app.
Thank You!
jed@amazon.de
rocamora@amazon.com

More Related Content

What's hot

Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Amazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightBusiness Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightAmazon Web Services
 
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...Amazon Web Services
 
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018Amazon Web Services
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczAmazon Web Services
 
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1Amazon Web Services
 
Machine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighterMachine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighterAmazon Web Services
 
Edge Computing with AWS Greengrass
Edge Computing with AWS Greengrass Edge Computing with AWS Greengrass
Edge Computing with AWS Greengrass Amazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SFAmazon Web Services
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAmazon Web Services
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Amazon Web Services
 
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...Amazon Web Services
 
From Data To Insights
From Data To Insights From Data To Insights
From Data To Insights Orit Alul
 
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...Rolf Koski
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...Rolf Koski
 
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...Amazon Web Services
 
Webinar: Archiving Salesforce Data into AWS RDS
Webinar: Archiving Salesforce Data into AWS RDSWebinar: Archiving Salesforce Data into AWS RDS
Webinar: Archiving Salesforce Data into AWS RDSDataArchiva
 
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...Accumulo Summit
 

What's hot (20)

Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
Driving Machine Learning and Analytics Use Cases with AWS Storage (STG302) - ...
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSightBusiness Intelligence in Minutes with Amazon Athena and Amazon QuickSight
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight
 
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...
The Intelligent Edge for IoT: Help Customers Harness the Power of Connected I...
 
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018
Building IoT Analytics (IOT327-R1) - AWS re:Invent 2018
 
Using Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter DachnowiczUsing Search with a Database - Peter Dachnowicz
Using Search with a Database - Peter Dachnowicz
 
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1
AWS Data-Driven Insights Learning Series ANZ Sep 2019 Part 1
 
Machine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighterMachine Learning, Open Data, and the Future WarFighter
Machine Learning, Open Data, and the Future WarFighter
 
Edge Computing with AWS Greengrass
Edge Computing with AWS Greengrass Edge Computing with AWS Greengrass
Edge Computing with AWS Greengrass
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Using Search with a Database: Database Week SF
Using Search with a Database: Database Week SFUsing Search with a Database: Database Week SF
Using Search with a Database: Database Week SF
 
Adding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San FranciscoAdding Search to DynamoDB: Database Week San Francisco
Adding Search to DynamoDB: Database Week San Francisco
 
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
Leadership Session: AWS Semiconductor (MFG201-L) - AWS re:Invent 2018
 
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...
Perform Social Media Sentiment Analysis with Amazon Pinpoint & Amazon Compreh...
 
From Data To Insights
From Data To Insights From Data To Insights
From Data To Insights
 
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...
AWS Community Day Nordics 2018: Rolf Koski - Building Successful Enterprise C...
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
 
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...
Build a Real-Time Sales Analytics Dashboard with Amazon ElastiCache for Redis...
 
Webinar: Archiving Salesforce Data into AWS RDS
Webinar: Archiving Salesforce Data into AWS RDSWebinar: Archiving Salesforce Data into AWS RDS
Webinar: Archiving Salesforce Data into AWS RDS
 
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...
Accumulo Summit 2015: From Big Data to Linked Data: Making Sense of Massive, ...
 

Similar to AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS

Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018Amazon Web Services
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Summits
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAdir Sharabi
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Web Services
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Amazon Web Services
 
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...Michaela Bromfield
 
Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Amazon Web Services
 
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Amazon Web Services
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSAmazon Web Services
 
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAccelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAmazon Web Services
 
Securing Your Big Data Workload - AWS Summit Sydney 2018
Securing Your Big Data Workload - AWS Summit Sydney 2018Securing Your Big Data Workload - AWS Summit Sydney 2018
Securing Your Big Data Workload - AWS Summit Sydney 2018Amazon Web Services
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)Amazon Web Services
 

Similar to AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS (20)

Working with Open Data on AWS
Working with Open Data on AWSWorking with Open Data on AWS
Working with Open Data on AWS
 
Open Data on AWS
Open Data on AWSOpen Data on AWS
Open Data on AWS
 
AWS Open Data
AWS Open DataAWS Open Data
AWS Open Data
 
Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018Democratization of Big Data - GeoSummit 2018
Democratization of Big Data - GeoSummit 2018
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWSAWS Floor 28 - Building Data lake on AWS
AWS Floor 28 - Building Data lake on AWS
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
 
BI & Analytics
BI & AnalyticsBI & Analytics
BI & Analytics
 
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
Building a Data Lake for Your Enterprise, ft. Sysco (STG309) - AWS re:Invent ...
 
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
Emerging Trends in Big Data, Analytics, Machine Learning, and Internet-of-Thi...
 
Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]Big Data - EBC on the road Brazil Edition [Portuguese]
Big Data - EBC on the road Brazil Edition [Portuguese]
 
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
Social Media Analytics with Amazon QuickSight (ANT370) - AWS re:Invent 2018
 
BI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWSBI & Analytics - A Datalake on AWS
BI & Analytics - A Datalake on AWS
 
AWSome Day 2018 Keynote
AWSome Day 2018 KeynoteAWSome Day 2018 Keynote
AWSome Day 2018 Keynote
 
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech TalksAccelerate Productivity by Computing at the Edge - AWS Online Tech Talks
Accelerate Productivity by Computing at the Edge - AWS Online Tech Talks
 
AWSome Day Online Keynote
AWSome Day Online KeynoteAWSome Day Online Keynote
AWSome Day Online Keynote
 
Securing Your Big Data Workload - AWS Summit Sydney 2018
Securing Your Big Data Workload - AWS Summit Sydney 2018Securing Your Big Data Workload - AWS Summit Sydney 2018
Securing Your Big Data Workload - AWS Summit Sydney 2018
 
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
雲上打造資料湖 (Data Lake):智能化駕馭商機 (Level 300)
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads

AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Jed Sundwall Global Open Data Lead, Amazon Web Services Dave Rocamora Solutions Architect, Amazon Web Services 194328 AWS Public Datasets: Learnings from Staging Petabytes of Data for Analysis in AWS
  • 2. Why does AWS care about Open Data? Many AWS customers supply data to the public to accelerate research and product development. Sharing data on AWS makes it accessible to a large and growing community of researchers, entrepreneurs, and enterprises who use the AWS Cloud. Many AWS customers use data shared on AWS to create new products and services.
  • 3. The AWS Open Data program expands access to data by staging it for analysis in the cloud. https://opendata.aws
  • 4. AWS Public Datasets https://registry.opendata.aws
  • 5. Earth Observation; Life Sciences & Genomics; Machine Learning. https://registry.opendata.aws
  • 6. Traditional Data Acquisition: “…data must be organized, well-documented, consistently formatted, and error free. Cleaning the data is often the most taxing part of data science, and is frequently 80% of the work.” (Data Driven by DJ Patil and Hilary Mason)
  • 7. Undifferentiated Heavy Lifting
  • 8. Data Acquisition in the Cloud: Amazon S3, Amazon EC2, Amazon Athena, Amazon EMR, Amazon Redshift, AWS Glue, AWS Data Pipeline
  • 9. Advantages of sharing data in the cloud: Global community of users; Faster pace of research; Lower cost of research; New services and tools
  • 10. What makes a dataset successful?
  • 11. What makes a dataset successful? It is treated like a product.
  • 12.
  • 13. What makes a dataset successful? It is treated like a product. It is optimized for the cloud.
  • 14. Highly processed; Raw; Userbase
  • 15. Distributing Data: Not accessed; Necessary; Unnecessary (wasted)
  • 16. Distributing Data - Traditional: Not accessed; Necessary; Unnecessary (wasted)
  • 17. Distributing Data - Prepared data on S3: Not accessed; Necessary; Unnecessary (wasted)
  • 18. Staging data for analysis: Amazon S3 allows programmatic and precise access to data at planetary scale. Landsat on AWS uses Cloud Optimized GeoTIFFs that allow users to get only the data they need when they need it. cogeo.org. Diagram labels: AWS Lambda; .tar; USGS; Cloud Optimized GeoTIFFs; s3://landsat-pds
  • 19. Landsat on AWS. Graph by Drew Bollinger (@drewbo19) at Development Seed.
  • 20. Patterns: S3 Key Index; External Index; Internal Index
  • 21. Example: GOES-16 Key Naming s3://noaa-goes16/ABI-L1b-RadF/2018/149/14/OR_ABI-L1b-RadF-M3C14_G16_s20181491430465_e20181491441232_c20181491441300.nc
  • 22. Example: IRS 990 CSV as External Index
  • 23. What makes a dataset successful? It is treated like a product. It is optimized for the cloud. There is a community around it.
  • 24. Raw; Accessible; Documented; Trustworthy; Userbase
  • 25. Landsat on AWS community http://cogeo.org
  • 26. landsat-tiler from Mapbox https://viewer.remotepixel.ca/
  • 27. What makes a dataset successful? It is treated like a product. It is optimized for the cloud. There is a community around it.
  • 28. Please complete the session survey in the summit mobile app.
  • 29. Thank You! jed@amazon.de rocamora@amazon.com
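
The "S3 Key Index" pattern from slides 20 and 21 can be illustrated with the GOES-16 example key. The following is a minimal sketch, not NOAA's or AWS's own tooling. It assumes the key layout is product/year/day-of-year/hour, that the filename parts shown on the slide are joined without spaces, and that the `s`, `e` and `c` fields are start, end and creation timestamps written as year, day-of-year, hour, minute, second and tenth of a second. The names `parse_goes_key`, `parse_stamp` and `hour_prefix` are hypothetical helpers introduced only for this illustration.

```python
import re
from datetime import datetime, timedelta

# Example key from slide 21, with the filename parts joined (assumption).
EXAMPLE_KEY = (
    "ABI-L1b-RadF/2018/149/14/"
    "OR_ABI-L1b-RadF-M3C14_G16_"
    "s20181491430465_e20181491441232_c20181491441300.nc"
)

# Assumed layout: <product>/<year>/<day-of-year>/<hour>/<filename>
KEY_RE = re.compile(
    r"^(?P<product>[^/]+)/(?P<year>\d{4})/(?P<doy>\d{3})/(?P<hour>\d{2})/"
    r"OR_[^_]+-M(?P<mode>\d)C(?P<channel>\d{2})_(?P<satellite>G\d{2})_"
    r"s(?P<start>\d{14})_e(?P<end>\d{14})_c(?P<created>\d{14})\.nc$"
)


def parse_stamp(stamp):
    """Convert a 14-digit YYYYDDDHHMMSSt stamp into a datetime."""
    year = int(stamp[0:4])
    day_of_year = int(stamp[4:7])
    return datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1,
        hours=int(stamp[7:9]),
        minutes=int(stamp[9:11]),
        seconds=int(stamp[11:13]),
        milliseconds=100 * int(stamp[13]),  # last digit is tenths of a second
    )


def parse_goes_key(key):
    """Read metadata straight out of the key, without opening the file."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"key does not follow the assumed layout: {key}")
    return {
        "product": match["product"],
        "mode": int(match["mode"]),
        "channel": int(match["channel"]),
        "satellite": match["satellite"],
        "start": parse_stamp(match["start"]),
        "end": parse_stamp(match["end"]),
        "created": parse_stamp(match["created"]),
    }


def hour_prefix(product, when):
    """Build the key prefix that holds one hour of data for a product."""
    return f"{product}/{when:%Y/%j/%H}/"
```

Because the time and product are encoded in the key itself, a prefix built by `hour_prefix` could be passed to an S3 listing call to find one hour of data without a separate catalog, and `parse_goes_key` recovers the channel and scan times from each listed key.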