© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved
Pop-up Loft
How Amazon.com uses AWS Analytics
Saurabh Shrivastava
saursh@amazon.com
AWS Solution Architect
Andre Hass
hasandre@amazon.com
AWS Specialist Technical Account Manager
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Traditional Data Warehousing
Wikipedia: In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analysis.
The Battle for the Future
VS.
https://www.promptcloud.com
https://john-popelaars.blogspot.com
https://ww.signiant.com
https://www.linkedin.com/pulse/world-today-data-rich-information-poor-guru-p-mohapatra-pmp/
The Industry Problem
[Chart: growth in data (mostly unstructured) and analytics, compared with average growth in traditional DW data and the average IT budget]
What is Amazon?
Our vision is to be Earth’s most customer-centric company; to build a place where people can come to find and discover anything they might want to buy online.
Amazon Data Warehouse
The Amazon Enterprise Data Warehouse
The Good!
Helps to Run the Amazon Business
• Most Comprehensive Set of Cleansed and Curated Business Data
• Feeds Many Downstream Systems and Processes
• Batch Processing, Reporting and Ad Hoc
• 500k+ Data Loads/Transformations Each Day
• 200k+ Queries/Extracts Each Day
• 20k+ Active Tables
• 10B++ Rows Loaded Daily
Our Data is Big!
• Core Data Set: 5+ PB of Compressed Data (primarily limited by Legacy Technology)
• Total Storage (Multiple Systems): 35+ PB Compressed
• Quote from Executive at Legacy DW Vendor: ~1000x Larger than any other DW Customer (from that Vendor)
Significant and Increasing Use of Redshift and EMR
• 1000s of Redshift and EMR Systems, Ranging in Size from Individual Contributor, Project-Based Systems to Systems Running a Multi-Billion Dollar Business inside Amazon
Example of an Amazon DW “Customer” Team
• Who are we?
  • Analytics on the “Marketplace”
  • Analytics Spokes: Pricing, B2B, Seller Support, Lending …
• Business Scale:
  • 235MM monthly CPU Minutes on Legacy ODW
  • 2K upstream tables
• Users:
  • Supports 170 teams
  • 1000 users with 9527 profiles (Parameterized Queries)
  • 20K unique job runs per month
  • 2800 datasets (800 TB)
• BI Tool Users:
  • 3000+ Users, 650 non-tech
  • 600+ “Dashboards”
  • Hundreds of thousands of queries each month
“Swiss Army” by Jim Pennucci. No alterations other than cropping. https://www.flickr.com/photos/pennuja/5363518281/
Image used with permissions under Creative Commons license 2.0, Attribution Generic License
What is the Goal?
To Provide an Analytic Ecosystem that Scales with the Amazon Business
To Leverage AWS Technologies and to Help Improve these Technologies for all Amazon Customers
To Provide Choice and Options in New Analytic Technologies
• Provide a SQL-based solution
• Increasingly focus on enabling new analytic approaches, including Machine Learning and Programmatic Data Analysis
• Enable both “Bring Your Own Cluster” and “Bring Your Own Query” approaches
“Tools #2” by Juan Pablo Olmo. No alterations other than cropping. https://www.flickr.com/photos/juanpol/1562101472/
Image used with permissions under Creative Commons license 2.0, Attribution Generic License (https://creativecommons.org/licenses/by/2.0/)
[Diagram of services and tools:]
• Amazon EMR (running Hive, Pig, Spark, Presto, etc.)
• Amazon DynamoDB
• Amazon Machine Learning
• Amazon QuickSight
• Amazon RDS
• Amazon Elasticsearch Service
• Amazon Redshift
• Amazon Athena
• Amazon SQS
• Amazon Kinesis Analytics
• Amazon Kinesis Firehose
• Amazon S3
• Amazon Kinesis
• Open-source tools (e.g. for ML, data science)
• Commercial tools
Moving Forward - AWS
• S3 / EDX - Separate Storage from Compute by leveraging a parallel file system as a global data exchange
• Redshift - Preferred platform for SQL-based Analysis and traditional Data Warehouse Data
  • Focus is “Business Users”
• EMR - Scalable “Do Everything” Platform - Enable Teams who have chosen EMR by providing Curated Data
  • Focus is “Programmatic Access”
The Amazon “Data Lake” – Project Name “Andes”
The Goal: “THE” Place for Data at Amazon
• Source teams (Data Producers) put their Public Data there to give access to Analytic teams (Data Consumers) and to share private data within their team
• EMR can directly access the data in parallel from Andes
• Redshift can load the data in parallel from Andes, or it can directly access the data in parallel with Spectrum
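The two Redshift access paths named above (loading a copy of the data versus querying it in place with Spectrum) can be sketched as the SQL a consuming team might issue. This is only an illustration under stated assumptions, not the Andes interface: the S3 prefix, table name, schema name, catalog database name, and IAM role ARN are hypothetical placeholders, and Parquet is an assumed file format.

```python
# Hypothetical placeholders; none of these names come from the presentation.
EXAMPLE_ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"
EXAMPLE_PREFIX = "s3://example-data-lake/orders/"


def copy_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Build a Redshift COPY that loads Parquet files from S3 into a local table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS PARQUET;"
    )


def spectrum_schema_statement(schema: str, catalog_db: str, iam_role: str) -> str:
    """Build a CREATE EXTERNAL SCHEMA so Spectrum can query the files in place."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}';"
    )


if __name__ == "__main__":
    # Path 1: copy the data into Redshift-managed storage.
    print(copy_statement("orders", EXAMPLE_PREFIX, EXAMPLE_ROLE))
    # Path 2: leave the data in S3 and expose it through an external schema.
    print(spectrum_schema_statement("lake", "example_catalog_db", EXAMPLE_ROLE))
```

The trade-off is the usual one: a loaded copy gives Redshift local storage to work from, while the external schema avoids duplicating data that already lives in the lake.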
An Architecture that Scales with the Business
“Datamarts”
Number of Teams using the DW: ~2300
Number of Tables Used per Team:
• Max: 598
• Min: 1
• Average: 49
Ad hoc access (any data, any time) can be achieved in two ways:
• EMR can access the data in Andes directly
• Redshift can load data into the Redshift file system, or it can use the Spectrum feature to directly access the data in Andes
[Diagram: Amazon Internal Team (132 Tables)]
Putting The Pieces Together
The Analytic Architecture of the Future
[Diagram columns: Source Systems; The Data Lake “Andes”; Big Data Systems and Data Warehouses (“Bring Your Own Cluster” and “Bring Your Own Query”); Services and Users. Component labels: PostgreSQL instance, Amazon Redshift (shown three times), Amazon Kinesis, AWS Glue, Amazon QuickSight, Amazon Athena, Amazon Machine Learning.]
The Battle for the Future
The Data Lake becomes the common source for all data:
• The DW becomes the compute engine for traditional structured data (Redshift)
• EMR becomes the compute engine for programmatic access, like machine learning and many emerging use cases
• Both become a form of a Dependent data mart with the data coming from the Data Lake
Vs.
AND
[Diagram: a Purchase Contract between seller and buyer]
Table Subscriptions - The Vision
[Diagram: a Subscription between producer and consumer; labeled “Big Data Technologies” Team]
Data Value Chain
Image credits: Icons from thenounproject.com: “Collect” icon by Ramesh; “Cloud Security” icon by Creative Stall; “Search” icon by Dinosoft Labs
COLLECT, STORE, DISCOVER, SUBSCRIBE, DELIVER, ANALYZE
Producers only need to integrate their datasets once with the data lake
• Simplified onboarding process
• One-time integration
Ingest from various source systems:
• Relational databases – e.g., Amazon Aurora/RDS Postgres
• Non-relational databases – e.g., Amazon DynamoDB
• Streams – e.g., Amazon Kinesis
• Flat files – e.g., files in Amazon S3
Secure and scalable data lake:
• Highly durable S3-based storage
• Scalable since it’s built on AWS technologies
• Permissions are strictly enforced
Data quality:
• Certified with data quality checks
• Schemas are validated
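A schema check of the kind listed above can be sketched in a few lines. This is a minimal illustration only; the slides do not describe how Andes validates schemas, so the `Column` type, the `validate_rows` helper, and the example order schema are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One expected column: its name, Python type, and whether null is allowed."""
    name: str
    type: type
    nullable: bool = False


def validate_rows(schema, rows):
    """Return a list of problems found; an empty list means the batch passes."""
    problems = []
    expected = {col.name for col in schema}
    for i, row in enumerate(rows):
        # Columns the schema does not know about are flagged.
        extra = set(row) - expected
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        for col in schema:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    problems.append(f"row {i}: {col.name} is missing or null")
            elif not isinstance(value, col.type):
                problems.append(
                    f"row {i}: {col.name} expected {col.type.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


# Hypothetical dataset schema used only for this example.
ORDER_SCHEMA = [
    Column("order_id", str),
    Column("quantity", int),
    Column("note", str, nullable=True),
]

if __name__ == "__main__":
    batch = [
        {"order_id": "A1", "quantity": 2},
        {"order_id": "A2", "quantity": "two"},
    ]
    for problem in validate_rows(ORDER_SCHEMA, batch):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a producer see every failing row in a batch at once.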
Company-wide data search index
• Consumers can quickly find what they’re looking for
• Useful information about the datasets is shown
Clear communication:
• Producers can communicate expectations around data quality and SLAs
• Consumers can contact producers
Easy process to subscribe to data:
• Find a dataset of interest
• Click “Subscribe”
• Choose the destination compute platform
Rapidly populate data marts, for example:
• Use AWS CloudFormation to provision a Redshift cluster
• Use subscriptions to load datasets to the cluster
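The CloudFormation step above can be sketched as a small template generator. This is an illustrative sketch, not the template Amazon teams use: the logical resource name `DataMartCluster`, the database name, and the node type and count are assumptions, and a real deployment would also need networking and security settings that are omitted here.

```python
import json


def redshift_cluster_template(db_name: str, node_type: str, node_count: int) -> dict:
    """Build a minimal CloudFormation template for one Redshift cluster."""
    multi = node_count > 1
    properties = {
        "ClusterType": "multi-node" if multi else "single-node",
        "NodeType": node_type,
        "DBName": db_name,
        # Credentials are passed in as stack parameters, never hard-coded.
        "MasterUsername": {"Ref": "MasterUsername"},
        "MasterUserPassword": {"Ref": "MasterUserPassword"},
    }
    if multi:
        # NumberOfNodes only applies to multi-node clusters.
        properties["NumberOfNodes"] = node_count
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "MasterUsername": {"Type": "String"},
            "MasterUserPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "DataMartCluster": {
                "Type": "AWS::Redshift::Cluster",
                "Properties": properties,
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical data mart: four dc2.large nodes.
    template = redshift_cluster_template("datamart", "dc2.large", 4)
    print(json.dumps(template, indent=2))
```

Generating the template from code keeps each team's data mart definition small and repeatable; once the cluster exists, subscriptions would be what populates it.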
Subscriptions mechanism:
• Makes data available to the compute platform where it can be analyzed
• Keeps the compute platform in sync with any data updates
• Lets users monitor the sync status of their subscriptions
Synchronizations can be either:
• Full data copy
• Metadata-only sync
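The two synchronization modes can be pictured with a small planning function. Everything here is hypothetical: the slides do not describe the internals of the subscription service, so the `Subscription` fields, the version counter, and the action strings are assumptions made for illustration. The sketch also assumes a full copy suits a platform that stores its own data, while a metadata-only sync suits a platform that reads the lake in place.

```python
from dataclasses import dataclass
from enum import Enum


class SyncMode(Enum):
    FULL_COPY = "full_copy"          # data is copied to the compute platform
    METADATA_ONLY = "metadata_only"  # only the table definition is refreshed


@dataclass
class Subscription:
    dataset: str             # dataset published in the data lake
    target: str              # destination compute platform
    mode: SyncMode
    synced_version: int = 0  # last dataset version delivered to the target


def plan_sync(sub: Subscription, latest_version: int) -> str:
    """Describe the action needed to bring the target up to date."""
    if sub.synced_version >= latest_version:
        return "in-sync"
    if sub.mode is SyncMode.FULL_COPY:
        return f"copy {sub.dataset} v{latest_version} into {sub.target}"
    return f"refresh catalog entry for {sub.dataset} v{latest_version} on {sub.target}"


if __name__ == "__main__":
    subs = [
        Subscription("orders", "redshift-datamart", SyncMode.FULL_COPY, synced_version=3),
        Subscription("orders", "emr-cluster", SyncMode.METADATA_ONLY, synced_version=4),
    ]
    for sub in subs:
        # Status a user could monitor for each subscription.
        print(sub.target, "->", plan_sync(sub, latest_version=4))
```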
Teams can use the right tool for the job, e.g.:
• Amazon Redshift for interactive analytics or batch scheduled jobs
• Amazon EMR for machine learning and data science
• QuickSight for business analytics and visualizations
Compute resources can be scaled independently of the data lake in order to:
• Process more/bigger/faster jobs
• Optimize costs
• Meet business SLAs
• Scale to meet high peak workloads
Andes – Current State
• We have the data!
  • 20k+ Tables maintained in Andes – All Active Tables have been Sourced from the Enterprise Data Warehouse
  • Many teams are adding new data sets!
• Have Onboarded 900+ Redshift and EMR systems to Subscriptions
  • 20,000+ tables being synchronized
• Usage off the Legacy DW
  • Three years (2014-2016) to grow from 0 to 100k Jobs each Day
  • In 2017, has grown from 100k to 300k Jobs each Day
Amazon.com Big Data Technologies
Data producers (Amazon teams that want to share data with other teams)
"Big Data Marketplace"
Pop-up Loft
aws.amazon.com/activate
Everything and Anything Startups
Need to Get Started on AWS

More Related Content

What's hot

AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
Amazon Web Services
 
Big Data and Analytics on AWS
Big Data and Analytics on AWS Big Data and Analytics on AWS
Big Data and Analytics on AWS
Amazon Web Services
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
SwathiPonugumati
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
Amazon Web Services
 
Machine Learning on AWS
Machine Learning on AWSMachine Learning on AWS
Machine Learning on AWS
Amazon Web Services
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
Amazon Web Services
 
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech TalksCloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Amazon Web Services
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
Amazon Web Services
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
sampath439572
 
AWS PrivateLink - Deep Dive
AWS PrivateLink - Deep DiveAWS PrivateLink - Deep Dive
AWS PrivateLink - Deep Dive
Enri Peters
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
Amazon Web Services
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
Amazon Web Services
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
Amazon Web Services
 
Orchestrating AWS Lambda with AWS Step Functions
Orchestrating AWS Lambda with AWS Step Functions Orchestrating AWS Lambda with AWS Step Functions
Orchestrating AWS Lambda with AWS Step Functions
Amazon Web Services
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
Amazon Web Services
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Amazon Web Services
 
AWS Cost Management Workshop
AWS Cost Management WorkshopAWS Cost Management Workshop
AWS Cost Management Workshop
Amazon Web Services
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
Amazon Web Services
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWS
Ian Massingham
 

What's hot (20)

AWS Analytics
AWS AnalyticsAWS Analytics
AWS Analytics
 
Big Data and Analytics on AWS
Big Data and Analytics on AWS Big Data and Analytics on AWS
Big Data and Analytics on AWS
 
Introduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptxIntroduction to AWS Lake Formation.pptx
Introduction to AWS Lake Formation.pptx
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Machine Learning on AWS
Machine Learning on AWSMachine Learning on AWS
Machine Learning on AWS
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Introduction to AWS Glue
Introduction to AWS Glue Introduction to AWS Glue
Introduction to AWS Glue
 
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech TalksCloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
Cloud Based Business Intelligence with Amazon QuickSight - AWS Online Tech Talks
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
AWS PrivateLink - Deep Dive
AWS PrivateLink - Deep DiveAWS PrivateLink - Deep Dive
AWS PrivateLink - Deep Dive
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Getting Started with Amazon Kinesis
Getting Started with Amazon KinesisGetting Started with Amazon Kinesis
Getting Started with Amazon Kinesis
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Orchestrating AWS Lambda with AWS Step Functions
Orchestrating AWS Lambda with AWS Step Functions Orchestrating AWS Lambda with AWS Step Functions
Orchestrating AWS Lambda with AWS Step Functions
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
Migrate from Oracle to Aurora PostgreSQL: Best Practices, Design Patterns, & ...
 
AWS Cost Management Workshop
AWS Cost Management WorkshopAWS Cost Management Workshop
AWS Cost Management Workshop
 
Building A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWSBuilding A Modern Data Analytics Architecture on AWS
Building A Modern Data Analytics Architecture on AWS
 
AWS 101: Introduction to AWS
AWS 101: Introduction to AWSAWS 101: Introduction to AWS
AWS 101: Introduction to AWS
 

Similar to How Amazon.com Uses AWS Analytics: Data Analytics Week SF

How Amazon uses AWS Analytics
How Amazon uses AWS AnalyticsHow Amazon uses AWS Analytics
How Amazon uses AWS Analytics
Amazon Web Services
 
How Amazon.com uses AWS Analytics
How Amazon.com uses AWS AnalyticsHow Amazon.com uses AWS Analytics
How Amazon.com uses AWS Analytics
Amazon Web Services
 
How Amazon.com uses AWS Analytics
How Amazon.com uses AWS AnalyticsHow Amazon.com uses AWS Analytics
How Amazon.com uses AWS Analytics
Amazon Web Services
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
Amazon Web Services
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
Amazon Web Services
 
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
Amazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
Amazon Web Services
 
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
Amazon Web Services
 
Building a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWSBuilding a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWS
Injae Kwak
 
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
Amazon Web Services
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million Users
Amazon Web Services
 
STG401_This Is My Architecture
STG401_This Is My ArchitectureSTG401_This Is My Architecture
STG401_This Is My Architecture
Amazon Web Services
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
Amazon Web Services
 
Automating Big Data Technologies for Faster Time-to-Value
 Automating Big Data Technologies for Faster Time-to-Value Automating Big Data Technologies for Faster Time-to-Value
Automating Big Data Technologies for Faster Time-to-Value
Amazon Web Services
 
Architecting an Open Data Lake for the Enterprise
 Architecting an Open Data Lake for the Enterprise  Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
Amazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
Amazon Web Services
 
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
Amazon Web Services
 
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
Amazon Web Services
 
ABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For EnterpriseABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For Enterprise
Amazon Web Services
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
Amazon Web Services
 

Similar to How Amazon.com Uses AWS Analytics: Data Analytics Week SF (20)

How Amazon uses AWS Analytics
How Amazon uses AWS AnalyticsHow Amazon uses AWS Analytics
How Amazon uses AWS Analytics
 
How Amazon.com uses AWS Analytics
How Amazon.com uses AWS AnalyticsHow Amazon.com uses AWS Analytics
How Amazon.com uses AWS Analytics
 
How Amazon.com uses AWS Analytics
How Amazon.com uses AWS AnalyticsHow Amazon.com uses AWS Analytics
How Amazon.com uses AWS Analytics
 
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
A Look Under the Hood – How Amazon.com Uses AWS Services for Analytics at Mas...
 
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
MSC203_How Citrix Uses AWS Marketplace Solutions To Accelerate Analytic Workl...
 
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
How Citrix Uses AWS Marketplace Solutions to Accelerate Analytic Workloads on...
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data Oceans
 
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore100 Billion Data Points With Lambda_AWSPSSummit_Singapore
100 Billion Data Points With Lambda_AWSPSSummit_Singapore
 
Building a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWSBuilding a Real-Time Data Platform on AWS
Building a Real-Time Data Platform on AWS
 
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
Database Freedom. Database migration approaches to get to the Cloud - Marcus ...
 
ARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million UsersARC201_Scaling Up to Your First 10 Million Users
ARC201_Scaling Up to Your First 10 Million Users
 
STG401_This Is My Architecture
STG401_This Is My ArchitectureSTG401_This Is My Architecture
STG401_This Is My Architecture
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Automating Big Data Technologies for Faster Time-to-Value
 Automating Big Data Technologies for Faster Time-to-Value Automating Big Data Technologies for Faster Time-to-Value
Automating Big Data Technologies for Faster Time-to-Value
 
Architecting an Open Data Lake for the Enterprise
 Architecting an Open Data Lake for the Enterprise  Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with Zopa
 
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
I Want to Analyze and Visualize Website Access Logs, but Why Do I Need Server...
 
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
RET301-Build Single Customer View across Multiple Retail Channels using AWS S...
 
ABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For EnterpriseABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For Enterprise
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data Lake
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
How Amazon.com Uses AWS Analytics: Data Analytics Week SF

  • 1. How Amazon.com uses AWS Analytics (Pop-up Loft). Saurabh Shrivastava (saursh@amazon.com), AWS Solution Architect; Andre Hass (hasandre@amazon.com), AWS Specialist Technical Account Manager. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  • 2. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Traditional Data Warehousing Wikipedia: In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Examples of reports could range from annual and quarterly comparisons and trends to detailed daily sales analysis.
  • 3. The Battle for the Future VS.
  • 5. https://www.promptcloud.com https://john-popelaars.blogspot.com https://ww.signiant.com https://www.linkedin.com/pulse/world-today-data-rich-information-poor-guru-p-mohapatra-pmp/
  • 6. The Industry Problem [chart comparing three trends: Growth in Data (mostly Unstructured) & Analytics; Average Growth in Traditional DW Data; Average IT Budget]
  • 8. What is Amazon?
  • 9. Our vision is to be earth’s most customer-centric company; to build a place where people can come to find and discover anything they might want to buy online.
  • 12. Amazon Data Warehouse
  • 13. The Amazon Enterprise Data Warehouse
      The Good! Helps to Run the Amazon Business
        • Most Comprehensive Set of Cleansed and Curated Business Data
        • Feeds Many Downstream Systems and Processes
        • Batch Processing, Reporting and Ad Hoc
        • 500k+ Data Loads/Transformations Each Day
        • 200k+ Queries/Extracts Each Day
        • 20k+ Active Tables
        • 10B++ Rows Loaded Daily
      Our Data is Big!
        • Core Data Set: 5+ PB of Compressed Data (primarily limited by Legacy Technology)
        • Total Storage (Multiple Systems): 35+ PB compressed
        • Quote from Executive at Legacy DW Vendor: ~1000x Larger than any other DW Customer (from that Vendor)
      Significant and Increasing Use of Redshift and EMR
        • 1000s of Redshift and EMR Systems, ranging in size from Individual Contributor (Project Based) to Running a Multi-Billion Dollar Business inside Amazon
  • 14. Example of an Amazon DW “Customer” Team
      Who are we?
        • Analytics on the “Marketplace”
        • Analytics Spokes: Pricing, B2B, Seller Support, Lending …
      Business Scale:
        • 235MM monthly CPU Minutes on Legacy ODW
        • 2K upstream tables
      Users:
        • Supports 170 teams
        • 1000 users with 9527 profiles (Parameterized Queries)
        • 20K unique job runs per month
        • 2800 (800 TB) datasets
      BI Tool Users:
        • 3000+ Users, 650 non-tech
        • 600+ “Dashboards”
        • 100k’s of queries each month
  • 16. “Swiss Army” by Jim Pennucci. No alterations other than cropping. https://www.flickr.com/photos/pennuja/5363518281/ Image used with permissions under Creative Commons license 2.0, Attribution Generic License
  • 17. What is the Goal?
      To Provide an analytic ecosystem that Scales with the Amazon Business
      To Leverage AWS Technologies and to help Improve these technologies for all Amazon Customers
      To Provide Choice and Options in New Analytic Technologies
        • Provide an SQL based solution
        • Increasingly Focus on Enabling new analytic approaches including Machine Learning and Programmatic Data Analysis
        • Enable both “Bring Your Own Cluster” and “Bring Your Own Query” Approaches
  • 18. “Tools #2” by Juan Pablo Olmo. No alterations other than cropping. https://www.flickr.com/photos/juanpol/1562101472/ Image used with permissions under Creative Commons license 2.0, Attribution Generic License (https://creativecommons.org/licenses/by/2.0/)
  • 19. [service landscape] Amazon EMR (running Hive, Pig, Spark, Presto, etc.); Amazon DynamoDB; Amazon Machine Learning; Amazon QuickSight; Amazon RDS; Amazon Elasticsearch Service; Amazon Redshift; Amazon Athena; Amazon SQS; Amazon Kinesis Analytics; Amazon Kinesis Firehose; Amazon S3; Amazon Kinesis; Open-source tools (e.g. for ML, data science); Commercial tools
  • 20. Moving Forward
        • AWS S3 / EDX: Separate Storage from Compute by leveraging a parallel file system as a global data exchange
        • Redshift: Preferred platform for SQL based Analysis and traditional Data Warehouse Data. Focus is “Business Users”
        • EMR: Scalable “Do Everything” Platform. Enable Teams who have chosen EMR by providing Curated Data. Focus is “Programmatic Access”
  • 21. The Amazon “Data Lake” – Project Name “Andes”
      The Goal: “THE” Place for Data at Amazon
        • Source teams (Data Producers) put their Public Data there to give access to Analytic teams (Data Consumers) and to share private data within their team
        • EMR Can Directly Access the Data in Parallel from Andes
        • Redshift can load the data in Parallel from Andes, or it Can Directly Access the Data in Parallel with Spectrum
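The two Redshift access paths named on this slide (load the data, or query it in place with Spectrum) can be sketched as the SQL each path would issue. This is a minimal illustration, not Amazon's internal tooling: the slides do not describe how Andes stores or catalogs data, so the sketch assumes Parquet files on S3 and an external data catalog, and every table, schema, bucket, and role name below is hypothetical.

```python
def load_statement(table: str, s3_prefix: str, iam_role: str) -> str:
    """Path 1 (assumed): copy files from S3 into Redshift-managed storage."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"
    )


def spectrum_statement(schema: str, catalog_db: str, iam_role: str) -> str:
    """Path 2 (assumed): expose catalog tables through Redshift Spectrum
    so queries read the files in place instead of loading them."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema} "
        f"FROM DATA CATALOG DATABASE '{catalog_db}' "
        f"IAM_ROLE '{iam_role}';"
    )


if __name__ == "__main__":
    # Hypothetical role ARN and locations, for illustration only.
    role = "arn:aws:iam::123456789012:role/example-redshift-role"
    print(load_statement("orders", "s3://example-lake/orders/", role))
    print(spectrum_statement("lake", "example_catalog_db", role))
```

Loading trades storage and copy time for local query speed; the external schema avoids the copy and leaves a single copy of the data in the lake.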
  • 22. An Architecture that Scales with the Business
      “Datamarts”
        • Number of Teams using the DW: ~2300
        • Number of Tables Used per Team: Max 598, Min 1, Average 49
        • [diagram label] Amazon Internal Team (132 Tables)
      Ad-Hoc (any data any time) can be achieved via:
        • EMR, which can access the Data in Andes Directly
        • Redshift, which can load data into the Redshift file system or use the Spectrum Feature to directly access the Data in Andes
  • 23. Putting The Pieces Together: The Analytic Architecture of the Future [diagram layers: Source Systems; The Data Lake “Andes”; Big Data Systems; Data Warehouses; “Bring Your Own Cluster” and “Bring Your Own Query”; Services and Users. Component labels: PostgreSQL instance, Amazon Redshift (three instances), Amazon Kinesis, AWS Glue, Amazon QuickSight, Amazon Athena, Amazon Machine Learning]
  • 24. The Battle for the Future
      The Data Lake becomes the common source for all data:
        • The DW becomes the compute engine for traditional structured data (Redshift)
        • EMR becomes the compute engine for programmatic access, like machine learning and many emerging use cases
        • Both become a form of a Dependent data mart with the data coming from the Data Lake
      Vs. AND
  • 26. Purchase Contract [diagram labels: seller, buyer]
  • 27. Table Subscriptions - The Vision
  • 28. Subscription [diagram labels: “Big Data Technologies” Team, producer, consumer]
  • 30. Data Value Chain [stages: COLLECT, STORE, DISCOVER, SUBSCRIBE, DELIVER, ANALYZE] Image credits: Icons from thenounproject.com: “Collect” icon by Ramesh; “Cloud Security” icon by Creative Stall; “Search” icon by Dinosoft Labs
  • 31. Data Value Chain
      Producers only need to integrate their datasets once with the data lake
        • Simplified onboarding process
        • One-time integration
      Ingest from various source systems:
        • Relational databases, e.g., Amazon Aurora/RDS Postgres
        • Non-relational databases, e.g., Amazon DynamoDB
        • Streams, e.g., Amazon Kinesis
        • Flat files, e.g., files in Amazon S3
  • 32. Data Value Chain
      Secure and scalable data lake:
        • Highly durable S3-based storage
        • Scalable since it’s built on AWS technologies
        • Permissions are strictly enforced
      Data quality:
        • Certified with data quality checks
        • Schemas are validated
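The slide says schemas are validated but does not say how. As a rough illustration of the idea only, the sketch below checks a record against an expected column-to-type mapping; the function name, the schema representation, and the error messages are all assumptions, not the actual Andes mechanism.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of schema violations for one record (empty if valid).

    Hypothetical helper: `schema` maps column name to the expected Python type.
    """
    errors: List[str] = []
    # Every declared column must be present and have the declared type.
    for column, expected in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    # Columns not declared in the schema are reported as well.
    for column in record:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors


if __name__ == "__main__":
    schema = {"order_id": int, "amount": float}
    print(validate_record({"order_id": 1, "amount": 2.5}, schema))
    print(validate_record({"order_id": "1"}, schema))
```

A producer-side check like this would run before a dataset is published, so consumers only see data that matches its declared schema.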
  • 33. Data Value Chain
      Company-wide data search index
        • Consumers can quickly find what they’re looking for
        • Useful information about the datasets is shown
      Clear communication:
        • Producers can communicate expectations around data quality and SLAs
        • Consumers can contact producers
  • 34. Data Value Chain
      Easy process to subscribe to data:
        • Find a dataset of interest
        • Click “Subscribe”
        • Choose the destination compute platform
      Rapidly populate data marts, for example:
        • Use AWS CloudFormation to provision Redshift cluster
        • Use subscriptions to load datasets to the cluster
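The "provision a Redshift cluster with CloudFormation" step can be sketched by building a small template in Python and serializing it to JSON. The `AWS::Redshift::Cluster` resource type and the property names are standard CloudFormation; the logical ID, database name, user name, and default node type are illustrative assumptions, and a real data mart would also need networking and security settings that are omitted here.

```python
import json


def redshift_datamart_template(node_type: str = "dc2.large", nodes: int = 2) -> dict:
    """Build a minimal CloudFormation template for one Redshift cluster.

    Hypothetical helper; names such as DataMartCluster are illustrative.
    """
    properties = {
        "ClusterType": "multi-node" if nodes > 1 else "single-node",
        "NodeType": node_type,
        "DBName": "datamart",
        "MasterUsername": "awsuser",
        # The password is supplied at deploy time, never hard-coded.
        "MasterUserPassword": {"Ref": "MasterUserPassword"},
    }
    if nodes > 1:
        # NumberOfNodes applies only to multi-node clusters.
        properties["NumberOfNodes"] = nodes
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "MasterUserPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "DataMartCluster": {
                "Type": "AWS::Redshift::Cluster",
                "Properties": properties,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(redshift_datamart_template(), indent=2))
```

Once the cluster exists, the subscription step described on the slide would populate it; that part is internal to Amazon and is not modeled here.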
  • 35. Data Value Chain
      Subscriptions mechanism:
        • Makes data available to the compute platform where it can be analyzed
        • Keeps the compute platform in sync with any data updates
        • Users can monitor the sync status of their subscriptions
      Synchronizations can be either:
        • Full data copy
        • Metadata-only sync
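The two synchronization modes and the sync status can be modeled with a small data structure. This is a toy sketch under stated assumptions: the slides do not describe the subscription service's API, so the class, the version counter, and the step descriptions are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class SyncMode(Enum):
    FULL_COPY = "full_copy"          # data is copied into the compute platform
    METADATA_ONLY = "metadata_only"  # only table definitions are updated


@dataclass
class Subscription:
    """Hypothetical subscription of one compute target to one dataset."""
    dataset: str
    target: str
    mode: SyncMode
    synced_version: int = 0

    def is_in_sync(self, latest_version: int) -> bool:
        """Sync status a user could monitor."""
        return self.synced_version == latest_version

    def sync(self, latest_version: int) -> List[str]:
        """Bring the target up to date and return the steps taken."""
        if self.is_in_sync(latest_version):
            return []
        if self.mode is SyncMode.FULL_COPY:
            steps = [f"copy {self.dataset} v{latest_version} into {self.target}"]
        else:
            steps = [f"update {self.target} metadata to {self.dataset} v{latest_version}"]
        self.synced_version = latest_version
        return steps


if __name__ == "__main__":
    sub = Subscription("orders", "team-redshift", SyncMode.FULL_COPY)
    print(sub.sync(3))
    print(sub.is_in_sync(3))
```

A full copy suits platforms that query local storage; a metadata-only sync suits platforms that read the data lake in place and only need current table definitions.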
  • 36. Data Value Chain
      Teams can use the right tools for the jobs, e.g.:
        • Amazon Redshift for interactive analytics or batch scheduled jobs
        • Amazon EMR for machine learning and data science
        • QuickSight for Business analytics and visualizations
      Compute resources can be scaled independently of the data lake in order to:
        • Process more/bigger/faster jobs
        • Optimize costs
        • Meet business SLAs
        • Scale to meet high peak workloads
  • 38. What is the Goal?
      To Provide an analytic ecosystem that Scales with the Amazon Business
      To Leverage AWS Technologies and to help Improve these technologies for all Amazon Customers
      To Provide Choice and Options in New Analytic Technologies
        • Provide an SQL based solution
        • Increasingly Focus on Enabling new analytic approaches including Machine Learning and Programmatic Data Analysis
        • Enable both “Bring Your Own Cluster” and “Bring Your Own Query” Approaches
  • 39. Andes – Current State (Amazon.com Big Data Technologies)
      We have the data!
        • 20k+ Tables maintained in Andes. All Active Tables have been Sourced from the Enterprise Data Warehouse
        • Many teams are adding new data sets!
      Have Onboarded 900+ Redshift and EMR systems to Subscriptions
        • 20,000+ tables being synchronized
      Usage off the Legacy DW
        • Three years (2014-2016) to grow from 0 to 100k Jobs each Day
        • In 2017, has grown from 100k to 300k Jobs each Day
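The usage figures on this slide imply a sharp acceleration, which a short calculation makes explicit. The sketch uses only the numbers quoted above; treating 2014-2016 as three full years of linear growth is a simplifying assumption.

```python
def growth_factor(start: float, end: float) -> float:
    """How many times larger `end` is than `start`."""
    return end / start


def percent_increase(start: float, end: float) -> float:
    """Relative increase from `start` to `end`, in percent."""
    return (end - start) / start * 100


if __name__ == "__main__":
    # 2017: from 100k to 300k jobs per day.
    print(growth_factor(100_000, 300_000))
    print(percent_increase(100_000, 300_000))
    # 2014-2016: 0 to 100k jobs per day, averaged over three years (assumed linear).
    print(round(100_000 / 3))
```

In other words, the jobs per day added during 2017 (200k) were about twice what the previous three years added in total.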
  • 40. "Big Data Marketplace" [diagram label: Data producers (Amazon teams that want to share data with other teams)]
  • 41. Pop-up Loft: Everything and Anything Startups Need to Get Started on AWS. aws.amazon.com/activate