Amazon Redshift:
How we managed 300 billion rows with no DBA
Matt Cohen
Founder & President
matt@onespot.com
December 10th, 2013

Copyright©2013OneSpot,Proprietary&Confidential

What is OneSpot?
• OneSpot is a content advertising platform that distributes content as ads that people want to click on.
– Fortune 2000 clients
– Realtime ad exchange bidding
– Adaptive machine learning
– Seed funded until $5.3M Series A last month

• Big data, big analysis

What is Redshift?
1. When light from a receding object appears shifted to the red end of the spectrum
– A consequence of the expanding universe.
2. A cheap, fast, Petabyte-scale, managed SQL data warehouse service from Amazon Web Services
– A consequence of the expanding cloud ecosystem

Why Redshift?
• Cheap
• Fast
• Petabyte-scale
• Managed Service
• SQL
• Data Warehouse
• From AWS

SQL Data Warehouse
• Based on the commercial ParAccel database
– Which is based on Postgres
• Standards-based tools and knowledge
• Built for data warehousing
– Column-oriented
– Cluster architecture
– Read optimized
– No relational integrity
– Almost no SQL extensions

SQL Data Warehouse
• Column-oriented

SQL Data Warehouse
• Column-oriented

• 11 different compression techniques
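
A minimal sketch of why a column-oriented layout compresses well. The slide does not name the 11 techniques; run-length encoding is used here only as one familiar illustration, and the sample rows and function names are hypothetical.

```python
from itertools import groupby

# Hypothetical ad-event rows (illustrative data, not from the talk).
rows = [
    ("2013-12-10", "US", 3),
    ("2013-12-10", "US", 5),
    ("2013-12-10", "CA", 2),
    ("2013-12-11", "CA", 7),
]

def to_columns(rows):
    """Pivot row tuples into one list per column (column-oriented layout)."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

dates, countries, clicks = to_columns(rows)
print(run_length_encode(dates))      # [('2013-12-10', 3), ('2013-12-11', 1)]
print(run_length_encode(countries))  # [('US', 2), ('CA', 2)]
print(sum(clicks))                   # 17, computed from the clicks column alone
```

Values within one column tend to repeat, so each column shrinks on its own, and an aggregate such as the click total only has to read the one column it needs.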

SQL Data Warehouse
• Cluster architecture

SQL Data Warehouse
• Read optimized
– Large block size (1MB)
– Data replication
• 2x live, 1x S3
• No relational integrity
– No indexes: sort and distribution keys
• Almost no SQL extensions
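
A rough sketch of how sort and distribution keys can stand in for indexes. It assumes rows are hashed to a slice by the distribution key, kept sorted by the sort key, and that each block records its min/max sort-key value so non-matching blocks can be skipped. The slice count, the tiny block size, the CRC32 hash, and all names are illustrative assumptions, not Redshift internals.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4   # assumed slice count, for illustration only
BLOCK_ROWS = 3   # tiny stand-in for a 1MB block

def slice_for(dist_key_value):
    """Pick a slice by hashing the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(dist_key_value).encode()) % NUM_SLICES

def distribute(rows, dist_key, sort_key):
    """Group rows by slice, then sort each slice's rows by the sort key."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_key])].append(row)
    return {s: sorted(rs, key=lambda r: r[sort_key]) for s, rs in slices.items()}

def blocks_with_ranges(sorted_rows, sort_key):
    """Split sorted rows into blocks and record each block's min/max sort key."""
    blocks = []
    for i in range(0, len(sorted_rows), BLOCK_ROWS):
        chunk = sorted_rows[i:i + BLOCK_ROWS]
        blocks.append((chunk[0][sort_key], chunk[-1][sort_key], chunk))
    return blocks

def scan(blocks, sort_key, low, high):
    """Read only the blocks whose [min, max] range overlaps [low, high]."""
    hits, blocks_read = [], 0
    for block_min, block_max, chunk in blocks:
        if block_max < low or block_min > high:
            continue  # block skipped without reading its rows
        blocks_read += 1
        hits.extend(r for r in chunk if low <= r[sort_key] <= high)
    return hits, blocks_read

# Hypothetical events: 3 campaigns, 9 days each, arriving in reverse day order.
events = [{"campaign_id": c, "day": d} for c in (1, 2, 3) for d in range(9, 0, -1)]
placed = distribute(events, "campaign_id", "day")

one_campaign = sorted((r for r in events if r["campaign_id"] == 1),
                      key=lambda r: r["day"])
hits, blocks_read = scan(blocks_with_ranges(one_campaign, "day"), "day", 4, 5)
print(len(hits), blocks_read)  # 2 1
```

In this toy run, the range query on days 4 to 5 touches one of three blocks. Rows that share a distribution key value also land on the same slice, which is why joining on that key can avoid moving data between nodes.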

Fast = Cheap
• Starts with 1 XL node
– 85¢ an hour ($620/month) on demand
– 50¢ an hour ($365/month) 1 year reserved

• Benchmarks say:
– Scales linearly
– 5-10x faster than Hadoop/Hive
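
The monthly figures above follow from the hourly rates if a month is taken as 730 hours. That hour count is an assumption (365 days × 24 hours ÷ 12 months), not something stated on the slide, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months

def monthly_cost(hourly_rate_usd, nodes=1):
    """Approximate monthly cost of running `nodes` nodes around the clock."""
    return hourly_rate_usd * HOURS_PER_MONTH * nodes

print(round(monthly_cost(0.85), 2))  # 620.5  (on demand, 1 XL node)
print(round(monthly_cost(0.50), 2))  # 365.0  (1 year reserved, 1 XL node)
```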

Petabyte scale
• Up to
– 32 XL nodes (64 Terabytes)
– 100 8XL nodes (1.6 Petabytes)
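
Dividing the cluster maximums by the node counts gives the implied storage per node. The sketch below assumes decimal units (1 Petabyte = 1,000 Terabytes); the 10 TB dataset is a made-up sizing example.

```python
import math

def storage_per_node_tb(total_tb, nodes):
    """Storage per node implied by a cluster total and its node count."""
    return total_tb / nodes

def nodes_needed(dataset_tb, per_node_tb):
    """Smallest node count whose combined storage holds the dataset."""
    return math.ceil(dataset_tb / per_node_tb)

xl_tb = storage_per_node_tb(64, 32)          # 32 XL nodes, 64 Terabytes
eight_xl_tb = storage_per_node_tb(1600, 100)  # 100 8XL nodes, 1.6 Petabytes
print(xl_tb, eight_xl_tb)       # 2.0 16.0
print(nodes_needed(10, xl_tb))  # 5 XL nodes for a hypothetical 10 TB dataset
```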

Managed Service from AWS
• Managed Service
– Incredibly easy
– Nice UI
– Most SQL tools

• From AWS
– Free data transfer
– Easy load from S3
– Use AWS Data Pipeline
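
Loading from S3 is done with Redshift's COPY command. The sketch below only assembles the SQL text; the table name, bucket path, and IAM role ARN are hypothetical placeholders, and the helper is an illustration rather than the loader used in the talk. It does no escaping, so it should only ever be given trusted values.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, delimiter="|", gzip=True):
    """Assemble a Redshift COPY statement that loads delimited files from S3."""
    options = [f"DELIMITER '{delimiter}'"]
    if gzip:
        options.append("GZIP")  # source files are gzip-compressed
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        + "\n".join(options)
        + ";"
    )

# All identifiers below are made up for the example.
statement = build_copy_statement(
    table="impressions",
    s3_prefix="s3://example-bucket/impressions/2013-12-10/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
)
print(statement)
```

COPY reads every file under the S3 prefix, so splitting the input into multiple files lets the cluster load them in parallel.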

The TL;DR
• Pros
– Standard SQL
– Super easy
– Very fast
– Affordable
– Integrates with AWS
– No DBA
– No Sysadmin

• Cons
– Standard SQL
– Almost no SQL extensions
– Best with Star Schema
• Big joins can be slow
– No MapReduce
– Fixed columns
– Consistency
– 1.6 Petabyte limit
