SlideShare a Scribd company logo
1 of 14
Hapyrus: Amazon Redshift
BENCHMARK Series 02
Scalability of Amazon Redshift
Data Loading and
Query Speed
Comparisons between the performance of different instances
www.flydata.com
Amazon Redshift can load 1.2TB data using:
•an XL instance, taking 17 hours
•a two node XL instance, taking 10 hours
•a two node 8XL instance (equivalent to XL 16x), taking 2 hours
Load speed is almost proportional to
number of nodes
Running identical queries on Amazon Redshift,
•an XL instance took 155 seconds
•a two node XL instance took 55 seconds
•a two node 8XL instance (equivalent to XL 16x) took 31 seconds
The more nodes, the faster a query runs (but
not by much, seems more inversely proportional)
www.flydata.com
The most important feature of Amazon Redshift is
that it has flexible scalability compared to other
columnar data warehouses such as Vertica,
Netezza and Teradata.
We have run benchmarks to compare the cost and
speed of Redshift instances using:
• 1.2TB of data and identical queries
– same data as previous benchmark
http://www.slideshare.net/Hapyrus/amazon-redshift-is-10x-
faster-and-cheaper-than-hadoop-same data as previous
benchmark http://www.slideshare.net/Hapyrus/amazon-
redshift-is-10x-faster-and-cheaper-than-hadoop-hive
• 3 different types of instances: Single node
XL, two node XL, and two node 8XL
www.flydata.com
1. Data Loading
• XL x1 instance
took 17 hours
• XL x2 instance
took 10 hours
• 8XL x2 instance
(equivalent to XL
16x) took 2
hours
* The query and data used can be referenced in our Appendix
2h 36m
17h 19m
10h 38m
Data loading by COPY command. All the data is copied at once.
2
www.flydata.com
2. Query Speed
• XL x1 instance
took 155
seconds
• XL x2 instance
took 55 seconds
• 8XL x2 instance
(equivalent to XL
16x) took 31
seconds
* The query used can be referenced in our Appendix
155.02s
54.77s
30.61s
2
www.flydata.com
Redshift Data Loading Results
* The data used can be referenced in our Appendix
The biggest data (imp_logs:
1.2TB)
Other Tables
Instance Type Nodes Machine Spec Time (hours) Speed Cost per TB
dw.hs1.xlarge
1 1 17.32 70MB/h $12.27
2 2 10.63 110MB/h $15.07
dw.hs1.8xlarge 2 16 2.62 460MB/h $29.64
Loading Time
Table Name XL x1 XL x2 8XL x2
ad_campaigns
(100MB)
5.82s 8.97s 8.5s
Publishers
(10MB)
552ms 2.83s 2.22s
Advertisers
(10MB)
770ms 3.71s 4.75s
imp_logs
(1.2TB)
17h 19m 22.27s 10h 38m 5.66s 2h 36m 55.49s
click_logs
(6GB)
11m 31.37s 10m 6.24s 37.87s
www.flydata.com
Redshift Query Results
* The query used can be referenced in our Appendix
Processing Time
Trial XL x1 XL x2 8XL x2
1 2m 28.65s 1m 1.44s 39.11s
2 2m 37.65s 52.89s 26.77s
3 2m 35.91s 53.76s 29.9s
4 2m 29.04s 53.52s 27.51s
5 2m 36.9s 52.22s 29.75s
Average 2m 35s 54.77s 30.69s
www.flydata.com
Discussion
• Redshift can load data in parallel
– Almost proportional to number of nodes
• Redshift query speed is drastically affected
by number of nodes
– Big difference between Single and Multiple nodes
– Two nodes are faster than two times the speed of
one node
– Additional experimentation may be helpful in
future
www.flydata.com
APPENDIX - Data
TSV files, gzip compressed
Imp_lo
g
1.2TB / 1.2B
record
date datetime
publisher_id integer
ad_campaign_id integer
bid_price real
country varchar(30)
attr1-4 varchar(255)
click_l
og
5.6GB / 6M
record
date datetime
publisher_id integer
ad_campaign_id integer
country varchar(30)
attr1-4 varchar(255)
ad_campai
gn
100MB / 100k
record
publish
er
10MB / 10k
record
advertis
er
10MB / 10k
record
Using 5 tables, we run a query which join tables and creates a report.
www.flydata.com
appendix – Sample Query
select
ac.ad_campaign_id as ad_campaign_id,
adv.advertiser_id as advertiser_id,
cs.spending as spending,
ims.imp_total as imp_total,
cs.click_total as click_total,
click_total/imp_total as CTR,
spending/click_total as CPC,
spending/(imp_total/1000) as CPM
from
ad_campaigns ac
join
advertisers adv
on (ac.advertiser_id = adv.advertiser_id)
join
(select
il.ad_campaign_id,
count(*) as imp_total
from
imp_logs il
group by
il.ad_campaign_id
) ims on (ims.ad_campaign_id =
ac.ad_campaign_id)
join
(select
cl.ad_campaign_id,
sum(cl.bid_price) as spending,
count(*) as click_total
from
click_logs cl
group by
cl.ad_campaign_id
) cs on (cs.ad_campaign_id = ac.ad_campaign_id);
The query generates a basic report for ad campaigns performance, imp, click
numbers,
advertiser spending, CTR, CPC and CPM.
www.flydata.com
APPENDIX – Additional Comments
• A Redshift 8XL instance cannot be selected as
single node (you must choose at least 2 nodes)
(as of Feb. 24,
2013)
www.flydata.com
APPENDIX – Additional Information
• All resources for our benchmark are on
our github repository
– https://github.com/hapyrus/redshift-
https://github.com/hapyrus/redshift-
benchmark
– The dataset we use is open on S3, so you
can reproduce the benchmark
www.flydata.com
About Us - FlyData
• FlyData Enterprise
– Enables continuous loading to Amazon Redshift,
with real-time data loading
– Automated ETL process with multiple supported
data formats
– Auto scaling, data Integrity and high durability
– FlyData Sync feature allows real-time replication
from RDBMS to Amazon Redshift
Contact us at: info@flydata.com
We are an official data
integration partner of
Amazon Redshift
Formerly known as Hapyrus
www.flydata.com
www.flydata.com www.flydata.com
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787
http://flydata.com
We are an official data integration
partner of Amazon Redshift

More Related Content

What's hot

Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Getting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex QueriesGetting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex Queriestimonk
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftAmazon Web Services
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Amazon Web Services
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Amazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceAmazon Web Services
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with RedshiftAmazon Web Services
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon RedshiftAmazon Web Services
 

What's hot (20)

Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Getting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex QueriesGetting Maximum Performance from Amazon Redshift: Complex Queries
Getting Maximum Performance from Amazon Redshift: Complex Queries
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon RedshiftUses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing Performance
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
Production NoSQL in an Hour: Introduction to Amazon DynamoDB (DAT101) | AWS r...
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Masterclass - Redshift
Masterclass - RedshiftMasterclass - Redshift
Masterclass - Redshift
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
Introduction to Amazon Redshift and What's Next (DAT103) | AWS re:Invent 2013
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch ServiceReal-Time Data Exploration and Analytics with Amazon Elasticsearch Service
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
(ISM303) Migrating Your Enterprise Data Warehouse To Amazon Redshift
 

Viewers also liked

AWS Black Belt Tech シリーズ 2015 - Amazon Redshift
AWS Black Belt Tech シリーズ 2015 - Amazon RedshiftAWS Black Belt Tech シリーズ 2015 - Amazon Redshift
AWS Black Belt Tech シリーズ 2015 - Amazon RedshiftAmazon Web Services Japan
 
AWS Black Belt Techシリーズ Amazon Redshift
AWS Black Belt Techシリーズ  Amazon RedshiftAWS Black Belt Techシリーズ  Amazon Redshift
AWS Black Belt Techシリーズ Amazon RedshiftAmazon Web Services Japan
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark StreamingSpark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark StreamingSantosh Sahoo
 
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketBusiness Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketMongoDB
 
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...Amazon Web Services
 
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014Amazon Web Services
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Koichi Fujikawa
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...Amazon Web Services
 
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...Amazon Web Services
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Spark Summit
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataStylight
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...Amazon Web Services
 
Types of databases
Types of databasesTypes of databases
Types of databasesPAQUIAAIZEL
 

Viewers also liked (20)

AWS Black Belt Tech シリーズ 2015 - Amazon Redshift
AWS Black Belt Tech シリーズ 2015 - Amazon RedshiftAWS Black Belt Tech シリーズ 2015 - Amazon Redshift
AWS Black Belt Tech シリーズ 2015 - Amazon Redshift
 
AWS Black Belt Techシリーズ Amazon Redshift
AWS Black Belt Techシリーズ  Amazon RedshiftAWS Black Belt Techシリーズ  Amazon Redshift
AWS Black Belt Techシリーズ Amazon Redshift
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark StreamingSpark Seattle meetup - Breaking ETL barrier with Spark Streaming
Spark Seattle meetup - Breaking ETL barrier with Spark Streaming
 
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to MarketBusiness Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
Business Track: How MongoDB Helps Telefonia Digital Accelerate Time to Market
 
Amazon redshiftのご紹介
Amazon redshiftのご紹介Amazon redshiftのご紹介
Amazon redshiftのご紹介
 
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...
AWS June 2016 Webinar Series - Amazon Aurora Deep Dive - Optimizing Database ...
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014
(SDD414) Amazon Redshift Deep Dive and What's Next | AWS re:Invent 2014
 
Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Tailored for Spark
Tailored for SparkTailored for Spark
Tailored for Spark
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
 
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
(BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and ...
 
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
(PFC305) Embracing Failure: Fault-Injection and Service Reliability | AWS re:...
 
Types dbms
Types dbmsTypes dbms
Types dbms
 
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
Building a Data Warehouse for Business Analytics using Spark SQL-(Blagoy Kalo...
 
Lean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big DataLean Enterprise, Microservices and Big Data
Lean Enterprise, Microservices and Big Data
 
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
AWS re:Invent 2016: Best Practices for Data Warehousing with Amazon Redshift ...
 
Types of databases
Types of databasesTypes of databases
Types of databases
 

Similar to Scalability of Amazon Redshift Data Loading and Query Speed

Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftAmazon Web Services LATAM
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudWolfgang Gentzsch
 
NAFEMS Smart Manufacturing - UberCloud
NAFEMS Smart Manufacturing - UberCloudNAFEMS Smart Manufacturing - UberCloud
NAFEMS Smart Manufacturing - UberCloudThomas Francis
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudThe UberCloud
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkQAware GmbH
 
SCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenSCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenNicolas Fränkel
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on HadoopPaco Nathan
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for SolrToke Eskildsen
 
JUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingJUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingNicolas Fränkel
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedis Labs
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Hortonworks
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache CalciteJulian Hyde
 
JUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streamingJUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streamingNicolas Fränkel
 
Containers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfContainers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfManish Chopra
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streamingNicolas Fränkel
 
BruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingBruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingNicolas Fränkel
 
WaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingWaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingNicolas Fränkel
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 

Similar to Scalability of Amazon Redshift Data Loading and Query Speed (20)

Melhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon RedshiftMelhores práticas de data warehouse no Amazon Redshift
Melhores práticas de data warehouse no Amazon Redshift
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the Cloud
 
NAFEMS Smart Manufacturing - UberCloud
NAFEMS Smart Manufacturing - UberCloudNAFEMS Smart Manufacturing - UberCloud
NAFEMS Smart Manufacturing - UberCloud
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the Cloud
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Leveraging the Power of Solr with Spark
Leveraging the Power of Solr with SparkLeveraging the Power of Solr with Spark
Leveraging the Power of Solr with Spark
 
SCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in HeavenSCALE - Stream processing and Open Data, a match made in Heaven
SCALE - Stream processing and Open Data, a match made in Heaven
 
Getting Started on Hadoop
Getting Started on HadoopGetting Started on Hadoop
Getting Started on Hadoop
 
Amazon Redshift Deep Dive
Amazon Redshift Deep Dive Amazon Redshift Deep Dive
Amazon Redshift Deep Dive
 
Faceting optimizations for Solr
Faceting optimizations for SolrFaceting optimizations for Solr
Faceting optimizations for Solr
 
JUG SF - Introduction to data streaming
JUG SF - Introduction to data streamingJUG SF - Introduction to data streaming
JUG SF - Introduction to data streaming
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
 
Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite Streaming SQL w/ Apache Calcite
Streaming SQL w/ Apache Calcite
 
Streaming SQL with Apache Calcite
Streaming SQL with Apache CalciteStreaming SQL with Apache Calcite
Streaming SQL with Apache Calcite
 
JUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streamingJUG Tirana - Introduction to data streaming
JUG Tirana - Introduction to data streaming
 
Containers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdfContainers Auto Scaling on AWS.pdf
Containers Auto Scaling on AWS.pdf
 
vJUG - Introduction to data streaming
vJUG - Introduction to data streamingvJUG - Introduction to data streaming
vJUG - Introduction to data streaming
 
BruJUG - Introduction to data streaming
BruJUG - Introduction to data streamingBruJUG - Introduction to data streaming
BruJUG - Introduction to data streaming
 
WaJUG - Introduction to data streaming
WaJUG - Introduction to data streamingWaJUG - Introduction to data streaming
WaJUG - Introduction to data streaming
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 

More from FlyData Inc.

What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?FlyData Inc.
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?FlyData Inc.
 
Three Things to Consider When Making Investments in Your Big Data Infrastructure
Three Things to Consider When Making Investments in Your Big Data InfrastructureThree Things to Consider When Making Investments in Your Big Data Infrastructure
Three Things to Consider When Making Investments in Your Big Data InfrastructureFlyData Inc.
 
Cognitive Biases in Data Science
Cognitive Biases in Data ScienceCognitive Biases in Data Science
Cognitive Biases in Data ScienceFlyData Inc.
 
How to Extract Data from Amazon Redshift
How to Extract Data from Amazon RedshiftHow to Extract Data from Amazon Redshift
How to Extract Data from Amazon RedshiftFlyData Inc.
 
Amazon Redshift - Create an Amazon Redshift Cluster
Amazon Redshift - Create an Amazon Redshift ClusterAmazon Redshift - Create an Amazon Redshift Cluster
Amazon Redshift - Create an Amazon Redshift ClusterFlyData Inc.
 
The Internet of Things
The Internet of ThingsThe Internet of Things
The Internet of ThingsFlyData Inc.
 
Create an Amazon Redshift Cluster with FlyData!
Create an Amazon Redshift Cluster with FlyData!Create an Amazon Redshift Cluster with FlyData!
Create an Amazon Redshift Cluster with FlyData!FlyData Inc.
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData FlyData Inc.
 
FlyData Autoload: 事例集
FlyData Autoload: 事例集FlyData Autoload: 事例集
FlyData Autoload: 事例集FlyData Inc.
 
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
Amazon Redshift ベンチマーク  Hadoop + Hiveと比較 Amazon Redshift ベンチマーク  Hadoop + Hiveと比較
Amazon Redshift ベンチマーク Hadoop + Hiveと比較 FlyData Inc.
 

More from FlyData Inc. (11)

What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?What is Change Data Capture (CDC) and Why is it Important?
What is Change Data Capture (CDC) and Why is it Important?
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?
 
Three Things to Consider When Making Investments in Your Big Data Infrastructure
Three Things to Consider When Making Investments in Your Big Data InfrastructureThree Things to Consider When Making Investments in Your Big Data Infrastructure
Three Things to Consider When Making Investments in Your Big Data Infrastructure
 
Cognitive Biases in Data Science
Cognitive Biases in Data ScienceCognitive Biases in Data Science
Cognitive Biases in Data Science
 
How to Extract Data from Amazon Redshift
How to Extract Data from Amazon RedshiftHow to Extract Data from Amazon Redshift
How to Extract Data from Amazon Redshift
 
Amazon Redshift - Create an Amazon Redshift Cluster
Amazon Redshift - Create an Amazon Redshift ClusterAmazon Redshift - Create an Amazon Redshift Cluster
Amazon Redshift - Create an Amazon Redshift Cluster
 
The Internet of Things
The Internet of ThingsThe Internet of Things
The Internet of Things
 
Create an Amazon Redshift Cluster with FlyData!
Create an Amazon Redshift Cluster with FlyData!Create an Amazon Redshift Cluster with FlyData!
Create an Amazon Redshift Cluster with FlyData!
 
Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData Near Real-Time Data Analysis With FlyData
Near Real-Time Data Analysis With FlyData
 
FlyData Autoload: 事例集
FlyData Autoload: 事例集FlyData Autoload: 事例集
FlyData Autoload: 事例集
 
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
Amazon Redshift ベンチマーク  Hadoop + Hiveと比較 Amazon Redshift ベンチマーク  Hadoop + Hiveと比較
Amazon Redshift ベンチマーク Hadoop + Hiveと比較
 

Recently uploaded

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Scalability of Amazon Redshift Data Loading and Query Speed

  • 1. Hapyrus: Amazon Redshift BENCHMARK Series 02 Scalability of Amazon Redshift Data Loading and Query Speed Comparisons between the performance of different instances www.flydata.com
  • 2. Amazon Redshift can load 1.2TB data using: •an XL instance, taking 17 hours •a two node XL instance, taking 10 hours •a two node 8XL instance (equivalent to XL 16x), taking 2 hours Load speed is almost proportional to number of nodes Running identical queries on Amazon Redshift, •an XL instance took 155 seconds •a two node XL instance took 55 seconds •a two node 8XL instance (equivalent to XL 16x) took 31 seconds The more nodes, the faster a query runs (but not by much, seems more inversely proportional) www.flydata.com
  • 3. The most important feature of Amazon Redshift is that it has flexible scalability compared to other columnar data warehouses such as Vertica, Netezza and Teradata. We have run benchmarks to compare the cost and speed of Redshift instances using: • 1.2TB of data and identical queries – same data as previous benchmark http://www.slideshare.net/Hapyrus/amazon-redshift-is-10x- faster-and-cheaper-than-hadoop-same data as previous benchmark http://www.slideshare.net/Hapyrus/amazon- redshift-is-10x-faster-and-cheaper-than-hadoop-hive • 3 different types of instances: Single node XL, two node XL, and two node 8XL www.flydata.com
  • 4. 1. Data Loading • XL x1 instance took 17 hours • XL x2 instance took 10 hours • 8XL x2 instance (equivalent to XL 16x) took 2 hours * The query and data used can be referenced in our Appendix 2h 36m 17h 19m 10h 38m Data loading by COPY command. All the data is copied at once. 2 www.flydata.com
  • 5. 2. Query Speed • XL x1 instance took 155 seconds • XL x2 instance took 55 seconds • 8XL x2 instance (equivalent to XL 16x) took 31 seconds * The query used can be referenced in our Appendix 155.02s 54.77s 30.61s 2 www.flydata.com
  • 6. Redshift Data Loading Results * The data used can be referenced in our Appendix The biggest data (imp_logs: 1.2TB) Other Tables Instance Type Nodes Machine Spec Time (hours) Speed Cost per TB dw.hs1.xlarge 1 1 17.32 70MB/h $12.27 2 2 10.63 110MB/h $15.07 dw.hs1.8xlarge 2 16 2.62 460MB/h $29.64 Loading Time Table Name XL x1 XL x2 8XL x2 ad_campaigns (100MB) 5.82s 8.97s 8.5s Publishers (10MB) 552ms 2.83s 2.22s Advertisers (10MB) 770ms 3.71s 4.75s imp_logs (1.2TB) 17h 19m 22.27s 10h 38m 5.66s 2h 36m 55.49s click_logs (6GB) 11m 31.37s 10m 6.24s 37.87s www.flydata.com
  • 7. Redshift Query Results * The query used can be referenced in our Appendix Processing Time Trial XL x1 XL x2 8XL x2 1 2m 28.65s 1m 1.44s 39.11s 2 2m 37.65s 52.89s 26.77s 3 2m 35.91s 53.76s 29.9s 4 2m 29.04s 53.52s 27.51s 5 2m 36.9s 52.22s 29.75s Average 2m 35s 54.77s 30.69s www.flydata.com
  • 8. Discussion • Redshift can load data in parallel – Almost proportional to number of nodes • Redshift query speed is drastically affected by number of nodes – Big difference between Single and Multiple nodes – Two nodes are faster than two times the speed of one node – Additional experimentation may be helpful in future www.flydata.com
  • 9. APPENDIX - Data TSV files, gzip compressed Imp_lo g 1.2TB / 1.2B record date datetime publisher_id integer ad_campaign_id integer bid_price real country varchar(30) attr1-4 varchar(255) click_l og 5.6GB / 6M record date datetime publisher_id integer ad_campaign_id integer country varchar(30) attr1-4 varchar(255) ad_campai gn 100MB / 100k record publish er 10MB / 10k record advertis er 10MB / 10k record Using 5 tables, we run a query which join tables and creates a report. www.flydata.com
  • 10. appendix – Sample Query select ac.ad_campaign_id as ad_campaign_id, adv.advertiser_id as advertiser_id, cs.spending as spending, ims.imp_total as imp_total, cs.click_total as click_total, click_total/imp_total as CTR, spending/click_total as CPC, spending/(imp_total/1000) as CPM from ad_campaigns ac join advertisers adv on (ac.advertiser_id = adv.advertiser_id) join (select il.ad_campaign_id, count(*) as imp_total from imp_logs il group by il.ad_campaign_id ) ims on (ims.ad_campaign_id = ac.ad_campaign_id) join (select cl.ad_campaign_id, sum(cl.bid_price) as spending, count(*) as click_total from click_logs cl group by cl.ad_campaign_id ) cs on (cs.ad_campaign_id = ac.ad_campaign_id); The query generates a basic report for ad campaigns performance, imp, click numbers, advertiser spending, CTR, CPC and CPM. www.flydata.com
  • 11. APPENDIX – Additional Comments • A Redshift 8XL instance cannot be selected as single node (you must choose at least 2 nodes) (as of Feb. 24, 2013) www.flydata.com
  • 12. APPENDIX – Additional Information • All resources for our benchmark are on our github repository – https://github.com/hapyrus/redshift- https://github.com/hapyrus/redshift- benchmark – The dataset we use is open on S3, so you can reproduce the benchmark www.flydata.com
  • 13. About Us - FlyData • FlyData Enterprise – Enables continuous loading to Amazon Redshift, with real-time data loading – Automated ETL process with multiple supported data formats – Auto scaling, data Integrity and high durability – FlyData Sync feature allows real-time replication from RDBMS to Amazon Redshift Contact us at: info@flydata.com We are an official data integration partner of Amazon Redshift Formerly known as Hapyrus www.flydata.com
  • 14. www.flydata.com www.flydata.com Check us out! -> http://flydata.com sales@flydata.com Toll Free: 1-855-427-9787 http://flydata.com We are an official data integration partner of Amazon Redshift