Amazon Redshift SSD: Queries on TBs of data can run in a few seconds
FlyData: Amazon Redshift BENCHMARK Series 03
www.flydata.com
Amazon Redshift HDD took 33.32 seconds to run our query on 300 GB of data.
Amazon Redshift SSD took 4.32 seconds to run the same query on 300 GB of data.
Amazon Redshift SSD performed roughly 8x faster (7.7x).
Takeaways:
• 1.2 TB can now be queried in under 10 seconds.
• Use cases could expand to ad-delivery optimization and financial trading systems.
Amazon Redshift is a popular cloud data warehouse for big data. AWS added the SSD instance type on January 24, 2014.
We benchmarked Redshift SSD instances against Redshift HDD instances on the following:
• Data size: 1.2 TB and 300 GB
• Query performance when querying all records in the cluster
• Loading speed
• Cost
1. Query Speed for Similar Cluster Sizes
• The SSD version is faster.
• A query against the entire 1.2 TB data set took less than 10 seconds.
• For 1.2 TB of data, comparing similar node sizes, query time was 9.22 s (SSD, 12x dw2.large) vs 28.48 s (HDD, 2x dw1.8xlarge).
* See the Appendix for the query used.
(Chart: comparison of query speed for dw1 (HDD) and dw2.large (SSD) clusters on 1.2 TB of data, in order of cost.)
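As a quick check on the "similar node sizes" comparison, the speedups can be recomputed from the average times in the 1.2 TB appendix table. This is a minimal Python sketch: the dictionary only restates those reported averages, and `speedup` is an illustrative helper, not part of the benchmark tooling.

```python
# Average query times in seconds on 1.2 TB, restated from the appendix table.
AVG_SECONDS = {
    "12x dw2.large": 9.2175,
    "1x dw1.xlarge": 155.02,
    "2x dw1.xlarge": 53.0975,
    "2x dw1.8xlarge": 28.4825,
}


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` ran than `baseline` (ratio of averages)."""
    return AVG_SECONDS[baseline] / AVG_SECONDS[candidate]


for config in ("1x dw1.xlarge", "2x dw1.xlarge", "2x dw1.8xlarge"):
    ratio = speedup(config, "12x dw2.large")
    print(f"12x dw2.large vs {config}: {ratio:.1f}x faster")
```

The 2x dw1.8xlarge comparison comes out at about 3.1x, which is the gap between 9.22 s and 28.48 s.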
1. Query Speed at Similar Price Points
• This comparison uses cluster configurations with similar hourly prices.
• 4 nodes of dw2.large cost: $0.25/hour × 4 nodes = $1.00/hour
• 1 node of dw1.xlarge costs: $0.85/hour
• The prices are not identical, so a direct comparison is difficult, but query performance is much better on dw2 (SSD): 4.32 s vs 33.32 s.
* See the Appendix for the query used.
Comparison of query speed for cluster configurations with similar pricing on 300 GB of data.
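One way to fold price and speed into a single number is cost per query run. The Python sketch below uses the on-demand prices and 300 GB averages from this deck; it assumes the hourly price can be prorated per second and ignores idle time between queries, which is a simplification rather than how Redshift actually billed.

```python
# On-demand hourly price per node (USD), from the cost slide.
HOURLY_PRICE = {"dw2.large": 0.25, "dw1.xlarge": 0.85}


def cluster_hourly_cost(node_type: str, nodes: int) -> float:
    """Hourly price of a cluster of `nodes` identical nodes."""
    return HOURLY_PRICE[node_type] * nodes


def cost_per_query(node_type: str, nodes: int, query_seconds: float) -> float:
    """Cluster cost attributable to one query, assuming per-second proration
    of the hourly price and no idle time (a simplifying assumption)."""
    return cluster_hourly_cost(node_type, nodes) * query_seconds / 3600


ssd = cost_per_query("dw2.large", 4, 4.315)     # 300 GB, 4x dw2.large
hdd = cost_per_query("dw1.xlarge", 1, 33.3175)  # 300 GB, 1x dw1.xlarge
print(f"SSD: ${ssd:.5f} per query, HDD: ${hdd:.5f} per query")
print(f"HDD costs {hdd / ssd:.1f}x more per query")
```

Under these assumptions the HDD cluster costs roughly 6.6x more per query, even though its hourly price is lower.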
2. Loading Time
• At similar cost (DW2: $1.00/hour vs DW1: $0.85/hour), loading was 4.6x faster on SSD.
• At similar node counts (DW2: 12 nodes vs DW1: 16 nodes), loading was 1.65x faster on SSD.
* See the Appendix for the data loaded.
(Chart: loading time at similar cost and at similar node count.)
3. Cost
(Chart: DW2 is cheaper when data < 0.48 TB.)
Pricing per node (USD):
          On-demand   1-yr reserved          3-yr reserved
          hourly      upfront    hourly      upfront    hourly
dw1       $0.85       $2,500     $0.215      $3,000     $0.114
dw2       $0.25       $750       $0.075      $1,325     $0.05
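The 0.48 TB break-even point can be reproduced from the on-demand prices. The Python sketch below assumes the per-node storage implied by the cost-comparison slide (1x dw1.xlarge = 2 TB, 4x dw2.large = 0.64 TB, so 160 GB per dw2.large), uses only on-demand prices, and sizes each cluster purely by storage. Sizes are in whole GB to avoid floating-point rounding in the node count; `hourly_cost` and `dw2_break_even_gb` are hypothetical helpers.

```python
import math

# Per-node on-demand price (USD/hour) and storage (GB), as assumed above.
NODES = {
    "dw1.xlarge": {"price_per_hour": 0.85, "storage_gb": 2000},
    "dw2.large": {"price_per_hour": 0.25, "storage_gb": 160},
}


def hourly_cost(node_type: str, data_gb: int) -> float:
    """On-demand hourly cost of the smallest cluster that can hold data_gb."""
    spec = NODES[node_type]
    nodes = max(1, math.ceil(data_gb / spec["storage_gb"]))
    return nodes * spec["price_per_hour"]


def dw2_break_even_gb() -> int:
    """Largest data size (GB) that dw2.large stores more cheaply than a single
    dw1.xlarge node (only meaningful while the data fits in 2 TB)."""
    dw1_price = NODES["dw1.xlarge"]["price_per_hour"]
    dw2 = NODES["dw2.large"]
    nodes = 0
    while (nodes + 1) * dw2["price_per_hour"] < dw1_price:
        nodes += 1
    return nodes * dw2["storage_gb"]


for size_gb in (300, 480, 500, 1200):
    dw1 = hourly_cost("dw1.xlarge", size_gb)
    dw2 = hourly_cost("dw2.large", size_gb)
    cheaper = "dw2" if dw2 < dw1 else "dw1"
    print(f"{size_gb} GB: dw1 ${dw1:.2f}/h, dw2 ${dw2:.2f}/h -> {cheaper} cheaper")
print("break-even:", dw2_break_even_gb(), "GB")
```

Three dw2.large nodes ($0.75/hour, 480 GB) undercut one dw1.xlarge ($0.85/hour), but a fourth node ($1.00/hour) does not, which is where the 0.48 TB threshold comes from.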
Summary
• Consider DW2 (SSD) Redshift
– If query and loading performance is your primary concern and cost is secondary
– If your data is smaller than 0.48 TB
• Consider DW1 (HDD) Redshift
– If current DW1 Redshift performance is sufficient
– If DW2 is too expensive for your use case
About Us - FlyData
• FlyData Enterprise
– Enables continuous, real-time data loading into Amazon Redshift
– Automated ETL process supporting multiple data formats
– Auto scaling, data integrity, and high durability
– The FlyData Sync feature provides real-time replication from RDBMSs to Amazon Redshift
Contact us at: info@flydata.com
We are an official data integration partner of Amazon Redshift.
APPENDIX
Appendix: Data Loaded for Testing
TSV files, gzip compressed. Two data sets were used: (1) one month of data and (2) four months of data.

imp_log
  (1) 300 GB / 300M records; (2) 1.2 TB / 1.2B records
  Columns: date datetime, publisher_id integer, ad_campaign_id integer, bid_price real, country varchar(30), attr1-4 varchar(255)

click_log
  (1) 1.4 GB / 1.5M records; (2) 5.6 GB / 6M records
  Columns: date datetime, publisher_id integer, ad_campaign_id integer, country varchar(30), attr1-4 varchar(255)

ad_campaign: 100 MB / 100k records
publisher: 10 MB / 10k records
advertiser: 10 MB / 10k records

We loaded these five tables; the sample query joins four of them to produce a report.
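To make the "TSV files, gzip compressed" format concrete, the Python sketch below writes two made-up imp_log rows in that format and reads them back. The column order follows the imp_log schema above, with "attr1-4" assumed to mean four columns attr1 to attr4; all values are invented for illustration. Redshift's COPY command can load files like this using its GZIP and DELIMITER options.

```python
import csv
import gzip
import io

# Column order assumed from the imp_log schema; attr1-4 expanded to 4 columns.
IMP_LOG_COLUMNS = ["date", "publisher_id", "ad_campaign_id", "bid_price",
                   "country", "attr1", "attr2", "attr3", "attr4"]

# Hypothetical sample rows, not taken from the benchmark data set.
rows = [
    ["2014-01-01 00:00:01", 17, 42, 0.35, "US", "a", "b", "c", "d"],
    ["2014-01-01 00:00:02", 23, 42, 0.10, "JP", "a", "b", "c", "d"],
]


def write_tsv_gz(rows) -> bytes:
    """Serialize rows as tab-separated text and gzip-compress the result."""
    buf = io.BytesIO()
    with gzip.open(buf, "wt", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_tsv_gz(data: bytes) -> list:
    """Decompress gzip TSV bytes and return the rows as lists of strings."""
    with gzip.open(io.BytesIO(data), "rt", encoding="utf-8", newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


payload = write_tsv_gz(rows)
decoded = read_tsv_gz(payload)
print(len(decoded), "rows;", dict(zip(IMP_LOG_COLUMNS, decoded[0])))
```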
Appendix: Sample Query
select
    ac.ad_campaign_id as ad_campaign_id,
    adv.advertiser_id as advertiser_id,
    cs.spending as spending,
    ims.imp_total as imp_total,
    cs.click_total as click_total,
    -- count(*) returns BIGINT; cast before dividing so integer division does not truncate CTR to 0
    cast(cs.click_total as float) / ims.imp_total as CTR,
    cs.spending / cs.click_total as CPC,
    -- divide by 1000.0, not 1000, to avoid truncating the impression count
    cs.spending / (ims.imp_total / 1000.0) as CPM
from
    ad_campaigns ac
    join advertisers adv
        on (ac.advertiser_id = adv.advertiser_id)
    join (
        select
            il.ad_campaign_id,
            count(*) as imp_total
        from imp_logs il
        group by il.ad_campaign_id
    ) ims on (ims.ad_campaign_id = ac.ad_campaign_id)
    join (
        select
            cl.ad_campaign_id,
            sum(cl.bid_price) as spending,
            count(*) as click_total
        from click_logs cl
        group by cl.ad_campaign_id
    ) cs on (cs.ad_campaign_id = ac.ad_campaign_id);
The query produces a basic ad campaign performance report: impression and click counts, advertiser spending, CTR (click-through rate), CPC (cost per click), and CPM (cost per thousand impressions). It scans all data in the cluster.
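The shape of this report can be tried out locally on a toy data set. The Python sketch below uses the standard-library sqlite3 module (not Redshift) with a trimmed version of the sample query; the tables hold invented numbers (one campaign, 2,000 impressions, 4 clicks at $0.50). The ratio columns cast to float because SQLite, like Redshift, truncates integer division, as the last line shows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table advertisers (advertiser_id integer);
    create table ad_campaigns (ad_campaign_id integer, advertiser_id integer);
    create table imp_logs (ad_campaign_id integer);
    create table click_logs (ad_campaign_id integer, bid_price real);
    insert into advertisers values (1);
    insert into ad_campaigns values (10, 1);
""")
# Toy data: 2,000 impressions and 4 clicks at $0.50 for campaign 10.
conn.executemany("insert into imp_logs values (?)", [(10,)] * 2000)
conn.executemany("insert into click_logs values (?, ?)", [(10, 0.5)] * 4)

REPORT_SQL = """
select ac.ad_campaign_id, cs.spending, ims.imp_total, cs.click_total,
       cast(cs.click_total as float) / ims.imp_total as ctr,
       cs.spending / cs.click_total as cpc,
       cs.spending / (ims.imp_total / 1000.0) as cpm
from ad_campaigns ac
join advertisers adv on ac.advertiser_id = adv.advertiser_id
join (select ad_campaign_id, count(*) as imp_total
      from imp_logs group by ad_campaign_id) ims
  on ims.ad_campaign_id = ac.ad_campaign_id
join (select ad_campaign_id, sum(bid_price) as spending, count(*) as click_total
      from click_logs group by ad_campaign_id) cs
  on cs.ad_campaign_id = ac.ad_campaign_id
"""
row = conn.execute(REPORT_SQL).fetchone()
print(row)  # (campaign, spending, impressions, clicks, CTR, CPC, CPM)

# Integer division truncates: 4 clicks / 2000 impressions gives 0 without a cast.
print(conn.execute("select 4 / 2000").fetchone()[0])
```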
Query Performance: Data Size = 1.2 TB
Sample query processing time (seconds)

Trial         12x DW2.large   1x DW1.xlarge   2x DW1.xlarge   2x DW1.8xlarge
1 (ignored)   15.3            163.85          61.44           39.11
2             8.8             148.65          52.89           26.77
3             9.71            157.65          53.76           29.9
4             9.12            155.91          53.52           27.51
5             9.24            149.04          52.22           29.75
Average       9.2175          155.02*         53.0975         28.4825

Trial 1 is excluded from the averages.
* This average includes trial 1; trials 2-5 alone average 152.8125 seconds.
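The averages in the 1.2 TB table can be re-derived from the per-trial times. In this Python sketch, `warm_average` drops the first trial (the one marked "ignore", presumably a warm-up run). Dropping trial 1 reproduces 9.2175, 53.0975 and 28.4825, while the 155.02 reported for 1x DW1.xlarge equals the mean of all five trials; trials 2-5 give 152.8125.

```python
from statistics import mean

# Per-trial query times in seconds on 1.2 TB, copied from the table.
TRIALS = {
    "12x DW2.large": [15.3, 8.8, 9.71, 9.12, 9.24],
    "1x DW1.xlarge": [163.85, 148.65, 157.65, 155.91, 149.04],
    "2x DW1.xlarge": [61.44, 52.89, 53.76, 53.52, 52.22],
    "2x DW1.8xlarge": [39.11, 26.77, 29.9, 27.51, 29.75],
}


def warm_average(times, skip=1):
    """Mean of the trials that remain after dropping the first `skip` runs."""
    return mean(times[skip:])


for config, times in TRIALS.items():
    print(f"{config:15s} all trials: {mean(times):9.4f}   "
          f"trials 2-5: {warm_average(times):9.4f}")
```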
Query Performance: Data Size = 300 GB
Sample query processing time (seconds)

Trial         4x DW2.large   1x DW1.xlarge
1 (ignored)   9.05           58
2             4.31           42.69
3             4.65           30.84
4             4.13           30.14
5             4.17           29.6
Average       4.315          33.3175

Trial 1 is excluded from the averages.
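The headline "8x faster" figure follows from this table. The Python sketch below drops trial 1 (marked "ignore"), averages the remaining trials, and takes the ratio; the dictionary simply restates the table values.

```python
from statistics import mean

# Per-trial query times in seconds on 300 GB, copied from the table.
TRIALS_300GB = {
    "4x DW2.large": [9.05, 4.31, 4.65, 4.13, 4.17],
    "1x DW1.xlarge": [58, 42.69, 30.84, 30.14, 29.6],
}

# Average trials 2-5 only; trial 1 is the one marked "ignore".
averages = {cfg: mean(times[1:]) for cfg, times in TRIALS_300GB.items()}
speedup = averages["1x DW1.xlarge"] / averages["4x DW2.large"]
print(averages, f"speedup: {speedup:.2f}x")
```

The ratio is about 7.7x, which the summary slide rounds to 8x.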
Appendix: Additional Information
• All resources for our benchmark are in our GitHub repository:
– https://github.com/hapyrus/redshift-benchmark
– The dataset we use is publicly available on S3, so you can reproduce the benchmark.
Summary: Amazon Redshift Pricing
• DW1: Amazon Redshift (HDD)
• DW2: Amazon Redshift (SSD)
– Around 4x more expensive per unit of storage
– Cheaper than DW1 if your storage need is less than 0.48 TB
Cost comparison: 1x dw1.xlarge (2 TB), 4x dw2.large (0.64 TB), and 12x dw2.large (1.92 TB)
For the same storage space, DW2 (SSD) can cost up to 5.2 times as much as DW1 (HDD).
Additional Comments
• For the same amount of storage, SSD can be 3.5x to 5x more expensive than HDD (SSD nodes are optimized for performance).
• A DW1.8xlarge is exactly 8 times a DW1.xlarge, but a DW2.8xlarge is actually 16 times a DW2.large. This is because DW2.large nodes are not “xlarge”; a bit confusing… ;)
(as of Jan. 27, 2014)
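The "3.5x to 5x" range and the 5.2x figure can be approximated from the pricing table by comparing 12x dw2.large (1.92 TB) with 1x dw1.xlarge (2 TB) under each pricing plan. This Python sketch is our reconstruction, not necessarily how the figures in the deck were computed: it assumes reserved-node upfront fees are spread evenly over 365-day years and that the cluster runs 24/7.

```python
# (upfront USD, hourly USD) per node, from the pricing table; term in hours.
PLANS = {
    "on-demand": {"hours": None, "dw1": (0, 0.85), "dw2": (0, 0.25)},
    "1-yr RI": {"hours": 365 * 24, "dw1": (2500, 0.215), "dw2": (750, 0.075)},
    "3-yr RI": {"hours": 3 * 365 * 24, "dw1": (3000, 0.114), "dw2": (1325, 0.05)},
}


def effective_hourly(upfront, hourly, term_hours):
    """Hourly rate with any upfront fee amortized evenly over the term."""
    return hourly + (upfront / term_hours if term_hours else 0.0)


def ssd_to_hdd_ratio(plan, dw2_nodes=12, dw1_nodes=1):
    """Cost of 12x dw2.large (1.92 TB) relative to 1x dw1.xlarge (2 TB)."""
    p = PLANS[plan]
    dw1 = dw1_nodes * effective_hourly(*p["dw1"], p["hours"])
    dw2 = dw2_nodes * effective_hourly(*p["dw2"], p["hours"])
    return dw2 / dw1


for plan in PLANS:
    print(f"{plan:9s}: SSD costs {ssd_to_hdd_ratio(plan):.2f}x HDD for ~2 TB")
```

Under these assumptions the ratio is about 3.5x on demand, about 3.9x with 1-year reserved nodes, and about 5.3x with 3-year reserved nodes, roughly in line with the figures quoted above.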
Check us out!
-> http://flydata.com
sales@flydata.com
Toll Free: 1-855-427-9787