Amazon Redshift SSD - Queries on TBs of data can run in a few seconds

We have run benchmarks to compare Redshift SSD instances to Redshift HDD instances. See our blog at https://flydata.com/blog/posts/with-amazon-redshift-ssd-querying-a-tb-of-data-took-less-than-10-seconds

Transcript

  • 1. Amazon Redshift SSD - Queries on TBs of data can run in a few seconds FlyData: Amazon Redshift BENCHMARK Series 03 www.flydata.com
  • 2. Amazon Redshift HDD took 33.32 seconds to run our queries on 300GB of data; Amazon Redshift SSD took 4.32 seconds for the same queries. Amazon Redshift SSD performed about 8x faster. Takeaways: • 1.2 TB can now be handled in under 10 seconds. • Use cases could extend to ad-delivery optimization and financial trading systems.
  • 3. Amazon Redshift is a popular cloud data warehouse for big data. AWS added the SSD instance type on January 24, 2014. We ran benchmarks comparing Redshift SSD instances to Redshift HDD instances on the following: • Data size: 1.2 TB and 300 GB • Query performance when querying all records in the cluster • Loading speed • Cost comparison
  • 4. 1. Query speed for similar cluster sizes • The SSD version is faster. • A query against 1.2 TB (the entire data set) took less than 10 seconds! • For 1.2 TB of data, comparing similar node sizes, query time was 9.22 s on SSD (12x dw2.large) vs 28.48 s on HDD (2x dw1.8xlarge). * See the Appendix for the queries used. Chart: query speed on dw1.xlarge (HDD) and dw2.large (SSD) clusters for 1.2 TB of data, in order of cost.
  • 5. 1. Query speed at similar price points • Query performance compared at similar price points: • 4 nodes of dw2.large cost $0.25/hour × 4 nodes = $1.00/hour. • 1 node of dw1.xlarge costs $0.85/hour. • A direct comparison is difficult, but query performance is much better on dw2 (SSD) Redshift. * See the Appendix for the queries used. Chart: query speed for cluster configurations with similar pricing, 300 GB of data.
  • 6. 2. Loading time • At similar cost (DW2: $1.00/hour vs DW1: $0.85/hour), loading was 4.6x faster on SSD. • At similar node counts (DW2: 12 nodes vs DW1: 16 nodes), loading was 1.65x faster on SSD. * See the Appendix for the queries used. Charts: Similar Cost; Similar Node Count.
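  • 7. 3. Cost • DW2 is cheaper when data is smaller than 0.48 TB.
    Pricing (USD per node; RI1/RI3 = 1-year/3-year reserved instance):
    Node               On-demand   RI1 (1-year)           RI3 (3-year)
                       Hourly      Upfront    Hourly      Upfront    Hourly
    dw1.xlarge (HDD)   $0.85       $2500      $0.215      $3000      $0.114
    dw2.large (SSD)    $0.25       $750       $0.075      $1325      $0.05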
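The 0.48 TB threshold is consistent with the on-demand prices in the table. The minimal Python sketch below checks this, assuming 0.16 TB of storage per dw2.large node and 2 TB per dw1.xlarge node (inferred from the storage figures on slide 17) and considering on-demand hourly rates only; the function names are hypothetical. It finds the largest dw2.large cluster that still costs less per hour than a single dw1.xlarge node.

```python
import math

# On-demand prices from the pricing table on slide 7 (USD per node-hour).
DW1_XLARGE_HOURLY = 0.85
DW2_LARGE_HOURLY = 0.25

# Assumed storage per node, inferred from slide 17:
# 1x DW1.xlarge = 2 TB; 4x DW2.large = 0.64 TB, i.e. 0.16 TB per node.
DW1_XLARGE_TB = 2.0
DW2_LARGE_TB = 0.16


def nodes_needed(data_tb: float, tb_per_node: float) -> int:
    """Smallest node count (at least 1) whose combined storage holds data_tb."""
    # Rounding the ratio first guards against tiny floating-point errors
    # pushing an exact fit up to one extra node.
    return max(1, math.ceil(round(data_tb / tb_per_node, 9)))


def hourly_cost(data_tb: float, price: float, tb_per_node: float) -> float:
    """On-demand cluster cost per hour for a given data size."""
    return nodes_needed(data_tb, tb_per_node) * price


def dw2_breakeven_tb() -> float:
    """Largest data size (TB) at which DW2 is cheaper than one DW1.xlarge node."""
    nodes = 0
    # Add dw2.large nodes while one more node still keeps the cluster cheaper.
    while (nodes + 1) * DW2_LARGE_HOURLY < DW1_XLARGE_HOURLY:
        nodes += 1
    return round(nodes * DW2_LARGE_TB, 2)


if __name__ == "__main__":
    for data_tb in (0.3, 0.48, 0.5):
        dw1 = hourly_cost(data_tb, DW1_XLARGE_HOURLY, DW1_XLARGE_TB)
        dw2 = hourly_cost(data_tb, DW2_LARGE_HOURLY, DW2_LARGE_TB)
        print(f"{data_tb:.2f} TB: DW1 ${dw1:.2f}/h, DW2 ${dw2:.2f}/h")
    print(f"DW2 is cheaper up to {dw2_breakeven_tb()} TB")
```

Three dw2.large nodes (0.48 TB) cost $0.75/hour, while a fourth would bring the cluster to $1.00/hour, above the $0.85/hour of one dw1.xlarge, so the sketch reports a break-even of 0.48 TB.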
  • 8. Summary • Consider DW2 (SSD) Redshift – if query and loading performance is the primary concern and cost is secondary – if your data is smaller than 0.48 TB • Consider DW1 (HDD) Redshift – if current DW1 Redshift performance is sufficient – if DW2 is too expensive for your use case
  • 9. About Us - FlyData • FlyData Enterprise – enables continuous, real-time data loading to Amazon Redshift – automated ETL process with multiple supported data formats – auto scaling, data integrity and high durability – the FlyData Sync feature allows real-time replication from an RDBMS to Amazon Redshift. Contact us at: info@flydata.com. We are an official data integration partner of Amazon Redshift.
  • 10. APPENDIX
  • 11. Appendix: Data loaded for testing. TSV files, gzip compressed.
    imp_log: 1) 300GB / 300M records; 2) 1.2TB / 1.2B records
        date datetime, publisher_id integer, ad_campaign_id integer, bid_price real, country varchar(30), attr1-4 varchar(255)
    click_log: 1) 1.4GB / 1.5M records; 2) 5.6GB / 6M records
        date datetime, publisher_id integer, ad_campaign_id integer, country varchar(30), attr1-4 varchar(255)
    Log data: 1) covers 1 month; 2) covers 4 months.
    ad_campaign: 100MB / 100k records
    publisher: 10MB / 10k records
    advertiser: 10MB / 10k records
    We used 5 tables to run a query that joins them and creates a report.
  • 12. Appendix: Sample query
    select ac.ad_campaign_id as ad_campaign_id,
           adv.advertiser_id as advertiser_id,
           cs.spending as spending,
           ims.imp_total as imp_total,
           cs.click_total as click_total,
           -- COUNT(*) returns BIGINT in Redshift, so this is integer division;
           -- cast one side (e.g. click_total::float) to get a fractional CTR
           click_total/imp_total as CTR,
           spending/click_total as CPC,
           -- imp_total/1000 is also integer division and truncates
           spending/(imp_total/1000) as CPM
    from ad_campaigns ac
    join advertisers adv on (ac.advertiser_id = adv.advertiser_id)
    join (select il.ad_campaign_id, count(*) as imp_total
          from imp_logs il
          group by il.ad_campaign_id
         ) ims on (ims.ad_campaign_id = ac.ad_campaign_id)
    join (select cl.ad_campaign_id, sum(cl.bid_price) as spending, count(*) as click_total
          from click_logs cl
          group by cl.ad_campaign_id
         ) cs on (cs.ad_campaign_id = ac.ad_campaign_id);
    The query generates a basic report of ad campaign performance: impression and click counts, advertiser spending, CTR, CPC and CPM. It runs against all data in the cluster.
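To check the sample query's logic locally, the sketch below runs it against a tiny in-memory SQLite database through Python's standard `sqlite3` module. SQLite only stands in for Redshift here, so this illustrates semantics, not performance. The schema is reduced to the columns the joins and aggregates need, and it assumes `click_logs` has a `bid_price` column because the query sums `cl.bid_price` (the appendix schema lists `bid_price` only for `imp_log`). The toy rows and the function name `run_report` are hypothetical. SQLite, like Redshift, truncates integer division, so the sketch keeps the original `click_total/imp_total` as `ctr_int` next to a cast version.

```python
import sqlite3

# Toy schema: names follow the sample query; types simplified for SQLite.
# Assumption: click_logs carries bid_price, since the query sums cl.bid_price.
SCHEMA = """
CREATE TABLE advertisers  (advertiser_id INTEGER PRIMARY KEY);
CREATE TABLE ad_campaigns (ad_campaign_id INTEGER PRIMARY KEY, advertiser_id INTEGER);
CREATE TABLE imp_logs     (ad_campaign_id INTEGER);
CREATE TABLE click_logs   (ad_campaign_id INTEGER, bid_price REAL);
"""

# Same joins and aggregates as the sample query; columns are qualified
# because SQLite does not resolve select-list aliases inside the select list.
REPORT = """
SELECT ac.ad_campaign_id,
       adv.advertiser_id,
       cs.spending,
       ims.imp_total,
       cs.click_total,
       cs.click_total / ims.imp_total               AS ctr_int, -- integer division
       CAST(cs.click_total AS REAL) / ims.imp_total AS ctr,
       cs.spending / cs.click_total                 AS cpc,
       cs.spending / (ims.imp_total / 1000.0)       AS cpm
FROM ad_campaigns ac
JOIN advertisers adv ON ac.advertiser_id = adv.advertiser_id
JOIN (SELECT ad_campaign_id, COUNT(*) AS imp_total
      FROM imp_logs GROUP BY ad_campaign_id) ims
  ON ims.ad_campaign_id = ac.ad_campaign_id
JOIN (SELECT ad_campaign_id, SUM(bid_price) AS spending, COUNT(*) AS click_total
      FROM click_logs GROUP BY ad_campaign_id) cs
  ON cs.ad_campaign_id = ac.ad_campaign_id
ORDER BY ac.ad_campaign_id;
"""


def run_report():
    """Load hypothetical toy rows and return the report rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO advertisers VALUES (10)")
    conn.execute("INSERT INTO ad_campaigns VALUES (1, 10)")
    conn.executemany("INSERT INTO imp_logs VALUES (?)", [(1,)] * 4)
    conn.executemany("INSERT INTO click_logs VALUES (?, ?)", [(1, 0.5), (1, 1.5)])
    rows = conn.execute(REPORT).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_report():
        print(row)
```

For the toy data (4 impressions, 2 clicks with bids of 0.5 and 1.5), `run_report()` returns one row in which `ctr_int` is 0, `ctr` is 0.5, `cpc` is 1.0 and `cpm` is about 500.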
  • 13. Query performance: data size = 1.2 TB. Processing time of the sample query (1.2 TB, in seconds); trial 1 was ignored:
    Trial        12x DW2.large   1x DW1.xlarge   2x DW1.xlarge   2x DW1.8xlarge
    1 (ignored)  15.3            163.85          61.44           39.11
    2            8.8             148.65          52.89           26.77
    3            9.71            157.65          53.76           29.9
    4            9.12            155.91          53.52           27.51
    5            9.24            149.04          52.22           29.75
    Average      9.2175          155.02          53.0975         28.4825
  • 14. Query performance: data size = 300 GB. Processing time of the sample query (300 GB, in seconds); trial 1 was ignored:
    Trial        4x DW2.large   1x DW1.xlarge
    1 (ignored)  9.05           58
    2            4.31           42.69
    3            4.65           30.84
    4            4.13           30.14
    5            4.17           29.6
    Average      4.315          33.3175
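The averages in the two tables above are meant to exclude trial 1, the run marked "ignore" on the slides. The short Python sketch below recomputes them from the per-trial times and derives the SSD speedups; `warm_average` and `speedup` are hypothetical helper names. Recomputing shows two details: the 1x DW1.xlarge average for 1.2 TB (155.02 s) equals the mean of all five trials, whereas the mean of trials 2 to 5 is 152.81 s; and the 300 GB ratio is 33.3175 / 4.315 ≈ 7.72, which slide 2 rounds to 8x. At 1.2 TB, 12x DW2.large is about 3.09x faster than 2x DW1.8xlarge.

```python
from statistics import mean

# Per-trial times (seconds) for the sample query, copied from the two tables.
# Index 0 is trial 1, the run marked "ignore" on the slides.
RESULTS = {
    "1.2TB": {
        "12x DW2.large": [15.3, 8.8, 9.71, 9.12, 9.24],
        "1x DW1.xlarge": [163.85, 148.65, 157.65, 155.91, 149.04],
        "2x DW1.xlarge": [61.44, 52.89, 53.76, 53.52, 52.22],
        "2x DW1.8xlarge": [39.11, 26.77, 29.9, 27.51, 29.75],
    },
    "300GB": {
        "4x DW2.large": [9.05, 4.31, 4.65, 4.13, 4.17],
        "1x DW1.xlarge": [58.0, 42.69, 30.84, 30.14, 29.6],
    },
}


def warm_average(times: list[float], skip: int = 1) -> float:
    """Mean query time after dropping the first `skip` trials."""
    return mean(times[skip:])


def speedup(slow: float, fast: float) -> float:
    """How many times faster the `fast` time is than the `slow` time."""
    return slow / fast


if __name__ == "__main__":
    for size, configs in RESULTS.items():
        for name, times in configs.items():
            print(f"{size:>5} {name:<15} trials 2-5: {warm_average(times):8.4f} s"
                  f"  all 5: {mean(times):8.4f} s")
    big, small = RESULTS["1.2TB"], RESULTS["300GB"]
    print("1.2TB, 12x DW2.large vs 2x DW1.8xlarge:",
          f"{speedup(warm_average(big['2x DW1.8xlarge']), warm_average(big['12x DW2.large'])):.2f}x")
    print("300GB, 4x DW2.large vs 1x DW1.xlarge:",
          f"{speedup(warm_average(small['1x DW1.xlarge']), warm_average(small['4x DW2.large'])):.2f}x")
```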
  • 15. Appendix: Additional information • All resources for our benchmark are in our GitHub repository – https://github.com/hapyrus/redshift-benchmark – The dataset we use is publicly available on S3, so you can reproduce the benchmark.
  • 16. Summary: Amazon Redshift pricing • DW1: Amazon Redshift (HDD) • DW2: Amazon Redshift (SSD) – around 4x more expensive per TB of storage – if your storage need is less than 0.48 TB, DW2 is cheaper
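The "around 4x" figure compares the price of the same amount of storage. A minimal Python sketch of that arithmetic follows, assuming on-demand prices only (the reserved-instance columns of the pricing table are not included) and the per-node storage inferred from slide 17: 2 TB per dw1.xlarge and 0.16 TB per dw2.large. The helper names are hypothetical.

```python
# On-demand list prices (USD per node-hour) from the pricing table on slide 7.
HOURLY = {"dw1.xlarge": 0.85, "dw2.large": 0.25}

# Assumed storage per node, inferred from slide 17:
# 1x DW1.xlarge = 2 TB; 4x DW2.large = 0.64 TB, i.e. 0.16 TB per node.
TB_PER_NODE = {"dw1.xlarge": 2.0, "dw2.large": 0.16}


def cost_per_tb_hour(node: str) -> float:
    """On-demand price of one TB of cluster storage for one hour."""
    return HOURLY[node] / TB_PER_NODE[node]


def ssd_premium() -> float:
    """How many times more a TB of DW2 (SSD) storage costs than DW1 (HDD)."""
    return cost_per_tb_hour("dw2.large") / cost_per_tb_hour("dw1.xlarge")


if __name__ == "__main__":
    for node in HOURLY:
        print(f"{node:<11} ${cost_per_tb_hour(node):.4f} per TB-hour")
    print(f"SSD premium: {ssd_premium():.2f}x")
```

At on-demand rates this gives $0.425 vs $1.5625 per TB-hour, a ratio of about 3.68x, in line with the "around 4x" above and the 3.5x to 5x range on slide 21.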
  • 17. Cost comparison: 1x DW1.xlarge (2 TB), 4x DW2.large (0.64 TB) and 12x DW2.large (1.92 TB)
  • 18. For the same storage space, DW2 (SSD) can cost 5.2 times as much as DW1 (HDD).
  • 21. Additional comments • SSD can be 3.5x to 5x more expensive than HDD for the same amount of storage space (SSD is really optimized for performance). • DW1.8xlarge is exactly 8 times a DW1.xlarge, but DW2.8xlarge is actually 16 times a DW2.large. This is because DW2.large nodes are not “xlarge”; a bit confusing… ;) (as of Jan. 27, 2014)
  • 22. Check us out! http://flydata.com, sales@flydata.com, Toll Free: 1-855-427-9787. We are an official data integration partner of Amazon Redshift.
