Where to Deploy Hadoop: Bare-metal or Cloud?

Visit for more information: http://j.mp/hadoopstudy

Presentation from Hadoop Summit '13 in San Jose on the Hadoop Deployment Comparison Study from Accenture Technology Labs. In the study, we compare the price-performance ratio of a bare-metal Hadoop cluster with that of Hadoop-as-a-Service at a matched total cost of ownership (TCO) level, using real-world applications modeled by the Accenture Data Platform Benchmark.


  1. Where to Deploy Hadoop: Bare-metal or Cloud? Michael Wendt, Data Insights R&D Group
  2. Copyright © 2013 Accenture. All rights reserved. Big Data: Bare-metal vs. Cloud. Deployment options between bare-metal and cloud: on-premise full custom, Hadoop appliance, Hadoop hosting, Hadoop-as-a-Service.
  3. Big Data: Bare-metal vs. Cloud. Deployment options between bare-metal and cloud: on-premise full custom, Hadoop appliance, Hadoop hosting, Hadoop-as-a-Service. Factors: data privacy, data gravity, price-performance ratio, productivity of developers and data scientists, data enrichment.
  5. Price-Performance Ratio Views. Bare-metal (on-premise full custom): "Cloud? Virtualized? Slow!" Cloud (Hadoop-as-a-Service): "Who cares! I'm cheap, just throw more in!" Servers designed by Daniel Campos from The Noun Project.
  6. Hadoop Deployment Comparison Study. Bare-metal (on-premise full custom) vs. cloud (Hadoop-as-a-Service). Price-performance ratio: Accenture Data Platform Benchmark + TCO analysis.
  7. Hadoop Deployment Comparison Study: TCO Analysis. Bare-metal (on-premise full custom) vs. cloud (Hadoop-as-a-Service). Price-performance ratio: Accenture Data Platform Benchmark + TCO analysis.
  8. TCO of Bare-metal Hadoop Cluster (on-premise full custom). 24 server nodes and 50 TB of HDFS capacity*; small-scale initial production deployment. Cost components: server hardware, staff for operation, data center facility and electricity, technical support. Amounts shown: $3,000.00, $2,914.58, $6,656.00, $9,274.46. Total: $21,845.04.
  9. TCO of Hadoop-as-a-Service. Used bare-metal TCO for budget; calculated the number of affordable instances. Cost components: Hadoop service, staff for operation, storage services, technical support. Amounts shown: $15,318.28, $2,063.00, $1,372.27, $3,091.49. Total: $21,845.04.
  10. TCO of Hadoop-as-a-Service: Instances. Hadoop service: 14 instance types, 3 pricing models, 42 combinations.
  11. TCO of Hadoop-as-a-Service: Instances. Hadoop service: selected 3 representative instance types: m1.xlarge, m2.4xlarge, cc2.8xlarge.
  12. TCO of Hadoop-as-a-Service: Affordable Instances. Hadoop service budget: $15,318.28. 50% cluster utilization assumed; 1/3 of budget allocated for Spot instances.
      Instance type   On-demand instances (ODI)   Reserved instances (RI)   Reserved + Spot instances (RI + SI)
      m1.xlarge       68                          112                       192
      m2.4xlarge      20                          41                        77
      cc2.8xlarge     13                          28                        53
  13. Hadoop Deployment Comparison Study: Accenture Data Platform Benchmark. Bare-metal (on-premise full custom) vs. cloud (Hadoop-as-a-Service). Price-performance ratio: Accenture Data Platform Benchmark + TCO analysis.
  14. Accenture Data Platform Benchmark. Suite of real-world Hadoop MapReduce applications. Use cases drawn from client experience, internal roadmap, and public literature; common use cases categorized and selected; built with open-source libraries and public datasets. Use cases and their workloads: log management (sessionization), customer preference prediction (recommendation engine), text analytics (document clustering).
  15. Accenture Data Platform Benchmark: Sessionization. A session is a sequence of related interactions, useful to analyze as a group. Steps: log data is bucketed, sorted, and sliced into sessions. Data: ~150 billion log entries (~24 TB), 1 million users, 1.1 billion sessions.
  16. Accenture Data Platform Benchmark: Recommendation Engine. Used item-based collaborative filtering algorithm; Mahout example library used as foundation. Steps: ratings data (who rated what item?), co-occurrence matrix (how many people rated the pair of items?), recommendation (given the way the person rated these items, he/she is likely to be interested in these other items). Data: generated 300 million ratings; population of 3 million, 50,000 items.
  17. Accenture Data Platform Benchmark: Document Clustering. Groups similar documents; application components used in many areas (e.g., search engines, e-commerce site optimization). Steps: corpus of crawled web pages, filtered and tokenized documents, term dictionary, TF vectors, TF-IDF vectors, K-means, clustered documents. Data: Common Crawl dataset, 10 TB corpus*; ~31,000 ARC files or ~300 million HTML pages.
  18. Hadoop Deployment Comparison Study: Experiment Setup/Results. Bare-metal (on-premise full custom) vs. cloud (Hadoop-as-a-Service). Price-performance ratio: Accenture Data Platform Benchmark + TCO analysis.
  19. Experiment Setup: Price-Performance Ratio Comparison. 1 bare-metal Hadoop cluster vs. 9 Amazon EMR clusters. Fixed budget for cluster size; manual and automated tuning; measure execution time of benchmark.
  20. Experiment Setup: Starfish Automated Performance Tuning Tool. Manual and automated tuning. Starfish (now Unravel) is an automated performance tuning tool for MapReduce jobs. Phases: profile phase, optimize phase. For the experiment we ran each benchmark twice using Starfish; measure execution time of optimize phase. Speedometer designed by Filippo Camedda from The Noun Project.
  21. Experiment Results: Starfish Automated Performance Tuning Tool. Manual and automated tuning. Manually tuned Sessionization workload: 2+ weeks of manual tuning, ½ to 1 day iterations. Starfish-tuned Recommendation Engine workload w/ 11 cascaded MapReduce jobs: 8x improvement in one tuning cycle. Achieve performance increases with less cost using Starfish.
  22. Experiment Results: Sessionization. Execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge), values in slide order: 408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35. Bare-metal: 533. Instance counts (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192.
  23. Experiment Results: Recommendation Engine. Execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge), values in slide order: 23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08. Bare-metal: 21.59. Instance counts (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192.
  24. Experiment Results: Document Clustering. Execution time (minutes) by Amazon EMR configuration (ODI, RI, RI+SI) and instance type (cc2.8xlarge, m2.4xlarge, m1.xlarge), values in slide order: 1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38. Bare-metal: 1186.37. Instance counts (cc2.8xlarge, m2.4xlarge, m1.xlarge): ODI 13, 20, 68; RI 28, 41, 112; RI+SI 53, 77, 192.
  25. Key Takeaways. Hadoop-as-a-Service offers a better price-performance ratio. Cloud expands the performance tuning opportunities. Automated performance tuning tools are a necessity.
  26. Acknowledgement
  27. More details. Contact us for the full white paper: Hadoop Deployment Comparison Study. Michael Wendt, R&D Developer, Data Insights R&D, Accenture Technology Labs, @mike_wendt, michael.e.wendt(at)accenture.com. Scott Kurth, Group Lead, Data Insights R&D, Accenture Technology Labs, scott.kurth(at)accenture.com.
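
The figures on the TCO and results slides can be cross-checked with a few lines of arithmetic. The sketch below is not part of the study; it only re-adds the cost amounts shown on slides 8 and 9 and compares the nine Amazon EMR execution times on slides 22 to 24 against the bare-metal time. The names `total_cents` and `summarize` are hypothetical helpers, and the times are kept in slide order because the transcript does not pair each value with a specific instance type and pricing model.

```python
# Cost amounts as listed on the TCO slides (labels omitted: the transcript
# does not pair each component label with its amount).
BARE_METAL_COSTS = [3000.00, 2914.58, 6656.00, 9274.46]
HAAS_COSTS = [15318.28, 2063.00, 1372.27, 3091.49]
BUDGET = 21845.04

# (bare-metal minutes, nine EMR cluster minutes in slide order)
RESULTS = {
    "Sessionization": (
        533.0,
        [408.07, 229.25, 125.82, 381.55, 204.10, 166.82, 250.13, 172.23, 114.35],
    ),
    "Recommendation Engine": (
        21.59,
        [23.33, 21.97, 18.48, 20.13, 19.97, 16.92, 14.28, 16.30, 15.08],
    ),
    "Document Clustering": (
        1186.37,
        [1661.03, 1157.37, 784.82, 1649.98, 1112.68, 629.98, 914.35, 779.98, 742.38],
    ),
}


def total_cents(costs):
    """Sum dollar amounts in integer cents to avoid float rounding drift."""
    return sum(round(c * 100) for c in costs)


def summarize(bare_metal, emr_times):
    """Return (number of EMR clusters faster than bare metal,
    speedup of the fastest EMR cluster over bare metal, rounded to 2 places)."""
    faster = sum(1 for t in emr_times if t < bare_metal)
    best_speedup = round(bare_metal / min(emr_times), 2)
    return faster, best_speedup


if __name__ == "__main__":
    print("bare-metal total matches budget:",
          total_cents(BARE_METAL_COSTS) == round(BUDGET * 100))
    print("HaaS total matches budget:",
          total_cents(HAAS_COSTS) == round(BUDGET * 100))
    for workload, (bare_metal, emr_times) in RESULTS.items():
        faster, speedup = summarize(bare_metal, emr_times)
        print(f"{workload}: {faster}/9 EMR clusters faster, best speedup {speedup}x")
```

Both cost lists add up to the same $21,845.04 budget, which is what "matched TCO" means in the study. Since the budget is fixed, a shorter execution time directly indicates a better price-performance ratio.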

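The sessionization workload on slide 15 is described as bucketing, sorting, and slicing log data into sessions. A minimal single-process sketch of those three steps follows. It is an illustration only, not the benchmark's MapReduce implementation: the function name `sessionize`, the `(user_id, timestamp)` record shape, and the 30-minute inactivity gap used to slice sessions are all assumptions that do not come from the slides.

```python
from collections import defaultdict


def sessionize(log_entries, gap_seconds=1800):
    """Group log entries into per-user sessions.

    log_entries: iterable of (user_id, timestamp_in_seconds) pairs (assumed shape).
    gap_seconds: inactivity gap that starts a new session (assumed: 30 minutes).
    Returns a dict mapping user_id to a list of sessions, each a list of timestamps.
    """
    # Bucketing: collect every entry under its user.
    buckets = defaultdict(list)
    for user_id, timestamp in log_entries:
        buckets[user_id].append(timestamp)

    sessions = {}
    for user_id, stamps in buckets.items():
        # Sorting: order each user's entries by time.
        stamps.sort()
        # Slicing: start a new session whenever the gap is exceeded.
        user_sessions = [[stamps[0]]]
        for previous, current in zip(stamps, stamps[1:]):
            if current - previous > gap_seconds:
                user_sessions.append([current])
            else:
                user_sessions[-1].append(current)
        sessions[user_id] = user_sessions
    return sessions


if __name__ == "__main__":
    entries = [("a", 0), ("b", 10), ("a", 100), ("a", 5000), ("b", 20)]
    print(sessionize(entries))
```

In a MapReduce setting the bucketing and sorting steps would typically map onto the shuffle (partition by user, sort by timestamp), with slicing done in the reducer; that mapping is a common pattern rather than something the slides state.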