MetaZeta Clusters Overview

Presentation at the SVForum Cloud SIG, June 26, 2012. I described the MetaZeta.com cluster provisioning service and went into detail about how multiple clusters are coordinated despite Amazon Web Services (AWS) EC2 request throttling. I also discussed techniques for fast spin-up.

The MetaZeta clusters system was created to spawn clusters for big data, Hadoop, Hive, and HBase training classes where each student gets a dedicated cluster. Screenshots of the clusters are also included.

Published in: Technology

  1. MetaZeta Clusters
  2. Background of Paul Baclace
     • 2005-2006: Internet Archive, with Doug Cutting on Hadoop/Nutch
     • 2008-2010: AT&T Interactive
     • 2010-2012: Euclid Elements, Yoterra, Zettaset, GroupAngle.com, ProductSignals.com, ThirdEye, Hortonworks
     July 13, 2012 MetaZeta.com
  3. Hadoop Clusters for Training
     • Generate pre-configured clusters
     • Identical and independent
     • Hadoop, HDFS, HBase, Hive, Pig
     • Spawn N clusters for deadline
     • Minimize setup needed by student
  4. Cluster Requirements
     • Access cluster via a single meta-page
     • Avoid need for browser proxy or plugins
     • No installation required for student laptop
     • ssh is optional
  5. Per-Cluster Logical View
  6. Web UI Map
  7. Whirr + jclouds
  8. Whirr + jclouds
  9-16. (Image-only slides; no text was extracted.)
  17. Challenges
      • Slow package installation process
      • Amazon EC2 throttling
      • Failures after configuration changes
      • Occasional failures of EC2 nodes
        • Boot failure
        • DNS server failure
        • Package repo availability
  18. Slow Package Installation Process
      TotalTime = Nclusters * installLatency
      installLatency = Npackages * repoLatency
      Typical case: repoLatency = 10-20 sec
      Worst case: repoLatency = ∞
  19. Slow Package Installation Process
      Solution:
      • Pre-install everything on custom AMI
      • Custom AMI can be slower to load
  20. Amazon EC2 throttling
      EC2 API Request Rate
      At human speeds:
        • 100-2000 msec latency
        • Short sleep in between
      Remove sleep time:
        • 2-20 sec latency
      Overlap requests in parallel:
        • HTTP 500 (no donut for you)
  21. Amazon EC2 throttling
      Solution:
      • Avoidance by rate-limiting all requests
      • Use heuristics to estimate lead-time needed to spawn N clusters
  22. EC2 or Config Failures
      Solution:
      • Acceptance testing of:
        • HDFS
        • Map-Reduce
        • Hive
        • HBase
        • Hive + HBase
  23. Results
      • Node Allocation: 287 sec median, 467 sec 95th percentile
      • Config: 94 sec median, 134 sec 95th percentile
      • Testing: 147 sec median, 155 sec 95th percentile
      • Tagging: 79 sec median, 155 sec 95th percentile
      • Overall: 520 sec median, 777 sec 95th percentile
  24. Credits
      Thank you to:
      • Tom White for starting Whirr
      • Adrian Cole for starting jclouds
      • All the contributors to each project
  25. Pointers
      • http://metazeta.com/
      • http://www.jclouds.org/
      • http://whirr.apache.org/
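
The latency model on slide 18 can be worked through numerically. The sketch below encodes the two formulas as written on the slide; the cluster and package counts are hypothetical, since the slides give only the per-package repoLatency range.

```python
import math


def install_latency(n_packages, repo_latency_sec):
    """installLatency = Npackages * repoLatency (slide 18)."""
    return n_packages * repo_latency_sec


def total_time(n_clusters, n_packages, repo_latency_sec):
    """TotalTime = Nclusters * installLatency (slide 18)."""
    return n_clusters * install_latency(n_packages, repo_latency_sec)


# Hypothetical workload: neither count appears in the slides.
N_CLUSTERS = 20
N_PACKAGES = 30

if __name__ == "__main__":
    # 10 and 20 sec are the typical-case bounds from the slide;
    # math.inf models the worst case of an unreachable package repo.
    for repo_latency in (10, 20, math.inf):
        seconds = total_time(N_CLUSTERS, N_PACKAGES, repo_latency)
        print(f"repoLatency={repo_latency} sec -> TotalTime={seconds} sec")
```

Because the total grows with the product of cluster count and package count, even the typical case runs to hours for a modest class size under these assumed counts, which is the motivation slide 19 gives for pre-installing everything on a custom AMI.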
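
The rate-limiting approach on slide 21 can be sketched as a single shared limiter that every thread must pass through before issuing a request. This is a minimal illustration, not the MetaZeta implementation: the class `RateLimiter`, the helpers `call_ec2` and `issue_time`, and the interval values are all hypothetical, and `request` is any callable standing in for an EC2 API call.

```python
import threading
import time


class RateLimiter:
    """Space out calls so that they start at least `min_interval` sec apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Sleeping while holding the lock is deliberate: it queues all
        # threads behind one gate, so request start times never overlap.
        with self._lock:
            now = self._clock()
            delay = self._next_allowed - now
            if delay > 0:
                self._sleep(delay)
                now = self._clock()
            self._next_allowed = max(now, self._next_allowed) + self.min_interval


def call_ec2(limiter, request, *args, **kwargs):
    """Wait for a slot, then run `request` (a stand-in for an EC2 API call)."""
    limiter.wait()
    return request(*args, **kwargs)


def issue_time(n_clusters, requests_per_cluster, min_interval):
    """Hypothetical lead-time floor: time until the last request can start."""
    total_requests = n_clusters * requests_per_cluster
    return max(0, total_requests - 1) * min_interval


if __name__ == "__main__":
    limiter = RateLimiter(min_interval=0.05)
    results = []
    threads = [
        threading.Thread(
            target=lambda i=i: results.append(call_ec2(limiter, str, i))
        )
        for i in range(5)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))
```

Routing all clusters through one limiter trades peak speed for predictability: the request rate stays below whatever threshold triggers throttling, and the spacing gives a simple floor on how early spawning must begin for N clusters. The stage timings on slide 23 would come on top of that floor.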
