
MetaZeta Clusters Overview


Presentation at the SVForum Cloud SIG, June 26, 2012. I described the MetaZeta.com cluster provisioning service and went into detail about how multiple clusters are coordinated despite Amazon Web Services (AWS) EC2 request throttling. I also discussed techniques for fast cluster spin-up.

The MetaZeta clusters system was created to spawn clusters for big data, Hadoop, Hive, and HBase training classes where each student gets a dedicated cluster. Screenshots of the clusters are also included.

  • Photo credit: Paul Baclace
  • Hadoop and cloud computing synergy:
     • Open source means no license fee per node
     • Cloud computing enables anyone to use Hadoop
  • Transcript of "MetaZeta Clusters Overview"

    1. MetaZeta Clusters
    2. Background of Paul Baclace
       • 2005-2006: Internet Archive, with Doug Cutting on Hadoop/Nutch
       • 2008-2010: AT&T Interactive
       • 2010-2012: Euclid Elements, Yoterra, Zettaset, GroupAngle.com, ProductSignals.com, ThirdEye, Hortonworks
       July 13, 2012   MetaZeta.com
    3. Hadoop Clusters for Training
       • Generate pre-configured clusters
       • Identical and independent
       • Hadoop, HDFS, HBase, Hive, Pig
       • Spawn N clusters by a deadline
       • Minimize the setup needed by each student
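Because the deck later credits Whirr for provisioning, one way to get "identical and independent" clusters is to stamp out one Whirr recipe per student that differs only in its cluster name. The sketch below assumes that approach; the naming scheme, the node counts, and the choice of roles are hypothetical. The property keys (`whirr.cluster-name`, `whirr.provider`, `whirr.instance-templates`) and the Hadoop/HBase role names follow Whirr's published recipes, but check them against the Whirr version you use. Hive and Pig are not shown.

```python
# Minimal sketch (assumption): one Whirr recipe per student cluster.
# Only whirr.cluster-name differs, so every cluster has the same shape
# but its own name, and therefore its own instances.

# Hypothetical template: 1 master node + 2 worker nodes (Hadoop + HBase).
# Credentials (whirr.identity / whirr.credential) are omitted here.
BASE_RECIPE = {
    "whirr.provider": "aws-ec2",
    "whirr.instance-templates": (
        "1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master,"
        "2 hadoop-datanode+hadoop-tasktracker+hbase-regionserver"
    ),
}


def make_recipes(prefix: str, n_clusters: int) -> dict[str, str]:
    """Return {cluster_name: properties_file_text} for n identical clusters."""
    recipes = {}
    for i in range(1, n_clusters + 1):
        name = f"{prefix}-{i:02d}"  # hypothetical naming scheme
        props = {"whirr.cluster-name": name, **BASE_RECIPE}
        # Java .properties format: one key=value pair per line.
        recipes[name] = "\n".join(f"{k}={v}" for k, v in props.items()) + "\n"
    return recipes


if __name__ == "__main__":
    for name, text in make_recipes("class42", 3).items():
        print(f"# {name}.properties")
        print(text)
```

Each generated file could then be handed to a separate Whirr launch, so a failure in one student's cluster does not affect the others.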
    4. Cluster Requirements
       • Access the cluster via a single meta-page
       • Avoid the need for browser proxies or plugins
       • No installation required on student laptops
       • ssh is optional
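A meta-page can meet the "no proxy, no plugins, no installs" requirements if it is plain HTML with direct links to each daemon's web UI. The sketch below assumes the master node's public DNS name is known and that the security group allows students to reach these ports; both are assumptions, and the host names are hypothetical. The ports are the Hadoop 1.x and HBase 0.9x-era defaults (NameNode 50070, JobTracker 50030, HBase Master 60010). The slide does not say whether one page covers all clusters or one per student; passing a single-entry mapping gives a per-student page.

```python
from html import escape

# Default web UI ports for Hadoop 1.x / HBase 0.9x-era daemons.
WEB_UIS = {
    "HDFS NameNode": 50070,
    "MapReduce JobTracker": 50030,
    "HBase Master": 60010,
}


def meta_page(clusters: dict[str, str]) -> str:
    """Render one HTML page linking every listed cluster's web UIs.

    `clusters` maps a cluster (student) name to its master node's public
    DNS name; both are hypothetical inputs in this sketch.
    """
    rows = []
    for name, host in sorted(clusters.items()):
        links = " | ".join(
            f'<a href="http://{escape(host)}:{port}/">{escape(label)}</a>'
            for label, port in WEB_UIS.items()
        )
        rows.append(f"<li><b>{escape(name)}</b>: {links}</li>")
    return (
        "<html><body><h1>Clusters</h1><ul>\n"
        + "\n".join(rows)
        + "\n</ul></body></html>\n"
    )
```

Because every link is an ordinary URL, a student needs only a browser; ssh stays optional.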
    5. Per-Cluster Logical View
    6. Web UI Map
    7. Whirr + jclouds
    8. Whirr + jclouds
    9-16. (Image-only slides; no text in transcript)
    17. Challenges
       • Slow Package Installation Process
       • Amazon EC2 throttling
       • Failures after configuration changes
       • Occasional failures of EC2 nodes:
          • Boot failure
          • DNS server failure
          • Package repo availability
    18. Slow Package Installation Process
       TotalTime = Nclusters × installLatency
       installLatency = Npackages × repoLatency
       Typical case: repoLatency = 10-20 sec
       Worst case: repoLatency = ∞
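The latency model above can be written out directly to see how quickly install time grows. The cluster and package counts below are hypothetical; the 10-20 sec per-package repo latency is the slide's typical case, and the model (like the slide) assumes clusters and packages are installed one after another.

```python
def install_latency(n_packages: int, repo_latency_s: float) -> float:
    # installLatency = Npackages × repoLatency
    return n_packages * repo_latency_s


def total_time(n_clusters: int, n_packages: int, repo_latency_s: float) -> float:
    # TotalTime = Nclusters × installLatency (sequential installs)
    return n_clusters * install_latency(n_packages, repo_latency_s)


if __name__ == "__main__":
    # Hypothetical class: 20 clusters, 15 packages each.
    for repo_latency in (10, 20, float("inf")):
        print(repo_latency, total_time(20, 15, repo_latency))
```

With these numbers the typical case already costs 3000 to 6000 sec, and a single unavailable repo (repoLatency = ∞) makes the total unbounded. Pre-installing packages on a custom AMI, as the next slide proposes, removes the repoLatency term from the launch path.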
    19. Slow Package Installation Process
       Solution:
       • Pre-install everything on a custom AMI
       • A custom AMI can be slower to load
    20. Amazon EC2 throttling
       EC2 API request rate:
       At human speeds:
          • 100-2000 ms latency
          • Short sleep in between
       Remove the sleep time:
          • 2-20 sec latency
       Overlap requests in parallel:
          • HTTP 500 (no donut for you)
    21. Amazon EC2 throttling
       Solution:
       • Avoidance by rate-limiting all requests
       • Use heuristics to estimate the lead time needed to spawn N clusters
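The talk does not say which rate-limiting algorithm was used. A token bucket is one common way to "rate-limit all requests": every thread that calls the EC2 API shares one limiter, so overlapping cluster launches cannot burst past the throttle. The rate and burst values and the `call_ec2` wrapper are hypothetical; in practice they would be tuned to stay below the point where errors appear.

```python
import threading
import time


class RateLimiter:
    """Token-bucket limiter shared by every thread that calls the EC2 API.

    rate:  sustained requests per second (hypothetical value).
    burst: maximum number of requests that may go out back to back.
    """

    def __init__(self, rate: float, burst: int = 1):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one request may be sent."""
        while True:
            with self.lock:
                now = time.monotonic()
                # Refill tokens for the time elapsed since the last check.
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked on it.
            time.sleep(wait)


def call_ec2(limiter: RateLimiter, request_fn, *args, **kwargs):
    # Hypothetical wrapper: every API call passes through the shared limiter.
    limiter.acquire()
    return request_fn(*args, **kwargs)
```

This reintroduces the "short sleep in between" of human-speed use, but centrally: the limiter spaces requests evenly instead of letting parallel launches collide.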
    22. EC2 or Config Failures
       Solution:
       • Acceptance testing of:
          • HDFS
          • Map-Reduce
          • Hive
          • HBase
          • Hive + HBase
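Acceptance testing can be organized as a fixed list of smoke tests, run in the slide's order, where a cluster is accepted only if every test passes. The harness below is a sketch under that assumption. The actual checks (for example, writing a file to HDFS or running a small Hive query over HBase) are not described in the talk, so they are passed in as hypothetical callables that signal failure by raising.

```python
from typing import Callable

# Test order taken from the slide.
CHECK_ORDER = ["HDFS", "Map-Reduce", "Hive", "HBase", "Hive + HBase"]


def run_acceptance(checks: dict[str, Callable[[], None]]) -> dict[str, str]:
    """Run every smoke test in order and record the outcome of each.

    A check fails by raising an exception. A missing check also counts
    as a failure (the KeyError is caught like any other error).
    Returns {check_name: "pass" | "FAIL: <reason>"}.
    """
    results = {}
    for name in CHECK_ORDER:
        try:
            checks[name]()
            results[name] = "pass"
        except Exception as exc:  # any error marks this check as failed
            results[name] = f"FAIL: {exc}"
    return results


def cluster_ok(results: dict[str, str]) -> bool:
    # A cluster is handed to a student only if every check passed.
    return all(r == "pass" for r in results.values())
```

Running all checks, rather than stopping at the first failure, gives a complete report of which layer broke after a configuration change.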
    23. Results
       Phase             Median    95th percentile
       Node allocation   287 sec   467 sec
       Config             94 sec   134 sec
       Testing           147 sec   155 sec
       Tagging            79 sec   155 sec
       Overall           520 sec   777 sec
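The throttling slide mentions heuristics for estimating the lead time needed to spawn N clusters, but does not give a formula. One conservative, hypothetical heuristic combines the measured 95th-percentile overall time (777 sec) with the time a shared rate limiter needs to issue every cluster's EC2 API calls. The calls-per-cluster and API-rate values are assumptions, and the estimate may double-count somewhat, since the per-cluster time already includes that cluster's own allocation calls.

```python
P95_OVERALL_S = 777  # 95th-percentile overall time per cluster (Results)


def lead_time_s(n_clusters: int, calls_per_cluster: int,
                api_rate_per_s: float,
                p95_cluster_s: float = P95_OVERALL_S) -> float:
    """Hypothetical lead-time estimate for spawning n clusters.

    Assumes the shared rate limiter is the bottleneck: all EC2 API calls
    for all clusters go out at api_rate_per_s, and the last cluster then
    still needs a 95th-percentile end-to-end build time.
    """
    api_time = n_clusters * calls_per_cluster / api_rate_per_s
    return api_time + p95_cluster_s


if __name__ == "__main__":
    # Hypothetical class: 20 clusters, 30 API calls each, 2 calls/sec.
    print(lead_time_s(20, 30, 2.0))
```

For the hypothetical class in the usage line, that is 300 sec of throttled API calls plus 777 sec, about 18 minutes before the deadline.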
    24. Credits
       Thank you to:
       • Tom White for starting Whirr
       • Adrian Cole for starting jclouds
       • All the contributors to each project
    25. Pointers
       • http://metazeta.com/
       • http://www.jclouds.org/
       • http://whirr.apache.org/
