March 2011 HUG: Scaling Hadoop

Scaling Hadoop Applications - Talk by Milind Bhandarkar, LinkedIn

1. Scaling Hadoop Applications
Milind Bhandarkar
Distributed Data Systems Engineer
[mbhandarkar@linkedin.com]
[twitter: @techmilind]
http://www.flickr.com/photos/theplanetdotcom/4878812549/

2. About Me
http://www.linkedin.com/in/milindb
Founding member of Hadoop team at Yahoo! [2005-2010]
Contributor to Apache Hadoop & Pig since version 0.1
Built and led Grid Solutions Team at Yahoo! [2007-2010]
Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems, Pathscale Inc. (acquired by QLogic), Yahoo!, and LinkedIn
http://www.flickr.com/photos/tmartin/32010732/

3. Outline
Scalability of Applications
Causes of Sublinear Scalability
Best Practices
Q & A
http://www.flickr.com/photos/86798786@N00/2202278303/

4. Importance of Scaling
Zynga's CityVille: 0 to 100 million users in 43 days! (http://venturebeat.com/2011/01/14/zyngas-cityville-grows-to-100-million-users-in-43-days/)
Facebook in 2009: from 150 million users to 350 million! (http://www.facebook.com/press/info.php?timeline)
LinkedIn in 2010: adding a member per second! (http://money.cnn.com/2010/11/17/technology/linkedin_web2/index.htm)
Public WWW size: 26M pages in 1998, 1B in 2000, 1T in 2008 (http://googleblog.blogspot.com/2008/07/we-knew-web-was-big.html)
Scalability allows dealing with success gracefully!
http://www.flickr.com/photos/palmdiscipline/2784992768/

5. Explosion of Data
(Data-Rich Computing theme proposal. J. Campbell, et al., 2007)
http://www.flickr.com/photos/28634332@N05/4128229979/

6. Hadoop
Simplifies building parallel applications
Programmer specifies record-level and group-level sequential computation
Framework handles partitioning, scheduling, dispatch, execution, communication, failure handling, monitoring, reporting, etc.
User provides hints to control parallel execution
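
In MapReduce terms, the record-level computation is the map function and the group-level computation is the reduce function. A minimal word-count sketch against the Hadoop 0.20-era org.apache.hadoop.mapreduce API (the standard introductory example, not code from the talk):

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
      // Record-level sequential computation: called once per input record
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            ctx.write(word, ONE);
          }
        }
      }

      // Group-level sequential computation: called once per distinct key
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) sum += v.get();
          ctx.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Everything outside map() and reduce() (input splitting, task scheduling, the shuffle, retries on failure) is handled by the framework, which is the slide's point.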

7. Scalability of Parallel Programs
If one node processes k MB/s, then N nodes should process (k*N) MB/s
If some fixed amount of data is processed in T minutes on one node, then N nodes should process the same data in (T/N) minutes
Linear Scalability
http://www-03.ibm.com/innovation/us/watson/

8. Reduce Latency
Minimize program execution time
http://www.flickr.com/photos/adhe55/2560649325/

9. Increase Throughput
Maximize data processed per unit time
http://www.flickr.com/photos/mikebaird/3898801499/

10. Three Equations
http://www.flickr.com/photos/ucumari/321343385/

11. Amdahl's Law
http://www.computer.org/portal/web/awards/amdahl
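
The formula itself is an image on the original slide; its standard form, for a program in which a fraction p of the work parallelizes perfectly over N nodes, is

    S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

so the serial fraction (1 - p) caps the achievable speedup no matter how many nodes are added.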

12. Multi-Phase Computations
If computation C is split into N different parts, C1..CN
If partial computation Ci can be sped up by a factor of Si
http://www.computer.org/portal/web/awards/amdahl

13. Amdahl's Law, Restated
http://www.computer.org/portal/web/awards/amdahl
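
The restated form is also an image on the slide; with the previous slide's notation, if part Ci accounts for a fraction fi of the original running time (the fi summing to 1) and is sped up by Si, the overall speedup is

    S = \frac{1}{\sum_{i=1}^{N} f_i / S_i}

The familiar two-term statement is the special case of one part with Si = 1 (the serial part) and one part with Si = N.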

14. MapReduce Workflow
http://www.computer.org/portal/web/awards/amdahl

15. Little's Law
http://sloancf.mit.edu/photo/facstaff/little.gif
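
The law itself appears only as an image on the slide: in a stable system, the average number of items in flight L equals the arrival rate λ times the average time W each item spends in the system,

    L = \lambda W

Read for Hadoop: sustained throughput times per-task latency tells you how many tasks must be in flight concurrently to keep the cluster busy.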

16. Message Cost Model
http://www.flickr.com/photos/mattimattila/4485665105/
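
The equation is an image on the original slide; the model the next slide plugs numbers into charges each message a fixed latency α plus a bandwidth term for its n bytes:

    T(n) = \alpha + \frac{n}{\beta}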

17. Message Granularity
For Gigabit Ethernet:
α = 300 μs
β = 100 MB/s
100 messages of 10 KB each = 40 ms
10 messages of 100 KB each = 13 ms
http://www.flickr.com/photos/philmciver/982442098/
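
Checking the slide's numbers against T(n) = α + n/β:

    100 \times (300\,\mu\text{s} + 10\,\text{KB} / 100\,\text{MB/s}) = 100 \times 400\,\mu\text{s} = 40\,\text{ms}
    10 \times (300\,\mu\text{s} + 100\,\text{KB} / 100\,\text{MB/s}) = 10 \times 1300\,\mu\text{s} = 13\,\text{ms}

The same megabyte moves roughly three times faster in coarser messages because the fixed per-message cost α is paid ten times instead of a hundred.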

18. Alpha-Beta
Common mistake: assuming that α is constant
Scheduling latency for the responder
JobTracker/TaskTracker time slice is inversely proportional to the number of concurrent tasks
Common mistake: assuming that β is constant
Network congestion
TCP incast

19. Causes of Sublinear Scalability
http://www.flickr.com/photos/sabertasche2/2609405532/

20. Sequential Bottlenecks
Mistake: single reducer (the default)
mapred.reduce.tasks
Pig's PARALLEL directive
Mistake: constant number of reducers independent of data size
default_parallel
http://www.flickr.com/photos/bogenfreund/556656621/
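
A sketch of setting the reducer count explicitly (0.20-era API; the count of 32 and the job name are illustrative, not taken from the talk):

    // Fragment, inside a job driver (e.g. a Tool.run() method)
    Configuration conf = new Configuration();
    Job job = new Job(conf, "aggregate");
    job.setNumReduceTasks(32);  // override the single-reducer default; scale this with input size
    // Command-line equivalent: -D mapred.reduce.tasks=32
    // Pig equivalents: SET default_parallel 32;  or  PARALLEL 32 on a GROUP/JOIN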

21. Load Imbalance
Unequal partitioning of work
Imbalance = Max Partition Size - Min Partition Size
Heterogeneous workers
Example: disk bandwidth
Empty disk: 100 MB/s
75% full disk: 25 MB/s
http://www.flickr.com/photos/kelpatolog/176424797/

22. Over-Partitioning
For N workers, choose M partitions, M >> N
Adaptive scheduling: each worker chooses the next unscheduled partition
Load imbalance = Max(Sum(Wk)) - Min(Sum(Wk)), where Sum(Wk) is the total work assigned to worker k
For sufficiently large M, both the max and the min tend to Average(Wk), shrinking the imbalance
http://www.flickr.com/photos/infidelic/3238637031/

23. Critical Paths
Maximum distance between start and end nodes
Long tail in map task executions
Enable speculative execution
Overlap communication and computation
Asynchronous notifications
Example: deleting intermediate outputs after job completion
http://www.flickr.com/photos/cdevers/4619306252/
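
One concrete knob from this list, speculative execution, with the pre-YARN property names (the slide only says to enable it; the code is an illustrative fragment):

    // Fragment, inside the job driver
    conf.setBoolean("mapred.map.tasks.speculative.execution", true);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", true);
    // The framework launches backup attempts for straggling tasks and keeps
    // whichever attempt finishes first, trimming the long tail on the critical path.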

24. Algorithmic Overheads
Repeating computation in place of adding communication
Parallel exploration of the solution space wastes computation
Need for a coordinator
Share partial, unmaterialized results
Consider memory-based, small, replicated K-V stores with eventual consistency
http://www.flickr.com/photos/sbprzd/140903299/

25. Synchronization
Shuffle stage in Hadoop: M*R transfers
Slowest system components involved: network, disk
Can you eliminate the Reduce stage?
Use combiners?
Compress intermediate output?
Launch reduce tasks before all maps finish
http://www.flickr.com/photos/susan_d_p/2098849477/
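
A sketch of the last three options with 0.20-era property names; the combiner class is the reducer from the word-count sketch above, and the 0.80 threshold is illustrative:

    // Fragment, inside the job driver
    job.setCombinerClass(IntSumReducer.class);            // local aggregation before the shuffle
    conf.setBoolean("mapred.compress.map.output", true);  // compress intermediate map output
    conf.setClass("mapred.map.output.compression.codec",
        org.apache.hadoop.io.compress.DefaultCodec.class,
        org.apache.hadoop.io.compress.CompressionCodec.class);
    // Let reduce tasks start copying map output before every map has finished:
    conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);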

26. Task Granularity
Amortize task scheduling and launching overheads
Centralized scheduler
Out-of-band TaskTracker heartbeat reduces latency, but may overwhelm the scheduler
Increase task granularity
Combine multiple small files
Maximize split sizes
Combine computations per datum
http://www.flickr.com/photos/philmciver/982442098/
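
One way to act on "maximize split sizes" (0.20-era mapreduce API; the 512 MB target is illustrative):

    // Fragment, inside the job driver
    // Force splits of at least ~512 MB so each map task does more work per launch:
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat
        .setMinInputSplitSize(job, 512L * 1024 * 1024);
    // For directories full of tiny files, an input format derived from
    // CombineFileInputFormat packs many files into a single split.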

27. Hadoop Best Practices
Use higher-level languages (e.g. Pig)
Coalesce small files into larger ones & use bigger HDFS block size
Tune buffer sizes for tasks to avoid spill to disk
Consume only one slot per task
Use combiner if local aggregation reduces size
http://www.flickr.com/photos/yaffamedia/1388280476/
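
"Tune buffer sizes for tasks to avoid spill to disk" maps to the map-side sort buffer settings of that era; the values below are illustrative and trade task heap for fewer spills:

    // Fragment, inside the job driver
    conf.setInt("io.sort.mb", 256);                 // map-side sort buffer, in MB (default was 100)
    conf.setFloat("io.sort.spill.percent", 0.90f);  // buffer fill fraction that triggers a spill
    conf.setInt("io.sort.factor", 50);              // number of streams merged at once (default 10)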

28. Hadoop Best Practices
Use compression everywhere
CPU-efficient codecs for intermediate data
Disk-efficient codecs for output data
Use the Distributed Cache to distribute small side-files (much smaller than the input split size)
Minimize the number of NameNode/JobTracker RPCs from tasks
http://www.flickr.com/photos/yaffamedia/1388280476/
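
A sketch of the Distributed Cache point; the side-file path is a hypothetical example:

    // Fragment, inside the job driver (uses java.net.URI and
    // org.apache.hadoop.filecache.DistributedCache)
    DistributedCache.addCacheFile(new URI("/shared/lookup.dat"), conf);
    // Each task then reads its node-local copy, located via
    // DistributedCache.getLocalCacheFiles(conf), instead of re-reading the file from HDFS.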

29. Questions?
http://www.flickr.com/photos/horiavarlan/4273168957/
