Hadoop Distributions: Bottlenecks and Tuning

This presentation by Alexey Diomin, R&D Engineer at Altoros, explains how to spot performance bottlenecks in Hadoop and overviews five approaches to eliminating them.

Published in: Technology, Business

  1. Diomin Aliaksey R&D 2014, Minsk
  2. Distribution    OpenSource  Monitoring  Target Group
     Apache Hadoop   Yes         X           Developers
     Cloudera        Yes         Good        All
     Hortonworks     Yes         Good        All
     MapR            No          Bad         Enterprise
     PivotalHD       No          Bad         Enterprise
  3. How to find the bottleneck?
  11. 1. Increase size of cluster
      2. Increase input block size
      3. Increase buffer size
  19. 1. Compression
  20. 1. Compression
      2. Combiner
  21. Wordcount: Reduce function as Combine
      combine 1: <a, 1> <b, 1> <a, 1> => <a, 2> <b, 1>
      combine 2: <a, 1> <b, 1> => <a, 1> <b, 1>
      Reduce: <a, {1, 2}> <b, {1, 1}> => <a, 3> <b, 2>
  22. Mean (Reduce function reused as Combine)
      combine 1: <k, 40> <k, 30> <k, 20> => <k, 30>
      combine 2: <k, 2> <k, 8> => <k, 5>
      Reduce: <k, {30, 5}> => <k, 17.5>
  23. Mean: the result above is wrong
      Averaging the partial means gives (30 + 5)/2 = 17.5,
      but the true mean is (40 + 30 + 20 + 2 + 8)/5 = 20
  24. Mean (Combine emits <sum, count> pairs)
      combine 1: <k, <40, 1>> <k, <30, 1>> <k, <20, 1>> => <k, <90, 3>>
      combine 2: <k, <2, 1>> <k, <8, 1>> => <k, <10, 2>>
      Reduce: <k, {<90, 3>, <10, 2>}> => <k, 20>
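
The mean example in the slides above can be replayed in a few lines of plain Python. This is only a sketch of the idea, not Hadoop code: the function names (`combine_mean_naive`, `combine_sum_count`, `reduce_mean`) and the two-split layout are assumptions made for illustration, and a real job would implement the same logic in a Reducer class registered as the combiner.

```python
def combine_mean_naive(values):
    # Hypothetical "reduce reused as combine": emits a local mean and
    # throws away how many values it covered.
    return sum(values) / len(values)


def combine_sum_count(pairs):
    # Hypothetical combiner: folds (sum, count) pairs into one (sum, count).
    # Addition is associative, so it is safe to apply this any number of times.
    total, count = 0, 0
    for s, c in pairs:
        total += s
        count += c
    return (total, count)


def reduce_mean(pairs):
    # Final reduce step: merge the partial (sum, count) pairs, then divide once.
    total, count = combine_sum_count(pairs)
    return total / count


# The two map-side inputs for key k, as on the slides.
split_1 = [40, 30, 20]
split_2 = [2, 8]

# Naive approach: mean of the partial means, (30 + 5) / 2.
naive = combine_mean_naive(
    [combine_mean_naive(split_1), combine_mean_naive(split_2)]
)

# (sum, count) approach: each mapper emits (value, 1), the combiner merges them.
partials = [
    combine_sum_count([(v, 1) for v in split_1]),
    combine_sum_count([(v, 1) for v in split_2]),
]
correct = reduce_mean(partials)

print(naive)     # 17.5
print(partials)  # [(90, 3), (10, 2)]
print(correct)   # 20.0
```

Wordcount can reuse its reduce function as the combiner because summing partial sums gives the same total. A mean does not have that property, which is why the combiner has to carry the count alongside the sum.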
