Hadoop Performance at LinkedIn

This is part of a presentation I did at Intel a month or so ago. Some of the content has been removed due to NDA, etc.

Published in: Technology

Transcript of "Hadoop Performance at LinkedIn"

  1. Grid Operations: Hadoop Performance at LinkedIn. Allen Wittenauer, Grid Computing Architect. ©2012 LinkedIn Corporation. All Rights Reserved.
  2. (no transcribed text)
  3. “I have never seen a Hadoop cluster that was legitimately CPU bound.” -- Milind Bhandarkar
  4. X5650 - 6 Core @ 2.67 GHz
  5. X5650 - 6 Core @ 2.67 GHz
  6. “I have only seen one Hadoop cluster that was legitimately CPU bound.” -- Milind Bhandarkar
  7. Why do we have such high CPU usage?
  8. We do a lot of Graph Theory.
  9. Ticket to Ride (Ticket to Ride is a registered trademark of Days of Wonder.)
  10. Social Graph
  11. 2nd Degree Connection
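
To make the "2nd Degree Connection" idea concrete, here is a minimal Python sketch of how second-degree connections can be derived from an adjacency map. The member names and the `second_degree` helper are hypothetical, and this is not LinkedIn's implementation; it only illustrates why this kind of graph work means touching every neighbor of every neighbor.

```python
def second_degree(graph, member):
    """Return members reachable in exactly two hops from `member`.

    `graph` maps each member to the set of their direct connections.
    """
    first = graph.get(member, set())
    second = set()
    for connection in first:
        # Every direct connection contributes its own connections.
        second |= graph.get(connection, set())
    # Drop the member and anyone already directly connected.
    return second - first - {member}


# Hypothetical toy graph; all names are made up for illustration.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

print(sorted(second_degree(graph, "ann")))  # ['dan', 'eve']
```

The work grows with the sum of the neighbors' connection counts, which gives some intuition for why graph-heavy jobs keep CPUs busy.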
  12. We under-commit our memory.
  13. Our Hadoop Software Needs... The Plan...
      - Tasks
          - 2 GB of RAM = 1 GB of JVM heap, 0.5-1 GB for non-heap
          - (Typically) 1 super-active thread
      - TaskTracker
          - 1.5 GB of RAM = 1 GB of JVM heap, 0.5 GB for non-heap
          - 1-4 super-active threads
      - DataNode
          - 1.5 GB of RAM = 1 GB of JVM heap, 0.5 GB for non-heap
          - 1-4 super-active threads
      - RAM: 3 GB + (task count * 2 GB) + OS needs
      - Threads: 8 + (task count) + OS needs
  14. Our Hadoop Software Needs... The Reality
      - Task counts
          - Westmere (5650): 6 cores + HT = 12 tasks
          - Sandy Bridge (2640): 6 cores + HT = 14 tasks
      - Most of our tasks leave at most 0.5 GB free
          - Combined, that adds up to a very large buffer & cache
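
The RAM and thread formulas from slide 13 can be applied to the task counts on slide 14. The Python sketch below does only that arithmetic. The function names are made up for illustration, and because the slides do not quantify "OS needs", it is left as a parameter that defaults to zero.

```python
# Constants taken from slide 13.
DAEMON_RAM_GB = 3.0      # TaskTracker (1.5 GB) + DataNode (1.5 GB)
RAM_PER_TASK_GB = 2.0    # 1 GB heap plus up to 1 GB non-heap per task
DAEMON_THREADS = 8       # up to 4 active threads each for the two daemons


def ram_budget_gb(task_count, os_needs_gb=0.0):
    """RAM: 3 GB + (task count * 2 GB) + OS needs."""
    return DAEMON_RAM_GB + task_count * RAM_PER_TASK_GB + os_needs_gb


def thread_budget(task_count, os_threads=0):
    """Threads: 8 + (task count) + OS needs."""
    return DAEMON_THREADS + task_count + os_threads


# Task counts from slide 14; OS needs are unspecified, so they stay at 0 here.
for name, tasks in (("Westmere (5650)", 12), ("Sandy Bridge (2640)", 14)):
    print(f"{name}: {ram_budget_gb(tasks)} GB RAM, {thread_budget(tasks)} threads")
```

With OS needs left at zero, this gives 27 GB and 20 threads for 12 tasks, and 31 GB and 22 threads for 14 tasks. Real sizing would add whatever the operating system requires on top.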
  15. We don’t have as many disks per node.
  16. Typical Hadoop Node Out in the Wild
      - Most users don’t know their actual needs
          - Vendor advice... play it safe!
      - Significantly more memory
          - “For the future!”
          - Badly written code
      - Significantly more disk
          - “Hadoop is IO intensive!”
          - “Greater task locality!”
      - Greater performance... but is it worth the cost?
  17. What Happens With Fewer Disks?
      - Physical footprint requirements are smaller
      - Linux buffers & caches are more efficient
          - More per disk
          - Fewer to manage
      - Spindle count DOES matter... but the price/perf isn’t there for our workflows.
          - From a few years ago & based on store.sun.com prices (so not “real”)...

      Nodes/Cores   RAM/Bus   Disks   Time in Minutes   HW Cost*
      3/24          16/half   8       254.98            $37827
      3/24          24/full   8       244.50            $38817
      3/24          16/half   4       257.38            $21456
      3/24          24/full   4       246.82            $22986
      6/48          16/half   4       126.98            $42912
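
The price/performance argument on slide 17 can be checked directly from its table. The Python sketch below uses cost multiplied by run time as a rough figure of merit (lower is better). That metric is an assumption chosen for illustration, not one the slides name; the `Config` type and field names are also made up.

```python
from typing import NamedTuple


class Config(NamedTuple):
    nodes_cores: str
    ram_bus: str
    disks: int
    minutes: float
    cost_usd: int


# Rows copied from the slide 17 table.
CONFIGS = [
    Config("3/24", "16/half", 8, 254.98, 37827),
    Config("3/24", "24/full", 8, 244.50, 38817),
    Config("3/24", "16/half", 4, 257.38, 21456),
    Config("3/24", "24/full", 4, 246.82, 22986),
    Config("6/48", "16/half", 4, 126.98, 42912),
]


def cost_time_product(cfg):
    """Dollars x minutes: an assumed, rough price/performance score."""
    return cfg.cost_usd * cfg.minutes


# Compare the 8-disk and 4-disk variants of the same 3-node, 16/half setup.
eight, four = CONFIGS[0], CONFIGS[2]
slowdown = four.minutes / eight.minutes - 1
savings = 1 - four.cost_usd / eight.cost_usd
print(f"4 disks vs 8: {slowdown:.1%} slower, {savings:.1%} cheaper")

best = min(CONFIGS, key=cost_time_product)
print("Lowest cost x time:", best)
```

On these numbers, halving the disks costs under 1% in run time while cutting hardware cost by roughly 43%, and the six-node, four-disk row has the lowest cost-time product.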
  18. LinkedIn Node Configuration
      - No RAID controller
          - More cost for negative perf when doing JBOD
      - 6 drives
          - Still fits in 1U w/SATA drives
          - ~Same perf as 8 drives
      - Less metal = lower cost
  19. Rack Level View
      - If we assume we can use 40U in a rack, then:
          - More CPUs
          - Just as many HDs
          - More network
          - Potentially more RAM
  20. We care about file system tuning.
  21. LinkedIn Hadoop Disk/File Systems
      - noatime enabled
      - writeback enabled
      - Each disk (except root) is partitioned into:
          - Swap
          - MapReduce spill space
          - HDFS
      - Delayed commits
          - Why write once when you can do ganged writes more efficiently?
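
One way to audit the settings on slide 21 is to inspect the mount options of each data disk. The Python sketch below parses a line in `/proc/mounts` format and reports which expected options are missing. It assumes an ext3/ext4-style file system, where these settings appear as `noatime` and `data=writeback`; the slides do not name the file system. The device names and mount points are hypothetical. The "delayed commits" bullet might correspond to the ext3/ext4 `commit=` interval, but that mapping is a guess, so it is not checked here.

```python
# Assumed ext3/ext4 option names; the slides only say "noatime" and "writeback".
EXPECTED_OPTIONS = ("noatime", "data=writeback")


def mount_options(mounts_line):
    """Return the option set from one /proc/mounts-style line.

    Fields are: device, mount point, fs type, options, dump, pass.
    """
    fields = mounts_line.split()
    return set(fields[3].split(","))


def missing_options(mounts_line, expected=EXPECTED_OPTIONS):
    """List the expected options that the mount does not carry."""
    present = mount_options(mounts_line)
    return [opt for opt in expected if opt not in present]


# Hypothetical sample lines; device names and mount points are made up.
tuned = "/dev/sdb3 /grid/1 ext4 rw,noatime,data=writeback 0 0"
untuned = "/dev/sdc3 /grid/2 ext4 rw,relatime 0 0"

print(missing_options(tuned))    # []
print(missing_options(untuned))  # ['noatime', 'data=writeback']
```

On a live node, the same functions could be fed lines read from `/proc/mounts`.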
  22. We care about job tuning.
  23. LinkedIn Job Tuning Guidelines
      - All jobs get reviewed prior to going to production.
      - Task times should be between 5 and 15 minutes.
      - Jobs should have fewer than 10,000 tasks.
      - Jobs should be smart about the number and size of the files they generate.
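
The two numeric guidelines on slide 23 lend themselves to an automated pre-review check. The Python sketch below is a hypothetical illustration of such a check, not LinkedIn's actual review tooling; the `JobStats` type and `review` function are invented for the example. The guideline about file counts and sizes is qualitative, so it is left out.

```python
from dataclasses import dataclass, field

MIN_TASK_MINUTES = 5
MAX_TASK_MINUTES = 15
MAX_TASKS = 10_000  # jobs should have fewer than this many tasks


@dataclass
class JobStats:
    name: str
    task_count: int
    task_minutes: list = field(default_factory=list)  # per-task run times


def review(job):
    """Return a list of guideline violations; an empty list means none found."""
    problems = []
    if job.task_count >= MAX_TASKS:
        problems.append(f"{job.task_count} tasks (guideline: fewer than {MAX_TASKS})")
    outside = [m for m in job.task_minutes
               if not MIN_TASK_MINUTES <= m <= MAX_TASK_MINUTES]
    if outside:
        problems.append(
            f"{len(outside)} task(s) outside "
            f"{MIN_TASK_MINUTES}-{MAX_TASK_MINUTES} minutes"
        )
    return problems


# Hypothetical jobs for illustration.
print(review(JobStats("nightly-rollup", 800, [6.0, 12.5])))
print(review(JobStats("tiny-tasks", 25_000, [1.0, 40.0])))
```

A check like this can only flag candidates for discussion; the slides describe a human review before production, which would still be needed.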
  24. ... and the result?
  25. Why is LinkedIn Running so Hot?
      - We do a lot of non-MapReduce work.
      - RAM buffers and caches allow us to offset a lot of disk IO.
      - We audit our jobs.
      - As a result, our CPUs are actually busy.
  26. (no transcribed text)