Your SlideShare is downloading. ×
0
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Hadoop Performance at LinkedIn
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Hadoop Performance at LinkedIn

9,425

Published on

This is part of a presentation I did at Intel a month or so ago. Some of the content has been removed due to NDA, etc.

This is part of a presentation I did at Intel a month or so ago. Some of the content has been removed due to NDA, etc.

Published in: Technology
1 Comment
39 Likes
Statistics
Notes
No Downloads
Views
Total Views
9,425
On Slideshare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
170
Comments
1
Likes
39
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Grid OperationsHadoop Performance at LinkedInAllen WittenauerGrid Computing Architect©2012 LinkedIn Corporation. All Rights Reserved.
  • 2. ©2012 LinkedIn Corporation. All Rights Reserved.
  • 3. “I have never seen a Hadoop cluster that was legitimately CPU bound.” -- Milind Bhandarkar -- Milind Bhandarkar -- Milind Bhandarkar©2012 LinkedIn Corporation. All Rights Reserved.
  • 4. X5650 - 6 Core @ 2.67 MHz©2012 LinkedIn Corporation. All Rights Reserved.
  • 5. X5650 - 6 Core @ 2.67 MHz©2012 LinkedIn Corporation. All Rights Reserved.
  • 6. “I have only seen one Hadoop cluster that was legitimately CPU bound.” -- Milind Bhandarkar -- Milind Bhandarkar -- Milind Bhandarkar©2012 LinkedIn Corporation. All Rights Reserved.
  • 7. Why do we have such high CPU usage?©2012 LinkedIn Corporation. All Rights Reserved.
  • 8. We do a lot of Graph Theory.©2012 LinkedIn Corporation. All Rights Reserved.
  • 9. Ticket to Ride Ticket To Ride is a registered trademark of Days of Wonder ©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 10. Social Graph©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 11. 2nd Degree Connection©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 12. We under-commit our memory.©2012 LinkedIn Corporation. All Rights Reserved.
  • 13. Our Hadoop Software Needs... The Plan...  Tasks – 2 GB of RAM = 1 GB of JVM Heap, .5-1GB for non-heap – (Typically) 1 Super Active Threads  TaskTracker – 1.5 GB of RAM = 1 GB of JVM Heap, .5GB for non-heap – 1-4 Super Active Threads  DataNode – 1.5 GB of RAM = 1 GB of JVM Heap, .5GB for non-heap – 1-4 Super Active Threads  RAM: 3GB + (task count * 2GB) + OS needs  Threads: 8 + (task count) + OS needs©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 14. Our Hadoop Software Needs... The Reality  Task Counts – Westmere (5650): 6 Cores+HT = 12 Tasks – Sandy Bridge (2640): 6 Cores+HT = 14 Tasks  Most of our tasks leave at most .5 GB free – = combined -> very large buffer & cache©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 15. We don’t have as many disks per node.©2012 LinkedIn Corporation. All Rights Reserved.
  • 16. Typical Hadoop Node Out in the Wild  Most user’s don’t know their actual needs – Vendor advice... play it safe!  Significantly more memory – “For the future!” – Badly written code  Significantly more disk – “Hadoop is IO intensive!” – “Greater task locality!”  Greater performance...but is it worth the cost...©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 17. What Happens With Fewer Disks?  Physical footprint requirements are smaller  Linux buffers & caches are more efficient – More per disk – Fewer to manage  Spindle count DOES matter... but the price/perf isn’t there for our workflows. – From a few years ago & based on store.sun.com prices (so not “real”)... Nodes/Cores RAM/Bus Disks Time In Minutes HW Cost* 3/24 16/half 8 254.98 $37827 3/24 24/full 8 244.50 $38817 3/24 16/half 4 257.38 $21456 3/24 24/full 4 246.82 $22986 6/48 16/half 4 126.98 $42912©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 18. LinkedIn Node Configuration  No RAID controller – More cost for negative perf when doing JBOD  6 Drives – Still fits in 1U w/SATA drives – ~same perf as 8 drives  Less metal = cheaper cost©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 19. Rack Level View  If we assume we can use 40u in a rack then: – More CPUs – Just as many HDs – More Network – Potentially more RAM©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 20. We care about file system tuning.©2012 LinkedIn Corporation. All Rights Reserved.
  • 21. LinkedIn Hadoop Disk/File Systems  noatime Enabled  writeback Enabled  Each Disk (except root) Partitions: – Swap – MapReduce Spill Space – HDFS  Delayed Commits – Why write once when you can do ganged writes more efficiently?©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 22. We care about job tuning.©2012 LinkedIn Corporation. All Rights Reserved.
  • 23. LinkedIn Job Tuning Guidelines  All jobs get reviewed prior to going to production.  Task times should be between 5-15 minutes.  Jobs should have less than 10,000 tasks.  Jobs should be smart about # of files and the size of those files generated.©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 24. ... and the result?©2012 LinkedIn Corporation. All Rights Reserved.
  • 25. Why is LinkedIn Running so Hot?  We do a lot of non-MapReduce work.  RAM buffers and caches allow us to offset a lot of disk IO.  We audit our jobs.  As a result, our CPUs are actually busy.©2012 LinkedIn Corporation. All Rights Reserved. GRID OPERATIONS
  • 26. ©2012 LinkedIn Corporation. All Rights Reserved. BUSINESS OPERATIONS

×