 Tracking multi-tenant resource usage with "White Elephant"
This is a 5-minute lightning talk on why one would want to use "White Elephant" for capacity planning on a Hadoop cluster. The talk was given to the LSPE group, hosted by Yahoo! in Sunnyvale on Sept 19, 2013.

http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/129859402/

  • Introduce self. Brief description of LinkedIn: over 5000 physical machines, Apache Hadoop 1.04, > 900 users competing for resources.
  • Why track usage? Infrastructure has a cost, and that cost depends on resource requirements. It's important to identify wasted resources to keep costs down. Also, having a graph that shows “team awesome’s” sloppy code is impacting other teams is nice.
  • White Elephant uses job history logs as its data source, and we need to get these logs onto HDFS. JobTracker: gets cranky if you rename, delete, or move logs on HDFS from under it. Manipulating the data requires a second copy. Job history logs are ~200-300k in size, so a second copy doubles the number of small files (the small files problem). WebHDFS: a script on the JobTracker copies job history files from local disk to HDFS using WebHDFS. This allows us to move, rename, and delete files. It also allows us to merge files into a HAR, avoiding the small files problem.
  • About White Elephant. Requirements are minimal: needs Hadoop; needs Avro, JRuby, HyperSQL for the in-memory DB, and Rickshaw for charting. Job history logs need to be on HDFS; pick your method of getting them there. Data aggregation: two MapReduce jobs, “all logs” and “incremental from last run”. Configurable via a YAML file: Kerberos, keytabs, file locations. Does need a scheduler like Azkaban or Luigi.
  • On to the screenshots. Dashboard: interactive displays showing different metrics (“Total Hours”, “Map or Reduce Hours”, number of maps). Failed tasks: user1 vs user2 and user3.
  • Reduce shuffle bytes: user1 vs user2 vs user3. Hey, look at all the network traffic caused by user1. There are many ways to slice and dice these graphs, and I’m here to tell you it’s worth looking into.
  • This is just a sample of the graphs. Most importantly, White Elephant helps determine capacity.
  • URLs worth visiting. I’ll be available in the back to answer questions. Thank the audience.
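
The notes above mention a script on the JobTracker that copies job history files to HDFS over WebHDFS. The script itself is not shown in the talk, so what follows is only a minimal sketch of what such a copy could look like in Python. It assumes a cluster using simple authentication via the `user.name` query parameter (a Kerberos-secured cluster would need SPNEGO, which is not covered here); the helper names `webhdfs_path` and `upload` are hypothetical. It follows the two-step WebHDFS `CREATE` flow: the NameNode answers the first `PUT` with a redirect, and the file body is sent to the DataNode named in that redirect.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_path(hdfs_path, op, params=None):
    """Build the request path and query string for a WebHDFS operation."""
    query = urlencode({"op": op, **(params or {})})
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def upload(namenode_host, namenode_port, local_file, hdfs_path, user):
    """Copy one local file to HDFS using the two-step WebHDFS CREATE."""
    # Step 1: ask the NameNode where to write; it replies with a redirect.
    nn = http.client.HTTPConnection(namenode_host, namenode_port)
    nn.request("PUT", webhdfs_path(
        hdfs_path, "CREATE", {"overwrite": "false", "user.name": user}))
    resp = nn.getresponse()
    resp.read()
    location = resp.getheader("Location")
    nn.close()
    if resp.status != 307 or location is None:
        raise RuntimeError(f"expected a redirect, got HTTP {resp.status}")

    # Step 2: send the file contents to the DataNode named in the redirect.
    target = urlsplit(location)
    dn = http.client.HTTPConnection(target.hostname, target.port)
    with open(local_file, "rb") as fh:
        dn.request("PUT", target.path + "?" + target.query, body=fh.read())
    status = dn.getresponse().status
    dn.close()
    if status != 201:
        raise RuntimeError(f"upload failed with HTTP {status}")
```

Because the copy on HDFS is separate from the files the JobTracker manages, the same path helper can build `RENAME` or `DELETE` requests against the copy, for example `webhdfs_path("/a", "RENAME", {"destination": "/b"})`.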
  • Transcript

    • 1. Tracking multi-tenant resource usage with "White Elephant" Adam Faris LinkedIn
    • 2. Why track usage?
    • 3. Job History Logs – Use Hadoop to process logs – Creates small file problem for HDFS – WebHDFS + HAR = “Problem Solver”
    • 4. – Requirements – Provides Data Aggregation – Provides Dashboard – Open Sourced by LinkedIn Engineering http://en.wikipedia.org/wiki/White_elephant
    • 5. Failed Tasks
    • 6. Reduce Shuffle Bytes
    • 7. It can do more? • Total task time • Total speculative time • CPU Hours • Plus more • Helps determine capacity
    • 8. • Github: – https://github.com/linkedin/white-elephant • LinkedIn Open Source Projects: – http://data.linkedin.com/opensource/white-elephant • LinkedIn is Hiring: – http://careers.linkedin.com • Questions/Comments: – twitter: @opsmekanix
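
The "Failed Tasks" and "Reduce Shuffle Bytes" slides compare users on a single metric. White Elephant does this aggregation with MapReduce jobs; the snippet below is not its implementation, only a small illustration of the per-user roll-up behind such a graph. The record layout and field names (`user`, `failed_tasks`, `reduce_shuffle_bytes`) are hypothetical and do not reflect White Elephant's actual schema.

```python
from collections import defaultdict


def usage_by_user(jobs, metric):
    """Sum one metric per user across an iterable of job records (dicts)."""
    totals = defaultdict(int)
    for job in jobs:
        # Jobs that lack the metric contribute zero to that user's total.
        totals[job["user"]] += job.get(metric, 0)
    return dict(totals)


# Hypothetical records; real values would come from parsed job history logs.
jobs = [
    {"user": "user1", "failed_tasks": 3, "reduce_shuffle_bytes": 900},
    {"user": "user2", "failed_tasks": 0, "reduce_shuffle_bytes": 100},
    {"user": "user1", "failed_tasks": 1, "reduce_shuffle_bytes": 500},
]

shuffle_totals = usage_by_user(jobs, "reduce_shuffle_bytes")
```

Plotting totals like these over time, per user or per team, is what makes a heavy consumer such as "user1" stand out.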
