Hadoop on Mesos

Hadoop on Mesos: Presentation Transcript

  • 1. Hadoop on Mesos with a short history of distributed computing
  • 2. Agenda 1. Introduction (to me) 2. A short history of distributed computing 3. Hadoop on Mesos 4. Case study - Airbnb 5. Final thoughts 6. Q&A
  • 3. About me - Brenden Matthews ● cyclist ● runner ● started computering before it was cool ● free software advocate & contributor (Conky) ● for a living, engineers software @ Airbnb
  • 4. About me - Brenden Matthews ● cyclist ● runner ● started computering before it was cool ● free software advocate & contributor (Conky) ● for a living, engineers software @ I don't even like computers.
  • 5. A little history: Von Neumann Bottleneck ● Forever limited by memory and other I/O bandwidth limitations ● To do more, you must scale beyond a single node ● Even with SMP systems, the same limitations apply
  • 6. Early days of distributed computing ● Working around the Von Neumann Bottleneck: scaling up & out (Cray, SGI, IBM) ● 'Supercomputers' only practical for organizations with budget multipliers that start with a 'B'
  • 7. Who has time to build a datacentre? ● Xen hypervisor is released in 2003, paves the way for an 'abstract datacentre' through virtualization ● Amazon launches EC2 in 2006, kicks off the 'cloud computing' craze
  • 8. DIY supercomputer; a novel approach ● Google's MapReduce papers formalized the concept of 'black-box' distributed computing (2004) ● Google's own infrastructure is built upon free software and commodity hardware
  • 9. DIY supercomputer; a novel approach ● Hadoop: a free implementation of Google's infrastructure; 'big computing' for all (2005) ○ Robust ○ High tolerance of system failure
  • 10. We're still left with many incomplete solutions ● EC2 doesn't solve some problems: ○ Virtualization delivers poor performance when compared to 'bare metal'; must compensate by adding more instances ○ Frequent instance failures (mystery reboots, etc) ○ EC2 isn't 'application aware' (though some have tried) What else? ● Supercomputers aren't affordable ● Building a datacentre is not feasible for most ● Existing 'application in the cloud' systems are too restrictive
  • 11. How can we overcome these problems?
  • 12. The dream is alive.
  • 13. What's Mesos? ● Mesos is an operating system for your cluster that provides application-level distributed computing ● Mesos helps bridge the gap between the hardware and your application (or 'framework', in Mesos terms)
  • 14. Why Mesos? yes, but...
  • 15. I enjoy doing things the hard way.
  • 16. I really enjoy doing things the hard way.
  • 17. Hadoop on Mesos: Why? ● Formalized, scalable distributed computing ● Extensive toolset (Hive, Pig, Cascading, Cascalog, ...) ● Familiar to many ('gold standard') ● Hadoop as a distributed application (a novel concept!) ● Multiple versions of Hadoop (upgrade path) ● Why stop at Hadoop? There's more to do with our cluster! (Chronos, Storm, Jenkins, Spark, ...) and who has time to manage it?
  • 18. Hadoop on Mesos: Goals ● Avoid complexity: rely on existing, vetted systems, where possible ● Hadoop on Mesos should behave like any other Hadoop ● Realize high resource utilization ● Minimize contention & starvation ● Make Hadoop a first class framework on Mesos
  • 19. Hadoop terminology ● JobTracker: manages cluster resources, assigns tasks to TaskTrackers ● TaskTracker: manages individual map/reduce tasks, serves intermediate data amongst other TaskTrackers ● Job: collection of map and reduce tasks ● Task: one unit of work for a job (be it map or reduce) ● Slot: a task executor, is either map or reduce ● HDFS: distributed filesystem (outside scope)
  • 20. Hadoop on Mesos: Challenges ● Availability: JobTracker must ensure adequate map and reduce slots are available for current & future jobs ● Capacity: how do you estimate capacity? How do you profile jobs? ● Optimization: general case, or specific cases? Per job resource allocation policies? Separate JobTrackers for different job types?
  • 21. Hadoop on Mesos: Challenges ● Resource reservations: handling competing frameworks on the same cluster ○ Mesos reservations set aside slave resources for specific frameworks ○ Hadoop FairScheduler supports role fair sharing and task pre-emption within JobTracker
  • 22. Hadoop on Mesos: Challenges. A contrived example, with capacity for 100 slots.
    Job flow:
      Job  Maps  Reduces  Duration  Start
      1    95    5        1h        0
      2    5     100      1m        1m
      3    10    10       30m       60m
      4    50    0        20m       70m
      5    100   5        1h        80m
    Ideal allocation:
      Maps  Reduces
      95    5
      48    52
      10    10
      60    10
      90    10
    Actual Hadoop:
      Maps  Reduces
      50    50
      50    50
      50    50
      50    50
      50    50
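The contrived example above can be checked with a little arithmetic. What follows is a minimal sketch, not anything from the talk: it assumes a static split of 50 map and 50 reduce slots (the 'Actual Hadoop' figures) and treats each pair of 'Ideal allocation' figures as the demand at one stage. The function names `usable_slots` and `shortfall` are hypothetical.

```python
# Demand per stage, taken from the 'Ideal allocation' figures on slide 22,
# as (map slots wanted, reduce slots wanted).
IDEAL = [(95, 5), (48, 52), (10, 10), (60, 10), (90, 10)]

# Assumption: the 100 slots are split statically, 50 map and 50 reduce.
MAP_SLOTS = 50
REDUCE_SLOTS = 50


def usable_slots(maps_wanted, reduces_wanted):
    """Slots a static split can actually put to work for one stage."""
    return min(maps_wanted, MAP_SLOTS) + min(reduces_wanted, REDUCE_SLOTS)


def shortfall(maps_wanted, reduces_wanted):
    """Wanted slots that the static split cannot serve."""
    wanted = maps_wanted + reduces_wanted
    return wanted - usable_slots(maps_wanted, reduces_wanted)


if __name__ == "__main__":
    for maps_wanted, reduces_wanted in IDEAL:
        print(maps_wanted, reduces_wanted,
              usable_slots(maps_wanted, reduces_wanted),
              shortfall(maps_wanted, reduces_wanted))
```

Under these assumptions the first stage wants 95 map slots but only 50 exist, so 45 wanted slots go unserved while 45 reduce slots sit idle.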
  • 23. Hadoop on Mesos: What we did ● Mesos Scheduler is a thin layer atop the Hadoop scheduler ● JobTracker launches TaskTrackers for each job, using either a fixed or variable slot policy ○ Fixed policy launches a fixed number of slots per TaskTracker ○ Variable policy attempts to launch an ideal number of TaskTrackers and slots based on job queue ● Task scheduling is left to the underlying scheduler (i.e., Hadoop FairScheduler)
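The difference between the fixed and variable slot policies described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual scheduler code: it assumes a resource offer is given as CPUs and megabytes of memory, and that the variable policy splits capacity in proportion to pending map and reduce tasks. All function names here are hypothetical.

```python
import math


def slots_in_offer(offer_cpus, offer_mem_mb, slot_cpus, slot_mem_mb):
    """How many slots fit in one offer; the scarcer resource is the limit."""
    return min(math.floor(offer_cpus / slot_cpus),
               math.floor(offer_mem_mb / slot_mem_mb))


def fixed_policy(capacity, map_slots, reduce_slots):
    """Launch the same slot counts every time, or nothing if they don't fit."""
    if map_slots + reduce_slots > capacity:
        return (0, 0)
    return (map_slots, reduce_slots)


def variable_policy(capacity, pending_maps, pending_reduces):
    """Size map and reduce slots to the job queue (assumed: proportionally)."""
    pending = pending_maps + pending_reduces
    if capacity <= 0 or pending == 0:
        return (0, 0)
    if pending <= capacity:
        # Everything in the queue fits; launch exactly what is needed.
        return (pending_maps, pending_reduces)
    maps = min(pending_maps, round(capacity * pending_maps / pending))
    reduces = min(pending_reduces, capacity - maps)
    return (maps, reduces)


if __name__ == "__main__":
    # Hypothetical offer of 8 CPUs and 16384 MB; per-slot costs of
    # 0.95 CPUs and 1550 MB are the values suggested on slide 24.
    capacity = slots_in_offer(8, 16384, 0.95, 1550)
    print(capacity)
    print(fixed_policy(capacity, 4, 4))
    print(variable_policy(capacity, 10, 10))
```

A proportional split is only one possible reading of 'ideal'; the point is that the variable policy reacts to the queue while the fixed policy does not.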
  • 24. Hadoop on Mesos: How we did it. Suggested key configuration values:
      Name                                      Value
      mapred.tasktracker.map.tasks.maximum      50
      mapred.tasktracker.reduce.tasks.maximum   50
      mapred.mesos.slot.map.minimum             1000
      mapred.mesos.slot.reduce.minimum          1000
      mapred.mesos.scheduler.policy.fixed       false
      mapred.mesos.slot.cpus                    0.95
      mapred.mesos.slot.mem                     1550
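As a convenience, the suggested values above can be rendered in the usual Hadoop `<configuration>`/`<property>` XML layout. This is a minimal sketch: the property names and values come from the slide, while the helper name `to_hadoop_xml` is hypothetical, and placing the result in `mapred-site.xml` is an assumption the talk does not state.

```python
import xml.etree.ElementTree as ET

# Names and values as listed on slide 24.
SUGGESTED = {
    "mapred.tasktracker.map.tasks.maximum": "50",
    "mapred.tasktracker.reduce.tasks.maximum": "50",
    "mapred.mesos.slot.map.minimum": "1000",
    "mapred.mesos.slot.reduce.minimum": "1000",
    "mapred.mesos.scheduler.policy.fixed": "false",
    "mapred.mesos.slot.cpus": "0.95",
    "mapred.mesos.slot.mem": "1550",
}


def to_hadoop_xml(settings):
    """Render a dict of settings as a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumption: the output would be merged into mapred-site.xml.
    print(to_hadoop_xml(SUGGESTED))
```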
  • 25. Case study: Airbnb ● Engineering & analytics departments use Hive, Pig, Cascading and other tools on Hadoop: ○ Building search indices ○ Pricing suggestion system ○ Trust & safety, fraud detection ○ Business analytics ● Dealing with hypergrowth
  • 26. Case study: Airbnb, yesterday ● Had previously been using EMR, Amazon's managed Hadoop as a service ● EMR suffers from: ○ limited Hive/Pig features ○ feature lag ○ inability to patch or modify Hadoop ● Data infrastructure was prone to error due to significant complexity ○ EMR clusters would be spun up & destroyed every week ○ accessing Hadoop required strange SSH 'hopping'
  • 27. Case study: Airbnb, today ● We run Chronos, Hadoop, and Storm on Mesos now ● Finished complete migration to Mesos from EMR (June 2013) ● ~500 Chronos jobs ● ~20TiB of daily Hive data, ~1-2PiB of archived data
  • 28. Case study: Airbnb, today ● Data availability: all-time high ● Eng. & analytics customer satisfaction through the roof
  • 29. Action shots
  • 30. Action shots
  • 31. Next steps ● Locality awareness ● HDFS on Mesos ● HA JobTracker ● JobTracker on Mesos
  • 32. Links ● The code: https://github.com/airbnb/mesos ● Airbnb Engineering Blog: http://nerds.airbnb.com/ ● My other stuff: https://github.com/brndnmtthws ● brenden@diddyinc.com ● brenden.matthews@airbnb.com
  • 33. Thanks!
  • 34. Questions?