Apachecon Hadoop YARN - Under The Hood (at ApacheCon Europe)


  • We will talk about how YARN is built fundamentally differently from Hadoop 1.0: what was the motivation for doing so, and what does it buy us? Hadoop 1.0 is classic MR; in Hadoop 2.0, MR runs on YARN.
  • I work primarily on the Map-Reduce side and was part of the team when YARN was conceptualized. I work at InMobi, a mobile advertising company, where I lead the development of big data platforms, from data collection to data analytics systems. I don't see many folks from India here; back home I am the organizer of a Hadoop meetup group.
  • A quick primer on the Hadoop 1.0 architecture: a single master known as the JobTracker, with slave daemons called TaskTrackers. Clients submit jobs to the JobTracker; individual jobs contain map and reduce definitions. The JobTracker knows about the cluster's resources and schedules the map and reduce tasks accordingly.
  • A single master known as the ResourceManager (RM) manages the resources of the cluster; slave daemons known as NodeManagers manage the resources of individual nodes. Clients submit applications (jobs are called applications in YARN) to the ResourceManager. Each application has its own master process, spawned when the application starts running, which manages the application's lifecycle: the ApplicationMaster. If the ApplicationMaster wants to spawn more processes in the cluster, it asks the ResourceManager for them. The resource definition of a process to be launched in the cluster is a container, which specifies things like RAM, CPU, and disk. Fundamentally, application state management is distributed; the RM is responsible only for cluster management.
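The container described above is essentially just a small resource specification. A minimal sketch, assuming an illustrative `ContainerRequest` class (names and fields are assumptions for this example, not YARN's actual API):

```java
// Illustrative sketch of a container resource request; class and field
// names are assumptions for this example, not YARN's real API.
public class ContainerRequest {
    final int memoryMb;  // RAM for the process
    final int vcores;    // CPU share
    final String host;   // preferred node, or "*" for any node

    public ContainerRequest(int memoryMb, int vcores, String host) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.host = host;
    }

    // An ApplicationMaster would hand requests like this to the
    // ResourceManager, which finds a node and launches the container.
    public static ContainerRequest anyNode(int memoryMb, int vcores) {
        return new ContainerRequest(memoryMb, vcores, "*");
    }
}
```

Because a request carries only resource and locality information, the RM can schedule it without knowing anything about application-level state.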
  • What does the new architecture get us? Scale, and a general-purpose distributed compute platform. First, let's understand what scale means in the Hadoop context.
  • For a distributed compute platform, scalability operates at two levels: how big a single application can be, and how many tasks can run concurrently in a single cluster. Application size is the number of sub-tasks plus the application-level state; the number of concurrent tasks is effectively a function of cluster size.
  • In Hadoop 1.0, application size is constrained by the JobTracker. The JT is a huge monolithic master: it keeps cluster-level metadata, task-level metadata, and application-specific metadata, so its heap grows with the total number of tasks, nodes, and job counters. You see things like counter limits for exactly this reason.
  • In YARN, application size scaling comes from the architecture itself, because application management is distributed. Now let's look at the other dimension: the number of concurrent tasks.
  • Why does nobody run more than, say, 3k or 4k nodes under a JobTracker? Because as the cluster size grows, utilization drops; the steeper the curve, the more utilization you sacrifice. At around 4k nodes utilization was deemed acceptable, so clusters were not grown beyond that. Let's see why utilization drops.
  • Task scheduling happens in the heartbeat. The JobTracker holds a global lock and processes heartbeats synchronously, so JT throughput is limited to roughly 200 heartbeats/sec. As the cluster size increases, the interval between the heartbeats a TaskTracker can send grows. The JT is not very concurrent; we needed to design for better concurrency.
  • Same as in slides
  • Asynchronous processing of events: each component encapsulates its own state, mutations happen only via events, and reads can happen directly.
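The event model above can be sketched in a few lines: a single queue, handlers registered per event type, and mutations applied only when events are drained. This is a minimal sketch in the spirit of YARN's dispatcher; the class names are illustrative, not YARN's own API:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Minimal sketch of the event model: mutations enter through a queue and
// are applied by registered handlers; reads go straight to the component.
public class EventDispatcher {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final Map<Class<?>, Consumer<Object>> handlers = new ConcurrentHashMap<>();

    // Example event: a node heartbeat consumed by a scheduler component.
    public static class NodeHeartbeat {
        public final String nodeId;
        public NodeHeartbeat(String nodeId) { this.nodeId = nodeId; }
    }

    @SuppressWarnings("unchecked")
    public <T> void register(Class<T> type, Consumer<T> handler) {
        handlers.put(type, (Consumer<Object>) handler);
    }

    // Never blocks the caller: this is what keeps heartbeat intake cheap.
    public void dispatch(Object event) { queue.add(event); }

    // In a real system this loop runs on a dedicated dispatcher thread;
    // here we drain synchronously. Assumes a handler is registered.
    public void drain() {
        Object e;
        while ((e = queue.poll()) != null) {
            handlers.get(e.getClass()).accept(e);
        }
    }
}
```

The key property is that `dispatch` returns immediately, so the thread receiving heartbeats is never blocked on state mutation.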
  • In the ResourceManager, heartbeat processing is asynchronous, so it can handle a large number of heartbeats/sec. So what is the impact of this on utilization?
  • As the cluster size increases, the drop in utilization is much smaller; a YARN cluster can therefore have a large number of nodes.
  • Let's look at the state management aspects. In distributed systems with many moving parts, state management is crucial.
  • This is the state-transition picture for a Job; there are similar diagrams for the other entities, such as Task and TaskAttempt, which have even more nodes.
  • This is a very small snippet of JobTracker code, and it is like this throughout. No one dares to touch it; it is very, very fragile.
  • For this reason, YARN has a very lightweight state machine library.
  • All valid state transitions are declared up front. Beyond the obvious benefits, one can now visualize, discuss, and argue about proposed changes to the state machine, which is not possible in Hadoop 1.0. I remember the first version of all the state transitions we designed in a spreadsheet, where we could see which valid transitions we were missing.
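The declarative style described above can be sketched as a transition table keyed on (state, event): anything not declared up front is an invalid transition and is rejected automatically. This is a minimal sketch; the class and method names are assumptions for this example, not YARN's actual state machine library API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a declarative state machine: all legal transitions are
// registered before use, and any undeclared (state, event) pair fails.
public class StateMachine {
    private final Map<String, Map<String, String>> table = new HashMap<>();
    private String state;

    public StateMachine(String initial) { this.state = initial; }

    // Declare a legal transition: (fromState, event) -> toState.
    public StateMachine addTransition(String from, String event, String to) {
        table.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
        return this;
    }

    // Apply an event; undeclared transitions are invalid by construction.
    public String handle(String event) {
        Map<String, String> row = table.get(state);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException(
                "Invalid event " + event + " in state " + state);
        }
        state = row.get(event);
        return state;
    }

    public String getState() { return state; }
}
```

Because the whole table is data, the set of transitions can be printed, reviewed, or diagrammed, which is exactly what makes proposed changes discussable.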
  • Let's look at the HA story.
  • Same as in slides
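The AM recovery scheme on this slide, checkpointing execution events and rebuilding state by replay, can be sketched as follows. The in-memory journal stands in for the HDFS event log, and the class and method names are illustrative, not the real MR ApplicationMaster API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of recovery by event replay: completed-task events are journaled
// before being applied; a restarted master replays the journal instead of
// rerunning finished tasks.
public class JobStateRecovery {
    private final List<String> completedTasks = new ArrayList<>();

    // Checkpoint the event first, then apply it to in-memory state.
    public void onTaskCompleted(String taskId, List<String> journal) {
        journal.add(taskId);
        completedTasks.add(taskId);
    }

    // A restarted AM rebuilds identical state by replaying the journal.
    public static JobStateRecovery replay(List<String> journal) {
        JobStateRecovery fresh = new JobStateRecovery();
        for (String taskId : journal) {
            fresh.completedTasks.add(taskId);
        }
        return fresh;
    }

    public boolean isCompleted(String taskId) {
        return completedTasks.contains(taskId);
    }
}
```

This is why a YARN AM restart does not lose completed tasks, while a Hadoop 1.0 job resubmission does.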
  • Since the work done by the ResourceManager is limited to cluster management and scheduling, building HA into the RM is much, much simpler than it would be for the JT, which holds a huge amount of state.
  • Several compute paradigms are being built over YARN; this slide lists some of them, and there are several others as well.
  • As the slide says: YARN is a general-purpose distributed compute platform.

    1. Hadoop YARN - Under the Hood. Sharad Agarwal, sharad@apache.org
    2. About me
    3. Recap: Hadoop 1.0 Map-Reduce. JobTracker: manages cluster resources and job scheduling. TaskTracker: per-node agent that manages tasks.
    4. YARN Architecture (diagram: clients submit jobs to the ResourceManager; NodeManagers on each node host ApplicationMasters and Containers; flows shown for job submission, node status, resource requests, and MapReduce status)
    5. What the new architecture gets us: scale; a general-purpose compute platform.
    6. Scale for a compute platform: application size (number of sub-tasks; application-level state, e.g. counters) and the number of concurrent tasks in a single cluster.
    7. Application size scaling in Hadoop 1.0: JTHeap ∝ f(TotalTasks, Nodes, JobCounters)
    8. Application size scaling in YARN is by Architecture
    9. Why a limitation on cluster size? (graph: cluster utilization vs. cluster size for Hadoop 1.0)
    10. JobTracker (JIP, TIP, Scheduler): synchronous heartbeat processing under a JobTracker global lock. JT transaction rate limit: 200 heartbeats/sec.
    11. Highly concurrent systems: scale much better (if done right); make effective use of multi-core hardware; managing eventual consistency of state is hard; need a systemic framework to manage this.
    12. Event model (diagram: an event queue feeding an event dispatcher, which routes to components A through N): mutations only via events; components expose only read APIs; use re-entrant locks; components follow a clear lifecycle.
    13. Asynchronous heartbeat handling (diagram: a NodeManager heartbeat request is queued by a listener; the response fetches commands from metadata).
    14. YARN: better utilization, bigger cluster (graph: cluster utilization vs. cluster size, YARN vs. Hadoop 1.0)
    15. State Management
    16. State management in JT: very hard to maintain; debugging even harder.
    17. Complex state management: lightweight state machines library; declarative way of specifying state transitions; invalid transitions are handled automatically; fits nicely with the event model; debuggability is drastically improved, since the lineage of object states can easily be determined; handy while recovering state.
    18. Declarative State Machine
    19. High Availability
    20. MR Application Master recovery. Hadoop 1.0: the application must resubmit the job, and all completed tasks are lost. YARN: application execution state is checkpointed in HDFS, and the new AM rebuilds its state by replaying the events.
    21. Resource Manager HA: based on ZooKeeper; coming soon (YARN-128).
    22. YARN: new possibilities: Open MPI (MR-2911); Master-Worker (MR-3315); Distributed Shell; graph processing (GIRAPH-13); BSP (HAMA-431); CEP; S4 (S4-25); Storm (https://github.com/nathanmarz/storm/issues/74); iterative processing with Spark (https://github.com/mesos/spark-yarn/).
    23. YARN: a solid foundation to take Hadoop to the next level in scale, high availability, utilization, and alternate compute paradigms.
    24. Thank You. @twitter: sharad_ag