1. Hadoop YARN - Under the Hood
   Sharad Agarwal
   sharad@apache.org

2. About me

3. Recap: Hadoop 1.0 Map-Reduce
   • JobTracker: manages cluster resources and job scheduling
   • TaskTracker: per-node agent; manages tasks

4. YARN Architecture
   [Architecture diagram: clients submit jobs to the ResourceManager; a NodeManager on each node hosts containers and Application Masters; arrows show job submission, node status, resource requests, and MapReduce status.]

5. What does the new architecture get us?
   • Scale
   • A general-purpose compute platform

6. Scale for a compute platform
   • Application size
     • Number of sub-tasks
     • Application-level state (e.g. counters)
   • Number of concurrent tasks in a single cluster

7. Application size scaling in Hadoop 1.0
   JT heap ∝ total tasks, nodes, job counters

8. Application size scaling in YARN is by architecture

9. Why a limitation on cluster size?
   [Chart: in Hadoop 1.0, cluster utilization falls as cluster size grows.]

10. JobTracker
    [Diagram: TaskTrackers send heartbeat requests to the JobTracker (JIP, TIP, Scheduler) and receive heartbeat responses.]
    • Synchronous heartbeat processing
    • JobTracker global lock
    JT transaction rate limit: 200 heartbeats/sec

11. Highly Concurrent Systems
    • scale much better (if done right)
    • make effective use of multi-core hardware
    • managing eventual consistency of state is hard
    • need a systemic framework to manage this

12. Event Model
    [Diagram: components A through N post events to a central event queue drained by an event dispatcher.]
    • Mutations only via events
    • Components only expose read APIs
    • Use re-entrant locks
    • Components follow a clear lifecycle

13. Asynchronous Heartbeat Handling
    [Diagram: the heartbeat listener enqueues each NodeManager heartbeat request on an event queue and replies with pending commands.]

14. YARN: better utilization, bigger cluster
    [Chart: utilization vs. cluster size degrades far more slowly for YARN than for Hadoop 1.0.]

15. State Management

16. State management in JT
    Very hard to maintain; debugging even harder

17. Complex State Management
    • Lightweight state machine library
    • Declarative way of specifying state transitions
    • Invalid transitions are handled automatically
    • Fits nicely with the event model
    • Debuggability is drastically improved; the lineage of an object's states can easily be determined
    • Handy while recovering state

18. Declarative State Machine

19. High Availability

20. MR Application Master Recovery
    • Hadoop 1.0
      • Application needs to resubmit the job
      • All completed tasks are lost
    • YARN
      • Application execution state checkpointed in HDFS
      • Rebuilds the state by replaying the events

21. Resource Manager HA
    • Based on ZooKeeper
    • Coming soon: YARN-128

22. YARN: New Possibilities
    • Open MPI: MR-2911
    • Master-Worker: MR-3315
    • Distributed Shell
    • Graph processing: Giraph-13
    • BSP: HAMA-431
    • CEP
      • S4: S4-25
      • Storm: https://github.com/nathanmarz/storm/issues/74
    • Iterative processing: Spark, https://github.com/mesos/spark-yarn/

23. YARN: a solid foundation to take Hadoop to the next level on scale, high availability, utilization, and alternate compute paradigms

24. Thank You
    @twitter: sharad_ag
Speaker notes

We will talk about how YARN is built fundamentally differently from Hadoop 1.0, what the motivation for doing so was, and what it buys us. Hadoop 1.0 is classic MR; Hadoop 2.0 runs MR on YARN.
I work primarily on the Map-Reduce side and was part of the team when YARN was conceptualized. I work at InMobi, a mobile advertising company, where I lead the development of the big data platforms, from data collection through to the data analytics systems. I don't see many folks from India here; I am the organizer of the Hadoop meetup group.
A quick primer on the Hadoop 1.0 architecture: a single master known as the JobTracker, with slave daemons called TaskTrackers. Clients submit jobs to the JobTracker; individual jobs contain map and reduce definitions. The JobTracker knows about the cluster's resources and schedules the map and reduce tasks accordingly.
In YARN, the single master is the ResourceManager, which manages the resources of the cluster; the slave daemons are NodeManagers, which manage the resources of individual nodes. Clients submit applications (jobs are called applications in YARN) to the ResourceManager. Each application has its own master process, spawned when the application starts running, which manages the lifecycle of the application: the ApplicationMaster. If the ApplicationMaster wants to spawn more processes in the cluster, it asks the ResourceManager for them. The resource definition of a process to be launched in the cluster is a container; it specifies things like RAM, disk, CPU, etc. Fundamentally, application state management is distributed, and the ResourceManager is responsible only for cluster management.
What does the new architecture get us? (TODO: add animation.) Scale, and a general-purpose distributed compute platform. I will discuss scale first; let's understand what it means in the Hadoop context.
For a distributed compute platform, scalability operates at two levels: how big a single application can be, and the number of concurrently running tasks in a single cluster. Application size means the number of sub-tasks plus the application-level state; the number of concurrent tasks is effectively the cluster size.
In Hadoop 1.0, application size is constrained by the JobTracker. The JT is a huge monolithic master: it keeps cluster-level metadata, task-level metadata, and application-specific metadata. You see things like counter limits for the same reason. (TODO: add the formula.) A hedged, illustrative reading of slide 7's proportionality follows.
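One illustrative way to read the relation on slide 7, under the assumption that each term contributes roughly linearly (the constants c_1, c_2, c_3 are hypothetical, not measured):

    \[ \text{JTHeap} \;\approx\; c_1\,N_{\text{tasks}} \;+\; c_2\,N_{\text{nodes}} \;+\; c_3\,N_{\text{counters}} \]

Every job's tasks and counters stay resident in the single JT heap, so growing any of the three terms eventually exhausts the master's memory.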
(TODO: add animation.) Application size scaling in YARN comes by architecture, because application management is distributed. Now let's look at the other dimension: the number of concurrent tasks.
(TODO: add animation.) Why does nobody run more than, say, 3k or 4k nodes under a JobTracker? Because as the cluster size grows, utilization drops, and the steeper the curve, the more utilization you sacrifice. If utilization is acceptable at 4k nodes, the cluster should not grow beyond that. Let's see why it drops.
Task scheduling happens in the heartbeat. The JobTracker holds a global lock and processes heartbeats synchronously, so JT throughput is limited to roughly 200 heartbeats/sec. As the cluster size increases, the interval between the heartbeats a TaskTracker can send increases, as the arithmetic below shows. The JT is simply not very concurrent; we needed to design for better concurrency.
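To make the effect concrete using the numbers above (200 heartbeats/sec, and a 4,000-node cluster as the illustrative size):

    \[ \text{heartbeat interval per node} \;\ge\; \frac{N_{\text{nodes}}}{\text{JT throughput}} \;=\; \frac{4000}{200/\text{s}} \;=\; 20\ \text{s} \]

Since tasks are scheduled only when a heartbeat arrives, freed slots can sit idle for up to that interval, which is exactly the utilization drop shown on slide 9.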
The points here are as on the slide: highly concurrent systems scale much better (if done right) and make effective use of multi-core hardware, but managing eventual consistency of state is hard, which is why you need a systemic framework to manage it.
Events are processed asynchronously. Each component encapsulates its own state; mutations happen only via events, while reads can happen directly. A minimal sketch of this dispatcher pattern follows.
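Below is a minimal, illustrative sketch of the central-dispatcher event model just described. The class and method names (Dispatcher, register, dispatch) are invented for the example; this is the shape of the pattern, not YARN's actual dispatcher API:

    // Minimal sketch of the event model: a single queue, one dispatcher
    // thread, and per-event-type handlers. Names are hypothetical.
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    interface Event { String type(); }

    interface EventHandler { void handle(Event e); }

    class Dispatcher {
        private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        private final Map<String, EventHandler> handlers = new ConcurrentHashMap<>();

        // Components register one handler per event type at startup.
        void register(String type, EventHandler handler) {
            handlers.put(type, handler);
        }

        // Producers never mutate another component's state directly;
        // they only enqueue events. This call is cheap and non-blocking.
        void dispatch(Event e) {
            queue.offer(e);
        }

        // A single daemon thread drains the queue, so each component's
        // state is mutated from one place only: its registered handler.
        void start() {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Event e = queue.take();
                        EventHandler h = handlers.get(e.type());
                        if (h != null) h.handle(e);
                    }
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }, "event-dispatcher");
            t.setDaemon(true);
            t.start();
        }
    }

The key property is that callers only enqueue; all mutations are applied from the dispatcher side, which is what lets components safely expose read-only APIs.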
In the ResourceManager, heartbeat processing is asynchronous, so it can handle a large number of heartbeats per second; a sketch of the idea follows. So what is the impact of this on utilization?
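Continuing the dispatcher sketch above (still hypothetical names, not the real ResourceManager classes): the RPC thread only enqueues a status event and answers from commands staged earlier, so no global lock is held while scheduling work happens.

    // The RPC thread does O(1) work per heartbeat: enqueue an event,
    // return pre-staged commands. Scheduling happens elsewhere.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.CopyOnWriteArrayList;

    class NodeStatusEvent implements Event {
        final String nodeId;
        NodeStatusEvent(String nodeId) { this.nodeId = nodeId; }
        public String type() { return "NODE_STATUS"; }
    }

    class HeartbeatListener {
        private final Dispatcher dispatcher;
        // Commands staged asynchronously by the scheduler, keyed by node.
        private final Map<String, List<String>> pending = new ConcurrentHashMap<>();

        HeartbeatListener(Dispatcher dispatcher) { this.dispatcher = dispatcher; }

        // Called on the RPC thread for each NodeManager heartbeat.
        List<String> onHeartbeat(String nodeId) {
            dispatcher.dispatch(new NodeStatusEvent(nodeId)); // no global lock
            List<String> reply = pending.remove(nodeId);      // get commands
            return reply != null ? reply : List.of();
        }

        // Called from the scheduler side to stage commands for a node.
        void addCommand(String nodeId, String command) {
            pending.computeIfAbsent(nodeId, k -> new CopyOnWriteArrayList<>())
                   .add(command);
        }
    }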
As the cluster size increases, the drop in utilization is much smaller, so a YARN cluster can have a large number of nodes within a single cluster.
Let's look at the state management aspects. In distributed systems with lots of moving parts, state management is crucial.
This is the state transition picture for a Job; there are similar ones for the other entities, such as task and attempt, and those have even more nodes.
This is a very small snippet of JobTracker code, and it is like this throughout. No one dares to touch it; it is very, very fragile.
For this reason, YARN has a very lightweight state machine library.
All valid state transitions are declared upfront. Apart from the obvious benefits, one can now visualize, discuss, and argue about proposed changes to the state machine, which is not possible in Hadoop 1.0. I remember the first version of all the state transitions we designed in a spreadsheet, where we could see which valid transitions we were missing. A minimal sketch of such a declarative transition table follows.
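The sketch below shows the declarative idea in miniature: transitions are data registered upfront, and anything not registered is rejected uniformly. It is illustrative only, with a toy job lifecycle, and is not YARN's actual state machine library API:

    // A tiny declarative state machine: all transitions are registered
    // upfront; invalid ones are caught in one place.
    import java.util.HashMap;
    import java.util.Map;

    class StateMachine<S extends Enum<S>, E extends Enum<E>> {
        private final Map<S, Map<E, S>> transitions = new HashMap<>();
        private S current;

        StateMachine(S initial) { this.current = initial; }

        // Declare a valid transition: (from, event) -> to.
        StateMachine<S, E> addTransition(S from, E on, S to) {
            transitions.computeIfAbsent(from, k -> new HashMap<>()).put(on, to);
            return this;
        }

        // Apply an event; undeclared transitions fail loudly and uniformly.
        void handle(E event) {
            S next = transitions.getOrDefault(current, Map.of()).get(event);
            if (next == null) {
                throw new IllegalStateException(
                    "Invalid event " + event + " in state " + current);
            }
            current = next;
        }

        S state() { return current; }
    }

    // Toy job lifecycle to exercise the table:
    enum JobState { NEW, RUNNING, SUCCEEDED, FAILED }
    enum JobEvent { START, COMPLETE, FAIL }

    class StateMachineDemo {
        public static void main(String[] args) {
            StateMachine<JobState, JobEvent> sm =
                new StateMachine<JobState, JobEvent>(JobState.NEW)
                    .addTransition(JobState.NEW, JobEvent.START, JobState.RUNNING)
                    .addTransition(JobState.RUNNING, JobEvent.COMPLETE, JobState.SUCCEEDED)
                    .addTransition(JobState.RUNNING, JobEvent.FAIL, JobState.FAILED);
            sm.handle(JobEvent.START);
            sm.handle(JobEvent.COMPLETE);
            System.out.println(sm.state()); // SUCCEEDED
        }
    }

Because the whole table is declared in one place, it can be reviewed and visualized, and it comes in handy again during recovery, which is exactly the point made above.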
Let's look at the HA story.
As on the slide: in Hadoop 1.0 the application needs to resubmit the job, and all completed tasks are lost; in YARN, the application execution state is checkpointed in HDFS and the state is rebuilt by replaying the events. A sketch of this replay idea follows.
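To illustrate the replay idea, reusing the toy state machine above (a local file journal stands in for the HDFS checkpoint, and the line-per-event encoding is invented, not the real MR job history format):

    // Journal each event before applying it; after a crash, rebuild the
    // state machine by replaying the journal instead of redoing the work.
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class JournaledJob {
        private final Path journal;
        private final StateMachine<JobState, JobEvent> sm =
            new StateMachine<JobState, JobEvent>(JobState.NEW)
                .addTransition(JobState.NEW, JobEvent.START, JobState.RUNNING)
                .addTransition(JobState.RUNNING, JobEvent.COMPLETE, JobState.SUCCEEDED)
                .addTransition(JobState.RUNNING, JobEvent.FAIL, JobState.FAILED);

        JournaledJob(Path journal) { this.journal = journal; }

        // Normal operation: persist the event durably, then apply it.
        void handle(JobEvent e) throws IOException {
            Files.writeString(journal, e.name() + System.lineSeparator(),
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            sm.handle(e);
        }

        // Recovery: replay the journal to rebuild state after a restart,
        // so completed work does not have to be redone.
        void recover() throws IOException {
            if (!Files.exists(journal)) return;
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                if (!line.isBlank()) sm.handle(JobEvent.valueOf(line.trim()));
            }
        }

        JobState state() { return sm.state(); }
    }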
Since the work done by the ResourceManager is limited to cluster management and scheduling, it is much, much simpler to build HA into the RM than it would have been for the JT, which carries a huge amount of state.
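Slide 21 only says the RM HA work (YARN-128) is ZooKeeper-based, so here is a sketch of the standard ZooKeeper leader-election recipe an active/standby RM pair could use. It is not the actual YARN-128 implementation, and the znode paths are invented (the /rm-election parent is assumed to already exist):

    // Each RM creates an ephemeral sequential znode; the lowest sequence
    // number is the active RM. Standbys watch their predecessor and
    // re-check leadership when it disappears.
    import java.util.Collections;
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    class RmLeaderElection implements Watcher {
        private final ZooKeeper zk;
        private String myNode; // e.g. /rm-election/n_0000000003

        RmLeaderElection(String connectString) throws Exception {
            zk = new ZooKeeper(connectString, 30_000, this);
        }

        // Returns true if this RM became the active one.
        boolean volunteer() throws Exception {
            // Ephemeral: the node vanishes if this RM dies, releasing leadership.
            myNode = zk.create("/rm-election/n_", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
            return checkLeader();
        }

        private boolean checkLeader() throws Exception {
            List<String> children = zk.getChildren("/rm-election", false);
            Collections.sort(children);
            String mine = myNode.substring(myNode.lastIndexOf('/') + 1);
            if (children.get(0).equals(mine)) {
                return true; // lowest sequence number: this RM is active
            }
            // Standby: watch only the node just ahead of ours to avoid
            // a thundering herd on every membership change.
            String predecessor = children.get(children.indexOf(mine) - 1);
            zk.exists("/rm-election/" + predecessor, this);
            return false;
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDeleted) {
                try { checkLeader(); } catch (Exception ignored) { }
            }
        }
    }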
There are several compute paradigms being built on top of YARN; this slide lists some of them, and there are several others as well.
As the slide says: YARN is a general-purpose distributed compute platform.