YARN - Hadoop's Resource Manager

Raymie Stata, ex-CTO of Yahoo, talks about YARN, Hadoop's new Resource Manager, and other improvements in Hadoop 2.0.

  1. YARN: Hadoop’s new Resource Manager. Raymie Stata, VertiCloud
  2. Main features of Hadoop 2.0 • High availability for HDFS • Federation for HDFS • Generalized Resource Management (YARN) • Plus: performance improvements, security improvements, compatibility improvements…
  3. HDFS 2.0
  4. HDFS 1.0 (and earlier) • Name node (Gets to be huge!) • Data nodes (Lots of them!)
  5. Problems with having a single NN • Scalability – NN limits horizontal scaling • Performance – NN is a performance bottleneck • Isolation – all tenants share the same NN – One misbehaving tenant brings everyone down – Can’t provide higher QoS to mission-critical apps – This is a problem even for small clusters!
  6. HDFS Federation • ViewFS • NN1, NN2, NN3, NN4 • Data nodes (Even more of them!)
  7. Future possibilities for HDFS • Snapshots (!) • Partial name spaces • Alternative namespace managers • Global replication management • Disaster recovery
  8. YARN AND MAPREDUCE 2.0
  9. MapReduce 1.0 (and earlier) • JobTracker: queue of jobs, queue of tasks, job and task scheduling and monitoring • Slave nodes (Lots of them!)
  10. Problems with JT • Scalability – JT limits horizontal scaling • Availability – when JT dies, jobs must restart • Upgradability – must stop jobs to upgrade JT • Hardwired – JT only supports MapReduce • Increasingly hard to improve – Performance, scheduling, or utilization
  11. Observation: Move intra-job management out of central node! • JobTracker: queue of jobs, queue of tasks, job and task scheduling and monitoring • Slave nodes (Lots of them!) • Why are we doing all of this on a single node? When we have all these nodes?
  12. YARN (Yet Another Resource Negotiator) • Resource Manager: job queue, resource list, job scheduling, resource allocation • App Master: tasks, task queue, job lifecycle logic • Slave nodes
  13. YARN Components • Resource Manager (per cluster) – Manages job scheduling and execution – Global resource allocation • Application Master (per job) – Manages task scheduling and execution – Local resource allocation • Node Manager (per-machine agent) – Manages the lifecycle of task containers – Reports to RM on health and resource usage
  14. Lifecycle of a job (sequence diagram) • Participants: Client, Resource Manager, App Master, Node Managers • Messages, in extraction order: Submit; OK; Go; I need resources!; Here you are; Done?; Start containers; No; Here you are; Do work!; Done?; No; Done?; Done; Done; Yes; Containers
  15. Why YARN is important • Fixes scalability and availability problems • Supports experimentation – At both YARN and MapReduce levels • Supports alternatives to MapReduce!! – OpenMPI – Interactive SQL (Impala) – Streaming • Storm, Apache S4, others… – HBase integration – Graph processing (Apache Giraph)
  16. Futures of YARN and MR • YARN – Models beyond MapReduce – Scheduling improvements (including preemption) – Container isolation • MapReduce – Decompose into reusable pieces – Push as well as pull in shuffle – Simple hash (no sort) in shuffle
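
The division of labor on slides 13 and 14 can be pictured with a toy simulation. This is only a sketch under heavy assumptions: the class and method names are invented for illustration and are not the YARN API, slots stand in for real resource requests, and everything runs synchronously in one process, whereas real containers run in parallel on separate machines.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Per-machine agent: runs task containers (hypothetical model)."""
    name: str
    free_slots: int

    def start_container(self, task):
        self.free_slots -= 1      # slot is busy while the task runs
        result = task()           # "Do work!"
        self.free_slots += 1      # slot is released when the task ends
        return result


@dataclass
class ResourceManager:
    """Per-cluster component: global resource allocation (hypothetical model)."""
    nodes: list

    def allocate(self, wanted):
        """Grant up to `wanted` containers, one per free slot."""
        grants = []
        for node in self.nodes:
            take = min(node.free_slots, wanted - len(grants))
            grants.extend([node] * take)
        return grants

    def submit(self, tasks, log):
        log.append("client->RM: submit")
        master = AppMaster(self, tasks)   # one App Master per job
        log.append("RM->AM: go")
        results = master.run(log)
        log.append("AM->RM: done")
        return results


class AppMaster:
    """Per-job component: schedules the job's own tasks (hypothetical model)."""

    def __init__(self, rm, tasks):
        self.rm = rm
        self.pending = list(tasks)

    def run(self, log):
        results = []
        while self.pending:
            log.append("AM->RM: need resources")
            grants = self.rm.allocate(len(self.pending))
            if not grants:
                raise RuntimeError("cluster has no free slots")
            for node in grants:
                task = self.pending.pop(0)
                log.append(f"AM->{node.name}: start container")
                results.append(node.start_container(task))
        return results


if __name__ == "__main__":
    log = []
    rm = ResourceManager([NodeManager("nm1", 1), NodeManager("nm2", 1)])
    print(rm.submit([lambda i=i: i * i for i in range(3)], log))
    print("\n".join(log))
```

The point of the sketch is the one the slides make: the Resource Manager only hands out containers, while the per-job App Master owns the task queue and decides what runs in each container.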

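
For the federation picture on slide 6, ViewFS can be thought of as a client-side mount table that maps path prefixes to NameNodes. The sketch below is a toy model of that idea, not the real ViewFS implementation or its configuration format; the mount points and the assignment of paths to NN1 through NN4 are hypothetical.

```python
# Hypothetical mount table: path prefix -> NameNode that owns that namespace.
MOUNT_TABLE = {
    "/user": "NN1",
    "/data": "NN2",
    "/logs": "NN3",
    "/tmp": "NN4",
}


def resolve(path, mount_table=MOUNT_TABLE):
    """Return (namenode, mount_point) for the mount point that covers `path`."""
    for mount, namenode in mount_table.items():
        # Match whole path components only, so "/database" is not under "/data".
        if path == mount or path.startswith(mount + "/"):
            return namenode, mount
    raise KeyError(f"no mount point covers {path}")


if __name__ == "__main__":
    for p in ["/user/alice/job.jar", "/data/raw/part-00000", "/tmp"]:
        print(p, "->", resolve(p))
```

Because each NameNode serves only its own slice of the namespace, load and failures in one slice need not affect the others, which is the isolation argument from slide 5.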