YARN - Hadoop's Resource Manager

Raymie Stata, ex-CTO of Yahoo, talks about YARN, Hadoop's new Resource Manager, and other improvements in Hadoop 2.0.

Usage Rights

© All Rights Reserved

Presentation Transcript

  • YARN: Hadoop’s new Resource Manager
      Raymie Stata, VertiCloud
  • Main features of Hadoop 2.0
      • High availability for HDFS
      • Federation for HDFS
      • Generalized Resource Management (YARN)
      • Plus: performance improvements, security improvements, compatibility improvements…
  • HDFS 2.0
  • HDFS 1.0 (and earlier)
      • Name node (Gets to be huge!)
      • Data nodes (Lots of them!)
  • Problems having a single NN
      • Scalability: NN limits horizontal scaling
      • Performance: NN is a performance bottleneck
      • Isolation: all tenants share the same NN
          – One misbehaving tenant brings everyone down
          – Can’t provide higher QoS to mission-critical apps
          – This is a problem even for small clusters!
  • HDFS Federation
      • ViewFS
      • NN1, NN2, NN3, NN4
      • Data nodes (Even more of them!)
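The federation slide shows ViewFS sitting in front of several name nodes. One way to picture that arrangement is a client-side mount table that routes each path to the name node owning that part of the namespace. The sketch below is a toy model only: the mount points and the `resolve` helper are hypothetical, the labels NN1 to NN4 are taken from the slide, and none of this is the Hadoop API.

```python
# Toy client-side mount table in the spirit of ViewFS.
# Mount points are hypothetical; NN1..NN4 are the labels from the slide.
MOUNT_TABLE = {
    "/user": "NN1",
    "/data": "NN2",
    "/logs": "NN3",
    "/tmp": "NN4",
}


def resolve(path: str) -> str:
    """Return the name node whose mount point covers `path`."""
    for mount, namenode in MOUNT_TABLE.items():
        # Match whole path components only, so "/database" does not hit "/data".
        if path == mount or path.startswith(mount + "/"):
            return namenode
    raise KeyError(f"no mount point covers {path}")
```

Under these assumptions, `resolve("/data/events/part-0")` routes to NN2, so a tenant whose files live under `/data` never touches the name nodes that serve `/user` or `/tmp`.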
  • Future possibilities for HDFS
      • Snapshots (!)
      • Partial name spaces
      • Alternative namespace managers
      • Global replication management
      • Disaster recovery
  • YARN AND MAPREDUCE 2.0
  • MapReduce 1.0 (and earlier)
      • JobTracker
          – Queue of jobs
          – Queue of tasks
          – Job and task scheduling and monitoring
      • Slave nodes (Lots of them!)
  • Problems with JT
      • Scalability: JT limits horizontal scaling
      • Availability: when JT dies, jobs must restart
      • Upgradability: must stop jobs to upgrade JT
      • Hardwired: JT only supports MapReduce
      • Increasingly hard to improve
          – Performance, scheduling, or utilization
  • Observation: Move intra-job management out of the central node!
      • JobTracker
          – Queue of jobs
          – Queue of tasks
          – Job and task scheduling and monitoring
      • Slave nodes (Lots of them!)
      • Why are we doing all of this on a single node, when we have all these nodes?
  • YARN: Yet Another Resource Negotiator
      • Resource Manager
          – Job queue, job scheduling
          – Resource list, resource allocation
      • App Master
          – Task queue
          – Job lifecycle logic
      • Slave nodes
          – Tasks
  • YARN Components
      • Resource Manager (per cluster)
          – Manages job scheduling and execution
          – Global resource allocation
      • Application Master (per job)
          – Manages task scheduling and execution
          – Local resource allocation
      • Node Manager (per-machine agent)
          – Manages the lifecycle of task containers
          – Reports to RM on health and resource usage
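The split between global and local resource allocation on this slide can be sketched as a small toy model. The three class names follow the slide; every method and field (`report`, `launch`, `allocate`, `assign`, `free_slots`) is a hypothetical name chosen for illustration and does not correspond to the real YARN API.

```python
from dataclasses import dataclass


@dataclass
class NodeManager:
    """Per-machine agent: hosts containers and reports free capacity."""
    name: str
    free_slots: int      # containers this machine can still host (assumed unit)
    launched: int = 0

    def report(self) -> int:
        return self.free_slots

    def launch(self) -> str:
        container_id = f"{self.name}/c{self.launched}"
        self.launched += 1
        self.free_slots -= 1
        return container_id


@dataclass
class ResourceManager:
    """Per-cluster: decides globally which nodes host new containers."""
    nodes: list

    def allocate(self, wanted: int) -> list:
        granted = []
        for node in self.nodes:
            while len(granted) < wanted and node.report() > 0:
                granted.append(node.launch())
        return granted   # may be fewer than requested if the cluster is full


@dataclass
class ApplicationMaster:
    """Per-job: decides locally which task runs in which granted container."""
    tasks: list

    def assign(self, containers: list) -> dict:
        assignment = {}
        for container in containers:
            if not self.tasks:
                break
            assignment[container] = self.tasks.pop(0)
        return assignment
```

In this sketch the Resource Manager never sees individual tasks and the Application Master never sees node capacity, which is the separation the slide describes. A real scheduler would also weigh queues, locality, and memory or CPU limits.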
  • Lifecycle of a job (sequence diagram)
      • Participants: Client, Resource Manager, App Master, Node Managers, Containers
      • Messages shown: Submit, OK, Go, “I need resources!”, “Here you are”, Start containers, “Do work!”, Done?, No, Done, Yes
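The message names on the lifecycle slide suggest a request and grant loop between the App Master and the Resource Manager, with the client polling until the job finishes. The trace below is one plausible ordering of those messages, not a reconstruction of the exact diagram: the function `job_lifecycle`, its parameters, and the one-poll-per-round pacing are assumptions made for illustration.

```python
def job_lifecycle(tasks: int, containers_per_grant: int) -> list:
    """Return an assumed (sender, receiver, message) trace for one job."""
    if tasks < 1 or containers_per_grant < 1:
        raise ValueError("tasks and containers_per_grant must be positive")

    log = [
        ("Client", "ResourceManager", "Submit"),
        ("ResourceManager", "Client", "OK"),
        ("ResourceManager", "AppMaster", "Go"),
    ]
    remaining = tasks
    while remaining > 0:
        # The App Master asks for resources and receives a batch of containers.
        log.append(("AppMaster", "ResourceManager", "I need resources!"))
        log.append(("ResourceManager", "AppMaster", "Here you are"))
        log.append(("AppMaster", "NodeManagers", "Start containers"))
        log.append(("NodeManagers", "Containers", "Do work!"))
        log.append(("Containers", "AppMaster", "Done"))
        remaining -= min(containers_per_grant, remaining)
        if remaining == 0:
            log.append(("AppMaster", "ResourceManager", "Done"))
        # The client polls once per round (an assumption of this sketch).
        log.append(("Client", "ResourceManager", "Done?"))
        log.append(("ResourceManager", "Client", "Yes" if remaining == 0 else "No"))
    return log
```

Calling `job_lifecycle(3, 2)` yields two request and grant rounds: the client hears "No" after the first and "Yes" after the second.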
  • Why YARN is important
      • Fixes scalability and availability problems
      • Supports experimentation
          – At both YARN and MapReduce levels
      • Supports alternatives to MapReduce!!
          – OpenMPI
          – Interactive SQL (Impala)
          – Streaming
              • Storm, Apache S4, others…
          – HBase integration
          – Graph processing (Apache Giraph)
  • Futures of YARN and MR
      • YARN
          – Models beyond MapReduce
          – Scheduling improvements (including preemption)
          – Container isolation
      • MapReduce
          – Decompose into reusable pieces
          – Push as well as pull in shuffle
          – Simple hash (no sort) in shuffle
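The last MapReduce item, a simple hash with no sort in the shuffle, can be illustrated by grouping map output in two ways. This is a single-process sketch of the idea only: `sort_shuffle` and `hash_shuffle` are hypothetical names, and a real shuffle also partitions data across reducers and spills to disk.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def sort_shuffle(pairs):
    """Group values by key by sorting first, as a sort-based shuffle does."""
    ordered = sorted(pairs, key=itemgetter(0))   # O(n log n), keys end up ordered
    return {key: [value for _, value in group]
            for key, group in groupby(ordered, key=itemgetter(0))}


def hash_shuffle(pairs):
    """Group values by key with a hash table and no sort."""
    groups = defaultdict(list)
    for key, value in pairs:                     # one pass, keys in arrival order
        groups[key].append(value)
    return dict(groups)
```

Both functions produce the same groups. The hash version skips the sort, which is enough for reducers that only need all values of a key together and do not rely on keys arriving in sorted order.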