Your SlideShare is downloading. ×
  • Like
YARN - Hadoop's Resource Manager
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

YARN - Hadoop's Resource Manager

  • 3,193 views
Published

Raymie Stata, ex-CTO of Yahoo, talks about YARN, Hadoop's new Resource Manager, and other improvements in Hadoop 2.0.

Raymie Stata, ex-CTO of Yahoo, talks about YARN, Hadoop's new Resource Manager, and other improvements in Hadoop 2.0.

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,193
On SlideShare
0
From Embeds
0
Number of Embeds
3

Actions

Shares
Downloads
121
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. YARN Hadoop’s new Resource Manager Raymie Stata, VertiCloudVertiCloud 1
  • 2. Main features of Hadoop 2.0 • High availability for HDFS • Federation for HDFS • Generalized Resource Management (YARN) • Plus: performance improvements, security improvements, compatibility improvements…VertiCloud 2
  • 3. HDFS 2.0VertiCloud 3
  • 4. HDFS 1.0 (and earlier) Name node (Gets to be huge!) Data nodes (Lots of them!)VertiCloud 4
  • 5. Problems having a single NN • Scalability – NN limits horizontal scaling • Performance – NN is performance bottleneck • Isolation – all tenants share same NN – One misbehaving tenant brings everyone down – Can’t provide higher QOS to mission-critical apps – This is a problem even for small clusters!VertiCloud 5
  • 6. HDFS Federation ViewFS NN1 NN2 NN3 NN4 Data nodes (Even more of them!)VertiCloud 6
  • 7. Future possibilities for HDFS • Snapshots (!) • Partial name spaces • Alternative namespace managers • Global replication management • Disaster recoveryVertiCloud 7
  • 8. YARN AND MAPREDUCE 2.0VertiCloud 8
  • 9. MapReduce 1.0 (and earlier) JobTracker Queue of jobs Queue of tasks Job and task scheduling and monitoring Slave nodes (Lots of them!)VertiCloud 9
  • 10. Problems with JT • Scalability – JT limits horizontal scaling • Availability – when JT dies, jobs must restart • Upgradability – must stop jobs to upgrade JT • Hardwired – JT only supports MapReduce • Increasingly hard to improve – Performance, scheduling , or utilizationVertiCloud 10
  • 11. Observation Move intra-job management out of central node! JobTracker Queue of jobs Why are we Queue of tasks doing all of this on a single Job and task scheduling and node? monitoring When we have Slave nodes all these nodes? (Lots of them!)VertiCloud 11
  • 12. YARN Yet Another Resource Negotiator Resource Manager Job queue Resource list Job Resource scheduling allocation App Master Tasks Task queue Job lifecycle logic Slave nodesVertiCloud 12
  • 13. YARN Components • Resource Manager (per cluster) – Manages job scheduling and execution – Global resource allocation • Application Master (per job) – Manages task scheduling and execution – Local resource allocation • Node Manager (per-machine agent) – Manages the lifecycle of task containers – Reports to RM on health and resource usageVertiCloud 13
  • 14. Lifecycle of a job Resource App Node Client Manager Master Managers Submit OK Go I need resources! Here you are Done? Start containers No Here you are Do work! Done? No Done? Done Done Yes ContainersVertiCloud 14
  • 15. Why YARN is important • Fixes scalability and availability problems • Supports experimentation – At both YARN and MapReduce levels • Supports alternatives to MapReduce!! – OpenMPI – Interactive SQL (Impala) – Streaming • Storm, Apache S4, others… – HBase integration – Graph progressing (Apache Giraph)VertiCloud 15
  • 16. Futures of YARN and MR • YARN – Models beyond MapReduce – Scheduling improvements (including preemption) – Container isolation • MapReduce – Decompose into reusable pieces – Push as well as pull in shuffle – Simple hash (no sort) in shuffleVertiCloud 16