YARN - a way to share a cluster beyond Hadoop

Describes YARN's architecture, resource localization model, security, and future work (like RM restart and RM HA), as well as how to contribute to open source and Hadoop.

1. YARN - a way to share a cluster beyond traditional Hadoop
   Omkar Joshi, YARN team, Hortonworks Inc.

2. About me
›  Software developer at Hortonworks Inc., Palo Alto
›  Contributor to the Apache YARN and MapReduce projects
›  Worked on resource localization (distributed cache) and security
›  Currently working on ResourceManager restart

3. Agenda
›  Classical Hadoop MapReduce framework
›  YARN architecture
›  Resource scheduling
›  Resource localization (distributed cache)
›  Security
›  Future work
›  How to write a custom application on YARN
›  How to contribute to open source
›  Q&A

4. Classical MapReduce framework
[Diagram: clients communicate with a single JobTracker, which assigns map and reduce tasks to TaskTrackers and reports MapReduce job status back to the clients.]

5. Drawbacks
›  Scalability
   ›  Limited to ~4,000 cluster nodes
   ›  Maximum of ~40,000 concurrent tasks
   ›  Synchronization in the JobTracker becomes tricky
›  If the JobTracker fails, everything fails and users have to resubmit all their jobs
›  Very poor cluster utilization
   ›  Fixed map and reduce slots

6. Drawbacks (contd.)
›  No support for running and sharing cluster resources with non-MapReduce applications
›  No support for wire compatibility
   ›  All clients need to be on the same version

7. So what do we need?
›  Better scalability: 10K+ nodes, 10K+ jobs
›  High availability
›  Better resource utilization
›  Support for multiple application frameworks
›  Support for aggregating logs
›  Wire compatibility
›  Easy to upgrade the cluster

8. Think tank!
›  Let's separate the logic of managing cluster resources from managing the application itself
›  All applications, including MapReduce, run in user land
   ›  Better isolation in a secure cluster
   ›  More fault tolerant

9. Architecture
›  Application
   ›  Job submitted by the user
›  Application Master (AM)
   ›  Just like the JobTracker
   ›  For MapReduce it manages all the map and reduce tasks: progress, restarts, etc.
›  Container
   ›  Unit of allocation (a simple process), replacing fixed map and reduce slots
   ›  e.g. Container 1 = 2 GB, 4 CPUs (see the sketch below)

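To make the container abstraction concrete, here is a minimal sketch (assuming the Hadoop 2.x AMRMClient API; the helper class name is made up for illustration) of how an AM would express the "2 GB, 4 CPU" container above as a resource request:

    // Hypothetical helper that builds the "Container = 2 GB, 4 CPUs" request
    // described on the Architecture slide.
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class ContainerSpecExample {
      public static ContainerRequest twoGbFourCores() {
        // The capability replaces the old fixed map/reduce slot: just memory + vcores.
        Resource capability = Resource.newInstance(2048 /* MB */, 4 /* vcores */);
        Priority priority = Priority.newInstance(0);
        // null node/rack lists mean "any node in the cluster is acceptable".
        return new ContainerRequest(capability, null, null, priority);
      }
    }
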
10. Architecture (contd.)
›  ResourceManager (RM)
   ›  Single resource scheduler (pluggable)
   ›  Stores application state (no need to resubmit an application if the RM restarts)
›  NodeManager (NM)
   ›  Per-machine agent; think of it like the TaskTracker
   ›  Manages the container life cycle
   ›  Aggregates application logs

11. Architecture (contd.)
[Diagram: clients submit jobs to the ResourceManager; each application's ApplicationMaster runs in a container on a NodeManager; NodeManagers send node status to the RM, AMs send resource requests and application status to the RM, and MapReduce task status flows from containers back to their AM.]

12. How does a job get executed?
1.  The client submits an application, e.g. a MapReduce job, to the RM (a client-side sketch follows this list).
2.  The RM asks an NM to start the Application Master (AM).
3.  The NodeManager starts the Application Master inside a container (a process).
4.  The AM first registers with the RM and then keeps requesting new resources; on the same AMRM protocol it also reports the application status to the RM.
5.  When the RM allocates a new container to the AM, the AM goes to the specified NM and asks it to launch the container (e.g. a map task).
6.  The newly started container follows the application logic and keeps reporting its progress to the AM.
7.  Once done, the AM informs the RM that the application succeeded.
8.  The RM then informs the NMs about the finished application and asks them to aggregate logs and clean up container-specific files.

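Step 1 is the only part the client has to code. A minimal sketch of submitting an application through YarnClient (Hadoop 2.x API; the application name, AM command, and memory sizes are made-up examples, not from the talk):

    import java.util.Collections;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.yarn.api.records.*;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class SubmitApp {
      public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Step 1: ask the RM for a new application id.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("example-yarn-app");

        // Describe the container that will run the Application Master (steps 2-3).
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(
            Collections.singletonList("java -Xmx256m com.example.MyAppMaster"));
        appContext.setAMContainerSpec(amContainer);

        // Resources for the AM container (no fixed map/reduce slots any more).
        appContext.setResource(Resource.newInstance(512, 1));

        // Submit; the RM will pick an NM and ask it to launch the AM.
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
      }
    }
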
13. Resource Scheduler
›  Pluggable (the default is the Capacity Scheduler)
›  Capacity Scheduler
   ›  Hierarchical queues; think of them as queues per organization
   ›  User limits (range of resources a user may use)
   ›  Elasticity
   ›  Black/white listing of resources
   ›  Supports resource priorities
   ›  Security: queue-level ACLs
›  Find more about the Capacity Scheduler

14. Capacity Scheduler
[Diagram: a cluster with two queues, A at 40% and B at 60% capacity, shown in five stages as applications App1-App4 arrive and complete; running applications elastically expand into idle capacity (e.g. App1 briefly taking 60%) and shrink back as new applications are admitted.]

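As a hedged sketch, the queue split shown above (A at 40%, B at 60%) corresponds to capacity-scheduler.xml entries along these lines; property names follow the Capacity Scheduler configuration, and the maximum-capacity value is just one way to illustrate elasticity:

    <configuration>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>A,B</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.A.capacity</name>
        <value>40</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.B.capacity</name>
        <value>60</value>
      </property>
      <property>
        <!-- Elasticity: A may grow beyond its 40% when B is idle, up to this cap. -->
        <name>yarn.scheduler.capacity.root.A.maximum-capacity</name>
        <value>100</value>
      </property>
    </configuration>
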
15. Capacity Scheduler

16. Resource Localization
›  When the NodeManager launches a container, it needs the executable to run
›  Resources (files) to be downloaded are specified as part of the ContainerLaunchContext (see the sketch below)
›  Resource types
   ›  PUBLIC: accessible to everyone
   ›  PRIVATE: accessible to all containers of a single user
   ›  APPLICATION: accessible only to a single application

17. Resource Localization (contd.)
›  The public localizer downloads public resources (owned by the NM)
›  The private localizer downloads private and application resources (owned by the user)
›  Per-user quotas are not supported yet
›  LRU cache with configurable size
›  As soon as a resource is localized, it loses any connection with its remote location
›  The public localizer supports parallel downloads, whereas the private localizer supports only limited parallelism

18. Resource Localization (contd.)
[Diagram: the AM requests two resources while starting a container, R1 (public) and R2 (application). The public localizer pulls R1 from HDFS into the NM's public cache, while the private localizer pulls R2 into the per-user and per-application cache on the NM.]

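Pulling slides 16-18 together, here is a minimal sketch (Hadoop 2.x API; the HDFS path and the "app.jar" key are hypothetical) of how an AM attaches a resource to the ContainerLaunchContext so the NodeManager's localizer downloads it before the container starts:

    import java.util.Collections;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    public class LocalizationExample {
      public static void addResource(FileSystem fs, ContainerLaunchContext ctx) throws Exception {
        Path jar = new Path("/apps/example/app.jar");   // hypothetical HDFS path

        // Size and timestamp let the NM verify it localizes exactly this copy.
        FileStatus status = fs.getFileStatus(jar);
        LocalResource resource = Records.newRecord(LocalResource.class);
        resource.setResource(ConverterUtils.getYarnUrlFromPath(jar));
        resource.setSize(status.getLen());
        resource.setTimestamp(status.getModificationTime());
        resource.setType(LocalResourceType.FILE);

        // PUBLIC / PRIVATE / APPLICATION decides which localizer downloads the
        // file and who may share the cached copy on the node.
        resource.setVisibility(LocalResourceVisibility.APPLICATION);

        ctx.setLocalResources(Collections.singletonMap("app.jar", resource));
      }
    }
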
19. Security
›  Not all users can be trusted; confidential data and application data need to be protected
›  The ResourceManager and NodeManagers are started as the "yarn" (super) user
›  All applications and containers run as the user who submitted the job
   ›  Use the LinuxContainerExecutor to launch user processes (see container-executor.c)
   ›  Private localizers also run as the application user

20. Security (contd.)
›  Kerberos (TGT) is used while submitting the job
›  AMRMToken: for the AM to talk to the RM
›  NMToken: for the AM to talk to NMs when launching new containers
›  ContainerToken: the way the RM passes container information to the NM via the AM; contains resource and user information
›  LocalizerToken: used by the private localizer during resource localization
›  RMDelegationToken: useful when Kerberos (TGT) is not available

21. Security (contd.)
[Diagram: the job-execution flow from slide 12, annotated with the token used at each step, e.g. Kerberos (TGT) for job submission, AMRMToken for AM-RM calls, NMToken for AM-NM calls, and the LocalizerToken passed as part of launching a container.]

22. ResourceManager restart
›  Saves application state
›  Supports ZooKeeper- and HDFS-based state stores (see the sketch below)
›  Can recover applications from the saved state; no need to resubmit the application
›  Today only the non-work-preserving mode is supported
›  Lays the foundation for RM HA

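As a rough sketch, enabling recovery with the HDFS-based state store comes down to a few yarn-site.xml properties like the ones below (the state-store URI is a placeholder; a ZooKeeper-based store is plugged in through the same store.class property):

    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.fs.state-store.uri</name>
      <value>hdfs://namenode:8020/rmstore</value>
    </property>
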
23. The YARN paper received the best paper award!
›  YARN-paper

24. Future work
›  RM restart
   ›  Non-work-preserving mode: almost done
   ›  Work-preserving mode: needs more effort
›  RM HA: just started
›  Task/container preemption
›  Rolling upgrades
›  Support for long-running services

25. Different applications already running on YARN
›  Apache Giraph (graph processing)
›  Spark (real-time processing)
›  Apache Tez
›  MapReduce (MRv2)
›  Apache HBase (HOYA)
›  Apache Helix (incubator project)
›  Apache Samza (incubator project)
›  Storm

26. Writing an application on YARN
›  Take a look at the Distributed Shell example
›  Write an Application Master which, once started, will:
   ›  First register itself with the RM on the AMRM protocol
   ›  Keep heartbeating and requesting resources via "allocate"
   ›  Use the container management protocol to launch further containers on NMs
   ›  Once done, notify the RM via finishApplicationMaster
›  Always use AMRMClient and NMClient when talking to the RM / NM (see the skeleton below)
›  Use the distributed cache wisely

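A bare-bones sketch of such an Application Master using AMRMClient and NMClient (Hadoop 2.x APIs; the class name, container command, and sizes are illustrative, not the Distributed Shell code itself):

    import java.util.Collections;
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.*;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class MyAppMaster {
      public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        // Register with the RM on the AMRM protocol.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, "");

        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Ask for one 1 GB / 1 vcore worker container.
        rmClient.addContainerRequest(new ContainerRequest(
            Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));

        int launched = 0;
        while (launched < 1) {
          // allocate() doubles as the heartbeat and carries new resource requests.
          AllocateResponse response = rmClient.allocate(0.1f);
          for (Container container : response.getAllocatedContainers()) {
            // Use the container management protocol (via NMClient) to start it.
            ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
            ctx.setCommands(Collections.singletonList("sleep 10"));
            nmClient.startContainer(container, ctx);
            launched++;
          }
          Thread.sleep(1000);
        }

        // Tell the RM the application is done.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", null);
      }
    }
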
27. Want to contribute to open source?
›  Follow this post
›  Subscribe to the Apache user and yarn-dev/issues mailing lists (link)
›  Track YARN issues
›  Post your questions on the user mailing list; be specific and include enough information to get better and quicker replies
›  Try to be patient
›  Start with simple tickets to get an idea of the underlying component

28. Thank you!
›  Check out the blog on YARN
›  Questions?