Understanding YARN - Pune Apex Meetup, Jan 06 2016
- 2. Apache Apex Meetup
Agenda
● Understanding YARN
○ Why YARN
○ Introducing YARN
○ YARN architecture
○ Beyond batch
○ Application Lifecycle
● Building YARN application
- 3. Apache Apex Meetup
Why YARN
Hadoop v1 (MR1) Architecture
● Job Tracker
○ Manages cluster resources
○ Job scheduling
● Task Tracker
○ Per-node Agent
○ Manages tasks
[Diagram: Clients, a Job Tracker, and several Task Trackers each running Tasks; arrows are labeled Job Submission and MapReduce Status.]
- 4. Apache Apex Meetup
Why YARN (Cont…)
Limitations with MR1
• Scalability
o Maximum cluster size: 4,000 nodes
o Maximum concurrent tasks: 40,000
• Availability
• Resource Utilization
• Running non-MapReduce applications
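The resource utilization limitation can be illustrated with a small toy model. It assumes an MR1-style node with fixed map and reduce slots, where a map task can only occupy a map slot, and compares it with a node whose capacity is a single pool. The class name, method names, and numbers are hypothetical and chosen only for illustration; every task is assumed to need one unit of capacity.

```java
// Toy model only: names and numbers are illustrative assumptions,
// not Hadoop configuration values or APIs.
final class SlotVsContainerDemo {

    /** MR1-style node: map tasks use only map slots, reduce tasks only reduce slots. */
    static int runnableWithSlots(int mapSlots, int reduceSlots, int mapTasks, int reduceTasks) {
        return Math.min(mapSlots, mapTasks) + Math.min(reduceSlots, reduceTasks);
    }

    /** Pooled node: any task may use any free unit of capacity. */
    static int runnableWithContainers(int totalCapacity, int mapTasks, int reduceTasks) {
        return Math.min(totalCapacity, mapTasks + reduceTasks);
    }

    public static void main(String[] args) {
        int mapSlots = 4, reduceSlots = 2; // hypothetical node layout
        int mapTasks = 6, reduceTasks = 0; // map-heavy phase of a job
        System.out.println("slots:      "
                + runnableWithSlots(mapSlots, reduceSlots, mapTasks, reduceTasks));
        System.out.println("containers: "
                + runnableWithContainers(mapSlots + reduceSlots, mapTasks, reduceTasks));
    }
}
```

In this map-heavy example the two reduce slots sit idle under the slot model, while the pooled model can run all six tasks at once.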
- 5. Apache Apex Meetup
Introducing YARN
● YARN - Yet Another Resource Negotiator
● Framework that facilitates writing arbitrary distributed processing
frameworks and applications.
● YARN Applications/frameworks:
e.g. MapReduce2, Apache Spark, Apache Giraph, Apache Apex etc.
- 7. Apache Apex Meetup
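Introducing YARN
[Diagram: approximate mapping from MapReduce 1 components to YARN components]
MapReduce 1 ≈ YARN
● Job Tracker ≈ Resource Manager, Application Master, Timeline Server
● Task Tracker ≈ Node Manager
● Map Slot, Reduce Slot ≈ [not recoverable from the slide text]
Proprietary and Confidential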
- 8. Apache Apex Meetup
Hadoop v2 (YARN) Architecture
• Resource Manager
o Manages and allocates cluster resources
o Application scheduling
o Applications Manager
• Node Manager
o Per-machine agent
o Manages life-cycle of container
o Monitors resources
• Application Master
o Per-application
o Manages application scheduling and task execution
[Diagram: Clients, a Resource Manager, and Node Managers hosting App Masters and containers (Cntr); arrows are labeled Job Submission, Node Status, Resource Request, and MapReduce Status.]
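The division of labor between node status reports and resource requests can be sketched with a toy allocator. This is a minimal model under stated assumptions: `ToyResourceManager`, its methods, the node names, and the memory figures are all hypothetical, and the first-fit policy is a stand-in chosen for brevity. Real YARN schedulers apply queue and fairness policies that this sketch ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: ToyResourceManager is a hypothetical class, not part of Hadoop.
final class ToyResourceManager {
    // Free memory (MB) per node, as last reported by that node's agent.
    private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

    /** Node status: a node manager reports how much memory it has free. */
    void reportNodeStatus(String node, int freeMb) {
        freeMemoryMb.put(node, freeMb);
    }

    /**
     * Resource request: grant a container on the first node with enough
     * free memory (first-fit). Returns the node name, or null if none fits.
     */
    String requestContainer(int memoryMb) {
        for (Map.Entry<String, Integer> e : freeMemoryMb.entrySet()) {
            if (e.getValue() >= memoryMb) {
                e.setValue(e.getValue() - memoryMb); // reserve the memory
                return e.getKey();
            }
        }
        return null;
    }

    /** Free memory currently tracked for a node (0 if the node is unknown). */
    int freeOn(String node) {
        return freeMemoryMb.getOrDefault(node, 0);
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.reportNodeStatus("node-1", 1024);
        rm.reportNodeStatus("node-2", 4096);
        System.out.println(rm.requestContainer(2048)); // too large for node-1
        System.out.println(rm.requestContainer(512));
        System.out.println(rm.requestContainer(8192)); // larger than any node
    }
}
```

The point of the sketch is only that the resource manager decides placement from state the node managers report, while the requester never talks to a node until a container has been granted.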
- 9. Apache Apex Meetup
Application Submission workflow
RM = Resource Manager
NM = Node Manager
AM = Application Master
1) Submit application
2) Launch Application Master
3) AM registers with RM
4) AM negotiates for containers
5) Launch containers
[Diagram: YarnClient, an RM node (ApplicationsManager + Scheduler), and two NM nodes hosting the Application Master and containers; a legend marks heartbeat links.]
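The ordering of the five workflow steps can be made explicit with a tiny state machine. This is a sketch, not YARN code: the `SubmissionWorkflow` class is hypothetical, and the step names paraphrase the slide rather than YARN's internal application states.

```java
// Toy model only: step names paraphrase the slide; they are not YARN's
// internal application states.
final class SubmissionWorkflow {
    enum Step {
        SUBMIT_APPLICATION,
        LAUNCH_APPLICATION_MASTER,
        AM_REGISTERS_WITH_RM,
        AM_NEGOTIATES_CONTAINERS,
        LAUNCH_CONTAINERS
    }

    private int completed = 0; // number of steps completed so far

    /** Records a step, rejecting any step that arrives out of order. */
    void advance(Step step) {
        if (isDone()) {
            throw new IllegalStateException("workflow already finished");
        }
        Step expected = Step.values()[completed];
        if (step != expected) {
            throw new IllegalStateException("expected " + expected + " but got " + step);
        }
        completed++;
    }

    /** True once all five steps have been recorded in order. */
    boolean isDone() {
        return completed == Step.values().length;
    }

    public static void main(String[] args) {
        SubmissionWorkflow wf = new SubmissionWorkflow();
        for (Step s : Step.values()) {
            wf.advance(s);
        }
        System.out.println("done: " + wf.isDone());
    }
}
```

For example, calling `advance(Step.AM_REGISTERS_WITH_RM)` right after submission throws, which mirrors the fact that an Application Master cannot register before it has been launched.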
- 11. Apache Apex Meetup
Sample YARN application - Client
1. Start the service - YarnClient
- YarnClient.start()
2. Create Application object - YarnClientApplication
- YarnClient.createApplication()
3. Set up App Context - ApplicationSubmissionContext
- ApplicationSubmissionContext represents information needed by
ResourceManager to launch ApplicationMaster
- Includes: AppName, Priority, ContainerLaunchContext, …
4. Submit application to resource manager
- YarnClient.submitApplication(ApplicationSubmissionContext)
- 12. Apache Apex Meetup
Sample YARN Application - App Master
1. Register App Master with Resource Manager
- AMRMClient.registerApplicationMaster
2. Negotiate containers from resource manager
- Provides ContainerRequest - request for container resources
- AMRMClient.addContainerRequest
3. Build ContainerLaunchContext
- Uses container returned by Resource Manager
- ContainerLaunchContext - represents information needed by node manager to
launch a container
- Includes: ContainerId, Commands, Environment, LocalResources, …
- 13. Apache Apex Meetup
Sample YARN Application - App Master (cont…)
4. Launch container using NMClient.startContainer
5. Wait till all containers are done
- AllocateResponse.getCompletedContainersStatuses
6. Unregister application from Resource Manager
- AMRMClient.unregisterApplicationMaster
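The six App Master steps form a simple control loop, sketched below. The real calls named on the slides need a running cluster, so this sketch replaces them with an in-memory fake. `ToyAppMaster`, `FakeCluster`, and `Heartbeat` are hypothetical stand-ins written for this example; they are not the Hadoop `AMRMClient` or `NMClient` classes and do not share their signatures. The fake also assumes a container finishes by the heartbeat after it was started, which a real cluster does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all types here are hypothetical stand-ins, not Hadoop classes.
final class ToyAppMaster {

    /** What one heartbeat to the fake cluster returns. */
    static final class Heartbeat {
        final List<Integer> allocated = new ArrayList<>();
        final List<Integer> completed = new ArrayList<>();
    }

    /** One fake object plays both the resource manager and the node managers. */
    static final class FakeCluster {
        private int pending = 0;   // container requests not yet granted
        private int nextId = 1;    // next container id to hand out
        private final List<Integer> started = new ArrayList<>();
        final List<String> log = new ArrayList<>();

        void register() { log.add("register"); }
        void addContainerRequest() { pending++; }
        void startContainer(int id) { started.add(id); log.add("start " + id); }
        void unregister() { log.add("unregister"); }

        /** Grants at most one pending request; reports earlier starts as completed. */
        Heartbeat allocate() {
            Heartbeat hb = new Heartbeat();
            hb.completed.addAll(started);
            started.clear();
            if (pending > 0) {
                pending--;
                hb.allocated.add(nextId++);
            }
            return hb;
        }
    }

    /** Runs the six steps against the fake; returns how many containers completed. */
    static int run(FakeCluster cluster, int containers) {
        cluster.register();                              // step 1
        for (int i = 0; i < containers; i++) {
            cluster.addContainerRequest();               // step 2
        }
        int done = 0;
        while (done < containers) {                      // step 5: wait for all
            Heartbeat hb = cluster.allocate();
            for (int id : hb.allocated) {
                cluster.startContainer(id);              // steps 3 and 4
            }
            done += hb.completed.size();
        }
        cluster.unregister();                            // step 6
        return done;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        System.out.println(run(cluster, 2) + " containers completed");
        System.out.println(cluster.log);
    }
}
```

The same polling call both receives new allocations and learns about completed containers, which is why the loop launches and counts within a single iteration.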
- 14. Apache Apex Meetup
References
● Simple Yarn code example
○ https://github.com/hortonworks/simple-yarn-app
● Document references
○ https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
○ http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
○ http://www.slideshare.net/