Bhupesh Chawda
bhupesh@apache.org
DataTorrent
Introduction to YARN
Next Gen Hadoop
Image Source: https://memegenerator.net/instance/64508420
Why YARN
Hadoop v1 (MR1) Architecture
● Job Tracker
○ Manages cluster resources
○ Job scheduling
○ Bottleneck
● Task Tracker
○ Per-node Agent
○ Manages tasks
○ Map / Reduce task slots
[Diagram: MR1 architecture — Clients submit jobs to a central Job Tracker, which schedules Map/Reduce tasks on per-node Task Trackers; Task Trackers report MapReduce status back to the Job Tracker.]
Limitations with MR1
• Scalability
  ○ Maximum cluster size: 4,000 nodes
  ○ Maximum concurrent tasks: 40,000
• Availability - Job Tracker is a SPOF
• Resource Utilization - Map / Reduce slots
• Runs only MapReduce applications
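The slot-based utilization problem can be illustrated with a toy Python model (hypothetical functions for illustration, not real Hadoop code): in MR1 an idle reduce slot cannot run a waiting map task, while a YARN-style generic container can run either kind of task.

```python
# Toy model (not the real scheduler): compare MR1's fixed, typed
# map/reduce slots with YARN-style fungible containers.

def mr1_usable(map_tasks, reduce_tasks, map_slots, reduce_slots):
    # Slots are typed: idle reduce slots cannot run map tasks.
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)

def yarn_usable(map_tasks, reduce_tasks, containers):
    # Containers are generic: any pending task can use any free container.
    return min(map_tasks + reduce_tasks, containers)

# Map-heavy phase of a job: 10 map tasks pending, no reduces yet.
print(mr1_usable(10, 0, map_slots=6, reduce_slots=6))   # 6  - reduce slots sit idle
print(yarn_usable(10, 0, containers=12))                # 10 - all capacity usable
```

With the same total capacity (12 task units), MR1 can only run 6 tasks during the map-heavy phase because its 6 reduce slots cannot be repurposed.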
Why YARN (Cont…)
Introducing YARN
● YARN - Yet Another Resource Negotiator
● Framework that facilitates writing arbitrary distributed processing
frameworks and applications.
● YARN Applications/frameworks:
e.g., MapReduce v2, Apache Spark, Apache Giraph, Apache Apex, etc.
Image Source: http://tm.durusau.net/?cat=1525
Hadoop beyond Batch
YARN for better
resource utilization
More applications
than MapReduce
Comparing MapReduce with YARN
MapReduce            YARN
Job Tracker        ≈ Resource Manager + Application Master
Task Tracker       ≈ Node Manager
Map / Reduce Slot  ≈ Container
Backward Compatibility
Maintained!
● Existing Map Reduce
jobs run as is on the
YARN framework
● No Job Tracker and
Task Tracker processes
• Resource Manager
  ○ Manages and allocates cluster resources
  ○ Application scheduling
  ○ Applications Manager
• Node Manager
  ○ Per-machine agent
  ○ Manages container life-cycle
  ○ Monitors resources
• Application Master
  ○ One per application
  ○ Manages application scheduling and task execution
Hadoop v2 (YARN) Architecture
Image Source: hadoop.apache.org
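As a rough illustration of the Node Manager's role above, here is a toy Python state machine for one container's life-cycle (a deliberate simplification — the real NM state machine has many more states and failure paths):

```python
# Simplified sketch of a Node Manager advancing one container
# through its life-cycle (toy states, not the real NM state machine).

TRANSITIONS = {
    "NEW": "LOCALIZING",       # fetch the jars/files the container needs
    "LOCALIZING": "RUNNING",   # start the container process
    "RUNNING": "COMPLETE",     # process exits; NM reports status to the RM
}

def advance(state):
    # Move the container to its next state (no-op for terminal states).
    return TRANSITIONS.get(state, state)

state = "NEW"
history = [state]
while state != "COMPLETE":
    state = advance(state)
    history.append(state)
print(" -> ".join(history))   # NEW -> LOCALIZING -> RUNNING -> COMPLETE
```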
Application Submission workflow
[Diagram: a YarnClient submits to the Resource Manager (ApplicationsManager + Scheduler); Node Managers on the cluster nodes host the Application Master and its containers; dashed lines denote heartbeats.]
1) Client submits the application to the RM
2) RM launches the Application Master in a container
3) AM registers with the RM
4) AM negotiates with the RM for containers
5) AM launches containers on the Node Managers
RM = Resource Manager, NM = Node Manager, AM = Application Master
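The five steps above can be sketched as a toy Python simulation. The classes and method names below are hypothetical and exist only for illustration — this is not the real `org.apache.hadoop.yarn` client API, which is Java and requires a running cluster:

```python
# Toy simulation of the YARN application-submission flow.

class ResourceManager:
    def __init__(self):
        self.apps, self.next_id = {}, 1

    def submit_application(self):                 # 1) client submits
        app_id = f"application_{self.next_id:04d}"
        self.next_id += 1
        self.apps[app_id] = {"state": "ACCEPTED", "containers": []}
        return app_id

    def launch_am(self, app_id):                  # 2) RM launches the AM
        return ApplicationMaster(self, app_id)

    def register_am(self, app_id):                # 3) AM registers with the RM
        self.apps[app_id]["state"] = "RUNNING"

    def allocate(self, app_id, n):                # 4) AM negotiates containers
        new = [f"container_{i}" for i in range(n)]
        self.apps[app_id]["containers"] += new
        return new

class ApplicationMaster:
    def __init__(self, rm, app_id):
        self.rm, self.app_id = rm, app_id
        rm.register_am(app_id)

    def run(self, n_containers):
        for c in self.rm.allocate(self.app_id, n_containers):
            print(f"{self.app_id}: launching {c}")   # 5) launch containers

rm = ResourceManager()
app_id = rm.submit_application()
am = rm.launch_am(app_id)
am.run(2)
print(rm.apps[app_id]["state"])   # RUNNING
```

In the real framework these roles are separate processes on separate machines, and steps 3–5 happen over RPC with periodic heartbeats; the simulation only captures the ordering of the protocol.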
Application Masters - One for each Application Type
● MapReduce Application → MapReduce Application Master
  (already provided by Hadoop as a backward compatibility option for MapReduce)
● Apex Application → Apex Application Master (StrAM)
  (provided by Apache Apex)
● Flink Application → Flink Application Master
● Giraph Application → Giraph Application Master
Key Takeaways
● YARN enables non-MapReduce applications to run in a distributed fashion
● Each application first asks for a container for its Application Master
  ○ The Application Master then talks to YARN to get the resources needed by the application
  ○ Once YARN allocates the requested containers to the Application Master, it starts the application components in those containers
● Hadoop is no longer just batch processing!
References
● Simple Yarn code example
○ https://github.com/hortonworks/simple-yarn-app
● Document references
○ https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
○ http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/
○ http://www.slideshare.net/
● Acknowledgements
○ Priyanka Gugale, DataTorrent - Slide deck
Thank You!!
Please send your questions to:
bhupesh@apache.org / bhupesh@datatorrent.com
