What’s in it for you?
• Hadoop 1.0 (MR 1)
• Limitations of Hadoop 1.0 (MR 1)
• Need for YARN
• What is YARN?
• Workloads running on YARN
• YARN Components
• YARN Architecture
• Demo on YARN
Hadoop 1.0 (MR 1)

Hadoop 1.0 consisted of two layers: HDFS (data storage) and MapReduce (data processing). In Hadoop 1.0, MapReduce performed both data processing and resource management.

MapReduce consisted of a Job Tracker and Task Trackers:
• The Job Tracker allocated resources, performed scheduling, and monitored jobs; it assigned map and reduce tasks to the Task Trackers.
• The Task Trackers processed the tasks and reported their progress to the Job Tracker.
[Diagram: clients submit jobs to the Job Tracker, the Hadoop master daemon. The Job Tracker assigns tasks to Task Trackers, the Hadoop slave daemons, which run the tasks and report MapReduce status back to the Job Tracker.]

Managing jobs through a single Job Tracker made scheduling and the utilization of computational resources inefficient in MR 1.
Limitations of Hadoop 1.0 (MR 1)

1. Scalability – With a single JobTracker, scalability became a bottleneck: a cluster could not exceed 4,000 nodes or run more than 40,000 concurrent tasks.
2. Availability – The JobTracker was a single point of failure. Any failure killed all queued and running jobs, which users then had to resubmit.
3. Resource utilization – Each TaskTracker had a predefined number of map and reduce slots, which caused resource utilization issues.
4. Limitations in running non-MapReduce applications – Because MapReduce is batch driven, performing real-time analysis and running ad-hoc queries was problematic.
Need for YARN

Before YARN (Hadoop 1.0): HDFS (data storage) with MapReduce (data processing) on top. Hadoop 1.0 was designed to run MapReduce jobs only and had issues with scalability, resource utilization, etc.

After YARN (Hadoop 2.0): HDFS (data storage), YARN (cluster resource management), and on top of YARN both MapReduce and other processing frameworks. YARN solved those issues, and users could work with multiple processing models alongside MapReduce.
Solution - Hadoop 2.0 (YARN)

• Scalability: a cluster can have more than 10,000 nodes and run more than 100,000 concurrent tasks.
• Compatibility: applications developed for Hadoop 1 run on YARN without any disruption or availability issues.
• Resource utilization: YARN allows dynamic allocation of cluster resources, improving resource utilization.
• Multitenancy: YARN can host open-source and proprietary data access engines, enabling real-time analysis and ad-hoc queries.
What is YARN?

YARN (Yet Another Resource Negotiator) is the cluster resource management layer of the Apache Hadoop ecosystem; it schedules jobs and assigns resources.

When an application (for example, a MapReduce application) requests resources to run, YARN provides the desired resources: memory, CPU, and network.
Workloads running on YARN

Frameworks that run on top of YARN (cluster resource management, over the Hadoop Distributed File System):
• Batch: MapReduce
• Interactive: Tez
• Column-oriented database: HBase
• Streaming: Storm
• Graph: Giraph
• In-memory: Spark
• Others: Weave
YARN Components

[Diagram: a client submits a job request to the Resource Manager (made up of a Scheduler and an Applications Manager); Node Managers, one per DataNode, host containers and App Masters.]

YARN has 4 main components: Resource Manager, Node Manager, Container, and App Master.
YARN Components – Resource Manager

The Resource Manager is the ultimate authority that decides the allocation of resources among all the applications in the system. It consists of a Scheduler and an Applications Manager.

Scheduler
• Responsible for allocating resources to the various running applications
• Does not monitor or track application status
• Offers no guarantee about restarting tasks that fail due to hardware or application failures

Applications Manager
• Responsible for accepting job submissions
• Negotiates the first container for executing the application-specific ApplicationMaster
• Provides the service for restarting the ApplicationMaster container on failure
YARN Components – Node Manager

Node Managers are the slave daemons: they track processes and running jobs and monitor each container’s resource usage (CPU, memory, etc.).

Container
• A collection of resources such as CPU, memory, disk, and network
• The Node Manager authenticates and grants an application the right to use a specific amount of resources

App Master
• Manages the resource needs of an individual application
• Interacts with the Scheduler (in the Resource Manager) to acquire the required resources, and with the Node Manager to execute and monitor tasks
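The Resource Manager's location and each Node Manager's resource pool are set in yarn-site.xml. A minimal sketch follows — the property names are real Hadoop settings, but the hostname and values are placeholders you would tune for your cluster:

```xml
<!-- yarn-site.xml: illustrative fragment; values are placeholders -->
<configuration>
  <property>
    <!-- where Node Managers and clients find the Resource Manager -->
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <property>
    <!-- memory this Node Manager may hand out to containers -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <!-- virtual cores this Node Manager may hand out -->
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <property>
    <!-- auxiliary service so MapReduce shuffle runs inside the NM -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

Note how resources are declared per Node Manager, not per slot: this is what replaces MR 1's fixed map/reduce slots with a dynamic pool.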
YARN Architecture

[Diagram: a client submits a job request to the Resource Manager (Job Submission). Each Node Manager reports Node Status to the Resource Manager. An App Master, running in a container on one of the Node Managers, sends Resource Requests to the Resource Manager and reports MapReduce Status back to the client. Containers on the Node Managers execute the application's tasks.]
Running an application in YARN

1. The client submits an application to the ResourceManager.
2. The ResourceManager allocates a container.
3. The ApplicationMaster contacts the related NodeManager.
4. The NodeManager launches the container.
5. The container executes the ApplicationMaster.
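The five steps above can be sketched as a toy simulation. This is purely illustrative Python — these classes are invented for clarity and are not the Hadoop API:

```python
# Toy simulation of the YARN application-startup flow (steps 1-5).
# Illustrative only: class and method names are NOT the Hadoop API.

class Container:
    def __init__(self, node):
        self.node = node
        self.running = False

    def execute(self, process):                     # step 5
        return f"{process} running on {self.node}"

class NodeManager:
    def launch_container(self, container):          # step 4
        container.running = True
        return container

class ResourceManager:
    def __init__(self, node_managers):
        # node name -> NodeManager, standing in for the cluster
        self.node_managers = node_managers

    def submit_application(self, app_name):         # step 1
        return self.allocate_container()            # step 2

    def allocate_container(self):
        # trivially pick the first node; the real Scheduler
        # considers capacity, queues, and locality
        node = next(iter(self.node_managers))
        return Container(node)

def run_application(app_name, rm):
    container = rm.submit_application(app_name)     # steps 1-2
    nm = rm.node_managers[container.node]           # step 3: contact the NM
    nm.launch_container(container)                  # step 4
    return container.execute(f"ApplicationMaster[{app_name}]")  # step 5

rm = ResourceManager({"node-1": NodeManager(), "node-2": NodeManager()})
print(run_application("wordcount", rm))
# prints: ApplicationMaster[wordcount] running on node-1
```

The point of the sketch is the division of labor: the ResourceManager only grants the container; the NodeManager actually launches it; and the first thing that container runs is the ApplicationMaster, which then drives the rest of the job.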
Demo on YARN
So what’s your next step?

Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutorial | Simplilearn
