The next generation of Hadoop from the Apache Software Foundation: a detailed comparison of MapReduce v1 versus YARN, the YARN architecture, and important updates.
Find me on Google: #bobrupakroy.
2. Yet Another Resource Negotiator (YARN)
YARN is the second generation of Hadoop from
the Apache Software Foundation.
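The JobTracker of Hadoop v1 gets choked by
traffic, because it handles both resource
management and job scheduling for the whole
cluster. To overcome this, the Apache foundation
introduced the ResourceManager in YARN. So instead
of a JobTracker and TaskTrackers, there is a
ResourceManager, a NodeManager on each node and an
ApplicationMaster for each application.
The ResourceManager consists of a Scheduler, which
allocates resources to running applications, and an
ApplicationsManager, which accepts job submissions
and launches and monitors the ApplicationMaster of
each application.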
Rupak Roy
3. The ResourceManager runs on the master node. The
NodeManager and the ApplicationMaster (note: this is
the ApplicationMaster, not the ApplicationsManager)
run on the slave nodes.
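ApplicationMaster: one per application. It takes over
the per-job part of the Hadoop v1 JobTracker's role:
it negotiates resources from the ResourceManager and
works with the NodeManagers to execute tasks and
report their status.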
NodeManager: takes care of an individual node in
a Hadoop cluster. This includes staying in sync
with the ResourceManager (RM), monitoring resource
usage and node health, and managing logs.
The term Container in YARN means an encapsulation of
resources (such as memory and CPU cores) on a single
node, granted to an application.
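To make the idea of a container concrete, here is a minimal Python sketch. It is not YARN's API: the `Container` and `NodeManager` classes, their fields and the `launch` method are hypothetical names chosen for illustration. It only shows a node carving a bundle of memory and cores out of what it has free.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A bundle of resources granted on one node (illustrative fields)."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Toy stand-in for a worker node that tracks its free resources."""
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, memory_mb, vcores):
        """Carve a container out of the free resources, or return None."""
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            return None
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


node = NodeManager("worker-1", free_memory_mb=8192, free_vcores=4)
first = node.launch(memory_mb=2048, vcores=1)   # fits
second = node.launch(memory_mb=8192, vcores=1)  # too large for what is left
```

In this sketch `first` is a 2048 MB container on `worker-1`, while `second` is `None` because only 6144 MB remain free.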
5. Some of the important updates
in Hadoop v2.0 (YARN)
YARN provides a central resource manager. There are
no fixed map and reduce slots, so multiple
applications can run at the same time, all sharing a
common pool of resources.
YARN can handle larger clusters than its
predecessor, scaling beyond 8,000 nodes.
Hadoop v1 supports only batch jobs such as
MapReduce jobs. YARN supports both batch and
non-batch jobs.
YARN is also optimized for machine-learning
jobs.
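The effect of dropping fixed slots can be sketched with a little arithmetic. The example below is a simplified illustration, not Hadoop code: the function names, the 4 map plus 4 reduce slot split and the 1 GB task size are all assumptions chosen for the example.

```python
def runnable_with_slots(map_tasks, reduce_tasks, map_slots, reduce_slots):
    """MRv1 style: each task type can only use slots of its own type."""
    return min(map_tasks, map_slots) + min(reduce_tasks, reduce_slots)


def runnable_with_pool(tasks_mb, pool_mb):
    """YARN style: any task may use the shared pool while memory remains."""
    running = 0
    for need in tasks_mb:
        if need <= pool_mb:
            pool_mb -= need
            running += 1
    return running


# A map-only workload of 8 tasks, 1 GB each, on a node with 8 GB.
slots = runnable_with_slots(map_tasks=8, reduce_tasks=0,
                            map_slots=4, reduce_slots=4)
pool = runnable_with_pool([1024] * 8, pool_mb=8 * 1024)
print(slots, pool)  # prints "4 8"
```

With fixed slots, the four reduce slots sit idle and only four map tasks run. With a shared pool, all eight tasks run at once.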
6. YARN (MRv2) and MapReduce (MRv1) schedulers
The scheduler determines which jobs run where and when,
and which resources are allocated to them.
1. First In, First Out (FIFO): allocates resources on a
“first come, first served” basis, i.e. the job that was
submitted first gets the resources it needs to complete
first. A drawback of FIFO is that a second job of
higher priority has to wait until the first job finishes
and releases the resources the second job requires.
2. Capacity Scheduler: is based on the concept of
queues. The queues are typically set up by
administrators, and each queue is limited to a share of
the cluster's resources. Jobs that require more resources
are placed in queues with a larger share. This ensures
that a single application or queue cannot consume a
disproportionate amount of the resources in the cluster.
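The difference between the two policies above can be sketched as follows. This is a simplified model under stated assumptions, not the schedulers' real implementation. Resources are plain integer "units", the function names, queue names and percentages are hypothetical, and the sketch ignores that the real Capacity Scheduler can lend idle capacity to other queues.

```python
def fifo_allocate(jobs, capacity):
    """Grant resources in submission order; later jobs get what is left."""
    grants = {}
    for name, demand in jobs:  # jobs are listed in submission order
        grant = min(demand, capacity)
        grants[name] = grant
        capacity -= grant
    return grants


def capacity_allocate(queue_percent, demands, capacity):
    """Cap each queue at its configured percentage of the cluster."""
    grants = {}
    for queue, percent in queue_percent.items():
        limit = capacity * percent // 100
        grants[queue] = min(demands.get(queue, 0), limit)
    return grants


# FIFO: job1 arrived first and takes 90 of 100 units,
# so job2 only gets the remaining 10 until job1 finishes.
fifo = fifo_allocate([("job1", 90), ("job2", 40)], capacity=100)

# Capacity: the "prod" queue is capped at 70 units and "dev" at 30,
# so neither queue can crowd out the other.
capq = capacity_allocate({"prod": 70, "dev": 30},
                         {"prod": 90, "dev": 40}, capacity=100)
```

Here `fifo` is `{"job1": 90, "job2": 10}` and `capq` is `{"prod": 70, "dev": 30}`.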
7. Fair Scheduler: as the name suggests, it shares
resources fairly among the running jobs, according to
the requirements of each job. If the second job
finishes before the first job, the resources of the
second job are freed for the first job, which
requires more resources to complete in time.
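One common way to define a "fair" split is max-min fairness: every job gets an equal share, and whatever a small job does not need is handed to the others. The sketch below assumes that definition. The function name and the numbers are hypothetical, and the real Fair Scheduler also supports weights, queues and minimum shares, which are left out here.

```python
def fair_shares(demands, capacity):
    """Max-min fair split: equal shares, with unused share redistributed."""
    shares = {}
    remaining = dict(demands)  # jobs that have not been given a share yet
    while remaining:
        equal = capacity / len(remaining)
        small = {n: d for n, d in remaining.items() if d <= equal}
        if not small:
            # Every remaining job wants more than an equal split.
            for name in remaining:
                shares[name] = equal
            break
        for name, demand in small.items():
            shares[name] = float(demand)  # fully satisfied
            capacity -= demand
            del remaining[name]
    return shares


# Both jobs running: job2 needs only 20, so job1 gets the other 40.
both = fair_shares({"job1": 80, "job2": 20}, capacity=60)

# After job2 finishes, its resources are freed for job1.
alone = fair_shares({"job1": 80}, capacity=60)
```

In this sketch `both` is `{"job1": 40.0, "job2": 20.0}`, and once the second job is gone `alone` is `{"job1": 60.0}`.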
Second-generation Hadoop distributions such as
Cloudera CDH 4 and CDH 5 are set to the Fair
Scheduler by default.