YARN and MapReduce v2
©2014 Zaloni, Inc. All Rights Reserved.
Agenda
• What is YARN?
• Why YARN?
• Components of YARN
• Architecture
• API in MRv2
• Gains with MRv2
• Failures in MRv2
So, what is YARN, really?
YARN INTRODUCTION
YARN (Yet Another Resource Negotiator) is responsible for:
• Cluster resource management
• Scheduling
Various applications may run on YARN; MapReduce is just one choice.
YARN INTRODUCTION
HADOOP 1.0 – single-use system (batch apps):
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)
HADOOP 2.0 – multi-purpose platform (batch, interactive, online, streaming, …):
• MapReduce and others (data processing)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)
YARN INTRODUCTION
Store ALL DATA in one place… interact with that data in MULTIPLE WAYS.
Applications run natively IN Hadoop, on top of YARN (cluster resource management) and HDFS2 (redundant, reliable storage):
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• OTHER (Search, Weave, …)
Why YARN and MRv2?
Hadoop MapReduce Classic
• JobTracker – manages cluster resources and job scheduling
• TaskTracker – per-node agent; manages tasks
MRv1 Limitations
• Scalability – the JobTracker limits horizontal scaling
• Cluster utilization – fixed-size slots degrade cluster utilization
• Availability – when the JobTracker dies, jobs must restart
• Upgradability – jobs must be stopped to upgrade the JobTracker
• Hardwired – the JobTracker only supports MapReduce
ResourceManager + ApplicationMaster replace the JobTracker
In YARN, the JobTracker's responsibilities are split: the ResourceManager handles cluster resource management, and a per-application ApplicationMaster handles the job's lifecycle.
Architecture 
YARN Components
So, what was developed?
• Resource Manager
• Node Manager
• Application Master
• Container
Resource Manager (RM)
• Manages the global assignment of compute resources to applications
• A pure scheduler
• Does no monitoring or tracking of application status
Resource Manager (RM)
• Each client/application may request multiple resources:
  – Memory
  – Network
  – CPU
  – Disk …
• This is a significant change from the static Mapper/Reducer model
Application Master
• A per-application ApplicationMaster (AM) manages the application's life cycle (scheduling and coordination).
• An application is either a single job in the classic MapReduce sense or a DAG of such jobs.
Application Master
• The Application Master has the responsibility of:
  – negotiating appropriate resource containers from the Scheduler
  – launching tasks
  – tracking their status
  – monitoring progress
  – handling task failures
Node Manager
• The NodeManager is the per-machine framework agent.
• It is responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network), and reporting the same to the Scheduler.
Container
• The basic unit of resource allocation
• Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
• Replaces the fixed map/reduce slots
Lifecycle of a job
Message flow between Client, Resource Manager, Node Managers, App Master, and Containers:
1. Client → Resource Manager: Submit / OK
2. Resource Manager → Node Managers: Go / Start containers (launch the App Master)
3. App Master → Resource Manager: "I need resources!" / "Here you are"
4. App Master → Containers: "Do work!"
5. Containers → App Master: Done
6. Client → Resource Manager: Done? / No … Done? / Yes
Job execution on MRv2
1. The client submits a MapReduce job by interacting with Job objects.
2. The job's code interacts with the Resource Manager to acquire application metadata, such as the application ID.
3. The job's code moves all job-related resources to HDFS to make them available for the rest of the job.
4. The job's code submits the application to the Resource Manager.
5. The Resource Manager chooses a Node Manager with available resources and requests a container for the MRAppMaster.
6. The Node Manager allocates a container for the MRAppMaster, which will execute and coordinate the MapReduce job.
Job execution on MRv2
7. The MRAppMaster grabs the required resources from HDFS (copied there in step 3).
8. The MRAppMaster negotiates with the Resource Manager for available resources; the Resource Manager selects the Node Manager that has the most resources.
9. The MRAppMaster tells the selected NodeManagers to start Map and Reduce tasks.
10. The NodeManagers create YarnChild containers that will coordinate and run the tasks.
11. YarnChild acquires from HDFS the job resources required to execute the Map and Reduce tasks.
12. YarnChild executes the Map and Reduce tasks.
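From the client's point of view, none of this changes the MapReduce API. A minimal sketch of steps 1–4, assuming the standard Hadoop 2.x Job API (class and job names are illustrative; the identity Mapper/Reducer defaults are used only to keep the sketch self-contained):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJobExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Steps 1-2: build the Job; submission will contact the ResourceManager
    // to obtain the application ID and other metadata.
    Job job = Job.getInstance(conf, "mrv2-example");
    job.setJarByClass(SubmitJobExample.class);
    // Default (identity) Mapper/Reducer; a real job sets its own classes here.
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Steps 3-4: waitForCompletion() copies the job resources (jar, splits,
    // configuration) to HDFS, submits the application to the ResourceManager,
    // and polls for progress until the MRAppMaster reports completion.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}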
YARN – Resource Allocation & Usage
ResourceRequest
– A fine-grained resource ask to the ResourceManager
– Asks for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack
– The special resource name * means "any machine"
A ResourceRequest carries: priority, resourceName, capability, numContainers.
(Record-layout diagrams on this and the following slides © Hortonworks Inc. 2013)
YARN – Resource Allocation & Usage
ResourceRequest example:
priority | capability    | resourceName | numContainers
0        | <2gb, 1 core> | host01       | 1
0        | <2gb, 1 core> | rack0        | 1
0        | <2gb, 1 core> | *            | 1
1        | <4gb, 1 core> | *            | 1
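A hedged sketch of how an ApplicationMaster could express the priority-0 request above using the Hadoop 2.x AMRMClient helper library (host01 and rack0 are the example names from the table; the client library expands this single request into the host-, rack- and *-level ResourceRequests shown there):

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class RequestExample {
  // Adds the <2gb, 1 core> request from the table to an already-started AMRMClient.
  static void requestContainer(AMRMClient<ContainerRequest> amRMClient) {
    Resource capability = Resource.newInstance(2048, 1); // memory in MB, vCores
    Priority priority = Priority.newInstance(0);
    ContainerRequest request = new ContainerRequest(
        capability,
        new String[] { "host01" },  // preferred node
        new String[] { "rack0" },   // preferred rack
        priority);
    amRMClient.addContainerRequest(request);
  }
}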
YARN – Resource Allocation & Usage
Container
– The basic unit of allocation in YARN
– The result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster
– A specific amount of resources (CPU, memory, etc.) on a specific machine
A Container carries: containerId, resourceName, capability, tokens.
YARN – Resource Allocation & Usage
ContainerLaunchContext
– The context provided by the ApplicationMaster to the NodeManager to launch a Container
– A complete specification for a process
– LocalResource is used to specify the container binary and its dependencies
– The NodeManager is responsible for downloading them from a shared namespace (typically HDFS)
A ContainerLaunchContext carries: container, commands, environment, localResources. A LocalResource carries: uri, type.
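A minimal sketch, assuming the Hadoop 2.x records API, of building a ContainerLaunchContext whose only LocalResource is an application jar already sitting in HDFS (the path, environment variable, and main class are illustrative, not part of the original deck):

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class LaunchContextExample {
  static ContainerLaunchContext buildContext(Configuration conf) throws Exception {
    // LocalResource: the binary the NodeManager should download from HDFS.
    Path jarPath = new Path("hdfs:///apps/example/app.jar"); // illustrative path
    FileStatus status = FileSystem.get(conf).getFileStatus(jarPath);
    LocalResource appJar = Records.newRecord(LocalResource.class);
    appJar.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
    appJar.setType(LocalResourceType.FILE);
    appJar.setVisibility(LocalResourceVisibility.APPLICATION);
    appJar.setSize(status.getLen());
    appJar.setTimestamp(status.getModificationTime());

    // Complete process specification: local resources, environment, command line.
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setLocalResources(Collections.singletonMap("app.jar", appJar));
    ctx.setEnvironment(Collections.singletonMap("APP_ENV", "demo"));
    ctx.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java -cp app.jar com.example.Main 1>stdout 2>stderr"));
    return ctx;
  }
}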
YARN – ApplicationMaster
ApplicationMaster
– The ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client
– The ResourceManager is responsible for allocating and launching the ApplicationMaster container
An ApplicationSubmissionContext carries: resourceRequest, containerLaunchContext, appName, queue.
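A hedged sketch of how a client might fill in the ApplicationSubmissionContext obtained from YarnClient, assuming Hadoop 2.x APIs (the AM launch context comes from the LaunchContextExample sketch above; names and sizes are illustrative):

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

public class SubmissionContextExample {
  static ApplicationSubmissionContext fill(YarnClientApplication app,
                                            ContainerLaunchContext amLaunchContext) {
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("example-app");          // appName
    ctx.setQueue("default");                        // queue
    ctx.setResource(Resource.newInstance(1024, 1)); // resource ask for the AM container
    ctx.setAMContainerSpec(amLaunchContext);        // containerLaunchContext for the AM
    return ctx;
  }
}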
API in MRv2
ClientRMProtocol (Client – RM):
The protocol for a client to communicate with the RM to launch a new application (i.e. an AM), check on the status of the application, or kill the application.
AMRMProtocol (AM – RM):
The protocol used by the AM to register/unregister itself with the RM, as well as to request resources from the RM Scheduler to run its tasks.
ContainerManager (AM – NM):
The protocol used by the AM to communicate with the NM to start or stop containers and to get status updates on its containers.
YARN Application API – ClientRM
The client talks to the ResourceManager (Scheduler), which manages the pool of NodeManagers:
1. Application request: YarnClient.createApplication
2. Submit application: YarnClient.submitApplication
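A minimal client-side sketch of those two calls, assuming the Hadoop 2.x YarnClient library (it reuses the SubmissionContextExample and LaunchContextExample sketches from the earlier slides, which are illustrative helpers, not part of the original deck):

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClientRMExample {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // 1. Application request: ask the RM for a new application.
    YarnClientApplication app = yarnClient.createApplication();

    // 2. Submit application: hand the RM the complete submission context
    //    (application name, queue, AM resource ask, AM launch context).
    ApplicationId appId = yarnClient.submitApplication(
        SubmissionContextExample.fill(app, LaunchContextExample.buildContext(conf)));

    // Poll the RM for the application's status.
    ApplicationReport report = yarnClient.getApplicationReport(appId);
    System.out.println("State: " + report.getYarnApplicationState());
    yarnClient.stop();
  }
}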
YARN Application API – AMRM
The ApplicationMaster (running in a container on one of the NodeManagers) talks to the ResourceManager (Scheduler):
1. registerApplicationMaster
2. AMRMClient.allocate – request resources
3. Containers are granted in the allocate response
4. unregisterApplicationMaster
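A hedged sketch of that lifecycle with the Hadoop 2.x AMRMClient helper (error handling and the actual work done with the granted containers are omitted; RequestExample is the illustrative helper from the ResourceRequest slide):

import java.util.List;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AMRMExample {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
    amRMClient.init(new YarnConfiguration());
    amRMClient.start();

    // 1. Register with the ResourceManager.
    amRMClient.registerApplicationMaster("", 0, "");

    // 2. Ask for containers, then 3. heartbeat via allocate() and collect
    //    whatever the Scheduler granted on this round.
    RequestExample.requestContainer(amRMClient);
    AllocateResponse response = amRMClient.allocate(0.0f);
    List<Container> granted = response.getAllocatedContainers();
    System.out.println("Granted containers: " + granted.size());

    // 4. Unregister once the application is finished.
    amRMClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", null);
    amRMClient.stop();
  }
}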
YARN Application API – ContainerManager
The ApplicationMaster (AM 1) talks directly to the NodeManagers that host its containers:
• AMNMClient.startContainer – launch a container (e.g. Container 1.1) on its NodeManager
• AMNMClient.getContainerStatus – poll the NodeManager for the container's status
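In the Hadoop 2.x helper libraries this protocol is wrapped by the NMClient class (what the slide calls AMNMClient). A minimal sketch, assuming a Container already granted by the RM and a launch context like the one built earlier:

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerManagerExample {
  static void runContainer(Container container, ContainerLaunchContext ctx) throws Exception {
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(new YarnConfiguration());
    nmClient.start();

    // Start the granted container on its NodeManager with the given launch context.
    nmClient.startContainer(container, ctx);

    // Poll the NodeManager for the container's status.
    ContainerStatus status =
        nmClient.getContainerStatus(container.getId(), container.getNodeId());
    System.out.println("Container state: " + status.getState());
    nmClient.stop();
  }
}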
Gain with New Architecture
The new architecture brings gains in: Scalability, Availability, Wire-compatibility, Innovation & Agility, Cluster Utilization, and support for programming paradigms other than MapReduce. The following slides take these in turn.
Scalability
• Resource management and job management are segregated (RM vs. per-application AM).
• The Hadoop MapReduce JobTracker spent a very significant portion of its time and effort managing the life cycle of applications; moving that work into per-application masters removes the main scaling bottleneck.
Gain with New Architecture
Availability
• ResourceManager
  – Uses ZooKeeper for fail-over.
  – When the primary fails, the secondary can quickly start up using the state stored in ZooKeeper.
• Application Master
  – The MapReduce ApplicationMaster can recover from failures by restoring itself from a checkpoint.
Gain with New Architecture
Wire-compatibility
• MRv2 uses wire-compatible protocols to allow different versions of servers and clients to communicate with each other.
• This enables rolling upgrades of the cluster in the future.
Gain with New Architecture
Innovation & Agility
• The new framework is generic.
  – Different versions of MapReduce can run in parallel.
  – End users can upgrade to new MapReduce versions on their own schedule.
Gain with New Architecture
Cluster Utilization
• MRv2 uses a general concept of a resource for scheduling and allocating to individual applications, rather than fixed map/reduce slots.
• The result is better cluster utilization.
Gain with New Architecture
Support for programming paradigms other than MapReduce
• Store all data in one place and use it for more than one application.
Failures in YARN
For MapReduce programs running on YARN, we need to consider the failure of any of the following entities: the task, the application master, the node manager, and the resource manager.
• Task failure: Failure of a running task is similar to the classic case. Runtime exceptions, sudden exits of the JVM, and timed-out tasks are marked as failed. A task is marked as failed after four attempts (set by mapreduce.map.maxattempts for map tasks and mapreduce.reduce.maxattempts for reduce tasks).
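A hedged sketch of overriding those retry limits for a single job (the values and job name are illustrative, not recommendations):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RetryConfigExample {
  static Job createJob() throws Exception {
    Configuration conf = new Configuration();
    // Allow map and reduce tasks two extra attempts beyond the default of 4.
    conf.setInt("mapreduce.map.maxattempts", 6);
    conf.setInt("mapreduce.reduce.maxattempts", 6);
    return Job.getInstance(conf, "retry-tuned-job");
  }
}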
Failures in YARN
• NodeManager failure: If a node manager fails, it stops sending heartbeats to the resource manager, and it is removed from the resource manager's pool of available nodes.
• Node managers may be blacklisted if the number of failures for the application is high. Blacklisting is done by the application master; for MapReduce, the application master will try to reschedule tasks on different nodes if more than three tasks fail on a node manager. The threshold may be set with mapreduce.job.maxtaskfailures.per.tracker.
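That blacklisting threshold can be adjusted per job in the same hedged way (the value below is illustrative):

import org.apache.hadoop.conf.Configuration;

public class BlacklistConfigExample {
  static Configuration tune() {
    Configuration conf = new Configuration();
    // Tolerate up to 5 task failures on a single NodeManager before the
    // MapReduce ApplicationMaster blacklists that node for this job (default 3).
    conf.setInt("mapreduce.job.maxtaskfailures.per.tracker", 5);
    return conf;
  }
}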
Failures in YARN
• Application master failure: An application master sends periodic heartbeats to the resource manager. In the event of application master failure, the resource manager detects the failure and starts a new instance of the master in a new container (managed by a node manager).
• ResourceManager failure: The most critical failure, as it can bring processing on the whole cluster to a halt. Mitigated by checkpointing the RM state and by a standby node (HA).
