Yarn and MapReduce v2 
©2014 Zaloni, Inc. All Rights Reserved.
Agenda
• What is YARN?
• Why YARN?
• Components of YARN
• Architecture
• API in MRv2
• Gain with MRv2
• Failure in MRv2
So what is YARN, really?
YARN INTRODUCTION 
YARN (Yet Another Resource Negotiator) is responsible for
• Cluster resource management
• Scheduling
Various applications may run on YARN; MapReduce is just one choice.
YARN INTRODUCTION 
HADOOP 1.0: Single Use System (Batch Apps)
• MapReduce (cluster resource management & data processing)
• HDFS (redundant, reliable storage)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• MapReduce (data processing), Others (data processing)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)
YARN INTRODUCTION 
Store ALL DATA in one place… Interact with that data in MULTIPLE WAYS
Applications Run Natively IN Hadoop
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search, Weave, …)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)
Why YARN and MRv2?
Hadoop MapReduce Classic 
JobTracker
• Manages cluster resources and job scheduling
TaskTracker
• Per-node agent
• Manages tasks
MRv1 Limitations 
• Scalability – the JobTracker (JT) limits horizontal scaling
• Cluster utilization – fixed-size slots degrade cluster utilization
• Availability – when the JT dies, jobs must restart
• Upgradability – jobs must be stopped to upgrade the JT
• Hardwired – the JT only supports MapReduce
[Figure: JobTracker, ResourceManager, ApplicationMaster. In YARN the JobTracker's duties are divided between the ResourceManager and the ApplicationMaster.]
Architecture 
YARN Components 
So, what was developed?
• Resource Manager 
• Node Manager 
• Application Master 
• Container 
Resource Manager (RM)
• Manages the global assignment of compute resources to applications.
• A pure scheduler
• No monitoring or status tracking of applications
Resource Manager (RM)
• Each client/application may request multiple resources
– Memory
– Network
– CPU
– Disk, …
• This is a significant change from the static Mapper/Reducer model
Application Master 
• A per-application ApplicationMaster (AM) manages the application’s life cycle (scheduling and coordination).
• An application is either a single job in the classic MapReduce sense or a DAG of such jobs.
Application Master
• The Application Master is responsible for
– negotiating appropriate resource containers from the Scheduler
– launching tasks
– tracking their status
– monitoring progress
– handling task failures.
Node Manager
• NodeManager: per-machine framework agent
– responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the Scheduler.
Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU etc.)
– Replaces the fixed map/reduce slots
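To illustrate why containers can use a node better than fixed slots, here is a minimal sketch. The node size, the slot counts and the task sizes are hypothetical numbers chosen for the example; they are not values from the slides or Hadoop defaults, and only memory and CPU are modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    """A bundle of resources; only memory and CPU are modelled here."""
    memory_mb: int
    vcores: int


def tasks_with_fixed_slots(map_slots: int, reduce_slots: int,
                           map_tasks: int, reduce_tasks: int) -> int:
    """MRv1-style: a map task may only use a map slot, a reduce task a reduce slot."""
    return min(map_slots, map_tasks) + min(reduce_slots, reduce_tasks)


def tasks_with_containers(node: Resource, requests: List[Resource]) -> int:
    """Container-style: grant requests in order while the node still has room."""
    free_mem, free_cores = node.memory_mb, node.vcores
    granted = 0
    for req in requests:
        if req.memory_mb <= free_mem and req.vcores <= free_cores:
            free_mem -= req.memory_mb
            free_cores -= req.vcores
            granted += 1
    return granted


# Hypothetical node: 8 GB / 8 cores, configured in MRv1 as 4 map + 4 reduce slots.
node = Resource(memory_mb=8192, vcores=8)
# Hypothetical workload: 8 map tasks of 1 GB / 1 core each, no reduce tasks yet.
slot_based = tasks_with_fixed_slots(4, 4, map_tasks=8, reduce_tasks=0)
container_based = tasks_with_containers(node, [Resource(1024, 1)] * 8)
print(slot_based, container_based)
```

In this toy case the idle reduce slots cannot be used by map tasks, while the container model runs as many tasks as the node's memory and cores allow.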
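Lifecycle of a job
[Figure: sequence diagram. Participants: Client, Resource Manager, App Master, Node Managers, Containers. Messages shown: Submit, OK, Go, I need resources!, Here you are, Start containers, Here you are, Do work!, Done, Done. The client polls with Done? and receives No, No, then Yes.]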
Job execution on MRv2 
1. Client submits a MapReduce job by interacting with Job objects.
2. Job’s code interacts with the Resource Manager to acquire application metadata, such as the application id.
3. Job’s code moves all the job-related resources to HDFS to make them available for the rest of the job.
4. Job’s code submits the application to the Resource Manager.
5. Resource Manager chooses a Node Manager with available resources and requests a container for MRAppMaster.
6. Node Manager allocates a container for MRAppMaster; MRAppMaster will execute and coordinate the MapReduce job.
Job execution on MRv2 
7. MRAppMaster retrieves the required resources from HDFS that were copied there in step 3.
8. MRAppMaster negotiates with the Resource Manager for available resources; the Resource Manager will select the Node Manager that has the most resources.
9. MRAppMaster tells the selected NodeManager to start Map and Reduce tasks.
10. NodeManager creates YarnChild containers that will coordinate and run tasks.
11. YarnChild acquires the job resources from HDFS that will be required to execute Map and Reduce tasks.
12. YarnChild executes Map and Reduce tasks.
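The twelve steps can be traced with a small toy simulation. This is only a sketch: the classes below are hypothetical stand-ins written for this example, not Hadoop classes, and the memory sizes (1024 MB for MRAppMaster, 512 MB per YarnChild) are made-up numbers. Only memory is tracked, and "most resources" is read as "most free memory".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)

    def launch(self, label: str, mb: int) -> None:
        # Steps 6 and 10: the node starts a container for the given process.
        self.free_mb -= mb
        self.containers.append(label)


@dataclass
class ResourceManager:
    nodes: List[NodeManager]
    next_app_id: int = 1

    def new_application_id(self) -> str:
        # Step 2: hand out application metadata such as the id.
        app_id = f"application_{self.next_app_id:04d}"
        self.next_app_id += 1
        return app_id

    def pick_node(self, mb: int) -> NodeManager:
        # Steps 5 and 8: choose the node with the most free memory that fits.
        candidates = [n for n in self.nodes if n.free_mb >= mb]
        if not candidates:
            raise RuntimeError("no node has enough free memory")
        return max(candidates, key=lambda n: n.free_mb)


def run_job(rm: ResourceManager, hdfs: Dict[str, str], num_tasks: int) -> str:
    app_id = rm.new_application_id()                  # step 2
    hdfs[app_id] = "job.jar + job.xml + splits"       # step 3: stage job resources
    am_node = rm.pick_node(1024)                      # steps 4-5
    am_node.launch(f"{app_id}/MRAppMaster", 1024)     # step 6
    _ = hdfs[app_id]                                  # step 7: AM reads job resources
    for i in range(num_tasks):
        node = rm.pick_node(512)                      # step 8
        node.launch(f"{app_id}/YarnChild-{i}", 512)   # steps 9-12
    return app_id


nodes = [NodeManager("nm1", 4096), NodeManager("nm2", 2000)]
rm = ResourceManager(nodes)
hdfs: Dict[str, str] = {}
app_id = run_job(rm, hdfs, num_tasks=4)
for n in nodes:
    print(n.name, n.free_mb, n.containers)
```

The first node hosts MRAppMaster and the first three tasks; the fourth task lands on the second node once that node has more free memory.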
YARN – Resource Allocation & Usage 
ResourceRequest 
–Fine-grained resource ask to the ResourceManager 
–Ask for a specific amount of resources (memory, cpu etc.) on a 
specific machine or rack 
–Use special value of * for resource name for any machine 
© Hortonworks Inc. 2013 
ResourceRequest fields: priority, resourceName, capability, numContainers
YARN – Resource Allocation & Usage 
ResourceRequest 
priority   capability       resourceName   numContainers
0          <2gb, 1 core>    host01         1
0          <2gb, 1 core>    rack0          1
0          <2gb, 1 core>    *              1
1          <4gb, 1 core>    *              1
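The request table can be written down as plain data to see how host, rack and "*" entries relate. This is a minimal sketch and not the YARN API: the class names, the `matching` helper and the ordering rule (ascending priority value, then host before rack before "*") are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

ANY = "*"  # special resource name meaning "any machine"


@dataclass(frozen=True)
class Capability:
    memory_gb: int
    cores: int


@dataclass(frozen=True)
class ResourceRequest:
    priority: int
    capability: Capability
    resource_name: str   # a host name, a rack name, or ANY
    num_containers: int


# The four rows of the table above.
requests = [
    ResourceRequest(0, Capability(2, 1), "host01", 1),
    ResourceRequest(0, Capability(2, 1), "rack0", 1),
    ResourceRequest(0, Capability(2, 1), ANY, 1),
    ResourceRequest(1, Capability(4, 1), ANY, 1),
]


def matching(reqs: List[ResourceRequest], host: str, rack: str) -> List[ResourceRequest]:
    """Return the requests that free capacity on (host, rack) could serve,
    ordered by priority value, then by most specific location first."""
    order = {host: 0, rack: 1, ANY: 2}
    hits = [r for r in reqs if r.resource_name in order]
    return sorted(hits, key=lambda r: (r.priority, order[r.resource_name]))


on_host01 = matching(requests, host="host01", rack="rack0")
elsewhere = matching(requests, host="host99", rack="rack7")
print([r.resource_name for r in on_host01])
print([r.resource_name for r in elsewhere])
```

A node outside rack0 can only be matched through the "*" rows, which is why an ask for a specific host is usually accompanied by rack and "*" entries.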
YARN – Resource Allocation & Usage 
Container 
–The basic unit of allocation in YARN 
–The result of the ResourceRequest provided by 
ResourceManager to the ApplicationMaster 
–A specific amount of resources (cpu, memory etc.) on a specific machine
Container fields: containerId, resourceName, capability, tokens
YARN – Resource Allocation & Usage 
ContainerLaunchContext 
– The context provided by ApplicationMaster to NodeManager to launch the 
Container 
– Complete specification for a process 
– LocalResource used to specify container binary and dependencies 
– NodeManager responsible for downloading from shared namespace 
(typically HDFS) 
ContainerLaunchContext fields: container, commands, environment, localResources
LocalResource fields: uri, type
YARN - ApplicationMaster 
ApplicationMaster 
–ApplicationSubmissionContext is the complete 
specification of the ApplicationMaster, provided by Client 
–ResourceManager responsible for allocating and launching 
ApplicationMaster container 
ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue
API in MRv2 
ClientRMProtocol (Client–RM):
This is the protocol for a client to communicate with the RM to launch a
new application (i.e. an AM), check on the status of the application or kill
the application.
AMRMProtocol (AM–RM):
This is the protocol used by the AM to register/unregister itself with the
RM, as well as to request resources from the RM Scheduler to run its
tasks.
ContainerManager (AM–NM):
This is the protocol used by the AM to communicate with the NM to start
or stop containers and to get status updates on its containers.
YARN Application API – ClientRM
[Figure: Client2 talks to the ResourceManager (Scheduler), which sits above a set of NodeManagers]
1. Application request: YarnClient.createApplication
2. Submit application: YarnClient.submitApplication
YARN Application API – AMRM
[Figure: the AM, running on one of the NodeManagers, talks to the ResourceManager (Scheduler)]
1. registerApplicationMaster
2. AMRMClient.allocate
3. Container
4. unregisterApplicationMaster
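The AM–RM conversation (register, allocate, receive containers, unregister) can be sketched as a tiny state machine. `ToyResourceManager` is a hypothetical stand-in written for this example; it only borrows the method names from the slide and does not reproduce the real AMRMClient behaviour or signatures.

```python
from typing import List, Set


class ToyResourceManager:
    """Hypothetical stand-in for the RM side of the AM-RM protocol."""

    def __init__(self, free_containers: int) -> None:
        self.free_containers = free_containers
        self.registered: Set[str] = set()
        self._next_id = 0

    def register_application_master(self, app_id: str) -> None:
        self.registered.add(app_id)

    def allocate(self, app_id: str, asked: int) -> List[str]:
        # An unregistered AM may not ask for resources.
        if app_id not in self.registered:
            raise RuntimeError("AM must register before asking for resources")
        # Grant as many containers as are free, up to the number asked for.
        granted = min(asked, self.free_containers)
        self.free_containers -= granted
        ids = [f"container_{self._next_id + i}" for i in range(granted)]
        self._next_id += granted
        return ids

    def unregister_application_master(self, app_id: str) -> None:
        self.registered.discard(app_id)


rm = ToyResourceManager(free_containers=3)
rm.register_application_master("app_1")        # step 1
first = rm.allocate("app_1", asked=2)           # steps 2-3
second = rm.allocate("app_1", asked=2)          # only one container is left
rm.unregister_application_master("app_1")       # step 4
print(first, second)
```

An allocate call may return fewer containers than asked for, so an AM typically keeps calling it until its needs are met.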
YARN Application API – ContainerManager
[Figure: AM 1 talks directly to the NodeManager hosting Container 1.1; the ResourceManager (Scheduler) is shown above the NodeManagers]
• AMNMClient.startContainer
• AMNMClient.getContainerStatus
Gain with New Architecture
• Scalability
• Availability
• Wire-compatibility
• Innovation & Agility
• Cluster Utilization
• Support for programming paradigms other than MapReduce
Scalability
• RM and job manager segregated
• The Hadoop MapReduce JobTracker spends a very significant portion of time and effort managing the life cycle of applications
Gain with New Architecture: Availability
• ResourceManager
– Uses ZooKeeper for fail-over.
– When the primary fails, the secondary can quickly start using the state stored in ZK.
• Application Master
– MapReduce ApplicationMaster can recover from failures by restoring itself from a checkpoint.
Gain with New Architecture: Wire-compatibility
• MRv2 uses wire-compatible protocols to allow different versions of servers and clients to communicate with each other.
• Rolling upgrades for the cluster in the future.
Gain with New Architecture: Innovation & Agility
• New framework is generic.
– Different versions of MR can run in parallel
– End users can upgrade MR versions on their own schedule
Gain with New Architecture: Cluster Utilization
• MRv2 uses a general concept of a resource for scheduling and allocating to individual applications.
• Better cluster utilization
Gain with New Architecture: Support for programming paradigms other than MapReduce
• Store all data in one place and use it for more than one application
Failures in YARN 
For MapReduce programs running on YARN, we need to consider the
failure of any of the following entities: the task, the application master, the
node manager, and the resource manager.
• Task failure: Failure of a running task is similar to the classic case. Runtime
exceptions, sudden exits of the JVM and timed-out tasks are marked as failed.
A task is marked as failed after four attempts (set by
mapreduce.map.maxattempts for map tasks and mapreduce.reduce.maxattempts
for reducer tasks).
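The retry rule for tasks can be sketched in a few lines. This is an illustration only: `run_task`, `flaky` and `broken` are hypothetical names, and the limit of four attempts simply mirrors the number quoted above.

```python
from typing import Callable

MAX_ATTEMPTS = 4  # mirrors the four attempts described above


def run_task(task: Callable[[int], None], max_attempts: int = MAX_ATTEMPTS) -> int:
    """Run `task` until it succeeds and return the number of attempts used.
    Raise RuntimeError once `max_attempts` attempts have failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            task(attempt)
            return attempt
        except Exception:
            # A runtime exception counts as one failed attempt; try again.
            continue
    raise RuntimeError(f"task failed after {max_attempts} attempts")


def flaky(attempt: int) -> None:
    # Hypothetical task that fails on its first two attempts.
    if attempt < 3:
        raise ValueError("simulated failure")


def broken(attempt: int) -> None:
    # Hypothetical task that never succeeds.
    raise ValueError("always fails")


attempts_used = run_task(flaky)
try:
    run_task(broken)
    task_failed = False
except RuntimeError:
    task_failed = True
print(attempts_used, task_failed)
```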
Failures in YARN 
• NodeManager failure: If a node manager fails, it stops sending
heartbeats to the resource manager, and the node manager is
removed from the resource manager’s pool of available nodes.
• Node managers may be blacklisted if the number of failures for the
application is high. Blacklisting is done by the application master, and for
MapReduce the application master will try to reschedule tasks on different
nodes if more than three tasks fail on a node manager. The threshold may
be set with mapreduce.job.maxtaskfailures.per.tracker.
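Blacklisting by the application master can be sketched as a failure count per node. The helper names and the failure log are hypothetical, and the comparison follows the wording above ("more than three"); the exact comparison used inside Hadoop may differ.

```python
from collections import Counter
from typing import Iterable, List, Set

MAX_FAILURES_PER_NODE = 3  # threshold value quoted above


def blacklisted_nodes(failures: Iterable[str],
                      threshold: int = MAX_FAILURES_PER_NODE) -> Set[str]:
    """Return the nodes on which more than `threshold` task failures occurred."""
    counts = Counter(failures)
    return {node for node, n in counts.items() if n > threshold}


def pick_node(candidates: List[str], blacklist: Set[str]) -> str:
    """Return the first candidate node that is not blacklisted."""
    for node in candidates:
        if node not in blacklist:
            return node
    raise RuntimeError("every candidate node is blacklisted")


# Hypothetical log: the node on which each failed task attempt ran.
failure_log = ["nm1", "nm2", "nm1", "nm1", "nm1", "nm2"]
blacklist = blacklisted_nodes(failure_log)
chosen = pick_node(["nm1", "nm2", "nm3"], blacklist)
print(sorted(blacklist), chosen)
```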
Failures in YARN 
• Application master failure: An application master sends periodic
heartbeats to the resource manager. In the event of application master
failure, the resource manager detects the failure and starts a new instance
of the master running in a new container (managed by a node manager).
• ResourceManager failure: Most critical, as this failure can shut down
the whole process. Mitigated by checkpoints or a standby node (HA).
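Heartbeat-based failure detection, as used for both node managers and application masters, can be sketched as follows. `HeartbeatMonitor` is a hypothetical class for this example, and the 10-second timeout is an arbitrary value, not a Hadoop default. Timestamps are passed in explicitly so the example is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per component and reports the silent ones."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, name: str, now: float) -> None:
        # Record the time of the most recent heartbeat from `name`.
        self.last_seen[name] = now

    def expired(self, now: float) -> List[str]:
        # A component is considered failed once it has been silent too long.
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.heartbeat("app_master_1", now=0.0)
monitor.heartbeat("node_manager_7", now=0.0)
monitor.heartbeat("node_manager_7", now=8.0)   # this one keeps reporting
dead = monitor.expired(now=12.0)
print(dead)
```

On detecting an expired application master, the resource manager would start a new instance in a new container; an expired node manager would be dropped from the pool of available nodes.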
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 

Yarn

  • 1. Yarn and MapReduce v2 ©2014 Zaloni, Inc. All Rights Reserved.
  • 2. Agenda: What is YARN?; Why YARN?; Components of YARN; Architecture; API in MRv2; Gain with MRv2; Failure in MRv2
  • 3. So what is YARN, really?
  • 4. YARN INTRODUCTION YARN (Yet Another Resource Negotiator) is responsible for • Cluster resource management • Scheduling. Various applications may run on YARN; MapReduce is just one choice.
  • 5. YARN INTRODUCTION HADOOP 1.0 (single-use system; batch apps): MapReduce (cluster resource management & data processing) on top of HDFS (redundant, reliable storage). HADOOP 2.0 (multi-purpose platform; batch, interactive, online, streaming, …): MapReduce (data processing) and others (data processing) on top of YARN (cluster resource management) on top of HDFS2 (redundant, reliable storage).
  • 6. YARN INTRODUCTION Store ALL DATA in one place… Interact with that data in MULTIPLE WAYS. Applications run natively IN Hadoop: BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search, Weave, …), all on YARN (Cluster Resource Management) and HDFS2 (Redundant, Reliable Storage).
  • 7. Why YARN and MRv2?
  • 8. Hadoop MapReduce Classic JobTracker: manages cluster resources and job scheduling. TaskTracker: per-node agent; manages tasks.
  • 9. MRv1 Limitations • Scalability: the JT limits horizontal scaling • Cluster utilization: fixed-size slots degrade cluster utilization • Availability: when the JT dies, jobs must restart • Upgradability: jobs must be stopped to upgrade the JT • Hardwired: the JT only supports MapReduce
  • 10. Diagram labels: ResourceManager, JobTracker, ApplicationMaster
  • 11. Architecture
  • 12. YARN Components So, what was developed: • Resource Manager • Node Manager • Application Master • Container
  • 13. Resource Manager (RM) • Manages the global assignment of compute resources to applications. • A pure scheduler • Does not monitor or track the status of applications
  • 14. Resource Manager (RM) • Each client/application may request multiple resources: memory, network, CPU, disk, … • This is a significant change from the static Mapper/Reducer model
  • 15. Application Master • A per-application ApplicationMaster (AM) manages the application’s life cycle (scheduling and coordination). • An application is either a single job in the classic MapReduce sense or a DAG of such jobs.
  • 16. Application Master • The Application Master is responsible for – negotiating appropriate resource containers from the Scheduler – launching tasks – tracking their status – monitoring progress – handling task failures.
  • 17. Node Manager NodeManager: per-machine framework agent, responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the Scheduler.
  • 18. Container – Basic unit of allocation, whose resource usage (CPU, memory, disk, network) is monitored – Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.) – Replaces the fixed map/reduce slots
  • 19. Lifecycle of a job (sequence diagram; participants: Client, Resource Manager, App Master, Node Managers, Containers; messages: Submit, OK, Go, "I need resources!", "Here you are", Start containers, "Do work!", Done; the client repeatedly asks "Done?" and receives "No" until the final "Yes")
  • 20. Lifecycle of a job
  • 21. Job execution on MRv2 1. The client submits a MapReduce job by interacting with Job objects. 2. The job’s code interacts with the Resource Manager to acquire application metadata, such as the application ID. 3. The job’s code copies all job-related resources to HDFS to make them available for the rest of the job. 4. The job’s code submits the application to the Resource Manager. 5. The Resource Manager chooses a Node Manager with available resources and requests a container for MRAppMaster. 6. The Node Manager allocates a container for MRAppMaster, which will execute and coordinate the MapReduce job.
  • 22. Job execution on MRv2 7. MRAppMaster retrieves the required resources from HDFS, copied there in step 3. 8. MRAppMaster negotiates with the Resource Manager for available resources; the Resource Manager selects the Node Manager that has the most resources. 9. MRAppMaster tells the selected NodeManager to start Map and Reduce tasks. 10. The NodeManager creates YarnChild containers that will coordinate and run tasks. 11. YarnChild acquires from HDFS the job resources required to execute Map and Reduce tasks. 12. YarnChild executes the Map and Reduce tasks.
  • 23. YARN – Resource Allocation & Usage ResourceRequest – Fine-grained resource ask to the ResourceManager – Ask for a specific amount of resources (memory, CPU, etc.) on a specific machine or rack – Use the special value * as the resource name to mean any machine. ResourceRequest fields: priority, resourceName, capability, numContainers. © Hortonworks Inc. 2013
  • 24. YARN – Resource Allocation & Usage ResourceRequest example (priority, capability, resourceName, numContainers): priority 0, capability <2gb, 1 core>: host01 1, rack0 1, * 1; priority 1, capability <4gb, 1 core>: * 1.
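The example table on slide 24 can be read as four request records. The sketch below models them in plain Python to show how the `*` resource name widens where a request may be placed. It is a toy model, not the Hadoop API: the class names, the `matching_requests` helper, and the choice to order results by ascending priority value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    memory_gb: int
    cores: int


@dataclass(frozen=True)
class ResourceRequest:
    priority: int
    capability: Capability
    resource_name: str  # a host name, a rack name, or "*" for any machine
    num_containers: int


# The four rows of the slide's example table.
REQUESTS = [
    ResourceRequest(0, Capability(2, 1), "host01", 1),
    ResourceRequest(0, Capability(2, 1), "rack0", 1),
    ResourceRequest(0, Capability(2, 1), "*", 1),
    ResourceRequest(1, Capability(4, 1), "*", 1),
]


def matching_requests(requests, host, rack):
    """Return the requests that a node `host` in `rack` could serve.

    Hypothetical helper: a request matches if it names the host, the rack,
    or "*". Results are ordered by ascending priority value (an assumption).
    """
    names = {host, rack, "*"}
    hits = [r for r in requests if r.resource_name in names]
    return sorted(hits, key=lambda r: r.priority)
```

A node that is neither `host01` nor in `rack0` only matches the two `*` rows, which is why the wildcard row is what makes a request satisfiable anywhere in the cluster.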
  • 25. YARN – Resource Allocation & Usage Container – The basic unit of allocation in YARN – The result of the ResourceRequest, provided by the ResourceManager to the ApplicationMaster – A specific amount of resources (CPU, memory, etc.) on a specific machine. Container fields: containerId, resourceName, capability, tokens.
  • 26. YARN – Resource Allocation & Usage ContainerLaunchContext – The context provided by the ApplicationMaster to the NodeManager to launch the Container – Complete specification for a process – LocalResource is used to specify the container binary and dependencies – The NodeManager is responsible for downloading them from a shared namespace (typically HDFS). ContainerLaunchContext fields: container, commands, environment, localResources. LocalResource fields: uri, type.
  • 27. YARN - ApplicationMaster ApplicationMaster – ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client – The ResourceManager is responsible for allocating and launching the ApplicationMaster container. ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue.
  • 28. API in MRv2 ClientRMProtocol (Client–RM): the protocol for a client to communicate with the RM to launch a new application (i.e. an AM), check on the status of the application, or kill the application. AMRMProtocol (AM–RM): the protocol used by the AM to register/unregister itself with the RM, as well as to request resources from the RM Scheduler to run its tasks. ContainerManager (AM–NM): the protocol used by the AM to communicate with the NM to start or stop containers and to get status updates on its containers.
  • 29. YARN Application API – ClientRM (diagram: Client2, ResourceManager with Scheduler, NodeManagers) 1. Application request: YarnClient.createApplication 2. Submit application: YarnClient.submitApplication
  • 30. YARN Application API – AMRM (diagram: AM, ResourceManager with Scheduler, NodeManagers; steps numbered 1 to 4) Calls shown: registerApplicationMaster, AMRMClient.allocate (returning a Container), unregisterApplicationMaster
  • 31. YARN Application API – ContainerManager (diagram: AM, ResourceManager with Scheduler, NodeManagers, Container 1.1) Calls shown: AMNMClient.startContainer, AMNMClient.getContainerStatus
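The call order across the AM–RM and AM–NM protocols on slides 28 to 31 can be sketched as a small simulation. Everything here is a stand-in written for illustration: `ToyResourceManager`, `ToyNodeManager`, and `run_application` are hypothetical names, and the methods only mirror the names on the slides rather than reproducing the real Hadoop client classes or their signatures.

```python
class ToyResourceManager:
    """Stand-in for the RM: records calls instead of scheduling anything."""

    def __init__(self):
        self.log = []
        self.registered = set()
        self._next_container = 0

    def register_application_master(self, app_id):
        self.registered.add(app_id)
        self.log.append(("register", app_id))

    def allocate(self, app_id, num_containers):
        # In this toy model an AM must register before it may ask for resources.
        if app_id not in self.registered:
            raise RuntimeError("AM must register before asking for containers")
        ids = []
        for _ in range(num_containers):
            self._next_container += 1
            ids.append(f"container_{self._next_container}")
        self.log.append(("allocate", app_id, len(ids)))
        return ids

    def unregister_application_master(self, app_id):
        self.registered.discard(app_id)
        self.log.append(("unregister", app_id))


class ToyNodeManager:
    """Stand-in for an NM: tracks a status string per container."""

    def __init__(self):
        self.status = {}

    def start_container(self, container_id, command):
        self.status[container_id] = "RUNNING"

    def get_container_status(self, container_id):
        return self.status.get(container_id, "UNKNOWN")


def run_application(rm, nm, app_id, commands):
    """Walk the sequence: register, allocate, start containers, poll, unregister."""
    rm.register_application_master(app_id)
    containers = rm.allocate(app_id, len(commands))
    for container_id, command in zip(containers, commands):
        nm.start_container(container_id, command)
    statuses = {cid: nm.get_container_status(cid) for cid in containers}
    rm.unregister_application_master(app_id)
    return statuses
```

The point of the sketch is the separation of concerns: the AM talks to the RM only to obtain containers, and to the NM only to start and inspect them.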
  • 32. Gain with New Architecture (Scalability) • RM and job manager are segregated • The Hadoop MapReduce JobTracker spends a very significant portion of time and effort managing the life cycle of applications. Gains listed on every slide in this group: • Scalability • Availability • Wire-compatibility • Innovation & Agility • Cluster Utilization • Support for programming paradigms other than MapReduce
  • 33. Gain with New Architecture (Availability) • ResourceManager – Uses ZooKeeper for fail-over. – When the primary fails, the secondary can quickly start using the state stored in ZK • Application Master – The MapReduce ApplicationMaster can recover from failures by restoring itself from a checkpoint.
  • 34. Gain with New Architecture (Wire-compatibility) • MRv2 uses wire-compatible protocols to allow different versions of servers and clients to communicate with each other. • Enables rolling upgrades of the cluster in the future.
  • 35. Gain with New Architecture (Innovation & Agility) • The new framework is generic. – Different versions of MR can run in parallel – End users can upgrade to new MR versions on their own schedule
  • 36. Gain with New Architecture (Cluster Utilization) • MRv2 uses a general concept of a resource for scheduling and allocating to individual applications. • Better cluster utilization
  • 37. Gain with New Architecture (Support for programming paradigms other than MapReduce) Store all data in one place and use it for more than one application
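The cluster-utilization gain (fixed map/reduce slots replaced by generic resource containers) can be shown with a small arithmetic sketch. The node size, slot counts, and task size used in the usage note are invented numbers chosen only to make the contrast visible; both function names are hypothetical.

```python
def tasks_running_with_slots(pending_maps, pending_reduces, map_slots, reduce_slots):
    """MRv1-style node: map tasks may only use map slots, reduces only reduce slots."""
    return min(pending_maps, map_slots) + min(pending_reduces, reduce_slots)


def tasks_running_with_containers(pending_tasks, node_memory_gb, task_memory_gb):
    """YARN-style node: any task may use any free memory, sized per request."""
    return min(pending_tasks, node_memory_gb // task_memory_gb)
```

With ten pending map tasks and no reduces, a node configured with 2 map slots and 2 reduce slots runs only 2 tasks, while the same node treated as 8 GB of memory with 2 GB tasks runs 4. The idle reduce slots are the utilization loss the slides refer to.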
  • 38. Failures in YARN For MapReduce programs running on YARN, we need to consider the failure of any of the following entities: the task, the application master, the node manager, and the resource manager. • Task failure: failure of a running task is handled much as in the classic case. Task attempts that throw runtime exceptions, exit the JVM suddenly, or time out are marked as failed. The task itself is marked as failed after four attempts (set by mapreduce.map.maxattempts for map tasks and mapreduce.reduce.maxattempts for reduce tasks).
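The retry rule for tasks can be sketched as a loop that gives up after a fixed number of attempts. This is a minimal illustration of the idea, not MapReduce's implementation: `run_with_attempts`, `flaky`, and `always_fails` are hypothetical names, and the default of 4 simply mirrors the number quoted on the slide.

```python
def run_with_attempts(task, max_attempts=4):
    """Call `task(attempt)` until it succeeds or `max_attempts` is exhausted.

    Returns a pair (succeeded, attempts_used). Any exception counts as a
    failed attempt, standing in for runtime errors, JVM exits, and timeouts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            task(attempt)
            return True, attempt
        except Exception:
            continue
    return False, max_attempts


def flaky(attempt):
    """Example task that fails on its first two attempts, then succeeds."""
    if attempt < 3:
        raise RuntimeError(f"attempt {attempt} failed")


def always_fails(attempt):
    """Example task that never succeeds."""
    raise RuntimeError("task cannot complete")
```

Here `run_with_attempts(flaky)` succeeds on the third attempt, while `run_with_attempts(always_fails)` uses all four attempts and reports failure.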
  • 39. Failures in YARN • NodeManager failure: if a node manager fails, it stops sending heartbeats to the resource manager and is removed from the resource manager’s pool of available nodes. • Node managers may be blacklisted if the number of failures for the application is high. Blacklisting is done by the application master; for MapReduce, the application master will try to reschedule tasks on different nodes if more than three tasks fail on a node manager. The threshold may be set with mapreduce.job.maxtaskfailures.per.tracker.
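The per-application blacklisting described on this slide amounts to counting task failures per node and steering new tasks away from nodes over the threshold. The sketch below follows the slide's wording ("more than three"); the class and method names are hypothetical, and the exact comparison Hadoop applies is not confirmed here.

```python
from collections import Counter


class NodeBlacklist:
    """Toy per-application blacklist kept by an application master."""

    def __init__(self, max_failures_per_node=3):
        self.max_failures_per_node = max_failures_per_node
        self.failures = Counter()

    def record_failure(self, node):
        self.failures[node] += 1

    def is_blacklisted(self, node):
        # "More than three" per the slide; treat the comparison as an assumption.
        return self.failures[node] > self.max_failures_per_node

    def pick_node(self, candidates):
        """Return the first candidate that is not blacklisted, or None."""
        for node in candidates:
            if not self.is_blacklisted(node):
                return node
        return None
```

After a fourth recorded failure on a node, `pick_node` skips it and returns the next healthy candidate, which is the rescheduling behaviour the slide describes.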
  • 40. Failures in YARN • Application master failure: an application master sends periodic heartbeats to the resource manager. If the application master fails, the resource manager detects the failure and starts a new instance of the master in a new container (managed by a node manager). • ResourceManager failure: the most critical case, as this failure can shut down the whole process. It is addressed by checkpoints or a standby node (HA).
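Heartbeat-based failure detection for application masters can be sketched as a table of last-seen timestamps checked against a timeout. This is a simplified model under stated assumptions: `LivenessMonitor` is a hypothetical class, time is passed in explicitly so the example is deterministic, and the timeout value in the usage note is arbitrary.

```python
class LivenessMonitor:
    """Toy heartbeat tracker, as a resource manager might keep per application master."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}
        self.attempts = {}

    def heartbeat(self, app_id, now):
        self.last_seen[app_id] = now
        self.attempts.setdefault(app_id, 1)

    def expired(self, now):
        """Application ids whose last heartbeat is older than the timeout."""
        return sorted(
            app_id
            for app_id, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )

    def restart_expired(self, now):
        """Start a new attempt for each expired master and reset its timer."""
        restarted = self.expired(now)
        for app_id in restarted:
            self.attempts[app_id] += 1
            self.last_seen[app_id] = now
        return restarted
```

With a 10-second timeout, a master that last reported at t=0 is still considered alive at t=10 and is restarted as attempt 2 when checked at t=11.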