RESOURCE MANAGEMENT IN HADOOP
Session Objectives
- INTRODUCTION TO BIG DATA AND HADOOP
- UNDERSTANDING HADOOP 2.0 AND ITS FEATURES
- UNDERSTANDING YARN
Introduction to Big Data and Hadoop
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to petabytes or even zettabytes.
Managing data at this scale is very difficult.
Big Data and Its Challenges
The challenges of processing big data are commonly summarized as the three Vs: volume, velocity, and variety.
VOLUME: Modern systems have much more data:
- Terabytes+ a day
- Petabytes+ in total
We need a new approach.
VELOCITY: To process such a huge volume of data within a specified time period, we need a new approach.
VARIETY: We have to process different sorts of data, such as structured, semi-structured, and unstructured data. We need a new approach.
What is Hadoop?
Apache Hadoop is a framework that allows the
distributed processing of large data sets across
clusters of commodity computers using a
simple programming model.
It is an open-source data management
technology with scale-out storage and
distributed processing.
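In classic Hadoop, the "simple programming model" is MapReduce: the user writes a map function and a reduce function, and the framework takes care of distributing the work and grouping intermediate results by key. The sketch below runs a word count entirely inside one Python process to show the three phases (map, shuffle, reduce). It does not use any Hadoop API, and the function names are hypothetical. On a cluster, each input split would be mapped by a separate task, typically on the node that stores that split.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence; returns counts sorted by word."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = ["big data needs hadoop", "hadoop processes big data"]
    print(word_count(splits))
    # {'big': 2, 'data': 2, 'hadoop': 2, 'needs': 1, 'processes': 1}
```

The user only supplies the two small functions; everything between them (here the `shuffle` helper) is the framework's job, which is what makes the model simple to program against.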
Hadoop Ecosystem
[Figure: the Hadoop ecosystem]
Background: Hadoop + HDFS
[Figure: HDFS (Hadoop Distributed File System) with a NameNode and two DataNodes, each DataNode layered on its node's local file system]
- Every node contributes part of its local file system to HDFS.
- Tasks can only depend on the local file system, because the JVM class path does not understand the HDFS protocol.
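A toy model of the metadata a NameNode keeps can make the diagram concrete: the NameNode only records which DataNodes hold each block, while the block data itself sits on the DataNodes' local file systems. This sketch assumes the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3. The round-robin placement is a deliberate simplification (real HDFS uses a rack-aware placement policy), and the class, field, and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB: the default HDFS block size in Hadoop 2.x
REPLICATION = 3                  # default HDFS replication factor


@dataclass
class NameNode:
    """Toy NameNode: holds only metadata about which DataNodes store each block.

    The block contents would live as ordinary files on each DataNode's
    local file system.
    """
    datanodes: list[str]
    block_map: dict[str, list[list[str]]] = field(default_factory=dict)
    cursor: int = 0  # round-robin start position (hypothetical, not rack-aware)

    def add_file(self, path: str, size_bytes: int) -> list[list[str]]:
        """Split a file into fixed-size blocks and pick DataNodes for each replica."""
        n_blocks = math.ceil(size_bytes / BLOCK_SIZE)  # an empty file has no blocks
        replicas = min(REPLICATION, len(self.datanodes))
        placement: list[list[str]] = []
        for _ in range(n_blocks):
            # Consecutive DataNodes starting at the cursor hold this block's replicas.
            nodes = [self.datanodes[(self.cursor + i) % len(self.datanodes)]
                     for i in range(replicas)]
            placement.append(nodes)
            self.cursor = (self.cursor + 1) % len(self.datanodes)
        self.block_map[path] = placement
        return placement


if __name__ == "__main__":
    nn = NameNode(["dn1", "dn2", "dn3", "dn4"])
    # A 300 MB file needs ceil(300 / 128) = 3 blocks, each stored on 3 DataNodes.
    for i, nodes in enumerate(nn.add_file("/logs/day1", 300 * 1024 * 1024)):
        print(f"block {i}: {nodes}")
```

Because every DataNode ends up holding replicas of many files, each node's local disk effectively becomes a slice of one shared file system.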
Hadoop 1.x Architecture
[Figure: Hadoop 1.x architecture]
YARN
Yet Another Resource Negotiator
- Also expanded as the recursive acronym "YARN Application Resource Negotiator"
- Remedies the scalability shortcomings of “classic” MapReduce, which runs into scalability issues at around 4,000 nodes and above
- Is a more general-purpose framework, of which classic MapReduce is only one application
YARN Flow
YARN = YET ANOTHER RESOURCE NEGOTIATOR
Resource Manager
- Cluster-level resource manager
- Long-lived; runs on high-quality hardware
Node Manager
- One per DataNode
- Monitors resources on its DataNode
Application Master
- One per application
- Short-lived
- Manages scheduling and monitoring of its application's tasks
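The three roles differ mainly in how many instances exist and how long they live. The sketch below models that cardinality: one Resource Manager for the cluster, one Node Manager per node, and one Application Master per application (matching the "one per application" in the architectural overview). It is not YARN's API; the classes and fields such as `memory_mb` are hypothetical, although `vcores` mirrors YARN's own term for virtual cores.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One per worker node; advertises that node's resources."""
    host: str
    memory_mb: int
    vcores: int


@dataclass
class ApplicationMaster:
    """One per application; exists only while its application runs."""
    app_id: str


@dataclass
class ResourceManager:
    """A single, long-lived, cluster-level daemon."""
    node_managers: dict[str, NodeManager] = field(default_factory=dict)
    app_masters: dict[str, ApplicationMaster] = field(default_factory=dict)

    def register_node(self, nm: NodeManager) -> None:
        # Keyed by host, so a node can never have two Node Managers.
        self.node_managers[nm.host] = nm

    def submit_application(self, app_id: str) -> ApplicationMaster:
        # Each submitted application gets its own Application Master.
        am = ApplicationMaster(app_id)
        self.app_masters[app_id] = am
        return am

    def finish_application(self, app_id: str) -> None:
        # The Application Master goes away with its application (short life).
        self.app_masters.pop(app_id, None)

    def cluster_capacity(self) -> tuple[int, int]:
        """Total (memory MB, vcores) across all registered Node Managers."""
        nodes = self.node_managers.values()
        return sum(n.memory_mb for n in nodes), sum(n.vcores for n in nodes)


if __name__ == "__main__":
    rm = ResourceManager()
    for host in ("node1", "node2", "node3"):
        rm.register_node(NodeManager(host, memory_mb=8192, vcores=8))
    for app in ("app_1", "app_2"):
        rm.submit_application(app)
    # 3 Node Managers (one per node) but 2 Application Masters (one per app).
    print(len(rm.node_managers), len(rm.app_masters), rm.cluster_capacity())
```

The number of Application Masters tracks the workload, not the hardware: adding nodes adds Node Managers, while submitting jobs adds Application Masters.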
YARN – How It Works
Protocols:
1.) Client – RM: Submit the App Master
2.) RM – NM: Start the App Master
3.) AM – RM: Request + release containers
4.) AM – NM: Start tasks in containers
[Figure: the YARN client, the Resource Manager, and three Node Managers hosting the AM and task containers, with arrows labeled 1.) to 4.) for the four protocols]
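The four interactions can be traced with a small single-process simulation. Note that in YARN the fourth step, launching tasks, is sent by the Application Master to the Node Manager that owns the granted container. Everything here is hypothetical: the class, the log format, one-container-at-a-time requests, and the "most free slots" rule standing in for the Resource Manager's scheduler (real deployments use pluggable schedulers such as the Capacity Scheduler or Fair Scheduler).

```python
from dataclasses import dataclass, field


@dataclass
class ToyYarnCluster:
    """Simulates the four YARN interactions; all names are hypothetical."""
    free_slots: dict[str, int]  # Node Manager host -> free container slots
    log: list[str] = field(default_factory=list)

    def _allocate(self) -> str:
        """Stand-in for the RM scheduler: pick the node with the most free slots."""
        host = max(self.free_slots, key=lambda h: self.free_slots[h])
        if self.free_slots[host] == 0:
            raise RuntimeError("cluster has no free containers")
        self.free_slots[host] -= 1
        return host

    def run_application(self, app_id: str, num_tasks: int) -> list[str]:
        # 1) Client -> RM: submit the application and its App Master spec.
        self.log.append(f"client->RM: submit {app_id}")
        # 2) RM -> NM: the RM picks a node and has its NM start the App Master.
        am_host = self._allocate()
        self.log.append(f"RM->NM({am_host}): start AM for {app_id}")
        task_hosts: list[str] = []
        for task in range(num_tasks):
            # 3) AM -> RM: request a container for one task.
            host = self._allocate()
            self.log.append(f"AM->RM: request container, granted on {host}")
            # 4) AM -> NM: launch the task in the granted container.
            self.log.append(f"AM->NM({host}): start task {task}")
            task_hosts.append(host)
        # 3 again) AM -> RM: release the task containers when the tasks finish;
        # once the AM unregisters, the RM also reclaims the AM's own container.
        for host in task_hosts + [am_host]:
            self.free_slots[host] += 1
        self.log.append(f"AM->RM: release containers and unregister {app_id}")
        return task_hosts


if __name__ == "__main__":
    cluster = ToyYarnCluster(free_slots={"nm1": 2, "nm2": 2})
    print(cluster.run_application("app_1", num_tasks=2))  # ['nm2', 'nm1']
    print("\n".join(cluster.log))
```

The Resource Manager is involved only in steps 1 to 3; once containers are granted, the Application Master talks to the Node Managers directly, which keeps per-task traffic off the central daemon.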
YARN Architectural Overview
- Scalability design targets: clusters of 6,000 to 10,000 machines
- Each machine with 16 cores, 48 GB/96 GB RAM, and 24 TB/36 TB of disk
- 100,000+ concurrent tasks
- 10,000 concurrent jobs
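Multiplying out these targets gives a feel for the scale involved. This sketch assumes the lower per-machine figures pair with the 6,000-machine cluster and the higher ones with the 10,000-machine cluster, and it uses decimal units (1 TB = 1,000 GB); the class name is hypothetical.

```python
from dataclasses import dataclass

CORES_PER_MACHINE = 16
CONCURRENT_TASKS = 100_000


@dataclass(frozen=True)
class ClusterSize:
    """One end of the target range (decimal units: 1 TB = 1,000 GB)."""
    machines: int
    ram_gb_per_machine: int
    disk_tb_per_machine: int

    @property
    def total_cores(self) -> int:
        return self.machines * CORES_PER_MACHINE

    @property
    def total_ram_tb(self) -> float:
        return self.machines * self.ram_gb_per_machine / 1_000

    @property
    def total_disk_pb(self) -> float:
        return self.machines * self.disk_tb_per_machine / 1_000

    @property
    def tasks_per_machine(self) -> float:
        return CONCURRENT_TASKS / self.machines


# Assumption: the lower figures go together, and so do the higher ones.
smallest = ClusterSize(machines=6_000, ram_gb_per_machine=48, disk_tb_per_machine=24)
largest = ClusterSize(machines=10_000, ram_gb_per_machine=96, disk_tb_per_machine=36)

if __name__ == "__main__":
    for name, c in (("smallest", smallest), ("largest", largest)):
        print(f"{name}: {c.total_cores:,} cores, {c.total_ram_tb:g} TB RAM, "
              f"{c.total_disk_pb:g} PB disk, {c.tasks_per_machine:.1f} tasks/machine")
    # smallest: 96,000 cores, 288 TB RAM, 144 PB disk, 16.7 tasks/machine
    # largest: 160,000 cores, 960 TB RAM, 360 PB disk, 10.0 tasks/machine
```

At the small end, 100,000 concurrent tasks is more than one task per core on average (96,000 cores), so the target implies some tasks sharing cores.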
YARN Architectural Overview (Contd.)
- Splits up the two major functions of the JobTracker:
  - Global Resource Manager: cluster resource management
  - Application Master: job scheduling and monitoring (one per application). The Application Master negotiates resource containers from the Resource Manager's Scheduler, tracks their status, and monitors progress. The Application Master itself runs as a normal container.
- Replaces the TaskTracker with:
  - NodeManager (NM): a new per-node slave responsible for launching the applications' containers, monitoring their resource usage (CPU, memory, disk, network), and reporting to the Resource Manager.
- YARN maintains compatibility with existing MapReduce applications and users.
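A Node Manager's monitoring and reporting duty can be sketched as a per-node object that tracks each container's allocation and measured usage, then builds a status report for the Resource Manager. Only memory is modeled (the slide also lists CPU, disk, and network), and the class names, report fields, and the over-limit flag are hypothetical rather than YARN's actual heartbeat format.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: str
    memory_limit_mb: int     # what the Resource Manager allocated
    memory_used_mb: int = 0  # what the Node Manager last measured


@dataclass
class NodeManager:
    host: str
    capacity_mb: int
    containers: dict[str, Container] = field(default_factory=dict)

    def allocated_mb(self) -> int:
        return sum(c.memory_limit_mb for c in self.containers.values())

    def launch(self, container: Container) -> None:
        """Start a container only if its allocation fits on this node."""
        if self.allocated_mb() + container.memory_limit_mb > self.capacity_mb:
            raise ValueError(f"{self.host}: no room for {container.container_id}")
        self.containers[container.container_id] = container

    def record_usage(self, container_id: str, used_mb: int) -> None:
        """Store a new memory measurement for a running container."""
        self.containers[container_id].memory_used_mb = used_mb

    def heartbeat(self) -> dict:
        """Build the status report this node would send to the Resource Manager."""
        return {
            "host": self.host,
            "available_mb": self.capacity_mb - self.allocated_mb(),
            "used_mb": sum(c.memory_used_mb for c in self.containers.values()),
            "running": sorted(self.containers),
            "over_limit": sorted(c.container_id for c in self.containers.values()
                                 if c.memory_used_mb > c.memory_limit_mb),
        }


if __name__ == "__main__":
    nm = NodeManager("node1", capacity_mb=8192)
    nm.launch(Container("c1", memory_limit_mb=2048))
    nm.launch(Container("c2", memory_limit_mb=4096))
    nm.record_usage("c1", 1500)
    nm.record_usage("c2", 4200)  # exceeds its 4096 MB allocation
    print(nm.heartbeat())
```

Reporting allocated headroom (`available_mb`) separately from measured usage lets the Resource Manager schedule against what it has promised, while still seeing which containers misbehave.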