INTRODUCTION TO YARN
Submitted by,
B.Nandhitha
 Now that the need for YARN is clear, let me introduce
you to YARN, the core component of Hadoop v2.0.
 YARN allows different data processing methods
like graph processing, interactive processing, stream
processing as well as batch processing to run and
process data stored in HDFS.
 Therefore YARN opens up Hadoop to other types
of distributed applications beyond MapReduce.
 YARN enables users to choose the tool that fits the
task, such as Spark for real-time processing, Hive for
SQL, HBase for NoSQL, and others.
COMPONENTS OF YARN
first component
Resource Manager
 It is the ultimate authority in resource allocation.
 On receiving processing requests, it passes parts of the
requests to the corresponding Node Managers, where the
actual processing takes place.
 It is the arbitrator of the cluster resources and decides the
allocation of the available resources for competing
applications.
 It optimizes cluster utilization, keeping resources in use as
much as possible, subject to constraints such as capacity
guarantees, fairness, and SLAs.
 It has two major components: a) Scheduler and b) Application
Manager.
a) Scheduler
 The scheduler is responsible for allocating resources to the
various running applications subject to constraints of
capacities, queues etc.
 It is a pure scheduler: it does not monitor or track the
status of the applications.
 If an application or the hardware fails, the Scheduler does
not guarantee that the failed tasks are restarted.
 It schedules based on the resource requirements of the
applications.
 It has a pluggable policy, which is responsible for
partitioning the cluster resources among the various
applications.
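The Scheduler behaviour described above can be sketched in a few lines. This is only an illustration, not YARN's scheduler: the Queue and Request classes, the queue names, and the memory figures are all hypothetical, and real policies are considerably richer.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """A hypothetical queue with an upper bound on the memory it may use."""
    name: str
    capacity_mb: int
    used_mb: int = 0


@dataclass
class Request:
    """A hypothetical container request from one application."""
    app_id: str
    queue: str
    memory_mb: int


def schedule(requests, queues, cluster_free_mb):
    """Grant requests in order while queue and cluster limits allow.

    Pure scheduling only: nothing here tracks or restarts applications.
    """
    granted, pending = [], []
    for req in requests:
        q = queues[req.queue]
        fits_queue = q.used_mb + req.memory_mb <= q.capacity_mb
        fits_cluster = req.memory_mb <= cluster_free_mb
        if fits_queue and fits_cluster:
            q.used_mb += req.memory_mb
            cluster_free_mb -= req.memory_mb
            granted.append(req)
        else:
            pending.append(req)
    return granted, pending


queues = {"prod": Queue("prod", 6144), "dev": Queue("dev", 2048)}
requests = [
    Request("app_1", "prod", 4096),
    Request("app_2", "dev", 1024),
    Request("app_3", "dev", 2048),   # would exceed the dev queue limit
    Request("app_4", "prod", 2048),
]
granted, pending = schedule(requests, queues, cluster_free_mb=8192)
print([r.app_id for r in granted])
print([r.app_id for r in pending])
```

Swapping the body of schedule for a different rule is the sense in which the policy is pluggable: the callers stay the same while the partitioning logic changes.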
b) Application Manager
 It is responsible for accepting job submissions.
 It negotiates the first container from the Resource Manager
for executing the application-specific Application Master.
 It manages the running Application Masters in the cluster and
provides the service for restarting an Application Master
container on failure.
second component
 Node Manager
 It takes care of individual nodes in a Hadoop cluster
and manages user jobs and workflow on the given node.
 It registers with the Resource Manager and sends heartbeats
with the health status of the node.
 Its primary goal is to manage the application containers
assigned to it by the Resource Manager.
 It keeps its state up to date with the Resource Manager.
 The Application Master asks the Node Manager to
launch an assigned container by sending it a
Container Launch Context (CLC), which includes
everything the application needs in order to run.
 The Node Manager creates the requested
container process and starts it.
 It monitors the resource usage (memory, CPU) of
individual containers.
 It performs log management.
 It also kills a container when directed to by the
Resource Manager.
third component
Application Master
 An application is a single job submitted to the framework.
Each application has its own Application Master, which is a
framework-specific entity.
 It is the process that coordinates an application’s execution in
the cluster and also manages faults.
 Its task is to negotiate resources from the Resource Manager
and work with the Node Manager to execute and monitor the
component tasks.
 It is responsible for negotiating appropriate resource containers
from the ResourceManager, tracking their status and
monitoring progress.
 Once started, it periodically sends heartbeats to the Resource
Manager to affirm its health and to update the record of its
resource demands.
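The heartbeat idea above can be illustrated with a small tracker. This is a sketch under assumptions, not YARN code: the LivenessMonitor class, the 10-second timeout, and the application IDs are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class LivenessMonitor:
    """Hypothetical tracker: an Application Master counts as live while it keeps heartbeating."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # app_id -> time of the last heartbeat
        self.demands = {}     # app_id -> latest number of containers wanted

    def heartbeat(self, app_id, now, containers_wanted):
        """Record that the master is alive and refresh its resource demand."""
        self.last_seen[app_id] = now
        self.demands[app_id] = containers_wanted

    def expired(self, now):
        """Return the masters that have been silent for longer than the timeout."""
        return sorted(a for a, t in self.last_seen.items() if now - t > self.timeout_s)


monitor = LivenessMonitor(timeout_s=10)
monitor.heartbeat("app_1", now=0, containers_wanted=4)
monitor.heartbeat("app_2", now=0, containers_wanted=2)
monitor.heartbeat("app_1", now=8, containers_wanted=1)   # app_1 reports again
print(monitor.expired(now=12))
```

Here app_2 is the only master reported as expired at time 12, because app_1 refreshed both its liveness and its demand at time 8.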
fourth component
 Container
 It is a collection of physical resources such as RAM, CPU
cores, and disks on a single node.
 The life cycle of a YARN container is managed through its
Container Launch Context (CLC).
 This record contains a map of environment variables,
dependencies stored in remotely accessible storage, security
tokens, the payload for Node Manager services, and the command
necessary to create the process.
 It grants an application the right to use a specific amount of
resources (memory, CPU) on a specific host.
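The launch record listed above can be pictured as a plain data structure. The sketch below is a hypothetical stand-in written for illustration: the class names, field names, paths, and host name are assumptions and do not reproduce the Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchContext:
    """Hypothetical stand-in for the launch record described above."""
    environment: dict = field(default_factory=dict)      # environment variables
    local_resources: dict = field(default_factory=dict)  # name -> remote location of a dependency
    tokens: bytes = b""                                  # security tokens
    service_data: dict = field(default_factory=dict)     # payload for Node Manager services
    commands: list = field(default_factory=list)         # command that creates the process


@dataclass
class Container:
    """A grant of a fixed amount of resources on one specific host."""
    container_id: str
    host: str
    memory_mb: int
    vcores: int


def describe_launch(container, ctx):
    """Return the line a node might run for this container (illustrative only)."""
    env = " ".join(f"{k}={v}" for k, v in sorted(ctx.environment.items()))
    return f"[{container.host}] {env} {' '.join(ctx.commands)}".strip()


ctx = LaunchContext(
    environment={"APP_HOME": "/opt/app"},                      # made-up path
    local_resources={"app.jar": "hdfs:///apps/demo/app.jar"},  # made-up location
    commands=["java", "-Xmx768m", "-jar", "app.jar"],
)
grant = Container("container_01", "node-3", memory_mb=1024, vcores=1)
print(describe_launch(grant, ctx))
```

Keeping the grant (Container) separate from the launch record (LaunchContext) mirrors the split in the text: one says how much may be used and where, the other says what to run.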
The first challenge is storing Big data
 HDFS provides a distributed way to store Big Data. Your data
is stored in blocks across the DataNodes, and you can specify
the block size.
 For example, suppose you have 512 MB of data and have
configured HDFS to create 128 MB blocks.
 HDFS will then divide the data into 4 blocks (512/128 = 4) and
store them across different DataNodes. It will also replicate
the data blocks on different DataNodes.
 Because HDFS runs on commodity hardware, storage is not a
challenge.
 It also solves the scaling problem.
 It focuses on horizontal scaling instead of vertical
scaling.
 You can always add extra DataNodes to the HDFS
cluster as and when required, instead of scaling up the
resources of your existing DataNodes.
 To summarize: to store 1 TB of data, you don't need a
single 1 TB system.
 You can instead store it across multiple 128 GB systems,
or even smaller ones.
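The block and node arithmetic above can be checked with a short calculation. The function names are made up for this sketch, 1 TB is taken as 1024 GB, and the replication factor of 3 in the last line is an assumption (a common HDFS default) that the slides do not state.

```python
import math


def block_count(file_mb, block_mb=128):
    """Number of blocks needed; the last block may be only partly filled."""
    return math.ceil(file_mb / block_mb)


def nodes_needed(data_gb, node_gb, replication=1):
    """Nodes required to hold the data, counting every replica."""
    return math.ceil(data_gb * replication / node_gb)


print(block_count(512))                         # 512 / 128
print(block_count(600))                         # four full blocks plus a partial fifth
print(nodes_needed(1024, 128))                  # 1 TB on 128 GB nodes, no replicas
print(nodes_needed(1024, 128, replication=3))   # assumed: three copies of every block
```

Replication multiplies the raw capacity required, which is why horizontal scaling (adding nodes) matters more than the size of any single machine.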
The next challenge is storing a variety of data
 With HDFS you can store all kinds of data whether it is
structured, semi-structured or unstructured.
 HDFS performs no schema validation before data is written. It
also follows a write-once, read-many model.
 Because of this, you can write the data once and read it
multiple times to find insights.
The third challenge is accessing and processing the data
faster
 This is one of the major challenges with Big Data. To solve
it, we move the processing to the data rather than the data to
the processing.
 That is, the data is not moved to the master node and
processed there.
 In MapReduce, the processing logic is sent to the various
slave nodes, and the data is then processed in parallel across
them.
 The processed results are then sent to the master node, where
they are merged, and the response is sent back to the
client.
 In the YARN architecture, we have the ResourceManager
and the NodeManagers.
 The ResourceManager might or might not be
configured on the same machine as the NameNode.
 NodeManagers, however, should be configured on the
same machines as the DataNodes.
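The "move processing to data" flow described for MapReduce above can be sketched as a word count. This is a single-machine illustration under assumptions: threads stand in for slave nodes, the partitions are made-up sample data, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_partition(lines):
    """Runs next to one partition of the data and returns partial counts."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Merges the partial results, as the master node does."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each inner list stands for the blocks held by one worker node (made-up data).
partitions = [
    ["yarn manages resources", "hdfs stores data"],
    ["yarn schedules containers"],
    ["hdfs replicates data blocks"],
]

# The same logic is sent to every partition and runs in parallel.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(map_partition, partitions))

result = reduce_counts(partials)
print(result["yarn"], result["hdfs"], result["data"])
```

Only the small partial counts travel to the merge step; the input lines never leave the partition that holds them.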
THANKYOU..!!!

Introduction to yarn B.Nandhitha 2nd M.sc., computer science,Bon secours college for women
