HADOOP_2_0_YARN_Arch - Copy.pptx

RESOURCE MANAGEMENT IN HADOOP

Session Objectives
- INTRODUCTION TO BIG DATA AND HADOOP
- UNDERSTANDING HADOOP 2.0 AND ITS FEATURES
- UNDERSTANDING YARN

Introduction to Big Data and Hadoop
Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using on-hand database
management tools or traditional data processing applications.
Systems and enterprises generate huge amounts of data, from terabytes to
petabytes or even zettabytes of information.
It is very difficult to manage data at this scale.

Big Data and its challenges
The challenges of processing Big Data are the 3 V’s:
- VOLUME: Modern systems have much more data (terabytes or more per day, petabytes or more in total). We need a new approach.
- VELOCITY: To process such a huge volume of data within a specified time period, we need a new approach.
- VARIETY: We have to process different sorts of data, such as structured, semi-structured, and unstructured data. We need a new approach.

What is Hadoop?
Apache Hadoop is a framework that allows the
distributed processing of large data sets across
clusters of commodity computers using a
simple programming model.
It is an open-source data management
technology with scale-out storage and
distributed processing.
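
The "simple programming model" is not named on this slide. Assuming it refers to MapReduce (which the later YARN slides mention), a minimal in-process sketch of the idea looks like the following. It does not use Hadoop at all; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all counts collected for one word.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data needs hadoop", "hadoop stores big data"])
```

In a real cluster the map and reduce calls would run on different machines; here they run in one process so that only the shape of the model is visible.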

Background: Hadoop + HDFS
[Diagram: HDFS (distributed file system) with one NameNode and two DataNodes, each DataNode backed by its own local file system]
- Every node contributes part of its local file system to HDFS.
- Tasks can only depend on the local file system (the JVM class path does not understand the HDFS protocol).
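
As a rough illustration of nodes contributing local storage to one distributed file system, the toy model below splits a file into blocks and spreads them over two DataNodes while a NameNode keeps the index. This is a sketch under strong assumptions: the 4-byte block size, the round-robin placement, and the class and method names are all hypothetical, and replication is left out entirely.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


@dataclass
class DataNode:
    name: str
    # block id -> bytes held on this node's local file system
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    datanodes: list
    # path -> ordered list of (block id, datanode name)
    index: dict = field(default_factory=dict)

    def write(self, path, data):
        placements = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block_no = offset // BLOCK_SIZE
            block_id = f"{path}#{block_no}"
            # Assumed placement policy: simple round-robin over the DataNodes.
            node = self.datanodes[block_no % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + BLOCK_SIZE]
            placements.append((block_id, node.name))
        self.index[path] = placements

    def read(self, path):
        by_name = {node.name: node for node in self.datanodes}
        return b"".join(by_name[name].blocks[bid] for bid, name in self.index[path])


nodes = [DataNode("dn1"), DataNode("dn2")]
nn = NameNode(nodes)
nn.write("/logs/a.txt", b"hello hdfs")
```

Reading the path back through the NameNode index reassembles the original bytes, even though no single DataNode holds the whole file.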

YARN
- Yet Another Resource Negotiator
- YARN Application Resource Negotiator (recursive acronym)
- Remedies the scalability shortcomings of “classic” MapReduce
- Classic MapReduce runs into scalability issues at around 4000 nodes and higher
- Is more of a general-purpose framework, of which classic MapReduce is one application

YARN Flow
YARN = YET ANOTHER RESOURCE NEGOTIATOR
Resource Manager
- Cluster-level resource manager
- Long-lived, runs on high-quality hardware
Node Manager
- One per DataNode
- Monitors resources on the DataNode
Application Master
- One per application
- Short-lived
- Manages tasks and scheduling

YARN – How It Works
Protocols:
1.) Client – RM: Submit the App Master
2.) RM – NM: Start the App Master
3.) AM – RM: Request + release containers
4.) AM – NM: Start tasks in containers
[Diagram: YARN Client, YARN Resource Manager, and three Node Managers hosting the AM and the task containers, with arrows labeled 1.) to 4.) for the protocols above]
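
The four protocol steps can be traced with a small in-memory simulation. This is a sketch, not the YARN API: the classes, the slot-based capacity model, and the "most free slots" placement rule are assumptions made only to show the order of the interactions.

```python
class NodeManager:
    def __init__(self, name, slots):
        self.name = name
        self.free = slots    # container slots still available on this node
        self.running = []    # labels of containers started on this node

    def launch(self, label):
        self.running.append(label)


class ResourceManager:
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, count):
        granted = []
        for _ in range(count):
            # Assumed policy: pick the node with the most free slots.
            nm = max(self.node_managers, key=lambda n: n.free)
            if nm.free == 0:
                break  # cluster is full; grant fewer containers than requested
            nm.free -= 1
            granted.append(nm)
        return granted

    def release(self, nm):
        nm.free += 1

    def submit(self, app_name, num_tasks):
        # Step 1 (Client - RM): the client submits the application.
        # Step 2 (RM - NM): the RM picks a node and has it start the AM.
        (am_node,) = self.allocate(1)
        am_node.launch(f"{app_name}/AM")
        return ApplicationMaster(app_name, num_tasks, self)


class ApplicationMaster:
    def __init__(self, app_name, num_tasks, rm):
        self.app_name, self.num_tasks, self.rm = app_name, num_tasks, rm
        self.containers = []

    def run(self):
        # Step 3 (AM - RM): request containers for the tasks.
        self.containers = self.rm.allocate(self.num_tasks)
        # Step 4: start one task in each granted container.
        for i, nm in enumerate(self.containers):
            nm.launch(f"{self.app_name}/task-{i}")
        return self.containers

    def finish(self):
        # Step 3 again (AM - RM): release the task containers.
        for nm in self.containers:
            self.rm.release(nm)
        self.containers = []


nms = [NodeManager("nm1", 2), NodeManager("nm2", 2), NodeManager("nm3", 2)]
rm = ResourceManager(nms)
am = rm.submit("wordcount", 4)
granted = am.run()
free_while_running = sum(nm.free for nm in nms)
am.finish()
free_after_release = sum(nm.free for nm in nms)
```

With three nodes of two slots each, the AM takes one slot and the four tasks take four more, leaving one slot free until the task containers are released.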

YARN Architectural Overview
Scalability targets:
- Clusters of 6,000 to 10,000 machines
- Each machine with 16 cores, 48 GB/96 GB RAM, and 24 TB/36 TB of hard disk
- 100,000+ concurrent tasks
- 10,000 concurrent jobs
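
To get a feel for what these targets add up to, the per-machine figures can be multiplied out for the lower and upper ends of the range. The helper name `cluster_totals` is hypothetical, and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB) are assumed for simplicity.

```python
def cluster_totals(machines, cores, ram_gb, disk_tb, concurrent_tasks):
    # Multiply per-machine figures out to cluster-wide totals.
    return {
        "cores": machines * cores,
        "ram_tb": machines * ram_gb / 1000,      # decimal units assumed
        "disk_pb": machines * disk_tb / 1000,    # decimal units assumed
        "tasks_per_machine": concurrent_tasks / machines,
    }


# Upper end: 10,000 machines with 96 GB RAM and 36 TB disk each.
upper = cluster_totals(10_000, 16, 96, 36, 100_000)
# Lower end: 6,000 machines with 48 GB RAM and 24 TB disk each.
lower = cluster_totals(6_000, 16, 48, 24, 100_000)
```

At the upper end this comes to 160,000 cores, 960 TB of RAM, and 360 PB of raw disk, with an average of 10 concurrent tasks per machine.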

YARN Architectural Overview (Contd.)
YARN splits up the two major functions of the JobTracker:
- Global Resource Manager: cluster resource management.
- Application Master: job scheduling and monitoring (one per application). The Application Master negotiates resource containers from the Scheduler, tracks their status, and monitors their progress. The Application Master itself runs as a normal container.
The TaskTracker is replaced by:
- NodeManager (NM): a new per-node slave responsible for launching the applications’ containers, monitoring their resource usage (CPU, memory, disk, network), and reporting to the Resource Manager.
YARN maintains compatibility with existing MapReduce applications and users.
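
The NodeManager's monitor-and-report role can be sketched as a periodic heartbeat that summarises container usage for the Resource Manager. The report fields, class names, and method names below are hypothetical, only memory and cores are tracked, and nothing here reflects the actual YARN wire protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    container_id: str
    memory_mb: int   # memory currently used by this container
    vcores: int      # cores currently used by this container


@dataclass
class NodeManager:
    node_id: str
    total_memory_mb: int
    total_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, container):
        self.containers.append(container)

    def heartbeat(self):
        # Summarise per-container usage into one report for the Resource Manager.
        used_mem = sum(c.memory_mb for c in self.containers)
        used_cores = sum(c.vcores for c in self.containers)
        return {
            "node": self.node_id,
            "used_memory_mb": used_mem,
            "used_vcores": used_cores,
            "free_memory_mb": self.total_memory_mb - used_mem,
            "free_vcores": self.total_vcores - used_cores,
        }


class ResourceManager:
    def __init__(self):
        self.node_reports = {}  # node id -> latest heartbeat report

    def on_heartbeat(self, report):
        self.node_reports[report["node"]] = report

    def cluster_free_memory_mb(self):
        return sum(r["free_memory_mb"] for r in self.node_reports.values())


nm1 = NodeManager("nm1", total_memory_mb=49152, total_vcores=16)
nm2 = NodeManager("nm2", total_memory_mb=49152, total_vcores=16)
nm1.launch(Container("c1", memory_mb=2048, vcores=1))
nm1.launch(Container("c2", memory_mb=4096, vcores=2))

rm = ResourceManager()
for nm in (nm1, nm2):
    rm.on_heartbeat(nm.heartbeat())
```

The Resource Manager only sees the aggregated reports, which is what lets it make cluster-wide decisions without tracking individual tasks.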
