Hadoop YARN is a specific component of the open source Hadoop platform for big data analytics.
YARN stands for “Yet Another Resource Negotiator”. YARN was introduced to make the most out of HDFS.
Job scheduling is also handled by YARN.
Hadoop 1.0
1. Limited to 4,000 nodes per cluster.
2. Scales as O(number of tasks in a cluster).
3. The JobTracker is a bottleneck.
4. It has only one namespace for handling HDFS.
5. It can run only one kind of job: MapReduce.
Hadoop 2.0
1. Supports up to 10,000 nodes per cluster.
2. Scales as O(cluster size).
3. YARN provides efficient cluster utilization.
4. It supports multiple namespaces for handling HDFS.
5. Any application can integrate with Hadoop.
Resource Manager
The Resource Manager manages the global assignment of compute resources to applications. It consists of two main services:
Scheduler: a pluggable service that manages and enforces the scheduling policy in the cluster.
Applications Manager: manages the running Application Masters in the cluster and is responsible for monitoring them and restarting them on different nodes if they fail.
Application Master:
Responsible for negotiating resources from the Resource Manager and for working with the Node Managers to execute and monitor tasks.
Node Manager:
Manages the user processes on its machine.
Containers:
A container is a conceptual entity: a certain amount of resources on a given machine, granted to run a component task.
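To make the container idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the YARN API: the class names `Container` and `ToyNode`, the fields `memory_mb` and `vcores`, and the node sizes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Container:
    """A fixed slice of one machine's resources (hypothetical fields)."""
    container_id: int
    node: str
    memory_mb: int
    vcores: int


@dataclass
class ToyNode:
    """One machine, as a Node Manager might account for it (toy model)."""
    name: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def free(self):
        """Return (memory_mb, vcores) not yet handed out as containers."""
        used_mem = sum(c.memory_mb for c in self.containers)
        used_cores = sum(c.vcores for c in self.containers)
        return self.memory_mb - used_mem, self.vcores - used_cores

    def launch(self, container_id, memory_mb, vcores):
        """Carve a container out of the free resources, or refuse."""
        free_mem, free_cores = self.free()
        if memory_mb > free_mem or vcores > free_cores:
            raise ValueError(f"{self.name} cannot fit the container")
        container = Container(container_id, self.name, memory_mb, vcores)
        self.containers.append(container)
        return container


# Assumed node size: 8 GB of memory and 4 virtual cores.
node = ToyNode("node-1", memory_mb=8192, vcores=4)
first = node.launch(1, memory_mb=2048, vcores=1)
print(first, node.free())
```

The point of the sketch is only that a container is bookkeeping over a machine's resources; a request that does not fit is refused rather than overcommitted.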
1. Interaction between client and Resource Manager
The client sends a new application request to the Resource Manager.
The Resource Manager responds with a unique application ID and information about the cluster.
The client constructs an application submission context and submits it to the Resource Manager.
The client can query the Resource Manager for application reports.
The client can "force kill" an application.
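The exchange above can be sketched as a toy in-memory model. This is not the YARN client API: `ToyResourceManager`, its method names, the `app-N` ID format, the state names, and the contents of the submission context are all hypothetical and only mirror the order of the messages.

```python
import itertools


class ToyResourceManager:
    """In-memory stand-in for the client-facing side of the Resource Manager."""

    def __init__(self, node_count):
        self._sequence = itertools.count(1)
        self.node_count = node_count
        self.states = {}    # application ID -> simplified state
        self.contexts = {}  # application ID -> submission context

    def new_application(self):
        """Answer a new application request with a unique ID and cluster info."""
        app_id = f"app-{next(self._sequence)}"
        self.states[app_id] = "NEW"
        return app_id, {"nodes": self.node_count}

    def submit(self, app_id, context):
        """Accept the application submission context for a known ID."""
        if self.states.get(app_id) != "NEW":
            raise ValueError(f"{app_id} is unknown or already submitted")
        self.contexts[app_id] = context
        self.states[app_id] = "SUBMITTED"

    def report(self, app_id):
        """Answer a client query about an application."""
        return self.states[app_id]

    def kill(self, app_id):
        """Force-kill an application."""
        self.states[app_id] = "KILLED"


rm = ToyResourceManager(node_count=3)
app_id, cluster_info = rm.new_application()
# The context fields below are made up for the example.
rm.submit(app_id, {"name": "sensor-batch", "am_memory_mb": 1024})
state_after_submit = rm.report(app_id)
rm.kill(app_id)
print(app_id, cluster_info, state_after_submit, rm.report(app_id))
```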
2. Interaction between Resource Manager and Application Master
The Resource Manager contacts a Node Manager to launch the Application Master, which then registers itself with the Resource Manager.
The Application Master calculates the resources it needs and requests them.
The Application Master sends a resource allocation request to the Resource Manager, which sends back an allocation response.
Finally, the Application Master sends a finish message to the Resource Manager.
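The register, allocate, finish sequence can be sketched with a toy allocator. It assumes a fixed pool of identical containers, which is a deliberate simplification; `ToyRMAllocator` and its methods are hypothetical names, not YARN calls.

```python
class ToyRMAllocator:
    """Resource Manager side of the exchange, over a pool of identical containers."""

    def __init__(self, free_containers):
        self.free_containers = free_containers
        self.held = {}  # Application Master name -> containers it holds

    def register(self, master):
        """Record a newly launched Application Master."""
        self.held[master] = 0

    def allocate(self, master, requested):
        """Answer an allocation request with as many containers as are free."""
        if master not in self.held:
            raise KeyError(f"{master} has not registered")
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        self.held[master] += granted
        return granted

    def finish(self, master):
        """Handle the finish message: release everything the master held."""
        self.free_containers += self.held.pop(master)


rm = ToyRMAllocator(free_containers=5)
rm.register("am-1")
first_grant = rm.allocate("am-1", 3)   # fully satisfied
second_grant = rm.allocate("am-1", 4)  # only partly satisfied: the pool is nearly empty
rm.finish("am-1")
print(first_grant, second_grant, rm.free_containers)
```

The allocation response may grant fewer containers than requested, which is why the Application Master in this model has to look at the response rather than assume its request was met.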
3. Interaction between Application Master and Node Manager
The Application Master asks the Node Manager hosting each container to start that container.
The Application Master can request and receive a container status report from the Node Manager.
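A minimal sketch of this exchange, again as a toy model rather than the YARN API. `ToyNodeManager`, `launch_all`, the container IDs, the state names, and the command string are all hypothetical.

```python
class ToyNodeManager:
    """Tracks the containers started on one machine."""

    def __init__(self, name):
        self.name = name
        self.statuses = {}  # container ID -> status record

    def start_container(self, container_id, command):
        """Start a container on this node at the Application Master's request."""
        if container_id in self.statuses:
            raise ValueError(f"{container_id} was already started")
        self.statuses[container_id] = {
            "command": command, "state": "RUNNING", "exit_code": None,
        }

    def mark_complete(self, container_id, exit_code):
        """Record that the process in a container has exited."""
        self.statuses[container_id].update(state="COMPLETE", exit_code=exit_code)

    def container_status(self, container_id):
        """Return a copy of the status report for one container."""
        return dict(self.statuses[container_id])


def launch_all(allocation, node_managers, command):
    """Application Master side: contact the Node Manager hosting each container."""
    for container_id, node_name in allocation.items():
        node_managers[node_name].start_container(container_id, command)


nodes = {name: ToyNodeManager(name) for name in ("node-1", "node-2")}
allocation = {"c1": "node-1", "c2": "node-2", "c3": "node-1"}
launch_all(allocation, nodes, command="python mapper.py")
nodes["node-1"].mark_complete("c1", exit_code=0)
print(nodes["node-1"].container_status("c1"))
print(nodes["node-2"].container_status("c2"))
```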
Use of MapReduce
Batch analysis is done to aggregate data on various timescales.
The data collector retrieves the sensor data stored in Hadoop.
The map program reads the data from standard input (stdin) and splits each record into a timestamp and the individual sensor readings.
The map program emits key-value pairs in which the key is a portion of the timestamp and the value is a comma-separated string of sensor readings.
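A map program of this kind might look like the following sketch. It assumes each record is a quoted timestamp followed by comma-separated readings, and that the key is the first 13 characters of the timestamp (date and hour); the function names and the `key_length` parameter are illustrative, not taken from the original program.

```python
import io


def map_line(line, key_length=13):
    """Turn one raw record into a (key, value) pair, or None if it is malformed."""
    line = line.strip()
    if not line:
        return None
    timestamp, _, readings = line.partition(",")
    timestamp = timestamp.strip('"')
    if not readings:
        return None
    # The key is a prefix of the timestamp; 13 characters keeps date and hour.
    return timestamp[:key_length], readings


def run_mapper(lines, out):
    """Emit one tab-separated key/value line per valid input record."""
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            key, value = pair
            out.write(f"{key}\t{value}\n")


# Stand-in for stdin, so the sketch runs without a Hadoop cluster.
sample = io.StringIO(
    '"2014-04-29 10:15:32",37,44,31,6\n'
    '"2014-04-30 10:15:32",84,58,23,2\n'
)
result = io.StringIO()
run_mapper(sample, result)
print(result.getvalue(), end="")
```

In an actual streaming job the same function would be called as `run_mapper(sys.stdin, sys.stdout)`. Passing a shorter `key_length` to `map_line` (7 for the month, 10 for the day) changes the timescale of the aggregation.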
The key-value pairs emitted by the map program are shuffled to the reducer and grouped by key.
The reducer reads the key-value pairs grouped by the same key from standard input and computes the means of the temperature, humidity, light and CO readings.
The raw sensor readings, along with their timestamps:
"2014-04-29 10:15:32",37,44,31,6
"2014-04-30 10:15:32",84,58,23,2
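The reduce step can be sketched as follows. It assumes the input arrives sorted by key, one tab-separated key/value pair per line, and that the readings are in the order temperature, humidity, light, CO. The function names and the two-decimal output format are illustrative choices, and the demo assumes the map step used a month-level key so that both raw records land in one group.

```python
import io
from itertools import groupby


def parse(line):
    """Split a 'key<TAB>v1,v2,...' line into the key and a list of floats."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, [float(x) for x in value.split(",")]


def run_reducer(lines, out):
    """Write the per-key mean of every reading column."""
    records = (parse(line) for line in lines if line.strip())
    # groupby only merges adjacent records, so the input must be sorted by key.
    for key, group in groupby(records, key=lambda record: record[0]):
        rows = [readings for _, readings in group]
        means = [sum(column) / len(rows) for column in zip(*rows)]
        out.write(key + "\t" + ",".join(f"{m:.2f}" for m in means) + "\n")


# Stand-in for stdin: the two raw records above, keyed by month.
shuffled = io.StringIO(
    "2014-04\t37,44,31,6\n"
    "2014-04\t84,58,23,2\n"
)
output = io.StringIO()
run_reducer(shuffled, output)
print(output.getvalue(), end="")
```

In an actual streaming job the same function would be called as `run_reducer(sys.stdin, sys.stdout)`.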