Hadoop2.2

HADOOP 2.2
INTRODUCTION AND INSTALLATION

Sreejith
Oct, 2013

What is new in hadoop 2.2 ?
• Update to the MapReduce framework to
Apache YARN
• MapReduce is a big feature in Hadoop—the
batch processor that lines up search jobs that
go into the Hadoop distributed file system
(HDFS) to pull out useful information. In the
previous version of MapReduce, jobs could
only be done one at a time, in batches,
because that's how the Java-based
MapReduce tool worked.

• Its will enable multiple search tools to hit the
data within the HDFS storage system at the
same time
• YARN does is divide the functionality of
MapReduce even further,
– JobTracker component—resource
management and job
– scheduling/monitoring—into separate
applications

• With MapReduce 2.0, developers can now
build apps directly within Hadoop, instead of
bolting them on from the outside, as many
third-party vendor tools have had to do in
Hadoop 1.0. This essentially will establish
Hadoop 2.0 as a platform into which
developers can create applications that will
search for an manipulate data far more
efficiently.

• YARN is the biggest change in the new
version of Hadoop,
– high availability for HDFS,
– HDFS snapshots
– support for the NFSv3 filesystem to access
data in HDFS

• Hadoop 2.2 is now officially supported on
Microsoft Window

YARN/MapReduce 2.0 architecture
Node
Manager
AppMaster

Container

Client
Node
Manager

Resource
Manager
Client

AppMaster

Container

Node
Manager

Container

Container

YARN/MapReduce 2.0 architecture
Detail of Figure
Mapraduce
Job Submission
Node Status
Resource Request

Single node cluster setup
• Prerequisites:
–
–
–

Java 6 installed
Dedicated user for hadoop
SSH configured

• You can download tarball for hadoop 2.2 from
– http://mirror.metrocast.net/apache/hadoop/common/stable2/

– Extract it to a folder say, /home/hduser/yarn.
We assume dedicated user for Hadoop is
“hduser”.

•

• After download the file justExtract it to a folder
say, /home/hadoop/yarn We assume
dedicated user for Hadoop is “hadoop”.
– $ tar -xvzf hadoop-2.2.0.tar.gz
– $ mv hadoop-2.2.0 /home/hadoop/yarn/hadoop2.2.0
– $ cd /home/hadoop/yarn
– $ sudo chown -R hadoop:hadoop hadoop-2.2.0
– $ sudo chmod -R 755 hadoop-2.2.0

• Setup Environment Variables in ~/.bashrc
– export HADOOP_HOME=$HOME/Programs/Hadoop/hadoop-2.2.0
– export HADOOP_MAPRED_HOME=$HOME/Programs/Hadoop/hadoop2.2.0
– export HADOOP_COMMON_HOME=$HOME/Programs/Hadoop/hadoop2.2.0
– export HADOOP_HDFS_HOME=$HOME/Programs/Hadoop/hadoop2.2.0
– export YARN_HOME=$HOME/Programs/Hadoop/hadoop-2.2.0
– export HADOOP_CONF_DIR=$HOME/Programs/Hadoop/hadoop2.2.0/etc/hadoop

• After Adding these lines at bottom of the
.bashrc file
– $ source ~/.bashrc

• Create Hadoop Data Directories
# Two Directories for name node and datanode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
–
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

•

Configuration
– $ cd $YARN_HOME
– $ vi etc/hadoop/yarn-site.xml
– Edit the yarn-site.xml

• Add the following contents inside
configuration tag
# etc/hadoop/yarn-site.xml .
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

• $ vi etc/hadoop/core-site.xml
• Add the following contents inside
configuration tag
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

• $ vi etc/hadoop/hdfs-site.xml
• Add the following contents inside configuration tag
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>

• $ vi etc/hadoop/mapred-site.xml
• If this file does not exist, create it and paste
the content provided below:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

• Format namenode(Onetime Process)
– $ bin/hadoop namenode -format

• Starting HDFS processes and Map-Reduce
Process
# HDFS(NameNode & DataNode).

– $ sbin/hadoop-daemon.sh start namenode
– $ sbin/hadoop-daemon.sh start datanode
# MR(Resource Manager, Node Manager & Job History Server).

– $ sbin/yarn-daemon.sh start resourcemanager
– $ sbin/yarn-daemon.sh start nodemanager
– $ sbin/mr-jobhistory-daemon.sh start historyserver

• Verifying Installation
$ jps
# Console Output.

22844 Jps
28711 DataNode
29281 JobHistoryServer
28887 ResourceManager
29022 NodeManager
28180 NameNode

• Running Word count Example Program
$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
• Add input directory to HDFS
$ bin/hadoop hdfs -copyFromLocal input /input

• Run wordcount example jar provided in
HADOOP_HOME:
$ bin/hadoop jar
share/hadoop/mapreduce/hadoop-mapreduceexamples-2.2.0.jar wordcount /input /output
• Check the output:
$ bin/hadoop dfs -cat /out/*
This 2
Another 1
is 2
line 1
one 2

• Web interface
• Browse HDFS and check health using
http://localhost:50070 in the browser:

• You can check the status of the applications
running using the following
URL:http://localhost:8088
•

Hadoop2.2

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Hadoop2.2

Similar to Hadoop2.2 (20)

Recently uploaded

Recently uploaded (20)

Hadoop2.2