Setting up a HADOOP 2.2 Cluster on RHEL / CentOS 6
This article presents the steps to create a HADOOP 2.2 cluster on VMware Workstation 8/9/10. The installation process has three parts:
1. Clone and configure Virtual Machines for setup
2. Install and configure Java and HADOOP software on Master node
3. Copy Master node VM configuration to slave nodes
Let us start with the cluster configuration. We need at least three virtual machines: one master node and two slave nodes. All VMs have the same configuration, as follows.
Processor – 2 CPU (dual core)
RAM – 2 GB
HDD – 100 GB
NIC – Virtual NIC
Virtual Machine (VM) Configuration
Create a virtual machine and install RHEL 6.2 on it. Following is the initial configuration done for this VM.
Hostname node1
IP Address 192.168.1.15
MAC Address 00:0C:29:11:66:D3
Subnet mask 255.255.255.0
Gateway 192.168.1.1
After configuring these settings, make a copy of this VM to use for the other virtual machines. To keep each VM unique, change its MAC address prior to cloning, and after booting, configure the IP address and hostname as per the following table.
Step 1– Clone and configure Virtual Machines for setup
Machine Role MAC Address IP Address Hostname
HADOOP Master Node 00:0C:29:11:66:D3 192.168.1.15 master1
HADOOP Slave Node 1 00:50:56:36:EF:D5 192.168.1.16 slave1
HADOOP Slave Node 2 00:50:56:3B:2E:64 192.168.1.17 slave2
After setting up the first virtual machine, configure the following initial settings.
1. Disabling SELinux
2. Disabling Firewall
3. Host names, IP addresses and MAC addresses
It is good to keep a record of the above for ready reference, as in the table above.
Configure Hosts for IP network communication
# vim /etc/hosts
192.168.1.15 master1
192.168.1.16 slave1
192.168.1.17 slave2
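To confirm that name resolution is working, a quick check like the following can be run on each node (a sketch, assuming the /etc/hosts entries above are in place):
for h in master1 slave1 slave2; do ping -c 1 $h; done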
Create a user hadoop with password-less authentication
A user called hadoop is created on every node, and we log in as "hadoop" for all configuration and management of the HADOOP cluster.
# useradd hadoop
# passwd hadoop
su - hadoop
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
chmod 0600 ~/.ssh/authorized_keys
exit
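Before proceeding, password-less login can be verified from the master (a quick check, assuming the hostnames from the table above):
[hadoop@master1 ~]$ for h in master1 slave1 slave2; do ssh $h hostname; done
Each hostname should print without a password prompt.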
Download Java binaries
Here we install Java from a tar file obtained from oracle.com, rather than using the rpm method.
# wget http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-i586.tar.gz?AuthParam=1386669648_7d41138392c2fe62a5ad481d4696b647
Java Installation using tarball
Java is a prerequisite for installing HADOOP on any system. The recommended Java versions for HADOOP are listed on the Apache foundation website, and we should go with one of those versions.
Following steps explain installation of Java on Linux using a tarball.
cd /opt/
tar xvf JDK_7u45_tar/jdk-7u45-linux-i586.tar.gz
cd jdk1.7.0_45/
alternatives --install /usr/bin/java java /opt/jdk1.7.0_45/bin/java 2
alternatives --config java
Output
[root@master1 opt]# cd jdk1.7.0_45/
[root@master1 jdk1.7.0_45]# alternatives --install /usr/bin/java java /opt/jdk1.7.0_45/bin/java 2
[root@master1 jdk1.7.0_45]# alternatives --config java
There are 3 programs which provide 'java'.
Selection Command
-----------------------------------------------
*+ 1 /usr/lib/jvm/jre-1.6.0-openjdk/bin/java
2 /usr/lib/jvm/jre-1.5.0-gcj/bin/java
3 /opt/jdk1.7.0_45/bin/java
Enter to keep the current selection[+], or type selection number: 3
[root@master1 jdk1.7.0_45]# ll /etc/alternatives/java
lrwxrwxrwx 1 root root 25 Dec 10 16:03 /etc/alternatives/java -> /opt/jdk1.7.0_45/bin/java
[root@master1 jdk1.7.0_45]#
[root@master1 jdk1.7.0_45]# java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Client VM (build 24.45-b08, mixed mode)
[root@master1 jdk1.7.0_45]# export JAVA_HOME=/opt/jdk1.7.0_45/
[root@master1 jdk1.7.0_45]# export JRE_HOME=/opt/jdk1.7.0_45/jre
[root@master1 jdk1.7.0_45]# export PATH=$PATH:/opt/jdk1.7.0_45/bin:/opt/jdk1.7.0_45/jre/bin
[root@master1 jdk1.7.0_45]#
Configure Java PATH
export JAVA_HOME=/opt/jdk1.7.0_45/
export JRE_HOME=/opt/jdk1.7.0_45/jre
export PATH=$PATH:/opt/jdk1.7.0_45/bin:/opt/jdk1.7.0_45/jre/bin
After installing Java, its path needs to be persistent across reboots. The above settings can be appended to /etc/profile so that they are common to all users.
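A minimal sketch of appending these settings to /etc/profile (run as root; paths as installed above):
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/opt/jdk1.7.0_45/
export JRE_HOME=/opt/jdk1.7.0_45/jre
export PATH=$PATH:/opt/jdk1.7.0_45/bin:/opt/jdk1.7.0_45/jre/bin
EOF
source /etc/profile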
Installing HADOOP binaries
The "/opt" directory in Linux is provided for 3rd party applications.
# cd /opt/
# wget http://hadoop-2.2.....tar.gz
# tar -xzf hadoop-2.2....tar.gz
# mv hadoop-2.2.0... hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/
[root@master1 ~]# ll /opt/
total 12
drwxr-xr-x 11 hadoop hadoop 4096 Jun 26 02:31 hadoop
[hadoop@master1 ~]$ ll /opt/hadoop/
total 2680
drwxr-xr-x 2 hadoop hadoop 4096 Jun 27 02:14 bin
drwxr-xr-x 3 hadoop hadoop 4096 Oct 6 2013 etc
-rwxrw-rw- 1 hadoop hadoop 2679682 Jun 26 02:29 hadoop-test.jar
drwxr-xr-x 2 hadoop hadoop 4096 Oct 6 2013 include
drwxr-xr-x 3 hadoop hadoop 4096 Oct 6 2013 lib
drwxr-xr-x 2 hadoop hadoop 4096 Jun 12 09:52 libexec
-rw-r--r-- 1 hadoop hadoop 15164 Oct 6 2013 LICENSE.txt
drwxrwxr-x 3 hadoop hadoop 4096 Jun 27 02:38 logs
-rw-r--r-- 1 hadoop hadoop 101 Oct 6 2013 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop 1366 Oct 6 2013 README.txt
drwxr-xr-x 2 hadoop hadoop 4096 May 18 04:55 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Oct 6 2013 share
drwxrwxr-x 4 hadoop hadoop 4096 Jun 26 20:47 tmp
Configure the hadoop cluster using these steps on all nodes:
Log in as user hadoop and edit '~/.bashrc' as follows.
[hadoop@master1 ~]$ pwd
/home/hadoop
[hadoop@master1 ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/opt/jdk1.7.0_45
export HADOOP_INSTALL=/opt/hadoop
export HADOOP_PREFIX=/opt/hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
[hadoop@master1 ~]$
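After saving '~/.bashrc', reload it and confirm the HADOOP binaries are on the PATH (a quick check, not part of the original write-up):
[hadoop@master1 ~]$ source ~/.bashrc
[hadoop@master1 ~]$ hadoop version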
Configuring HADOOP, starting, and viewing status
Change to the folder /opt/hadoop/etc/hadoop.
Edit 'hadoop-env.sh' and set a proper value for JAVA_HOME, such as '/opt/jdk1.7.0_45'.
Do not leave it as ${JAVA_HOME}, as that does not work.
[hadoop@master1 ~]$ cd /opt/hadoop/etc/hadoop/
[hadoop@master1 hadoop]$ cat hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_45
Edit '/opt/hadoop/libexec/hadoop-config.sh' and prepend the following line at the start of the script:
export JAVA_HOME=/opt/jdk1.7.0_45
Create Hadoop tmp directory
Use 'mkdir /opt/hadoop/tmp'
Edit 'core-site.xml' and add the following between <configuration> and </configuration> (fs.default.name is the older key name; it still works in Hadoop 2.2, where fs.defaultFS is the preferred equivalent):
[hadoop@master1 hadoop]$ cat core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
Setup folders for HDFS
Create these folders as user hadoop on all nodes (the namenode directory is used on the master, the datanode directory on the slaves):
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode
cd /opt/hadoop/etc/hadoop
Edit 'hdfs-site.xml'
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/mydata/hdfs/datanode</value>
</property>
<property>
<name>dfs.hosts</name>
<value>/opt/hadoop/etc/hadoop/dfs.include</value>
</property>
</configuration>
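Note that dfs.hosts points to an include file that must exist; with dfs.hosts set, only DataNodes listed in that file may register with the NameNode. A minimal sketch of creating it (assuming slave1 and slave2 are the DataNodes; add master1 if it should also store blocks):
cat > /opt/hadoop/etc/hadoop/dfs.include <<EOF
slave1
slave2
EOF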
Copy mapred-site.xml template using 'cp mapred-site.xml.template mapred-site.xml'
Edit 'mapred-site.xml' as follows:
[hadoop@master1 hadoop]$ cat mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit 'yarn-site.xml' as follows:
[hadoop@master1 hadoop]$ cat yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master1:8040</value>
</property>
</configuration>
Copy Master node VM configuration to slave nodes
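The cloned VMs already carry the same Java and HADOOP installation; if the configuration is later edited on the master only, one way to push it to the slaves is with rsync over the password-less SSH set up earlier (a sketch, paths as per this setup):
[hadoop@master1 ~]$ for h in slave1 slave2; do rsync -az /opt/hadoop/etc/hadoop/ $h:/opt/hadoop/etc/hadoop/; done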
Format the HDFS namenode on the master using 'hdfs namenode -format'. (Only the namenode is formatted; the slave nodes run DataNodes and need no formatting.)
Do the following only on the master machine:
Edit the 'slaves' file so that it contains:
slave1
slave2
Note: If the master is also expected to serve as a datanode (store HDFS files), then add 'master1' to the slaves file as well.
Run the 'start-dfs.sh' and 'start-yarn.sh' commands.
Run 'jps' on the master and verify that 'ResourceManager', 'NameNode' and 'SecondaryNameNode' are running.
Run 'jps' on the slaves and verify that 'NodeManager' and 'DataNode' are running.
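On the master, the 'jps' output will look roughly like the following (PIDs will differ; this listing is illustrative, not captured from the original setup):
[hadoop@master1 ~]$ jps
2649 NameNode
2841 SecondaryNameNode
2986 ResourceManager
3187 Jps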
To stop all HADOOP services, run the 'stop-dfs.sh' and 'stop-yarn.sh' commands.
Web Access URLs for Services
After starting the HADOOP services, you can view and monitor their status using the following URLs.
Access NameNode at http://master1:50070 and ResourceManager at http://master1:8088
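As a final smoke test (a sketch, not part of the original article), create a home directory in HDFS and list the filesystem root:
[hadoop@master1 ~]$ hdfs dfs -mkdir -p /user/hadoop
[hadoop@master1 ~]$ hdfs dfs -ls /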