Setting up a HADOOP 2.2 Cluster on RHEL / CentOS 6
This article presents the steps to create a HADOOP 2.2 cluster on VMware Workstation 8/9/10. Following is an outline of the installation process.
1. Clone and configure Virtual Machines for setup
2. Install and configure Java and HADOOP software on Master node
3. Copy Master node VM configuration to slave nodes
Let us start with the cluster configuration. We need at least three virtual machines: one master node and two slave nodes. All VMs have a similar configuration, as follows.
Processor – 2 CPU (dual core)
RAM – 2 GB
HDD – 100 GB
NIC – Virtual NIC
Virtual Machine (VM) Configuration
Create a virtual machine and install RHEL 6.2 on it. Following is the initial configuration done for this VM.
Hostname node1
IP Address 192.168.1.15
MAC Address 00:0C:29:11:66:D3
Subnet mask 255.255.255.0
Gateway 192.168.1.1
After configuring these settings, make a copy of the VM to be used for the other virtual machines. To keep each VM unique, change its MAC address prior to cloning, and after booting, configure the IP address and hostname as per the following table (example network files are shown after the table).
Step 1 – Clone and configure Virtual Machines for setup
Machine Role MAC Address IP Address Hostname
HADOOP Master Node 00:0C:29:11:66:D3 192.168.1.15 master1
HADOOP Slave Node 1 00:50:56:36:EF:D5 192.168.1.16 slave1
HADOOP Slave Node 2 00:50:56:3B:2E:64 192.168.1.17 slave2
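On RHEL / CentOS 6, the IP address, MAC address, and hostname of a clone can be set in the standard network files. A sketch for slave1 follows; the interface name eth0 is an assumption, since a cloned NIC may be renamed (check /etc/udev/rules.d/70-persistent-net.rules in that case).
In /etc/sysconfig/network-scripts/ifcfg-eth0:
DEVICE=eth0
HWADDR=00:50:56:36:EF:D5
IPADDR=192.168.1.16
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
BOOTPROTO=static
ONBOOT=yes
In /etc/sysconfig/network:
HOSTNAME=slave1
Apply the changes with 'service network restart'.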
After setting up the first virtual machine, configure the following initial settings.
1. Disabling SELinux
2. Disabling Firewall
3. Host names, IP addresses and MAC addresses
Keep a record of the above for ready reference, as given in the table above. Commands for disabling SELinux and the firewall are shown below.
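The following commands disable SELinux and the firewall on RHEL / CentOS 6 (run as root; a lab-only shortcut, not suitable for production):
# setenforce 0
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
# service iptables stop
# chkconfig iptables off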
Configure Hosts for IP network communication
# vim /etc/hosts
192.168.1.15 master1
192.168.1.16 slave1
192.168.1.17 slave2
Create a user hadoop with password-less authentication
Create a user called hadoop on each node, and log in as "hadoop" for all configuration and management of the HADOOP cluster.
# useradd hadoop
# passwd hadoop
# su - hadoop
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2
$ chmod 0600 ~/.ssh/authorized_keys
$ exit
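Before proceeding, confirm as user hadoop that password-less login works; each command should print the remote hostname without prompting for a password:
$ ssh hadoop@slave1 hostname
$ ssh hadoop@slave2 hostname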
Download Java binaries
Here we install Java from a tar file obtained from oracle.com, rather than using the RPM method.
# wget http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-i586.tar.gz?AuthParam=1386669648_7d41138392c2fe62a5ad481d4696b647
Java Installation using tarball
Java is a prerequisite for installing HADOOP on any system. The recommended Java versions for HADOOP are listed on the Apache Foundation website, and we should go with one of those. The following steps explain installation of Java on Linux using a tarball.
cd /opt/
tar xvf JDK_7u45_tar/jdk-7u45-linux-i586.tar.gz
cd jdk1.7.0_45/
alternatives --install /usr/bin/java java /opt/jdk1.7.0_45/bin/java 2
alternatives --config java
Output
[root@master1 opt]# cd jdk1.7.0_45/
[root@master1 jdk1.7.0_45]# alternatives --install /usr/bin/java java /opt/jdk1.7.0_45/bin/java 2
[root@master1 jdk1.7.0_45]# alternatives --config java
There are 3 programs which provide 'java'.
Selection Command
-----------------------------------------------
*+ 1 /usr/lib/jvm/jre-1.6.0-openjdk/bin/java
2 /usr/lib/jvm/jre-1.5.0-gcj/bin/java
3 /opt/jdk1.7.0_45/bin/java
Enter to keep the current selection[+], or type selection number: 3
[root@master1 jdk1.7.0_45]# ll /etc/alternatives/java
lrwxrwxrwx 1 root root 25 Dec 10 16:03 /etc/alternatives/java -> /opt/jdk1.7.0_45/bin/java
[root@master1 jdk1.7.0_45]#
[root@master1 jdk1.7.0_45]# java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) Client VM (build 24.45-b08, mixed mode)
[root@master1 jdk1.7.0_45]# export JAVA_HOME=/opt/jdk1.7.0_45/
[root@master1 jdk1.7.0_45]# export JRE_HOME=/opt/jdk1.7.0_45/jre
[root@master1 jdk1.7.0_45]# export PATH=$PATH:/opt/jdk1.7.0_45/bin:/opt/jdk1.7.0_45/jre/bin
[root@master1 jdk1.7.0_45]#
Configure Java PATH
export JAVA_HOME=/opt/jdk1.7.0_45/
export JRE_HOME=/opt/jdk1.7.0_45/jre
export PATH=$PATH:/opt/jdk1.7.0_45/bin:/opt/jdk1.7.0_45/jre/bin
After installing Java, its path needs to be persistent across reboots. The above settings can be appended to /etc/profile so that they are common to all users.
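One way to persist them (run as root; a heredoc appended to /etc/profile):
# cat >> /etc/profile <<'EOF'
export JAVA_HOME=/opt/jdk1.7.0_45
export JRE_HOME=/opt/jdk1.7.0_45/jre
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
EOF
# source /etc/profile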
Installing HADOOP binaries
The "/opt" directory in Linux is provided for 3rd party applications.
# cd /opt/
[root@master1 hadoop]# wget http://hadoop-2.2.....tar.gz
# tar -xzf hadoop-2.2....tar.gz
# mv hadoop-2.2.0... hadoop
# chown -R hadoop /opt/hadoop
# cd /opt/hadoop/
[root@master1 ~]# ll /opt/
total 12
drwxr-xr-x 11 hadoop hadoop 4096 Jun 26 02:31 hadoop
[hadoop@master1 ~]$ ll /opt/hadoop/
total 2680
drwxr-xr-x 2 hadoop hadoop 4096 Jun 27 02:14 bin
drwxr-xr-x 3 hadoop hadoop 4096 Oct 6 2013 etc
-rwxrw-rw- 1 hadoop hadoop 2679682 Jun 26 02:29 hadoop-test.jar
drwxr-xr-x 2 hadoop hadoop 4096 Oct 6 2013 include
drwxr-xr-x 3 hadoop hadoop 4096 Oct 6 2013 lib
drwxr-xr-x 2 hadoop hadoop 4096 Jun 12 09:52 libexec
-rw-r--r-- 1 hadoop hadoop 15164 Oct 6 2013 LICENSE.txt
drwxrwxr-x 3 hadoop hadoop 4096 Jun 27 02:38 logs
-rw-r--r-- 1 hadoop hadoop 101 Oct 6 2013 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop 1366 Oct 6 2013 README.txt
drwxr-xr-x 2 hadoop hadoop 4096 May 18 04:55 sbin
drwxr-xr-x 4 hadoop hadoop 4096 Oct 6 2013 share
drwxrwxr-x 4 hadoop hadoop 4096 Jun 26 20:47 tmp
Configure the HADOOP cluster using these steps on all nodes:
Log in as user hadoop and edit '~/.bashrc' as follows.
[hadoop@master1 ~]$ pwd
/home/hadoop
[hadoop@master1 ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
export JAVA_HOME=/opt/jdk1.7.0_45
export HADOOP_INSTALL=/opt/hadoop
export HADOOP_PREFIX=/opt/hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
[hadoop@master1 ~]$
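Reload the profile and confirm that the hadoop binary resolves from the new PATH (run as user hadoop on each node):
[hadoop@master1 ~]$ source ~/.bashrc
[hadoop@master1 ~]$ hadoop version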
Configuring HADOOP, starting, and viewing status
Change directory to /opt/hadoop/etc/hadoop
Edit 'hadoop-env.sh' and set a proper value for JAVA_HOME, such as '/opt/jdk1.7.0_45'.
Do not leave it as ${JAVA_HOME}, as that does not work.
[hadoop@master1 ~]$ cd /opt/hadoop/etc/hadoop/
[hadoop@master1 hadoop]$ cat hadoop-env.sh
export JAVA_HOME=/opt/jdk1.7.0_45
Edit '/opt/hadoop/libexec/hadoop-config.sh' and prepend the following line at the start of the script:
export JAVA_HOME=/opt/jdk1.7.0_45
Create the Hadoop tmp directory using 'mkdir /opt/hadoop/tmp'.
Edit 'core-site.xml' and add the following between <configuration> and </configuration>:
[hadoop@master1 hadoop]$ cat core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
Set up folders for HDFS
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode
cd /opt/hadoop/etc/hadoop
Edit 'hdfs-site.xml':
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/mydata/hdfs/datanode</value>
</property>
<property>
<name>dfs.hosts</name>
<value>/opt/hadoop/etc/hadoop/dfs.include</value>
</property>
</configuration>
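The dfs.hosts property above points to an include file that lists the hosts permitted to connect as DataNodes, so that file must exist. A minimal '/opt/hadoop/etc/hadoop/dfs.include' for this cluster (add master1 as well only if it will also store HDFS data):
slave1
slave2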
Copy the mapred-site.xml template using 'cp mapred-site.xml.template mapred-site.xml'.
Edit 'mapred-site.xml' as follows:
[hadoop@master1 hadoop]$ cat mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit 'yarn-site.xml' as follows:
[hadoop@master1 hadoop]$ cat yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master1:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master1:8040</value>
</property>
</configuration>
Copy Master node VM configuration to slave nodes
Format the HDFS NameNode using 'hdfs namenode -format' (run as user hadoop; only master1, which hosts the NameNode, needs to be formatted).
Do the following only on the master machine:
Edit the 'slaves' file so that it contains:
slave1
slave2
Note: If the master is also expected to serve as a DataNode (store HDFS files), then add 'master1' to the slaves file as well. A sketch for syncing the configuration to the slaves follows.
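If the slave VMs were cloned before the edits above, one way to push the updated configuration from master1 (assumes user hadoop owns /opt/hadoop on every node and password-less SSH is in place):
for host in slave1 slave2; do
  scp /opt/hadoop/etc/hadoop/* hadoop@$host:/opt/hadoop/etc/hadoop/
done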
- Run 'start-dfs.sh' and 'start-yarn.sh'.
- Run 'jps' on the master and verify that 'ResourceManager', 'NameNode' and 'SecondaryNameNode' are running.
- Run 'jps' on the slaves and verify that 'NodeManager' and 'DataNode' are running.
To stop all HADOOP services, run 'stop-dfs.sh' and 'stop-yarn.sh'.
Web Access URLs for Services
After starting the HADOOP services, you can view and monitor their status using the following URLs.
Access the NameNode at http://master1:50070 and the ResourceManager at http://master1:8088
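As a quick smoke test of the running cluster (as user hadoop on master1; the path /test is an arbitrary example):
$ hdfs dfs -mkdir /test
$ hdfs dfs -ls /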