HADOOP SINGLE NODE INSTALLATION ON UBUNTU 14.04
PREREQUISITES
* Java (version 1.6.0 or above) should be installed
[ If Java is not installed, you can try any of these methods to install it.
Method 1:
To install the OpenJDK JDK and JRE 8, use (replace 8 with the version you want, such as 7 or 6):
sudo apt-get install openjdk-8-jdk
Method 2:
If you instead want to install the official Oracle JDK and JRE through apt-get, then run (you can replace the 8 with other versions, such as 9 or 7):
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer ]
* SSH should be installed and sshd must be running.
[ If ssh is not installed, you can run the following command to install it:
sudo apt-get install openssh-server
Check ssh using the following commands after installing:
which ssh
output should be /usr/bin/ssh
which sshd
output should be /usr/sbin/sshd ]
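You can also confirm that the sshd daemon is actually running; on Ubuntu 14.04 the service is named ssh, so a quick check (and, if needed, a start) looks like this:
sudo service ssh status
sudo service ssh start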
HADOOP USER CREATION
user@node:~$ sudo addgroup hadoop
[sudo] password for user:
Adding group `hadoop' (GID 1001) ...
Done.
user@node:~$ sudo adduser --ingroup hadoop hdpuser
Adding user `hdpuser' ...
Adding new user `hdpuser' (1001) with group `hadoop' ...
Creating home directory `/home/hdpuser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hdpuser
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n]
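To double-check that the new account landed in the intended group, you can inspect it with id (the numeric uid/gid values will differ from machine to machine):
id hdpuser
The output should look something like: uid=1001(hdpuser) gid=1001(hadoop) groups=1001(hadoop)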
SWITCH TO SUPER USER TO ADD THE HADOOP USER TO THE SUDOERS FILE
Switch to root user - su root
Add the hadoop user to the sudoers list by adding the below entry in the file /etc/sudoers
hdpuser ALL=(ALL:ALL) ALL
(under # User privilege specification
root ALL=(ALL:ALL) ALL )
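A safer way to make the same change is through visudo, which opens /etc/sudoers in an editor and checks the syntax before saving, so a typo cannot lock you out of sudo:
sudo visudo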
Switch to the hadoop user - su hdpuser
VERIFY JAVA INSTALLATION
hdpuser@node:~$ java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
hdpuser@node:~$ update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
  Selection    Path                                             Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072       auto mode
  1            /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java   1071       manual mode
* 2            /usr/lib/jvm/java-7-oracle/jre/bin/java          1072       manual mode
Press enter to keep the current choice[*], or type selection number:
hdpuser@node:~$
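If you are not sure which directory to use as JAVA_HOME in the next step, you can resolve it from the active java binary; for example:
readlink -f $(which java)
This prints something like /usr/lib/jvm/java-7-oracle/jre/bin/java; JAVA_HOME is that path with the trailing /jre/bin/java (or /bin/java) removed.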
UPDATE JAVA VARIABLES IN THE ~/.BASHRC FILE
Add the below entry in the ~/.bashrc file
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin
source the .bashrc file using the command
source .bashrc
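As a quick sanity check that the new variables are in effect in the current shell, you can run:
echo $JAVA_HOME
$JAVA_HOME/bin/java -version
The first command should print /usr/lib/jvm/java-7-oracle and the second should report the same Java version seen earlier.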
VERIFY SSH INSTALLATION
hdpuser@node:~$ which ssh
/usr/bin/ssh
hdpuser@node:~$ which sshd
/usr/sbin/sshd
SSH KEY GENERATION
hdpuser@node:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hdpuser/.ssh/id_rsa):
Created directory '/home/hdpuser/.ssh'.
Your identification has been saved in /home/hdpuser/.ssh/id_rsa.
Your public key has been saved in /home/hdpuser/.ssh/id_rsa.pub.
The key fingerprint is:
da:4c:9a:89:bb:02:ac:7e:00:70:16:11:bc:fa:49:5e hdpuser@node
The key's randomart image is:
+--[ RSA 2048]----+
| .++ |
|. + |
|.o . |
|. . |
|o. S |
|oo. E. O |
|.=.o. = o |
|. =. . |
|....o. |
+-----------------+
hdpuser@node:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
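Hadoop's start and stop scripts log in to the local node over ssh, so it is worth verifying that key-based login to localhost now works without a password (accept the host-key prompt the first time, then exit the session):
ssh localhost
exit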
DOWNLOADING AND INSTALLING HADOOP
[ Hadoop can be downloaded using the link below if you don't have the package in your system
wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz]
hdpuser@node:~$ cd /home/user/Documents/
hdpuser@node:/home/user/Documents$ sudo mv hadoop-2.6.0.tar.gz /usr/local/
[sudo] password for hdpuser:
hdpuser@node:/home/user/Documents$ cd /usr/local/
hdpuser@node:/usr/local$ sudo tar xvzf hadoop-2.6.0.tar.gz
hdpuser@node:/usr/local$ sudo chown -R hdpuser:hadoop hadoop-2.6.0
hdpuser@node:/usr/local$ sudo ln -s hadoop-2.6.0 hadoop
Add the below entries in the ~/.bashrc file and source the .bashrc file
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
hdpuser@node:/usr/local$ hadoop version
Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /usr/local/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar
SETTING UP HADOOP ENVIRONMENT VARIABLES
— You can set Hadoop environment variables by appending the following commands to
~/.bashrc file.
— export JAVA_HOME=/usr/lib/jvm/java-7-oracle
— export HADOOP_HOME=/usr/local/hadoop
— export HADOOP_MAPRED_HOME=$HADOOP_HOME
— export HADOOP_COMMON_HOME=$HADOOP_HOME
— export HADOOP_HDFS_HOME=$HADOOP_HOME
— export YARN_HOME=$HADOOP_HOME
— export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
— export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
— export HADOOP_INSTALL=$HADOOP_HOME
— Now apply all the changes into the current running system.
$ source ~/.bashrc
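— You can confirm the environment is picked up before moving on; for example:
$ echo $HADOOP_HOME
$ which hadoop
The first should print /usr/local/hadoop and the second /usr/local/hadoop/bin/hadoop.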
HADOOP CONFIGURATION
— Next we need to configure some of the Hadoop files, namely:
— hadoop-env.sh
— core-site.xml
— hdfs-site.xml
— mapred-site.xml
These files are located in $HADOOP_HOME/etc/hadoop
— hadoop-env.sh
— In this file, add the following line to define the Java home
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
— mapred-site.xml
— This file may not be present with the same name. In that case we need to
first copy this file from the template file
— cp mapred-site.xml.template mapred-site.xml
— Then add the following property within the configuration tags
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If "local", then
jobs are run in-process as a single map and reduce task.
</description>
</property>
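— For reference, a minimal mapred-site.xml after this change would look roughly as follows (the description element is optional and omitted here):
<?xml version="1.0"?>
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>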
— core-site.xml
— Add the following property within the configuration tags
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose scheme and
authority determine the FileSystem implementation. The uri's scheme determines
the config property (fs.SCHEME.impl) naming the FileSystem implementation
class.The uri's authority is used to determine the host, port, etc. for a
filesystem.</description>
</property>
— hdfs-site.xml
— We need to create a couple of directories that would be used by the
namenode and the datanode in the Hadoop cluster.
— $ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
— $ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hdpuser:hadoop /usr/local/hadoop_store
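— Before editing hdfs-site.xml, you can verify that both directories exist and are owned by hdpuser; for example:
$ ls -ld /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode
Both lines of output should show hdpuser and hadoop as the owner and group.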
— Next we add the following properties within the configuration tags
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
— Once the Hadoop configuration is over, we need to format the Namenode.
FORMATTING THE NAMENODE
— The Hadoop system can be formatted by the following command:
— hadoop namenode -format
— The Namenode should be successfully formatted before proceeding further.
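— Note: on Hadoop 2.x the hadoop namenode command still works but prints a deprecation warning; the equivalent, non-deprecated form of the same step is:
— hdfs namenode -format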
— Next we need to start the Hadoop Daemons which run as individual Java services.
START THE HADOOP DAEMONS
— Hadoop provides a set of scripts to start and stop the Daemons.
— To start the DFS Daemons, issue the following command in the terminal:
— start-dfs.sh
— To start the Yarn Daemons, issue the following command in the terminal:
— start-yarn.sh
— Hadoop installation can be verified by checking if all the Daemons are running
successfully.
VERIFYING HADOOP INSTALLATION
— Since all the Daemons are Java processes, issue the following command on the terminal:
— $ jps
— It should list the following processes:
— NameNode
— SecondaryNameNode
— DataNode
— NodeManager
— ResourceManager
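— On a single node setup, the jps listing typically looks something like the following (the process IDs will differ on your machine):
2287 NameNode
2422 DataNode
2630 SecondaryNameNode
2789 ResourceManager
2920 NodeManager
3050 Jps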
— Hadoop Namenode and ResourceManager can be monitored using the web interfaces.
HADOOP WEB INTERFACES
— Usually used by Hadoop Administrators.
— For NameNode:
— http://HadoopMaster:50070
— For ResourceManager:
— http://HadoopMaster:8088
— For Secondary NameNode:
— http://HadoopMaster:50090
— For DataNode:
— http://HadoopMaster:50075
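— On a single node installation these interfaces are served from the local machine, so HadoopMaster can be replaced with localhost. A quick reachability check for the NameNode UI from the terminal (assuming curl is installed) is:
— curl -s http://localhost:50070 | head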
Prepared by
Jiju K Joseph, AP/CSE
Asan Memorial College of Engg. & Tech