2. Profile
Ankit Desai
Ph.D. Scholar, IET, Ahmedabad University
Education: M. Tech. (C.E.), B. E. (I. T.)
Experience: 8 years (Academic and Research)
Research Interest: IoT, Big Data Analytics,
Machine Learning, Data Mining, Algorithms.
3. Install Ubuntu
Ubuntu 14.04.2 LTS
Download Source
http://www.ubuntu.com/download/desktop
64-bit OS vs. 32-bit OS
ubuntu-14.04.2-desktop-amd64.iso file (64-bit)
or
ubuntu-14.10-desktop-i386.iso file (32-bit)
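To check which ISO your machine needs, one quick way (a generic sketch, not from the slides) is:

```shell
# Prints the machine architecture: x86_64 means the 64-bit (amd64) ISO,
# i686/i386 means the 32-bit (i386) ISO.
arch=$(uname -m)
echo "$arch"
```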
4. Download Java
Java 7
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Java 6
http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-javase6-419409.html
Choose x86 or x64 as per your computer configuration
Download:
For 7: jdk-7u75-linux-i586.tar.gz or jdk-7u75-linux-
5. Cont…
Extract the jdk file
Open the terminal
$ cd /usr/lib (then make a new dir named java)
$ sudo mkdir java
Move the jdk folder from Downloads to /usr/lib/java
$sudo mv jdk1.7.0_67/ /usr/lib/java
7. Cont…
Check java version
$ java -version
Set the environment variable JAVA_HOME in the .bashrc file
$ gedit ~/.bashrc
In .bashrc file
export JAVA_HOME="/usr/lib/java/jdk1.7.0_67"
PATH="$PATH:$JAVA_HOME/bin"
export PATH
save & exit
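As a quick sanity check of the two lines above, here is a sketch using the JAVA_HOME value from the slide (the JDK need not actually be installed for the PATH mechanics to work):

```shell
# Same two lines as in .bashrc; then confirm the bin dir landed on PATH.
JAVA_HOME="/usr/lib/java/jdk1.7.0_67"
PATH="$PATH:$JAVA_HOME/bin"
export PATH
echo "$PATH"
```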
8. Create hduser
Create user group
$sudo addgroup hadoop
Create user hduser
$ sudo adduser --ingroup hadoop hduser
Login to hduser
user@ubuntu:~$ su - hduser
9. Working with SSH
hduser@ubuntu:~$ which ssh (should give you the path of ssh); if not, install it: $ sudo apt-get install ssh
hduser@ubuntu:~$ which sshd (should give you the path of sshd); if not, install it: $ sudo apt-get install openssh-server
Generate public and private key pair:
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
hduser@ubuntu:~$ cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
10. Continue…
The public key is now in authorized_keys; verify the passwordless login:
hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
11. Disable IPv6
open /etc/sysctl.conf file with
hduser@ubuntu:~$ sudo gedit /etc/sysctl.conf
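The deck does not show the lines to append; in the standard Hadoop-on-Ubuntu guides this deck follows, the lines added to the end of /etc/sysctl.conf are:

```
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```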
You have to reboot your machine in order to make the
changes take effect.
You can check whether IPv6 is enabled on your
machine with the following command:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
12. Install Hadoop
Download Hadoop 1.0.3 from
https://archive.apache.org/dist/hadoop/core/hadoop
-1.0.3/
hadoop-1.0.3.tar.gz 2012-05-08 20:35 60M
Install to /usr/local/hadoop:
$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop
13. Cont…
In conf/hadoop-env.sh, add:
export JAVA_HOME="/usr/lib/java/jdk1.7.0_67"
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
Edit ~/.bashrc
Add following
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME="/usr/lib/java/jdk1.7.0_67"
PATH="$PATH:$JAVA_HOME/bin"
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
export PATH=$PATH:$HADOOP_HOME/bin
export PATH
14. Cont…
conf/*-site.xml
Create dir /app/hadoop/tmp
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
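To see what mode 750 means in practice, here is a small sketch on a scratch directory (a stand-in for /app/hadoop/tmp; 750 = rwx for the owner, r-x for the group, nothing for others):

```shell
scratch=$(mktemp -d)             # scratch stand-in for /app/hadoop/tmp
chmod 750 "$scratch"
mode=$(stat -c '%a' "$scratch")  # read back the numeric mode
echo "$mode"                     # prints 750
rmdir "$scratch"
```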
15. Conf files
Add the following snippets between the <configuration>
... </configuration> tags in the respective configuration
XML file.
16. conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.
</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A
URI whose scheme and authority determine the FileSystem
implementation. The uri's scheme determines the config
property (fs.SCHEME.impl) naming the FileSystem
implementation class. The uri's authority is used to determine
the host, port, etc. for a filesystem.
</description>
</property>
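The deck skips the matching snippets for the other two conf files; for a single-node setup, the conventional values from the same family of guides (treat them as assumptions, not from the slides) are:

```xml
<!-- conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs at.</description>
</property>

<!-- conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication (1 on a single node).</description>
</property>
```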
20. Formatting the HDFS filesystem
via the NameNode
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$
21. Starting your single-node
cluster
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$
24. Hadoop Web Interfaces
http://localhost:50070/ – web UI of the NameNode daemon
http://localhost:50030/ – web UI of the JobTracker daemon
http://localhost:50060/ – web UI of the TaskTracker daemon
30. Making a multi-node cluster
Merge two single-node clusters into a multi-node cluster.
One will become the designated master
It will also work as a slave (it will store and process data as well)
Pseudo-distributed cluster
The other will become a slave only
31. Prerequisites
Configure the single-node clusters first
Copy the Ubuntu VM folder and paste it (replicating the same VM)
Make sure your Ubuntu systems use DHCP, or other sensible static settings, for their network setup.
32. Change Host-names
Change the hostname of each system
Log in to each system as hduser@ubuntu$ and open the file /etc/hosts
Find each system's IPv4 address using the command ifconfig
Make an entry of the IP address and hostname of both master and slave, on both systems, as follows.
Command:
sudo gedit /etc/hosts
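An example pair of /etc/hosts entries, using the IP addresses that appear in the SSH sample output later in the deck (substitute the addresses ifconfig reports on your machines):

```
192.168.0.1    master
192.168.0.2    slave
```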
33. Also change
/etc/hostname
On master
master
On slave
slave
For the change to take effect, run:
sudo /etc/init.d/hostname restart or sudo service hostname restart
34. Verification
Exit and reopen the terminal at least once to see the effect:
From:
hduser@ubuntu$
To:
hduser@master$ on master side
hduser@slave$ on slave side
35. SSH access
Distribute the SSH public key of hduser@master
Command:
hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave
The above command copies id_rsa.pub to hduser@slave
36. SSH Login
So, connecting from master to master…
Command:
hduser@master:~$ ssh master
Sample output:
hduser@master:~$ ssh master
The authenticity of host 'master (192.168.0.1)' can't be established.
RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (RSA) to the list of known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686 ...
hduser@master:~$
37. SSH Login
…and from master to slave.
Command:
hduser@master:~$ ssh slave
Sample output:
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave' (RSA) to the list of known hosts.
Ubuntu 10.04 ...
hduser@slave:~$
38. Only on master side
Update the /usr/local/hadoop/conf/masters file as follows
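In Hadoop 1.x, conf/masters lists the host that runs the SecondaryNameNode; here that is the master itself, so the file holds a single line:

```
master
```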
39. Only on master side
Update the /usr/local/hadoop/conf/slaves file as follows
If you have more than one slave, then…
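Since the master also works as a slave in this setup, conf/slaves lists both machines, one hostname per line (additional slaves would each get their own line):

```
master
slave
```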
43. Run name-node format (critical)
hduser@master:/usr/local/hadoop$ bin/hadoop namenode -format
... INFO dfs.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
hduser@master:/usr/local/hadoop$
44. Start multi-node Cluster
hduser@master:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-master.out
slave: Ubuntu 10.xx
slave: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out
master: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out
master: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out
hduser@master:/usr/local/hadoop$
45. Common Errors
After some time, the DataNode shuts down automatically…
Fix
1. Restart Hadoop
2. Go to /app/hadoop/tmp/dfs/name/current
3. Open VERSION (e.g. with vim VERSION)
4. Record the namespaceID
5. Go to /app/hadoop/tmp/dfs/data/current
6. Open VERSION (e.g. with vim VERSION)
7. Replace the namespaceID with the namespaceID you recorded in step 4.
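The edit in steps 2–7 can be sketched as shell, here run on scratch copies of the two VERSION files (the real paths are the ones in the steps above; the IDs are made up for illustration):

```shell
name_v=$(mktemp); data_v=$(mktemp)
echo 'namespaceID=123456789' > "$name_v"    # NameNode VERSION (hypothetical ID)
echo 'namespaceID=987654321' > "$data_v"    # stale DataNode VERSION
ns=$(grep '^namespaceID=' "$name_v")        # step 4: record the ID
sed -i "s/^namespaceID=.*/$ns/" "$data_v"   # step 7: overwrite the stale ID
fixed=$(grep '^namespaceID=' "$data_v")
echo "$fixed"                               # prints namespaceID=123456789
rm -f "$name_v" "$data_v"
```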
46. Common Errors
$HADOOP_HOME is deprecated.
Fix
Set
export HADOOP_HOME_WARN_SUPPRESS="TRUE"
in your conf/hadoop-env.sh file
Notes
On disabling IPv6 (slide 11): a return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (that's what we want).
On installing Hadoop (slide 12): just to give you the idea, YMMV – personally, I create a symlink from hadoop-1.0.3 to hadoop.
On hadoop-env.sh (slide 13): export HADOOP_HOME_WARN_SUPPRESS="TRUE" solves the warning given by $ hadoop version ("Warning: $HADOOP_HOME is deprecated").
On /app/hadoop/tmp (slide 14): if you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section.
On formatting (slide 20): do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS)!
On start-all.sh (slide 21): this will start up a NameNode, DataNode, JobTracker and a TaskTracker on your machine.
On cloning the VM (slide 31): this will not work on VMware due to the copy-paste of the same system; it may say that the same file exists in the system folder.
On replication: the default value of dfs.replication is 3; however, we have only two nodes available, so we set dfs.replication to 2.