Single-Node Hadoop Cluster Installation 
Presented By: 
Mahantesh Angadi, Nagarjuna D. N., Manoj P. T. 
2nd Sem M.Tech CNE (2014) 
Dept. of ISE, AIT 
Under The Guidance of: 
Manjunath T. N. 
Amogh P. K. 
Assistant Professor 
Dept. of ISE, AIT
OUTLINE 
•Requirements for Java & Hadoop installation 
•JDK installation steps 
•Hadoop installation steps
How to Install A Single-Node Hadoop Cluster 
•Assumptions 
oYou are running 32-bit Windows 
oYour laptop has 4 GB or more of RAM 
•Downloads 
oVMware Workstation 10 or later 
oUbuntu 10.04 or later 
oJava JDK 1.5 or later (e.g., JDK 1.7) 
oHadoop 1.2.1 or later
•Instructions to Install Hadoop 
1. Install VMWare Workstation 
2. Create a new Virtual machine 
3. Point the installer disc image to the ISO file (e.g., Ubuntu 10.04) that you downloaded 
4. Give the user name & password (e.g., hduser for both) 
5. Hard disk: 40 GB (more is better, but you want to leave some space for your host machine) 
6. Customize hardware 
a. Memory: 2 GB RAM (more is better, but you want to leave some for your host (Windows) machine) 
b. Processors: 2 (more is better, but you want to leave some for your host (Windows) machine)
7. Launch your Virtual machine (all the instructions after this step will be performed in Ubuntu) 
8. Login to User (E.g. hduser) 
9. Open a terminal window with Ctrl + Alt + T (you will use this shortcut a lot) 
• Type the following commands in the terminal to update the Linux package lists (requires an internet connection)
$ sudo apt-get update
JDK Installation Steps 
$ sudo apt-get install openssh-server (recommended; the SSH server is needed later when connecting to localhost)
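•To confirm the SSH server is running before you continue (an optional check, not in the original slides; the service name may vary across Ubuntu releases): 
$ sudo service ssh status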
10. Install Java JDK 7 
a. Download the java JDK (http://www.wikihow.com/Install-Oracle-Java-JDK-on-Ubuntu-Linux) 
b. Unzip the file 
$ tar -xzvf jdk-7u25-linux-i586.tar.gz (or) $ tar xzf jdk-7u25-linux-i586.tar.gz
•Now move the JDK 7 directory to /usr/lib/java (you need to create the java folder inside /usr/lib, or in a location of your choice) 
$ sudo mkdir -p /usr/lib/java 
•Now copy the extracted JDK from your Downloads/Desktop folder to the java folder using the terminal
•$ sudo cp -r jdk1.7.0_25 /usr/lib/java/
c. Do the following steps 
Edit the system PATH file /etc/profile and add the following system variables to your system path. As root, open /etc/profile with nano, gedit, or any other text editor. 
•Type/Copy/Paste: $ sudo gedit /etc/profile 
or 
•Type/Copy/Paste: $ sudo nano /etc/profile
•Scroll down to the end of the file using your arrow keys and add the following lines below to the end of your /etc/profile file: 
Type/Copy/Paste: 
JAVA_HOME=/usr/lib/java/jdk1.7.0_25 
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin 
export JAVA_HOME 
export PATH
•Change the JDK path to match the version you installed. 
Save the /etc/profile file and exit (Ctrl+X, then Y, then Enter in nano).
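•To apply the new variables in the current shell and confirm they are set (an optional check, not in the original slides): 
$ source /etc/profile 
$ echo $JAVA_HOME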
d. Now run 
•$ sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/java/jdk1.7.0_25/bin/java" 1 
oThis command notifies the system that Oracle Java JRE is available for use 
• $ sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/java/jdk1.7.0_25/bin/javac" 1 
oThis command notifies the system that Oracle Java JDK is available for use 
•$ sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/java/jdk1.7.0_25/bin/javaws" 1 
oThis command notifies the system that Oracle Java Web start is available for use
Tell your Ubuntu Linux system that this Oracle Java JDK/JRE must be the default Java. 
•Type/Copy/Paste: $ sudo update-alternatives --set java /usr/lib/java/jdk1.7.0_25/bin/java 
othis command will set the java runtime environment for the system 
•Type/Copy/Paste: $ sudo update-alternatives --set javac /usr/lib/java/jdk1.7.0_25/bin/javac 
othis command will set the javac compiler for the system 
•Type/Copy/Paste:$ sudo update-alternatives --set javaws /usr/lib/java/jdk1.7.0_25/bin/javaws 
othis command will set Java Web start for the system
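•You can double-check which java alternative is active (an optional step, not in the original slides): 
$ sudo update-alternatives --display java 
oThis lists the registered alternatives for java and marks the one currently in use.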
•A successful installation of 32-bit Oracle Java will display: 
Type/Copy/Paste: $ java -version 
oThis command displays the version of java running on your system 
You should receive a message which displays: 
java version "1.7.0_25" 
Java(TM) SE Runtime Environment (build 1.7.0_25-b15) 
Java HotSpot(TM) Server VM (build 23.25-b01, mixed mode) 
Type/Copy/Paste: $ javac -version 
oThis command lets you know that you are now able to compile Java programs from the terminal. 
You should receive a message which displays: 
javac 1.7.0_25
•A successful Java installation displays: 
"Congratulations, you have successfully installed the Java JDK"
Hadoop Installation Steps 
Prerequisites 
•Configure JDK: 
oThe Sun Java JDK is compulsory to run Hadoop, therefore all the nodes in a Hadoop cluster should have the JDK configured. Ex: JDK 1.5 & above (preference: jdk-7u25-linux-i586.tar.gz) 
•Download hadoop package: 
Ex:- hadoop-1.2.1-bin.tar.gz 
•NOTE: 
In a multi-node hadoop cluster, the master node uses Secure Shell (SSH) commands 
to manipulate the remote nodes. This requires that all the nodes have the same versions of the JDK and Hadoop core. If the versions differ among nodes, errors will occur when you start the cluster.
Adding a dedicated Hadoop system user 
•We will use a dedicated Hadoop user account for running Hadoop. While that’s not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.). 
$ sudo addgroup hadoop 
$ sudo adduser --ingroup hadoop hduser 
oThis will add the user hduser and the group hadoop to your local machine. 
$ su - hduser 
oThis will switch to the hduser account.
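•You can confirm the account and group were created (an optional check, not in the original slides): 
$ id hduser 
oThe output should list hduser with hadoop among its groups.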
Configuring SSH 
•Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short hadoop installation tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section. 
•We assume that you have SSH up and running on your machine and have configured it to allow SSH public key authentication. 
•First, we have to generate an SSH key for the hduser user. 
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
(The command generates an RSA key pair with an empty password: a private key in $HOME/.ssh/id_rsa and a public key in $HOME/.ssh/id_rsa.pub.)
•Second, you have to enable SSH access to your local machine with this newly created key. 
hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
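•If key authentication fails later, the .ssh directory and authorized_keys file may need restrictive permissions (a common sshd StrictModes requirement; this step is not in the original slides): 
hduser@ubuntu:~$ chmod 700 $HOME/.ssh 
hduser@ubuntu:~$ chmod 600 $HOME/.ssh/authorized_keys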
•The final step is to test the SSH setup by connecting to your local machine with the hduser user. The step is also needed to save your local machine’s host key fingerprint to the hduser user’s known_hosts file. 
•If you have any special SSH configuration for your local machine like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information). 
•hduser@ubuntu:~$ ssh localhost 
Are you sure you want to continue connecting (yes/no)? yes
•If the SSH connection fails, these general tips will help: 
•Enable debugging with ssh -vvv localhost and investigate the error in detail. 
•Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload. 
•A successful connection to localhost displays a shell prompt for hduser.
Disabling IPv6 
•One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of our Ubuntu box. In our case, we realized that there’s no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, we simply disabled IPv6 on our Ubuntu machine. Your mileage may vary. 
•To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file: 
# disable ipv6 
net.ipv6.conf.all.disable_ipv6 = 1 
net.ipv6.conf.default.disable_ipv6 = 1 
net.ipv6.conf.lo.disable_ipv6 = 1 
/etc/sysctl.conf
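•If you prefer not to reboot right away, the settings can usually be applied immediately (an optional step, not in the original slides): 
$ sudo sysctl -p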
•Otherwise, you have to reboot your machine in order to make the changes take effect. 
•You can check whether IPv6 is enabled on your machine with the following command: 
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6 
•A return value of 0 means IPv6 is enabled, a value of 1 means it is disabled (that’s what we want). 
Alternative 
•You can also disable IPv6 only for Hadoop, as documented in the Hadoop issue tracker, by adding the following line to conf/hadoop-env.sh: 
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Hadoop Installation 
•Download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. We picked /usr/local/hadoop. 
Update $HOME/.bashrc 
•Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, you should of course update its appropriate configuration files instead of .bashrc. 
$ cd /usr/local 
$ sudo tar xzf hadoop-1.2.1.tar.gz 
$ sudo mv hadoop-1.2.1 hadoop 
$ sudo chown -R hduser:hadoop hadoop
Copy and paste the following into $HOME/.bashrc and edit it to your requirements: 
# Set Hadoop-related environment variables 
export HADOOP_HOME=/usr/local/hadoop (edit here) 
# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on) 
export JAVA_HOME=/usr/lib/jvm/java-6-sun(edit here) 
# Some convenient aliases and functions for running Hadoop-related commands 
unalias fs &> /dev/null 
alias fs="hadoop fs" 
unalias hls &> /dev/null 
alias hls="fs -ls" 
# If you have LZO compression enabled in your Hadoop cluster and 
# compress job outputs with LZOP (not covered in this tutorial): 
# Conveniently inspect an LZOP compressed file from the command 
# line; run via: 
# 
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo 
# 
# Requires installed 'lzop' command. 
#lzohead () { hadoop fs -cat $1 | lzop -dc | head -1000 | less; } 
# Add Hadoop bin/ directory to PATH 
export PATH=$PATH:$HADOOP_HOME/bin
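•After editing, reload the file and verify that the hadoop binary is on your PATH (an optional check, not in the original slides): 
$ source $HOME/.bashrc 
$ hadoop version 
oThis should print the installed Hadoop version (here, 1.2.1).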
•(Figure: overview of the most important HDFS components.)
Configuration 
•The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the directory of the JDK you installed (here, JDK 7). 
Change 
to 
# The java implementation to use. Required. 
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun 
# The java implementation to use. Required. 
export JAVA_HOME=/usr/lib/java/jdk1.7.0_25 
conf/hadoop-env.sh
•You can leave the settings below “as is” with the exception of the hadoop.tmp.dir parameter – this parameter you must change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop’s default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS, so don’t be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point. 
•Now we create the directory and set the required ownerships and permissions: 
$ sudo mkdir -p /app/hadoop/tmp 
$ sudo chown hduser:hadoop /app/hadoop/tmp 
# ...and if you want to tighten up security, chmod from 755 to 750... 
$ sudo chmod 750 /app/hadoop/tmp
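•You can verify the ownership and permissions (an optional check, not in the original slides): 
$ ls -ld /app/hadoop/tmp 
oThe listing should begin with drwxr-x--- and show hduser hadoop as owner and group.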
•If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section. 
•Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file. 
•In file conf/core-site.xml: conf/core-site.xml 
<property> 
<name>hadoop.tmp.dir</name> 
<value>/app/hadoop/tmp</value> 
<description>A base for other temporary directories.</description> 
</property> 
<property> 
<name>fs.default.name</name> 
<value>hdfs://localhost:54310</value> 
<description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description> 
</property>
•In file conf/hdfs-site.xml: conf/hdfs-site.xml 
<property> 
<name>dfs.replication</name> 
<value>1</value> 
<description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time. </description> 
</property>
•In file conf/mapred-site.xml: conf/mapred-site.xml 
<property> 
<name>mapred.job.tracker</name> 
<value>localhost:54311</value> 
<description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task. </description> 
</property>
Formatting the HDFS filesystem via the NameNode 
•The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster” (which includes only your local machine if you followed this tutorial). You need to do this the first time you set up a Hadoop cluster. 
•Do not format a running Hadoop filesystem as you will lose all the data currently in the cluster (in HDFS)! 
•To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command 
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
•The output will look like this: 
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format 
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************ 
STARTUP_MSG: Starting NameNode 
STARTUP_MSG: host = ubuntu/127.0.1.1 
STARTUP_MSG: args = [-format] 
STARTUP_MSG: version = 0.20.2 
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010 
************************************************************/ 
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop 
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup 
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true 
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds. 
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted. 
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************ 
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1 
************************************************************/ 
hduser@ubuntu:/usr/local/hadoop$
Starting your single-node cluster 
•Run the command: 
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh 
•This will start up a NameNode, DataNode, JobTracker, and TaskTracker on your machine. 
•The output will look like this:
•A nifty tool for checking whether the expected Hadoop processes are running is jps (part of Sun’s Java since v1.5.0). 
hduser@ubuntu:/usr/local/hadoop$ jps 
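•Typical output looks like the following (the process IDs will differ; this listing is illustrative and not from the original slides): 
2287 TaskTracker 
2149 JobTracker 
1938 DataNode 
2085 SecondaryNameNode 
2349 Jps 
1788 NameNode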
•Stopping your single-node cluster 
Run the command 
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh 
•to stop all the daemons running on your machine.
Hadoop Web Interfaces 
•Hadoop comes with several web interfaces which are by default (see conf/hadoop-default.xml) available at these locations: 
http://localhost:50070/ – web UI of the NameNode daemon 
http://localhost:50030/ – web UI of the JobTracker daemon 
http://localhost:50060/ – web UI of the TaskTracker daemon 
•These web interfaces provide concise information about what’s happening in your Hadoop cluster. You might want to give them a try. 
•Where 
o50070 - NameNode port number 
o50030 - JobTracker port number 
o50060 - TaskTracker port number 
•Open these links in a local browser to see the Hadoop setup output
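•As a final smoke test, try a simple HDFS operation (an optional step, not in the original slides): 
hduser@ubuntu:~$ hadoop fs -mkdir /user/hduser/test 
hduser@ubuntu:~$ hadoop fs -ls /user/hduser 
oIf the new directory is listed, HDFS is accepting reads and writes.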