This document provides instructions for installing Hadoop on a cluster. It outlines prerequisites such as having multiple Linux machines with Java installed and SSH configured. The steps include downloading and unpacking Hadoop, configuring environment variables and configuration files, formatting the namenode, starting the HDFS and YARN daemons, and running a sample MapReduce job to test the installation.
4. Install Hadoop in a Cluster
UCF CASS cluster (Ganglia monitoring page):
http://cass.eecs.ucf.edu/ganglia/?p=2&c=CASS
5. Prerequisites
• Several machines
• Linux as the production platform (CentOS in this example)
• Java installed (version 6 or later)
• SSH installed (a quick sanity check follows this list)
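A quick way to verify these prerequisites (a minimal sketch; node2 is a placeholder hostname for another machine in the cluster):
$ java -version        # should report version 1.6 or later
$ ssh node2 hostname   # should print the hostname without a password prompt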
6. Install Hadoop
• Two steps:
1. Download Hadoop from the releases page
http://hadoop.apache.org/releases.html#Download
In this example, we use Hadoop 2.2.0
2. Edit the configuration files (a minimal sketch follows this list)
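The configuration files live under etc/hadoop/ in the unpacked tree. As a minimal sketch of step 2 (the hostname master and port 9000 are assumptions; substitute your namenode's address), core-site.xml tells every daemon and client where HDFS lives:

<!-- etc/hadoop/core-site.xml (sketch): fs.defaultFS points all nodes at the namenode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

Analogous edits go in hdfs-site.xml (dfs.replication), mapred-site.xml (mapreduce.framework.name = yarn), and yarn-site.xml (yarn.nodemanager.aux-services = mapreduce_shuffle); the slaves file lists the worker hostnames, one per line.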
7. Download and unpack Hadoop
• Unpack the tarball
$ tar -xzf hadoop-2.2.0.tar.gz
• cd into the hadoop directory
$ cd hadoop-2.2.0/
• Inside the hadoop-2.2.0 directory:
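The stock 2.2.0 tarball unpacks to roughly this layout (configuration files under etc/hadoop/, admin scripts under sbin/):
$ ls
bin  etc  include  lib  libexec  sbin  share  LICENSE.txt  NOTICE.txt  README.txt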
9. Configuration (2) – Environment variables
• Java
Set JAVA_HOME to the location of your JDK, for example:
$ export JAVA_HOME=/home/ji453898/jan/jdk1.7.0_03
• Hadoop
Set HADOOP_HOME to the location of your Hadoop folder:
export HADOOP_HOME=/home/xzhang/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
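To make these settings persistent, the exports are typically added to ~/.bashrc; JAVA_HOME can also be set directly in etc/hadoop/hadoop-env.sh. With the configuration files and environment in place, the namenode is formatted once and the HDFS and YARN daemons started, as the overview above notes (a minimal sketch, run from $HADOOP_HOME):
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh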
23. Run a sample MapReduce (1)
• Upload a file into HDFS (example below):
See the Hadoop file system shell documentation:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
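For example, to create the input directory used on the next slide and copy a local file into it (sample.txt is a placeholder for any text file):
$ hdfs dfs -mkdir -p /wordcount/input
$ hdfs dfs -put sample.txt /wordcount/input/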
24. Run a sample MapReduce (2)
• Run a MapReduce job:
$ hadoop jar hadoop-mapreduce-examples-2.4.2-SNAPSHOT.jar wordcount /wordcount/input /wordcount/output
(the examples jar ships under ……./hadoop/share/hadoop/mapreduce/)
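When the job finishes, the word counts can be read straight out of HDFS (part-r-00000 is the conventional name of the first reducer's output file):
$ hdfs dfs -cat /wordcount/output/part-r-00000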
25. Compiling
• Compile WordCount.java:
$ javac -classpath hadoop-core-0.20.203.0.jar -d wordcount WordCount.java
• Create a jar:
$ jar -cvf ./word.jar -C wordcount .
• List the classes in the jar:
$ jar tf word.jar
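The resulting jar can be run the same way as the bundled examples. This sketch assumes the main class is WordCount in the default package, and uses a fresh output path because MapReduce refuses to overwrite an existing output directory:
$ hadoop jar word.jar WordCount /wordcount/input /wordcount/output2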