This document provides instructions for installing Hadoop on a cluster. It outlines prerequisites such as having multiple Linux machines with Java installed and SSH configured. The steps include downloading and unpacking Hadoop, configuring environment variables and configuration files, formatting the namenode, starting the HDFS and YARN daemons, and running a sample MapReduce job to test the installation.
4. Install Hadoop in a Cluster
UCF CASS cluster (Ganglia monitoring page):
http://cass.eecs.ucf.edu/ganglia/?p=2&c=CASS
5. Prerequisites
• Several machines
• Linux as the production platform (CentOS in this example)
• Java installed (version 6 or later)
• SSH installed (a quick sanity check follows this list)
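A quick way to verify these prerequisites (a minimal sketch; node2 is a placeholder hostname for another machine in the cluster):
$ java -version        # should report version 1.6 or later
$ ssh node2 hostname   # should print the hostname without a password prompt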
6. Install Hadoop
• Two steps:
1. Download Hadoop from the releases page
http://hadoop.apache.org/releases.html#Download
In this example, we use Hadoop 2.2.0
2. Edit the configuration files (a minimal sketch follows this list)
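The configuration files live under etc/hadoop/ in the unpacked tree. As a minimal sketch of step 2 (the hostname master and port 9000 are assumptions; substitute your namenode's address), core-site.xml tells every daemon and client where HDFS lives:

<!-- etc/hadoop/core-site.xml (sketch): fs.defaultFS points all nodes at the namenode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

Analogous edits go in hdfs-site.xml (dfs.replication), mapred-site.xml (mapreduce.framework.name = yarn), and yarn-site.xml (yarn.nodemanager.aux-services = mapreduce_shuffle); the slaves file lists the worker hostnames, one per line.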
7. Download and unpack Hadoop
• Unpack the tarball
$ tar -xzf hadoop-2.2.0.tar.gz
• cd into the hadoop directory
$ cd hadoop-2.2.0/
• Inside the hadoop-2.2.0 directory:
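The stock 2.2.0 tarball unpacks to roughly this layout (configuration files under etc/hadoop/, admin scripts under sbin/):
$ ls
bin  etc  include  lib  libexec  sbin  share  LICENSE.txt  NOTICE.txt  README.txt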
9. Configuration (2) – Environment variables
• Java
Set JAVA_HOME to the location of your JDK, for example:
$ export JAVA_HOME=/home/ji453898/jan/jdk1.7.0_03
• Hadoop
Set HADOOP_HOME to the location of your Hadoop folder:
export HADOOP_HOME=/home/xzhang/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin
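To make these settings persistent, the exports are typically added to ~/.bashrc; JAVA_HOME can also be set directly in etc/hadoop/hadoop-env.sh. With the configuration files and environment in place, the namenode is formatted once and the HDFS and YARN daemons started, as the overview above notes (a minimal sketch, run from $HADOOP_HOME):
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh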
23. Run a sample MapReduce (1)
• Upload a file into HDFS (example below):
See the Hadoop file system shell documentation:
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
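For example, to create the input directory used on the next slide and copy a local file into it (sample.txt is a placeholder for any text file):
$ hdfs dfs -mkdir -p /wordcount/input
$ hdfs dfs -put sample.txt /wordcount/input/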
24. Run a sample MapReduce (2)
• Run a MapReduce job:
$ hadoop jar hadoop-mapreduce-examples-2.4.2-SNAPSHOT.jar wordcount /wordcount/input /wordcount/output
(the examples jar ships under ……./hadoop/share/hadoop/mapreduce/)
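When the job finishes, the word counts can be read straight out of HDFS (part-r-00000 is the conventional name of the first reducer's output file):
$ hdfs dfs -cat /wordcount/output/part-r-00000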
25. Compiling
• Compile WordCount.java:
$ javac -classpath hadoop-core-0.20.203.0.jar -d wordcount WordCount.java
• Create a jar:
$ jar -cvf ./word.jar -C wordcount .
• List the classes in the jar:
$ jar tf word.jar
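The resulting jar can be run the same way as the bundled examples. This sketch assumes the main class is WordCount in the default package, and uses a fresh output path because MapReduce refuses to overwrite an existing output directory:
$ hadoop jar word.jar WordCount /wordcount/input /wordcount/output2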