Hadoop single cluster installation
This slide deck walks through installing Hadoop in standalone mode and in pseudo-distributed mode on a single-node cluster.


Presentation Transcript

  • Hadoop Single Cluster Installation
    Minh Tran – Software Architect
    05/2013
  • Prerequisites
    • Ubuntu Server 10.04 (Lucid Lynx)
    • JDK 6u34 for Linux
    • Hadoop 1.0.4
    • VMware Player / VMware Workstation / VMware Server
    • Ubuntu Server VMware image: http://www.thoughtpolice.co.uk/vmware/#ubuntu10.04 (credentials: notroot / thoughtpolice)
  • Install SSH
    • sudo apt-get update
    • sudo apt-get install openssh-server
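    A quick sanity check, assuming Ubuntu's default service name (ssh):
    sudo service ssh status    # the sshd daemon should be reported as running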
  • Install JDK
    • wget -c -O jdk-6u34-linux-i586.bin http://download.oracle.com/otn/java/jdk/6u34-b04/jdk-6u34-linux-i586.bin?AuthParam=1347897296_c6dd13e0af9e099dc731937f95c1cd01
    • chmod +x jdk-6u34-linux-i586.bin
    • ./jdk-6u34-linux-i586.bin
    • sudo mv jdk1.6.0_34 /usr/local
    • sudo ln -s /usr/local/jdk1.6.0_34 /usr/local/jdk
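    Note that the AuthParam token in the Oracle URL is session-specific and will have expired; fetch a fresh download link from Oracle before running wget. Once the symlink is in place, a quick check:
    /usr/local/jdk/bin/java -version    # should print java version "1.6.0_34"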
  • Create group / account for Hadoop
    • sudo addgroup hadoop
    • sudo adduser --ingroup hadoop hduser
  • Install Local Hadoop
    • wget http://mirrors.digipower.vn/apache/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
    • tar -zxvf hadoop-1.0.4.tar.gz
    • sudo mv hadoop-1.0.4 /usr/local
    • sudo chown -R hduser:hadoop /usr/local/hadoop-1.0.4
    • sudo ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop
  • Install Apache Ant
    • wget http://mirrors.digipower.vn/apache/ant/binaries/apache-ant-1.9.0-bin.tar.gz
    • tar -zxvf apache-ant-1.9.0-bin.tar.gz
    • sudo mv apache-ant-1.9.0 /usr/local
    • sudo ln -s /usr/local/apache-ant-1.9.0 /usr/local/apache-ant
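    The environment-variable slide below only puts Java and Hadoop on the PATH; if you want to invoke ant directly, extend it yourself (a sketch; ANT_HOME is a conventional name, not something the slides define):
    export ANT_HOME=/usr/local/apache-ant
    export PATH=${ANT_HOME}/bin:${PATH}
    ant -version    # should print Apache Ant(TM) version 1.9.0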
  • Modify environment variables
    • su - hduser
    • vi .bashrc
    • export JAVA_HOME=/usr/local/jdk
    • export HADOOP_PREFIX=/usr/local/hadoop
    • export PATH=${JAVA_HOME}/bin:${HADOOP_PREFIX}/bin:${PATH}
    • . .bashrc
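    A quick check that the new PATH took effect:
    hadoop version    # should print Hadoop 1.0.4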
  • Try 1st example
    hduser@ubuntu:/usr/local/hadoop$ cd $HADOOP_PREFIX
    hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar pi 2 10
    Number of Maps = 2
    Samples per Map = 10
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    13/04/03 15:01:40 INFO mapred.FileInputFormat: Total input paths to process : 2
    13/04/03 15:01:41 INFO mapred.JobClient: Running job: job_201304031458_0003
    13/04/03 15:01:42 INFO mapred.JobClient: map 0% reduce 0%
    13/04/03 15:02:00 INFO mapred.JobClient: map 100% reduce 0%
    13/04/03 15:02:15 INFO mapred.JobClient: map 100% reduce 100%
    13/04/03 15:02:19 INFO mapred.JobClient: Job complete: job_201304031458_0003
    13/04/03 15:02:19 INFO mapred.JobClient: Counters: 30
    13/04/03 15:02:19 INFO mapred.JobClient: Job Counters
    …
    13/04/03 15:02:19 INFO mapred.JobClient: Reduce output records=0
    13/04/03 15:02:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1118670848
    13/04/03 15:02:19 INFO mapred.JobClient: Map output records=4
    Job Finished in 39.148 seconds
    Estimated value of Pi is 3.80000000000000000000
  • Setup Single Node Cluster
    • Disabling ipv6
    • Configuring SSH
    • Configuration
      – hadoop-env.sh
      – conf/*-site.xml
    • Start / stop the node cluster
    • Running a MapReduce job
  • Disabling ipv6
    • Open /etc/sysctl.conf and add the following lines:
    # disable ipv6
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    net.ipv6.conf.lo.disable_ipv6 = 1
    • Reboot your machine
    • Verify whether ipv6 is enabled / disabled:
    cat /proc/sys/net/ipv6/conf/all/disable_ipv6
    (0 – enabled, 1 – disabled)
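    If you would rather not reboot, the same settings can usually be applied in place, since sysctl -p re-reads /etc/sysctl.conf:
    sudo sysctl -p
    cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # expect 1 (disabled)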
  • Configuring SSH
    • Create SSH keys on the localhost:
    su - hduser
    ssh-keygen -t rsa -P ""
    • Append the public key id_rsa.pub to authorized_keys on localhost:
    touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
    cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
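    Before starting any daemons it is worth confirming that passwordless login works; the first connection also asks you to accept the host key, which the Hadoop start scripts cannot do interactively:
    ssh localhost    # should log in without prompting for a password
    exit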
  • Configuration
    • Edit the configuration in /usr/local/hadoop/conf/hadoop-env.sh and add the following line:
    export JAVA_HOME=/usr/local/jdk
  • Configuration (cont.)
    • Create folders to store data for the node:
    sudo mkdir -p /hadoop_data/name
    sudo mkdir -p /hadoop_data/data
    sudo mkdir -p /hadoop_data/temp
    sudo chown hduser:hadoop /hadoop_data/name
    sudo chown hduser:hadoop /hadoop_data/data
    sudo chown hduser:hadoop /hadoop_data/temp
  • conf/core-site.xml
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop_data/temp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
        <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The URI's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The URI's authority is used to determine the host, port, etc. for a filesystem.</description>
      </property>
    </configuration>
  • conf/mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
      </property>
    </configuration>
  • conf/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <!-- Path to store namespace and transaction logs -->
        <value>/hadoop_data/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <!-- Path to store data blocks in datanode -->
        <value>/hadoop_data/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
      </property>
    </configuration>
  • Format a new filesystem
    notroot@ubuntu:/usr/local/hadoop/conf$ su - hduser
    Password:
    hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
    13/04/03 13:41:24 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = ubuntu.localdomain/127.0.1.1
    STARTUP_MSG: args = [-format]
    STARTUP_MSG: version = 1.0.4
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by hortonfo on Wed Oct 3 05:13:58 UTC 2012
    ************************************************************/
    Re-format filesystem in /hadoop_data/name ? (Y or N) Y
    13/04/03 13:41:26 INFO util.GSet: VM type = 32-bit
    13/04/03 13:41:26 INFO util.GSet: 2% max memory = 19.33375 MB
    13/04/03 13:41:26 INFO util.GSet: capacity = 2^22 = 4194304 entries
    …
    13/04/03 13:41:28 INFO common.Storage: Storage directory /hadoop_data/name has been successfully formatted.
    13/04/03 13:41:28 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ubuntu.localdomain/127.0.1.1
    ************************************************************/
    Do not format a running Hadoop file system: you will lose all the data currently in the cluster (in HDFS)!
  • Start Single Node Cluster
    hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
    starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
    localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
    localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
    starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
    localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
  • How to verify Hadoop processes
    • A nifty tool for checking whether the expected Hadoop processes are running is jps (part of the Sun JDK tools):
    hduser@ubuntu:~$ jps
    1203 NameNode
    1833 Jps
    1615 JobTracker
    1541 SecondaryNameNode
    1362 DataNode
    1788 TaskTracker
    • You can also check with netstat whether Hadoop is listening on the configured ports:
    notroot@ubuntu:/usr/local/hadoop/conf$ sudo netstat -plten | grep java
    tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 7167 2438/java
    tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 7949 2874/java
    tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 7898 2791/java
    tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 8035 2874/java
    tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 7202 2438/java
    tcp 0 0 0.0.0.0:57143 0.0.0.0:* LISTEN 1001 7585 2791/java
    tcp 0 0 0.0.0.0:41943 0.0.0.0:* LISTEN 1001 7222 2608/java
    tcp 0 0 0.0.0.0:58936 0.0.0.0:* LISTEN 1001 6969 2438/java
    tcp 0 0 127.0.0.1:50234 0.0.0.0:* LISTEN 1001 8158 3050/java
    tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 7697 2608/java
    tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 7775 2608/java
    tcp 0 0 0.0.0.0:40067 0.0.0.0:* LISTEN 1001 7764 2874/java
    tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 7939 2608/java
  • Stop your single node cluster
    hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
    stopping jobtracker
    localhost: stopping tasktracker
    stopping namenode
    localhost: stopping datanode
    localhost: stopping secondarynamenode
  • Running a MapReduce job
    • We will use three ebooks from Project Gutenberg for this example:
      – The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
      – The Notebooks of Leonardo Da Vinci
      – Ulysses by James Joyce
    • Download each ebook as a text file in Plain Text UTF-8 encoding and store the files in /tmp/gutenberg, as sketched below
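    One way to fetch the three books (a sketch: the gutenberg.org paths below are assumptions and may have changed; the target file names match the HDFS listing on the next slide):
    mkdir -p /tmp/gutenberg && cd /tmp/gutenberg
    # hypothetical URLs; follow the "Plain Text UTF-8" links on gutenberg.org
    wget -O pg20417.txt http://www.gutenberg.org/files/20417/20417.txt
    wget -O pg4300.txt http://www.gutenberg.org/files/4300/4300.txt
    wget -O pg5000.txt http://www.gutenberg.org/files/5000/5000.txt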
  • Running a MapReduce job (cont.)
    • Copy these files into HDFS:
    hduser@ubuntu:~$ hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
    hduser@ubuntu:~$ hadoop dfs -ls /user/hduser/gutenberg
    Found 3 items
    -rw-r--r-- 1 hduser supergroup 661807 2013-04-03 14:01 /user/hduser/gutenberg/pg20417.txt
    -rw-r--r-- 1 hduser supergroup 1540092 2013-04-03 14:01 /user/hduser/gutenberg/pg4300.txt
    -rw-r--r-- 1 hduser supergroup 1391684 2013-04-03 14:01 /user/hduser/gutenberg/pg5000.txt
  • Running a MapReduce job (cont.)
    hduser@ubuntu:~$ cd /usr/local/hadoop
    hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
    13/04/03 14:02:45 INFO input.FileInputFormat: Total input paths to process : 3
    13/04/03 14:02:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/04/03 14:02:45 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/04/03 14:02:45 INFO mapred.JobClient: Running job: job_201304031352_0001
    13/04/03 14:02:46 INFO mapred.JobClient: map 0% reduce 0%
    13/04/03 14:03:09 INFO mapred.JobClient: map 66% reduce 0%
    13/04/03 14:03:32 INFO mapred.JobClient: map 100% reduce 0%
    13/04/03 14:03:47 INFO mapred.JobClient: map 100% reduce 100%
    13/04/03 14:03:53 INFO mapred.JobClient: Job complete: job_201304031352_0001
    13/04/03 14:03:53 INFO mapred.JobClient: Counters: 29
    13/04/03 14:03:53 INFO mapred.JobClient: Job Counters
    13/04/03 14:03:53 INFO mapred.JobClient: Launched reduce tasks=1
    …
    13/04/03 14:03:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=59114
    13/04/03 14:03:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=361
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce input records=102321
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce input groups=82334
    13/04/03 14:03:53 INFO mapred.JobClient: Combine output records=102321
    13/04/03 14:03:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=576069632
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce output records=82334
    13/04/03 14:03:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1490481152
    13/04/03 14:03:53 INFO mapred.JobClient: Map output records=629172
  • Check the result
    • hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -ls /user/hduser/gutenberg-output
    Found 3 items
    -rw-r--r-- 1 hduser supergroup 0 2013-04-03 14:03 /user/hduser/gutenberg-output/_SUCCESS
    drwxr-xr-x - hduser supergroup 0 2013-04-03 14:02 /user/hduser/gutenberg-output/_logs
    -rw-r--r-- 1 hduser supergroup 880829 2013-04-03 14:03 /user/hduser/gutenberg-output/part-r-00000
    • hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | more
    "(Lo)cra" 1
    "1490 1
    "1498," 1
    "35" 1
    "40," 1
    "A 2
    "AS-IS". 1
    "A_ 1
    "Absoluti 1
    "Alack! 1
    "Alack!" 1
    "Alla 1
    "Allegorical 1
    "Alpha 1
    "Alpha," 1
    …
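    The output file is sorted by key, so more only shows the alphabetical head of the word list; to see the most frequent words instead, a local sort works (a sketch, not part of the original slides):
    hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | sort -k2,2nr | head -20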
  • Hadoop Interfaces
    • NameNode Web UI: http://192.168.65.134:50070/
    • JobTracker Web UI: http://192.168.65.134:50030/
    • TaskTracker Web UI: http://192.168.65.134:50060/
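    192.168.65.134 is this particular VM's address; substitute your own guest's IP (ifconfig shows it) or use localhost from inside the VM. A quick reachability check (assumes curl is installed):
    curl -I http://localhost:50070/    # the NameNode UI should answer with HTTP/1.1 200 OK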
  • NameNode Web UI (screenshot)
  • JobTracker Web UI (screenshot)
  • TaskTracker Web UI (screenshot)
  • Troubleshooting
    • VMware Ubuntu image lost eth0 after moving it: http://www.whiteboardcoder.com/2012/03/vmware-ubuntu-image-lost-eth0-after.html
    • Hadoop Troubleshooting: http://wiki.apache.org/hadoop/TroubleShooting
    • Error when formatting the Hadoop filesystem: http://askubuntu.com/questions/35551/error-when-formatting-the-hadoop-filesystem
  • THANK YOU