Setup and run Hadoop Distributed File System example 2.2



  1. Set Up Hadoop 2.x (2.2.0) on Ubuntu

     This tutorial walks through setting up a Hadoop 2.2.0 environment on Ubuntu.

     Prerequisites

         $ sudo apt-get install openjdk-7-jdk
         $ java -version
         java version "1.7.0_25"
         OpenJDK Runtime Environment (IcedTea 2.3.12) (7u25-2.3.12-4ubuntu3)
         OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
         $ cd /usr/lib/jvm
         $ sudo ln -s java-7-openjdk-amd64 jdk
         $ sudo apt-get install openssh-server

     Add Hadoop Group and User

         $ sudo addgroup hadoop
         $ sudo adduser --ingroup hadoop hduser
         $ sudo adduser hduser sudo

     After the user is created, log back in to Ubuntu as hduser.

     Set Up SSH Certificate

         $ ssh-keygen -t rsa -P ''
         ...
         Your identification has been saved in /home/hduser/.ssh/id_rsa.
         Your public key has been saved in /home/hduser/.ssh/
         ...
         $ cat ~/.ssh/ >> ~/.ssh/authorized_keys
         $ ssh localhost

     Set Up Hadoop Environment Variables

         $ cd ~
         $ vi .bashrc

     Paste the following at the end of the file:

         #Hadoop variables
         export JAVA_HOME=/usr/lib/jvm/jdk/
         export HADOOP_INSTALL=/usr/local/hadoop
         export PATH=$PATH:$HADOOP_INSTALL/bin
         export PATH=$PATH:$HADOOP_INSTALL/sbin
         export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
         export HADOOP_COMMON_HOME=$HADOOP_INSTALL
         export HADOOP_HDFS_HOME=$HADOOP_INSTALL
         export YARN_HOME=$HADOOP_INSTALL
         ###end of paste

         $ cd /usr/local/hadoop/etc/hadoop
         $ vi
  2. Modify JAVA_HOME:

         export JAVA_HOME=/usr/lib/jvm/jdk/

     Log back in to Ubuntu as hduser and check the Hadoop version:

         $ hadoop version
         Hadoop 2.2.0
         Subversion -r 1529768
         Compiled by hortonmu on 2013-10-07T06:28Z
         Compiled with protoc 2.5.0
         From source with checksum 79e53ce7994d1628b240f09af91e1af4
         This command was run using /usr/local/hadoop2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

     At this point, Hadoop is installed.

     Configure Hadoop

         $ cd /usr/local/hadoop/etc/hadoop
         $ vi core-site.xml

     Paste the following between the <configuration> tags:

         <property>
           <name></name>
           <value>hdfs://localhost:9000</value>
         </property>

         $ vi yarn-site.xml

     Paste the following between the <configuration> tags:

         <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
         </property>
         <property>
           <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
           <value>org.apache.hadoop.mapred.ShuffleHandler</value>
         </property>

         $ mv mapred-site.xml.template mapred-site.xml
         $ vi mapred-site.xml

     Paste the following between the <configuration> tags:

         <property>
           <name></name>
           <value>yarn</value>
         </property>

     Create the local directories for the NameNode and DataNode data, then edit hdfs-site.xml:

         $ cd ~
         $ mkdir -p mydata/hdfs/namenode
         $ mkdir -p mydata/hdfs/datanode
         $ cd /usr/local/hadoop/etc/hadoop
         $ vi hdfs-site.xml

     Paste the following between the <configuration> tags:
  3. hdfs-site.xml properties:

         <property>
           <name>dfs.replication</name>
           <value>1</value>
         </property>
         <property>
           <name></name>
           <value>file:/home/hduser/mydata/hdfs/namenode</value>
         </property>
         <property>
           <name></name>
           <value>file:/home/hduser/mydata/hdfs/datanode</value>
         </property>

     Format Namenode

         hduser@ubuntu40:~$ hdfs namenode -format

     Start Hadoop Service

         $ ....
         $ ....
         hduser@ubuntu40:~$ jps

     If everything is successful, you should see the following services running:

         2583 DataNode
         2970 ResourceManager
         3461 Jps
         3177 NodeManager
         2361 NameNode
         2840 SecondaryNameNode

     Run Hadoop Example

         hduser@ubuntu: cd /usr/local/hadoop
         hduser@ubuntu:/usr/local/hadoop$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
         Number of Maps = 2
         Samples per Map = 5
         13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
         Wrote input for Map #0
         Wrote input for Map #1
         Starting Job
         13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /
         13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
         13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
         13/10/21 18:41:04 INFO Configuration.deprecation: is deprecated. Instead, use ...
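The jps check above can be automated rather than read by eye. The following is a minimal sketch, not part of the original slides: `missing_daemons` is a hypothetical helper name, and the list of expected daemons is taken from the jps listing shown in the tutorial. It compares the process name column exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
#!/bin/sh
# Hypothetical helper (not from the original slides): reads jps-style output
# ("<pid> <name>" per line) on stdin and prints every expected Hadoop daemon
# that does not appear in it. Prints nothing when all five are present.
missing_daemons() {
    awk -v expected="NameNode DataNode SecondaryNameNode ResourceManager NodeManager" '
        { seen[$2] = 1 }    # second column is the process name
        END {
            n = split(expected, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }'
}

# Usage on a running node (needs the JDK tool jps on PATH):
#   jps | missing_daemons
```

An empty result suggests all five daemons from the listing are up; any name printed is a daemon whose log is worth checking first.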
  4. Hadoop FileSystem (HDFS) Tutorial

     This tutorial shows some common commands for HDFS operations. If you don't have Hadoop set up on your Linux machine, you can follow the Hadoop Setup Guide.

     Log in to Linux. "hduser" is the login used in the following examples.

     Start Hadoop if it's not running:

         $ ....
         $

     Create someFile.txt in your home directory:

         hduser@ubuntu:~$ vi someFile.txt

     Paste any text you want into the file and save it.

     Create the home directory in HDFS (if it doesn't exist):

         hduser@ubuntu:~$ hadoop fs -mkdir -p /user/hduser

     Copy the file someFile.txt from the local disk to the user's directory in HDFS:

         hduser@ubuntu:~$ hadoop fs -copyFromLocal someFile.txt someFile.txt

     Get a directory listing of the user's home directory in HDFS:

         hduser@ubuntu:~$ hadoop fs -ls
         Found 1 items
         -rw-r--r-- 1 hduser supergroup 5 2013-10-27 17:57 someFile.txt

     Display the contents of the HDFS file /user/hduser/someFile.txt:

         hduser@ubuntu:~$ hadoop fs -cat /user/hduser/someFile.txt

     Get a directory listing of the HDFS root directory:

         hduser@ubuntu:~$ hadoop fs -ls /

     Copy that file to the local disk, naming it someFile2.txt:

         hduser@ubuntu:~$ hadoop fs -copyToLocal /user/hduser/someFile.txt someFile2.txt

     Delete the file from HDFS:

         hduser@ubuntu:~$ hadoop fs -rm someFile.txt
         Deleted someFile.txt
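The individual HDFS commands above can be chained into a single round-trip check. The sketch below is an illustration rather than part of the original tutorial: `hdfs_roundtrip` and the `.roundtrip` suffix are hypothetical names, and it assumes `hadoop` is on PATH with the daemons running. It uses only the `hadoop fs` subcommands already shown (-mkdir -p, -copyFromLocal, -copyToLocal, -rm) and compares the downloaded copy with the original.

```shell
#!/bin/sh
# Hypothetical helper (not from the original slides): uploads a local file to
# the current user's HDFS home directory, downloads it again, compares the two
# copies byte for byte, then cleans up. Returns 0 only if the copies match.
hdfs_roundtrip() {
    local_file=$1
    remote_name=$(basename "$local_file")   # relative HDFS paths resolve under /user/<user>
    copy_back="$local_file.roundtrip"       # assumed scratch name for the downloaded copy

    hadoop fs -mkdir -p "/user/$(whoami)" &&
    hadoop fs -copyFromLocal "$local_file" "$remote_name" &&
    hadoop fs -copyToLocal "$remote_name" "$copy_back" &&
    cmp -s "$local_file" "$copy_back"
    status=$?

    # Clean up regardless of the outcome; errors here are not treated as fatal.
    hadoop fs -rm "$remote_name"
    rm -f "$copy_back"
    return $status
}

# Usage (with Hadoop running):
#   hdfs_roundtrip someFile.txt && echo "round trip OK"
```

Keeping the cleanup outside the `&&` chain means the HDFS copy is removed even when the comparison fails, so the check can be rerun without hitting a "file exists" error on upload.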