How to Install a Single-Node Hadoop Cluster
By Kopaleishvili Valeri
Updated 04/20/2015
Assumptions
1. You’re running 64-bit Windows
2. Your laptop has more than 4 GB of RAM
Download List (No specific order)
• VMWare Player – allows you to run virtual machines with different operating systems
(www.dropbox.com/s/o4773s7mg8l2nox/VMWare-player-5.0.2-1031769.exe)
• Ubuntu 12.04 LTS – Linux operating system with a nice user interface
(www.dropbox.com/s/taeb6jault5siwi/ubuntu-12.04.2-desktop-amd64.iso)
Instructions to Install Hadoop
The next few steps cover the prerequisite setup for the Hadoop environment.
1. Install VMWare Player
2. Create a new virtual machine
3. Point the installer disc image to the ISO file (Ubuntu) that you just downloaded
4. User name should be hduser
5. Hard disk: 40 GB (more is better, but you want to leave some for your Windows machine)
6. Customize hardware
a. Memory: 2 GB RAM (more is better, but you want to leave some for your Windows machine)
b. Processors: 2 (more is better, but you want to leave some for your Windows machine)
7. Launch your virtual machine (all the instructions after this step will be performed in Ubuntu)
8. Login to hduser
9. Open a terminal window with Ctrl + Alt + T (you will use this keyboard shortcut a lot)
10. Install Java JDK 7
a. Download the Java JDK (https://www.dropbox.com/s/h6bw3tibft3gs17/jdk-7u21-linux-
x64.tar.gz)
b. Unzip the file
tar -xvf jdk-7u21-linux-x64.tar.gz
c. Now move the JDK 7 directory to /usr/lib/jvm (the tarball extracts to jdk1.7.0_21; renaming it to jdk1.7.0 matches the paths used in the following steps)
sudo mkdir -p /usr/lib/jvm
sudo mv ./jdk1.7.0_21 /usr/lib/jvm/jdk1.7.0
d. Now run
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.7.0/bin/javaws" 1
e. Correct the file ownership and the permissions of the executables:
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chmod a+x /usr/bin/javaws
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
f. Check the version of your new JDK 7 installation:
java -version
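If java -version still reports a different JVM, you can select the new JDK explicitly with update-alternatives. This is a quick check, assuming the /usr/lib/jvm/jdk1.7.0 location used above:
# Choose the jdk1.7.0 entries interactively if another Java is currently the default
sudo update-alternatives --config java
sudo update-alternatives --config javac
# Confirm the selection
java -version
javac -version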
11. Install SSH Server
sudo apt-get install openssh-client
sudo apt-get install openssh-server
12. Configure SSH
su - hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh localhost
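If ssh localhost still prompts for a password, the usual cause is overly permissive permissions on the key files. A minimal fix, assuming the default ~/.ssh locations created above:
# ssh ignores keys whose files are group- or world-accessible
chmod 700 $HOME/.ssh
chmod 600 $HOME/.ssh/authorized_keys
# Try again; accept the host key fingerprint when prompted
ssh localhost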
13. Disable IPv6 – open /etc/sysctl.conf for editing in the extended terminal (Alt + F2)
sudo gedit /etc/sysctl.conf
OR
cd /etc/
vi sysctl.conf
14. Add the following lines to the bottom of the file
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
15. Save the file and close it
16. Restart Ubuntu (run: sudo reboot)
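After the reboot you can confirm that IPv6 is really disabled (or apply the change immediately without rebooting via sysctl -p); a value of 1 means disabled:
# Apply /etc/sysctl.conf without rebooting (optional)
sudo sysctl -p
# Prints 1 if IPv6 is disabled
cat /proc/sys/net/ipv6/conf/all/disable_ipv6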
The next steps explain the installation of Hadoop itself.
17. Download Apache Hadoop 1.2.1 (http://fossies.org/linux/misc/hadoop-1.2.1.tar.gz/) and store it in the Downloads folder
18. Unzip the file (open a terminal window), create the hadoop user group, and move the extracted directory to /usr/local so that it ends up at /usr/local/hadoop/hadoop-1.2.1 (the path used throughout the rest of this guide).
cd ~/Downloads
sudo tar xzf hadoop-1.2.1.tar.gz
sudo mkdir -p /usr/local/hadoop
sudo mv hadoop-1.2.1 /usr/local/hadoop/
sudo addgroup hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop
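As a quick sanity check, list the Hadoop directory; you should see at least the bin/ and conf/ directories and the bundled examples jar (the exact contents may vary slightly):
ls /usr/local/hadoop/hadoop-1.2.1
# Expect directories such as bin/ and conf/, plus hadoop-examples-1.2.1.jar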
19. Open your .bashrc file in the extended terminal (Alt + F2)
gedit ~/.bashrc OR
vi ~/.bashrc
20. Add the following lines to the bottom of the file as shown below:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop/hadoop-1.2.1
export PIG_HOME=/usr/local/pig
export PIG_CLASSPATH=/usr/local/hadoop/hadoop-1.2.1/conf
# Set JAVA_HOME (points at the JDK installed in step 10; we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$PIG_HOME/bin
21. Save the .bashrc file and close it
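Reload the file in your current shell and check that the variables and the hadoop command are picked up (a quick verification using the paths set above):
source ~/.bashrc
echo $HADOOP_HOME    # should print /usr/local/hadoop/hadoop-1.2.1
hadoop version       # should report Hadoop 1.2.1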
22. Run
sudo gedit /usr/local/hadoop/hadoop-1.2.1/conf/hadoop-env.sh
OR
vi /usr/local/hadoop/hadoop-1.2.1/conf/hadoop-env.sh
23. Add the following lines
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
24. Save and close file
25. In the terminal window, create a directory and set the required ownerships and permissions
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
26. Run
sudo gedit /usr/local/hadoop/hadoop-1.2.1/conf/core-site.xml
OR
vi /usr/local/hadoop/hadoop-1.2.1/conf/core-site.xml
27. Add the following between the <configuration> … </configuration> tags
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
28. Save and close file
29. Run
sudo gedit /usr/local/hadoop/hadoop-1.2.1/conf/mapred-site.xml
OR
vi /usr/local/hadoop/hadoop-1.2.1/conf/mapred-site.xml
30. Add the following between the <configuration> … </configuration> tags
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
31. Save and close file
32. Run
sudo gedit /usr/local/hadoop/hadoop-1.2.1/conf/hdfs-site.xml
OR
vi /usr/local/hadoop/hadoop-1.2.1/conf/hdfs-site.xml
33. Add the following between the <configuration> … </configuration> tags
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
The next steps prepare HDFS after installation and run the WordCount job.
34. Format the HDFS
/usr/local/hadoop/hadoop-1.2.1/bin/hadoop namenode -format
35. Start Hadoop services with the following command:
/usr/local/hadoop/hadoop-1.2.1/bin/start-all.sh
36. A handy tool for checking whether the expected Hadoop processes are running is jps
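On a working single-node Hadoop 1.x setup, running jps as hduser should list one JVM per daemon (process IDs will differ):
jps
# If jps is not on your PATH, call it via the JDK directly: /usr/lib/jvm/jdk1.7.0/bin/jps
# Expected entries (PIDs will vary):
#   NameNode
#   DataNode
#   SecondaryNameNode
#   JobTracker
#   TaskTracker
#   Jps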
37. Restart Ubuntu and log in again (sudo reboot)
Run a Simple MapReduce Program
1. Download the datasets:
www.gutenberg.org/ebooks/20417
www.gutenberg.org/ebooks/5000
www.gutenberg.org/ebooks/4300
2. Download each ebook as a plain-text file (Plain Text UTF-8 encoding) and store the files in a local temporary directory of your choice, for example /tmp/gutenberg
ls -l /tmp/gutenberg/
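If you prefer to fetch the texts from the command line, something like the following works; the direct file URLs below follow Project Gutenberg's usual cache layout and are an assumption, so verify them on the ebook pages listed above:
mkdir -p /tmp/gutenberg
cd /tmp/gutenberg
# Hypothetical direct links; adjust if the site serves the texts under different paths
wget https://www.gutenberg.org/cache/epub/20417/pg20417.txt
wget https://www.gutenberg.org/cache/epub/5000/pg5000.txt
wget https://www.gutenberg.org/cache/epub/4300/pg4300.txt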
3. Copy the files from your local file system to Hadoop’s HDFS.
hduser@ubuntu:/usr/local/hadoop/hadoop-1.2.1$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg
/user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop/hadoop-1.2.1$ bin/hadoop dfs -ls /user/hduser
hduser@ubuntu:/usr/local/hadoop/hadoop-1.2.1$ bin/hadoop dfs -ls /user/hduser/gutenberg
4. Run the WordCount example job
hduser@ubuntu:/usr/local/hadoop/hadoop-1.2.1$ bin/hadoop jar hadoop*examples*.jar wordcount
/user/hduser/gutenberg /user/hduser/gutenberg-output
Below is the console output of the running job; it is useful because it reports the MapReduce job's status as it runs.
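Once the job finishes, you can inspect the results directly in HDFS or copy them back to the local file system (a sketch using the same hadoop dfs commands as above; the part file name may differ depending on the number of reducers):
# Run from /usr/local/hadoop/hadoop-1.2.1, as in the steps above
# List the output directory
bin/hadoop dfs -ls /user/hduser/gutenberg-output
# Print the first lines of the result (a single reducer writes part-r-00000)
bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | head -20
# Or merge the output into a single local file
mkdir -p /tmp/gutenberg-output
bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output/gutenberg-output.txt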
The figures below show the admin web interfaces for checking job status and browsing HDFS files. The JobTracker web UI is available by default at http://localhost:50030/, and the NameNode web UI at http://localhost:50070/.