HADOOP 2.2
INTRODUCTION AND INSTALLATION

Sreejith
Oct, 2013
What is new in Hadoop 2.2?
• The MapReduce framework has been rebuilt on top of Apache YARN (Yet Another Resource Negotiator).
• MapReduce is the core batch-processing engine of Hadoop: it runs jobs over data stored in the Hadoop Distributed File System (HDFS) to extract useful information. In Hadoop 1.x, MapReduce was the only processing model the cluster supported, so every workload had to be expressed as a batch MapReduce job managed by a single JobTracker.
• YARN allows multiple processing engines, not just MapReduce, to work on the data stored in HDFS at the same time.
• YARN splits the two main responsibilities of the Hadoop 1.x JobTracker into separate components:
– resource management, handled by a global ResourceManager
– job scheduling/monitoring, handled by a per-application ApplicationMaster
• With YARN (the MapReduce 2.0 architecture), developers can build applications that run directly inside Hadoop instead of bolting them on from the outside, as many third-party vendor tools had to do in Hadoop 1.0. This establishes Hadoop 2.0 as a platform on which developers can build applications that query and manipulate data more efficiently.
• YARN is the biggest change in this release. Other additions include:
– high availability for the HDFS NameNode
– HDFS snapshots
– NFSv3 access to data stored in HDFS
• Hadoop 2.2 is also officially supported on Microsoft Windows.
YARN/MapReduce 2.0 architecture
[Figure: YARN/MapReduce 2.0 architecture. Two Clients submit jobs to a central Resource Manager; three Node Managers host the per-application AppMasters and their Containers. Legend: MapReduce, Job Submission, Node Status, Resource Request.]
Single node cluster setup
• Prerequisites:
– Java 6 installed
– A dedicated user for Hadoop
– SSH configured
• Download the Hadoop 2.2 tarball from
– http://mirror.metrocast.net/apache/hadoop/common/stable2/
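The prerequisites can be checked from the shell before going further. A minimal sketch: check_commands and java_major_version are hypothetical helper names, and the version parser assumes the usual `java -version` output, which quotes the version string (for example `java version "1.6.0_45"`).

```shell
# Hypothetical helper: report any of the named commands that are not on PATH.
# Returns non-zero if at least one is missing.
check_commands() {
    cc_status=0
    for cc_cmd in "$@"; do
        if ! command -v "$cc_cmd" >/dev/null 2>&1; then
            echo "missing: $cc_cmd"
            cc_status=1
        fi
    done
    return $cc_status
}

# Hypothetical helper: read `java -version` output on stdin and print the
# major version. Old-style strings such as 1.6.0_45 map to 6; newer
# strings such as 11.0.2 map to 11.
java_major_version() {
    # Take the double-quoted token that follows the word "version".
    jv=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    [ -n "$jv" ] || return 1
    case "$jv" in
        1.*) echo "$jv" | cut -d. -f2 ;;
        *)   echo "$jv" | cut -d. -f1 ;;
    esac
}
```

Typical use is `check_commands java ssh` followed by `java -version 2>&1 | java_major_version`; the redirect is needed because `java -version` prints to stderr.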
• After downloading, extract the tarball into /home/hadoop/yarn. We assume the dedicated Hadoop user is "hadoop".
– $ mkdir -p /home/hadoop/yarn
– $ tar -xvzf hadoop-2.2.0.tar.gz
– $ mv hadoop-2.2.0 /home/hadoop/yarn/
– $ cd /home/hadoop/yarn
– $ sudo chown -R hadoop:hadoop hadoop-2.2.0
– $ sudo chmod -R 755 hadoop-2.2.0
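The extraction steps can be wrapped in a function that also checks the unpacked layout. A minimal sketch: unpack_hadoop is a hypothetical helper, and it assumes the tarball contains a single top-level directory (hadoop-2.2.0 for the release used here). The chown/chmod steps still have to be run separately with sudo.

```shell
# Hypothetical helper: unpack a Hadoop tarball into DEST_DIR, check that the
# expected layout (bin/hadoop, sbin, etc/hadoop) is present, and print the
# resulting install directory.
# Usage: unpack_hadoop TARBALL DEST_DIR
unpack_hadoop() {
    uh_tar="$1"
    uh_dest="$2"
    if [ ! -f "$uh_tar" ]; then
        echo "unpack_hadoop: no such file: $uh_tar" >&2
        return 1
    fi
    mkdir -p "$uh_dest" || return 1
    tar -xzf "$uh_tar" -C "$uh_dest" || return 1
    # The top-level directory name comes from the archive, e.g. hadoop-2.2.0.
    uh_top=$(tar -tzf "$uh_tar" | head -n 1 | cut -d/ -f1)
    for uh_path in bin/hadoop sbin etc/hadoop; do
        if [ ! -e "$uh_dest/$uh_top/$uh_path" ]; then
            echo "unpack_hadoop: missing $uh_top/$uh_path" >&2
            return 1
        fi
    done
    echo "$uh_dest/$uh_top"
}
```

For the layout above this would be `unpack_hadoop hadoop-2.2.0.tar.gz /home/hadoop/yarn`, which prints /home/hadoop/yarn/hadoop-2.2.0 on success.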
• Set up environment variables in ~/.bashrc (the paths assume Hadoop was extracted to $HOME/yarn/hadoop-2.2.0):
– export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.2.0
– export YARN_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.2.0/etc/hadoop
• After adding these lines at the bottom of ~/.bashrc, reload it:
– $ source ~/.bashrc
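Appending the variables by hand is easy to repeat by accident on a second run. A minimal sketch that appends them only once, assuming Hadoop was extracted to $HOME/yarn/hadoop-2.2.0 (adjust HADOOP_HOME otherwise); add_hadoop_env and the marker comment are hypothetical, not part of Hadoop.

```shell
# Hypothetical helper: append the Hadoop 2.2 environment variables to a shell
# startup file (default: ~/.bashrc), but only if they are not already there.
add_hadoop_env() {
    ae_rc="${1:-$HOME/.bashrc}"
    ae_marker="# >>> hadoop 2.2 environment >>>"

    # grep -F matches the marker literally; -q suppresses output.
    if [ -f "$ae_rc" ] && grep -qF "$ae_marker" "$ae_rc"; then
        echo "Hadoop variables already present in $ae_rc"
        return 0
    fi

    # Quoted 'EOF' keeps $HOME and $HADOOP_HOME unexpanded, so they are
    # resolved when the startup file is sourced. The later variables reuse
    # HADOOP_HOME, which is exported on the first line.
    {
        echo "$ae_marker"
        cat <<'EOF'
export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
    } >> "$ae_rc"
    echo "Added Hadoop variables to $ae_rc"
}
```

Run `add_hadoop_env` once, then `source ~/.bashrc` as above.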
• Create the Hadoop data directories
# Two directories: one for the NameNode and one for the DataNode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode
• Configuration
– $ cd $YARN_HOME
– $ vi etc/hadoop/yarn-site.xml
• Add the following properties inside the <configuration> element of etc/hadoop/yarn-site.xml. The second key must contain the service name mapreduce_shuffle exactly as it appears in the first:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
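Instead of editing in vi, the complete file can be written in one step. A minimal sketch: write_yarn_site is a hypothetical helper, and it overwrites any existing yarn-site.xml in the target directory, so back up a customised file first.

```shell
# Hypothetical helper: write a minimal single-node yarn-site.xml into
# CONF_DIR. Any existing yarn-site.xml there is overwritten.
# Usage: write_yarn_site CONF_DIR
write_yarn_site() {
    ys_dir="$1"
    if [ ! -d "$ys_dir" ]; then
        echo "write_yarn_site: '$ys_dir' is not a directory" >&2
        return 1
    fi
    # Quoted 'EOF': the XML is written exactly as shown, with no expansion.
    cat > "$ys_dir/yarn-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Run the MapReduce shuffle as a NodeManager auxiliary service. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- The key embeds the service name, so it must say mapreduce_shuffle. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
EOF
}
```

With the environment variables set, this would be called as `write_yarn_site "$HADOOP_CONF_DIR"`.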
• $ vi etc/hadoop/core-site.xml
• Add the following property inside the <configuration> element (fs.defaultFS is the Hadoop 2.x name for the deprecated fs.default.name):
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
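The same approach works for core-site.xml, with the NameNode URI as a parameter so the file can be reused beyond localhost. A minimal sketch: write_core_site is a hypothetical helper; it writes the fs.defaultFS key (the Hadoop 2.x name for fs.default.name) and defaults to hdfs://localhost:9000 as above.

```shell
# Hypothetical helper: write core-site.xml with the default file system URI.
# Any existing core-site.xml in CONF_DIR is overwritten.
# Usage: write_core_site CONF_DIR [NAMENODE_URI]
write_core_site() {
    cs_dir="$1"
    cs_uri="${2:-hdfs://localhost:9000}"
    if [ ! -d "$cs_dir" ]; then
        echo "write_core_site: '$cs_dir' is not a directory" >&2
        return 1
    fi
    # Unquoted EOF: $cs_uri is expanded when the file is written.
    cat > "$cs_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>$cs_uri</value>
  </property>
</configuration>
EOF
}
```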
• $ vi etc/hadoop/hdfs-site.xml
• Add the following properties inside the <configuration> element:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>
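Because the directories in hdfs-site.xml must match the ones created earlier with mkdir, it can help to do both in one place. A minimal sketch: write_hdfs_site is a hypothetical helper that creates the NameNode and DataNode directories under one data root (default $HOME/yarn/yarn_data/hdfs, as above) and writes hdfs-site.xml pointing at them.

```shell
# Hypothetical helper: create the NameNode/DataNode directories under
# DATA_ROOT and write an hdfs-site.xml that points at them, so the paths
# cannot drift apart. Any existing hdfs-site.xml in CONF_DIR is overwritten.
# Usage: write_hdfs_site CONF_DIR [DATA_ROOT]
write_hdfs_site() {
    hs_dir="$1"
    hs_root="${2:-$HOME/yarn/yarn_data/hdfs}"
    if [ ! -d "$hs_dir" ]; then
        echo "write_hdfs_site: '$hs_dir' is not a directory" >&2
        return 1
    fi
    # file: URIs need an absolute path.
    case "$hs_root" in
        /*) ;;
        *) echo "write_hdfs_site: data root must be absolute: $hs_root" >&2
           return 1 ;;
    esac
    mkdir -p "$hs_root/namenode" "$hs_root/datanode" || return 1
    cat > "$hs_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- A single DataNode can hold only one replica of each block. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$hs_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$hs_root/datanode</value>
  </property>
</configuration>
EOF
}
```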
• $ vi etc/hadoop/mapred-site.xml
• If this file does not exist, create it with the following content:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
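The "create it if it does not exist" step can be made safe to rerun. A minimal sketch: ensure_mapred_site is a hypothetical helper that writes the file only when it is missing and otherwise leaves it untouched. As far as we know, the 2.2.0 tarball usually ships only etc/hadoop/mapred-site.xml.template, which is why the file tends to be missing.

```shell
# Hypothetical helper: make sure CONF_DIR/mapred-site.xml exists and selects
# YARN. A missing file is created; an existing file is never rewritten.
# Usage: ensure_mapred_site CONF_DIR
ensure_mapred_site() {
    ms_dir="$1"
    ms_file="$ms_dir/mapred-site.xml"
    if [ ! -d "$ms_dir" ]; then
        echo "ensure_mapred_site: '$ms_dir' is not a directory" >&2
        return 1
    fi
    if [ -f "$ms_file" ]; then
        # Leave local edits alone; only warn if YARN does not appear.
        if ! grep -q '<value>yarn</value>' "$ms_file"; then
            echo "ensure_mapred_site: check mapreduce.framework.name in $ms_file" >&2
        fi
        return 0
    fi
    cat > "$ms_file" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
    echo "created $ms_file"
}
```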
• Format the NameNode (a one-time step; reformatting erases the existing HDFS metadata)
– $ bin/hdfs namenode -format
• Start the HDFS and YARN/MapReduce daemons
# HDFS (NameNode and DataNode)
– $ sbin/hadoop-daemon.sh start namenode
– $ sbin/hadoop-daemon.sh start datanode
# YARN/MapReduce (ResourceManager, NodeManager and JobHistory Server)
– $ sbin/yarn-daemon.sh start resourcemanager
– $ sbin/yarn-daemon.sh start nodemanager
– $ sbin/mr-jobhistory-daemon.sh start historyserver
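The five start commands can be kept in one function so they always run in the same order, HDFS first because YARN jobs read their input from it. A minimal sketch: start_single_node and the DRY_RUN switch are hypothetical (DRY_RUN is not a Hadoop setting); run it from $HADOOP_HOME.

```shell
# Hypothetical helper: start the single-node daemons in the order listed
# above. Set DRY_RUN=1 to print the commands instead of running them.
start_single_node() {
    for sn_cmd in \
        "sbin/hadoop-daemon.sh start namenode" \
        "sbin/hadoop-daemon.sh start datanode" \
        "sbin/yarn-daemon.sh start resourcemanager" \
        "sbin/yarn-daemon.sh start nodemanager" \
        "sbin/mr-jobhistory-daemon.sh start historyserver"
    do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$sn_cmd"
        else
            # Unquoted on purpose: split into script path and arguments.
            $sn_cmd || { echo "start_single_node: failed: $sn_cmd" >&2; return 1; }
        fi
    done
}
```

`DRY_RUN=1 start_single_node` lists the commands without starting anything; plain `start_single_node` stops at the first daemon that fails to start.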
• Verify the installation with jps, which should list all five daemons (process IDs will differ):
$ jps
# Console output
22844 Jps
28711 DataNode
29281 JobHistoryServer
28887 ResourceManager
29022 NodeManager
28180 NameNode
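Reading the jps listing by eye is error-prone when one daemon has quietly exited. A minimal sketch: check_daemons is a hypothetical helper that reads jps output on stdin and reports each of the five expected daemons; it compares the class-name column exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
# Hypothetical helper: read `jps` output on stdin and report which of the
# five single-node daemons are running. Returns non-zero if any is missing.
check_daemons() {
    cd_out=$(cat)
    cd_missing=0
    for cd_daemon in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
        # jps prints "<pid> <MainClass>"; compare the second field exactly.
        if printf '%s\n' "$cd_out" |
            awk -v d="$cd_daemon" '$2 == d { found = 1 } END { exit !found }'
        then
            echo "ok      $cd_daemon"
        else
            echo "MISSING $cd_daemon"
            cd_missing=1
        fi
    done
    return $cd_missing
}
```

Usage: `jps | check_daemons`.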
• Running the WordCount example program
$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
(press Ctrl-D to end the input)
• Copy the input directory to HDFS
$ bin/hdfs dfs -copyFromLocal input /input
• Run the WordCount example jar shipped in HADOOP_HOME (the /output directory must not already exist):
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
• Check the output:
$ bin/hdfs dfs -cat /output/*
2.2.0 1
This 1
count 1
example 1
hadoop 1
is 1
using 1
word 1
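To check the HDFS result against what WordCount should produce, the count can be approximated locally with standard Unix tools. A minimal sketch, assuming the example splits words on whitespace and emits tab-separated keys sorted in byte order (true of the stock WordCount run with a single reducer, as far as we know); local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: approximate the WordCount example locally.
# Words are split on whitespace, sorted in byte order (LC_ALL=C), and
# printed as "word<TAB>count". Reads the named files, or stdin if none.
local_wordcount() {
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, `local_wordcount input/file > expected.txt` followed by `bin/hdfs dfs -cat /output/* | diff expected.txt -` prints nothing when the two results agree.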
• Web interfaces
• Browse HDFS and check NameNode health at http://localhost:50070 in a browser.
• Check the status of running applications in the ResourceManager web UI at http://localhost:8088