HADOOP 2.2
INTRODUCTION AND INSTALLATION

Sreejith
Oct, 2013
What is new in Hadoop 2.2?
• The MapReduce framework now runs on
Apache YARN
• MapReduce is a core feature of Hadoop: the
batch processor that queues jobs which read
the Hadoop Distributed File System (HDFS)
to pull out useful information. In the
previous version, MapReduce was the only
processing framework on the cluster, so all
work had to run as batch MapReduce jobs.
What is new in Hadoop 2.2?
• YARN enables multiple processing tools to
access the data in HDFS at the same time
• YARN splits the two major functions of the
MapReduce JobTracker into separate daemons:
– resource management
– job scheduling/monitoring
What is new in Hadoop 2.2?
• With MapReduce 2.0, developers can now
build applications directly within Hadoop, instead of
bolting them on from the outside, as many
third-party vendor tools had to do with
Hadoop 1.0. This establishes
Hadoop 2.0 as a platform on which
developers can create applications that
search for and manipulate data far more
efficiently.
What is new in Hadoop 2.2?
• YARN is the biggest change in the new
version of Hadoop. Other additions include:
– high availability for HDFS
– HDFS snapshots
– NFSv3 access to data in HDFS

• Hadoop 2.2 is now officially supported on
Microsoft Windows
YARN/MapReduce 2.0 architecture
[Figure: YARN architecture diagram showing Clients, the Resource Manager, Node Managers, AppMasters, and Containers]
YARN/MapReduce 2.0 architecture
Detail of figure:
– MapReduce
– Job Submission
– Node Status
– Resource Request
Single node cluster setup
• Prerequisites:
– Java 6 installed
– Dedicated user for Hadoop
– SSH configured

• You can download the tarball for Hadoop 2.2 from
– http://mirror.metrocast.net/apache/hadoop/common/stable2/

– Extract it to a folder, say /home/hadoop/yarn.
We assume the dedicated user for Hadoop is
“hadoop”.
Single node cluster setup
• After downloading the file, extract it to a folder,
say /home/hadoop/yarn. We assume the
dedicated user for Hadoop is “hadoop”.
– $ mkdir -p /home/hadoop/yarn
– $ tar -xvzf hadoop-2.2.0.tar.gz
– $ mv hadoop-2.2.0 /home/hadoop/yarn/hadoop-2.2.0
– $ cd /home/hadoop/yarn
– $ sudo chown -R hadoop:hadoop hadoop-2.2.0
– $ sudo chmod -R 755 hadoop-2.2.0
Single node cluster setup
• Set up environment variables in ~/.bashrc
– export HADOOP_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_MAPRED_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_COMMON_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_HDFS_HOME=$HOME/yarn/hadoop-2.2.0
– export YARN_HOME=$HOME/yarn/hadoop-2.2.0
– export HADOOP_CONF_DIR=$HOME/yarn/hadoop-2.2.0/etc/hadoop

• After adding these lines at the bottom of the
.bashrc file, reload it:
– $ source ~/.bashrc
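A quick way to confirm that the variables took effect is a small check like the one below. This is only a sketch: check_hadoop_env is a hypothetical helper (not part of Hadoop), and it assumes the six variable names listed on this slide.

```shell
# Sketch: confirm the Hadoop variables are set and point at real directories.
# check_hadoop_env is a hypothetical helper, not something Hadoop ships.
check_hadoop_env() {
  local name value status=0
  for name in HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
              HADOOP_HDFS_HOME YARN_HOME HADOOP_CONF_DIR; do
    # Look up the variable whose name is stored in $name
    # (names come from the fixed list above, so eval is safe here).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "MISSING $name"
      status=1
    elif [ ! -d "$value" ]; then
      echo "NOT A DIRECTORY $name=$value"
      status=1
    else
      echo "OK $name=$value"
    fi
  done
  return "$status"
}
```

Running check_hadoop_env in a fresh shell after sourcing ~/.bashrc prints one line per variable and returns non-zero if any of them is unset or points at a path that does not exist.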
Single node cluster setup
• Create Hadoop data directories
# Two directories, for the NameNode and the DataNode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/namenode
– $ mkdir -p $HOME/yarn/yarn_data/hdfs/datanode

• Configuration
– $ cd $YARN_HOME
– $ vi etc/hadoop/yarn-site.xml
– Edit the yarn-site.xml
Single node cluster setup
• Add the following contents inside the
<configuration> tag
# etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Single node cluster setup
• $ vi etc/hadoop/core-site.xml
• Add the following contents inside the
<configuration> tag (fs.defaultFS replaces the
deprecated fs.default.name key in Hadoop 2)
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Single node cluster setup
• $ vi etc/hadoop/hdfs-site.xml
• Add the following contents inside configuration tag
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/yarn/yarn_data/hdfs/datanode</value>
</property>
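The paths in hdfs-site.xml have to match the data directories created earlier, which is easy to get wrong when the user name differs. The sketch below generates the file from a single data root so the two stay in sync. write_hdfs_site is a hypothetical helper, and it overwrites any existing hdfs-site.xml in the target directory, so treat it as an illustration rather than a drop-in tool.

```shell
# Sketch: write a minimal hdfs-site.xml whose paths match the data directories.
# write_hdfs_site is a hypothetical helper; it OVERWRITES an existing file.
write_hdfs_site() {
  local conf_dir="$1"   # e.g. "$HADOOP_CONF_DIR"
  local data_root="$2"  # e.g. "$HOME/yarn/yarn_data/hdfs"
  # Make sure both storage directories exist before HDFS is formatted.
  mkdir -p "$data_root/namenode" "$data_root/datanode"
  cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$data_root/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$data_root/datanode</value>
  </property>
</configuration>
EOF
}
```

For the layout used in these slides the call would be write_hdfs_site "$HADOOP_CONF_DIR" "$HOME/yarn/yarn_data/hdfs".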
Single node cluster setup
• $ vi etc/hadoop/mapred-site.xml
• If this file does not exist, create it and paste
the content provided below:
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Single node cluster setup
• Format the NameNode (one-time process)
– $ bin/hdfs namenode -format

• Start the HDFS and YARN/MapReduce processes
# HDFS (NameNode & DataNode)

– $ sbin/hadoop-daemon.sh start namenode
– $ sbin/hadoop-daemon.sh start datanode
# YARN and MapReduce (Resource Manager, Node Manager & Job History Server)

– $ sbin/yarn-daemon.sh start resourcemanager
– $ sbin/yarn-daemon.sh start nodemanager
– $ sbin/mr-jobhistory-daemon.sh start historyserver
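The five start commands can be wrapped in one function so they always run in the same order and stop at the first failure. This is a sketch only: start_hadoop_daemons and the RUN variable are hypothetical names introduced here, and the script paths are the ones shown on this slide.

```shell
# Sketch: start the five daemons in order, stopping at the first failure.
# start_hadoop_daemons is a hypothetical wrapper. Setting RUN=echo prints
# the commands instead of running them (a dry run).
start_hadoop_daemons() {
  local home="${1:-$HADOOP_HOME}"
  local run="${RUN:-}"
  # HDFS first, then YARN, then the MapReduce job history server.
  $run "$home/sbin/hadoop-daemon.sh" start namenode &&
  $run "$home/sbin/hadoop-daemon.sh" start datanode &&
  $run "$home/sbin/yarn-daemon.sh" start resourcemanager &&
  $run "$home/sbin/yarn-daemon.sh" start nodemanager &&
  $run "$home/sbin/mr-jobhistory-daemon.sh" start historyserver
}
```

A dry run such as RUN=echo start_hadoop_daemons "$HADOOP_HOME" lists the five commands without starting anything, which is a cheap way to check the paths first.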
Single node cluster setup
• Verifying Installation
$ jps
# Console Output.

22844 Jps
28711 DataNode
29281 JobHistoryServer
28887 ResourceManager
29022 NodeManager
28180 NameNode
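Reading the jps listing by eye works for one node, but the check can also be scripted. The sketch below reads jps output on standard input and prints every expected daemon that is absent. missing_daemons is a hypothetical helper, and the list of five daemon names is taken from the console output above.

```shell
# Sketch: read `jps` output on stdin and print each expected daemon not found.
# missing_daemons is a hypothetical helper, not part of the JDK or Hadoop.
missing_daemons() {
  local jps_output daemon status=0
  jps_output="$(cat)"
  for daemon in NameNode DataNode ResourceManager NodeManager JobHistoryServer; do
    # jps prints "<pid> <main class>", so compare against the whole second field.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

On the cluster host the call is jps | missing_daemons. Empty output and a zero exit status mean all five processes are running.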
Single node cluster setup
• Running the word count example program
$ mkdir input
$ cat > input/file
This is word count example
using hadoop 2.2.0
(press Ctrl-D to end the input)
• Add the input directory to HDFS
$ bin/hadoop fs -copyFromLocal input /input
Single node cluster setup
• Run the wordcount example jar provided in
HADOOP_HOME:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
• Check the output (the sample below comes from a
different input file, so your words and counts will differ):
$ bin/hadoop fs -cat /output/*
This 2
Another 1
is 2
line 1
one 2
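To sanity-check the job result, the same count can be reproduced locally with standard Unix tools and compared against the HDFS output. This is a sketch under the assumption that the local input directory from the earlier slide still exists. local_wordcount is a hypothetical helper, and it mirrors the example job only in that it splits on whitespace and prints tab-separated word and count pairs sorted by word.

```shell
# Sketch: whitespace-delimited word count using standard Unix tools.
# local_wordcount is a hypothetical helper for cross-checking the Hadoop job.
local_wordcount() {
  # 1. Put each word on its own line (squeezing repeated whitespace).
  # 2. Drop empty lines, then sort in byte order so equal words are adjacent.
  # 3. Count each run of identical words and print "word<TAB>count".
  tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
    awk '{ print $2 "\t" $1 }'
}
```

For example, cat input/* | local_wordcount prints one line per distinct word, which can be compared against the output of the cat on /output.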
Single node cluster setup
• Web interface
• Browse HDFS and check its health at
http://localhost:50070 in the browser
Single node cluster setup
• You can check the status of running
applications at the following URL:
http://localhost:8088
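On a machine without a browser, both web interfaces can be probed from the command line. The sketch below assumes curl is installed. ui_status is a hypothetical helper that only reports whether each URL answered without an HTTP error. It does not inspect the page contents.

```shell
# Sketch: report whether each given URL answers without an HTTP error.
# ui_status is a hypothetical helper and assumes curl is available.
ui_status() {
  local url
  for url in "$@"; do
    # -s hides progress output, -f makes HTTP errors (4xx/5xx) fail,
    # -o /dev/null discards the page body, --max-time bounds the wait.
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      echo "UP $url"
    else
      echo "DOWN $url"
    fi
  done
}
```

For the two interfaces above the call is ui_status http://localhost:50070 http://localhost:8088.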