ACADGILD
INTRODUCTION
Are you a Hadoop developer who wants to learn the basics of configuring a Hadoop cluster? If so, this
blog will help you set up a single node cluster on your machine right away.
It gives a brief overview of the settings that matter most for a successful installation.
What Is The Default Configuration In Hadoop?
This blog guides you step by step through the right settings for a single node cluster. Developers
usually use the single node mode to test their sample code.
When you download the Hadoop tar file and install it with default settings, you get standalone mode.
Hadoop's XML files contain properties, defined by Apache, that tell Hadoop its limits and
responsibilities and how it should operate.
The links below give us the default property settings for all types of configuration files that are needed
for Hadoop:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
The four files that need to be configured explicitly while setting up a single node Hadoop cluster are:
•core-site.xml
•hdfs-site.xml
•yarn-site.xml
•mapred-site.xml
Overriding The Default xml Properties In site.xml File
We can override selected properties by configuring them explicitly in the files above.
Example:
In Hadoop, the default replication factor is 3, but we can override it and set the replication factor
to 1 by explicitly configuring the dfs.replication property in hdfs-site.xml.
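The override described above could look like the following in hdfs-site.xml. This is a minimal sketch, assuming a single node setup where only one DataNode exists to hold each block; it shows only the one property and is not a complete configuration.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Replaces the default of 3 from hdfs-default.xml.
       Assumption: one DataNode, so one copy of each block is all that can be stored. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Any property not listed in the file keeps the value from the corresponding default file.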
Overriding the default parameters lets you tune the cluster for better performance, and it is also a
good way to learn about the internal workings of the Hadoop ecosystem.
The screenshot below shows the different files, which can either be overridden with explicit
properties or left with their default properties in a Hadoop cluster.
How site.xml Overrides default.xml Settings
Hadoop’s jar files are available in the following path:
$HADOOP_HOME/share/hadoop/
[here HADOOP_HOME indicates path where Hadoop is installed]
Hadoop gets its default configuration details, such as the default replication factor of 3, from these
jar files: the *-default.xml files are packaged inside them (core-default.xml, for example, ships in
the hadoop-common jar), and classes such as DFSClient read their values from there.
The default configuration files are always loaded first, from a fixed location on the classpath.
The site.xml files that the developer modifies are then loaded from the classpath as well (typically
from $HADOOP_HOME/etc/hadoop), and any property they define overrides the value given in the
corresponding default.xml file.
We will now look through the XML files that we specifically need to alter during the basic
installation of a single node cluster.
Common things to all xml files
We specify a new value with tags such as <property>, <name>, <value>, <description>, and <final>
inside the predefined <configuration> tag. Because Hadoop is an open source framework, its maintainers
provide the option to override features by declaring properties inside the various site.xml files.
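The common layout shared by the site.xml files can be sketched as follows. The property name and value here are hypothetical placeholders, chosen only to show where each tag goes; they are not real Hadoop properties.

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- Hypothetical property name, used only to show the layout. -->
    <name>example.property.name</name>
    <!-- The value that replaces the one from the matching default file. -->
    <value>example-value</value>
    <!-- Optional free-text note for whoever reads the file next. -->
    <description>Why this property is overridden.</description>
    <!-- Optional: true prevents resources loaded later from overriding this value. -->
    <final>true</final>
  </property>
</configuration>
```

Each additional override is another <property> element inside the same <configuration> element.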
Settings That Need To Be Done In core-site.xml
Some of the important properties are:
•Configuring the name node address
•Configuring the rack awareness factor
•Selecting the type of security
Refer to the listing below for a representation of the above properties:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. Either the literal string "local" or a host:port for
NDFS. </description>
<final>true</final>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
<description>
Set the authentication for the cluster. Valid values are: simple or kerberos.
</description>
</property>
<property>
<name>fs.trash.interval</name>
<value>0</value>
<description>Number of minutes between trash checkpoints.
If zero, the trash feature is disabled.
</description>
</property>
Here is a detailed description of the following property, which must be set when configuring a Hadoop
single node cluster (newer releases name it fs.defaultFS and treat fs.default.name as deprecated):
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
A filesystem path in Hadoop has two main components:
•A URI (Uniform Resource Identifier) that identifies the file system
•A path that identifies the file or directory within that file system
When a request gives only the path, Hadoop tries to find that path on the file system defined by
fs.default.name.
Syntax:
hdfs://<host>:<port>
Hadoop tries to find the path on the HDFS instance whose NameNode is running at <host>:<port>.
If a request specifies both the URI and the path, the URI in the request overrides fs.default.name,
and Hadoop tries to find the path on the filesystem identified by the URI in the request.
The filesystem identified by fs.default.name is also the one on which delete operations are carried
out by default; the fs.trash.interval property shown above controls whether deleted files are first
kept in trash.
Some of the commonly overridden properties are hadoop.security.authentication, fs.trash.interval, and
fs.default.name. The examples above explain the properties we use while setting up a single node
cluster, and they are a useful reference when sharing a customized configuration.
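Putting the pieces above together, a complete core-site.xml for a single node might look like the sketch below. It assumes the NameNode runs on localhost:9000 as in the earlier listing, uses the newer property name fs.defaultFS in place of fs.default.name, and assumes simple authentication because a local test machine usually has no Kerberos setup. Treat these choices as assumptions to adapt, not as required values.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default filesystem: HDFS with the NameNode on localhost, port 9000. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <!-- Assumption: no Kerberos on a local test machine, so use simple authentication. -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <!-- 0 disables the trash feature, matching the listing above. -->
  <property>
    <name>fs.trash.interval</name>
    <value>0</value>
  </property>
</configuration>
```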
Settings To Be Done In hdfs-site.xml
The properties inside this XML file deal with how data is stored inside HDFS. Some of the important
things they do are:
•Configure port access
•Manage SSL client authentication
•Control network interfaces
•Change file permissions
Some of the commonly overridden properties are dfs.namenode.name.dir, dfs.datanode.data.dir,
dfs.blocksize, dfs.replication, etc.
The properties that we use while setting up a single node cluster are explained here.
Block replication can be configured using the below setting:
<name>dfs.replication</name>
<value>3</value>
This default, which is 3, is used if replication is not specified when a file is created.
By default, block replication can be at most 512 (dfs.replication.max) and at least 1
(dfs.namenode.replication.min).
We can change the replication factor on a per-file basis using the Hadoop FS shell:
$ hadoop fs -setrep -w 3 /my_file
To change the replication factor of all files inside a directory:
$ hadoop fs -setrep -w 3 /my_dir
The NameNode storage directory can be configured using:
<name>dfs.namenode.name.dir</name>
<value>/user/tom/Hadoop/namenode</value>
This sets the path on the local filesystem where the NameNode stores the name table. If this is a
comma-delimited list of directories, then the name table is replicated in all of the directories, for
redundancy, which helps in recovering the metadata if one copy is lost. This redundancy is separate
from the replication factor, which defines how many copies of each file block are stored.
The DataNode storage directory can be configured using:
<name>dfs.datanode.data.dir</name>
<value>/user/tom/Hadoop/datanode</value>
This sets the path on the local filesystem where a DataNode stores its blocks. If this is a
comma-delimited list of directories, then data will be stored in all named directories, typically on
different devices.
Block size can be configured using:
<name>dfs.block.size</name>
<value>134217728</value>
This changes the default block size for all files in HDFS. In this case, we set dfs.block.size to
128 MB (134217728 bytes). Changing this setting only affects the block size of files placed into HDFS
after the setting has taken effect. Newer Hadoop releases name this property dfs.blocksize and treat
dfs.block.size as deprecated.
The fsck command reports the replication factor along with other important details, as shown in the
figure below:
$ hdfs fsck /<path of file>/<name of file>
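Collected into one file, the HDFS properties discussed in this section could be written as in the sketch below. The NameNode path is the one used in this blog; the DataNode path is an assumed sibling directory, and the replication factor of 1 is an assumption suited to a single node. The block size uses the newer property name dfs.blocksize.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: single node, so keep one copy of each block. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Local directory where the NameNode keeps the name table. -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/user/tom/Hadoop/namenode</value>
  </property>
  <!-- Local directory where the DataNode keeps its blocks (assumed path). -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/user/tom/Hadoop/datanode</value>
  </property>
  <!-- 134217728 bytes = 128 MB; applies only to files written after the change. -->
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
```

The two directories should be distinct, so that NameNode metadata and DataNode blocks do not share one location.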
Settings In yarn-site.xml
yarn-site.xml is easier to understand with some background on YARN and why it was introduced in
Hadoop v2.x.
In Hadoop v1.x, the TaskTracker and JobTracker handled the job of allocating resources to
processes.
YARN has ResourceManager settings that affect resource allocation with the node manager and
application manager. Some of the important configuration groups are:
•WebAppProxy Configuration
•MapReduce Configuration
•NodeManager Configuration
•ResourceManager Configuration
•IPC Configuration
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
It tells the NodeManager that an auxiliary service called mapreduce_shuffle needs to be implemented.
After we tell the NodeManager to implement that service, we give it a class name as the means to
implement that service. This configuration is what lets MapReduce do its shuffle, because
NodeManagers won't shuffle data for a non-MapReduce job, so we need to configure this service for
MapReduce.
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
This property tells the NodeManager which class implements the shuffle that a MapReduce container
performs from the map tasks to the reduce tasks. The key is built from the service name, so it must
match the mapreduce_shuffle value set above.
Previously, the shuffle step was part of the MapReduce TaskTracker.
The shuffle is an auxiliary service and must be set in the configuration file, together with the class
property yarn.nodemanager.aux-services.mapreduce_shuffle.class. Although it is possible to write your
own shuffle handler by extending this class, it is recommended that the default class be used.
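Shuffle handler: a process that runs inside the YARN NodeManager. When it listens on port 8080, it
shares that port with the REST server and many third-party applications, which results in conflicts if
you deploy more than one of them at a time without reconfiguring the default port. The port is set by
mapreduce.shuffle.port; recent mapred-default.xml files list 13562 as the default, so check the value
for your release.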
Some of the commonly overridden properties are yarn.resourcemanager.am.max-attempts,
yarn.resourcemanager.proxy-user-privileges.enabled, yarn.nodemanager.aux-services,
yarn.nodemanager.aux-services.mapreduce_shuffle.class, etc.
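The two shuffle-related properties from this section can be combined into a yarn-site.xml like the sketch below. It assumes the auxiliary service is named mapreduce_shuffle and that the class key follows the pattern yarn.nodemanager.aux-services.<service name>.class, so the service name appears in both properties; if your release uses a different service name, both places need to change together.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Name of the auxiliary service the NodeManager should run. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Class implementing that service; the key embeds the service name above. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```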
mapred-site.xml
When Hadoop analyzes a dataset, the MapReduce framework applies a large set of rules at runtime for
assigning tasks to slave nodes and maintaining job records. YARN was introduced in Hadoop 2.x to help
this framework work efficiently by taking over the workload of job-related assignments.
mapred-site.xml configures another large unit of the Hadoop ecosystem: how the map and reduce phases
run in collaboration with YARN. Some of the important features it handles are:
•Node health script variables
•Proxy Configuration
•Job Notification Configuration
<name>mapreduce.framework.name</name>
<value>yarn</value>
The value of this property determines whether you are running the MapReduce framework in local mode,
classic (MapReduce v1) mode, or YARN (MapReduce v2) mode. In local mode, the job runs locally using
the LocalJobRunner. If set to yarn, the job is submitted to and executed on the YARN cluster.
Some of the commonly overridden properties are yarn.app.mapreduce.client.max-retries,
mapreduce.shuffle.port, mapreduce.job.tags, and the I/O properties.
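As a minimal sketch, a mapred-site.xml that hands MapReduce jobs to YARN needs only the one property discussed above. The other accepted values, local and classic, select the modes described in the same paragraph.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN instead of the local or classic runner. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```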
The properties explained above sum up the requirements for a single node Hadoop cluster.
Follow the document at the link below to set up a pseudo-distributed single node Hadoop cluster for a
deeper understanding.
https://drive.google.com/file/d/0Bxr27gVaXO5scjVxZDBzV3IwRVE/view?usp=sharing
https://acadgild.com/blog/key-configurations-in-hadoop-installation/

The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
GeoBlogs
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 

Recently uploaded (20)

Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxStudents, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
The Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve ThomasonThe Art Pastor's Guide to Sabbath | Steve Thomason
The Art Pastor's Guide to Sabbath | Steve Thomason
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
The geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideasThe geography of Taylor Swift - some ideas
The geography of Taylor Swift - some ideas
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 

ACADGILD:: HADOOP LESSON

  • 1. ACADGILD
INTRODUCTION
Are you a Hadoop developer who wants to know the basics of configuring a Hadoop cluster? If so, this blog will help you set up a single-node cluster on your machine right away. It gives a brief overview of the settings that matter most for a successful installation.
What Is The Default Configuration In Hadoop?
This blog guides you step by step through the right settings for a single-node cluster. Developers usually use single-node mode to test their sample code.
When you download the Hadoop tar file and install it with the default settings, you get standalone mode. The XML files that ship with Hadoop contain properties defined by Apache, through which Hadoop understands its limits and responsibilities as well as how it should operate.
The links below give the default property settings for each type of configuration file that Hadoop needs:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
The four files that need to be configured explicitly while setting up a single-node Hadoop cluster are:
•core-site.xml
•hdfs-site.xml
•yarn-site.xml
•mapred-site.xml
Overriding The Default xml Properties In site.xml File
We can override specific properties by configuring them explicitly in the files above.
Example:
https://acadgild.com/blog/key-configurations-in-hadoop-installation/
  • 2. In Hadoop, the default replication factor is 3, but we can override it by explicitly setting the replication factor to 1 in hdfs-site.xml.
Overriding the default parameters lets you tune the cluster, can improve performance, and teaches you about the internal workings of the Hadoop ecosystem.
The screenshot below shows the different files that can either be overridden with explicit properties or left with their default properties in a Hadoop cluster.
How site.xml Overrides default.xml Settings
Hadoop's JAR files are available in the following path:
$HADOOP_HOME/share/hadoop/
[here HADOOP_HOME indicates the path where Hadoop is installed]
Hadoop gets all the default configuration details, such as the default replication factor of 3, from DFSClient.java in one of the JAR files under:
  • 3. $HADOOP_HOME/share/hadoop/
The default configuration files are always loaded from a specific location on the classpath. Similarly, the modified site.xml files supplied by the developer are loaded from the classpath, and any properties they define are applied to the existing Hadoop setup, overriding the values from the default.xml files.
We will now look at the XML files that we specifically need to alter during a basic installation of a single-node cluster.
Common things to all xml files
We specify a new value with tags such as <property>, <name>, <value>, <description>, <final>, etc. inside the predefined <configuration> tag.
Because Hadoop is an open source framework, its maintainers allow some features to be overridden by declaring properties inside the various site.xml files.
Settings that need to be done in Core-site.xml
Some of the important properties are:
•Configuring the name node address
•Configuring the rack awareness factor
•Selecting the type of security
Refer to the table below for a schematic representation of the above properties:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
<final>true</final>
</property>
<property>
<name>hadoop.security.authentication</name>
  • 4. <value>kerberos</value>
<description>Set the authentication for the cluster. Valid values are: simple or kerberos.</description>
</property>
<property>
<name>fs.trash.interval</name>
<value>0</value>
<description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
</property>
Here is a detailed description of the attribute below, which is mandatory when configuring a Hadoop single-node cluster:
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
(In Hadoop 2.x this property is named fs.defaultFS; fs.default.name remains as a deprecated alias.)
A filesystem path in Hadoop has two main components:
•A URI (Uniform Resource Identifier) that identifies the file system
•A path that identifies a location within that file system
When only a path is given, Hadoop tries to find it on the file system defined by fs.default.name.
Syntax: hdfs://<authority>:<port>
Hadoop tries to find the path on the HDFS whose NameNode is running at <authority>:<port>.
If a user specifies both the URI and the path in a request, the URI in the request overrides fs.default.name, and Hadoop tries to find the path on the filesystem identified by the URI in
  • 5. the request.
One of the important tasks handled by the file system named in fs.default.name is the delete operation in the Hadoop ecosystem.
Some of the overridden name attributes are hadoop.security.authentication, fs.trash.interval and fs.default.name. The examples above explain the attributes we use when setting up a single-node cluster, and they make a customized configuration easier to understand when it is shared.
Settings To Be Done In HDFS-site.xml
The properties inside this XML file deal with how data is stored in HDFS.
Some of the important properties are:
•Configuring port access
•Managing SSL client authentication
•Controlling the network interface
•Changing file permissions
Some of the overridden name attributes are dfs.namenode.name.dir, dfs.datanode.data.dir, dfs.block.size, dfs.replication, etc. The attributes that we use when setting up a single-node cluster are explained here.
Block replication can be configured using the setting below:
<name>dfs.replication</name>
<value>3</value>
This default of 3 is used if no replication factor is specified when a file is created. The maximum block replication is 512 and the minimum is 1.
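The override rule described earlier (a value in a site.xml file replaces the value from the matching default.xml file) can be illustrated with a small Python sketch. This is not how Hadoop itself is implemented; it only mimics the load order, and it also mimics the documented behaviour that a property marked <final> cannot be changed by a resource loaded later. The three XML snippets, the third "later" resource and the helper names load and merge are assumptions made for illustration; dfs.replication is the full name of the replication property in hdfs-default.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-ins for the real files, not copies of them.
DEFAULT_XML = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>fs.trash.interval</name><value>0</value></property>
  <property><name>fs.default.name</name><value>file:///</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <final>true</final>
  </property>
</configuration>"""

# Hypothetical resource loaded after the site file.
LATER_XML = """<configuration>
  <property><name>fs.default.name</name><value>hdfs://elsewhere:9000</value></property>
</configuration>"""


def load(xml_text):
    """Parse a <configuration> document into {name: (value, is_final)}."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        name = prop.findtext("name").strip()
        value = (prop.findtext("value") or "").strip()
        is_final = (prop.findtext("final") or "").strip().lower() == "true"
        props[name] = (value, is_final)
    return props


def merge(*resources):
    """Apply resources in load order; later values win unless already final."""
    merged = {}
    for resource in resources:
        for name, (value, is_final) in load(resource).items():
            if name in merged and merged[name][1]:
                continue  # an earlier resource declared this property final
            merged[name] = (value, is_final)
    return {name: value for name, (value, _) in merged.items()}


effective = merge(DEFAULT_XML, SITE_XML, LATER_XML)
for name in sorted(effective):
    print(name, "=", effective[name])
```

In this sketch dfs.replication ends up as 1 because the site value replaces the default, fs.trash.interval keeps its default because nothing overrides it, and fs.default.name stays at the site value because it was marked final before the later resource was read.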
  • 6. We can change the replication factor on a per-file basis using the Hadoop FS shell:
$ hadoop fs -setrep -w 3 /my_file
To change the replication factor of all files inside a directory:
$ hadoop fs -setrep -w 3 /my_dir
The NameNode directory can be configured using:
<name>dfs.namenode.name.dir</name>
<value>/user/tom/Hadoop/namenode</value>
This sets the path on the local filesystem where the NameNode stores the name table. If this is a comma-delimited list of directories, the name table is replicated in all of the directories for redundancy. If data is lost in one directory, this redundancy helps recover it. The replication factor, in turn, defines how many copies of a file are stored.
<name>dfs.datanode.data.dir</name>
<value>/user/tom/Hadoop/datanode</value>
This sets the path on the local filesystem where a DFS DataNode stores its blocks. If this is a comma-delimited list of directories, data is stored in all the named directories, typically on different devices.
<name>dfs.block.size</name>
<value>134217728</value>
This changes the default block size for all files in HDFS. In this case, we set dfs.block.size to 128 MB. (Hadoop 2.x names this property dfs.blocksize; dfs.block.size is the older, deprecated name.) Changing this setting only affects the block size of files placed into HDFS after the setting has taken effect.
The fsck command reports the replication factor along with other important details, as shown in the figure below:
$ hdfs fsck /<path of file>/<name of file>
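To make the block size and replication numbers above concrete, here is a short worked calculation in Python. It is a rough sketch: the file sizes are made-up examples and the helper names block_count and raw_bytes are hypothetical. It also assumes that the last block of a file only occupies as much disk space as the data it holds, so raw usage is simply file size times replication factor.

```python
BLOCK_SIZE = 134217728  # bytes, the dfs.block.size value used in the text
REPLICATION = 3         # default replication factor mentioned in the text


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of HDFS blocks needed for a file of file_size bytes."""
    # Ceiling division with integers avoids floating point rounding.
    return -(-file_size // block_size)


def raw_bytes(file_size, replication=REPLICATION):
    """Approximate disk space used across the cluster for one file."""
    return file_size * replication


MIB = 1024 * 1024
print("block size in MiB:", BLOCK_SIZE // MIB)

# Example file sizes chosen only for illustration.
for size in (300 * MIB, 1024 * MIB):
    blocks = block_count(size)
    print(
        f"{size // MIB} MiB file: {blocks} blocks, "
        f"{blocks * REPLICATION} block replicas, "
        f"{raw_bytes(size) // MIB} MiB of raw storage"
    )
```

With these numbers a 300 MiB file needs three blocks (two full blocks and one partial block), and a 1 GiB file needs eight.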
  • 7. Settings In yarn-site.xml
yarn-site.xml is easier to understand with some background on YARN and why it was introduced in Hadoop v2.x. In Hadoop v1.x, the TaskTracker and JobTracker handled the job of allocating resources to processes.
YARN has ResourceManager settings that affect resource allocation together with the NodeManager and the application manager.
Some of the important properties are:
•WebAppProxy Configuration
•MapReduce Configuration
•NodeManager Configuration
•ResourceManager Configuration
•IPC Configuration
  • 8. <name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
This tells the NodeManager that an auxiliary service called mapreduce_shuffle needs to be implemented. After telling the NodeManager to implement that service, we give it a class name as the means to implement it. This configuration tells MapReduce how to do its shuffle, because NodeManagers won't shuffle data for a non-MapReduce job. We therefore need to configure such a service for MapReduce.
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
This property tells the NodeManager that MapReduce containers will have to shuffle data from the map tasks to the reduce tasks. Previously, the shuffle step was part of the MapReduce TaskTracker. The shuffle is an auxiliary service and must be set in the configuration file. In addition we have yarn.nodemanager.aux.services.mapreduce.shuffle. Although it is possible to write your own shuffle handler by extending this class, it is recommended that the default class be used.
Shuffle handler: a process that runs inside the YARN NodeManager. The shuffle handler, the REST server and many third-party applications all use port 8080. This results in conflicts if you deploy more than one at a time without reconfiguring the default port.
Some of the overridden name attributes are yarn.resourcemanager.am.max-attempts, yarn.resourcemanager.proxy-user-privileges.enabled, yarn.nodemanager.aux-services, yarn.nodemanager.aux-services.mapreduce_shuffle.class, etc.
mapred-site.xml
When Hadoop analyzes a dataset, the runtime framework for MapReduce jobs follows a large set of rules for assigning jobs to slave nodes and maintaining job records. YARN was introduced in Hadoop 2.x to help this framework work efficiently and to take over the workload of job-related assignments.
mapred-site.xml configures this large unit of the Hadoop ecosystem, which runs the map and reduce phases in collaboration with YARN.
Some of the important features it handles are:
•Node health script variables
•Proxy Configuration
  • 9. •Job Notification Configuration
<name>mapreduce.framework.name</name>
<value>yarn</value>
The value of this attribute determines whether the MapReduce framework runs in local mode, classic (MapReduce v1) mode or YARN (MapReduce v2) mode. Local mode means that the job is run locally using the local JobRunner. If it is set to yarn, the job is submitted and executed via the YARN cluster.
Some of the overridden name attributes are yarn.app.mapreduce.client.max-retries, mapreduce.shuffle.port, mapreduce.job.tags and the I/O properties.
The properties explained above sum up what is required for a single-node Hadoop cluster. For a deeper understanding, follow the document at the link below to set up a pseudo-distributed single-node Hadoop cluster.
https://drive.google.com/file/d/0Bxr27gVaXO5scjVxZDBzV3IwRVE/view?usp=sharing
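As a recap, the four site files discussed in this post can be generated from a plain mapping of property names to values. The Python sketch below uses only the standard library. Most property values are taken from the examples in the post; the replication factor of 1, the separate DataNode directory and the helper name build_site_xml are assumptions made for illustration, and the output is a starting point rather than a complete configuration.

```python
import xml.etree.ElementTree as ET

# Values mostly taken from the examples in this post; adjust paths and
# ports for your own machine.
SITE_FILES = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {
        "dfs.replication": "1",
        "dfs.namenode.name.dir": "/user/tom/Hadoop/namenode",
        "dfs.datanode.data.dir": "/user/tom/Hadoop/datanode",
    },
    "yarn-site.xml": {"yarn.nodemanager.aux-services": "mapreduce_shuffle"},
    "mapred-site.xml": {"mapreduce.framework.name": "yarn"},
}


def build_site_xml(properties):
    """Return the text of a <configuration> document for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


for filename, properties in SITE_FILES.items():
    print(f"--- {filename} ---")
    print(build_site_xml(properties))
```

The script only prints the XML. Writing each result into Hadoop's configuration directory is left out on purpose, since that location depends on the installation.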