SlideShare a Scribd company logo
Top 5 Hadoop Admin Tasks
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
View Hadoop Administration Course at www.edureka.co/hadoop-admin
www.edureka.co/hadoop-adminSlide 2 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Objectives
At the end of this module, you will be able to
Understand Cluster Planning
Understand Hadoop fully distributed cluster setup with two nodes
Add further nodes to the running cluster
Upgrade existing Hadoop Cluster from Hadoop 1 to Hadoop 2
Understand Active NameNode Failure and how passive takes over
Slide 3 www.edureka.in/hadoop-admin
» Great for testing,
developing
» Not a practical
implementation for
large amounts of data
» Initially four or six
nodes
» As the volume of
data grows, more
nodes can easily be
added
Ways of deciding when
the cluster needs to grow
» Increasing amount of
computation power
needed
» Increasing amount of
data which needs to be
stored
» Increasing amount of
memory needed to
process tasks
Hadoop Cluster
Large Cluster
Hadoop Cluster: Thinking About The Problem
Small ClusterSingle Machine
www.edureka.co/hadoop-adminSlide 4
Hadoop Cluster: A Typical Use Case
NameNode Secondary NameNode
DataNode
RAM: 64 GB,
Hard disk: 1 TB
Processor: Xenon with 8 Cores
Ethernet: 3 X 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
RAM: 32 GB,
Hard disk: 1 TB
Processor: Xenon with 4 Cores
Ethernet: 3 X 10 GB/s
OS: 64-bit CentOS
Power: Redundant Power Supply
RAM: 16GB
Hard disk: 6 X 2TB
Processor: Xenon with 2 cores.
Ethernet: 3 X 10 GB/s
OS: 64-bit CentOS
DataNode
RAM: 16GB
Hard disk: 6 X 2TB
Processor: Xenon with 2 cores.
Ethernet: 3 X 10 GB/s
OS: 64-bit CentOS
www.edureka.co/hadoop-adminSlide 5
Seeking cluster growth on storage capacity is often a good method to use!
Cluster Growth Based On Storage Capacity
Data grows by approximately
5TB per week
HDFS set up to replicate each
block three times
Thus, 15TB of extra storage
space required per week
Assuming machines with 5x3TB
hard drives, equating to a new
machine required each week
Assume Overheads to be 30%
www.edureka.co/hadoop-adminSlide 6
Slave Nodes: Recommended Configuration
Higher-performance vs lower performance components
Save the Money, Buy more Nodes!
 General ( Depends on requirement
‘base’ configuration for a slave Node
» 4 x 1 TB or 2 TB hard drives, in a
JBOD* configuration
» Do not use RAID!
» 2 x Quad-core CPUs
» 24 -32GB RAM
» Gigabit Ethernet
General Configuration
 Multiples of ( 1 hard drive + 2 cores
+ 6-8GB RAM) generally work well
for many types of applications
Special Configuration
Slave Nodes
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
www.edureka.co/hadoop-adminSlide 7
Slave Nodes: More Details (RAM)
Slave Nodes (RAM)
Generally each Map or Reduce task
will take 1GB to 2GB of RAM
Slave nodes should not be using
virtual memory
RULE OF THUMB!
Total number of tasks = 1.5 x number
of processor core
Ensure enough RAM is present to
run all tasks, plus the DataNode,
TaskTracker daemons, plus the
operating system
www.edureka.co/hadoop-adminSlide 8
Master Node Hardware Recommendations
Carrier-class hardware
(Not commodity hardware)
Dual power supplies
Dual Ethernet cards
(Bonded to provide failover)
Raided hard drives
At least 32GB of RAM
Master
Node
Requires
www.edureka.co/hadoop-adminSlide 9
Fully Distributed Mode Cluster
Hadoop requires certain ports on each nodes accessible via the network
However, the default firewall iptables prohibit these ports being accessed
To run hadoop applications, you must make sure that these ports are open
To check the status of iptables, you can use these commands under root privilege:
/etc/init.d/iptables status
You can simply turn iptables off, or at least open these ports:
9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075, 50090
www.edureka.co/hadoop-adminSlide 10
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Fully-Distributed Mode
Pseudo-Distributed Mode
 No daemons, everything runs in a single JVM
 Suitable for running MapReduce programs during development
 Has no DFS
 Hadoop daemons run on the local machine
 Hadoop daemons run on a cluster of machines
Standalone (or Local) Mode
www.edureka.co/hadoop-adminSlide 11
Hadoop Cluster
Create Dedicated User and Group
» Hadoop requires all the nodes in the cluster have exactly the same structure of directory in which hadoop was
installed
» It will be beneficial if we create a dedicated user (e.g.“hadoop”) and install hadoop in its home folder
» You must have root privilege on each nodes to carry on the following steps
» To change to “root”, type in “su -” in the terminal and input the password for “root”
Create group “hadoop user”:
groupadd hadoop use
Create user “hadoop”:
useradd -g hadoop user -s /bin/bash -d /home/hadoop hadoop
in which -g specifies user “hadoop” belongs to group “hadoop user”, -s specifies the shell to use, -d specifies the
home folder for user “hadoop”.
Set password for user “hadoop”:
passwd hadoop
Then type in the password for user “hadoop” twice.
Then type in “su - hadoop” to change to user “hadoop”.
www.edureka.co/hadoop-adminSlide 12
Passwordless ssh
www.edureka.co/hadoop-adminSlide 13
Configuration Files
Configuration
Filenames
Description of Log Files
hadoop-env.sh
yarn-env.sh
Settings for Hadoop Daemon’s process environment.
core-site.xml
Configuration settings for Hadoop Core such as I/O settings that common to both HDFS and
YARN.
hdfs-site.xml Configuration settings for HDFS Daemons, the Name Node and the Data Nodes.
yarn-site.xml Configuration setting for Resource Manager and Node Manager.
mapred-site.xml Configuration settings for MapReduce Applications.
slaves A list of machines (one per line) that each run DataNode and Node Manager.
www.edureka.co/hadoop-adminSlide 14
Configuration Files (Contd.)
Deprecated Property Name New Property Name
dfs.data.dir dfs.datanode.data.dir
dfs.http.address dfs.namenode.http-address
fs.default.name fs.defaultFS
The core functionality and usage of these core configuration files are same in Hadoop 2.0 and 1.0 but many new properties
have been added and many have been deprecated
For example:
 ’fs.default.name’ has been deprecated and replaced with ‘fs.defaultFS’ for YARN in core-site.xml
 ‘dfs.nameservices’ has been added to enable NameNode High Availability in hdfs-site.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
 In Hadoop 2.2.0 release, you can use either the old or the new properties
 The old property names are now deprecated, but still work!
www.edureka.co/hadoop-adminSlide 15
Commission Data Nodes
 Add New Data Nodes
www.edureka.co/hadoop-adminSlide 16
Update the network
addresses in the
‘include’ files
dfs.include
mapred.include
Update the
NameNode:
hadoop dfsadmin
-refreshNodes
Update the Job
Tracker:
hadoop mradmin
-refreshNodes Update the
‘slaves’ file
Start the DataNode
and TaskTracker
hadoop-daemon.sh
start tasktracker
hadoop-daemon.sh
start datanode
Cross Check the Web
6 UI to ensure the
successful addition
Run Balancer to
7 move the HDFS
blocks to
DataNodes
1 2 3
4
5
Add (Commission) DataNodes
www.edureka.co/hadoop-adminSlide 17
Hadoop Upgrade from 1 to 2
Run Reports
» FSCK
» LSR
» DFSADMIN
Take backup
» Configurations
» Applications
» Data and Meta-data
Install new version of Hadoop
Upgrade
Run New Reports
» FSCK
» LSR
» DFSADMIN
Compare old and new reports
Test new cluster
Finalize upgrade
www.edureka.co/hadoop-adminSlide 18
NameNode HA
www.edureka.co/hadoop-adminSlide 19
DEMO
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
www.edureka.co/hadoop-adminSlide 20 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
How it Works?
Questions
www.edureka.co/hadoop-adminSlide 21 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
www.edureka.co/hadoop-adminSlide 22
Course Topics
 Module 1
» Hadoop Cluster Administration
 Module 2
» Hadoop Architecture and Cluster setup
 Module 3
» Hadoop Cluster: Planning and Managing
 Module 4
» Backup, Recovery and Maintenance
 Module 5
» Hadoop 2.0 and High Availability
 Module 6
» Advanced Topics: QJM, HDFS Federation and
Security
 Module 7
» Oozie, Hcatalog/Hive and HBase Administration
 Module 8
» Project: Hadoop Implementation
Top 5 Hadoop Admin Tasks

More Related Content

What's hot

Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
Santosh Nage
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin training
Arun Kumar
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
Edureka!
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
Amal G Jose
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Edureka!
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guide
Chetan Khatri
 
Next generation technology
Next generation technologyNext generation technology
Next generation technology
Shashwat Shriparv
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
Shashwat Shriparv
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
Cloudera, Inc.
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
awesomesos
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
Edureka!
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
musrath mohammad
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopLeons Petražickis
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hari Shankar Sreekumar
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoop
Shashwat Shriparv
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
A Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop AdministratorA Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop Administrator
Edureka!
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
Praveen Kumar Donta
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1Vemula Ravi
 

What's hot (20)

Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Hadoop admin training
Hadoop admin trainingHadoop admin training
Hadoop admin training
 
Setting High Availability in Hadoop Cluster
Setting High Availability in Hadoop ClusterSetting High Availability in Hadoop Cluster
Setting High Availability in Hadoop Cluster
 
Deployment and Management of Hadoop Clusters
Deployment and Management of Hadoop ClustersDeployment and Management of Hadoop Clusters
Deployment and Management of Hadoop Clusters
 
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability | Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |
 
Apache kafka configuration-guide
Apache kafka configuration-guideApache kafka configuration-guide
Apache kafka configuration-guide
 
Next generation technology
Next generation technologyNext generation technology
Next generation technology
 
Upgrading hadoop
Upgrading hadoopUpgrading hadoop
Upgrading hadoop
 
Introduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache HadoopIntroduction to Cloudera's Administrator Training for Apache Hadoop
Introduction to Cloudera's Administrator Training for Apache Hadoop
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Secure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With KerberosSecure Hadoop Cluster With Kerberos
Secure Hadoop Cluster With Kerberos
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Introduction to apache hadoop
Introduction to apache hadoopIntroduction to apache hadoop
Introduction to apache hadoop
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
A Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop AdministratorA Day in the Life of a Hadoop Administrator
A Day in the Life of a Hadoop Administrator
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Hadoop interview quations1
Hadoop interview quations1Hadoop interview quations1
Hadoop interview quations1
 

Viewers also liked

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
jijukjoseph
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
Edureka!
 
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Akihiro Suda
 
Power Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudPower Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS Cloud
Edureka!
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
Edureka!
 
5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop
Edureka!
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
Edureka!
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
Edureka!
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
Edureka!
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Applying Testing Techniques for Big Data and Hadoop
Applying Testing Techniques for Big Data and HadoopApplying Testing Techniques for Big Data and Hadoop
Applying Testing Techniques for Big Data and Hadoop
Mark Johnson
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
Hortonworks
 
Cloud Computing with AWS
Cloud Computing with AWSCloud Computing with AWS
Cloud Computing with AWS
Edureka!
 

Viewers also liked (14)

Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Bulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduceBulk Loading Into HBase With MapReduce
Bulk Loading Into HBase With MapReduce
 
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...Tackling non-determinism in Hadoop - Testing and debugging distributed system...
Tackling non-determinism in Hadoop - Testing and debugging distributed system...
 
Power Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS CloudPower Hadoop Cluster with AWS Cloud
Power Hadoop Cluster with AWS Cloud
 
Distributed Cache With MapReduce
Distributed Cache With MapReduceDistributed Cache With MapReduce
Distributed Cache With MapReduce
 
5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop5 Scenarios: When To Use & When Not to Use Hadoop
5 Scenarios: When To Use & When Not to Use Hadoop
 
Hadoop Cluster With High Availability
Hadoop Cluster With High AvailabilityHadoop Cluster With High Availability
Hadoop Cluster With High Availability
 
Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Applying Testing Techniques for Big Data and Hadoop
Applying Testing Techniques for Big Data and HadoopApplying Testing Techniques for Big Data and Hadoop
Applying Testing Techniques for Big Data and Hadoop
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
 
Cloud Computing with AWS
Cloud Computing with AWSCloud Computing with AWS
Cloud Computing with AWS
 

Similar to Top 5 Hadoop Admin Tasks

Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
Edureka!
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentationAmrut Patil
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Brian Enochson
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
DanishMahmood23
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
KrishnenduKrishh
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
Manaranjan Pradhan
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
AnandMHadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
Harshdeep Kaur
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
Owen O'Malley
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
Mandakini Kumari
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
Padma shree. T
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Leons Petražickis
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
umapavankumar kethavarapu
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
Edureka!
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
JanBask Training
 
Unit 5
Unit  5Unit  5
Unit 5
Ravi Kumar
 
Hadoop
HadoopHadoop
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdf
Sheetal Jain
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Edureka!
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop DeveloperEdureka!
 

Similar to Top 5 Hadoop Admin Tasks (20)

Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 
Big data processing using hadoop poster presentation
Big data processing using hadoop poster presentationBig data processing using hadoop poster presentation
Big data processing using hadoop poster presentation
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Session 01 - Into to Hadoop
Session 01 - Into to HadoopSession 01 - Into to Hadoop
Session 01 - Into to Hadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Unit 5
Unit  5Unit  5
Unit 5
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdfHadoop tutorial-pdf.pdf
Hadoop tutorial-pdf.pdf
 
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solutionHadoop a Highly Available and Secure Enterprise Data Warehousing solution
Hadoop a Highly Available and Secure Enterprise Data Warehousing solution
 
Hadoop Developer
Hadoop DeveloperHadoop Developer
Hadoop Developer
 

More from Edureka!

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
Edureka!
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
Edureka!
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
Edureka!
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
Edureka!
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
Edureka!
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
Edureka!
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
Edureka!
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
Edureka!
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
Edureka!
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
Edureka!
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
Edureka!
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
Edureka!
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
Edureka!
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
Edureka!
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
Edureka!
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
Edureka!
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
Edureka!
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
Edureka!
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
Edureka!
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
Edureka!
 

More from Edureka! (20)

What to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | EdurekaWhat to learn during the 21 days Lockdown | Edureka
What to learn during the 21 days Lockdown | Edureka
 
Top 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | EdurekaTop 10 Dying Programming Languages in 2020 | Edureka
Top 10 Dying Programming Languages in 2020 | Edureka
 
Top 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | EdurekaTop 5 Trending Business Intelligence Tools | Edureka
Top 5 Trending Business Intelligence Tools | Edureka
 
Tableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | EdurekaTableau Tutorial for Data Science | Edureka
Tableau Tutorial for Data Science | Edureka
 
Python Programming Tutorial | Edureka
Python Programming Tutorial | EdurekaPython Programming Tutorial | Edureka
Python Programming Tutorial | Edureka
 
Top 5 PMP Certifications | Edureka
Top 5 PMP Certifications | EdurekaTop 5 PMP Certifications | Edureka
Top 5 PMP Certifications | Edureka
 
Top Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | EdurekaTop Maven Interview Questions in 2020 | Edureka
Top Maven Interview Questions in 2020 | Edureka
 
Linux Mint Tutorial | Edureka
Linux Mint Tutorial | EdurekaLinux Mint Tutorial | Edureka
Linux Mint Tutorial | Edureka
 
How to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| EdurekaHow to Deploy Java Web App in AWS| Edureka
How to Deploy Java Web App in AWS| Edureka
 
Importance of Digital Marketing | Edureka
Importance of Digital Marketing | EdurekaImportance of Digital Marketing | Edureka
Importance of Digital Marketing | Edureka
 
RPA in 2020 | Edureka
RPA in 2020 | EdurekaRPA in 2020 | Edureka
RPA in 2020 | Edureka
 
Email Notifications in Jenkins | Edureka
Email Notifications in Jenkins | EdurekaEmail Notifications in Jenkins | Edureka
Email Notifications in Jenkins | Edureka
 
EA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | EdurekaEA Algorithm in Machine Learning | Edureka
EA Algorithm in Machine Learning | Edureka
 
Cognitive AI Tutorial | Edureka
Cognitive AI Tutorial | EdurekaCognitive AI Tutorial | Edureka
Cognitive AI Tutorial | Edureka
 
AWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | EdurekaAWS Cloud Practitioner Tutorial | Edureka
AWS Cloud Practitioner Tutorial | Edureka
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 

Top 5 Hadoop Admin Tasks

  • 1. Top 5 Hadoop Admin Tasks For more details please contact us: US : 1800 275 9730 (toll free) INDIA : +91 88808 62004 Email Us : sales@edureka.co For Queries: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN View Hadoop Administration Course at www.edureka.co/hadoop-admin
  • 2. www.edureka.co/hadoop-adminSlide 2 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions Objectives At the end of this module, you will be able to Understand Cluster Planning Understand Hadoop fully distributed cluster setup with two nodes Add further nodes to the running cluster Upgrade existing Hadoop Cluster from Hadoop 1 to Hadoop 2 Understand Active NameNode Failure and how passive takes over
  • 3. Slide 3 www.edureka.in/hadoop-admin » Great for testing, developing » Not a practical implementation for large amounts of data » Initially four or six nodes » As the volume of data grows, more nodes can easily be added Ways of deciding when the cluster needs to grow » Increasing amount of computation power needed » Increasing amount of data which needs to be stored » Increasing amount of memory needed to process tasks Hadoop Cluster Large Cluster Hadoop Cluster: Thinking About The Problem Small ClusterSingle Machine
  • 4. www.edureka.co/hadoop-adminSlide 4 Hadoop Cluster: A Typical Use Case NameNode Secondary NameNode DataNode RAM: 64 GB, Hard disk: 1 TB Processor: Xenon with 8 Cores Ethernet: 3 X 10 GB/s OS: 64-bit CentOS Power: Redundant Power Supply RAM: 32 GB, Hard disk: 1 TB Processor: Xenon with 4 Cores Ethernet: 3 X 10 GB/s OS: 64-bit CentOS Power: Redundant Power Supply RAM: 16GB Hard disk: 6 X 2TB Processor: Xenon with 2 cores. Ethernet: 3 X 10 GB/s OS: 64-bit CentOS DataNode RAM: 16GB Hard disk: 6 X 2TB Processor: Xenon with 2 cores. Ethernet: 3 X 10 GB/s OS: 64-bit CentOS
  • 5. www.edureka.co/hadoop-adminSlide 5 Seeking cluster growth on storage capacity is often a good method to use! Cluster Growth Based On Storage Capacity Data grows by approximately 5TB per week HDFS set up to replicate each block three times Thus, 15TB of extra storage space required per week Assuming machines with 5x3TB hard drives, equating to a new machine required each week Assume Overheads to be 30%
  • 6. www.edureka.co/hadoop-adminSlide 6 Slave Nodes: Recommended Configuration Higher-performance vs lower performance components Save the Money, Buy more Nodes!  General ( Depends on requirement ‘base’ configuration for a slave Node » 4 x 1 TB or 2 TB hard drives, in a JBOD* configuration » Do not use RAID! » 2 x Quad-core CPUs » 24 -32GB RAM » Gigabit Ethernet General Configuration  Multiples of ( 1 hard drive + 2 cores + 6-8GB RAM) generally work well for many types of applications Special Configuration Slave Nodes “A cluster with more nodes performs better than one with fewer, slightly faster nodes”
  • 7. www.edureka.co/hadoop-adminSlide 7 Slave Nodes: More Details (RAM) Slave Nodes (RAM) Generally each Map or Reduce task will take 1GB to 2GB of RAM Slave nodes should not be using virtual memory RULE OF THUMB! Total number of tasks = 1.5 x number of processor core Ensure enough RAM is present to run all tasks, plus the DataNode, TaskTracker daemons, plus the operating system
  • 8. www.edureka.co/hadoop-adminSlide 8 Master Node Hardware Recommendations Carrier-class hardware (Not commodity hardware) Dual power supplies Dual Ethernet cards (Bonded to provide failover) Raided hard drives At least 32GB of RAM Master Node Requires
  • 9. www.edureka.co/hadoop-adminSlide 9 Fully Distributed Mode Cluster Hadoop requires certain ports on each nodes accessible via the network However, the default firewall iptables prohibit these ports being accessed To run hadoop applications, you must make sure that these ports are open To check the status of iptables, you can use these commands under root privilege: /etc/init.d/iptables status You can simply turn iptables off, or at least open these ports: 9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075, 50090
  • 10. www.edureka.co/hadoop-adminSlide 10 Hadoop Cluster Modes Hadoop can run in any of the following three modes: Fully-Distributed Mode Pseudo-Distributed Mode  No daemons, everything runs in a single JVM  Suitable for running MapReduce programs during development  Has no DFS  Hadoop daemons run on the local machine  Hadoop daemons run on a cluster of machines Standalone (or Local) Mode
  • 11. www.edureka.co/hadoop-adminSlide 11 Hadoop Cluster Create Dedicated User and Group » Hadoop requires all the nodes in the cluster have exactly the same structure of directory in which hadoop was installed » It will be beneficial if we create a dedicated user (e.g.“hadoop”) and install hadoop in its home folder » You must have root privilege on each nodes to carry on the following steps » To change to “root”, type in “su -” in the terminal and input the password for “root” Create group “hadoop user”: groupadd hadoop use Create user “hadoop”: useradd -g hadoop user -s /bin/bash -d /home/hadoop hadoop in which -g specifies user “hadoop” belongs to group “hadoop user”, -s specifies the shell to use, -d specifies the home folder for user “hadoop”. Set password for user “hadoop”: passwd hadoop Then type in the password for user “hadoop” twice. Then type in “su - hadoop” to change to user “hadoop”.
  • 13. www.edureka.co/hadoop-adminSlide 13 Configuration Files Configuration Filenames Description of Log Files hadoop-env.sh yarn-env.sh Settings for Hadoop Daemon’s process environment. core-site.xml Configuration settings for Hadoop Core such as I/O settings that common to both HDFS and YARN. hdfs-site.xml Configuration settings for HDFS Daemons, the Name Node and the Data Nodes. yarn-site.xml Configuration setting for Resource Manager and Node Manager. mapred-site.xml Configuration settings for MapReduce Applications. slaves A list of machines (one per line) that each run DataNode and Node Manager.
  • 14. www.edureka.co/hadoop-adminSlide 14 Configuration Files (Contd.) Deprecated Property Name New Property Name dfs.data.dir dfs.datanode.data.dir dfs.http.address dfs.namenode.http-address fs.default.name fs.defaultFS The core functionality and usage of these core configuration files are same in Hadoop 2.0 and 1.0 but many new properties have been added and many have been deprecated For example:  ’fs.default.name’ has been deprecated and replaced with ‘fs.defaultFS’ for YARN in core-site.xml  ‘dfs.nameservices’ has been added to enable NameNode High Availability in hdfs-site.xml http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html  In Hadoop 2.2.0 release, you can use either the old or the new properties  The old property names are now deprecated, but still work!
  • 16. www.edureka.co/hadoop-adminSlide 16 Update the network addresses in the ‘include’ files dfs.include mapred.include Update the NameNode: hadoop dfsadmin -refreshNodes Update the Job Tracker: hadoop mradmin -refreshNodes Update the ‘slaves’ file Start the DataNode and TaskTracker hadoop-daemon.sh start tasktracker hadoop-daemon.sh start datanode Cross Check the Web 6 UI to ensure the successful addition Run Balancer to 7 move the HDFS blocks to DataNodes 1 2 3 4 5 Add (Commission) DataNodes
  • 17. www.edureka.co/hadoop-adminSlide 17 Hadoop Upgrade from 1 to 2 Run Reports » FSCK » LSR » DFSADMIN Take backup » Configurations » Applications » Data and Meta-data Install new version of Hadoop Upgrade Run New Reports » FSCK » LSR » DFSADMIN Compare old and new reports Test new cluster Finalize upgrade
  • 20. LIVE Online Class Class Recording in LMS 24/7 Post Class Support Module Wise Quiz Project Work Verifiable Certificate www.edureka.co/hadoop-adminSlide 20 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions How it Works?
  • 21. Questions www.edureka.co/hadoop-adminSlide 21 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
  • 22. www.edureka.co/hadoop-adminSlide 22 Course Topics  Module 1 » Hadoop Cluster Administration  Module 2 » Hadoop Architecture and Cluster setup  Module 3 » Hadoop Cluster: Planning and Managing  Module 4 » Backup, Recovery and Maintenance  Module 5 » Hadoop 2.0 and High Availability  Module 6 » Advanced Topics: QJM, HDFS Federation and Security  Module 7 » Oozie, Hcatalog/Hive and HBase Administration  Module 8 » Project: Hadoop Implementation