The rise of Hadoop has created a need for professionals skilled in Hadoop administration, and building those skills can lead to better career, salary and job opportunities.
The following blogs will help you understand the significance of Hadoop Administration training:
http://www.edureka.co/blog/why-should-you-go-for-hadoop-administration-course/
http://www.edureka.co/blog/how-to-become-a-hadoop-administrator/
http://www.edureka.co/blog/hadoop-admin-responsibilities/
Top 5 Hadoop Admin Tasks
For more details please contact us:
US : 1800 275 9730 (toll free)
INDIA : +91 88808 62004
Email Us : sales@edureka.co
For Queries:
Post on Twitter @edurekaIN: #askEdureka
Post on Facebook /edurekaIN
View Hadoop Administration Course at www.edureka.co/hadoop-admin
2. www.edureka.co/hadoop-adminSlide 2 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions
Objectives
At the end of this module, you will be able to
Understand Cluster Planning
Understand Hadoop fully distributed cluster setup with two nodes
Add further nodes to the running cluster
Upgrade existing Hadoop Cluster from Hadoop 1 to Hadoop 2
Understand Active NameNode failure and how the passive NameNode takes over
Hadoop Cluster: Thinking About The Problem
Single Machine
» Great for testing, developing
» Not a practical implementation for large amounts of data
Small Cluster
» Initially four or six nodes
» As the volume of data grows, more nodes can easily be added
Large Cluster
Ways of deciding when the cluster needs to grow
» Increasing amount of computation power needed
» Increasing amount of data which needs to be stored
» Increasing amount of memory needed to process tasks
Hadoop Cluster: A Typical Use Case
NameNode
RAM: 64 GB
Hard disk: 1 TB
Processor: Xeon with 8 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
Secondary NameNode
RAM: 32 GB
Hard disk: 1 TB
Processor: Xeon with 4 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Power: Redundant power supply
DataNode (two identical nodes)
RAM: 16 GB
Hard disk: 6 x 2 TB
Processor: Xeon with 2 cores
Ethernet: 3 x 10 Gb/s
OS: 64-bit CentOS
Cluster Growth Based On Storage Capacity
Basing cluster growth on storage capacity is often a good method to use!
» Data grows by approximately 5 TB per week
» HDFS is set up to replicate each block three times
» Thus, 15 TB of extra storage space is required per week
» Assuming machines with 5 x 3 TB hard drives, this equates to one new machine required each week
» Assume overheads to be 30%
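The arithmetic above can be sketched in a few lines of bash. This is only an illustration: the function names are hypothetical, and it assumes the 30% overhead is extra space needed on top of the replicated data, which the slide does not spell out. Values are kept in tenths of a TB so that integer arithmetic is enough.

```shell
#!/usr/bin/env bash
# Raw storage consumed per week, in tenths of a TB.
# Args: weekly data in TB, replication factor, overhead percent.
# Assumption: overhead is added on top of the replicated data.
weekly_raw_tenths() {
  local data_tb=$1 replication=$2 overhead_pct=$3
  echo $(( data_tb * replication * (100 + overhead_pct) / 10 ))
}

# Whole nodes to add to cover a number of weeks (rounded up).
# Args: weeks, raw tenths-of-TB per week, disks per node, TB per disk.
nodes_needed() {
  local weeks=$1 raw_tenths=$2 disks=$3 disk_tb=$4
  local node_tenths=$(( disks * disk_tb * 10 ))
  echo $(( (weeks * raw_tenths + node_tenths - 1) / node_tenths ))
}

raw=$(weekly_raw_tenths 5 3 30)
echo "Raw storage per week: $(( raw / 10 )).$(( raw % 10 )) TB"
echo "Nodes to add over 4 weeks: $(nodes_needed 4 "$raw" 5 3)"
```

With the overhead set to 0, the same functions give 15 TB per week and one 5 x 3 TB machine per week, matching the figures on the slide.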
Slave Nodes: Recommended Configuration
Higher-performance vs lower-performance components: save the money, buy more nodes!
General Configuration
General (depends on requirement) ‘base’ configuration for a slave node:
» 4 x 1 TB or 2 TB hard drives, in a JBOD (Just a Bunch Of Disks) configuration
» Do not use RAID!
» 2 x quad-core CPUs
» 24-32 GB RAM
» Gigabit Ethernet
Special Configuration
Multiples of (1 hard drive + 2 cores + 6-8 GB RAM) generally work well for many types of applications
“A cluster with more nodes performs better than one with fewer, slightly faster nodes”
Slave Nodes: More Details (RAM)
» Generally each Map or Reduce task will take 1 GB to 2 GB of RAM
» Slave nodes should not be using virtual memory
» Rule of thumb: total number of tasks = 1.5 x number of processor cores
» Ensure enough RAM is present to run all tasks, plus the DataNode and TaskTracker daemons, plus the operating system
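As a rough sketch of the rule of thumb, the bash functions below estimate task slots and RAM for a slave node. The function names and the reserved amount for the DataNode, TaskTracker and operating system are hypothetical; the slide gives no figure for that reserve.

```shell
#!/usr/bin/env bash
# Task slots per node from the rule of thumb: 1.5 x processor cores.
# Integer arithmetic rounds down for odd core counts.
max_tasks() {
  local cores=$1
  echo $(( cores * 3 / 2 ))
}

# RAM (GB) needed: all tasks plus a reserve for daemons and the OS.
# Args: cores, GB per task, reserved GB (hypothetical value).
required_ram_gb() {
  local cores=$1 gb_per_task=$2 reserved_gb=$3
  echo $(( $(max_tasks "$cores") * gb_per_task + reserved_gb ))
}

echo "Tasks on 8 cores: $(max_tasks 8)"
echo "RAM at 2 GB per task, 4 GB reserved: $(required_ram_gb 8 2 4) GB"
```

For a node with 2 x quad-core CPUs this gives 12 tasks and 28 GB, which falls inside the 24-32 GB range recommended for a slave node.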
Master Node Hardware Recommendations
The master node requires:
» Carrier-class hardware (not commodity hardware)
» Dual power supplies
» Dual Ethernet cards (bonded to provide failover)
» RAIDed hard drives
» At least 32 GB of RAM
Fully Distributed Mode Cluster
Hadoop requires certain ports on each node to be accessible via the network
However, the default iptables firewall blocks access to these ports
To run Hadoop applications, you must make sure that these ports are open
To check the status of iptables, run this command as root:
/etc/init.d/iptables status
You can simply turn iptables off, or at least open these ports:
9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075, 50090
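If you prefer to open the ports rather than turn iptables off, one possible approach is sketched below. The script only prints one rule per port so the rules can be reviewed before being run as root. The function name is hypothetical, and it assumes all of the listed ports use TCP and that inserting rules into the INPUT chain suits your firewall layout.

```shell
#!/usr/bin/env bash
# Ports listed on the slide.
HADOOP_PORTS="9000 9001 50010 50020 50030 50060 50070 50075 50090"

# Print, but do not execute, one iptables rule per port.
# Assumption: TCP traffic, rules inserted into the INPUT chain.
print_open_port_rules() {
  local port
  for port in $HADOOP_PORTS; do
    echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
  done
}

print_open_port_rules
```

Rules added this way last only until the firewall is restarted unless they are also saved, so check how your distribution persists iptables rules.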
Hadoop Cluster Modes
Hadoop can run in any of the following three modes:
Standalone (or Local) Mode
» No daemons, everything runs in a single JVM
» Suitable for running MapReduce programs during development
» Has no DFS
Pseudo-Distributed Mode
» Hadoop daemons run on the local machine
Fully-Distributed Mode
» Hadoop daemons run on a cluster of machines
Hadoop Cluster
Create Dedicated User and Group
» Hadoop requires all the nodes in the cluster to have exactly the same directory structure for the Hadoop installation
» It is helpful to create a dedicated user (e.g. “hadoop”) and install Hadoop in its home folder
» You must have root privileges on each node to carry out the following steps
» To change to “root”, type “su -” in the terminal and enter the password for “root”
Create group “hadoop_user”:
groupadd hadoop_user
Create user “hadoop”:
useradd -g hadoop_user -s /bin/bash -d /home/hadoop hadoop
in which -g specifies that user “hadoop” belongs to group “hadoop_user”, -s specifies the shell to use, and -d specifies the home folder for user “hadoop”.
Set password for user “hadoop”:
passwd hadoop
Then type in the password for user “hadoop” twice.
Then type in “su - hadoop” to change to user “hadoop”.
Configuration Files
Configuration Filenames       Description
hadoop-env.sh, yarn-env.sh    Settings for the Hadoop daemons' process environment.
core-site.xml                 Configuration settings for Hadoop Core, such as I/O settings that are common to both HDFS and YARN.
hdfs-site.xml                 Configuration settings for the HDFS daemons, the NameNode and the DataNodes.
yarn-site.xml                 Configuration settings for the ResourceManager and NodeManager.
mapred-site.xml               Configuration settings for MapReduce applications.
slaves                        A list of machines (one per line) that each run a DataNode and a NodeManager.
Configuration Files (Contd.)
Deprecated Property Name      New Property Name
dfs.data.dir                  dfs.datanode.data.dir
dfs.http.address              dfs.namenode.http-address
fs.default.name               fs.defaultFS
The core functionality and usage of these configuration files are the same in Hadoop 2.0 and 1.0, but many new properties have been added and many have been deprecated
For example:
‘fs.default.name’ has been deprecated and replaced with ‘fs.defaultFS’ for YARN in core-site.xml
‘dfs.nameservices’ has been added to enable NameNode High Availability in hdfs-site.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/DeprecatedProperties.html
In the Hadoop 2.2.0 release, you can use either the old or the new properties
The old property names are now deprecated, but still work!
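Before an upgrade it can help to see which deprecated names a configuration file still uses. The bash sketch below covers only the three properties listed on this slide. The function names are hypothetical, and it assumes each property name appears on one line in the usual <name>...</name> form.

```shell
#!/usr/bin/env bash
# Map a deprecated property name (from the slide) to its replacement.
# Unknown names are returned unchanged.
new_name_for() {
  case "$1" in
    dfs.data.dir)     echo "dfs.datanode.data.dir" ;;
    dfs.http.address) echo "dfs.namenode.http-address" ;;
    fs.default.name)  echo "fs.defaultFS" ;;
    *)                echo "$1" ;;
  esac
}

# List the deprecated names found in one configuration file.
# Arg: path to a *-site.xml file.
report_deprecated() {
  local file=$1 old
  for old in dfs.data.dir dfs.http.address fs.default.name; do
    if grep -qF "<name>${old}</name>" "$file"; then
      echo "${old} -> $(new_name_for "$old")"
    fi
  done
}
```

A call such as report_deprecated core-site.xml prints one line per deprecated name it finds and nothing when the file already uses the new names.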
Add (Commission) DataNodes
Step 1: Update the network addresses in the ‘include’ files (dfs.include, mapred.include)
Step 2: Update the NameNode: hadoop dfsadmin -refreshNodes
Step 3: Update the JobTracker: hadoop mradmin -refreshNodes
Step 4: Update the ‘slaves’ file
Step 5: Start the DataNode and TaskTracker: hadoop-daemon.sh start tasktracker, hadoop-daemon.sh start datanode
Step 6: Cross-check the Web UI to ensure the successful addition
Step 7: Run Balancer to move the HDFS blocks to DataNodes
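The commissioning sequence can be written down as a dry-run checklist. The bash sketch below only prints the commands for one new node and executes nothing. The function name, hostname and directory are hypothetical placeholders, and it assumes the include files and the ‘slaves’ file live in the same configuration directory.

```shell
#!/usr/bin/env bash
# Print the commissioning steps for one new node. Nothing is executed.
# Args: new node hostname, configuration directory (both hypothetical).
commission_plan() {
  local host=$1 conf_dir=$2
  cat <<EOF
echo ${host} >> ${conf_dir}/dfs.include
echo ${host} >> ${conf_dir}/mapred.include
hadoop dfsadmin -refreshNodes
hadoop mradmin -refreshNodes
echo ${host} >> ${conf_dir}/slaves
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker
# manual step: confirm the new node appears in the Web UI
hadoop balancer
EOF
}

commission_plan node3.example.com /home/hadoop/hadoop/conf
```

The two hadoop-daemon.sh commands are meant to be run on the new node itself, while the refresh commands and the balancer are run from a node with admin access to the cluster.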
Hadoop Upgrade from 1 to 2
Run Reports
» FSCK
» LSR
» DFSADMIN
Take backup
» Configurations
» Applications
» Data and Meta-data
Install new version of Hadoop
Upgrade
Run New Reports
» FSCK
» LSR
» DFSADMIN
Compare old and new reports
Test new cluster
Finalize upgrade
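The report steps can be sketched as commands that save each report to a file so the old and new versions can be compared. The bash function below only prints the commands and runs nothing. The function name, labels and output directory are hypothetical, and the commands use the Hadoop 1 style command names.

```shell
#!/usr/bin/env bash
# Print the commands that capture the three reports into a directory.
# Args: label such as "before" or "after", output directory.
# Both arguments are hypothetical placeholders.
report_commands() {
  local label=$1 out_dir=$2
  echo "hadoop fsck / -files -blocks -locations > ${out_dir}/fsck-${label}.log"
  echo "hadoop fs -lsr / > ${out_dir}/lsr-${label}.log"
  echo "hadoop dfsadmin -report > ${out_dir}/dfsadmin-${label}.log"
}

report_commands before /tmp/upgrade
report_commands after /tmp/upgrade
echo "diff /tmp/upgrade/lsr-before.log /tmp/upgrade/lsr-after.log"
```

Comparing the file listings taken before and after the upgrade is one simple way to confirm that no files went missing before the upgrade is finalized.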
How it Works?
» LIVE Online Class
» Class Recording in LMS
» 24/7 Post Class Support
» Module Wise Quiz
» Project Work
» Verifiable Certificate