Deployment and Management
of Hadoop Clusters
Amal G Jose
Big Data Analytics
http://www.coderfox.com/
http://amalgjose.word...

Deployment and Management of Hadoop Clusters

This presentation explains the end-to-end architecture of a Hadoop cluster and the procedures required to deploy Hadoop clusters.

Published in: Technology

Transcript of "Deployment and Management of Hadoop Clusters"

  1. Deployment and Management of Hadoop Clusters
     Amal G Jose, Big Data Analytics
     http://www.coderfox.com/
     http://amalgjose.wordpress.com/
     in.linkedin.com/in/amalgjose/
  2. Agenda
     • Introduction
     • Cluster design and deployment
     • Backup and Recovery
     • Hadoop Upgrade
     • Routine Administration Tasks
  3. Introduction
     • What is Hadoop?
     • What makes Hadoop different?
     • Why do we need a Hadoop cluster?
  4. Cluster Installation
     This has 4 parts:
     • Cluster planning
     • OS installation and hardening
     • Cluster software installation
     • Cluster configuration
  5. Cluster Planning: Hadoop Daemon Configuration
     • Namenode: dedicated servers. The OS is installed on the RAID device. The dfs.name.dir resides on the same RAID device, and one more copy is configured on NFS.
     • Secondary Namenode: dedicated server. The OS is installed on a RAID device.
     • Jobtracker: dedicated server. The OS is installed on a JBOD configuration.
     • Datanode/Tasktracker: individual servers. The OS is installed on a JBOD configuration.
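As an illustration of the namenode layout on slide 5, here is a minimal sketch that renders a `dfs.name.dir` property holding two directories, one on the local RAID device and one on an NFS mount. The property name comes from the slide; the two paths and the helper names are hypothetical, and the snippet only builds the XML text rather than touching a real cluster.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value):
    """Build one <property> element in the Hadoop *-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def name_dir_config(local_raid_dir, nfs_dir):
    """Return a <configuration> snippet listing both namenode metadata dirs.

    Hadoop 1.x accepts a comma-separated list in dfs.name.dir and writes
    the namenode metadata to every directory in the list.
    """
    configuration = ET.Element("configuration")
    configuration.append(
        hadoop_property("dfs.name.dir", f"{local_raid_dir},{nfs_dir}")
    )
    return ET.tostring(configuration, encoding="unicode")


if __name__ == "__main__":
    # Both paths are assumptions chosen for illustration only.
    print(name_dir_config("/data/1/dfs/nn", "/mnt/nfs/dfs/nn"))
```

Keeping the NFS copy in the same property means the second copy is written as part of normal namenode operation, not by a separate backup job.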
  6. Workload Patterns For Hadoop
     • Balanced workload
     • Compute intensive
     • I/O intensive
     • Unknown or evolving workload patterns
  7. Cluster Topology
  8. Typical Hadoop Cluster Topology (diagram)
     • MN: Name Node, Job Tracker, Ganglia-Daemon
     • CN: Hive, Pig, Oozie, Mahout, Ganglia-Master
     • SN: Task Tracker, Data Node, Ganglia-Daemon
  9. Creating Instances (in case of cloud)
     • Create the instances based on the requirement.
  10. Operating System Hardening
     • We will be installing Hadoop on RHEL6 64-bit servers.
     • The OS should be hardened based on the RHEL6 hardening document.
     • Set the iptables rules necessary for Hadoop services.
     • In the case of Amazon EC2 instances, create key pairs for logging in.
     • The GUI can be disabled to leave more resources for Hadoop.
     • Clocks should be synchronized across all the servers.
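The iptables point on slide 10 could be sketched as follows. This is only an illustration: the port numbers are the Hadoop 1.x defaults for the web UIs and datanode services, the source network is a hypothetical value, and the namenode and jobtracker RPC ports are left out because they depend on the cluster's own `fs.default.name` and `mapred.job.tracker` settings. The function returns the rule strings instead of applying them.

```python
# Hadoop 1.x default ports. Clusters that override these in their
# configuration files need the overridden values here instead.
DEFAULT_PORTS = {
    "namenode-web-ui": 50070,
    "secondarynamenode-web-ui": 50090,
    "jobtracker-web-ui": 50030,
    "datanode-data-transfer": 50010,
    "datanode-ipc": 50020,
    "datanode-web-ui": 50075,
    "tasktracker-web-ui": 50060,
}


def iptables_rules(ports, source_cidr):
    """Return one ACCEPT rule per port, restricted to the cluster network."""
    return [
        f"iptables -A INPUT -p tcp -s {source_cidr} --dport {port} -j ACCEPT"
        for port in sorted(ports.values())
    ]


if __name__ == "__main__":
    # 10.0.0.0/24 is a hypothetical cluster subnet.
    for rule in iptables_rules(DEFAULT_PORTS, "10.0.0.0/24"):
        print(rule)
```

Restricting the source to the cluster subnet keeps the daemons reachable from other nodes without opening them to every network.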
  11. Cluster Software Installation
     • Choosing the distribution of Hadoop.
     • Creating a local Yum repository.
     • Installing Java on all the machines.
  12. Hadoop Ecosystem
  13. Installation Methods
     • Hadoop can be installed either manually or automatically, using tools such as Cloudera Manager or Ambari.
     • One-click installation tools help users install Hadoop on clusters with little manual effort.
  14. Manual Installation
     • Install the Hadoop daemons on the nodes.
     • We can use either a tarball or an rpm for installation.
     • The rpm installation is easier.
  15. Setting up Client Node
     • What is a client node?
     • Why is a client node necessary?
     • How is a client node configured?
     • Which services are installed?
     • Is multiuser segregation needed?
  16. Cluster Configuration
     • Storage location for namenode, secondarynamenode and datanode.
     • Number of task slots (map/reduce slots).
       – Number of task slots/node = (memory available/child JVM size)
     • Backup location for namenode.
     • Configuring MySQL for Hive and Oozie.
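The slot formula on slide 16 can be worked through with a small sketch. The division is the slide's formula; the 2:1 split between map and reduce slots and the sample memory figures are assumptions made for illustration, not values from the presentation.

```python
def task_slots_per_node(memory_available_mb, child_jvm_mb):
    """Slots per node = memory available / child JVM size (rounded down)."""
    if child_jvm_mb <= 0:
        raise ValueError("child JVM size must be positive")
    return memory_available_mb // child_jvm_mb


def split_slots(total_slots, map_share=2, reduce_share=1):
    """Divide the slots between map and reduce tasks.

    The 2:1 default ratio is an assumption; a real cluster would pick
    the ratio from its workload pattern.
    """
    map_slots = total_slots * map_share // (map_share + reduce_share)
    return map_slots, total_slots - map_slots


if __name__ == "__main__":
    # Hypothetical node: 24 GB left for tasks, 1 GB child JVMs.
    total = task_slots_per_node(24 * 1024, 1024)
    map_slots, reduce_slots = split_slots(total)
    print(total, map_slots, reduce_slots)
```

In Hadoop 1.x the two results would typically be set as `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, with the child JVM size controlled through `mapred.child.java.opts`.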
  17. Namenode - Single Point of Failure
     • Why is the namenode the single point of failure?
     • How can this issue be resolved?
     • How can backup be achieved?
  18. Implementing Schedulers
     • Capacity scheduler
     • Fair scheduler
  19. Monitoring Hadoop Cluster
     • For manual installation, we can use Ganglia.
     • Automated installation tools have built-in monitoring mechanisms available.
  20. Ganglia
  21. Cluster Maintenance
     • Managing Hadoop processes
       – Starting/stopping processes
     • HDFS maintenance
       – Adding/decommissioning a datanode
       – Checking file system integrity with fsck
       – Balancing HDFS block data
       – Dealing with a failed disk
     • MapReduce maintenance
       – Adding/decommissioning a tasktracker
       – Killing a MapReduce job/task
       – Dealing with a blacklisted tasktracker
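A few of the maintenance tasks on slide 21 map onto Hadoop 1.x command-line tools. The sketch below only assembles the command lines and inspects fsck output; it does not run anything. The dictionary keys, function names, balancer threshold and sample job ID are hypothetical, chosen for illustration.

```python
# Command lines for a Hadoop 1.x cluster, kept as argument lists so they
# could be handed to subprocess.run() on a node with the hadoop CLI.
MAINTENANCE_COMMANDS = {
    "check_integrity": ["hadoop", "fsck", "/"],
    # The threshold (percent deviation in disk usage) is an example value.
    "balance_blocks": ["hadoop", "balancer", "-threshold", "10"],
    # Re-read the include/exclude host files when adding or
    # decommissioning datanodes.
    "refresh_datanodes": ["hadoop", "dfsadmin", "-refreshNodes"],
}


def kill_job_command(job_id):
    """Command line that kills one MapReduce job by its ID."""
    return ["hadoop", "job", "-kill", job_id]


def fsck_reports_healthy(fsck_output):
    """fsck ends its report with the overall status of the checked path."""
    return "is HEALTHY" in fsck_output


if __name__ == "__main__":
    for task, command in MAINTENANCE_COMMANDS.items():
        print(task, "->", " ".join(command))
    # The job ID below is a made-up example.
    print(" ".join(kill_job_command("job_201301010000_0001")))
```

Keeping the commands as data makes it easy to log exactly what a maintenance script intends to run before running it.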
  22. Backup and Recovery
     • Data backup
       – Distributed copy (distcp)
       – Parallel ingestion
     • Namenode metadata
     • Hive metastore backup
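The distcp and Hive metastore items on slide 22 might look like this in practice. This is a sketch under assumptions: the cluster URIs, database name and user are hypothetical, and the metastore is taken to be the MySQL database mentioned on slide 16. The functions build command lines only.

```python
def distcp_command(source_uri, backup_uri, update=True):
    """Command line for copying HDFS data to a backup cluster.

    With -update, distcp copies only files that are missing or differ
    at the destination, which suits repeated backup runs.
    """
    command = ["hadoop", "distcp"]
    if update:
        command.append("-update")
    return command + [source_uri, backup_uri]


def metastore_dump_command(database, user):
    """Command line for dumping a MySQL-backed Hive metastore.

    -p makes mysqldump prompt for the password instead of taking it
    on the command line.
    """
    return ["mysqldump", "--single-transaction", "-u", user, "-p", database]


if __name__ == "__main__":
    # Hostnames, paths and names below are illustrative assumptions.
    print(" ".join(distcp_command(
        "hdfs://prod-nn:8020/data", "hdfs://backup-nn:8020/data")))
    print(" ".join(metastore_dump_command("metastore", "hive")))
```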
  23. Hadoop Upgrades
     • Data backup
     • Software upgrade
     • HDFS upgrade
     • Finalize upgrade
  24. Steps for Hadoop Upgrade
     • Make sure that any previous upgrade is finalized before proceeding with another upgrade.
     • Shut down MapReduce and kill any orphaned task processes on the tasktrackers.
     • Shut down HDFS and back up the namenode directories.
     • Install new versions of Hadoop HDFS and MapReduce on the cluster and on clients.
     • Start HDFS with the -upgrade option.
     • Wait until the upgrade is complete.
     • Perform some sanity checks on HDFS.
     • Start MapReduce.
     • Roll back or finalize the upgrade (optional).
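The upgrade steps above can be laid out as an ordered runbook. The sketch pairs each step with the Hadoop 1.x script or command commonly used for it; steps with no single command are marked manual. The pairing is an assumption about a tarball-style Hadoop 1.x installation, not something the slides specify.

```python
# (description, command) pairs in the order given on the slide.
# None marks a step that has no single standard command.
UPGRADE_STEPS = [
    ("Confirm any previous upgrade is finalized",
     "hadoop dfsadmin -upgradeProgress status"),
    ("Shut down MapReduce", "stop-mapred.sh"),
    ("Kill orphaned task processes on the tasktrackers", None),
    ("Shut down HDFS", "stop-dfs.sh"),
    ("Back up the namenode directories", None),
    ("Install the new Hadoop version on cluster and clients", None),
    ("Start HDFS with the -upgrade option", "start-dfs.sh -upgrade"),
    ("Wait until the upgrade is complete",
     "hadoop dfsadmin -upgradeProgress status"),
    ("Run sanity checks on HDFS", "hadoop fsck /"),
    ("Start MapReduce", "start-mapred.sh"),
    # The alternative to finalizing is a rollback: stop HDFS, then
    # start it again with start-dfs.sh -rollback.
    ("Finalize the upgrade (optional)", "hadoop dfsadmin -finalizeUpgrade"),
]


def format_runbook(steps):
    """Return numbered lines, one per step, for printing or logging."""
    lines = []
    for number, (description, command) in enumerate(steps, start=1):
        action = command if command is not None else "(manual step)"
        lines.append(f"{number:2d}. {description}: {action}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_runbook(UPGRADE_STEPS)))
```

Finalizing discards the pre-upgrade copy of the file system state, so a rollback is only possible before that last step.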
  25. Routine Administration Procedures
     • Checking every node
     • Metadata backups
     • Data backups
     • File system check
     • File system balancer
  26. Summary
     • Hadoop cluster design
     • Hadoop cluster installation
     • Backup and recovery
     • Hadoop upgrade
     • Routine administration procedures
  27. Additional Information
     For more info, visit:
     http://amalgjose.wordpress.com
     http://coderfox.com
     http://in.linkedin.com/in/amalgjose
  28. Thank You
