Deployment and Management of Hadoop Clusters

This presentation explains the end-to-end architecture of a Hadoop cluster and the procedures required for the deployment of Hadoop clusters.

Transcript

  • 1. Deployment and Management of Hadoop Clusters. Amal G Jose, Big Data Analytics. http://www.coderfox.com/ http://amalgjose.wordpress.com/ in.linkedin.com/in/amalgjose/
  • 2. Agenda • Introduction • Cluster design and deployment • Backup and Recovery • Hadoop Upgrade • Routine Administration Tasks
  • 3. Introduction • What is Hadoop? • What makes Hadoop different? • Why do we need a Hadoop cluster?
  • 4. Cluster Installation. This has four parts: • Cluster planning • OS installation and hardening • Cluster software installation • Cluster configuration
  • 5. Cluster Planning (Hadoop daemon / configuration):
    – Namenode: dedicated servers; OS installed on a RAID device; dfs.name.dir resides on the same RAID device; one more copy is configured on NFS.
    – Secondary namenode: dedicated server; OS installed on a RAID device.
    – Jobtracker: dedicated server; OS installed in a JBOD configuration.
    – Datanode/Tasktracker: individual servers; OS installed in a JBOD configuration.
  • 6. Workload Patterns For Hadoop • Balanced Workload • Compute Intensive • I/O Intensive • Unknown or evolving workload patterns
  • 7. Cluster Topology
  • 8. Typical Hadoop Cluster Topology (diagram). Master node (MN): Name Node, Job Tracker, Ganglia daemon. Client node (CN): Hive, Pig, Oozie, Mahout, Ganglia master. Slave nodes (SN): Task Tracker, Data Node, Ganglia daemon.
  • 9. Creating Instances (in the case of cloud deployment) • Create the instances based on the requirements.
  • 10. Operating System Hardening • We will be installing Hadoop on RHEL6 64-bit servers. • The OS should be hardened based on the RHEL6 hardening document. • Set the iptables rules necessary for Hadoop services. • In the case of Amazon EC2 instances, create key pairs for logging in. • The GUI can be disabled to make more room for Hadoop. • Time should be synchronized across all the servers.
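The iptables step above can be sketched as follows. This is only an illustration, assuming the default Hadoop 1.x service ports and a hypothetical trusted cluster subnet of 10.0.0.0/24; adjust both to your environment. It must be run as root on a RHEL6-style host.

```shell
# Sketch: open the core Hadoop 1.x ports to the cluster subnet only.
# 10.0.0.0/24 is a placeholder network; replace with your own.
CLUSTER_NET="10.0.0.0/24"

# NameNode: RPC (8020) and web UI (50070)
iptables -A INPUT -s "$CLUSTER_NET" -p tcp --dport 8020  -j ACCEPT
iptables -A INPUT -s "$CLUSTER_NET" -p tcp --dport 50070 -j ACCEPT

# JobTracker: RPC (8021) and web UI (50030)
iptables -A INPUT -s "$CLUSTER_NET" -p tcp --dport 8021  -j ACCEPT
iptables -A INPUT -s "$CLUSTER_NET" -p tcp --dport 50030 -j ACCEPT

# DataNode: data transfer (50010), IPC (50020), web UI (50075)
iptables -A INPUT -s "$CLUSTER_NET" -p tcp -m multiport --dports 50010,50020,50075 -j ACCEPT

# TaskTracker web UI (50060)
iptables -A INPUT -s "$CLUSTER_NET" -p tcp --dport 50060 -j ACCEPT

# Persist the rules across reboots (RHEL6)
service iptables save
```

If you changed the ports in core-site.xml, hdfs-site.xml, or mapred-site.xml, mirror those changes here.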
  • 11. Cluster Software Installation • Choosing the distribution of Hadoop. • Creation of a local yum repository. • Java installation on all the machines.
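The local-repository and Java steps above might look like this on RHEL6. The paths, the `repo-server` hostname, and the JDK package name are assumptions for illustration; the RPM set depends on the Hadoop distribution chosen.

```shell
# Sketch: build a local yum repository from downloaded Hadoop RPMs
# (assumed location: /var/www/html/hadoop-repo) and serve it over HTTP.
yum install -y createrepo httpd
createrepo /var/www/html/hadoop-repo
service httpd start

# On every cluster node, point yum at the local repository.
# "repo-server" is a placeholder hostname.
cat > /etc/yum.repos.d/hadoop-local.repo <<'EOF'
[hadoop-local]
name=Local Hadoop repository
baseurl=http://repo-server/hadoop-repo
enabled=1
gpgcheck=0
EOF

# Install Java on all machines (package name varies with the JDK you pick)
yum install -y java-1.6.0-openjdk-devel
```

A local repository keeps installation fast and repeatable on clusters without direct internet access.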
  • 12. Hadoop Ecosystem
  • 13. Installation Methods • Hadoop can be installed either manually or automatically using tools such as Cloudera Manager, Ambari, etc. • One-click installation tools help users install Hadoop on clusters without any pain.
  • 14. Manual Installation • Install the Hadoop daemons on the nodes. • We can use either a tarball or RPM packages for installation. • RPM installation is easier.
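The two manual routes can be sketched as below. The version number 1.2.1 and the install paths are examples only; substitute the artifacts of your chosen distribution.

```shell
# Sketch (a): tarball install. Unpack under /usr/local and set HADOOP_HOME.
tar -xzf hadoop-1.2.1.tar.gz -C /usr/local
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH="$HADOOP_HOME/bin:$PATH"

# Sketch (b): RPM install. The package creates users, init scripts,
# and a standard file layout for you, which is why it is easier.
rpm -ivh hadoop-1.2.1-1.x86_64.rpm
```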
  • 15. Setting up a Client Node • What is a client node? • Why is a client node necessary? • How do we configure a client node? • Which services are installed on it? • Why is multi-user segregation needed?
  • 16. Cluster Configuration • Storage locations for the namenode, secondary namenode, and datanodes. • Number of task slots (map/reduce slots): task slots per node = (available memory / child JVM size). • Backup location for the namenode. • Configuring MySQL for Hive and Oozie.
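The task-slot formula above can be worked through with assumed numbers: say 24 GB of RAM remains for task JVMs after the OS and the datanode/tasktracker daemons, and each child JVM (mapred.child.java.opts) gets 2 GB.

```shell
# Sketch: task slots per node = available memory / child JVM size
AVAILABLE_MB=24576   # 24 GB left for task JVMs (assumed)
CHILD_JVM_MB=2048    # 2 GB per child JVM (assumed)

SLOTS=$((AVAILABLE_MB / CHILD_JVM_MB))
echo "Task slots per node: $SLOTS"   # 12, to be split between map and reduce slots
```

In mapred-site.xml these become mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, summing to the computed slot count.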
  • 17. Namenode: the Single Point of Failure • Why is the namenode the single point of failure? • How can this issue be resolved? • How can backup be achieved?
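One mitigation named on the cluster-planning slide is keeping a second copy of the namenode metadata on NFS. A minimal hdfs-site.xml fragment for that, with placeholder paths (/data/1/dfs/nn on the local RAID device, /mnt/nfs/dfs/nn on the NFS mount), might be:

```shell
# Sketch: write a dfs.name.dir property listing a local and an NFS directory.
# The namenode writes its metadata to every directory in the list.
cat >> hdfs-site.xml <<'EOF'
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
EOF
```

If the local disk is lost, the NFS copy lets you restore the file system image on a replacement machine.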
  • 18. Implementing Schedulers • Capacity scheduler • Fair scheduler
  • 19. Monitoring the Hadoop Cluster • For a manual installation, we can use Ganglia. • Automated installation tools have built-in monitoring mechanisms.
  • 20. Ganglia
  • 21. Cluster Maintenance • Managing Hadoop processes – Starting/stopping processes • HDFS maintenance – Adding/decommissioning datanodes – Checking file system integrity with fsck – Balancing HDFS block data – Dealing with a failed disk • MapReduce maintenance – Adding/decommissioning tasktrackers – Killing a MapReduce job/task – Dealing with a blacklisted tasktracker
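The maintenance tasks above map to a handful of Hadoop 1.x commands, sketched here against a live cluster (the job and task-attempt IDs are placeholders):

```shell
# Check file system integrity
hadoop fsck /

# Balance HDFS block data; threshold is the allowed deviation in
# disk usage (percent) between datanodes
hadoop balancer -threshold 10

# Decommission a datanode: add its hostname to the file referenced by
# dfs.hosts.exclude, then tell the namenode to re-read the lists
hadoop dfsadmin -refreshNodes

# Kill a MapReduce job, or a single task attempt (IDs are placeholders)
hadoop job -kill job_201301010000_0001
hadoop job -kill-task attempt_201301010000_0001_m_000000_0
```

These require a running cluster and appropriate superuser permissions; they are shown here only to connect the bullet list to concrete commands.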
  • 22. Backup and Recovery • Data backup – Distributed copy (distcp) – Parallel ingestion • Namenode metadata backup • Hive metastore backup
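A distcp data backup between two clusters might look like this; nn1 and nn2 are hypothetical namenode hostnames, and the paths are examples.

```shell
# Sketch: copy /data from the production cluster to a backup cluster.
# distcp runs as a MapReduce job, so copies happen in parallel.
hadoop distcp hdfs://nn1:8020/data hdfs://nn2:8020/backup/data
```

For the namenode metadata, the simplest backup is copying the dfs.name.dir contents while the namenode is stopped, in addition to the NFS copy configured at deployment time.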
  • 23. Hadoop Upgrades • Data Backup • Software upgrade • HDFS upgrade • Finalize upgrade
  • 24. Steps for Hadoop Upgrade • Make sure that any previous upgrade is finalized before proceeding with another upgrade. • Shut down MapReduce and kill any orphaned task processes on the tasktrackers. • Shut down HDFS and back up the namenode directories. • Install the new versions of Hadoop HDFS and MapReduce on the cluster and on clients. • Start HDFS with the -upgrade option. • Wait until the upgrade is complete. • Perform some sanity checks on HDFS. • Start MapReduce. • Roll back or finalize the upgrade (optional).
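The upgrade steps above correspond to these Hadoop 1.x commands, sketched in order; they assume a live cluster and the stock start/stop scripts.

```shell
stop-mapred.sh                      # shut down MapReduce
stop-dfs.sh                         # shut down HDFS; back up dfs.name.dir now

# ... install the new Hadoop version on all nodes and clients ...

start-dfs.sh -upgrade               # start HDFS with the -upgrade option
hadoop dfsadmin -upgradeProgress status   # poll until the upgrade is complete

# after sanity checks (fsck, reading sample files), either finalize:
hadoop dfsadmin -finalizeUpgrade
# ... or roll back to the previous version instead:
# start-dfs.sh -rollback
```

Until the upgrade is finalized, HDFS keeps the pre-upgrade storage layout so a rollback remains possible; once finalized, rollback is no longer an option.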
  • 25. Routine Administration Procedures • Checking all nodes • Metadata backups • Data backups • File system checks • File system balancing
  • 26. Summary • Hadoop Cluster design • Hadoop Cluster Installation • Back up and Recovery • Hadoop Upgrade • Routine Administration Procedures
  • 27. Additional Information. For more info, visit: http://amalgjose.wordpress.com http://coderfox.com http://in.linkedin.com/in/amalgjose
  • 28. Thank You