Distro-independent Hadoop cluster management

Hadoop Summit 2015

Published in: Technology

  1. 1. Distro-independent Hadoop cluster management Denis Shestakov Hadoop engineer Denis.Shestakov@BrightComputing.Com
  2. 2. Hadoop cluster dream … • Fast deployment • Easy maintenance and management • Re-use of existing IT personnel’s expertise • Cost-efficiency
  3. 3. Outline • Overview of Hadoop cluster operations • Hadoop deployment/maintenance • Architecture for distro-agnostic Hadoop cluster manager • Bright Cluster Manager for Apache Hadoop • Open issues
  4. 4. Hadoop cluster operations
  5. 5. Hadoop cluster operations • (hardware selection/network design/storage considerations/…) • Deployment • Provisioning • Hadoop distro & component selection • Initial setup and configuration • Management • Monitoring • Health-checking • Optimization
  6. 6. Hadoop cluster operations • Deployment • Provisioning • Hadoop distro & component selection • Initial setup and configuration • Management • Monitoring • Health-checking • Optimization (diagram labels: Cluster, Hadoop stack)
  7. 7. Hadoop deployment/maintenance
  8. 8. Hadoop deployment • Includes cluster deployment and Hadoop-stack deployment • Without proper infrastructure setup, Hadoop will not run • Proper Hadoop setup relies on network/OS/filesystem tuning • Knowledge and expertise in both are rare
  9. 9. Hadoop deployment Challenge for cluster admins • E.g., configuring Hadoop components • Understanding of HDFS/MapReduce/YARN/HBase/… essential • Numerous configuration settings • Hadoop distribution choice • Different Hadoop modes • HDFS & HDFS HA, YARN & YARN HA, …
  10. 10. Hadoop deployment Challenge for Hadoop admins • Configuration is still hard • Numerous configuration settings • Deprecated properties • MRv1 to YARN migration • OS/network tuning essential
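The "deprecated properties" point on slide 10 can be illustrated with a small sketch. The four old-to-new pairs below come from Apache Hadoop's published deprecation table; the `modernize` helper and its behavior when both names are present are assumptions made for illustration, not part of any tool named in the slides.

```python
# Old-to-new property names from Hadoop's deprecation table (a small subset).
DEPRECATED_TO_CURRENT = {
    "fs.default.name": "fs.defaultFS",
    "dfs.name.dir": "dfs.namenode.name.dir",
    "dfs.data.dir": "dfs.datanode.data.dir",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
}


def modernize(settings):
    """Return (updated_settings, renamed) without mutating the input.

    Hypothetical helper: if both the old and the new name are present,
    the value given under the new name is kept.
    """
    updated = {}
    renamed = []
    for name, value in settings.items():
        current = DEPRECATED_TO_CURRENT.get(name, name)
        if current != name:
            renamed.append((name, current))
            if current in settings:
                continue  # an explicit new-style value takes precedence
        updated[current] = value
    return updated, renamed


updated, renamed = modernize(
    {"fs.default.name": "hdfs://head:8020", "dfs.replication": "3"}
)
```

A check like this is the kind of thing an admin might run over an MRv1-era configuration before migrating to YARN; it only renames keys and does not validate values.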
  11. 11. Hadoop deployment/maintenance • Hadoop validation by running test/benchmarking jobs • Monitoring and health checking on both OS and Hadoop stack levels • Upgrades
  12. 12. Hadoop deployment/maintenance • Provisioning workflow • Automation tools (Chef, Puppet, scripts, …) • Monitoring tools • Hadoop cluster manager
  13. 13. Hadoop deployment/maintenance • Provisioning workflow • Automation tools (Chef, Puppet, scripts, …) • Monitoring tools • Hadoop cluster manager • Unified tool?
  14. 14. Architecture for distro-independent Hadoop manager
  15. 15. Architectural considerations • Pilot/development/production Hadoop cluster • Choose from different Hadoop distros/versions • Several types of nodes: • Master nodes • Worker nodes • Gateway nodes • Worker nodes have similar OS/software stack • Cluster growth expected: more workers added • Easy node replacement • Heterogeneous hardware • Grouping nodes by their hardware
  16. 16. Architecture (diagram: cluster head node; Node-A, Node-B, Node-C; Cluster Management Interface; Third-Party Applications; cluster management daemon)
  17. 17. Architecture • Cluster management daemon: • Low overhead • All nodes run the same daemon • Assigned roles define which tasks cluster management daemon can perform
  18. 18. Architecture • A role can be assigned to a node to make it perform a task • E.g., a provisioning role makes a node distribute software images to other nodes • The HDFS NameNode role makes a node store HDFS metadata and control nodes with HDFS DataNode roles • Assigning the HDFS DataNode role to a node adds and starts the DataNode service
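The role mechanism on slides 17 and 18 can be pictured with a minimal sketch. Everything here is hypothetical: the `Node` class, the role names, and the service names in `ROLE_SERVICES` are invented for illustration and do not reflect Bright Cluster Manager's actual data model or API.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-service table; names are illustrative only.
ROLE_SERVICES = {
    "provisioning": ["image-sync"],
    "hdfs-namenode": ["hadoop-hdfs-namenode"],
    "hdfs-datanode": ["hadoop-hdfs-datanode"],
}


@dataclass
class Node:
    name: str
    roles: set = field(default_factory=set)
    running: set = field(default_factory=set)

    def assign(self, role):
        """Assign a role and start the services it implies."""
        if role not in ROLE_SERVICES:
            raise ValueError(f"unknown role: {role}")
        self.roles.add(role)
        self.running.update(ROLE_SERVICES[role])

    def unassign(self, role):
        """Remove a role and stop services no remaining role needs."""
        self.roles.discard(role)
        still_needed = {s for r in self.roles for s in ROLE_SERVICES[r]}
        self.running &= still_needed


node = Node("node001")
node.assign("hdfs-datanode")   # adds and "starts" the DataNode service
node.assign("provisioning")
node.unassign("hdfs-datanode")  # only the provisioning service remains
```

The point of the design is that every node runs the same daemon, and the set of assigned roles alone decides which services that daemon manages on a given node.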
  19. 19. Bright Cluster Manager for Apache Hadoop
  20. 20. Architecture (diagram: Bright Cluster with CMDaemon on the head node and on node001, node002, node003; links labeled JSON+SSL and JSON API+SSL; clients: Cluster Management GUI, Cluster Management Shell, Web-Based User Portal, Third-Party Applications)
  21. 21. Interfaces Graphical User Interface (GUI) • Offers administrator full cluster control • Standalone desktop application • Manages multiple clusters simultaneously • Runs natively on Linux, Windows and OS X Cluster Management Shell (CMSH) • All GUI functionality also available through Cluster Management Shell • Interactive and scriptable in batch mode (screenshots: Cluster Management GUI, Cluster Management Shell)
  22. 22. Managing Clusters • Bright Cluster Manager manages several types of clusters • HPC, private cloud (OpenStack), … • Hadoop • Cluster of any type: • Deployed • Configured • Provisioned • Managed • Monitored • Health-checked
  23. 23. Hadoop support • Choice of distributions • Management/monitoring from one place • CLI and GUI: cmsh, cmgui • Hadoop stack support • Including support for Spark (Spark Standalone mode since release 7.1) • Flexible configuration
  24. 24. Hadoop configuration Hadoop configuration through roles • Nodes are configured to run certain Hadoop-related services by assigning roles • 15 Hadoop and 3 Spark roles, e.g., HDFS DataNode, MRv1 JobTracker, YARN ResourceManager, HBaseMaster, Zookeeper, SparkWorker, … • Assigning/unassigning a role will: • Write out the corresponding configuration files based on configurable role parameters • Start/stop/monitor the relevant services • Hadoop configuration settings are changed from inside Bright
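"Write out corresponding configuration files based on configurable role parameters" can be sketched as follows. The XML layout (`configuration` / `property` / `name` / `value`) is Hadoop's standard `*-site.xml` format and both property names are real HDFS settings, but the `render_site_xml` function and the example parameter values are assumptions; how Bright actually generates these files is not described in the slides.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of settings in Hadoop's *-site.xml layout."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical parameters attached to an HDFS DataNode role.
datanode_params = {
    "dfs.datanode.data.dir": "/data/1/dfs,/data/2/dfs",
    "dfs.datanode.du.reserved": 10737418240,  # bytes kept free per volume
}

xml_text = render_site_xml(datanode_params)
```

Keeping the parameters in a plain mapping and rendering the file on every change means the file on disk is always derived from the role, never edited by hand.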
  25. 25. Bright’s Hadoop Cluster Management Bright Cluster Manager 7.1 for Apache Hadoop • Just released • Single pane of glass for managing both the physical cluster and Hadoop • Easy installation of Hadoop • Apache Hadoop 1.2.1, 2.6.0 (on Bright DVD) • Cloudera CDH 4.6.x, 4.7.x, 5.2.x, 5.3.x (5.4.x soon) • HortonWorks HDP 1.3.x, 2.1.x, 2.2.x • Pivotal HD 2.1.0 (3.0.0 soon) • Configuration, monitoring and health-checking of Hadoop instances • Graphical UI, command-line interface and API access
  26. 26. Key Features • Multiple Hadoop cluster instances on same cluster • Choice of Hadoop distributions/versions • Flexible Hadoop configuration controlled through GUI and CLI • Hadoop configuration groups address ‘cluster heterogeneity’ problem • JSON/Python API • Scriptable deployment/configuration operations • Alternative filesystems to HDFS (e.g. Hadoop on Lustre)
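The idea behind configuration groups for heterogeneous hardware (slides 15 and 26) can be sketched like this. The node inventory, the `RESERVED_MB` headroom, and the `build_config_groups` function are assumptions made for illustration; only the two `yarn.nodemanager.resource.*` property names are real YARN settings.

```python
from collections import defaultdict

# Hypothetical inventory: (node name, RAM in MiB, CPU cores).
NODES = [
    ("node001", 65536, 16),
    ("node002", 65536, 16),
    ("node003", 131072, 32),
]

RESERVED_MB = 8192  # assumed headroom for the OS and Hadoop daemons


def build_config_groups(nodes):
    """Group nodes by hardware profile and derive YARN settings per group."""
    members = defaultdict(list)
    for name, ram_mb, cores in nodes:
        members[(ram_mb, cores)].append(name)
    groups = []
    for (ram_mb, cores), names in sorted(members.items()):
        groups.append({
            "nodes": names,
            "settings": {
                "yarn.nodemanager.resource.memory-mb": ram_mb - RESERVED_MB,
                "yarn.nodemanager.resource.cpu-vcores": cores,
            },
        })
    return groups


groups = build_config_groups(NODES)
```

With this inventory, two groups result: one for the two 64 GiB workers and one for the 128 GiB worker, each with its own NodeManager limits. A new worker then only needs to be placed in the matching group to inherit the right settings.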
  27. 27. Open issues & conclusion
  28. 28. Open issues Building and running cost-efficient Hadoop clusters • Hard to optimize • Workload-specific • Tuning on all levels: OS/network/Hadoop • Bright’s architecture • All cluster/Hadoop operational data aggregated in one place • Flexible configuration of hardware/software components
  29. 29. Conclusion • Architecture of distro-agnostic Hadoop cluster manager • Bright provides tried & tested implementation of this architecture • Hundreds of clusters are being managed using Bright Cluster Manager • Complete solution for setup, management & monitoring of Hadoop clusters • Single pane of glass for cluster & Hadoop stack • Well suited for ‘multi-purpose’ clusters: e.g., supporting both HPC computations and Hadoop jobs
  30. 30. Come to our booth • Meet with Bright guys • See demo • Tell us about your cluster
  31. 31. Credits
  32. 32. Questions? BigDataTeam@brightcomputing.com
