Deploying and Managing Hadoop Clusters with AMBARI

Deploying, configuring, and managing large Hadoop and HBase clusters can be quite complex. Just upgrading one Hadoop component on a 2000-node cluster can take a lot of time and expertise, and there have been few tools specialized for Hadoop cluster administrators. AMBARI is an Apache incubator project to deliver Monitoring and Management functionality for Hadoop clusters. This paper presents the AMBARI tools for cluster management, specifically: Cluster pre-configuration and validation; Hadoop software deployment, installation, and smoketest; Hadoop configuration and re-config; and a basic set of management ops including start/stop service, add/remove node, etc. In providing these capabilities, AMBARI seeks to integrate with (rather than replace) existing open-source packaging and deployment technology available in most data centers, such as Puppet and Chef, Yum, Apt, and Zypper.

Presentation Transcript

  • Deploying and Managing Hadoop Clusters with AMBARI
    Matt Foley and Hitesh Shah
    Hortonworks, Inc.
    mfoley@hortonworks.com
    hitesh@hortonworks.com
    © Hortonworks Inc. 2012
  • Matt Foley - Background
    • MTS at Hortonworks Inc.
        – Hadoop Core contributor, part of original ~25 in Yahoo! spin-out of Hortonworks
        – Currently managing engineering infrastructure for Hortonworks, including build and deployment automation
        – My team also volunteers Build Engineering infrastructure services to ASF, for Hadoop core and several related projects within Apache
        – Participated in the Hortonworks team working on Ambari implementation during transitional phase
        – Formerly, led software development for back end of Yahoo Mail for three years
        – 20,000 servers in hundreds of clusters, with 30 PB of data under management, 400M active users
    • Apache Hadoop, ASF
        – Committer and PMC member, Hadoop core
        – Release Manager – Hadoop-1.0
    Architecting the Future of Big Data
  • Hitesh Shah - Background
    • MTS at Hortonworks Inc.
    • Committer for Apache MapReduce and Ambari
    • Earlier, spent 8+ years at Yahoo! building various frameworks all the way from data storage platforms to high throughput online ad-serving systems.
  • Overview
    • Brief history – evolution of the Ambari project
    • Installation
    • Monitoring
    • Management
    • Invitation
  • All features are available today
    • Apologies that screen shots are from HMC (Hortonworks Management Console) version of Ambari
    • Same code as current Ambari, but with Hortonworks graphic elements
    • You too can “skin” Ambari with your own logotype and graphic elements!
  • History of Ambari
  • Brief History of the Ambari Project
    • Deployment, Monitoring, and Management of Hadoop and HBase clusters is:
        – HARD, due to massive scale and distributed services; and
        – DIFFERENT from other kinds of compute clusters, due to Hadoop’s intrinsic fault-tolerance
    • We needed an Apache open source solution
    • Started Ambari as an Apache incubator project
        – Originally based in part on what was learned from the “Hadoop Management System” project out of Yahoo!
  • History (continued)
    • Early work specified a full architecture, including many elements that remain today:
        – State-based configuration management, rather than event-based
        – Cluster configuration as a data object, able to be saved and manipulated
        – Reliable deployment, parallelized for scalability
        – Insightful monitoring and alerting, sharing our deep experience with the community
        – Take advantage of Puppet to achieve idempotence on installs, and reliable start/stop of processes
        – Go beyond Puppet to offer orchestrated start/stop of distributed services
    • The team started with a “whole cloth” design and build project
    • 6 months into it, we figured out we had a 2-year project on our hands!
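The state-based, idempotent approach named on this slide can be illustrated with a minimal sketch. This is not Ambari's or Puppet's code: `NodeState`, `plan_actions`, and `apply_actions` are hypothetical names, and the "apply" step only simulates what a real tool would delegate to Puppet or yum. The point is that the plan is computed from the difference between desired and actual state, so re-running it on a converged node does nothing.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical per-node state: installed packages and running services."""
    packages: set = field(default_factory=set)
    running: set = field(default_factory=set)


def plan_actions(desired, actual):
    """Return the actions needed to move `actual` to `desired`.

    State-based: the plan depends only on the difference between the two
    states, not on the history of events that produced `actual`.
    """
    actions = []
    for pkg in sorted(desired.packages - actual.packages):
        actions.append(("install", pkg))
    for svc in sorted(desired.running - actual.running):
        actions.append(("start", svc))
    for svc in sorted(actual.running - desired.running):
        actions.append(("stop", svc))
    return actions


def apply_actions(actual, actions):
    """Simulate applying a plan; a real tool would call Puppet or yum here."""
    packages, running = set(actual.packages), set(actual.running)
    for verb, name in actions:
        if verb == "install":
            packages.add(name)
        elif verb == "start":
            running.add(name)
        elif verb == "stop":
            running.discard(name)
    return NodeState(packages, running)


desired = NodeState({"hadoop", "hbase"}, {"datanode", "regionserver"})
actual = NodeState({"hadoop"}, {"datanode"})

plan = plan_actions(desired, actual)
converged = apply_actions(actual, plan)
# Idempotence: once the node matches the desired state, the plan is empty.
second_plan = plan_actions(desired, converged)
```

Because the desired state is a plain data object, it can also be saved, compared, and edited, which is the property the "cluster configuration as a data object" bullet describes.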
  • Evolution
    • How to get a useful tool out to the community sooner?
    • Make more use of existing tech
        – Ganglia and Nagios for monitoring and alerting
        – Puppet for reliable deployment and process control
    • Commit to incremental delivery
        – First generation won’t have all the breadth and features desirable
        – But will be useful and worth using
    • And the team has completed the first usable version of Ambari over the last few weeks!
        – Offers a good, GUI-driven Deploy experience, currently limited to RHEL5/CentOS5 and non-secure mode (but just wait a few more weeks!)
        – Quite nice Monitoring, based on our experience managing multi-thousand-node Hadoop clusters at Yahoo!
        – A beginning on Management, with several basic post-install operations
  • Deployment with Ambari
  • Deployment and Installation Phases
    • Preparation
    • Cluster Pre-config
    • Hadoop Stack Configuration
    • Hadoop Stack Deploy / Install
    • Service start-up and smoke test
  • Deployment and Installation (Preparation)
    • Prepare Ambari and the Ambari Agent (includes Puppet agent)
        – Can follow instructions at http://svn.apache.org/viewvc/incubator/ambari/trunk/README.txt
        – Or download the HMC from Hortonworks after Summit, and access its documentation
    • Prepare access to ‘yum’ Repositories containing Hadoop Stack and Ambari dependencies
        – If your nodes have direct internet access, can use provided RPMs to “install” the repos on each node
        – Or, to avoid direct access from each node and minimize WAN traffic, can mirror the yum repositories to an internal server accessible from the nodes
    • Prepare nodes for installation commands
        – Set up password-less ‘ssh’ for root user (secured via public keys and agent forwarding) from Install Master node to all other cluster nodes, so can run ‘yum install’ and ‘puppet’ commands
        – Take care of any other issues that may prevent root ssh during the Deployment phase, such as iptables or SELinux.
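As a rough illustration of the preparation steps, the sketch below renders a yum repository definition that points nodes at an internal mirror, and builds (without running) a non-interactive ssh probe for checking password-less root access. The repo id, mirror URL, and both function names are assumptions made for the example; they do not come from the Ambari README or the HMC documentation.

```python
import configparser
import io


def render_repo_file(repo_id, name, baseurl, gpgcheck=True):
    """Render a yum .repo definition pointing at an internal mirror."""
    # interpolation=None so a literal '%' in a URL is written unchanged.
    parser = configparser.ConfigParser(interpolation=None)
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        # With gpgcheck=1 a real definition would usually also set gpgkey.
        "gpgcheck": "1" if gpgcheck else "0",
    }
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


def ssh_check_command(host, user="root"):
    """Build (but do not run) a non-interactive ssh probe for one node."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=5",   # give up quickly on unreachable nodes
        f"{user}@{host}",
        "true",
    ]


repo_text = render_repo_file(
    "hadoop-stack-mirror",                      # hypothetical repo id
    "Hadoop Stack (internal mirror)",
    "http://mirror.example.internal/hadoop/",   # hypothetical mirror URL
)
```

The rendered text would be placed in a file under `/etc/yum.repos.d/` on each node, and the probe command could be passed to `subprocess.run` from the Install Master to confirm key-based access before deployment starts.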
  • Deployment and Installation (Pre-config)
    • Start running Ambari
    • Provide list of hosts
        – Works with Amazon EC2 IP addresses too
    • Ambari does node Validation and Discovery
        – Confirms availability and access capability
        – Scans for node attributes and mount points
    • Select desired services and data directory paths
    • Automatic role assignments to nodes, with your approval
        – Based on node attributes and selected services
        – Currently based primarily on memory size, to be refined in future
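The slide says only that role assignment is currently based primarily on memory size. One simple way such a heuristic could work is sketched below: master roles go to the hosts with the most memory, and worker roles go to the rest. The function name `propose_roles`, the host names, and the exact rule are assumptions for illustration, not Ambari's actual algorithm.

```python
def propose_roles(hosts, master_roles, worker_roles):
    """Propose a role layout from host memory sizes.

    hosts: dict of host name -> memory in GB (as found by node discovery).
    Master roles are placed on the largest-memory hosts, one per host,
    wrapping around if there are more master roles than hosts.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Largest memory first; ties broken by name so the result is stable.
    ranked = sorted(hosts, key=lambda h: (-hosts[h], h))
    proposal = {h: [] for h in hosts}
    for i, role in enumerate(master_roles):
        proposal[ranked[i % len(ranked)]].append(role)
    masters = set(ranked[:len(master_roles)])
    # Workers run on the non-master hosts, or everywhere on tiny clusters.
    workers = [h for h in ranked if h not in masters] or ranked
    for h in workers:
        proposal[h].extend(worker_roles)
    return proposal


hosts = {"node1": 64, "node2": 16, "node3": 16, "node4": 32}
proposal = propose_roles(
    hosts,
    master_roles=["NameNode", "JobTracker"],
    worker_roles=["DataNode", "TaskTracker"],
)
```

The result is only a proposal; as the slide notes, the administrator approves or adjusts it before anything is installed.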
  • Deployment and Installation (Configuration)
    • Currently supported Hadoop Stack components for installation:
        – Hadoop Core (required)
        – HBase
        – Pig
        – Hive
        – HCatalog
        – Zookeeper (required for HBase, Hive, Hcat)
        – Sqoop
        – Oozie
        – Ganglia
        – Nagios
    • Modify a subset of about 50 key parameters that most commonly need to be adjusted, depending on components selected
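The two requirement rules on this slide (Hadoop Core is always required; Zookeeper is required for HBase, Hive, and HCatalog) can be expressed as a small dependency closure. The sketch below is illustrative only: `REQUIRES`, `ALWAYS_REQUIRED`, and `resolve_services` are hypothetical names, and only the dependencies the slide states are encoded.

```python
# Dependencies stated on the slide; nothing else is assumed here.
REQUIRES = {
    "HBase": {"ZooKeeper"},
    "Hive": {"ZooKeeper"},
    "HCatalog": {"ZooKeeper"},
}
ALWAYS_REQUIRED = {"Hadoop Core"}


def resolve_services(selected):
    """Expand a user's selection with everything it transitively requires."""
    resolved = set(selected) | ALWAYS_REQUIRED
    pending = list(resolved)
    while pending:
        service = pending.pop()
        for dep in REQUIRES.get(service, ()):
            if dep not in resolved:
                resolved.add(dep)
                pending.append(dep)
    return resolved


with_hbase = resolve_services({"HBase", "Pig"})
pig_only = resolve_services({"Pig"})
```

Selecting HBase pulls in ZooKeeper automatically, while a Pig-only selection adds just Hadoop Core.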
  • Deployment and Installation (Deploy)
    • Final review of Cluster and Stack parameters
    • Puppet agent on each node is invoked (in parallel) to reliably deploy needed packages
    • Actual fetch and install is managed with ‘yum’ (for RHEL/CentOS) or comparable services
    • Success / failure is reported back to Install Master and the Ambari application
    • Log messages for failures are provided to assist debugging
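The fan-out pattern described here (invoke an agent on every node in parallel, then report success or failure with a log message per node) might look roughly like the following. `deploy_all` and `fake_install` are hypothetical; `fake_install` merely stands in for invoking the Puppet agent, and its failure on `node3` is contrived to show how errors are collected rather than aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_all(hosts, install, max_workers=8):
    """Run install(host) on every host in parallel; collect per-host outcome."""

    def run_one(host):
        try:
            install(host)
            return host, True, ""
        except Exception as exc:  # keep the message to assist debugging
            return host, False, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, hosts))
    return {host: (ok, message) for host, ok, message in results}


def fake_install(host):
    """Stand-in for invoking the Puppet agent on a node (hypothetical)."""
    if host == "node3":
        raise RuntimeError("yum: package not found")


report = deploy_all(["node1", "node2", "node3"], fake_install)
failed = sorted(host for host, (ok, _) in report.items() if not ok)
```

Catching the exception inside the worker means one failing node never hides the results from the others, which matches the per-node reporting the slide describes.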
  • Deployment and Installation (Smoke Test)
    After successful install:
    • Ambari provides “orchestration” to start-up distributed services in dependency order
    • Puppet “kicks” are used to reliably (mostly) start and stop service processes on individual nodes
    • After each distributed service is started, a smoketest is run and results reported
    • Each component is smoketested before dependent components
    After successful smoketest, you can be confident that your selected components have been successfully installed and started, and are running correctly.
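Starting services in dependency order and smoke-testing each one before its dependents is essentially a topological traversal. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The dependency map, `start_cluster`, and the no-op start and smoke-test callbacks are assumptions for illustration; the slides do not give Ambari's actual service graph.

```python
from graphlib import TopologicalSorter

# Illustrative graph: service -> services it depends on (an assumption).
DEPENDS_ON = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
}


def start_cluster(depends_on, start, smoke_test):
    """Start each service after its dependencies, smoke-testing as we go.

    Stops at the first failed smoke test so dependents are never started
    on top of a service that is not known to be healthy.
    """
    started = []
    for service in TopologicalSorter(depends_on).static_order():
        start(service)
        if not smoke_test(service):
            raise RuntimeError(
                f"smoke test failed for {service}; started so far: {started}"
            )
        started.append(service)
    return started


# Stand-in callbacks: a real orchestrator would trigger per-node agents.
order = start_cluster(DEPENDS_ON, start=lambda s: None, smoke_test=lambda s: True)
```

Raising on the first failed smoke test is one possible design choice; it keeps a broken HDFS from producing confusing secondary failures in HBase or MapReduce.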
  • Going forward
    • Multiple OS support
        – RHEL6/CentOS6
        – Ubuntu and Debian
        – SUSE/SLES
        – Windows
    • Hadoop Security support, including secure install for all components
    • HA support
    • Hadoop 2.0 support
    • Improved GUI user interface
    • Integration: Provide CLI commands for invoking Puppet scripts, and Web APIs where appropriate
    • Etc.
  • Monitoring with Ambari
  • Monitoring Dashboard
  • Ambari Monitoring
    • Basic Monitoring capabilities for Hadoop Cluster Services
        – Up/Down status for installed Hadoop services
        – Key Alerts configured for health, performance and usage monitoring of Hadoop services
        – Consolidated summary information for Hadoop Services (HDFS, M/R & HBase)
        – Key service metrics graphs for temporal analysis of service performance, utilization and health (+System metrics - Cpu/Memory/Network etc.)
    • Efficient collection and visualization of monitoring metrics
        – Light weight alert condition checks (mostly over network) for better scalability
    • Leverage Open Source monitoring systems such as Nagios & Ganglia
        – Nagios - for Alert Monitoring
        – Ganglia/RRDTool for Hadoop metrics graphs
    • Simple and Intuitive UI to monitor the Hadoop cluster status
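A lightweight alert condition check of the kind mentioned above can be as small as comparing a metric against warning and critical thresholds. The sketch below follows the Nagios plugin convention of status codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The metric names, threshold values, and function names are invented for the example and are not the alerts Ambari ships.

```python
# Nagios plugin status codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_threshold(value, warn, crit):
    """Classify a metric where higher values are worse."""
    if value is None:
        return UNKNOWN  # metric could not be collected
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def evaluate(metrics, thresholds):
    """Map each configured alert name to its status label."""
    return {
        name: LABELS[check_threshold(metrics.get(name), warn, crit)]
        for name, (warn, crit) in thresholds.items()
    }


# Hypothetical metric names and thresholds, for illustration only.
metrics = {"hdfs_capacity_used_pct": 85.0, "dead_datanodes": 0}
thresholds = {
    "hdfs_capacity_used_pct": (80, 90),
    "dead_datanodes": (1, 3),
    "jobtracker_heap_used_pct": (80, 95),
}
statuses = evaluate(metrics, thresholds)
```

A missing metric is reported as UNKNOWN rather than OK, so a collection failure is not mistaken for a healthy service.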
  • HDFS Service
  • Map/Reduce Service
  • HBase Service
  • Going forward
    • Rapid iterations with Ambari Open Source community to add more monitoring capabilities e.g.
        – More services Alerts, Summary stats & Reporting for the Hadoop services
        – Queue/Job level monitoring & Diagnostic Reporting for M/R
        – Improved Visualization of service metrics graphs & reports
        – Ability to customize dashboard with relevant graphs, alerts and service information
    • RESTful APIs for Hadoop Monitoring
        – For integration with Enterprise and Cloud Management Systems, and “powered by Ambari” products integration
        – CLIs
    • Ability to integrate with third party monitoring tools in place of Nagios & Ganglia
    • Best practices, tips and guidelines for using Monitoring dashboard for identifying and debugging common cluster problems
  • Management with Ambari
  • Management
    • “Management” can include many different post-install activities with Hadoop clusters
    • Ambari currently supports only a small set:
        – Start / Stop individual services
        – Dependent services will be automatically stopped also
        – Change configuration parameters for a service
        – Cannot currently change data directory paths
        – Add nodes to the Cluster
        – Decommissioning nodes is currently a manual process
        – Uninstall the Cluster
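The rule that stopping a service also stops its dependent services can be sketched as a walk over the reversed dependency graph, stopping dependents first. The dependency map and the function `stop_order` are assumptions for illustration; the slides do not specify how Ambari computes this.

```python
# Illustrative graph: service -> services it depends on (an assumption).
DEPENDS_ON = {
    "HDFS": set(),
    "MapReduce": {"HDFS"},
    "ZooKeeper": set(),
    "HBase": {"HDFS", "ZooKeeper"},
}


def stop_order(service, depends_on):
    """Return the services to stop, dependents first, ending with `service`."""
    # Invert the graph: service -> services that depend on it.
    dependents = {}
    for svc, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(svc)

    order, seen = [], set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for child in sorted(dependents.get(svc, ())):
            visit(child)
        order.append(svc)  # appended only after everything that needs it

    visit(service)
    return order


stop_hdfs = stop_order("HDFS", DEPENDS_ON)
stop_zookeeper = stop_order("ZooKeeper", DEPENDS_ON)
stop_hbase = stop_order("HBase", DEPENDS_ON)
```

Stopping HDFS therefore takes HBase and MapReduce down first, while stopping HBase affects nothing else in this graph.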
  • Going forward
    • Lots more management actions supported
        – Security and user management
        – HA alerting and recovery
        – Extensions of current functionalities
        – Etc.
    • Integration: RESTful APIs / web services for integration with established management tools in the data center
    • Improved GUI user interface
  • Invitation
    • Deployment, Monitoring, and Management – this is just the first generation!
    • If you are interested in these capabilities and want to participate in an Apache open source project, please consider becoming a contributor to the AMBARI (incubating) project!
    • http://incubator.apache.org/ambari/mail-lists.html
  • Thank you.