
Managing Enterprise Hadoop Clusters with Apache Ambari

Slides from ApacheCon

Published in: Technology

  1. Managing Enterprise Hadoop Clusters with Apache Ambari. Jayush Luniya @ Hortonworks, Apache Ambari PMC. May 2016. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. Agenda: Ambari Overview, Ambari Features, Demo, Q&A
  3. What’s Apache Ambari? 100% open-source platform for simplifying Hadoop cluster management and use. Highly extensible.
  4. It’s a wild zoo out there! Gotta manage this efficiently.
  5. Apache Ambari Themes
     • Operate Hadoop at Scale: Deliver the core operational capabilities to provision, manage and monitor Hadoop clusters at scale.
     • Integrate with the Enterprise: Robust API for integration with existing enterprise systems, such as Microsoft SCOM and Teradata Viewpoint.
     • Extend for the Ecosystem: Provide an extensible platform for Customers, Partners and the Community (Stacks, Views).
  6. Apache Ambari
  7. Open Source Activity
  8. Inception: AMBARI-1 (September 2011)
  9. Fast forward 5 years to today…
     • Latest JIRA: AMBARI-16131
     • 150+ Contributors
     • 60+ Committers
     • 16131 JIRAs filed
     • 14254 JIRAs fixed. At 1.5 days per JIRA, that is roughly 90 person-years!
     • Used by hundreds of companies
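The person-years estimate can be reproduced with quick arithmetic. This is only a sketch: it assumes the figure is based on the fixed JIRAs and on roughly 240 working days per person-year, neither of which the slide states.

```python
# Figures taken from the slide.
jiras_fixed = 14254
days_per_jira = 1.5

# Assumption (not from the slide): about 240 working days per person-year.
working_days_per_year = 240

person_days = jiras_fixed * days_per_jira
person_years = person_days / working_days_per_year

print(f"{person_days:.0f} person-days is about {person_years:.0f} person-years")
```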
  10. Ambari – 3rd Biggest Project* @ Apache
     * Based on total JIRAs filed on a project basis as of April 26, 2016
     #2: Hadoop at ~32k, as it is split across multiple JIRA Projects
  11. Timeline
     • Ambari 1.5.* (Apr 2014): 1218 JIRAs
     • Ambari 1.6.* (May 2014): 908 JIRAs
     • Ambari 1.7.* (Dec 2014): 1620 JIRAs
     • Ambari 2.0.* (April 2015): 1804 JIRAs
     • Ambari 2.1.* (July 2015): 2674 JIRAs
     • Ambari 2.2.* (Dec 2015): 1542 JIRAs
     Current GA Version (2.2.2). Resolution of 9k+ JIRAs.
     Features along the timeline: Ambari Stacks, Ambari Blueprints, Ambari Views, Alerts Framework, Metrics System, Rolling Upgrade, Kerberos Automation, Enhanced Dashboards, Smart Configs, Express Upgrade, AMS Grafana
  12. Agenda: Ambari Overview, Ambari Features, Demo, Q&A
  13. Extensibility Features
     • Stacks: To add new Services (ISV or otherwise) beyond the HDP stack. To customize a Stack for customer-specific environments.
     • Blueprints: To use Ambari for automating cluster installations. To share best practices on layout and cluster configuration.
     • Views: To extend and customize the Ambari Web UI. Add new capabilities, customize existing capabilities.
  14. Anatomy of Ambari Extension Points
  15. Ambari Stacks
  16. Stack Terminology
     Term | Definition | Examples
     STACK | Defines a set of Services, where to obtain the software packages and how to manage the lifecycle. | HDP-2.3, HDP-2.2
     SERVICE | Defines the Components that make up the service. | HDFS, NAGIOS, YARN
     COMPONENT | The building blocks of a Service, which adhere to a certain lifecycle. | NAMENODE, DATANODE, OOZIE_SERVER
     CATEGORY | The category of Component. | MASTER, SLAVE, CLIENT
     REPO | Repository metadata where the artifacts reside. | http://public-repo-1.hortonworks.com/HDP/centos6/2.x/GA/2.3.0.0
  17. Ambari Stack
     • Stacks define Services + Repo
       – What is a stack, and where to get the bits
     • Each service has a definition
       – What components are part of the Service
     • Each service has defined lifecycle commands
       – start, stop, status, install, configure
     • Lifecycle is controlled via command scripts
     • Ability to define “custom” commands
     [Diagram: Ambari Server, Stack, Service Definitions (xml), Command Scripts (python), Ambari Agents, Repos]
  18. Stacks Support Inheritance
     HDP 2.0 Stack
     • Defines a set of Service definitions
     • Default service configurations and command scripts
     HDP 2.1 Stack
     • Overrides any Service definitions, commands and configurations
     • Adds new Services specific to this Stack
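The inheritance rule on this slide can be sketched as a merge in which the child stack's values win and everything else is inherited. The in-memory dict layout, the service name `MY_SERVICE` and the script names below are hypothetical; real Ambari stacks are XML and Python files on disk, so this only illustrates the override order.

```python
from copy import deepcopy


def resolve_stack(parent: dict, child: dict) -> dict:
    """Merge a child stack over its parent, service by service.

    Hypothetical model: each stack maps a service name to a dict with
    "commands" and "configs" sections.
    """
    resolved = deepcopy(parent)
    for service, definition in child.items():
        if service not in resolved:
            # A service that exists only in the child stack is added as-is.
            resolved[service] = deepcopy(definition)
            continue
        for section, values in definition.items():
            # Child values win; anything not overridden is inherited.
            resolved[service].setdefault(section, {}).update(values)
    return resolved


# Parent stack: default definitions (script names are made up).
hdp_2_0 = {
    "HDFS": {
        "commands": {"start": "hdfs_start.py", "stop": "hdfs_stop.py"},
        "configs": {"dfs.replication": "3"},
    },
}
# Child stack: overrides one config and adds one new service.
hdp_2_1 = {
    "HDFS": {"configs": {"dfs.replication": "2"}},
    "MY_SERVICE": {"commands": {"install": "my_install.py"}, "configs": {}},
}

stack = resolve_stack(hdp_2_0, hdp_2_1)
print(stack["HDFS"]["configs"]["dfs.replication"])
print(sorted(stack))
```

The parent dict is deep-copied first, so resolving a child never mutates the parent definition that other child stacks may also extend.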
  19. Blueprints
  20. Automated Cluster Deployment
     • Deploy clusters of any scale with ease
     • Two REST API calls are all it takes to provision a cluster
     Who uses it?
     • HDInsight (Microsoft Azure)
     • Hortonworks QA
  21. Example: Create a 100-node Cluster
     1. POST /api/v1/blueprints/my-blueprint
        {
          "configurations" : [
            { "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" } }
          ],
          "host_groups" : [
            {
              "name" : "master-host",
              "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
              "cardinality" : "1"
            },
            {
              "name" : "worker-host",
              "components" : [ { "name" : "DATANODE" }, { "name" : "NODEMANAGER" }, … ],
              "cardinality" : "1+"
            }
          ],
          "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
        }
     2. POST /api/v1/clusters/my-cluster
        {
          "blueprint" : "my-blueprint",
          "host_groups" : [
            {
              "name" : "master-host",
              "hosts" : [ { "fqdn" : "master001.ambari.apache.org" } ]
            },
            {
              "name" : "worker-host",
              "hosts" : [
                { "fqdn" : "worker001.ambari.apache.org" },
                { "fqdn" : "worker002.ambari.apache.org" },
                …
                { "fqdn" : "worker099.ambari.apache.org" }
              ]
            }
          ]
        }
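The two calls on this slide could be prepared from Python roughly as follows. This is a sketch, not a complete client: the server address is hypothetical, the blueprint lists only the components the slide spells out, authentication is omitted, and the requests are built but not sent. The `X-Requested-By` header is included because Ambari expects it on modifying requests.

```python
import json
import urllib.request

# Hypothetical server address; adjust for a real deployment.
AMBARI = "http://ambari.example.org:8080"


def worker_fqdns(count: int) -> list:
    """Generate worker001..workerNNN host names in the slide's pattern."""
    return [f"worker{i:03d}.ambari.apache.org" for i in range(1, count + 1)]


def cluster_template(blueprint_name: str, workers: int) -> dict:
    """Build the host mapping that is POSTed to /api/v1/clusters/<name>."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": "master-host",
             "hosts": [{"fqdn": "master001.ambari.apache.org"}]},
            {"name": "worker-host",
             "hosts": [{"fqdn": fqdn} for fqdn in worker_fqdns(workers)]},
        ],
    }


def post_request(path: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST against the Ambari REST API."""
    return urllib.request.Request(
        AMBARI + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Requested-By": "ambari"},
        method="POST",
    )


# Abbreviated blueprint: only the components named on the slide.
blueprint = {
    "host_groups": [
        {"name": "master-host", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker-host", "cardinality": "1+",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
}

template = cluster_template("my-blueprint", workers=99)  # 1 master + 99 workers
calls = [
    post_request("/api/v1/blueprints/my-blueprint", blueprint),
    post_request("/api/v1/clusters/my-cluster", template),
]
for call in calls:
    print(call.get_method(), call.full_url)
```

Sending each request with `urllib.request.urlopen` would additionally need credentials for the Ambari server, which are left out here.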
  22. Cluster Replication
     • Export blueprint from an existing cluster
     • Import blueprint to replicate the cluster
     GET /api/v1/clusters/my-cluster?format=blueprint
        {
          "configurations" : [
            {
              "cluster-env" : { "user_group" : "hadoop" },
              "hdfs-site" : { "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3" }
            }
          ],
          "host_groups" : [
            {
              "name" : "master-host",
              "components" : [ { "name" : "NAMENODE" }, { "name" : "RESOURCEMANAGER" }, … ],
              "cardinality" : "1"
            }
          ],
          "Blueprints" : { "stack_name" : "HDP", "stack_version" : "2.0" }
        }
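A small sketch of the export step: it builds the GET request shown on the slide and summarizes which host groups a replica cluster would need hosts for. The server address and the helper names are hypothetical, and the request is prepared but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical server address; adjust for a real deployment.
AMBARI = "http://ambari.example.org:8080"


def export_request(cluster: str) -> urllib.request.Request:
    """Prepare the GET that exports a running cluster as a blueprint."""
    query = urllib.parse.urlencode({"format": "blueprint"})
    url = f"{AMBARI}/api/v1/clusters/{cluster}?{query}"
    return urllib.request.Request(url, method="GET")


def host_groups_to_map(blueprint: dict) -> dict:
    """Return host group name -> component names for an exported blueprint."""
    return {
        group["name"]: [component["name"] for component in group["components"]]
        for group in blueprint["host_groups"]
    }


# Shape of the exported document, reduced to the parts shown on the slide.
exported = {
    "host_groups": [
        {"name": "master-host", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
    ],
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
}

request = export_request("my-cluster")
print(request.get_method(), request.full_url)
print(host_groups_to_map(exported))
```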
  23. Blueprint Features
     Ambari 2.0:
     • High availability (HA) cluster deployments
     • Adding hosts using blueprints (AMBARI-8458)
     Ambari 2.1:
     • Advanced cluster creation options (AMBARI-10750)
     Ambari 2.2:
     • Kerberized cluster deployments (AMBARI-13431)
     • Stack advisor recommendations (AMBARI-13487)
  24. Stack Upgrades
  25. Stack Upgrades
     • Rolling vs Express Upgrade modes
     • Side-by-Side Bits and Configs
     Bits: /usr/hdp/2.2.0.0-2041, /usr/hdp/2.2.4.2-2, /usr/hdp/2.3.0.0-3000
     Configs: /etc/hive/conf/ (initial), /etc/hive/conf/v0 (HDP 2.2.4.2), /etc/hive/conf/v1 (HDP 2.3)
     2.2.0.0 → 2.2.4.2 (minor jump) → 2.3.0.0 (major jump)
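The minor versus major jump on this slide can be illustrated by comparing version fields. The rule below (a change in the first two fields counts as major) is an assumption inferred from the slide's single example, and the function names are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Split "2.2.4.2-2" into (2, 2, 4, 2), ignoring the build suffix."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def jump_kind(current: str, target: str) -> str:
    """Classify an upgrade as a minor or major jump."""
    cur, new = parse_version(current), parse_version(target)
    if new <= cur:
        return "not an upgrade"
    # Assumption: 2.2 -> 2.3 is major, anything below that is minor.
    return "major" if new[:2] != cur[:2] else "minor"


def install_path(version: str) -> str:
    """Side-by-side install location, following the slide's /usr/hdp layout."""
    return f"/usr/hdp/{version}"


print(jump_kind("2.2.0.0-2041", "2.2.4.2-2"))
print(jump_kind("2.2.4.2-2", "2.3.0.0-3000"))
print(install_path("2.3.0.0-3000"))
```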
  26. Express vs Rolling Upgrade
     Rolling Upgrade
     • Services are up the entire time
     • Upgrade one component at a time
     • Robust and fault-tolerant
     • Service checks performed frequently during the upgrade
     Express Upgrade
     • All services are brought down, upgraded and restarted
     • Faster upgrade mode
     • Planned service downtime
     • Service checks performed less frequently during the upgrade
  27. Stack Upgrade – Install Version
     • Install new version in parallel on all agents
     • No downtime
  28. Stack Upgrade – Orchestration
     • Not necessarily “one-click” but fully guided
  29. Stack Upgrade – Upgrade Catalog
     • Upgrades are driven by upgrade catalogs defined in stack definitions
     • Defines upgrade groups and upgrade order
     • Provides ability to modify configurations
       – Set, move, delete, transform
     • Upgrade steps can be marked as skippable and retryable
     • Supports executing custom scripts during upgrade
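The four configuration operations named above (set, move, delete, transform) can be pictured as a list of changes applied to a key-value config. The dict-based change format and the property names here are invented for illustration; they are not the upgrade catalog's actual syntax.

```python
def apply_changes(config: dict, changes: list) -> dict:
    """Apply set/move/delete/transform changes to a copy of a config."""
    result = dict(config)
    for change in changes:
        op = change["op"]
        if op == "set":
            result[change["key"]] = change["value"]
        elif op == "move":
            # Rename a property, keeping its current value.
            if change["from"] in result:
                result[change["to"]] = result.pop(change["from"])
        elif op == "delete":
            result.pop(change["key"], None)
        elif op == "transform":
            # Rewrite part of an existing value.
            key = change["key"]
            if key in result:
                result[key] = result[key].replace(
                    change["find"], change["replace_with"])
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical property names, for illustration only.
old = {
    "example.heap.size": "1024m",
    "example.old.name": "a",
    "example.obsolete": "x",
}
changes = [
    {"op": "set", "key": "example.new.flag", "value": "true"},
    {"op": "move", "from": "example.old.name", "to": "example.new.name"},
    {"op": "delete", "key": "example.obsolete"},
    {"op": "transform", "key": "example.heap.size",
     "find": "m", "replace_with": "MB"},
]

new = apply_changes(old, changes)
print(new)
```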
  30. Stack Upgrade – Upgrade Catalog
  31. Stack Downgrade
     • Can trigger downgrade at any stage of the stack upgrade
     • Cannot downgrade once stack upgrade has been finalized
  32. Smart Configurations
  33. Hadoop Configuration Challenges
     • Too many configurations
       – Which ones are important?
     • Too easy to mess up
       – What are valid/reasonable values?
       – What are the units?
       – Ok, what about dependencies?
     • Gets harder with combinations of services, host assignments, enabled features, CPU/RAM/disks, etc.
       – Any recommendations? What am I doing wrong?
     • Smart Configurations
  34. Ambari Smart Configs UI
     Customizable layout
     • Tabs
     • Sections
     • Sub-sections
     • Simple grid layout
     (Advanced Tab contains remaining configurations)
     New Widgets
     • Sliders: Recommended, Minimum, Maximum, Increment Step
     • Combos: Enumerated values
     • Toggles: Binary options
     • Spinners: Splits value into multiple controls. Time in milliseconds split into days, hours, minutes.
     • Lists: Enumerated values, Single select, Multi select
     Implemented
     • HDFS, YARN, MapReduce, Hive, HBase
  35. Stack Driven Layouts
     Stack has theme.json file
     Layout
     • Tabs
     • Sections
     • Sub-sections
     Placement
     • Configs placement in sub-sections
     Widgets
     • Widget type
     • Optional Units
       – Bytes (B, KB, MB, GB, TB, PB)
       – Time (Millis, Seconds, Minutes, Hours, Days, Months, Years)
        {
          "name": "default",
          "description": "Default theme for HBASE service",
          "configuration": {
            "layouts": [
              {
                "name": "default",
                "tabs": [
                  {
                    "name": "settings",
                    "display-name": "Settings",
                    "layout": {
                      "tab-columns": "3",
                      "tab-rows": "3",
                      "sections": [ ... ]
                    }
                  }
                ]
              }
            ],
            "placement": {
              "configuration-layout": "default",
              "configs": [...]
            },
            "widgets": [
              {
                "config": "hbase-env/hbase_master_heapsize",
                "widget": {
                  "type": "slider",
                  "units": [ { "unit-name": "GB" } ]
                }
              },
              ...
            ]
          }
        }
  36. Config Metadata and Dependencies
     Extended Metadata
     • Defined in property_value_attributes
     • Holds non-UI metadata about value range, increment, unit, etc.
     Dependencies
     • Models bi-directional relationship between configs
     • Depends On (property_depends_on)
       – Answers “which configs do I depend on?”
     • Depended By (dependencies)
       – Answers “which configs are dependent on me?”
     • Ambari automatically updates dependencies
        {
          "StackConfigurations": {
            "final": "false",
            "property_depends_on": [
              { "type": "yarn-site", "name": "yarn.nodemanager.resource.memory-mb" }
            ],
            "property_description": "The minimum allocation for every",
            "property_display_name": "Minimum Container Size (Memory)",
            "property_name": "yarn.scheduler.minimum-allocation-mb",
            "property_type": [],
            "property_value": "512",
            "property_value_attributes": {
              "type": "int",
              "maximum": "5120",
              "minimum": "0",
              "unit": "MB",
              "increment_step": "256"
            },
            "type": "yarn-site.xml"
          },
          "dependencies": [
            { "StackConfigurationDependency": { "dependency_name": "hive.tez.container.size", "property_name": "yarn.scheduler.minimum-allocation-mb" } },
            { "StackConfigurationDependency": { "dependency_name": "mapreduce.map.memory.mb", "property_name": "yarn.scheduler.minimum-allocation-mb" } },
            { "StackConfigurationDependency": { "dependency_name": "mapreduce.reduce.memory.mb", "property_name": "yarn.scheduler.minimum-allocation-mb" } }…
          ]
        }
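As a rough illustration of how this metadata could be used, the sketch below checks a value against `property_value_attributes` and derives the "Depended By" view from "Depends On" data. The function names are hypothetical, and treating `increment_step` as a hard alignment rule is an assumption; the slide presents it as slider metadata.

```python
# Attributes copied from the slide's yarn.scheduler.minimum-allocation-mb example.
ATTRS = {"type": "int", "maximum": "5120", "minimum": "0",
         "unit": "MB", "increment_step": "256"}


def validate(value: int, attrs: dict) -> list:
    """Return a list of problems with a proposed value (empty if none)."""
    problems = []
    minimum, maximum = int(attrs["minimum"]), int(attrs["maximum"])
    step = int(attrs["increment_step"])
    unit = attrs["unit"]
    if not minimum <= value <= maximum:
        problems.append(f"{value} is outside [{minimum}, {maximum}] {unit}")
    # Assumption: values should sit on the slider's increment grid.
    if (value - minimum) % step != 0:
        problems.append(f"{value} is not on the {step} {unit} increment grid")
    return problems


def depended_by(depends_on: dict) -> dict:
    """Invert a "depends on" mapping into a "depended by" mapping."""
    reverse = {}
    for prop, parents in depends_on.items():
        for parent in parents:
            reverse.setdefault(parent, []).append(prop)
    return reverse


# "Depends On" edges implied by the slide's dependencies list.
depends_on = {
    "hive.tez.container.size": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.map.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
    "mapreduce.reduce.memory.mb": ["yarn.scheduler.minimum-allocation-mb"],
}

print(validate(512, ATTRS))
print(validate(6000, ATTRS))
print(depended_by(depends_on))
```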
  37. Metrics
  38. Ambari Metrics Service (AMS) - Goals
     • Ability to collect metrics from Hadoop and other Stack services
     • Ability to collect system-level metrics
     • Ability to retain metrics at a high precision for a configurable time period
     • Ability to automatically purge metrics after retention period
     • Provide integration point for metrics collection and retention by external system
     • Trigger alerts based on metrics in Ambari
  39. Ambari Metrics System - Architecture
  40. AMS Grafana (Ambari 2.2.2)
     • Powerful dashboard builder integrated with AMS
     • Pre-built Grafana dashboards for host-level and service-level metrics
     • User can build and save custom dashboards
  41. AMS Grafana
  42. Alerts
  43. Alert – Types
     Type | Description | Status | Thresholds Configurable?
     PORT | Watches a port based on a configuration property such as the URI. | OK, WARN, CRIT | Yes (seconds)
     WEB | Watches an HTTP or HTTPS endpoint and determines connectivity and HTTP status code. | OK, WARN, CRIT | No
     AGGREGATE | Aggregate of status for another alert definition. | OK, WARN, CRIT | Yes (percentage)
     METRIC | Watches a metric or series of metrics in JMX and compares a mathematical result against a threshold. | OK, WARN, CRIT | Yes (variable)
     SCRIPT | Uses a custom script to handle checking. | OK or CRIT | No
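The threshold-based alert types in this table can be sketched as a single classification step applied to different measurements: connect time in seconds for PORT, percentage of unhealthy instances for AGGREGATE. The function names and the default threshold values are illustrative assumptions, not Ambari's alert definitions.

```python
import socket
import time


def classify(value: float, warning: float, critical: float) -> str:
    """Map a measured value onto OK / WARN / CRIT."""
    if value >= critical:
        return "CRIT"
    if value >= warning:
        return "WARN"
    return "OK"


def port_alert(host: str, port: int,
               warning: float = 1.5, critical: float = 5.0) -> str:
    """PORT-style check: thresholds are connect times in seconds."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=critical):
            pass
    except OSError:
        # Refused, unreachable or timed out: treat as critical.
        return "CRIT"
    return classify(time.monotonic() - start, warning, critical)


def aggregate_alert(states: list,
                    warning: float = 10.0, critical: float = 30.0) -> str:
    """AGGREGATE-style check: thresholds are percentages of non-OK states."""
    if not states:
        return "OK"
    not_ok = sum(1 for state in states if state != "OK")
    return classify(100.0 * not_ok / len(states), warning, critical)


print(classify(0.2, 1.5, 5.0))
print(aggregate_alert(["OK"] * 9 + ["CRIT"]))
print(aggregate_alert(["CRIT"] * 4 + ["OK"] * 6))
```

`port_alert` is defined but not called here, since running it needs a reachable host and port.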
  44. UI – Current Alerts
     Configured by default; managed via the web client
  45. UI – Host Alerts
     • Automatically refreshes
     • Query alert history
  46. UI – Customization & Instances
     • Status text, thresholds, and interval
  47. Views
  48. Ambari Views
     View Framework
     • Provides various applications accessible from the Ambari Web UI
       – Interact with the cluster via a browser from a single place for all users (cluster operators, data analysts, developers, etc.)
     Easy to develop
     • No need to understand Ambari core code
       – View development is just like creating any other web application
     Easy to deploy
     • Packaged as a single jar file
     • Auto create / auto configure
  49. CS Queue Manager for Cluster Operators
     Capacity Scheduler Queue Manager
  50. HDFS File Browser for General Users
     HDFS File Browser
  51. Job Analysis for Developers
     • Troubleshoot Tez Jobs
     • Troubleshoot / Improve Hive queries
  52. Query Editors for Data Analysts
     • Create, edit, execute, and analyze Hive queries
     • Create, edit, and execute Pig scripts
  53. Ambari Server in Views-Only mode
     • Use Views on existing clusters not managed by Ambari
     • Can use Views against multiple clusters
     [Diagram: Ambari Server with a cluster managed by Ambari (management, use Views); Ambari Server in “Views-only” mode (aka “Stand-alone” mode) with a cluster not managed by Ambari (use Views)]
  54. Kerberos Automation
  55. Kerberos Automation (Ambari 2.0)
     • Ambari manages Kerberos principals and keytabs
     • Works with existing MIT KDC or Active Directory
     • Once Kerberized, seamlessly handles:
       – Adding new hosts
       – Adding new components to existing hosts
       – Adding new services
       – Moving components to different hosts
  56. Agenda: Ambari Overview, Ambari Features, Demo, Q&A
  57. Agenda: Ambari Overview, Ambari Features, Demo, Q&A
  58. Thank You!
     Try Ambari
     • Follow the Ambari Quick Start Guide: https://cwiki.apache.org/confluence/display/AMBARI/Quick+Start+Guide
     Learn more
     • Visit the project website: http://ambari.apache.org/
     Get Involved
     • User Mailing List: user-subscribe@ambari.apache.org
     • Developer Mailing List: dev-subscribe@ambari.apache.org
     • Use JIRA to file bugs and improvement requests: https://issues.apache.org/jira/browse/AMBARI/
     Jayush Luniya @ Hortonworks (Apache Ambari PMC)
  59. Future Roadmap
     • AMS Grafana Integration
     • Ambari Management Packs
     • Ambari Logsearch
     • Patch Upgrades
     • Multi Service Versions
     • Multi Service Instances
  60. Q&A Stats
     • Largest production clusters managed by Ambari: ~1600 nodes, ~800 nodes
     • Largest test cluster for Ambari scale testing: ~400 nodes
     • Largest test cluster where rolling upgrade was performed: ~400 nodes, ~40 hours
