Managing 2000 Node Cluster with Ambari
  • Welcome to the Apache Ambari talk. Your speakers today are myself, Siddharth, and my colleague Srimanth, from Hortonworks.
  • With increased adoption of Ambari throughout the enterprise, the focus at the moment is on scaling out to thousands of nodes. With that in mind, this talk demonstrates operations on a 2K-node cluster and gives a glimpse of future goals. We will look at the new features that are part of 1.6.0, along with the things that truly identify Ambari as a platform: Views and Extensibility. If you attend the birds-of-a-feather session tomorrow, we can dive deeper into these new developments.
  • This slide represents Ambari’s position in the Hadoop technology stack and highlights key integration points with services that are either cloud compute providers or big data analytics platforms. By the end of the talk you should have a fairly good idea of how Ambari enables the integration of these providers with the Hadoop ecosystem.
  • Orchestrator: the Ambari state machine combined with the Action Scheduler and the Heartbeat Handler. Request Dispatcher: the Service Provider Interface and the resource provider layer. Clusters, stacks, etc. are all resources from the Ambari API standpoint. The monitoring subsystem comprises Ganglia as the metrics system and Nagios for alerts.
  • Host component isolation for Ambari Server, Ganglia, Nagios and the masters. All testing was done on VMs in the cloud.
  • So now we are going to look at a video. The story here is: let's say you have a sizeable cluster that needs additional compute capacity, and the new hardware you intend to add needs to be configured differently from the existing cluster. We begin by looking at the dashboard, which shows the 2000 slave nodes and the rest of the customizable Ambari widgets. The next step is to choose the groups of hosts that you want to customize. What we are doing here is grouping hosts together using Config Groups and giving the group a name. Let's select a few DataNodes to demonstrate this. Note: this is a paid cluster and it is expensive to keep running, so we are showing you a video. The Config Group manager allows you to filter by component and by regular expression; we make sure DataNode hosts are the only ones in the filter. Next, use an expression to choose the hosts you want; here I just chose them at random. Now on to actually making config changes. Restart All restarts everything in one shot and applies the config. The other option allows you to do a rolling restart.
  • When the rubber meets the road, what do we see as the performance bottleneck? The monitoring and alerting subsystems on large clusters are bogged down by the number of I/O operations needed to write relatively small amounts of data at high frequency to permanent storage. These iostat numbers are close to what we saw when we began optimizing performance; as you can see, we were writing at 1 GB/min.
  • The most significant metric I would like to present is the load average improvement achieved through the performance tuning effort. It involved tuning the rrdcached daemon used to write Ganglia data, as well as reading that data back through the Ambari API and Nagios. The objective of this exercise is to certify Ambari with 2K nodes on run-of-the-mill VMs, with little to no optimization below the application stack, and achieve acceptable performance for all management and monitoring operations. In theory it is possible to go above and beyond this magic number. The goal is to scale to 10K+ nodes managed by a single Ambari instance.
  • This is still a conceptual architecture, and you can follow the discussion on the Apache JIRA that is listed. A quick word on the architecture: it involves scaling out the collector daemon in proportion to cluster size. The Views you see in this picture are covered later in the deck; they are here to represent the capability to extend Ambari to provide the user interface of your choice for visualizing data in a Hadoop cluster.
  • Integrated with the open source Quartz scheduler. An API schedules a batch of requests to be executed as per a schedule.
  • Rolling restart is the first use case for request scheduling. Schedule and go home.
  • A Host Configuration Group is a way of associating a set of configurations with a group of hosts, per service. This feature is supported with Blueprints as well, so a touchless install can still incorporate heterogeneous target hosts.
  • Additionally, any custom property can be added to an existing configuration.
  • Apply changed configs selectively, and know exactly when and where to apply them.
  • A Blueprint, as the name suggests, is a declarative definition of the cluster, which can be exported as a document from a live cluster or imported to create a new cluster from an existing blueprint. Real-world use cases: the Savanna project, Launchpad on Microsoft Azure.
  • A quick look at how to create a cluster using a blueprint. Define host groups: these can be thought of as the unique sets of components and configurations that represent hosts in your cluster, with cardinality from 1 to N. Capture non-default configuration overrides. Point to the stack name and version to use. When you POST to create a cluster, you get back a request id that can be used to track the progress of the deployment.
  • A real-world use case of blueprints: HDP Launchpad for Azure (Linux) lets you spin up HDP clusters easily, with no need for you to spin up VMs, create images, set up ssh, etc. All you need to get started is your Azure account (with a credit card in good standing). Once you get the Launchpad going, it does *everything* for you and publishes the Ambari URL as the control entry point. Under the hood, after some Azure provisioning and setup scripts run, all the goodness comes from Ambari Blueprints.
  • When you manage a cluster of 2000 nodes, you need the ability to perform operations in bulk. Bulk host operations are now available on the Hosts page. Basically, you identify which hosts (all, filtered or selected), then you perform operations at either the host level or the component level. Components generally tend to be slaves/workers, which are larger in number.
  • Component operations tend to be performed in batches. For clusters with 2000 nodes you need good filters to easily find the appropriate hosts. Ambari provides 13 filters on its Hosts page to help you.
  • So let's say you have one of the following: a hardware change or replacement on some nodes, experiments with service configurations, turning off a service completely, or deleting cluster nodes. Maintenance Mode silences alerts and skips operations.
  • Inheritance cannot be turned off at lower levels.
  • We support safely moving the following master components from one host to another, even the two NameNodes in HDFS HA.
  • Hadoop is an ecosystem with many services, many users and many, many use cases. Even with all the functionality provided in Ambari, there will always be a different way to use and view your cluster. To allow users and admins to extend and contribute their own ‘view’ of the cluster, Ambari provides the ‘Ambari Views’ framework. Developers can now create their ‘view’ using this framework. It gives users and administrators a single entry point into the cluster and allows for very interesting possibilities. Views also nicely complement stack extensibility on the back end by providing appropriate views for them in the front end. Question: What is the admin functionality of views?
  • This is a Tech Preview being shown.
  • view.xml: view descriptor; WEB-INF/lib: third-party libraries; WEB-INF/web.xml: defines custom servlets (non-REST); classes: application logic; index.html, JavaScript, etc.: UI.
  • The view descriptor is the central entry point. Here you can see the view id, the display label you see in the menu, and the version of the view. Each JAR is for one version of the view. A view version can have many instances of the view. Each view can also define the parameters it needs to work; here you see the list of cities this weather view needs. You also see a REST resource defined; all you need to implement is the Java bean and a JAX-RS annotated class. Each view can optionally define default instances; here you see Europe. The HDFS view does not have any default instances because the location of the NameNode is a runtime value, not known at packaging time.
  • Once the view JAR is placed into Ambari, you can see the views, versions and instances. You can create, update and delete view instances via API calls. So if a third-party tool wants a view onto HDFS, it can create an instance and send the user to the link.
  • Something that is being worked on is the administration ability for views. Admins can configure views, provide entitlements for users, etc.
  • So admins can control the cluster, and users can view the cluster and use it.
  • In Hadoop 1.0 we visualized MapReduce jobs, their dependencies, and how the map and reduce tasks performed.
  • In Hadoop 2.0, MapReduce has been made more generic in Apache Tez. Apache™ Tez generalizes the MapReduce paradigm to a more powerful framework for executing a complex DAG (directed acyclic graph) of tasks. As you can see, Hive, Pig and other data processing services are being ported on top of Tez. For Hadoop 2.0, Ambari visualizes Hive queries that use the Tez engine.
  • Each Hive + Tez query is shown in the jobs table. Going to an individual job shows the Tez DAG mixed in with Hive information.
  • HDFS_ prefixed counters come from HDFS. They generally tend to be on the first and last vertices of the DAG, because that is where data is read from and written to HDFS. FILE_ prefixed counters are local disk accesses for the vertex; they represent data read and written during spilling, not data transferred between vertices. SPILLED_RECORDS: in Tez, spilling of records can happen not only at vertex output (as in MapReduce) but also at vertex input. For a vertex, this number covers both.
    Tasks:
    - FILE_BYTES_READ, FILE_BYTES_WRITTEN = spill bytes size (3 reads out of 3r+3w), local disk only; does not include transporting across tasks; read configs.
    - HDFS_BYTES_READ|WRITTEN = generally on first and last vertices, where HDFS is accessed.
    - HDFS_READ_OPS = listing directories (direct HDFS counters).
    - HDFS_WRITE_OPS = FS changes (direct HDFS counters): create folder, concat file, mkdir, etc.
    - SPILLED_RECORDS = 3w+3r+1sort-w = records in 3+1. They occur in Output (when spilling locally because the data exceeds memory) and in Input (when collecting from multiple inputs). If a vertex has both Input and Output, this is the sum of both.
  • Summary metrics are shown for all vertices, so that you can compare the relative performance of vertices.
  • Hive and Tez have hooks to push notifications to ATS. Ambari pulls (GETs) information from ATS. Other components plan to use ATS more, so Ambari should be able to show other types of jobs.
  • To enable Hive + Tez, admins should go to the Hive configurations and set “hive.execution.engine” to “tez”. The default is “mr”. Other important Tez configs are shown, such as the YARN container size for Hive + Tez queries.
  • The Jobs viewer can handle large queries. This one, for example, has approximately 70 Tez vertices, 12 reduce vertices. The graph is easier to read than the text above when analyzing issues.
  • What truly identifies Ambari as a platform: the ability to add new services and to manage and monitor a custom stack of components.
  • A Stack is an all-inclusive, self-contained definition of all services and their life cycle within Ambari. Let's start by encapsulating components and configuration in a stack definition. Next, allow a developer to define the component life cycle by declaring relationships between the different states of a component. The REST API allows you to discover what is available. Last, plug it into Ambari to bring it all together.
  • Command scripts are a way to tell Ambari what needs to be executed to achieve a state change. For example, going from INSTALLED to STARTED entails executing the user-defined start script of a component in the desired stack. Custom Commands and Custom Actions are similar to command scripts but are independent of a state change and can be executed on demand using the Ambari API. Examples: decommission a DataNode, run the rebalancer, verify Kerberos settings. Extension makes it easy to add new stacks.
  • Command scripts are bundled with the server and downloaded to the agents. At registration time, agents check that the MD5 checksum of the script archive on the server matches the one in the agent cache; if it does not, the agent downloads new definitions from the server. This makes on-demand / on-site modifications easy to apply and verify.
  • HBASE service definition in the stack. The metrics.json file defines all metrics emitted by HBASE as well as how these metrics show up in the Ambari API. The definition contains configuration, a package of command scripts, and the definition of the service in metainfo.xml. metainfo.xml: links the HBASE_MASTER component to the script that defines the life cycle commands (start, stop, install, configure) and custom commands, if any. Package: the actual command scripts that will be executed on the agents. Example of a command script: it is important to mention that Ambari's Python resource management framework allows a developer to extend a base class called Script and define resources, similar to other languages like Puppet.
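
The command-script idea in these notes (a component class that extends a base `Script` class and exposes one method per life cycle command) can be sketched roughly as follows. This is an illustration only: the `Script` class here is a small stand-in written for this example, not Ambari's actual resource management base class, and the `HbaseMaster` methods only record what a real script would do.

```python
class Script:
    """Stand-in base class (hypothetical): dispatches a life cycle command by name."""

    def execute(self, command):
        handler = getattr(self, command, None)
        if not callable(handler):
            raise ValueError(f"unsupported command: {command}")
        return handler()


class HbaseMaster(Script):
    """Illustrative life cycle commands for an HBASE_MASTER-like component."""

    def __init__(self):
        self.log = []  # records the actions a real script would perform

    def install(self):
        self.log.append("install packages")
        return "INSTALLED"

    def configure(self):
        self.log.append("write configuration files")
        return "INSTALLED"

    def start(self):
        # Configure first so the daemon starts with the current settings.
        self.configure()
        self.log.append("start daemon")
        return "STARTED"

    def stop(self):
        self.log.append("stop daemon")
        return "INSTALLED"


master = HbaseMaster()
state = master.execute("start")  # INSTALLED -> STARTED transition
```

The point of the sketch is the shape: the server names a command, and the agent-side script maps that name to a method that moves the component to the target state.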

Managing 2000 Node Cluster with Ambari: Presentation Transcript

  • © Hortonworks Inc. 2014: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION Apache Ambari Managing 2000 node Hadoop cluster Siddharth Wagle, PMC swagle (@apache, @hortonworks) Srimanth Gunturi, PMC srimanth(@apache, @hortonworks)
  • Agenda • Operating at scale • Lessons learned • Beyond 2K • Ambari 1.6.0 highlights • New Management features • Blueprints • Ambari Views • Extensibility • Q & A
  • Ambari: Enterprise Hadoop Operations. Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters. [Diagram labels: AMBARI WEB, Viewpoint, Others; AMBARI REST APIs; AMBARI SERVER (PROVISION | MANAGE | MONITOR); compute & storage nodes]
  • 100% Apache Open Source • Active Community - 70+ Contributors / 40+ Committers - 240+ Ambari User Group Members • 2013 Dec: Apache Ambari graduates to Top Level Project • 2014 Apr: Apache Ambari 1.5.1 released, adds operations for Hadoop 2.1 Stack • 2014 May: Apache Ambari 1.6.0 released, new Ambari features
  • Overview and Architecture
  • Platform Architecture [Diagram labels: Ambari Web (JS web client); REST API (/clusters, /stacks, /views …); Ambari Server (Java) with Request Dispatcher, Orchestrator, Monitoring and AuthProvider; DB (RDBMS): cluster configurations and topology resources; Repo: definitions of stacks, actions, views; configurable auth provider (LDAP, AD); Ambari Agent/s (Python, Puppet), bootstrap or manual install; monitoring providers: Ganglia, Nagios, JMX; User]
  • Demo: 2000 Nodes on commodity hardware
    Process          CPU       RAM (process)
    Ambari Server    16 core   2 GB
    Ganglia          16 core   8 GB
    Nagios           8 core    8 GB
    Masters          8 core    8 GB
    Slaves           1 core    4 GB
  • Demo Video • Increase compute capacity with Next Gen Slaves • Group the new hosts with the Manage Config Groups feature • Override a default config property for the new group • Apply the config by performing rolling restarts on the next gen slaves, with zero to little expected downtime
  • Optimizations with Ambari 1.6.1 • Better utilization of rrdcached • Tuning Nagios with recommended performance configurations • Ambari API optimizations
    Load average comparison:
    Process          Starting point   1.6.1
    Ambari Server    > 10 (0.63)      ~ 6.0 (0.37)
    Ganglia Server   > 12 (0.75)      ~ 0.94 (0.06)
    Nagios           > 14 (1.75)      ~ 6.8 (0.85)
    iostats:
    Process          Starting point                    1.6.1
    Ganglia Server   > 10.3 GB writes, > 34 MB reads   ~ 0.3 GB writes, cached reads
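
The figures in parentheses in the load average comparison appear to be the load average divided by the core count of each process's host, taken from the demo hardware slide. That reading is an inference from the numbers, not something the slide states. A quick check of the arithmetic, treating the "> 10" style starting points as if they were exact:

```python
# Core counts from the demo hardware slide; load averages from the comparison.
CORES = {"Ambari Server": 16, "Ganglia Server": 16, "Nagios": 8}
LOAD_AVERAGE = {  # (starting point, after 1.6.1 tuning)
    "Ambari Server": (10.0, 6.0),
    "Ganglia Server": (12.0, 0.94),
    "Nagios": (14.0, 6.8),
}


def load_per_core(load_average, cores):
    """Normalize a load average by the number of cores on the host."""
    return load_average / cores


per_core = {
    name: tuple(load_per_core(value, CORES[name]) for value in pair)
    for name, pair in LOAD_AVERAGE.items()
}

for name, (before, after) in per_core.items():
    print(f"{name}: {before:.3f} -> {after:.3f} load per core")
```

Under this reading every slide value is reproduced to within rounding. Nagios stands out: it starts above 1.0 per core (more runnable work than cores) and ends at about 0.85.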
  • Beyond 2K? • Better metric collection with fan-out • Ability to export metrics to existing analytics and long term metric persistence solutions like OpenTSDB • Improve the alerting subsystem to minimize I/O overhead for alerts processing • Server scale-out solution for handling heartbeats and server-agent communication for 10K+ nodes
  • Future of Ambari Metrics System? (AMBARI-5707) [Diagram labels: Hadoop Daemon with AmbariMetricsSink; Ambari Agent with HostMetricsCollector; rack-aware Ambari Metrics Collector (1…N); AmbariMetricsService; MySQL; long term storage; AMBARI; AMBARI Views (Hive, Pig, TEZ)]
  • Ambari 1.6.0 Features
  • Request Scheduling • Open source Quartz scheduler integration • Create a batch of requests executed in the order of creation • Expose API to allow users to create their own schedules
  • Rolling Restarts • Goal: minimize cluster downtime • Optionally include only hosts with configuration changes • Set host batch size + time to wait between batches • Set failure tolerance to halt restarts automatically
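
The three rolling restart knobs on this slide (batch size, wait between batches, failure tolerance) can be illustrated with a small sketch. It does not show Ambari's scheduler; the function names and the `restart` callback are made up for this example, and a real run would issue restart requests to the server instead of calling a local function.

```python
def plan_batches(hosts, batch_size):
    """Split the host list into consecutive batches of at most batch_size."""
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def rolling_restart(hosts, batch_size, failure_tolerance, restart, wait=lambda: None):
    """Restart hosts batch by batch; halt once failures exceed the tolerance."""
    restarted, failed = [], []
    batches = plan_batches(hosts, batch_size)
    for index, batch in enumerate(batches):
        for host in batch:
            (restarted if restart(host) else failed).append(host)
        if len(failed) > failure_tolerance:
            return {"restarted": restarted, "failed": failed, "halted": True}
        if index < len(batches) - 1:
            wait()  # pause between batches, e.g. time.sleep(seconds)
    return {"restarted": restarted, "failed": failed, "halted": False}


hosts = [f"dn{i}" for i in range(1, 8)]  # hypothetical DataNode hosts
result = rolling_restart(
    hosts,
    batch_size=3,
    failure_tolerance=1,
    restart=lambda host: host not in {"dn2", "dn5"},  # simulate two failures
)
```

With a tolerance of one failure, the second failed host halts the run after the second batch, so the last host is never touched.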
  • Host Configuration Groups • Set custom configuration properties for one or more host groups (e.g. “host overrides”) • Important for handling “heterogeneous” HW clusters – Different memory, mount points, directories [Figure: two host groups, one with HEAPSIZE=1024 and one with HEAPSIZE=512]
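
One way to picture a host configuration group is as a set of overrides layered on top of the service defaults for the hosts that belong to the group. The sketch below assumes that model; the group name, host names and property keys are invented for illustration, with only the two heap sizes taken from the slide.

```python
# Service-level defaults (hypothetical property names).
DEFAULTS = {"heapsize": "1024", "data_dir": "/hadoop/hdfs/data"}

# A config group: member hosts plus the properties they override.
CONFIG_GROUPS = {
    "next-gen-slaves": {
        "hosts": {"dn1999", "dn2000"},
        "overrides": {"heapsize": "512"},
    },
}


def effective_config(host, defaults, groups):
    """Return the defaults with any matching group overrides applied on top."""
    config = dict(defaults)
    for group in groups.values():
        if host in group["hosts"]:
            config.update(group["overrides"])
    return config


regular = effective_config("dn0001", DEFAULTS, CONFIG_GROUPS)
next_gen = effective_config("dn2000", DEFAULTS, CONFIG_GROUPS)
```

Hosts outside every group keep the defaults, so only the overridden keys differ between the two results.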
  • Staged Configuration Changes • Restart indicators • Push changes without affecting liveness of the service
  • Ambari Blueprints • Blueprint defines a cluster layout and component configuration • Simplifies “Headless Installs” • Export blueprint from cluster • Boot and Save wizard with blueprint [Diagram: BLUEPRINT (<stack> <host> <service> <component> <config>), HOST MANIFEST (<host> <meta>) and SERVICE CONFIGS (<props>); the BLUEPRINT is submitted to AMBARI via REST, and Ambari provisions the CLUSTER]
  • Cluster create with Blueprint • POST /api/v1/blueprints/:blueprintName (201 Created) • POST /api/v1/clusters/:clusterName (202 Accepted)
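
The two POST calls can be sketched as request builders. The paths come from the slide; the JSON field names (`Blueprints`, `host_groups`, `cardinality`, `fqdn`) follow the Ambari Blueprints documentation as best recalled and should be treated as assumptions to verify against your Ambari version. The blueprint, cluster and host names are invented. Nothing is sent over the network here.

```python
import json


def blueprint_request(name, stack_name, stack_version, host_groups):
    """Build (path, body) for registering a blueprint."""
    body = {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": host_groups,
    }
    return f"/api/v1/blueprints/{name}", json.dumps(body)


def cluster_request(cluster_name, blueprint_name, hosts_by_group):
    """Build (path, body) for creating a cluster from a registered blueprint."""
    body = {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }
    return f"/api/v1/clusters/{cluster_name}", json.dumps(body)


host_groups = [
    {"name": "master", "cardinality": "1",
     "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
    {"name": "slave", "cardinality": "1+",
     "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
]
bp_path, bp_body = blueprint_request("demo-blueprint", "HDP", "2.1", host_groups)
cl_path, cl_body = cluster_request(
    "demo-cluster", "demo-blueprint",
    {"master": ["m1.example.com"], "slave": ["s1.example.com", "s2.example.com"]},
)
```

The split mirrors the slide: the blueprint describes host groups abstractly, and the cluster request maps real hosts onto those groups.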
  • Bulk Host Operations • Perform operations such as Stop, Start, Restart, Decommission, Maintenance Mode in “bulk” form • Perform operations on all hosts, filtered hosts or a selected group of hosts • Perform host level operations, or component type operations.
  • Bulk Host Operations • 10+ ways to filter hosts - component type and state, alerts, stale configurations, maintenance mode, etc. • Component type start, stop, restart operations are performed in batches
  • Maintenance Mode • Goal: silence alerts for services, hosts and components when performing maintenance • Ability to put Service or Host “Out of Service” • Alerts will be suspended for that item • Item will not respond to bulk operations (such as restarts)
  • Maintenance Mode • Components inherit maintenance mode from either service or host • Service/Host in maintenance mode – Bulk operations skipped – Host/Service operations skipped (start all, stop all and restart all)
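
The inheritance rule can be written down in a few lines: a component is effectively in maintenance mode if it, its host or its service is. The sketch below uses that rule to decide which components a bulk operation would act on. The data layout and function names are invented for illustration.

```python
def effective_maintenance(component_on, host_on, service_on):
    """A component inherits maintenance mode from its host or its service."""
    return component_on or host_on or service_on


def bulk_targets(components, hosts_in_mm, services_in_mm):
    """Return the (host, component) pairs a bulk operation would act on."""
    targets = []
    for comp in components:
        skipped = effective_maintenance(
            comp.get("maintenance", False),
            comp["host"] in hosts_in_mm,
            comp["service"] in services_in_mm,
        )
        if not skipped:
            targets.append((comp["host"], comp["name"]))
    return targets


components = [
    {"name": "DATANODE", "service": "HDFS", "host": "dn1"},
    {"name": "DATANODE", "service": "HDFS", "host": "dn2"},
    {"name": "NODEMANAGER", "service": "YARN", "host": "dn1"},
    {"name": "NODEMANAGER", "service": "YARN", "host": "dn2", "maintenance": True},
]
targets = bulk_targets(components, hosts_in_mm={"dn1"}, services_in_mm=set())
```

Because the rule is a plain OR, a component cannot opt out of maintenance mode set on its host or service.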
  • Moving Masters • Move master components to different hosts – NameNode (including HA) – SecondaryNameNode – JobTracker (Hadoop 1) – ResourceManager (Hadoop 2)
  • Views
  • Ambari Views • Goal: Customize the Ambari Web experience • Allows creation of custom views (API and UI) of cluster • Gives users and admins a single entry point to cluster • Views complement Stack Extensibility – Stack Extensibility makes custom Stack Services available to Ambari – Views expose custom UI features for Services • Ambari Admins can entitle “views” to Ambari Web users – Entitlements framework for finer-grained permissions control for Ambari users
  • Ambari Views – Demo
  • Ambari Views – Packaging
    files-0.1.0-SNAPSHOT.jar
    ├── WEB-INF
    │   ├── web.xml
    │   └── lib
    ├── index.html
    ├── org
    │   └── apache
    │       └── ambari
    │           └── view
    │               └── filebrowser
    │                   ├── HdfsApi.class
    │                   └── ...
    └── view.xml

    # ls -l /var/lib/ambari-server/resources/views/
    -rw-r--r--. 1 root root 26023710 Jun 1 00:55 files-0.1.0-SNAPSHOT.jar
    -rw-r--r--. 1 root root 22578573 Jun 1 00:55 pig-0.1.0-SNAPSHOT.jar
    -rw-r--r--. 1 root root 54649972 Jun 1 00:55 slider-0.1.0-SNAPSHOT.jar
  • Ambari Views: view.xml
    <view>
      <name>WEATHER</name>
      <label>Weather</label>
      <version>1.0.0</version>
      <parameter>
        <name>cities</name>
        <description>The list of cities.</description>
        <required>true</required>
      </parameter>
      <resource>
        <name>city</name>
        <plural-name>cities</plural-name>
        <id-property>id</id-property>
        <resource-class>org.apache.ambari.view.weather.CityResource</resource-class>
        <provider-class>org.apache.ambari.view.weather.CityResourceProvider</provider-class>
        <service-class>org.apache.ambari.view.weather.CityService</service-class>
      </resource>
      <instance>
        <name>EUROPE</name>
        <property>
          <key>cities</key>
          <value>London, UK;Paris;Munich</value>
        </property>
      </instance>
    </view>
  • Ambari Views – Framework API
    GET
    – http://server:8080/api/v1/views
    – http://server:8080/api/v1/views/{view-id}/versions
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
    POST: create new instance of view with appropriate parameters
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
    – Parameter example for HDFS view: dataworker.defaultFS, dataworker.username
    PUT: update {view-instance} with modified parameters
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
    DELETE: delete {view-instance}
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}
  • Ambari Views – View Instance API
    GET UI
    – http://server:8080/views/{view-id}/{view-version}/{view-instance}
    GET API
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/resources/{resource-name}
    – http://server:8080/api/v1/views/{view-id}/versions/{view-version}/instances/{view-instance}/{servlet-path}
    Example: HDFS
    – GET: http://views-1:8080/views/FILES/0.1.0/HDFS
    – GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/listdir?path=%2F
    – GET: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/download/browse?path=%2Fuser%2Fhdfs%2FplayerYears.pig&download=true
    – POST: http://views-1:8080/api/v1/views/FILES/versions/0.1.0/instances/HDFS/resources/files/fileops/rename
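
The URL patterns on these two slides are regular enough to build programmatically. The helper names below are made up for this sketch; only the URL shapes and the HDFS example values come from the slides.

```python
from urllib.parse import urlencode


def instance_ui_url(server, view_id, version, instance):
    """UI entry point of a view instance."""
    return f"http://{server}/views/{view_id}/{version}/{instance}"


def instance_api_url(server, view_id, version, instance):
    """Framework API URL of a view instance (target of GET/POST/PUT/DELETE)."""
    return (f"http://{server}/api/v1/views/{view_id}"
            f"/versions/{version}/instances/{instance}")


def resource_url(server, view_id, version, instance, resource_path, **params):
    """URL of a REST resource exposed by a view instance, with query parameters."""
    url = f"{instance_api_url(server, view_id, version, instance)}/resources/{resource_path}"
    if params:
        url += "?" + urlencode(params)  # percent-encodes '/' as %2F
    return url


hdfs = ("views-1:8080", "FILES", "0.1.0", "HDFS")
ui = instance_ui_url(*hdfs)
listdir = resource_url(*hdfs, "files/fileops/listdir", path="/")
download = resource_url(*hdfs, "files/download/browse",
                        path="/user/hdfs/playerYears.pig", download="true")
```

A third-party tool that wants to send users to an HDFS view would create the instance through the instance API URL and then link to the UI URL.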
  • Ambari Views – Single cluster interface • Administrators can control cluster • Data Workers can use cluster
  • Jobs
  • Ambari Jobs • Hadoop 1.0: MapReduce – Visualize MapReduce jobs in swimlanes – Task scatter plots across jobs
  • Ambari Jobs • Hadoop 2.0: YARN + Tez
  • Ambari Jobs • Visualize Hive queries using Tez engine
  • Ambari Jobs
  • Ambari Jobs - Counters • FILE_BYTES_READ + HDFS_BYTES_READ • FILE_BYTES_WRITTEN + HDFS_BYTES_WRITTEN • HDFS_WRITE_OPS / HDFS_BYTES_WRITTEN • HDFS_READ_OPS / HDFS_BYTES_READ • FILE_WRITE_OPS / FILE_BYTES_WRITTEN • FILE_READ_OPS / FILE_BYTES_READ • SPILLED_RECORDS
  • Ambari Jobs – DAG Graph • Summary Metrics: Input, Output, Tez Tasks, Spilled Records • Vertex Types: Map Vertex, Reduce Vertex, Union Vertex • Hive Operators • Edge Types: Scatter Gather, Broadcast, Contains
  • Ambari Jobs • Event notification flow [Diagram: events are PUSHed to ATS (Application Timeline Server – YARN); Ambari PULLs from ATS]
  • Ambari Jobs - Configurations
  • Ambari Jobs – Scaling
  • Extensibility
  • Ambari Stacks • Goal: Reduce time + effort to add new Services to Ambari for provisioning, management and monitoring • Ambari defines a consistent Service lifecycle management interface that can be extended • Dynamically add Stacks + Services definitions [Diagram: AMBARI ({rest}, <ambari-web>) with two stacks: HDP-2.0 (HDFS, YARN, MR2, Hive, Pig, Oozie, plus NEW services) and 2.0-GlusterFS (GlusterFS, YARN, MR2, Hive, plus a NEW service)]
  • Stack Details • Stacks define Services + Repos – What is in the Stack, and where to get the bits • Each Service has a definition – What Components are part of the Service • Each Service has defined lifecycle commands – start, stop, status, install, configure • Lifecycle is controlled via command scripts • Ability to define “custom” commands • Ability to “extend” Stacks [Diagram: AMBARI SERVER holds the Stack: Command Scripts (python), Service Definitions (xml), Repos; AMBARI AGENT/S]
  • Stack Mechanics • Ambari Server reads Stack definitions on start • Ambari Server sends a command to Agents • Agents download Stack definition + command scripts • Agent executes command • If the Stack definition changes, Agent will request latest Stack definition + command scripts
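
The "request latest definition if it changed" step can be sketched as a checksum comparison between the server's script archive and the agent's cached copy, using MD5 as the speaker notes describe. The function names and the dictionary used as a cache are stand-ins for this illustration, not the agent's actual implementation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """MD5 checksum of an archive, as a hex string."""
    return hashlib.md5(data).hexdigest()


def sync_cache(server_archive: bytes, agent_cache: dict) -> bool:
    """Refresh the agent cache if its checksum differs; return True if it downloaded."""
    server_hash = md5_hex(server_archive)
    if agent_cache.get("hash") == server_hash:
        return False  # cache is current, nothing to download
    agent_cache["archive"] = server_archive  # stands in for the download
    agent_cache["hash"] = server_hash
    return True


cache = {}
first = sync_cache(b"stack definition v1", cache)    # empty cache: download
second = sync_cache(b"stack definition v1", cache)   # unchanged: skip
third = sync_cache(b"stack definition v2", cache)    # changed on server: download
```

Comparing checksums keeps the common case cheap: an agent with a current cache transfers nothing.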
  • Declarative Definition
  • In closing …
  • Everyone is welcome to contribute • Thank you for all the contributions • Bring your favorite Hadoop services to Ambari • Useful Links – Website – http://apache.apache.org – Mailing Lists – http://ambari.apache.org/mail-lists.html – Development Wiki – https://cwiki.apache.org/confluence/display/AMBARI • Current and Upcoming Releases – Ambari 1.6.1 (pending release) – Ambari 1.6.0 (May)
  • Thank you.