Cloudera User Group SF - Cloudera Manager: APIs & Extensibility

Presented by Bala Venkatrao, Director of Products at Cloudera, during our Bay Area Cloudera User Group on 12/10/13 in San Francisco.

  1. Cloudera Manager – APIs & Extensibility. Bala Venkatrao, Products@Cloudera, December 2013
  2. Cloudera Manager: End-to-End Administration for CDH
     • (1) Manage: Easily deploy, configure & optimize clusters
     • (2) Monitor: Maintain a central view of all activity
     • (3) Diagnose: Easily identify and resolve issues
     • (4) Integrate: Use Cloudera Manager with existing tools
     ©2013 Cloudera, Inc. All Rights Reserved.
  3. Integrating with your IT Mgmt tools
     Various options of integrating Cloudera Manager into your existing Datacenter Operations/Tools:
     • Cloudera Manager API
       • Introduced in CM4 (June 2012)
       • Installation & deployment
       • Monitoring
     • SNMP Alerts
       • Introduced in CM4.5 (Feb 2013)
     • Monitoring ‘tsquery’ (Feb 2013)
     • User-defined triggers/alarms (new for C5!)
     • Service extensibility (new for C5!)
     [Diagram: Datacenter Operations (installation and deployment tools, e.g. Chef, Puppet; monitoring tools, e.g. Nagios, SNMP; alerting tools, e.g. Orion, Tivoli, BMC; and more) alongside Hadoop Operations (Cloudera Manager)]
  4. Cloudera Manager (CM) API
     • API access was a feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
     • The CM API is an HTTP REST API, using JSON serialization. The API is served on the same host and port as the CM web UI, and does not require an extra process or extra configuration. API users have the same privileges as they do in the web UI.
     • Docs & Examples
       • http://cloudera.github.io/cm_api/
       • https://github.com/cloudera/cm_api
     • Java/Python clients
       • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
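As a rough illustration of the REST access described on this slide, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The host name, port 7180, the `/api/v1/clusters` path, HTTP Basic authentication, and the credentials are assumptions made for illustration and are not taken from the slide; the Java/Python clients linked on the slide wrap calls of this kind.

```python
import base64
import urllib.request


def build_cm_request(host, path, user, password, port=7180):
    """Build a GET request for a Cloudera Manager REST endpoint.

    Assumptions (not from the slides): HTTP Basic authentication,
    port 7180, and a path such as /api/v1/clusters.
    """
    url = "http://{}:{}{}".format(host, port, path)
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("Accept", "application/json")
    return request


# Hypothetical host and credentials; nothing is sent over the network here.
req = build_cm_request("cm-host.example.com", "/api/v1/clusters", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` against a reachable server would return a JSON body that can be decoded with `json.loads`.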
  5. Examples of integration with CM API
     • Installation & Deployment
       • Chef/Puppet
       • Dell Crowbar
         • http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
       • StackIQ
         • http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
       • WANdisco – non-stop NN setup
       • Several other customers/partners leveraging the APIs as part of their install & deployment process
     • Monitoring & Alerting
       • Oracle Enterprise Manager (via Big Data Appliance)
       • Nagios
         • https://github.com/cloudera/cm_api/tree/master/nagios
         • https://github.com/harisekhon/nagios-plugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
       • SNMP alerts integration with IBM Netcool
     Develop & Contribute your plug-ins using the Cloudera Manager API
  6. Cloudera Manager – Monitoring via ‘tsquery’
     • Introduced as part of CM4.5 release (Feb 2013)
     • Great way to add interesting charts (above & beyond what is provided by default) and monitor metrics that are relevant to your clusters
     • The tsquery language is used to specify statements for retrieving time-series data from the Cloudera Manager time-series data store
     • Example: How do I compare all disk IO for all the DataNodes that belong to a specific HDFS service?
         select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
     • Retrieved time-series data can be plotted via various options: line, bar, scatter, heat maps, table list etc.
     • Extending this concept to create user-defined triggers/alarms (new for C5!)
     • More details
       • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
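A tsquery statement is plain text, so it can be composed and URL-encoded programmatically. The minimal sketch below rebuilds the slide's DataNode example and places it in a query string. The `/api/v4/timeseries` path, the `query` parameter name, and the host are assumptions about the REST API rather than details from the slide, and the helper names are hypothetical.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def tsquery(metrics, **filters):
    """Compose a tsquery statement: select <metrics> where k=v and ..."""
    statement = "select " + ", ".join(metrics)
    if filters:
        predicates = ["{}={}".format(k, v) for k, v in filters.items()]
        statement += " where " + " and ".join(predicates)
    return statement


def timeseries_url(base_url, statement):
    # Assumed endpoint and parameter name; check the API docs for your version.
    return base_url + "/api/v4/timeseries?" + urlencode({"query": statement})


stmt = tsquery(["bytes_read", "bytes_written"],
               roleType="DATANODE", serviceName="hdfs1")
url = timeseries_url("http://cm-host.example.com:7180", stmt)
print(stmt)
print(url)
```

Keeping the statement builder separate from the URL builder means the same statement text can also be pasted into the chart builder in the web UI.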
  7. Examples of Cloudera Manager ‘tsquery’
     • Example 1: How do I track the aggregate Cluster Disk IO?
         select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID
     • Example 2: How do I compare CPU usage across hosts?
         select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
                dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
                dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
                dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
                dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
                dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100
     Create & Contribute your ‘tsqueries’! https://github.com/cloudera/cm_charting_scrapbook
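To make the arithmetic in Example 2 concrete, the sketch below mimics what an expression of the form `dt0(metric) / numCores * 100` computes for a cumulative counter: a per-second rate between samples, with negative rates clamped to zero, scaled by the core count. This is a pure-Python illustration and not how Cloudera Manager evaluates tsquery; it assumes the counter is measured in CPU-seconds, and the sample values are invented.

```python
def dt0(samples):
    """Per-second rate between consecutive (timestamp, value) samples,
    with negative rates (e.g. after a counter reset) clamped to 0."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append(max(0.0, (v1 - v0) / (t1 - t0)))
    return rates


def cpu_percent(samples, num_cores):
    """Express each rate as a percentage of total core capacity."""
    return [rate / num_cores * 100 for rate in dt0(samples)]


# Invented samples: (seconds, cumulative user-CPU seconds) on a 4-core host.
user_cpu = [(0, 100.0), (60, 220.0), (120, 10.0), (180, 70.0)]
print(cpu_percent(user_cpu, num_cores=4))  # [50.0, 0.0, 25.0]
```

The middle value is 0.0 because the counter dropped between the second and third samples, which is the case the clamping is meant to absorb.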
  8. Cloudera as an Application Platform
     • ISV’s view of a Database: Workload Mgmt, Drivers (JDBC/ODBC), Security Mgmt, Data Access APIs, Systems Mgmt, Core Database
     • ISV’s view of an OS: Package Mgmt, Process/Resource Mgmt, Security Mgmt, Data Access APIs, Systems Mgmt, Core OS kernel
  9. Cloudera as an Application Platform
     • ISV’s view of Cloudera: Package Mgmt, Workload/Process Mgmt, Security Mgmt, Data Access APIs, Drivers (JDBC/ODBC), Systems Mgmt, CDH
  10. Cloudera Platform Features (feature, description, examples)
      • Package Mgmt
        • Ability to easily package and distribute binaries/jars via “Parcels”
        • Examples: Informatica, Syncsort, LZO libraries
      • Workload/Process Mgmt
        • Ability to deploy applications as stand-alone processes or via YARN* on the Hadoop cluster
        • Isolation of cluster resources
        • Examples: SAS, 0xData, Accumulo, Spark
      • Security Mgmt
        • Support for Kerberos Mgmt
        • Role-based access control for Tables/Views in Hive/Impala via Sentry
      • Data Access APIs
        • HDFS API, HBase API, Search API, Spark API
        • Kite (formerly Cloudera Development Kit)
        • Examples: Causata, Basis Tech, CounterTack, Amdocs
      • Drivers
        • ODBC/JDBC drivers for Hive/Impala
        • Examples: Zoomdata, Tableau, Microstrategy, Qlikview
      • Systems Mgmt
        • End-to-End management of an application via Cloudera Manager (CM)
        • Manage: Deploy and upgrade (rolling) services and pkgs; manage configurations
        • Monitor: Proactive health checks; track resource utilization; custom metrics charts
        • Diagnose: Distributed log collection and searching; tag and track key events
        • Integrate: Access CM via API
        • Examples: StackIQ, Dell Crowbar, Oracle OEM
      * Support for YARN planned as part of CM5.x in FY14
  11. Example – Deployment via Parcels
      The platform for Big Data + The ETL app for Hadoop
      • Smarter Deployment & Administration: Seamless integration with Cloudera Manager for one-click deployment and easier administration
      • Smarter Architecture: No code generation. ETL engine runs natively within Hadoop MapReduce, via plugin included in CDH 4.2
      • Smarter Monitoring: Comprehensive logging capabilities + activity monitoring through Cloudera Manager
  12. How it works
      1. Download Syncsort DMX-h “Parcel” file to your custom repository. The file contains everything you need to properly deploy Syncsort DMX-h ETL Edition on Cloudera.
      2. Distribute & activate the DMX-h parcel on your Cloudera cluster:
         A. Find Nodes: Enter the names of the hosts which will be included in the Hadoop cluster. Click Continue.
         B. Install Components: Cloudera Manager automatically installs the CDH components on the hosts you specified.
         C. Assign Roles: Verify the roles of the nodes within your cluster. Make changes as necessary.
  13. Syncsort DMX-h + Cloudera Manager
      [Diagram labels: Cloudera Manager; CDH Cluster + ISV software; Support, Integration, Monitoring, Management, Installation; Syncsort DMX-h; API; CDH Nodes]
      DMX-h on every CDH node
  14. Get a 360° View of Your Cluster, Including DMX-h Logs
      • View service health & performance
      • Get host-level snapshots
      • Monitor & diagnose workloads
      • Gather, view & search Hadoop & DMX-h logs
      • …And more!!
      Build and distribute your own Parcels via Cloudera Manager and share them with the community!
  15. Service Extensibility
      • Introduced in C5
      • Still in Beta!
      • Single management console for CDH, non-CDH services and ISV applications
      • Similar look and feel as existing services
      • Easy to write (Java-free!)
      • Flexible
      • Independent release cycle
  16. So, how does it work?
      • A JSON file that describes your service
      • Set of control scripts
      • Packaged as a JAR file
      • As promised, Java-free
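Because a JAR is a ZIP archive, the packaging step on this slide can be sketched with the standard library alone. The entry names `descriptor/service.sdl` and `scripts/control.sh`, the descriptor fields, and the helper name are assumptions for illustration; the slide does not specify the archive layout, and the SDK documentation mentioned under Next Steps is the place to confirm it.

```python
import io
import json
import zipfile


def build_extension_jar(descriptor, scripts):
    """Return the bytes of a JAR (ZIP) holding a JSON descriptor and scripts.

    The entry names below are assumed for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("descriptor/service.sdl", json.dumps(descriptor, indent=2))
        for name, body in scripts.items():
            jar.writestr("scripts/" + name, body)
    return buffer.getvalue()


# Hypothetical minimal descriptor and control script.
descriptor = {"name": "spark", "roles": [{"name": "master"}]}
scripts = {"control.sh": "#!/bin/bash\necho \"command: $1\"\n"}
jar_bytes = build_extension_jar(descriptor, scripts)

with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
    print(jar.namelist())
```

Building the archive in memory keeps the sketch free of filesystem side effects; writing `jar_bytes` to a `.jar` file would produce the distributable artifact.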
  17. Example: Cloudera Manager Extensions - Spark
  18. Cloudera Manager Extensions
  19. Cloudera Manager Extensions: Spark
  20. Cloudera Manager Extensions: Spark
  21. Cloudera Manager Extensions: Spark
  22. The Code
      Service descriptor (JSON):
          name : "spark",
          roles : [{
            name : "master",
            startRunner : {
              program : "scripts/control.sh",
              args : [ "start_master", "./params.properties" ]
            },
            parameters : [{
              name : "master_port",
              type : "port",
              default : 7077
            }],
            configWriter : {
              generators : [{
                filename : "params.properties"
              }]
            }
          }]
      Control script (scripts/control.sh):
          #!/bin/bash
          CMD=$1
          MASTER_PORT=<read in from ./params.properties>
          case $CMD in
            (start_master)
              exec $SPARK_HOME/scripts/spark-start.sh master
              ;;
            (*)
              echo "$timestamp Don't understand [$CMD]"
              ;;
          esac
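The descriptor declares a `master_port` parameter and a generated `params.properties` file that the control script reads. The sketch below shows one way that round trip could look, assuming a simple `key=value` file format; the format and both helper names are hypothetical and are not confirmed by the slide.

```python
def render_properties(parameters, overrides=None):
    """Render parameters as key=value lines, using defaults unless overridden."""
    overrides = overrides or {}
    lines = []
    for param in parameters:
        value = overrides.get(param["name"], param["default"])
        lines.append("{}={}".format(param["name"], value))
    return "\n".join(lines) + "\n"


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


# Parameter definition copied from the slide's descriptor.
parameters = [{"name": "master_port", "type": "port", "default": 7077}]
text = render_properties(parameters)
master_port = int(parse_properties(text)["master_port"])
print(master_port)  # 7077
```

Calling `render_properties(parameters, {"master_port": 7078})` would instead yield `master_port=7078`, which is how an administrator's override of the default would reach the script in this sketch.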
  23. Next Steps
      • Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
      • Working with select ISVs (SAS, 0xData etc.) as part of Beta to further fine-tune this feature
      Develop & Contribute your Cloudera Manager service extensibility plug-ins!
  24. Vision of CM Extensibility
      [Diagram labels: Service Extensibility; Vertical Extension; Horizontal Extension; 0xData, SAS, Syncsort, Informatica, Revolution; API; Ops Apps; Capacity Mgr, Security, SLA Mgr, Cost Optimizer; ISVs; CDH; CM; SNMP; API; Oracle OEM, Nagios, Dell, Chef/Puppet; Accumulo, Spark, Giraph]
  25. Q&A
      • If you are interested in learning more, participating in Beta, contributing plug-ins or Apps, contact: bala@cloudera.com
  26. Appendix/Resources
      • Systems Management
        • Cloudera Manager API
          • http://cloudera.github.io/cm_api/
          • http://blog.cloudera.com/blog/2013/05/how-to-automate-your-hadoop-cluster-from-java/
      • Package Management
        • Docs on Parcels
          • http://training.cloudera.com/elearning/Parcels/
          • http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Introduction/cmi_primer.html
          • http://blog.cloudera.com/blog/2013/05/faq-understanding-the-parcel-binary-distribution-format/
          • http://blog.cloudera.com/blog/2013/07/one-engineers-experience-with-parcel/
      • Data Access APIs
        • http://blog.cloudera.com/blog/2013/05/cloudera-development-kit-cdk/
        • https://github.com/cloudera/cdk
      • Workload/Resource Management
        • Cloudera Manager 5 documentation
          • http://cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Managing-Clusters/cm5mc_managing_resources.html
        • http://blog.cloudera.com/blog/2013/05/how-the-sas-and-cloudera-platforms-work-together/
      • Security Management
        • http://blog.cloudera.com/blog/2013/07/with-sentry-cloudera-fills-hadoops-enterprise-security-gap/
