Cloudera Manager – APIs & Extensibility
Patrick Angeles, Director Field Technical Services
December 2013

Speaker notes:
Software: Cloudera Enterprise – The Platform for Big Data
• A complete data management solution that includes and expands upon Apache Hadoop
• A collection of open source projects forms the foundation of the platform
• Cloudera has wrapped the open source core with additional software for system and data management, as well as technical support
5 attributes of Cloudera Enterprise:
• Scalable
  - Storage and compute in a single system – brings computation to data (rather than the other way around)
  - Scale capacity and performance linearly – just add nodes
  - Proven at massive scale – tens of PB of data, millions of users
• Flexible
  - Store any type of data
    - Structured, unstructured, semi-structured
    - In its native format – no conversion required
    - No loss of data fidelity due to ETL
  - Fluid structuring
    - No single model or schema that the data must conform to
    - Determine how you want to look at data at the time you ask the question – if the attribute exists in the raw data, you can query against it
    - Alter structure to optimize query performance as desired (not required) – multiple open source file formats like Avro, Parquet
  - Multiple forms of computation
    - Bring different tools to bear on the data, depending on your skillset and what you want to do
    - Batch processing – MapReduce, Hive, Pig, Java
    - Interactive SQL – Impala, BI tools
    - Interactive Search – for non-technical users, or helping to identify datasets for further analysis
    - Machine learning – apply algorithms to large datasets using libraries like Apache Mahout
    - Math – tools like SAS and R for data scientists and statisticians
    - More to come…
• Cost-Effective
  - Scale out on inexpensive, industry standard hardware (vs. highly tuned, specialized hardware)
  - Fault tolerance built-in
  - Leverage cost structures with existing vendors
  - Reduced data movement – can perform more operations in a single place due to flexible tooling
  - Fewer redundant copies of data
  - Less time spent migrating/managing
  - Open source software is easy to acquire and to prove its value/ROI
• Open
  - Rapid innovation
    - Large development communities
    - The most talented engineers from across the world
  - Easy to acquire and prove value
    - Free to download and deploy
    - Demonstrate the value of the technology before you make a large-scale investment
  - No vendor lock-in – choose your vendor based solely on merit
  - Cloudera’s open source strategy
    - If it stores or processes data, it’s open source
    - Big commitment to open source
    - Leading contributor to the Apache Hadoop ecosystem – defining the future of the platform together with the community
• Integrated
  - Works with all your existing investments
    - Databases and data warehouses
    - Analytics and BI solutions
    - ETL tools
    - Platforms and operating systems
    - Hardware and networking equipment
  - Over 700 partners including all of the leaders in the market segments above
  - Complements those investments by allowing you to align data and processes to the right solution

Slides:
1. Cloudera Manager – APIs & Extensibility
   Patrick Angeles, Director Field Technical Services, December 2013
   CONFIDENTIAL - RESTRICTED
2. Cloudera Manager: End-to-End Administration for CDH
   • Monitor: maintain a central view of all activity
   • Diagnose: easily identify and resolve issues
   • Integrate: use Cloudera Manager with existing tools
   • Manage: easily deploy, configure & optimize clusters
3. Integrating with your IT Mgmt tools: Datacenter Operations
   Various options for integrating Cloudera Manager into your existing datacenter operations tools, such as installation/deployment tools (e.g. Chef, Puppet), monitoring tools (e.g. Orion, Tivoli, BMC) and alerting tools (e.g. Nagios, SNMP):
   • Cloudera Manager API: introduced in CM4 (June 2012); used for installation & deployment and for monitoring
   • SNMP alerts: introduced in CM4.5 (Feb 2013)
   • Hadoop operations: monitoring via ‘tsquery’ (Feb 2013), user-defined triggers/alarms (new for C5!), service extensibility (new for C5!)
   • And more…
4. Cloudera Manager (CM) API
   • API access was a new feature introduced in Cloudera Manager 4.0, providing programmatic access to cluster operations (such as configuration and restart) and monitoring information (such as health and metrics).
   • The CM API is an HTTP REST API that uses JSON serialization. It is served on the same host and port as the CM web UI and requires no extra process or configuration. API users have the same privileges as they do in the web UI.
   • Docs & examples: http://cloudera.github.io/cm_api/ and https://github.com/cloudera/cm_api
   • Java/Python clients: http://blog.cloudera.com/blog/2013/05/how-toautomate-your-hadoop-cluster-from-java/
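Because the API is plain HTTP with JSON, any language with an HTTP client can call it. The sketch below uses only the Python standard library to build an authenticated GET request and to pull cluster names out of a list response. The default port 7180, the /api/v1/clusters path, HTTP Basic authentication and the {"items": [...]} response shape are assumptions to verify against the cm_api documentation linked above; the host name and credentials are placeholders.

```python
import base64
import json
import urllib.request


def build_cm_request(host, path, user, password, port=7180, api_version="v1"):
    """Build (but do not send) a GET request for a Cloudera Manager API resource.

    Assumptions: HTTP on port 7180, API version v1, HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/api/{api_version}/{path.lstrip('/')}"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def cluster_names(response_body):
    """Return cluster names from a JSON list response shaped like {"items": [...]}."""
    data = json.loads(response_body)
    return [item["name"] for item in data.get("items", [])]


if __name__ == "__main__":
    # Placeholder host and credentials; nothing is sent over the network here.
    req = build_cm_request("cm-host.example.com", "clusters", "admin", "admin")
    print(req.full_url)
    # Against a live server you would call: urllib.request.urlopen(req).read()
    sample = '{"items": [{"name": "Cluster 1", "version": "CDH4"}]}'
    print(cluster_names(sample))
```

Keeping request construction separate from sending makes the logic testable without a running Cloudera Manager server.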
5. Examples of integration with CM API
   • Installation & Deployment
     - Chef
     - Puppet
     - Dell Crowbar: http://blog.cloudera.com/blog/2013/08/how-to-deploy-hadoop-clusters-automatically-with-dell-crowbar-and-cloudera-manager/
     - StackIQ: http://web.stackiq.com/blog/bid/312064/StackIQ-Cluster-Manager-now-integrated-with-Cloudera
     - WANdisco: non-stop NameNode (NN) setup
     - Several other customers/partners leveraging the APIs as part of their install & deployment process
   • Monitoring & Alerting
     - Oracle Enterprise Manager (via Big Data Appliance)
     - Nagios: https://github.com/cloudera/cm_api/tree/master/nagios and https://github.com/harisekhon/nagiosplugins/blob/master/check_hadoop_cloudera_manager_metrics.pl
     - SNMP alerts integration with IBM Netcool
   Develop & contribute your plug-ins using the Cloudera Manager API!
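At its core, a Nagios-style check built on the API maps Cloudera Manager health states onto Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is hypothetical and is not the code in the repositories listed above. It assumes that a services list response carries a healthSummary field with values such as GOOD, CONCERNING and BAD, and it reports any other value as UNKNOWN.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from Cloudera Manager health summaries to Nagios states;
# anything else (e.g. DISABLED) is reported as UNKNOWN.
HEALTH_TO_STATE = {"GOOD": OK, "CONCERNING": WARNING, "BAD": CRITICAL}

# Design choice: when aggregating, rank CRITICAL above WARNING above UNKNOWN.
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_services(response_body):
    """Return (exit_code, message) for a JSON list of services with healthSummary fields."""
    services = json.loads(response_body).get("items", [])
    worst = OK
    problems = []
    for service in services:
        summary = service.get("healthSummary")
        state = HEALTH_TO_STATE.get(summary, UNKNOWN)
        if state != OK:
            problems.append(f"{service.get('name')}={summary}")
        if SEVERITY[state] > SEVERITY[worst]:
            worst = state
    if worst == OK:
        message = f"OK - {len(services)} service(s) healthy"
    else:
        message = f"{LABELS[worst]} - " + ", ".join(problems)
    return worst, message


if __name__ == "__main__":
    sample = ('{"items": [{"name": "hdfs1", "healthSummary": "GOOD"},'
              ' {"name": "mapreduce1", "healthSummary": "CONCERNING"}]}')
    code, message = check_services(sample)
    print(message)
    # A real plugin would finish with sys.exit(code) so Nagios sees the state.
```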
6. Cloudera Manager – Monitoring via ‘tsquery’
   • Introduced as part of the CM4.5 release (Feb 2013)
   • A great way to add charts beyond those provided by default and to monitor the metrics that are relevant to your clusters
   • The tsquery language is used to write statements that retrieve time-series data from the Cloudera Manager time-series data store
   • Example: How do I compare disk IO across all the DataNodes that belong to a specific HDFS service?
       select bytes_read, bytes_written where roleType=DATANODE and serviceName=hdfs1
   • Retrieved time-series data can be plotted in various forms: line, bar, scatter, heat map, table list, etc.
   • This concept is being extended to user-defined triggers/alarms (new for C5!)
   • More details: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/ClouderaManager-Diagnostics-Guide/cm5dg_chart_time_series_data.html
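tsquery statements are plain strings, so you can also build and submit them programmatically. The sketch below composes the DataNode query from this slide and URL-encodes it for a time-series endpoint of the CM API. The /api/v4/timeseries path, its query/from/to parameters and the ISO 8601 timestamps are assumptions to check against the API documentation for your CM version. The helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def select_query(metrics, **filters):
    """Build a simple tsquery 'select ... where a=b and c=d' statement."""
    query = "select " + ", ".join(metrics)
    if filters:
        query += " where " + " and ".join(f"{key}={value}" for key, value in filters.items())
    return query


def timeseries_url(host, tsquery, start, end, port=7180, api_version="v4"):
    """URL-encode a tsquery for an (assumed) /timeseries API endpoint."""
    params = urlencode({"query": tsquery, "from": start, "to": end})
    return f"http://{host}:{port}/api/{api_version}/timeseries?{params}"


if __name__ == "__main__":
    q = select_query(["bytes_read", "bytes_written"],
                     roleType="DATANODE", serviceName="hdfs1")
    url = timeseries_url("cm-host.example.com", q,
                         "2013-12-01T00:00:00", "2013-12-02T00:00:00")
    print(q)
    print(url)
    # Decoding the URL recovers the original statement unchanged.
    print(parse_qs(urlparse(url).query)["query"][0] == q)
```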
7. Examples of Cloudera Manager ‘tsquery’
   Example 1: How do I track the aggregate cluster disk IO?
       select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) where category = CLUSTER and clusterId = $CLUSTERID
   Example 2: How do I compare CPU usage across hosts?
       select dt0(total_cpu_user) / getHostFact(numCores, 1) * 100,
              dt0(total_cpu_system) / getHostFact(numCores, 1) * 100,
              dt0(total_cpu_nice) / getHostFact(numCores, 1) * 100,
              dt0(total_cpu_iowait) / getHostFact(numCores, 1) * 100,
              dt0(total_cpu_irq) / getHostFact(numCores, 1) * 100,
              dt0(total_cpu_soft_irq) / getHostFact(numCores, 1) * 100
   Create & contribute your ‘tsqueries’! https://github.com/cloudera/cm_charting_scrapbook
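Example 2 applies the same per-core expression to six CPU metrics, and that kind of repetition is easy to get wrong by hand. The small generator below rebuilds both example queries from the slide exactly. The helper names are hypothetical, and $CLUSTERID stays a placeholder for the caller to substitute.

```python
# The six host CPU metrics used in Example 2.
CPU_METRICS = (
    "total_cpu_user",
    "total_cpu_system",
    "total_cpu_nice",
    "total_cpu_iowait",
    "total_cpu_irq",
    "total_cpu_soft_irq",
)


def per_core_percent(metric):
    """Wrap one metric in the per-core percentage expression from Example 2."""
    return f"dt0({metric}) / getHostFact(numCores, 1) * 100"


def cpu_comparison_query(metrics=CPU_METRICS):
    """Rebuild Example 2: one normalized expression per CPU metric."""
    return "select " + ", ".join(per_core_percent(m) for m in metrics)


def cluster_disk_io_query(cluster_id="$CLUSTERID"):
    """Rebuild Example 1; the default keeps the slide's $CLUSTERID placeholder."""
    return ("select dt0(read_bytes_disk_sum), dt0(write_bytes_disk_sum) "
            f"where category = CLUSTER and clusterId = {cluster_id}")


if __name__ == "__main__":
    print(cluster_disk_io_query())
    print(cpu_comparison_query())
```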
8. Cloudera Manager – Service Extensibility
   • Introduced in C5
   • Still in beta!
   • Some aspects (especially Parcel mgmt) available in CM4.x
   • Example: collaboration with Syncsort to deploy DMX-h libraries
   • Single management console for CDH, non-CDH services and ISV applications
   • Same look and feel as existing services
   • Easy to write (Java-free!)
   • Flexible
   • Independent release cycle
9. Analogy from the Operating Systems (OS) world
   ISV’s view of the OS, as layers on top of the core OS kernel:
   • Systems Management
   • Package Mgmt
   • Process/Resource Mgmt
   • Security Mgmt
   • Data Access Mgmt
   • Core OS kernel
10. Bringing ISV Apps to CDH
   ISV’s view of Hadoop, as layers on top of the core Hadoop/CDH kernel:
   • Cloudera Manager
   • Parcels
   • Resource Mgmt
   • Security Mgmt
   • CDK APIs
   • Core Hadoop/CDH kernel
11. Integrating into the Cloudera Product Portfolio
   • Package Mgmt: ability to easily package and distribute binaries/jars via “Parcels” (examples: Informatica, Syncsort)
   • Resource Mgmt: ability to deploy applications as stand-alone processes or via YARN* on the Hadoop grid; resource isolation of cluster resources (examples: SAS, 0xData, Accumulo)
   • Security Mgmt: support for Kerberos mgmt; role-based access control for tables/views in Hive/Impala via Sentry
   • Data Access Mgmt (ISVs): HDFS and HBase API abstraction and simplification
   • Systems Mgmt (Cloudera Manager):
     - Manage: deploy and upgrade (rolling) services and packages; manage configurations
     - Monitor: proactive health checks; track resource utilization; custom metrics charts
     - Diagnose: distributed log collection and searching; tag and track key events
     - Integrate: access operational tools via API; surface overall cluster metrics to ISV dashboard
     - Non-CDH apps…: Accumulo, Spark, Giraph, etc.
   * Support for YARN planned as part of CM5.x in FY14
12. So... how does it work?
   • A JSON file that describes your service
   • A set of control scripts
   • Packaged as a JAR file
   • As promised, Java-free
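To make the three bullets concrete, the sketch below packages a JSON service descriptor and a control script into a JAR using only the Python standard library. A JAR is an ordinary ZIP archive. The internal layout (descriptor/service.sdl and scripts/control.sh) and the function name are assumptions for illustration only, since the slides do not specify the file names inside the JAR.

```python
import io
import json
import zipfile


def build_extension_jar(descriptor, control_script):
    """Return the bytes of a JAR (a ZIP archive) holding a descriptor and a control script.

    The entry names below are illustrative assumptions, not a documented layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("descriptor/service.sdl", json.dumps(descriptor, indent=2))
        script = zipfile.ZipInfo("scripts/control.sh")
        script.external_attr = 0o755 << 16  # Unix permission bits: rwxr-xr-x
        script.compress_type = zipfile.ZIP_DEFLATED
        jar.writestr(script, control_script)
    return buffer.getvalue()


if __name__ == "__main__":
    descriptor = {"name": "spark", "roles": [{"name": "master"}]}
    script = "#!/bin/bash\necho \"called with $1\"\n"
    data = build_extension_jar(descriptor, script)
    with zipfile.ZipFile(io.BytesIO(data)) as jar:
        print(jar.namelist())
```

Setting the executable bit inside the archive means the control script can be run directly once the archive is extracted.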
13. Example: Cloudera Manager Extensions – Spark
14. Cloudera Manager Extensions
15. Cloudera Manager Extensions: Spark
16. Cloudera Manager Extensions: Spark
17. Cloudera Manager Extensions: Spark
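18. The Code
   Service descriptor (JSON):

       {
         "name" : "spark",
         "roles" : [{
           "name" : "master",
           "startRunner" : {
             "program" : "scripts/control.sh",
             "args" : [ "start_master", "./params.properties" ]
           },
           "parameters" : [{
             "name" : "master_port",
             "type" : "port",
             "default" : 7077
           }],
           "configWriter" : {
             "generators" : [{
               "filename" : "params.properties"
             }]
           }
         }]
       }

   Control script (scripts/control.sh):

       #!/bin/bash
       CMD=$1
       # The descriptor passes the generated properties file as the second argument.
       # Assumption: the file contains a line of the form "master_port=<value>".
       PARAMS_FILE=$2
       MASTER_PORT=$(grep '^master_port=' "$PARAMS_FILE" | cut -d= -f2)
       # Exported so the start script can read it (assumption about that script).
       export MASTER_PORT

       case $CMD in
         (start_master)
           exec "$SPARK_HOME/scripts/spark-start.sh" master
           ;;
         (*)
           echo "$(date) Don't understand [$CMD]" >&2
           exit 1
           ;;
       esac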
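In the descriptor on this slide, the configWriter block generates params.properties from the role's parameters, and the control script then reads the master port from that file. The sketch below reproduces both halves in Python so you can follow the data flow. The key=value line format and the helper names are assumptions; this is not Cloudera Manager's actual generator.

```python
def render_properties(parameters, overrides=None):
    """Write key=value lines for each parameter, preferring overrides over defaults.

    Hypothetical stand-in for the descriptor's configWriter generator.
    """
    overrides = overrides or {}
    lines = []
    for parameter in parameters:
        name = parameter["name"]
        value = overrides.get(name, parameter.get("default"))
        if value is not None:  # skip parameters with no value at all
            lines.append(f"{name}={value}")
    return "".join(line + "\n" for line in lines)


def read_property(text, key):
    """Return the value for key from key=value text, ignoring blanks and # comments."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


if __name__ == "__main__":
    params = [{"name": "master_port", "type": "port", "default": 7077}]
    text = render_properties(params)
    print(text, end="")
    print(read_property(text, "master_port"))
    print(read_property(render_properties(params, {"master_port": 7078}), "master_port"))
```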
19. Next Steps
   • Documentation & SDK as part of C5 Beta2 or later (definitely before GA!)
   • Working with select ISVs (SAS, Syncsort, 0xData, etc.) during the beta to further fine-tune this feature
   Develop & contribute your Cloudera Manager service extensibility plug-ins!
20. Vision of CM Extensibility
   • Vertical extension (service extensibility) on top of CDH and CM:
     - ISVs: 0xData, SAS, Syncsort, Informatica, Revolution
     - Non-CDH services: Accumulo, Spark, Giraph
   • Ops apps built on the CM API: Capacity Mgr, Security, SLA Mgr, Cost Optimizer
   • Horizontal extension via the CM API and SNMP: Oracle OEM, Nagios, Dell, Chef/Puppet
21. Q&A

©2013 Cloudera, Inc. All Rights Reserved.
