The document provides steps for setting up a Hadoop cluster using Cloudera Manager, including downloading and running the Cloudera Manager installer, logging into the Cloudera Manager Admin Console, using Cloudera Manager to automate the installation and configuration of CDH, specifying cluster node and repository information, installing software components on cluster nodes, reviewing installation logs, installing parcels, setting up the cluster and roles, configuring databases and clients, and completing the Cloudera cluster installation process.
This document discusses the APIs and extensibility features of Cloudera Manager. It provides an overview of the Cloudera Manager API introduced in version 4.0, which allows programmatic access to cluster operations and monitoring data. It also discusses how the API has been used by various customers and partners for tasks like installation/deployment, monitoring, and alerting integration. The document outlines Cloudera Manager's monitoring capabilities using the tsquery language and provides examples. Finally, it covers new service extensibility features introduced in Cloudera Manager 5.
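As a loose illustration of the API and tsquery monitoring features described above, the sketch below builds a request URL for Cloudera Manager's time-series REST endpoint. The hostname, API version, and metric name are illustrative assumptions, not values taken from the document; 7180 is Cloudera Manager's default admin port.

```python
import urllib.parse

# Sketch: building a request URL for Cloudera Manager's time-series
# REST endpoint. Host, API version, and the tsquery string are
# illustrative assumptions; 7180 is CM's default admin port.
def timeseries_url(host, tsquery, api_version="v6", port=7180):
    """Return the GET URL for the /timeseries endpoint with a tsquery."""
    return "http://%s:%d/api/%s/timeseries?query=%s" % (
        host, port, api_version, urllib.parse.quote(tsquery))

# tsquery uses a SQL-like syntax to select metrics:
query = "select cpu_user_rate where roleType = DATANODE"
url = timeseries_url("cm-host.example.com", query)
```

The returned JSON can then be fed into dashboards or alerting tools, which is the kind of integration the abstract describes.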
Hadoop cluster setup by using Cloudera Manager (Co-graph Inc.)
1. The document discusses setting up a Hadoop cluster using Cloudera Manager. It outlines the requirements for Cloudera Manager, including supported operating systems, browsers, databases, and Java versions.
2. The process of setting up the Hadoop cluster with Cloudera Manager is described. It involves installing the Cloudera Manager installer, logging into the admin console, specifying hosts, and configuring services.
3. Flume is introduced as a data collection tool that can run independently or on Hadoop clusters. Its important settings - sources, channels, and sinks - are defined along with example types for each.
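As a sketch of those three settings, a minimal single-agent Flume configuration might look like the following; the agent and component names (a1, r1, c1, k1) and the netcat/memory/logger types are conventional example choices, not taken from the document.

```properties
# Minimal single-agent Flume configuration (illustrative names)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for newline-delimited events on a TCP port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory

# Sink: write events to the agent's log (useful for testing)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```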
This document provides instructions for installing a single node Hadoop cluster on Ubuntu Linux. It describes downloading and configuring Hadoop, Java, and SSH. Configuration files like core-site.xml and hdfs-site.xml are edited. Directions are given for formatting HDFS, starting daemons like NameNode and DataNode, and starting/stopping the Hadoop cluster. The goal is to set up a single node Hadoop 2.2.0 installation for experimentation and testing.
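For reference, the single-node edits to core-site.xml and hdfs-site.xml usually amount to just two properties, sketched here; the port and replication factor shown are the common single-node conventions, not values quoted from the document.

```xml
<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node can only hold one replica of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```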
This document discusses using Ansible for automation across various IT use cases. It provides examples of how Ansible can be used for infrastructure orchestration, patch management, network automation, and managing various network devices and platforms including Cisco, Palo Alto, and Fortinet devices. It also provides examples of playbooks for tasks like provisioning servers, configuring firewall rules, and checking for configuration drift. Overall it promotes Ansible as a simple, agentless, and extensible automation tool that can automate technologies across IT operations, DevOps, security and more.
How to schedule jobs in a Cloudera cluster without Oozie (Tiago Simões)
This presentation is for everyone looking for an Oozie alternative for scheduling jobs in a secured Cloudera cluster. With it, you will be able to add and configure the Airflow service and manage it from within Cloudera Manager.
How to implement a GDPR solution in a Cloudera architecture (Tiago Simões)
Since the GDPR regulation came into force, data processors across the world have been struggling to become compliant while also dealing with a new reality of Big Data: data that is constantly drifting and mutating.
In this presentation, the approach will be:
Cloudera architecture
No additional financial cost
Masking & Encrypting
Accumulo includes a remarkable breadth of testing frameworks, which helps to ensure its correctness, performance, robustness, and protection of your vital data. This presentation takes you on a tour from Accumulo's basic unit testing up through performance and scalability testing exercised on running clusters. Learn the extent to which Accumulo is put through its paces before it is released, and get ideas for how you can similarly enhance testing of your own code.
Find this talk and others at http://www.slideshare.net/AccumuloSummit.
This document discusses PowerShell Desired State Configuration (DSC) and provides steps to set up DSC in different environments. It begins with an overview of DSC and its architecture. It then describes how to set up a native on-premises DSC push server with steps to configure the client and server. Additional sections explain how to set up a native on-premises DSC pull server and how to use the Azure Automation DSC extension to configure virtual machines in Azure.
The document discusses troubleshooting CloudStack. It covers troubleshooting for CloudStack developers and administrators. For developers, it discusses error codes, debugging tips, system virtual machine troubleshooting and port usage. For administrators, it discusses installation, configuration, log analysis, important parameters, best practices, reusing hypervisors and the CloudStack database. The document also provides references and information on getting involved in the CloudStack community.
This document provides an overview of installing Apache Hadoop and Spark from scratch. It discusses prerequisites like servers, operating systems, and Hadoop distributions. Key Hadoop components like YARN, HDFS, MapReduce and Ambari are introduced. Apache Spark is summarized as a fast, general-purpose cluster computing system. The installation process is walked through, including using Ambari to deploy Hadoop services across master and slave nodes. Additional steps like adding nodes, automation with Ansible, and zero-installation options are also covered.
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski (buildacloud)
The document provides troubleshooting strategies for CloudStack installations, including network issues, security groups, host connectivity, virtual routers, templates, and log analysis. It discusses common problems such as VLAN misconfigurations, security group rules not being applied, hosts showing in the "avoid set", template preparation errors, and exceptions in the logs. It emphasizes analyzing logs at the management server, hypervisor, and job levels to find the root cause of failures.
Presented at the Apache CloudStack Collaboration Conference 2014, Denver, CO.
A talk about the recent Virtual Router improvements in CloudStack 4.4, which unify and significantly speed up VR command execution, plus some further improvement ideas.
Enhancing OpenStack FWaaS for real world application (openstackindia)
This document discusses enhancing the performance and capabilities of OpenStack's firewall-as-a-service (FWaaS). It proposes improvements to FWaaS performance by validating firewall rules and distributing rules only to relevant routers. It also discusses scheduling firewall rules based on time and enabling logging of firewall packets to help with debugging, threat analysis, and rule tuning. The document outlines integrating firewall logging with OpenStack using IPTables rules and collecting logs in a centralized server for analysis. Finally, it proposes extending the Horizon UI to make firewall logs accessible to tenants.
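The logging integration described above ultimately comes down to inserting iptables LOG rules on the relevant routers. Here is a minimal sketch of building such a rule; the chain name, log prefix, and rate limit are illustrative assumptions, not the actual identifiers the FWaaS implementation uses.

```python
# Sketch: constructing an iptables LOG rule for firewall packet logging.
# Chain name, prefix, and limit are illustrative, not OpenStack's own.
def log_rule(chain, prefix, limit="10/min"):
    """Return the argv for inserting a rate-limited LOG rule at the
    top of the given chain."""
    return ["iptables", "-I", chain, "1",
            "-m", "limit", "--limit", limit,
            "-j", "LOG", "--log-prefix", prefix]

rule = log_rule("fwaas-example-chain", "FWAAS_DROP: ")
```

Logged packets land in the kernel log, from which a central collector can ship them to the analysis server the abstract mentions.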
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ... (Daniel Krook)
Presentation at the OpenStack Summit in Tokyo, Japan on October 29, 2015.
http://sched.co/49vI
This talk will cover the pros and cons of four different OpenStack deployment mechanisms. Puppet, Chef, Ansible, and Salt for OpenStack all claim to make it much easier to configure and maintain hundreds of OpenStack deployment resources. With the advent of large-scale, highly available OpenStack deployments spread across multiple global regions, the choice of which deployment methodology to use has become more and more relevant.
Beyond the initial day-one deployment, when it comes to the day-two-and-beyond questions of updating and upgrading existing OpenStack deployments, it becomes all the more important to choose the right tool.
Come join the Bluebox and IBM team to discuss the pros and cons of these approaches. We look at each of these four tools in depth, explore their design and function, and determine which scores higher than others to address your particular deployment needs.
Daniel Krook - Senior Software Engineer, Cloud and Open Source Technologies, IBM
Paul Czarkowski - Cloud Engineer at Blue Box, an IBM company
The document discusses Spark job failures and Spark/YARN architecture. It describes a Spark job failure due to a task failing 4 times with a NumberFormatException when parsing a string. It then explains that Spark jobs are divided into stages made up of tasks, and the entire job fails if a stage fails. The document also provides an overview of the Spark and YARN architectures, showing how Spark jobs are submitted to and run via the YARN resource manager.
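The retry semantics described above can be modeled in a few lines. This is a toy illustration, not actual Spark code: a task may be retried up to spark.task.maxFailures times (4 by default), and if every attempt fails, the stage, and with it the whole job, is aborted.

```python
# Toy model of Spark's task-retry semantics (not real Spark code):
# retry a task up to max_failures times; if all attempts fail,
# abort the "stage" by raising, mirroring how a stage failure
# fails the whole Spark job.
def run_with_retries(task, max_failures=4):
    error = None
    for attempt in range(max_failures):
        try:
            return task()
        except ValueError as exc:  # stands in for NumberFormatException
            error = exc
    raise RuntimeError(
        "Task failed %d times (%s); aborting stage" % (max_failures, error))
```

In real Spark the equivalent knob is the spark.task.maxFailures configuration property.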
What's new in CloudStack 4.11 - behind the headlines (ShapeBlue)
ShapeBlue is a company that specializes in deploying the Apache CloudStack cloud infrastructure software. The document discusses ShapeBlue and its VP of Technology, Paul Angus. It provides details on Paul's experience and areas of expertise, which include being a global authority on CloudStack and cloud infrastructure design. It also lists some of ShapeBlue's customers, which include large companies like Autodesk, SAP, and British Telecom.
This document provides an overview of troubleshooting CloudStack components including general troubleshooting techniques, secondary storage VMs, console proxy VMs, and virtual routers. It discusses examining log files, enabling debug logging, and using tools like MySQL Workbench. Examples are given for troubleshooting issues like insufficient capacity and calculating primary storage allocation. System VMs each have specific log files and services to check. The presentation aims to help support engineers effectively troubleshoot CloudStack environments.
This document discusses using Chef cookbooks to deploy OpenStack. It provides an overview of Chef principles and how they enable infrastructure as code. It then demonstrates how to use roles and run lists to install and configure OpenStack components like Nova on single-machine and multi-node environments. Finally, it outlines ongoing work to enhance OpenStack support and integration using Chef.
Guide - Migrating from Heroku to AWS using CloudFormation (Rob Linton)
Step by step guide to migrating from Heroku to Amazon AWS using AWS CloudFormation.
Presented at the Australian AWS User Group in Melbourne at the October Meetup.
Red Hat Openstack and Ceph Meetup, Pune | 28th NOV 2015
Sadique Puthen, Principal Technical Support Engineer at Red Hat, Inc., gave an introduction to Red Hat Openstack (RDO) and its components. He discussed how Openstack provides infrastructure services like compute (Nova), storage (Cinder, Swift), networking (Neutron), and database (Trove) as a service. He also covered Openstack deployment options like Packstack, TripleO, and Ironic for bare metal provisioning. The meetup aimed to introduce Openstack components and services and their role in providing infrastructure as a service through a cloud platform.
Chef for OpenStack: OpenStack Spring Summit 2013 (Matt Ray)
This document provides an overview of using Chef to deploy and manage OpenStack. It discusses why Chef is useful for infrastructure as code and its declarative interface. The document outlines the current state of the Chef for OpenStack project, including contributors, available cookbooks, and roadmap. It promotes the project as a way to collaboratively deploy OpenStack in a standardized, automated way and reduce fragmentation.
1. The document discusses how OpenStack can be used to build private and hybrid clouds for enterprises using open source technology free from vendor lock-in.
2. It provides examples of how OpenStack can enable continuous software delivery, cloud-enable applications, and provide IT as a service while reducing reliance on proprietary virtualization.
3. Asdtech offers turnkey OpenStack services including consultancy, cloud setup, custom development, migration, support and training to help enterprises orchestrate their existing infrastructure or build new clouds.
A brief introduction to YARN: how and why it came into existence and how it fits together with this thing called Hadoop.
Focus given to architecture, availability, resource management and scheduling, migration from MR1 to MR2, job history and logging, interfaces, and applications.
Compute node HA - current upstream development (Adam Spiers)
Short presentation made for OpenStack London "Tokyo Aftermath" meetup, on current upstream activity in the OpenStack HA developers community around high availability for compute nodes.
OpenStack Deployment with Chef Workshop at the 2013 Hong Kong OpenStack Summit. Co-presented with Justin Shepherd, a Private Cloud Architect from Rackspace.
Webinar: Productionizing Hadoop: Lessons Learned - 20101208 (Cloudera, Inc.)
Key insights into installing, configuring, and running Hadoop and Cloudera's Distribution for Hadoop in production. These are lessons learned from Cloudera helping organizations move to a production state with Hadoop.
Hadoop is quickly becoming the standard for data management for enterprises. But Enterprise buyers have more demanding requirements for their systems beyond what the early adopters needed. Join us for our 2-part webinar series and learn about our new advancements within Cloudera Enterprise, the Platform for Big Data, with new capabilities that extend our leadership in delivering what organizations require.
Cloudera set the industry standard with Cloudera Manager, the first end-to-end management application for Apache Hadoop. Now, it is extending that lead with the release of Cloudera Manager 4.5, which delivers expanded capabilities designed to simplify the management and adoption of Hadoop.
This presentation will show you how Cloudera Manager 4.5 allows you to:
- perform rolling platform upgrades
- consistently meet or exceed SLAs and RTOs through simplified management and process automation
- easily correlate and visualize metrics through intuitive and interactive charts
- manage heterogeneous clusters
- better integrate with existing enterprise IT management tools via SNMP
…and much more
The document discusses installing Cloudera Hadoop (CDH 4) on Ubuntu 12.04 LTS. It provides an overview of Hadoop and its components. It then outlines the installation steps for Cloudera Hadoop which include preparing the system by installing prerequisites like OpenSSH, configuring password-less SSH and sudo, editing the host file, installing MySQL and the JDBC connector, and downloading and running the Cloudera Manager installer.
Henry Robinson works at Cloudera on distributed data collection tools like Flume and ZooKeeper. Cloudera provides support for Hadoop and open source projects like Flume. Flume is a scalable and configurable system for collecting large amounts of log and event data into Hadoop from diverse sources. It allows defining flexible data flows that can reliably move data between collection agents and storage systems.
Livy is an open source REST service for interacting with and managing Spark contexts and jobs. It allows clients to submit Spark jobs via REST, monitor their status, and retrieve results. Livy manages long-running Spark contexts in a cluster and supports running multiple independent contexts simultaneously from different clients. It provides client APIs in Java, Scala, and soon Python to interface with the Livy REST endpoints for submitting, monitoring, and retrieving results of Spark jobs.
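As a loose sketch of the REST submission flow described above, the helper below builds the JSON body for Livy's batch-submission endpoint (POST /batches); the jar path, class name, and arguments are hypothetical examples.

```python
import json

# Sketch: the JSON body Livy's POST /batches endpoint accepts.
# The jar path, class name, and args are hypothetical examples.
def batch_payload(file, class_name=None, args=()):
    body = {"file": file, "args": list(args)}
    if class_name:
        body["className"] = class_name
    return json.dumps(body)

payload = batch_payload("hdfs:///jobs/wordcount.jar",
                        class_name="com.example.WordCount",
                        args=["/input", "/output"])
```

A client would POST this body to the Livy server and then poll the returned batch id to monitor status and retrieve results, matching the submit/monitor/retrieve cycle the abstract describes.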
The document discusses Cloudera Manager's APIs and extensibility features. It describes how the CM API introduced in version 4 allows programmatic access to cluster operations and monitoring data. It provides examples of how the API has been used to integrate CM with installation/deployment tools and for monitoring and alerting. The document also discusses CM's support for custom metrics charts using tsquery and how service extensibility introduced in version 5 allows for non-CDH services and ISV applications to be managed through CM.
Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility (ClouderaUserGroups)
This document provides an overview of Cloudera Manager APIs and extensibility. It discusses how the Cloudera Manager API, introduced in version 4.0, allows programmatic access to cluster operations and monitoring information. It provides examples of integration with the API for installation/deployment and monitoring/alerting. It also covers the tsquery language for custom metrics and monitoring, and new capabilities in Cloudera Manager 5 for user-defined triggers/alarms and service extensibility.
How Big Data Can Enable Analytics from the Cloud (Technical Workshop) - Cloudera, Inc.
In this workshop, we will look outside the box and help expand the problem space to include issues you may not have thought were possible before Big Data. From Near Real Time (NRT) recommendation engines, loan applications to churn detection, Big Data is answering new questions and providing organisations with a competitive edge through revenue increase, cost savings and risk mitigation. We will take a special look at the role the Cloud can play in elevating your analytics environment. We will discuss real world examples of how Big Data answers these questions and does it at a lower cost outlay.
Cloudera GoDataFest: Deploying Cloudera in the Cloud (GoDataDriven)
This document discusses deploying Cloudera in the cloud using Cloudera Director and Cloudera Altus. Cloudera Director is a tool for managing the lifecycle of long-running Cloudera clusters in cloud environments, while Cloudera Altus is a platform-as-a-service for transient data engineering workloads like ETL and machine learning. The document provides an example of using Cloudera Altus for data processing and Cloudera Director for interactive querying, and demonstrates Altus and Director in a scenario of a data analyst using them to analyze website sales data.
This presentation provides an overview of the BlueData integration with Cloudera Manager. With this integration, customers of our BlueData EPIC software platform can leverage the power of Cloudera Manager for end-to-end Hadoop systems management and administration.
When the BlueData EPIC platform provisions a virtual CDH cluster, Cloudera Manager can be provisioned as well – so you can easily deploy, manage, monitor and perform diagnostics on your Hadoop cluster. Our customers can take advantage of the Cloudera Manager GUI to monitor their cluster, troubleshoot issues, and administer their Hadoop deployment.
Learn more about BlueData at http://www.bluedata.com
Cloudera training: secure your Cloudera cluster (Cloudera, Inc.)
The first and possibly most important task you perform when you deploy your Cloudera cluster is securing it. Get it wrong and you may inadvertently and unknowingly have introduced a risk to the business. Getting it right eventually leaves you looking back at wasted efforts and false starts. So how do you get it right first time?
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud (Cloudera, Inc.)
Cloud environments are increasingly becoming a popular deployment option for Hadoop. Enterprises can take advantage of the added flexibility and elasticity of the cloud for long-running clusters, temporary deployments, and spiky workloads. However, as more and more users choose cloud environments for critical Hadoop workloads, they are often forced to compromise on key aspects of their data platform.
Cloudera Director enables the full fidelity of the Enterprise Data Hub in the cloud, without compromises. Announced with the recent 5.2 release, Cloudera Director is the simple, reliable way to deploy and scale Hadoop in the cloud, while maintaining an open and neutral platform with enterprise-grade capabilities.
During this webinar, Tushar Shanbhag, Director of Product Management, will look at why Hadoop cloud environments are becoming so popular and some of the challenges around Hadoop in the cloud. He will then provide an in-depth overview of Cloudera Director, its key features, and how it alleviates these common challenges. Finally, he will discuss some key use cases and provide insight into what’s next for Cloudera and Hadoop in the cloud.
Cloudera Altus: Big Data in the Cloud Made Easy (Cloudera, Inc.)
Recent studies show that data scientists and analysts spend up to 80% of their time cleaning and preparing data.
This already time-consuming task can become even harder in the cloud, where cluster management and operations add further complexity.
Users therefore want these complex workflows to be unified and simplified.
To drive big data and machine learning initiatives, companies need a scalable platform that is available everywhere. It must enable self-service and eliminate data silos.
Cloudera Navigator provides integrated data governance and security for Hadoop. It includes features for metadata management, auditing, data lineage, encryption, and policy-based data governance. KeyTrustee is Cloudera's key management server that integrates with hardware security modules to securely manage encryption keys. Together, Navigator and KeyTrustee allow users to classify data, audit usage, and encrypt data at rest and in transit to meet security and compliance needs.
Data platform modernization with Databricks.pptx (CalvinSim10)
The document discusses modernizing a healthcare organization's data platform from version 1.0 to 2.0 using Azure Databricks. Version 1.0 used Azure HDInsight (HDI) which was challenging to scale and maintain. It presented performance issues and lacked integrations. Version 2.0 with Databricks will provide improved scalability, cost optimization, governance, and ease of use through features like Delta Lake, Unity Catalog, and collaborative notebooks. This will help address challenges faced by consumers, data engineers, and the client.
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead (DataWorks Summit)
Big data platforms are being asked to support an ever increasing range of workloads and compute environments, including large-scale machine learning and public and private clouds. In this talk, we will discuss some emerging capabilities around cloud-native machine learning and data engineering, including running machine learning and Spark workloads directly on Kubernetes, and share our vision of the road ahead for ML and AI in the cloud.
Introducing Cloudera Director at Big Data Bash (Andrei Savu)
My slide deck for Big Data Bash. This is a quick introduction on Cloudera Director and it ends with a list of open questions around some interesting future problems we are planning to work on.
The document discusses running Hadoop on the cloud using Cloudera Director. It begins with an introduction of the speaker and Cloudera Director. Several common architectural patterns for running Hadoop in the cloud are presented, including using object storage and running short-term ETL/modeling clusters versus long-term analytics clusters. The presentation envisions a future with a more portable, self-service, self-healing, and granularly secure experience for managing Hadoop in the cloud.
One Hadoop, Multiple Clouds - NYC Big Data Meetup (Andrei Savu)
The slide deck I presented at NYC Big Data Meetup just before Strata + Hadoop World 2015. It goes into details on what's different about running Hadoop in the cloud, main use case and some lessons learned from working with customers.
The document discusses Cloudera Director, a tool for deploying and managing Hadoop clusters across multiple public clouds. It begins with an introduction of the speaker and outlines some common architectures for running Hadoop in the cloud. Key points covered include cluster lifecycle management, elasticity, high availability, backup/disaster recovery, and a vision for the future of portable, self-service Hadoop experiences across clouds.
Leveraging the cloud for analytics and machine learning 1.29.19 (Cloudera, Inc.)
Learn how organizations are deriving unique customer insights, improving product and services efficiency, and reducing business risk with a modern big data architecture powered by Cloudera on Azure. In this webinar, you see how fast and easy it is to deploy a modern data management platform—in your cloud, on your terms.
This document provides an overview of Apache Hadoop security, both historically and what is currently available and planned for the future. It discusses how Hadoop security is different due to benefits like combining previously siloed data and tools. The four areas of enterprise security - perimeter, access, visibility, and data protection - are reviewed. Specific security capabilities like Kerberos authentication, Apache Sentry role-based access control, Cloudera Navigator auditing and encryption, and HDFS encryption are summarized. Planned future enhancements are also mentioned like attribute-based access controls and improved encryption capabilities.
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl... (Cloudera, Inc.)
Across all industries, organizations are embracing the promise of Apache Hadoop to store and analyze data of all types, at larger volumes than ever before possible. But to tap into the true value of this data, organizations need to manage this data and its subsequent metadata to understand its context, see how it’s changing, and take actions on it.
Cloudera Navigator is the only integrated data management and governance for Hadoop and is designed to do exactly this. With Cloudera 5.7, we have further expanded the capabilities in Cloudera Navigator to make it even easier to understand your data and maintain metadata consistency as it moves through Hadoop.
Apache Accumulo is a distributed key-value store developed by the National Security Agency. It is based on Google's BigTable and stores data in tables containing sorted key-value pairs. Accumulo uses a master/tablet server architecture and stores data in HDFS files. Data can be queried using scanners or loaded using MapReduce. Accumulo works well with the Hadoop ecosystem and its installation is simplified using complete Hadoop distributions like Cloudera.
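The sorted key-value model the summary describes can be sketched in a few lines. The toy Python snippet below mimics how an Accumulo scanner reads a sorted range of (row, column family, column qualifier) keys; it illustrates the data model only, is not the real Accumulo client API, and all names are invented.

```python
import bisect

# Toy model of Accumulo's sorted key-value storage: keys are
# (row, column_family, column_qualifier) tuples kept in sorted order,
# so a scanner can read any row range sequentially.
table = sorted([
    (("user001", "name", "first"), "Alice"),
    (("user001", "name", "last"),  "Smith"),
    (("user002", "name", "first"), "Bob"),
    (("user003", "name", "first"), "Carol"),
])

def scan(table, start_row, end_row):
    """Yield entries whose row falls in [start_row, end_row), like a Scanner."""
    keys = [k for k, _ in table]
    lo = bisect.bisect_left(keys, (start_row, "", ""))
    for key, value in table[lo:]:
        if key[0] >= end_row:
            break
        yield key, value

for key, value in scan(table, "user001", "user003"):
    print(key, value)
```

In real Accumulo the keys also carry a visibility label and timestamp, and the sorted data lives in RFiles on HDFS served by tablet servers, but the range-scan idea is the same.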
Similar to Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Driving Business Innovation: Latest Generative AI Advancements & Success Story (Safe Software)
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Essentials of Automations: Exploring Attributes & Automation Parameters (Safe Software)
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk (Fwdays)
In this talk we will discuss DDoS protection tools and best practices, review network architectures, and look at what AWS has to offer. We will also examine one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022, and see what techniques helped keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/how-axelera-ai-uses-digital-compute-in-memory-to-deliver-fast-and-energy-efficient-computer-vision-a-presentation-from-axelera-ai/
Bram Verhoef, Head of Machine Learning at Axelera AI, presents the “How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-efficient Computer Vision” tutorial at the May 2024 Embedded Vision Summit.
As artificial intelligence inference transitions from cloud environments to edge locations, computer vision applications achieve heightened responsiveness, reliability and privacy. This migration, however, introduces the challenge of operating within the stringent confines of resource constraints typical at the edge, including small form factors, low energy budgets and diminished memory and computational capacities. Axelera AI addresses these challenges through an innovative approach of performing digital computations within memory itself. This technique facilitates the realization of high-performance, energy-efficient and cost-effective computer vision capabilities at the thin and thick edge, extending the frontier of what is achievable with current technologies.
In this presentation, Verhoef unveils his company’s pioneering chip technology and demonstrates its capacity to deliver exceptional frames-per-second performance across a range of standard computer vision networks typical of applications in security, surveillance and the industrial sector. This shows that advanced computer vision can be accessible and efficient, even at the very edge of our technological ecosystem.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe (Precisely)
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
Skybuffer SAM4U tool for SAP license adoption (Tatiana Kojar)
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state of the art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resize. On a commodity server with a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
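To make the closed- vs open-addressing distinction concrete, here is a toy chained (closed-addressing) table in Python. It is a teaching sketch under simplified assumptions, not DLHT itself: it shows why chaining lets a delete free its slot immediately, but has none of DLHT's lock-freedom, prefetching, or parallel resizing.

```python
# Toy closed-addressing (chained) hashtable: each bucket holds a chain
# of entries, so a delete removes its entry at once, with no tombstones
# as in open addressing. Illustrative sketch only, not DLHT.
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # update in place
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]              # slot is reusable immediately
                return True
        return False

t = ChainedHashTable()
t.put("a", 1); t.put("b", 2)
t.delete("a")
print(t.get("a"), t.get("b"))  # -> None 2
```

DLHT's contribution is making this style of design fast: bounding each chain to a cache line, prefetching it, and resizing without blocking, which naive chaining like the above does not attempt.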
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host