Self-Service Provisioning and Hadoop Management with Apache Ambari

Anant Chintamaneni
June 9th, 2015
1:45pm to 2:25pm

This session is on self-service Hadoop for:
• ON-PREMISES
• IN YOUR DATA CENTER
• USING YOUR INFRASTRUCTURE
NOT
• PUBLIC CLOUD (e.g. AMAZON EMR, AZURE, etc.)

About me
• VP of Products at BlueData
• @AnantCman on Twitter
• Former Head of Hadoop Products at Pivotal
• Championed Ambari at Pivotal
• Introduced Hadoop at Merced Systems (now NICE Systems)
Personal
• Soccer dad 
• Sports fan – go Niners!

Talk Track
• Self-service Hadoop – what is it, why now?
• Key building blocks for self-service Hadoop
• Why Apache Ambari
• Delivering self-service with Ambari
• Demo
• Q&A

Self-service is the need of the hour for Hadoop
“… while Hadoop can handle huge data sets and make them useable, the capabilities needed to set up and run Hadoop remain scarce and expensive …”
Self-service models are proven to simplify and drive usage

Self-service Hadoop defined
Make it work the way users want to work today…
Self-service analytics: from idea to insights in minutes
• Analytics/visualization idea!
• I can access my desktop analysis / BI tool of choice
• Point at data (files, NFS, RDBMS) and analyze

Self-service Hadoop defined
Make it work the way users want to work today…
Self-service Hadoop: from idea to infrastructure to insights in minutes
• Big Data analytics/visualization idea!
• I can provision my own Hadoop ‘cluster’ so I have Hive, Pig, BI tool, etc.
• Point at data (NFS, RDBMS) and analyze, extract insights

Self-service Hadoop examples
• Ad-hoc data exploration → can I blend this data with that data?
• Fail-fast experimentation → you don’t know what you don’t know
• Test multiple predictive analytics models → get a dedicated sandbox
• Bursty workload → your boss needs you to do an analytics drill

Without self-service Hadoop
It may not work the way your users want to work today…
From idea to infrastructure to insights in weeks
• Big Data analytics/visualization idea!
• Provision cluster, then check: Hadoop cluster ready? (YES / NO)
• Copy data to cluster, then check: Is my data there? (YES / NO)
• Code/query review (YES / NO)
• Run Hadoop analytics jobs
• On NO: Wait! Meet … wait … email … why isn’t my cluster ready?
• Lost business opportunity, insights no longer relevant

Key building blocks for self-service Hadoop
• End user experience (easy access): agility, elasticity and easy access
• Enterprise IT (tech support): operational support and oversight

Why Apache Ambari
• RESTful APIs to automate provisioning of Apache Hadoop clusters
– Capture basic cluster parameters from user and leverage Ambari APIs
• Granular control on deployment of services (e.g. Hive, Pig)
– Only deploy ‘compute’ services (e.g. Hive, BI tool) requested by user
– Speeds up availability of cluster by eliminating overhead
• Enterprise-grade security, management and monitoring capabilities
– IT admins can support user-created clusters with familiar mgmt console
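A minimal dry-run sketch of "only deploy what the user requested", assuming a placeholder server URL and hypothetical helper names (`add_service_cmd`, `plan_services`). Only the `/api/v1/clusters/<cluster>/services` path, the `ServiceInfo` body and the `X-Requested-By` header are taken from the REST example later in this deck; authentication is left out. It prints the calls instead of sending them.

```shell
#!/bin/sh
# Placeholder server URL; override with your own Ambari host.
AMBARI_URL="${AMBARI_URL:-http://ambari.example.com:8080}"

# Print (dry run) the call that registers one service with a cluster.
# $1 = cluster name, $2 = service name (e.g. HIVE, PIG)
add_service_cmd() {
  printf "curl -H 'X-Requested-By: ambari' -X POST -d '{\"ServiceInfo\":{\"service_name\":\"%s\"}}' %s/api/v1/clusters/%s/services\n" \
    "$2" "$AMBARI_URL" "$1"
}

# Emit one call per service the user asked for, and nothing else.
# $1 = cluster name, remaining arguments = requested services
plan_services() {
  cluster="$1"
  shift
  for svc in "$@"; do
    add_service_cmd "$cluster" "$svc"
  done
}

# Example: a user asked for Hive and Pig on a cluster named demo_cluster.
plan_services demo_cluster HIVE PIG
```

Services that were not requested never get a call, which is where the reduced overhead would come from.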

Delivering self-service with Ambari
Your physical servers + VIRTUALIZED INFRASTRUCTURE = SELF-SERVICE HDP CLUSTERS
VIRTUALIZED INFRASTRUCTURE
• Big Data VMs/Containers
• Self-service web UI
• Tenant/User Management
• DataTap (HDFS abstraction)
SELF-SERVICE HDP CLUSTERS
• HDP Virtual Hadoop clusters
• Ambari management console
• ‘Compute’ services (e.g. Hive)

Self-service web interface – define cluster with a few mouse clicks
* Example screenshot from BlueData integration with Apache Ambari

Creating virtual Hadoop clusters within minutes
* Example screenshot from BlueData integration with Apache Ambari

Hadoop cluster provisioning using Ambari API
Phase 1: VMs
• Self-service request
• VMs provisioned
• Ambari server & agents pre-deployed
• HDFS dependency removed
Phase 2: Core Stack
• Agent registration with server
• REST API call to deploy HDP stack
• REST API call to create core-site.xml to use BlueData HDFS abstraction
• Start YARN/MRv2
• Shut down HDFS service
Phase 3: Services
• Add specific services requested by end user via REST API calls
• Start ‘compute’ services (e.g. Hive, Pig) requested by user
• Update status of cluster
Design optimized for cluster creation speed and user feedback
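A hedged sketch of the Phase 2 start/stop steps. In the Ambari REST API a service is started by setting its desired state to STARTED and stopped by setting it back to INSTALLED. The helper names and the `CLUSTER` placeholder are hypothetical, and the service names YARN, MAPREDUCE2 and HDFS are assumed to match the HDP stack definition in use. The script only prints the request bodies; nothing is sent.

```shell
#!/bin/sh
# Map an action to the desired state Ambari expects for a service.
desired_state() {
  case "$1" in
    start) echo STARTED ;;
    stop)  echo INSTALLED ;;   # a stopped service is installed but not running
    *)     echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Build the JSON body for a state change on a service.
# $1 = action (start | stop)
state_payload() {
  state="$(desired_state "$1")" || return 1
  printf '{"ServiceInfo":{"state":"%s"}}\n' "$state"
}

# Print one planned call: $1 = action, $2 = service name.
plan_step() {
  printf 'PUT /api/v1/clusters/CLUSTER/services/%s %s\n' "$2" "$(state_payload "$1")"
}

# Dry run of the Phase 2 steps named on the slide.
plan_step start YARN
plan_step start MAPREDUCE2
plan_step stop HDFS
```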

REST API example to deploy specific service (Pig)
# Service: register the PIG service and its component with the cluster
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG/components/PIG
# Configs: upload each config type, then select it as the desired config
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-env.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-properties.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-log4j.json http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-env","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
# The slide cuts off the start of the next two commands; the leading part is assumed to match the pig-env call above
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-properties","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-log4j","tag":"bluedata"}}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
# Install: map the component to the host, install the service, then check its state
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"host_components":[{"HostRoles":{"component_name":"PIG"}}]}' 'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/hosts?Hosts/host_name=bluedata-71.openstacklocal'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
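The contents of pig-env.json, pig-properties.json and pig-log4j.json are not shown in the deck. As a rough sketch, a body posted to the configurations endpoint generally carries a config type, a tag and a properties map, and the later desired_configs call selects the same type/tag pair. The helper names below are hypothetical, and the property key and value are placeholders, not real Pig settings.

```shell
#!/bin/sh
# Body for POST .../configurations: one config type, one tag, one property.
# $1 = type, $2 = tag, $3 = property name, $4 = property value
# No JSON escaping is done, so values must not contain quotes or backslashes.
config_payload() {
  printf '{"type":"%s","tag":"%s","properties":{"%s":"%s"}}\n' "$1" "$2" "$3" "$4"
}

# Body for PUT .../clusters/<cluster> that selects that type/tag pair.
# $1 = type, $2 = tag
desired_config_payload() {
  printf '{"Clusters":{"desired_configs":{"type":"%s","tag":"%s"}}}\n' "$1" "$2"
}

# Placeholder property; a real pig-env.json would carry actual settings.
config_payload pig-env bluedata example.key example-value
desired_config_payload pig-env bluedata
```

The tag in the second body has to match the tag used when the config was uploaded; in the deck's example both use "bluedata".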

Design choices and considerations
• Used Apache Ambari v1.7 for this example
• BlueData mgmt services orchestrate Ambari REST API calls
• Ambari Blueprints used to bring up HDFS only
– Post cluster creation, services added using individual REST APIs for better control
– Blueprints/Stack Advisor do not provide REST API to track intermediate progress
• Used individual REST API calls with static configuration files
– Could not leverage Stack Advisor for individual services
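One way intermediate progress could be tracked with individual calls, as a sketch only: a state-changing call in Ambari returns a request resource, and that resource can be polled for its `request_status`. The helper names, the `AMBARI_USER`/`AMBARI_PASS` variables and the exact set of terminal statuses handled here are assumptions; the sample response body is hand-written for the offline check, not captured output.

```shell
#!/bin/sh
# Pull Requests/request_status out of a request resource read from stdin.
request_status() {
  sed -n 's/.*"request_status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Classify a status: 0 = finished OK, 1 = finished with a failure, 2 = still running.
classify_status() {
  case "$1" in
    COMPLETED)               return 0 ;;
    FAILED|ABORTED|TIMEDOUT) return 1 ;;
    *)                       return 2 ;;
  esac
}

# Poll one request until it finishes. Needs a reachable Ambari server; not called here.
# $1 = full URL of the request resource returned by the earlier call
wait_for_request() {
  while :; do
    status="$(curl -s -u "$AMBARI_USER:$AMBARI_PASS" -H 'X-Requested-By: ambari' "$1" | request_status)"
    echo "request status: ${status:-unknown}"
    rc=0
    classify_status "$status" || rc=$?
    if [ "$rc" -ne 2 ]; then
      return "$rc"
    fi
    sleep 5
  done
}

# Offline check with a hand-written, trimmed response body (illustrative only).
printf '{ "Requests" : { "id" : 1, "request_status" : "IN_PROGRESS" } }\n' | request_status
```

Each status line printed by the loop could be relayed to the self-service UI as user feedback while the cluster is coming up.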

Self-Service with Ambari:
Live Demo

Q&A
Contact me directly at …
Email: anant@bluedata.com
Twitter: @AnantCman

BlueData + Apache Ambari 1.7 Integration
Benefit: Infrastructure agility, elasticity, and efficiency – virtual HDP clusters with the functionality and performance of physical clusters
Features:
• Auto-provisioning of VM hosts with Ambari server and agent components
• Automated, transparent deployment of HDP using REST API for Stacks and Services
Benefit: Time savings for data scientists and Big Data administrators
Features:
• Self-service virtual cluster creation by data scientists or business analysts
• Troubleshooting and management by Big Data admins using Apache Ambari
Benefit: Administrator productivity & flexibility
Features:
• Apache Ambari for monitoring, fine-grained configuration, and enterprise support
