Self-Service Provisioning and
Hadoop Management with Apache Ambari
Anant Chintamaneni
June 9th, 2015
1:45pm to 2:25pm
This session is on self-service Hadoop for:
• ON-PREMISES
• IN YOUR DATA CENTER
• USING YOUR INFRASTRUCTURE
NOT
• PUBLIC CLOUD (e.g. AMAZON EMR, AZURE, etc.)
About me
• VP of Products at BlueData
• @AnantCman on Twitter
• Former Head of Hadoop Products at Pivotal
• Championed Ambari at Pivotal
• Introduced Hadoop at Merced Systems (now NICE Systems)
Personal
• Soccer dad 
• Sports fan – go Niners!
Talk Track
• Self-service Hadoop – what is it, why now?
• Key building blocks for self-service Hadoop
• Why Apache Ambari
• Delivering self-service with Ambari
• Demo
• Q&A
Self-service is the need of the hour for Hadoop
“… while Hadoop can handle huge data sets and make them useable, the capabilities needed to set up and run Hadoop remain scarce and expensive …”
Self-service models are proven to simplify and drive usage
Self-service Hadoop defined
Make it work the way users want to work today…
Self-service analytics: from idea to insights in minutes
[Diagram: Analytics/visualization idea! → “I can access my desktop analysis / BI tool of choice” → Point at data (Files, NFS, RDBMS) and analyze]
Self-service Hadoop defined
Make it work the way users want to work today…
Self-service Hadoop: from idea to infrastructure to insights in minutes
[Diagram: Big Data Analytics/visualization idea! → “I can provision my own Hadoop ‘cluster’ so I have Hive, Pig, BI tool, etc.” → Point at data (NFS, RDBMS) and analyze, extract insights]
Self-service Hadoop examples
• Ad-hoc data exploration → can I blend this data with that data?
• Fail-fast experimentation → you don’t know what you don’t know
• Test multiple predictive analytics models → get a dedicated sandbox
• Bursty workload → your boss needs you to do an analytics drill
Without self-service Hadoop
It may not work the way your users want to work today…
From idea to infrastructure to insights in weeks
[Flowchart labels. Start: Big Data Analytics/visualization idea! Steps: Provision cluster; Copy data to cluster; Run Hadoop analytics jobs. Checks, each with YES/NO branches: Hadoop cluster ready?; Is my data there?; Code/query review. Other labels: “Wait!”; “Meet … wait … email … why isn’t my cluster ready?”. Outcome: lost business opportunity, insights no longer relevant.]
Key building blocks for self-service Hadoop
• End user experience: agility, elasticity and easy access
• Enterprise IT: operational support and oversight
[Image labels: Easy Access; Tech Support]
Why Apache Ambari
• RESTful APIs to automate provisioning of Apache Hadoop clusters
– Capture basic cluster parameters from user and leverage Ambari APIs
• Granular control on deployment of services (e.g. Hive, Pig)
– Only deploy ‘compute’ services (e.g. Hive, BI tool) requested by user
– Speeds up availability of cluster by eliminating overhead
• Enterprise-grade security, management and monitoring capabilities
– IT admins can support user-created clusters with familiar mgmt console
Delivering self-service with Ambari
Your physical servers
+ =
VIRTUALIZED INFRASTRUCTURE
• Big Data VMs/Containers
• Self-service web UI
• Tenant/User Management
• DataTap (HDFS abstraction)
SELF-SERVICE HDP CLUSTERS
• HDP Virtual Hadoop clusters
• Ambari management console
• ‘Compute’ services (e.g. Hive)
+ =
Delivering self-service with Ambari
Self-service web interface – define cluster with a few mouse clicks
* Example screenshot from BlueData integration with Apache Ambari
Delivering self-service with Ambari
Creating virtual Hadoop clusters within minutes
* Example screenshot from BlueData integration with Apache Ambari
Delivering self-service with Ambari
Hadoop cluster provisioning using Ambari API
Phase 1: VMs
• Self-service request
• VMs provisioned
• Ambari server & agents pre-deployed
• HDFS dependency removed
Phase 2: Core Stack
• Agent registration with server
• REST API call to deploy HDP stack
• REST API call to create core-site.xml to use BlueData HDFS abstraction
• Start YARN/MRv2
• Shut down HDFS service
Phase 3: Services
• Add specific services requested by end user via REST API calls
• Start ‘compute’ services (e.g. Hive, Pig) requested by user
• Update status of cluster
Design optimized for cluster creation speed and user feedback
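Phase 2 ends by starting YARN/MRv2 and shutting down HDFS. In Ambari's REST API a service is started by setting its desired state to STARTED and stopped by setting it back to INSTALLED. The following is a minimal sketch of that step, not the BlueData implementation: the helper names, the DRY_RUN switch and the environment-variable defaults are assumptions made for illustration, while the host, cluster name and cookie jar path are reused from the Pig example in this deck.

```shell
#!/bin/sh
# Illustrative sketch only: helper names and DRY_RUN are assumptions.
AMBARI_URL="${AMBARI_URL:-http://bluedata-71.openstacklocal:8080}"
CLUSTER="${CLUSTER:-Ambari_vm7}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the request instead of sending it

# Build the JSON body that asks Ambari to move a service to a target state.
state_payload() {
  printf '{"ServiceInfo":{"state":"%s"}}' "$1"
}

# Build the URL of a service resource in the cluster.
service_url() {
  printf '%s/api/v1/clusters/%s/services/%s' "$AMBARI_URL" "$CLUSTER" "$1"
}

# set_service_state SERVICE STATE
# STARTED starts a service; INSTALLED stops it.
set_service_state() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'PUT %s %s\n' "$(service_url "$1")" "$(state_payload "$2")"
  else
    curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT \
      -d "$(state_payload "$2")" "$(service_url "$1")"
  fi
}

# Phase 2 tail: start YARN and MapReduce2, then stop HDFS.
set_service_state YARN STARTED
set_service_state MAPREDUCE2 STARTED
set_service_state HDFS INSTALLED
```

With DRY_RUN left at 1 the script only prints the three PUT requests, which makes it easy to review the sequence before pointing it at a real Ambari server.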
Delivering self-service with Ambari
REST API example to deploy specific service (Pig)
# Service: register the PIG service and its PIG component with the cluster
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"ServiceInfo":{"service_name":"PIG"}}' \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG/components/PIG
# Configs: upload the three Pig config types from static JSON files
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-env.json \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-properties.json \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d @pig-log4j.json \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/configurations
# Configs: make the "bluedata" tag the desired version of each config type
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-env","tag":"bluedata"}}}' \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-properties","tag":"bluedata"}}}' \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"Clusters":{"desired_configs":{"type":"pig-log4j","tag":"bluedata"}}}' \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7
# Install: assign the PIG component to a host, then request the INSTALLED state
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X POST -d '{"host_components":[{"HostRoles":{"component_name":"PIG"}}]}' \
  'http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/hosts?Hosts/host_name=bluedata-71.openstacklocal'
curl -kib /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
# Check the resulting state of the PIG service
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X GET \
  http://bluedata-71.openstacklocal:8080/api/v1/clusters/Ambari_vm7/services/PIG
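The pig-env.json, pig-properties.json and pig-log4j.json files passed to curl in the Pig example are not shown in the slides. Ambari's configurations endpoint accepts a document with a type, a tag and a properties map, so a file of that shape could be generated as in the sketch below. The helper name and the property value are hypothetical placeholders, not the contents BlueData actually used.

```shell
#!/bin/sh
# config_payload TYPE TAG KEY VALUE
# Prints a one-property config document with a type, a tag and a
# properties map. VALUE must not contain characters that need JSON
# escaping (quotes, backslashes, newlines); this sketch does no escaping.
config_payload() {
  printf '{"type":"%s","tag":"%s","properties":{"%s":"%s"}}\n' "$1" "$2" "$3" "$4"
}

# Hypothetical property value: the real pig-env content is not in the slides.
workdir=$(mktemp -d)
config_payload pig-env bluedata content 'JAVA_HOME=/usr/java/default' \
  > "$workdir/pig-env.json"
cat "$workdir/pig-env.json"
```

The tag written here ("bluedata") has to match the tag named in the later desired_configs call, since that call selects a config version by type and tag.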
Delivering self-service with Ambari
Design choices and considerations
• Used Apache Ambari v1.7 for this example
• BlueData mgmt services orchestrate Ambari REST API calls
• Ambari Blueprints used to bring up HDFS only
– Post cluster creation, services added using individual REST APIs for better control
– Blueprints/Stack Advisor do not provide REST API to track intermediate progress
• Used individual REST API calls with static configuration files
– Could not leverage Stack Advisor for individual services
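One reason individual REST calls give better control is that an asynchronous call such as the INSTALLED state change returns a request resource that can be polled for status and progress. The sketch below shows one way to do that with plain sh and sed. It assumes the request resource exposes request_status and progress_percent fields under Requests and that COMPLETED, FAILED, ABORTED and TIMEDOUT are terminal states; the function names are hypothetical and the request URL has to be taken from the response of the earlier call.

```shell
#!/bin/sh
# Extract the request_status value from a request resource on stdin.
request_status() {
  sed -n 's/.*"request_status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Extract the progress_percent value from a request resource on stdin.
request_progress() {
  sed -n 's/.*"progress_percent"[[:space:]]*:[[:space:]]*\([0-9.]*\).*/\1/p' | head -n 1
}

# fetch_request URL: hypothetical wrapper around curl, kept separate so it
# can be replaced by a stub when no Ambari server is available.
fetch_request() {
  curl -ksb /root/BD_Setup/cookie_jar -H 'X-Requested-By: ambari' "$1"
}

# wait_for_request URL [INTERVAL_SECONDS]
# Prints one status line per poll. Returns 0 on COMPLETED, 1 on a failed
# terminal state, 2 if no status could be read from the response.
wait_for_request() {
  while :; do
    body=$(fetch_request "$1")
    status=$(printf '%s\n' "$body" | request_status)
    echo "status=$status progress=$(printf '%s\n' "$body" | request_progress)%"
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED|ABORTED|TIMEDOUT) return 1 ;;
      "") return 2 ;;
    esac
    sleep "${2:-5}"
  done
}
```

A management layer could surface each printed status line to the end user, which fits the stated goal of optimizing for user feedback during cluster creation.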
Self-Service with Ambari:
Live Demo
Q&A
Contact me directly at …
Email: anant@bluedata.com
Twitter: @AnantCman
BlueData + Apache Ambari 1.7 Integration
Benefit: Infrastructure agility, elasticity, and efficiency – virtual HDP clusters with the functionality and performance of physical clusters
Features:
• Auto-provisioning of VM hosts with Ambari server and agent components
• Automated, transparent deployment of HDP using REST API for Stacks and Services
Benefit: Time savings for Data Scientists and Big Data administrators
Features:
• Self-service virtual cluster creation by data scientists or business analysts
• Troubleshooting and management by Big Data admins using Apache Ambari
Benefit: Administrator productivity & flexibility
Features:
• Apache Ambari for monitoring, fine-grained configuration, and enterprise support

Editor's Notes

  • #6 “… Skills gaps continue to be a major adoption inhibitor for 57 percent of respondents, while figuring out how to get value from Hadoop was cited by 49 percent of respondents. The absence of skills has long been a key blocker. Tooling vendors claim their products also address the skills gap. While tools are improving, they primarily support highly skilled users rather than elevate the skills already available in most enterprises.”
  • #19 Stack Advisor
    – Extends Ambari Stacks to include a “Stack Advisor”
    – Provides recommendations for and performs validation on component layout & configuration
    – Improves Stack pluggability
    – Exposes new REST endpoints: /recommendations and /validations
    – REST endpoints used during Cluster Install Wizard and Configs UI