Savanna is an OpenStack component for elastic provisioning of Hadoop clusters on OpenStack. Its roadmap has three phases: phase 1 (complete) delivers basic cluster provisioning, phase 2 (in progress) adds advanced configuration and integration with management tooling, and phase 3 will enable analytics as a service through a job execution framework. Savanna uses an extensible plugin architecture to provision Hadoop VMs and configure clusters, and it integrates with other OpenStack components such as Nova, Glance, and Swift.
2. Agenda
● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization
4. Savanna - Elastic Hadoop on OpenStack
● Open source, native OpenStack component
● Supports different Hadoop distributions
● Covers both the bare cluster provisioning use case and "analytics as a service"
● Managed through a REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates of Hadoop configurations
5. Savanna - Elastic Hadoop on OpenStack
● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - savanna-all@lists.launchpad.net
● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
6. Savanna - Participants
● Contributors:
○ large core team from Mirantis
○ teams from Red Hat and Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers
8. Savanna Use Cases
● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads
9. Administrators Use Case
● Central point of control over infrastructure
● Enables self-service capabilities, including the choice of Hadoop distribution
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks
10. Dev and QA Use Cases
● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to data from production
11. Analytics Use Cases
● Simplified task execution - the complexity of provisioning and managing a cluster is hidden under the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads: ad-hoc queries requiring significant resources only for a short period
● Utilization of free IaaS capacity for Hadoop tasks
13. Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
14. Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard
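As a rough illustration of what a cluster configuration template could look like, the sketch below models a template as a plain Python dictionary and derives a concrete cluster description from it. The field names (`node_groups`, `hadoop_configs`) and the helper `render_cluster` are hypothetical and do not reflect Savanna's actual schema or REST API; `dfs.replication` is a standard Hadoop property.

```python
import copy

# Hypothetical template layout; the field names are illustrative only.
BASE_TEMPLATE = {
    "name": "small-hadoop",
    "node_groups": {
        "master": {"count": 1, "processes": ["namenode", "jobtracker"]},
        "worker": {"count": 3, "processes": ["datanode", "tasktracker"]},
    },
    "hadoop_configs": {"dfs.replication": 3},
}


def render_cluster(template, name, overrides=None):
    """Return a concrete cluster description built from a template.

    The template is deep-copied so that it can be reused unchanged.
    """
    cluster = copy.deepcopy(template)
    cluster["name"] = name
    for key, value in (overrides or {}).items():
        cluster["hadoop_configs"][key] = value
    return cluster


cluster = render_cluster(BASE_TEMPLATE, "qa-cluster", {"dfs.replication": 2})
print(cluster["name"], cluster["hadoop_configs"]["dfs.replication"])
```

Copying the template before applying overrides keeps one template reusable across many clusters, which is the point of having templates at all.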
15. Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
16. Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for HDFS data reliability issues
○ Configurable DataNode (DN) storage location
○ Configurable topology of DataNodes (DN), NameNode (NN), TaskTrackers (TT), and JobTracker (JT)
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management tooling
● Basic monitoring support
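To make "configurable topology" and "add/remove nodes" more concrete, here is a minimal sketch that validates a process layout and grows it by adding worker nodes. The rules (exactly one NN and one JT, at least one DN and one TT) are an assumption based on a classic non-HA Hadoop 1.x layout, and the function names are hypothetical rather than part of Savanna.

```python
from collections import Counter


def validate_topology(nodes):
    """Check a list of per-VM process sets and return a list of error strings."""
    counts = Counter(process for node in nodes for process in node)
    errors = []
    # Assumed rule: a single NameNode and a single JobTracker (no HA).
    for master in ("NN", "JT"):
        if counts[master] != 1:
            errors.append(f"expected exactly one {master}, found {counts[master]}")
    # Assumed rule: at least one DataNode and one TaskTracker.
    for worker in ("DN", "TT"):
        if counts[worker] < 1:
            errors.append(f"expected at least one {worker}")
    return errors


def add_workers(nodes, count):
    """Return a new topology with `count` extra DN+TT worker nodes."""
    return nodes + [{"DN", "TT"} for _ in range(count)]


topology = [{"NN", "JT"}, {"DN", "TT"}]
bigger = add_workers(topology, 2)
print(len(bigger), validate_topology(bigger))
```

Returning a list of errors instead of raising on the first one lets a UI or API report every problem with a requested layout at once.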
17. Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
18. Phase 3 - Analytics as a Service
● API to execute Map/Reduce jobs without exposing details of underlying infrastructure (similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
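The idea of running a job without exposing the infrastructure can be sketched as a transient cluster that exists only for the duration of one job. Everything below is hypothetical: `FakeProvisioner`, `transient_cluster`, and `run_job` are illustrative stand-ins, not the planned Savanna API, and the "job" is just a Python callable.

```python
from contextlib import contextmanager


class FakeProvisioner:
    """In-memory stand-in for cluster provisioning; records what happened."""

    def __init__(self):
        self.events = []

    def launch(self, size):
        self.events.append(f"launch:{size}")
        return {"size": size}

    def terminate(self, cluster):
        self.events.append("terminate")


@contextmanager
def transient_cluster(provisioner, size):
    """Launch a cluster, hand it to the caller, and always tear it down."""
    cluster = provisioner.launch(size)
    try:
        yield cluster
    finally:
        provisioner.terminate(cluster)


def run_job(provisioner, job, size=3):
    """Run `job` on a short-lived cluster; the caller never sees the VMs."""
    with transient_cluster(provisioner, size) as cluster:
        return job(cluster)


provisioner = FakeProvisioner()
result = run_job(provisioner, lambda cluster: f"ran on {cluster['size']} nodes")
print(result, provisioner.events)
```

The `try`/`finally` in the context manager ensures the cluster is terminated even when the job fails, which matters for bursty, pay-for-what-you-use workloads.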
19. Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3 [Planned - October 15]
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
20. Further Roadmap
● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support
39. Extensible Provisioning
[Diagram: Savanna and its provisioning components]
Plugin
● get extra configs
● validate input
● launch/terminate cluster
● add/remove nodes
Instance Interop
● launch/terminate VMs
● get VM status
● ssh/scp to VM
Image registry
● register image in Savanna
● add/remove tags
● get image by tag
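The plugin operations listed on this slide (get extra configs, validate input, launch/terminate cluster, add/remove nodes) map naturally onto an abstract interface. The sketch below is one possible shape for such an interface, written with Python's `abc` module; the class and method names are hypothetical and are not taken from the Savanna code base.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific configuration defaults."""

    @abstractmethod
    def validate(self, name, node_count):
        """Raise ValueError if the requested cluster is not acceptable."""

    @abstractmethod
    def launch_cluster(self, name, node_count):
        """Create a cluster."""

    @abstractmethod
    def terminate_cluster(self, name):
        """Destroy a cluster."""

    @abstractmethod
    def scale_cluster(self, name, delta):
        """Add (delta > 0) or remove (delta < 0) nodes."""


class ToyPlugin(ProvisioningPlugin):
    """In-memory implementation used only to exercise the interface."""

    def __init__(self):
        self.clusters = {}

    def get_extra_configs(self):
        return {"heap_size_mb": 1024}

    def validate(self, name, node_count):
        if node_count < 1:
            raise ValueError("a cluster needs at least one node")

    def launch_cluster(self, name, node_count):
        self.validate(name, node_count)
        self.clusters[name] = node_count

    def terminate_cluster(self, name):
        del self.clusters[name]

    def scale_cluster(self, name, delta):
        new_count = self.clusters[name] + delta
        self.validate(name, new_count)
        self.clusters[name] = new_count


plugin = ToyPlugin()
plugin.launch_cluster("demo", 3)
plugin.scale_cluster("demo", -1)
print(plugin.clusters)
```

An abstract base class makes a missing operation fail at instantiation time rather than in the middle of provisioning, which suits a setup where each Hadoop distribution ships its own plugin.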
40. Provisioning Interaction
[Sequence diagram between User, Savanna, and Plugin. Message labels: get extra parameters for the plugin; validate cluster parameters; launch cluster; add/remove nodes]
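One plausible reading of this interaction is that the user talks only to Savanna, and Savanna delegates each request to the plugin, validating cluster parameters before launch. The sketch below records that call order in a log. The exact ordering is an assumption inferred from the diagram labels, and `SavannaFacade` and `RecordingPlugin` are hypothetical names.

```python
class RecordingPlugin:
    """Hypothetical plugin that records each call it receives."""

    def __init__(self, log):
        self.log = log

    def get_extra_parameters(self):
        self.log.append("plugin: get extra parameters")
        return ["heap_size_mb"]

    def validate(self, params):
        self.log.append("plugin: validate cluster parameters")
        if params.get("workers", 0) < 1:
            raise ValueError("at least one worker is required")

    def launch_cluster(self, params):
        self.log.append("plugin: launch cluster")

    def scale(self, delta):
        self.log.append("plugin: add/remove nodes")


class SavannaFacade:
    """Hypothetical entry point: the user never calls the plugin directly."""

    def __init__(self, plugin, log):
        self.plugin = plugin
        self.log = log

    def get_extra_parameters(self):
        self.log.append("savanna: get extra parameters")
        return self.plugin.get_extra_parameters()

    def launch_cluster(self, params):
        self.log.append("savanna: launch cluster")
        self.plugin.validate(params)  # assumed: validation precedes launch
        self.plugin.launch_cluster(params)

    def scale(self, delta):
        self.log.append("savanna: add/remove nodes")
        self.plugin.scale(delta)


log = []
savanna = SavannaFacade(RecordingPlugin(log), log)
savanna.get_extra_parameters()
savanna.launch_cluster({"workers": 3})
savanna.scale(+2)
print("\n".join(log))
```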
41. Provisioning: Launching a Cluster
[Diagram: the Plugin uses the Image Registry (get image by tag) and the Instance Interop Helper (launch VMs; pass commands via ssh, scp) to launch the Hadoop VMs and to install and configure Hadoop on them]
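The launch flow on this slide (get image by tag, launch VMs, then install and configure Hadoop over ssh/scp) can be sketched with in-memory stand-ins. Nothing here talks to a real cloud: `ImageRegistry`, `InstanceInterop`, and the `configure-hadoop` command string are hypothetical placeholders chosen for illustration.

```python
class ImageRegistry:
    """Hypothetical registry mapping image IDs to sets of tags."""

    def __init__(self):
        self._tags = {}

    def register(self, image_id, tags=()):
        self._tags[image_id] = set(tags)

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_by_tag(self, tag):
        """Return the IDs of all images carrying `tag`, sorted for stability."""
        return sorted(i for i, tags in self._tags.items() if tag in tags)


class InstanceInterop:
    """Hypothetical helper that 'launches' VMs and records remote commands."""

    def __init__(self):
        self.commands = []

    def launch_vms(self, image_id, count):
        return [f"{image_id}-vm-{index}" for index in range(count)]

    def run(self, vm, command):
        # A real helper would use ssh/scp here; this one only records.
        self.commands.append((vm, command))


def launch_cluster(registry, interop, tag, count):
    """Pick an image by tag, launch VMs from it, then configure each VM."""
    images = registry.get_by_tag(tag)
    if not images:
        raise LookupError(f"no image registered with tag {tag!r}")
    vms = interop.launch_vms(images[0], count)
    for vm in vms:
        interop.run(vm, "configure-hadoop")  # placeholder command
    return vms


registry = ImageRegistry()
registry.register("img-1", tags=["hadoop"])
interop = InstanceInterop()
vms = launch_cluster(registry, interop, "hadoop", 2)
print(vms, interop.commands)
```

Selecting images by tag rather than by ID lets a plugin ask for "an image suitable for this Hadoop distribution" without hard-coding which Glance image that is.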