Savanna is an OpenStack project that aims to provide native provisioning and management of Hadoop clusters on OpenStack. Phase 1 provides basic cluster provisioning through templates and an API. Phase 2 will add advanced configuration, integration with management tools, and monitoring. Phase 3 plans to offer "analytics as a service" through job execution APIs and UIs. The architecture is designed for extensibility and uses plugins to interface with provisioning systems.
2. Agenda
● Savanna Overview
● Roadmap
● Phase 1 Live Demo
● Phase 2 Features and Architecture
3. Savanna - Elastic Hadoop on OpenStack
The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics:
● Open source
● Native for OpenStack
● Support for different Hadoop distributions
● Covers both the bare cluster provisioning use case and "analytics as a service"
4. Savanna Architecture Principles
● Designed as an OpenStack component
● Managed through a REST API, with a UI available as part of Horizon
● Pluggable system of Hadoop installation engines
● Integration with Hadoop vendor-specific management tools
● Predefined templates of Hadoop configurations, with the ability to modify parameters
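To illustrate "predefined templates with the ability to modify parameters", here is a minimal Python sketch (Savanna's own codebase is Python) in which a user request overrides selected values of a template. The template name, its keys, and the one-level merge rule are assumptions for illustration; only the two Hadoop 1.x property names are real.

```python
import copy
from typing import Any, Dict

# Hypothetical predefined template; the name, keys, and values are illustrative only.
TEMPLATES: Dict[str, Dict[str, Any]] = {
    "small": {
        "node_count": 3,
        "hadoop_config": {
            "dfs.replication": 2,
            "mapred.tasktracker.map.tasks.maximum": 2,
        },
    },
}


def build_cluster_spec(template_name: str, overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the named template with user overrides applied.

    Nested dicts (such as hadoop_config) are merged one level deep, so a user
    can change a single Hadoop parameter without restating the whole template.
    """
    spec = copy.deepcopy(TEMPLATES[template_name])  # never mutate the shared template
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(spec.get(key), dict):
            spec[key].update(value)  # merge individual parameters
        else:
            spec[key] = value  # replace scalar values outright
    return spec


spec = build_cluster_spec("small", {"node_count": 5, "hadoop_config": {"dfs.replication": 3}})
```

Copying the template before applying overrides keeps the predefined templates reusable across many cluster requests.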
5. Use Cases
● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - an API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads
6. Administrators Use Case
● Central point of control over infrastructure
● Enables self-service capabilities, including the choice of which Hadoop distribution to use
● Integration with vendor tooling
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
● Utilization of free IaaS capacity for Hadoop tasks
7. Dev and QA Use Cases
● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to production data
8. Analytics Use Cases
● Simplified task execution - the complexity of provisioning and managing clusters is hidden under the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads - ad-hoc queries that need significant resources only for a short time
● Utilization of free IaaS capacity for Hadoop tasks
10. Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning
Phase 2
Cluster operation support and integration with tooling
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages
11. Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● UI integrated into Horizon
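As a rough illustration of what a client call to a cluster-startup REST API could look like, the sketch below builds (but does not send) a JSON POST request with Python's standard library. The URL layout, the `node_counts` field, and the `savanna.example.com` host are hypothetical placeholders, not Savanna's documented v0.1 API; only the `X-Auth-Token` header follows the usual OpenStack convention for passing a Keystone token.

```python
import json
import urllib.request

# Hypothetical base URL; the real Savanna endpoint and version path may differ.
SAVANNA_URL = "http://savanna.example.com/v0.1"


def make_cluster_create_request(tenant_id: str, token: str, name: str,
                                image_id: str, node_counts: dict) -> urllib.request.Request:
    """Build a POST request asking a (hypothetical) Savanna endpoint to start a cluster."""
    body = {
        "cluster": {
            "name": name,
            "base_image_id": image_id,     # assumed field name
            "node_counts": node_counts,    # assumed field name, e.g. {"master": 1, "worker": 3}
        }
    }
    return urllib.request.Request(
        url=f"{SAVANNA_URL}/{tenant_id}/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = make_cluster_create_request("demo-tenant", "abc", "demo", "img-123",
                                  {"master": 1, "worker": 3})
# urllib.request.urlopen(req) would send it; it is not called here.
```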
12. Phase 1 - Current Status
● All code and documentation open sourced
● Phase 1 completed, v0.1 released on 04/10
● Launchpad home page
○ https://launchpad.net/savanna
● Code on stackforge
○ Integrated with OpenStack CI/CD
○ https://github.com/stackforge/savanna
● New contributors: Red Hat and Hortonworks
13. Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for HDFS data reliability issues
○ Configurable DataNode (DN) storage location
○ Configurable topology of DN, NN, TT, JT (DataNode, NameNode, TaskTracker, JobTracker)
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management tooling
● Basic monitoring support
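A configurable topology lets the user decide which VMs run which Hadoop daemons, so the combination has to be checked before provisioning. The sketch below validates one hypothetical topology format, where node groups list the processes they run, against the constraints of a Hadoop 1.x cluster: exactly one NameNode and one JobTracker, and at least one DataNode and TaskTracker. The `NodeGroup` structure and the lowercase process names are assumptions, not Savanna's data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Illustrative process names standing for the NN, JT, DN and TT roles.
KNOWN_PROCESSES = {"namenode", "jobtracker", "datanode", "tasktracker"}


@dataclass
class NodeGroup:
    """A hypothetical group of identical VMs running the same Hadoop processes."""
    name: str
    processes: Set[str]
    count: int


def validate_topology(groups: List[NodeGroup]) -> List[str]:
    """Return a list of problems; an empty list means the topology is acceptable."""
    errors: List[str] = []
    totals: Dict[str, int] = {proc: 0 for proc in KNOWN_PROCESSES}
    for group in groups:
        unknown = group.processes - KNOWN_PROCESSES
        if unknown:
            errors.append(f"{group.name}: unknown processes {sorted(unknown)}")
        for proc in group.processes & KNOWN_PROCESSES:
            totals[proc] += group.count  # each VM in the group runs the process
    # A Hadoop 1.x cluster has a single NameNode and a single JobTracker.
    for master in ("namenode", "jobtracker"):
        if totals[master] != 1:
            errors.append(f"expected exactly 1 {master}, found {totals[master]}")
    # DataNodes and TaskTrackers can be scaled, but at least one of each is needed.
    for worker in ("datanode", "tasktracker"):
        if totals[worker] < 1:
            errors.append(f"expected at least 1 {worker}, found {totals[worker]}")
    return errors


ok = validate_topology([
    NodeGroup("master", {"namenode", "jobtracker"}, 1),
    NodeGroup("worker", {"datanode", "tasktracker"}, 4),
])
bad = validate_topology([NodeGroup("all-in-one", {"namenode", "datanode"}, 2)])
```

The same check could be rerun after adding or removing nodes, since scaling changes the per-process totals.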
14. Phase 3 - Analytics as a Service
● API to execute MapReduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
15. Further Roadmap
● Autoscaling
● HBase support
● HA for NameNode
● HDFS and Swift integration
○ Caching of Swift data on HDFS
● Mahout as a service
● Integration with logging and error handling
16. How to Contribute
● Download and install Savanna
● Provide feedback and report bugs
● Share more ideas via IRC sessions or the mailing list
More details:
https://wiki.openstack.org/wiki/Savanna/HowToParticipate
19. Architecture Overview
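Figure: Savanna architecture overview
● Horizon (Savanna Pages) and the Python client access Savanna through its REST API
● Savanna components: Auth, Cluster Configuration Manager, Provisioning Plugin, VM Manager, Image Registry, DAL
● OpenStack services shown: Keystone, Nova, Glance, Swift
● Savanna provisions the Hadoop VMs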
20. Extensible Provisioning
● Plugin
○ get extra configs
○ validate input
○ launch/terminate cluster
○ add/remove nodes
● Image registry
○ register image in Savanna
○ add/remove tags
○ get image by tag
● VM manager
○ launch/terminate VMs
○ get VM status
○ ssh/scp to VM
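The plugin contract on this slide could be expressed as a Python abstract base class. This is a hypothetical sketch: method names such as `get_extra_configs` and `scale_cluster` paraphrase the slide's operations and are not taken from Savanna's code, and `DummyPlugin` only records the calls it receives.

```python
import abc
from typing import Dict, List


class ProvisioningPlugin(abc.ABC):
    """Hypothetical plugin contract mirroring the four plugin operations above."""

    @abc.abstractmethod
    def get_extra_configs(self) -> Dict[str, str]:
        """Return plugin-specific parameters to show to the user."""

    @abc.abstractmethod
    def validate(self, cluster: Dict) -> None:
        """Raise ValueError if the cluster description is unacceptable."""

    @abc.abstractmethod
    def launch_cluster(self, cluster: Dict) -> None:
        """Provision and start the cluster."""

    @abc.abstractmethod
    def terminate_cluster(self, cluster: Dict) -> None:
        """Tear the cluster down."""

    @abc.abstractmethod
    def scale_cluster(self, cluster: Dict, delta: int) -> None:
        """Add (delta > 0) or remove (delta < 0) worker nodes."""


class DummyPlugin(ProvisioningPlugin):
    """Toy plugin that records what it was asked to do instead of provisioning."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def get_extra_configs(self) -> Dict[str, str]:
        return {"dfs.replication": "HDFS replication factor"}

    def validate(self, cluster: Dict) -> None:
        if cluster.get("workers", 0) < 1:
            raise ValueError("cluster needs at least one worker")

    def launch_cluster(self, cluster: Dict) -> None:
        self.calls.append(f"launch {cluster['name']}")

    def terminate_cluster(self, cluster: Dict) -> None:
        self.calls.append(f"terminate {cluster['name']}")

    def scale_cluster(self, cluster: Dict, delta: int) -> None:
        cluster["workers"] += delta
        self.calls.append(f"scale {cluster['name']} by {delta}")


plugin = DummyPlugin()
cluster = {"name": "demo", "workers": 2}
plugin.validate(cluster)
plugin.launch_cluster(cluster)
plugin.scale_cluster(cluster, 3)
```

Keeping the contract abstract means Savanna's core only depends on these operations, while each Hadoop distribution supplies its own implementation.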
21. Provisioning Interaction
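Sequence of calls between the User, Savanna, and the Plugin:
● User → Savanna: get extra parameters for the plugin; Savanna → Plugin: get extra parameters
● User → Savanna: launch cluster; Savanna → Plugin: validate cluster parameters, then launch cluster
● User → Savanna: add/remove nodes; Savanna → Plugin: add/remove nodes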
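The request flow between the user, Savanna, and the plugin can be sketched as a Savanna-side controller that forwards each user request to the plugin, validating the cluster before launching it. Both classes are hypothetical; the plugin only logs its calls so that the order of operations can be inspected.

```python
from typing import Dict, List


class RecordingPlugin:
    """Hypothetical stand-in plugin that logs each call Savanna makes to it."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def get_extra_configs(self) -> Dict[str, str]:
        self.log.append("get extra parameters")
        return {"dfs.replication": "HDFS replication factor"}

    def validate(self, cluster: Dict) -> None:
        self.log.append("validate cluster parameters")
        if cluster["workers"] < 1:
            raise ValueError("at least one worker is required")

    def launch_cluster(self, cluster: Dict) -> None:
        self.log.append("launch cluster")

    def scale_cluster(self, cluster: Dict, delta: int) -> None:
        self.log.append("add/remove nodes")
        cluster["workers"] += delta


class SavannaController:
    """Hypothetical Savanna-side handler that forwards user requests to a plugin."""

    def __init__(self, plugin: RecordingPlugin) -> None:
        self.plugin = plugin
        self.clusters: Dict[str, Dict] = {}

    def get_plugin_parameters(self) -> Dict[str, str]:
        return self.plugin.get_extra_configs()

    def launch_cluster(self, name: str, workers: int) -> Dict:
        cluster = {"name": name, "workers": workers}
        self.plugin.validate(cluster)  # reject bad input before anything is launched
        self.plugin.launch_cluster(cluster)
        self.clusters[name] = cluster
        return cluster

    def add_remove_nodes(self, name: str, delta: int) -> Dict:
        cluster = self.clusters[name]
        self.plugin.scale_cluster(cluster, delta)
        return cluster


plugin = RecordingPlugin()
savanna = SavannaController(plugin)
savanna.get_plugin_parameters()
savanna.launch_cluster("demo", workers=3)
savanna.add_remove_nodes("demo", -1)
```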
22. Provisioning: Launching a Cluster
● Plugin → Image Registry: get image by tag
● Plugin → VM Manager: launch VMs; VM Manager launches the Hadoop VMs
● Plugin → VM Manager: install and configure Hadoop; VM Manager passes the commands to the Hadoop VMs via ssh/scp
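A plugin's launch sequence, as drawn on this slide, might look like the following sketch: look up an image by tag, ask the VM manager for VMs, then copy configuration and run commands over scp/ssh. All classes are hypothetical in-memory stand-ins (no Nova, ssh, or scp calls are made). The Hadoop 1.x daemon commands and the `fs.default.name` property are real, but the one-line `core-site.xml` content is a simplified placeholder for the actual XML file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class ImageRegistry:
    """Hypothetical in-memory stand-in for the image registry."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = {}

    def register(self, image_id: str, tags: List[str]) -> None:
        self._tags[image_id] = set(tags)

    def get_image_by_tag(self, tag: str) -> str:
        for image_id, tags in self._tags.items():
            if tag in tags:
                return image_id
        raise LookupError(f"no image tagged {tag!r}")


@dataclass
class FakeVM:
    name: str
    image_id: str
    files: Dict[str, str] = field(default_factory=dict)
    commands: List[str] = field(default_factory=list)


class FakeVMManager:
    """Records actions instead of calling Nova, ssh and scp."""

    def __init__(self) -> None:
        self.vms: List[FakeVM] = []

    def launch_vms(self, image_id: str, count: int, prefix: str) -> List[FakeVM]:
        new_vms = [FakeVM(f"{prefix}-{i}", image_id) for i in range(count)]
        self.vms.extend(new_vms)
        return new_vms

    def scp(self, vm: FakeVM, path: str, content: str) -> None:
        vm.files[path] = content  # a real manager would copy the file to the VM

    def ssh(self, vm: FakeVM, command: str) -> None:
        vm.commands.append(command)  # a real manager would run this remotely


def launch_cluster(registry: ImageRegistry, vms: FakeVMManager,
                   tag: str, workers: int) -> List[FakeVM]:
    """Plugin-side steps: pick an image, start VMs, then configure and start Hadoop."""
    image_id = registry.get_image_by_tag(tag)
    master = vms.launch_vms(image_id, 1, "master")[0]
    slaves = vms.launch_vms(image_id, workers, "worker")
    # Simplified placeholder: every node points at the master's NameNode.
    core_site = f"fs.default.name=hdfs://{master.name}:8020"
    for vm in [master] + slaves:
        vms.scp(vm, "core-site.xml", core_site)
    vms.ssh(master, "hadoop namenode -format")
    vms.ssh(master, "hadoop-daemon.sh start namenode")
    vms.ssh(master, "hadoop-daemon.sh start jobtracker")
    for vm in slaves:
        vms.ssh(vm, "hadoop-daemon.sh start datanode")
        vms.ssh(vm, "hadoop-daemon.sh start tasktracker")
    return [master] + slaves


registry = ImageRegistry()
registry.register("img-001", ["hadoop"])
manager = FakeVMManager()
nodes = launch_cluster(registry, manager, "hadoop", workers=2)
```

Because the image comes from the registry with Hadoop already installed, the plugin only has to push configuration and start daemons, which matches the pre-installed-image engine planned for Phase 1.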