Savanna: Hadoop on OpenStack
Ilya Elterman (Mirantis)
Matthew Farrellee (Red Hat)
Sergey Lukjanov (Mirantis)
Agenda
● Savanna Overview
● Current state
○ EDP overview
○ other features

● Roadmap
● Live Demo
OpenStack Data Processing - Savanna
Mission:
To provide the OpenStack community with an open, cutting-edge, performant and scalable data processing stack and associated management interfaces
● provision and operate Hadoop clusters
● schedule and operate Hadoop jobs
Hadoop - Big Data Platform
Popularity
[Chart: Google Trends interest over time, "Hadoop" vs "OpenStack"]
http://www.google.com/trends/explore?q=hadoop+openstack#q=openstack%2C%20hadoop&cmpt=q
Use Cases
● Self-service provisioning of Hadoop clusters
● Utilization of unused compute capacity for bursty workloads
● Running Hadoop workloads in a few clicks without expertise in Hadoop operations
Architecture Overview
[Diagram: Savanna components (REST API, Auth, Cluster Configuration Manager, Vendor Plugins, Job Sources, Job Manager, Data Sources, Data Access Layer, Resources Orchestration Manager), with the Savanna Pages in Horizon and the Savanna Python Client as front ends; Savanna works with Keystone, Nova, Glance, Heat, Cinder, Neutron, Swift and Trove DB, and provisions Hadoop VMs]
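In the architecture, both the Horizon pages and the Python client reach Savanna through its REST API, authenticating with a Keystone token. The sketch below only builds such a request without sending it. The host name is a placeholder, and the port (8386), the "v1.1" version prefix and the /<version>/<tenant_id>/<resource> path layout are assumptions to check against the API reference for your release.

```python
from urllib.request import Request

# Hypothetical endpoint: the host is a placeholder and port 8386 is an
# assumption about where the Savanna API listens.
SAVANNA_URL = "http://savanna.example.com:8386"
API_VERSION = "v1.1"  # assumed version prefix


def build_request(tenant_id: str, token: str, resource: str) -> Request:
    """Build (without sending) a GET request for a Savanna collection.

    The /<version>/<tenant_id>/<resource> layout is assumed for illustration.
    """
    url = f"{SAVANNA_URL}/{API_VERSION}/{tenant_id}/{resource}"
    # The Keystone-issued token is what the Auth component validates.
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


if __name__ == "__main__":
    # Collections a client might list (names assumed for illustration).
    for resource in ("plugins", "cluster-templates", "clusters"):
        print(build_request("demo-tenant", "TOKEN", resource).full_url)
```

Keeping URL construction separate from sending makes it easy to reuse the same helper from a CLI, a dashboard page or a test.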
Savanna Status
● Official incubated OpenStack project
● v0.3 released 17 Oct 2013
● Supported Hadoop distros:
○ Vanilla Apache Hadoop (reference implementation)
○ Hortonworks Data Platform 1.3.x
○ Intel Distribution (under review)
○ Cloudera Distribution (blueprint)
● Included in OpenStack distros:
○ RDO - http://openstack.redhat.com
○ Mirantis OpenStack - http://software.mirantis.com
Cluster Provisioning Performance
EDP Overview
● End users have data and questions
○ The data lives in a data repository
○ The questions are embodied in code

● Savanna Elastic Data Processing (EDP) brings the Hadoop ecosystem to the end user
○ Hides all cluster management behind the scenes
EDP

“Customers launch millions of Amazon EMR clusters every year.”
http://aws.amazon.com/elasticmapreduce/
EDP
● The variety and depth of value-added offerings on top of clouds are growing
● These offerings are rarely open and rarely allow for choice
● Examples: Google Cloud, Azure, AWS
EDP
Savanna and EDP can match and exceed the use cases provided by most public clouds
EDP in Savanna v0.3
● UI, integrated into Horizon, for ad-hoc analytics queries based on Hive or Pig
● API to execute MapReduce jobs without exposing details of the underlying infrastructure
● Pluggable data sources: Swift
● Supported job types: Jar, Pig, Hive
● Integration with Oozie for workflow management
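From the user's side, an EDP run ties together a job type, a target cluster and Swift input/output data sources. The sketch below assembles such a request under stated assumptions: every field name ("job_type", "input", "output", "job_configs", and so on) is hypothetical and not taken from the Savanna API, and the check that input and output differ is a sanity rule added for this sketch. Only the three job types and Swift as the data source come from the slide.

```python
from typing import Dict, Optional

# Job types listed for EDP in Savanna v0.3.
SUPPORTED_JOB_TYPES = {"Jar", "Pig", "Hive"}


def swift_data_source(name: str, url: str) -> Dict[str, str]:
    """Describe a Swift-backed data source (field names are hypothetical)."""
    if not url.startswith("swift://"):
        raise ValueError(f"not a Swift URL: {url}")
    return {"name": name, "type": "swift", "url": url}


def job_execution(job_type: str, cluster_id: str,
                  input_source: Dict[str, str],
                  output_source: Dict[str, str],
                  configs: Optional[Dict[str, str]] = None) -> Dict:
    """Assemble a hypothetical job-execution request body."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    # Sanity check added for this sketch: writing results back into the
    # input location is rejected.
    if input_source["url"] == output_source["url"]:
        raise ValueError("input and output must differ")
    return {
        "job_type": job_type,
        "cluster_id": cluster_id,
        "input": input_source,
        "output": output_source,
        # Hadoop configuration overrides passed through to the job.
        "job_configs": {"configs": dict(configs or {})},
    }
```

For example, `job_execution("Pig", "cluster-1", swift_data_source("in", "swift://demo/input"), swift_data_source("out", "swift://demo/output"))` yields a body with no infrastructure details. The user names only data and a job, matching the "hide cluster management" goal.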
Cluster Ops in Savanna 0.3
● REST API
● Configuration templates
● Manual cluster scaling
● Data node anti-affinity and location control
● Full support of data locality: rack and 4-level awareness for HDFS and Swift
● Swift integration
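Configuration templates, anti-affinity and manual scaling all surface as request bodies. The sketch below is modeled loosely on Savanna's template concepts, but the field names ("node_processes", "flavor_id", "anti_affinity", "resize_node_groups") should be treated as assumptions and verified against the REST API reference. Here anti-affinity lists processes whose VMs should land on different compute hosts, and scaling resizes existing node groups to target counts.

```python
from typing import Dict, Iterable, List


def node_group(name: str, flavor_id: str, processes: List[str],
               count: int) -> Dict:
    """One node group: identical VMs running the same Hadoop processes."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return {"name": name, "flavor_id": flavor_id,
            "node_processes": list(processes), "count": count}


def cluster_template(name: str, plugin_name: str, hadoop_version: str,
                     node_groups: List[Dict],
                     anti_affinity: Iterable[str] = ()) -> Dict:
    """Reusable cluster template (hypothetical field names).

    anti_affinity lists processes whose VMs should be spread across
    different compute hosts, e.g. ["datanode"].
    """
    anti_affinity = list(anti_affinity)
    deployed = {p for ng in node_groups for p in ng["node_processes"]}
    missing = set(anti_affinity) - deployed
    if missing:
        raise ValueError(
            f"anti-affinity for undeployed processes: {sorted(missing)}")
    return {"name": name, "plugin_name": plugin_name,
            "hadoop_version": hadoop_version,
            "node_groups": node_groups, "anti_affinity": anti_affinity}


def scale_request(current: Dict[str, int], target: Dict[str, int]) -> Dict:
    """Body for manually resizing existing node groups to target counts."""
    unknown = set(target) - set(current)
    if unknown:
        raise ValueError(f"unknown node groups: {sorted(unknown)}")
    # Only node groups whose count actually changes are included.
    changes = [{"name": n, "count": c}
               for n, c in sorted(target.items()) if current[n] != c]
    return {"resize_node_groups": changes}
```

Growing a three-worker cluster to five workers would then send only the changed group: `scale_request({"master": 1, "worker": 3}, {"master": 1, "worker": 5})`.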
OpenStack Integration in Savanna 0.3
● OpenStack Dashboard plugin
● Both Neutron and Nova Network support
● Keystone trusts used for async operations
● Python client
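Keystone trusts let a user delegate roles on a project to another identity, such as a service user. That identity can then obtain trust-scoped tokens and finish long-running work after the user's own token expires. The sketch below builds the request body for Keystone v3's OS-TRUST extension (POST /v3/OS-TRUST/trusts). The slides do not say which roles, lifetime or impersonation setting Savanna uses, so the IDs, role name and one-hour default are placeholders.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def build_trust_request(trustor_user_id: str, trustee_user_id: str,
                        project_id: str, roles: List[str],
                        lifetime: timedelta = timedelta(hours=1),
                        impersonation: bool = True) -> Dict:
    """Body for POST /v3/OS-TRUST/trusts (Keystone v3 OS-TRUST extension).

    The trustor (end user) delegates the listed roles on project_id to the
    trustee (for example a service user). Defaults here are placeholders.
    """
    if not roles:
        raise ValueError("a trust must delegate at least one role")
    # Keystone timestamps use ISO 8601 in UTC, e.g. 2013-11-08T13:30:00.000000Z
    expires_at = (datetime.now(timezone.utc) + lifetime).strftime(
        "%Y-%m-%dT%H:%M:%S.%fZ")
    return {
        "trust": {
            "trustor_user_id": trustor_user_id,
            "trustee_user_id": trustee_user_id,
            "project_id": project_id,
            "impersonation": impersonation,
            "roles": [{"name": r} for r in roles],
            "expires_at": expires_at,
        }
    }
```

A bounded `expires_at` limits how long the delegated rights outlive the operation that needed them.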
Live Demo
Icehouse Roadmap
● Integration with OpenStack ecosystem
○ Heat
○ Tempest
○ Devstack
○ Ceilometer
○ Ironic
● EDP enhancements
● Code hardening
● Polished API v2
● Performance testing
Design Summit Sessions
Friday, November 8
● 1:30pm Network and installation topologies
● 2:20pm Heat integration and scalability
● 3:10pm Further OpenStack integration
● 4:10pm Savanna in Icehouse
http://goo.gl/2iEv8u
Q&A

Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack