Savanna - Hadoop on OpenStack
Mirantis, 2013
Sergey Lukjanov
Savanna Technical Lead
● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization
Agenda
● Open source native OpenStack component
● Supports different Hadoop distributions
● Covers both the bare cluster provisioning use case
and "analytics as a service"
● Managed through REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates of Hadoop configurations
Savanna - Elastic Hadoop on OpenStack
● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - savanna-all@lists.launchpad.net
● CI - https://jenkins.openstack.org and
http://jenkins.savanna.mirantis.com
Savanna - Elastic Hadoop on OpenStack
● Contributors:
○ large core team from Mirantis
○ teams from RedHat, Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers
Savanna - Participants
● Administrators - centralized cluster management
and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - an API to run analytic
jobs, with infrastructure provisioning happening
under the hood
● Making resources dedicated to the IaaS cloud
available for Hadoop workloads
Savanna Use Cases
● Central point of control over infrastructure
● Enables self-service capabilities, including choice
of Hadoop distribution to be used
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks
Administrators Use Case
● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to data from production
Dev and QA Use Cases
● Simplified task execution - the complexity of
provisioning and managing a cluster is hidden under
the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads: ad-hoc queries that need
significant resources only for a short time
● Utilization of free IaaS capacity for Hadoop tasks
Analytics Use Cases
Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling,
advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support
for different scripting languages, deeper integration with OS
Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for
pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard
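The idea of a cluster configuration template can be sketched in a few lines of Python. This is only an illustration: the class names, field names, and validation rules below are assumptions made for the example and are not Savanna's actual template schema or REST payload. The sketch assumes a Hadoop 1.x-style cluster with a single NameNode (NN) and at most one JobTracker (JT).

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; not Savanna's real template format.
KNOWN_PROCESSES = {"JT", "NN", "TT", "DN"}


@dataclass
class NodeGroup:
    name: str
    processes: frozenset  # Hadoop processes on each node, e.g. {"TT", "DN"}
    count: int            # number of VMs in this group


@dataclass
class ClusterTemplate:
    name: str
    node_groups: list = field(default_factory=list)

    def process_count(self, process):
        """Total number of nodes that run the given Hadoop process."""
        return sum(g.count for g in self.node_groups if process in g.processes)

    def validate(self):
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        for group in self.node_groups:
            unknown = set(group.processes) - KNOWN_PROCESSES
            if unknown:
                problems.append(f"{group.name}: unknown processes {sorted(unknown)}")
            if group.count < 1:
                problems.append(f"{group.name}: count must be at least 1")
        # Assumed rules for a Hadoop 1.x-style cluster.
        if self.process_count("NN") != 1:
            problems.append("exactly one NameNode is required")
        if self.process_count("JT") > 1:
            problems.append("at most one JobTracker is allowed")
        if self.process_count("DN") < 1:
            problems.append("at least one DataNode is required")
        return problems


template = ClusterTemplate(
    name="demo",
    node_groups=[
        NodeGroup("master", frozenset({"JT", "NN"}), 1),
        NodeGroup("worker", frozenset({"TT", "DN"}), 10),
    ],
)
```

A template like this could be checked with `template.validate()` before any VM is requested, so that an invalid layout is rejected early.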
Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for the HDFS data reliability issue
○ Configurable DN storage location
○ Configurable topology of DN, NN, TT, JT
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management
tooling
● Basic monitoring support
Phase 3 - Analytics as a Service
● API to execute Map/Reduce jobs without
exposing details of underlying infrastructure
(similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries
based on Hive or Pig
Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling,
advanced configuration (HDFS, Swift, etc.)
Phase 3 [Planned - October 15]
"Analytics as a service": job execution framework, support
for different scripting languages, deeper integration with OS
Further Roadmap
● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support
Architecture Overview
[Figure: architecture diagram. Labels: Horizon (Savanna Pages), Savanna Python Client, REST API, Auth (Keystone), Cluster Configuration Manager, DAL, Provisioning Plugin, Instance Interop Helper, Image Registry, Nova, Glance, Swift, Hadoop VMs]
● HDFS Reliability
● Data Persistence
● I/O Performance
● etc.
Hadoop vs. Virtualization
HDFS Reliability: the issue
[Figure: DataNodes (DN) running as VMs on two Compute hosts; Data Block]
HDFS Reliability: single DN per host
[Figure: Compute hosts each running one DN (one labeled TT | DN); clusters labeled Cluster A and Cluster B]
HDFS Reliability: HADOOP-8468
hypervisor-awareness for HDFS scheduler
[Figure: several DNs per Compute host; HDFS; Data Block]
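The hypervisor-awareness idea can be illustrated with a toy placement routine: replicas of one block are only put on DataNodes whose VMs run on different compute hosts, so losing one host cannot destroy every copy. This is a sketch of the idea only, not the HADOOP-8468 code; the function name and the `datanode_hosts` mapping are made up for the example.

```python
import random


def place_replicas(datanode_hosts, replication=3, rng=None):
    """Pick DataNodes for one block so that no two replicas share a compute host.

    datanode_hosts maps a DataNode name to the compute host its VM runs on.
    Raises ValueError when there are fewer distinct hosts than replicas.
    """
    rng = rng or random.Random()

    # Group DataNodes by the physical host they run on.
    by_host = {}
    for datanode, host in datanode_hosts.items():
        by_host.setdefault(host, []).append(datanode)

    if len(by_host) < replication:
        raise ValueError(
            f"need {replication} distinct hosts, only {len(by_host)} available"
        )

    # Choose distinct hosts first, then one DataNode on each chosen host.
    chosen_hosts = rng.sample(sorted(by_host), replication)
    return [rng.choice(sorted(by_host[host])) for host in chosen_hosts]


datanode_hosts = {
    "dn1": "compute-1", "dn2": "compute-1",
    "dn3": "compute-2", "dn4": "compute-2",
    "dn5": "compute-3", "dn6": "compute-3",
}
replicas = place_replicas(datanode_hosts, replication=3, rng=random.Random(0))
```

A host-unaware policy could pick `dn1` and `dn2` for the same block, and both copies would disappear together with `compute-1`.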
HDFS Reliability: HADOOP-8545
enables Swift for Hadoop
[Figure: Swift (initial input, final output), HDFS, Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N]
● Master node(s)
● Worker nodes
Configurable topology of DN, NN, TT, JT
[Figure: example master node groups (JT | NN, JT, NN+) and worker node groups (TT, TT | DN, DN); counts shown: 10, 6, 8]
HDFS Placement Options
● Ephemeral drive
/var/lib/nova/instances/instance-xxx/disk ->
/mnt/ephemeral
● Block storage volume
Cinder Volume -> /mnt/volume
● Bare hard drive support
/dev/sdb -> /mnt/sdb
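As a rough illustration of how a chosen placement option could end up in the DataNode configuration, the sketch below turns a list of options into a comma-separated directory list of the kind Hadoop 1.x accepts in `dfs.data.dir`. The mount points come from the slide; the option keys, the `hdfs/data` subdirectory, and the function name are assumptions made for the example.

```python
# Mount points as listed on the slide; the dictionary keys are hypothetical.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",  # ephemeral drive of the instance
    "volume": "/mnt/volume",        # Cinder block storage volume
    "bare": "/mnt/sdb",             # bare hard drive (/dev/sdb)
}


def datanode_dirs(options, subdir="hdfs/data"):
    """Build a comma-separated DataNode directory list for the given options."""
    unknown = [option for option in options if option not in MOUNT_POINTS]
    if unknown:
        raise ValueError(f"unknown placement options: {unknown}")
    return ",".join(f"{MOUNT_POINTS[option]}/{subdir}" for option in options)


dirs = datanode_dirs(["ephemeral", "volume"])
```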
Q&A
We are hiring!
Phase 1 deployment mechanism
[Figure: Savanna provisions VMs with pre-installed Hadoop, then configures the Hadoop cluster (four Hadoop VMs shown)]
Tool usage scenarios
● Scenario I: the tool manages a Hadoop cluster on existing Hadoop VMs
● Scenario II: the tool provisions and manages a Hadoop cluster on plain VMs
Extensible Provisioning
Plugin
● get extra configs
● validate input
● launch/terminate cluster
● add/remove nodes
Instance Interop
● launch/terminate VMs
● get VM status
● ssh/scp to VM
Image registry
● register image in Savanna
● add/remove tags
● get image by tag
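The plugin operations listed on this slide could be expressed as an abstract interface. The sketch below is only that: the class and method names mirror the slide bullets but are hypothetical, not Savanna's real plugin API, and the in-memory implementation exists only to make the example runnable.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract; method names mirror the slide bullets."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific parameters the user may set."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the requested cluster is not acceptable."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Bring the cluster up."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Tear the cluster down."""

    @abstractmethod
    def add_nodes(self, cluster, count):
        """Grow the cluster by count worker nodes."""

    @abstractmethod
    def remove_nodes(self, cluster, count):
        """Shrink the cluster by count worker nodes."""


class InMemoryPlugin(ProvisioningPlugin):
    """Toy plugin that only tracks worker counts; it starts no VMs."""

    def __init__(self):
        self.running = {}  # cluster name -> number of workers

    def get_extra_configs(self):
        return {"example_param": "default"}  # made-up parameter

    def validate(self, cluster):
        if cluster.get("workers", 0) < 1:
            raise ValueError("a cluster needs at least one worker node")

    def launch_cluster(self, cluster):
        self.running[cluster["name"]] = cluster["workers"]

    def terminate_cluster(self, cluster):
        self.running.pop(cluster["name"], None)

    def add_nodes(self, cluster, count):
        self.running[cluster["name"]] += count

    def remove_nodes(self, cluster, count):
        remaining = self.running[cluster["name"]] - count
        if remaining < 1:
            raise ValueError("cannot remove every worker node")
        self.running[cluster["name"]] = remaining


def launch(plugin, cluster):
    """Validate first, then launch, as the controller side might do."""
    plugin.validate(cluster)
    plugin.launch_cluster(cluster)


plugin = InMemoryPlugin()
launch(plugin, {"name": "demo", "workers": 3})
plugin.add_nodes({"name": "demo"}, 2)
```

Keeping validation and launch as separate calls lets the caller reject a bad request before any resources are touched.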
Provisioning Interaction
[Figure: interaction between User, Savanna, and Plugin. Operations shown: get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes]
Provisioning: Launching a Cluster
[Figure: the Plugin gets an image by tag from the Image Registry, launches VMs through the Instance Interop Helper, then installs and configures Hadoop on the Hadoop VMs by passing commands via ssh, scp]

More Related Content

What's hot

Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno Edition
OpenStack Foundation
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStack
Mirantis
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...DataWorks Summit
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
Wei Ting Chen
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
spinningmatt
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
Nicolas Poggi
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
Wei Ting Chen
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
Databricks
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
rhatr
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
Spark Summit
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Summit
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
Yousun Jeong
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
DataWorks Summit
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
DataStax
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
DataWorks Summit
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
DataStax Academy
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
Wei Ting Chen
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
Yousun Jeong
 

What's hot (20)

Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno Edition
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStack
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 

Viewers also liked

Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap StructureOpen Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance
 
Product Release Road-map Guide
Product Release Road-map GuideProduct Release Road-map Guide
Product Release Road-map Guide
Bim Akinfenwa
 
WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2
 
Metalnox Product Overview
Metalnox Product OverviewMetalnox Product Overview
Metalnox Product Overview
Dan Barefoot
 
Share point 2010 roadmap
Share point 2010 roadmapShare point 2010 roadmap
Share point 2010 roadmap
ctc TrainCanada
 
Roadmap for successful IT budgeting
Roadmap for successful IT budgetingRoadmap for successful IT budgeting
Roadmap for successful IT budgeting
Absoft Limited
 
Mobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devicesMobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devices
Nuxeo
 
Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014
Nuxeo
 
Windows azure overview
Windows azure overviewWindows azure overview
Windows azure overview
ctc TrainCanada
 
Gemtalk Product Roadmap
Gemtalk Product RoadmapGemtalk Product Roadmap
Gemtalk Product Roadmap
ESUG
 
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
Mr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in indiaMr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in india
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
dhaval2929
 
Introduction to GreenTouch
Introduction to GreenTouchIntroduction to GreenTouch
Introduction to GreenTouch
greentouch-org
 
New Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best PracticesNew Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best Practicessarjanacoid
 
Reverse Engineering for exploit writers
Reverse Engineering for exploit writersReverse Engineering for exploit writers
Reverse Engineering for exploit writersamiable_indian
 
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
Puppet
 
Change Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic TemplateChange Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic Template
dmdk12
 
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
Puppet
 
Tesla roadster
Tesla roadsterTesla roadster
Tesla roadsterdmyers1
 
Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap
Alison J. Herzog, MBA
 

Viewers also liked (20)

Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap StructureOpen Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
 
Product Release Road-map Guide
Product Release Road-map GuideProduct Release Road-map Guide
Product Release Road-map Guide
 
WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical Update
 
Metalnox Product Overview
Metalnox Product OverviewMetalnox Product Overview
Metalnox Product Overview
 
Share point 2010 roadmap
Share point 2010 roadmapShare point 2010 roadmap
Share point 2010 roadmap
 
Roadmap for successful IT budgeting
Roadmap for successful IT budgetingRoadmap for successful IT budgeting
Roadmap for successful IT budgeting
 
Mobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devicesMobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devices
 
Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014
 
Windows azure overview
Windows azure overviewWindows azure overview
Windows azure overview
 
Gemtalk Product Roadmap
Gemtalk Product RoadmapGemtalk Product Roadmap
Gemtalk Product Roadmap
 
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
Mr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in indiaMr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in india
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
 
Introduction to GreenTouch
Introduction to GreenTouchIntroduction to GreenTouch
Introduction to GreenTouch
 
New Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best PracticesNew Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best Practices
 
Reverse Engineering for exploit writers
Reverse Engineering for exploit writersReverse Engineering for exploit writers
Reverse Engineering for exploit writers
 
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
 
Asap roadmap
Asap roadmapAsap roadmap
Asap roadmap
 
Change Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic TemplateChange Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic Template
 
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
 
Tesla roadster
Tesla roadsterTesla roadster
Tesla roadster
 
Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap
 

Similar to Savanna - Elastic Hadoop on OpenStack

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production
Sean Cohen
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Mopuru Babu
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
Newton Alex
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Anant Corporation
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Hortonworks
 
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
Wong Hoi Sing Edison
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
DataWorks Summit
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
Jim Kaskade
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
DataWorks Summit
 
Sap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSerge Pagop
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
Wojciech Biela
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
Seungdon Choi
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Data Con LA
 
TechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANATechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANA
Jarut Nakaramaleerat
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
Lynn Langit
 

Similar to Savanna - Elastic Hadoop on OpenStack (20)

Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
 
Resume_VipinKP
Resume_VipinKPResume_VipinKP
Resume_VipinKP
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Sap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSap integration with_j_boss_technologies
Sap integration with_j_boss_technologies
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
TechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANATechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANA
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
 

Savanna - Elastic Hadoop on OpenStack

  • 1. Savanna - Hadoop on OpenStack. Mirantis, 2013. Sergey Lukjanov, Savanna Technical Lead
  • 2. ● Savanna Overview ● Savanna Use Cases ● Roadmap & Current Status ● Architecture & Features Overview ● Hadoop vs. Virtualization Agenda
  • 4. ● Open source native OpenStack component ● Supports different Hadoop distributions ● Covers both the bare cluster provisioning and "analytics as a service" use cases ● Managed through REST API ● Web UI as part of the OpenStack Dashboard ● Flexible templates of Hadoop configurations Savanna - Elastic Hadoop on OpenStack
  • 5. ● Project home - https://launchpad.net/savanna ○ bug tracking ○ blueprints ○ answers ● Code review (gerrit) - https://review.openstack.org ● Sources - https://github.com/stackforge/savanna ● Mailing list - savanna-all@lists.launchpad.net ● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com Savanna - Elastic Hadoop on OpenStack
  • 6. ● Contributors: ○ large core team from Mirantis ○ teams from Red Hat, Hortonworks ○ several minor contributors ● Intel joined recently ● Several upcoming customers Savanna - Participants
  • 7. ● Savanna Overview ● Savanna Use Cases ● Roadmap & Current Status ● Architecture & Features Overview ● Hadoop vs. Virtualization Agenda
  • 8. ● Administrators - centralized cluster management and monitoring ● Dev and QA teams - fast cluster provisioning ● Data Scientists/Analysts - API to run analytic jobs with infrastructure provisioning happening under the hood ● Making resources dedicated to IaaS cloud available for Hadoop workloads Savanna Use Cases
  • 9. ● Central point of control over infrastructure ● Enables self-service capabilities, including choice of Hadoop distribution to be used ● Integration with vendor tooling: ○ Ambari for Apache/Hortonworks ○ Cloudera Management Console ○ Intel Hadoop ● Utilization of free IaaS capacity for Hadoop tasks Administrators Use Case
  • 10. ● Fast on-demand provisioning of environments ● Increased agility and speed of innovation ● Controlled access to data from production Dev and QA Use Cases
  • 11. ● Simplified task execution - complexity of provisioning and managing the cluster hidden under the hood ○ Access to higher level interfaces (e.g. Pig, Hive) ● Bursty workloads: ad-hoc queries requiring significant resources only for a short time period ● Utilization of free IaaS capacity for Hadoop tasks Analytics Use Cases
  • 12. ● Savanna Overview ● Savanna Use Cases ● Roadmap & Current Status ● Architecture & Features Overview ● Hadoop vs. Virtualization Agenda
  • 13. Roadmap for Hadoop in Cloud. Phase 1: Basic cluster provisioning of Apache Hadoop. Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.). Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 14. Phase 1 - Basic Cluster Operation ● Cluster provisioning ● Deployment Engine implementation for pre-installed images ● Templates for Hadoop cluster configuration ● REST API for cluster startup and operations ● Web UI integrated into OpenStack Dashboard
  • 15. Roadmap for Hadoop in Cloud. Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop. Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.). Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 16. Phase 2 - Advanced Configuration ● Hadoop cluster configuration support: ○ Solutions for HDFS data reliability issue ○ Configurable DN storage location ○ Configurable topology of DN, NN, TT, JT ○ Add/remove nodes ○ More Hadoop parameters ● Integration with vendor deployment/management tooling ● Basic monitoring support
  • 17. Roadmap for Hadoop in Cloud. Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop. Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.). Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 18. Phase 3 - Analytics as a Service ● API to execute Map/Reduce jobs without exposing details of underlying infrastructure (similar to AWS EMR) ● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
  • 19. Roadmap for Hadoop in Cloud. Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop. Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.). Phase 3 [Planned - October 15]: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 20. Further Roadmap ● Autoscaling ● HA for NameNode ● Deeper HDFS and Swift integration ○ Caching of Swift data on HDFS ● Integration with logging and error handling ● HBase support
  • 21. ● Savanna Overview ● Savanna Use Cases ● Roadmap & Current Status ● Architecture & Features Overview ● Hadoop vs. Virtualization Agenda
  • 23. ● HDFS Reliability ● Data Persistence ● I/O Performance ● etc. Hadoop vs. Virtualization
  • 27. HDFS Reliability: the issue [Diagram labels: Compute, DN (six instances), Data Block]
  • 30. HDFS Reliability: single DN per host [Diagram labels: Compute, DN, TT | DN, Cluster A, Cluster B]
  • 31. HDFS Reliability: HADOOP-8468, hypervisor-awareness for HDFS scheduler [Diagram labels: Compute, DN, HDFS Data Block]
  • 32. HDFS Reliability: HADOOP-8545 enables Swift for Hadoop [Diagram labels: Swift, HDFS, Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N, initial input, final output]
  • 33. ● Master node(s) ● Worker nodes Configurable topology of DN, NN, TT, JT [Diagram labels: JT | NN, JT, NN, TT, TT | DN, DN; counts 10, 6, 8]
  • 34. HDFS Placement Options ● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral ● Block storage volume: Cinder Volume -> /mnt/volume ● Bare hard drive support: /dev/sdb -> /mnt/sdb
  • 35. Q&A
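The HDFS Reliability slides above describe the risk that several DataNode VMs share one compute host, so every replica of a block can be lost when that host fails. The following is a minimal Python sketch of the idea behind hypervisor-aware placement. It is an illustration only, not the HADOOP-8468 implementation; the `choose_replicas` function, the DataNode names, and the host names are all hypothetical.

```python
from typing import Dict, List


def choose_replicas(dn_to_host: Dict[str, str], replication: int) -> List[str]:
    """Pick DataNodes for one block so that no two replicas share a compute host.

    dn_to_host maps a DataNode VM name to the physical host it runs on
    (hypothetical names, for illustration only).
    """
    chosen: List[str] = []
    used_hosts = set()
    # Iterate in sorted order so the result is deterministic.
    for dn in sorted(dn_to_host):
        host = dn_to_host[dn]
        if host in used_hosts:
            continue  # a replica already lives on this physical host
        chosen.append(dn)
        used_hosts.add(host)
        if len(chosen) == replication:
            return chosen
    raise ValueError(
        f"only {len(used_hosts)} distinct hosts available, "
        f"need {replication} for host-disjoint replicas"
    )


# Six DataNode VMs spread over three compute hosts (assumed layout).
placement = {
    "dn1": "compute-a", "dn2": "compute-a",
    "dn3": "compute-b", "dn4": "compute-b",
    "dn5": "compute-c", "dn6": "compute-c",
}
replicas = choose_replicas(placement, replication=3)
```

With this layout the sketch selects one DataNode per compute host. Asking for more replicas than there are distinct hosts raises an error instead of silently co-locating replicas.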
  • 37. Phase 1 deployment mechanism [Diagram: Savanna and four Hadoop VMs; steps: Provision VMs with pre-installed Hadoop, Configure Hadoop Cluster]
  • 38. Tool usage scenarios [Diagram: Scenario I: Tool, Manage Hadoop Cluster, four Hadoop VMs; Scenario II: Tool, Provision & Manage Hadoop Cluster, four VMs]
  • 39. Extensible Provisioning. Plugin: ● get extra configs ● validate input ● launch/terminate cluster ● add/remove nodes. Savanna, Instance Interop: ● launch/terminate VMs ● get VM status ● ssh/scp to VM. Savanna, Image registry: ● register image in Savanna ● add/remove tags ● get image by tag
  • 40. Provisioning Interaction [Sequence diagram between User, Savanna, and Plugin; messages: get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes]
  • 41. Provisioning: Launching a Cluster [Diagram: Plugin, Image Registry, Instance Interop Helper, four Hadoop VMs; steps: get image by tag, launch VMs, install and configure Hadoop, pass commands via ssh, scp]
  • 42. Q&A
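The Extensible Provisioning slides above name a handful of operations that a provisioning plugin offers to Savanna: get extra configs, validate input, launch a cluster, and add or remove nodes. A rough Python sketch of such a contract follows, assuming those operations belong to the plugin side. The class names, method signatures, and the `workers` field are hypothetical and are not Savanna's actual plugin API; the toy implementation starts no VMs and only tracks node names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations named on the slides."""

    @abstractmethod
    def get_extra_configs(self) -> Dict[str, str]: ...

    @abstractmethod
    def validate(self, cluster: Dict) -> None: ...

    @abstractmethod
    def launch_cluster(self, cluster: Dict) -> List[str]: ...

    @abstractmethod
    def add_nodes(self, count: int) -> List[str]: ...

    @abstractmethod
    def remove_nodes(self, count: int) -> List[str]: ...


class FakePlugin(ProvisioningPlugin):
    """Toy in-memory plugin used only to exercise the contract."""

    def __init__(self) -> None:
        self.nodes: List[str] = []

    def get_extra_configs(self) -> Dict[str, str]:
        # Placeholder value; a real plugin would describe its own settings.
        return {"hadoop_version": "example"}

    def validate(self, cluster: Dict) -> None:
        if cluster.get("workers", 0) < 1:
            raise ValueError("a cluster needs at least one worker node")

    def launch_cluster(self, cluster: Dict) -> List[str]:
        self.validate(cluster)
        workers = [f"worker-{i}" for i in range(1, cluster["workers"] + 1)]
        self.nodes = ["master"] + workers
        return list(self.nodes)

    def add_nodes(self, count: int) -> List[str]:
        # The master occupies index 0, so len(nodes) is the next worker number.
        start = len(self.nodes)
        new = [f"worker-{i}" for i in range(start, start + count)]
        self.nodes.extend(new)
        return new

    def remove_nodes(self, count: int) -> List[str]:
        # Only workers may be removed, and at least one node must be requested.
        if count < 1 or count > len(self.nodes) - 1:
            raise ValueError("invalid number of worker nodes to remove")
        removed = self.nodes[-count:]
        del self.nodes[-count:]
        return removed


plugin = FakePlugin()
launched = plugin.launch_cluster({"workers": 2})
added = plugin.add_nodes(1)
removed = plugin.remove_nodes(2)
```

Keeping validation inside `launch_cluster` means a caller cannot start a cluster from parameters the plugin has rejected, which matches the validate-then-launch order shown on the Provisioning Interaction slide.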