Savanna - Hadoop on OpenStack
Sergey Lukjanov, Savanna Technical Lead
Mirantis, 2013
Agenda
● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization
Savanna - Elastic Hadoop on OpenStack
● Open source native OpenStack component
● Supports different Hadoop distributions
● Addresses both the bare cluster provisioning use case and "analytics as a service"
● Managed through REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates of Hadoop configurations
Savanna - Elastic Hadoop on OpenStack
● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - savanna-all@lists.launchpad.net
● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
Savanna - Participants
● Contributors:
○ large core team from Mirantis
○ teams from Red Hat, Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers
Savanna Use Cases
● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads
Administrators Use Case
● Central point of control over infrastructure
● Enables self-service capabilities, including choice of the Hadoop distribution to use
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks
Dev and QA Use Cases
● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to data from production
Analytics Use Cases
● Simplified task execution: the complexity of provisioning and managing a cluster is hidden under the hood
○ Access to higher level interfaces (e.g. pig, hive)
● Bursty workloads: ad-hoc queries that need significant resources only for a short period
● Utilization of free IaaS capacity for Hadoop tasks
Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard
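A configuration template can be pictured as a set of defaults that a user overrides per cluster. The following is a minimal sketch under that assumption; the template layout, the section names, and `merge_configs` are hypothetical and are not taken from Savanna's code. The two Hadoop property names are ordinary Hadoop 1.x settings used only as sample values.

```python
from copy import deepcopy


def merge_configs(base, overrides):
    """Return a new dict: `base` with `overrides` applied recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge them key by key.
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical template: section and field names are illustrative only.
BASE_TEMPLATE = {
    "node_count": 4,
    "hdfs": {"dfs.replication": 3},
    "mapreduce": {"mapred.tasktracker.map.tasks.maximum": 2},
}

# A user asks for a smaller test cluster with lower replication.
USER_OVERRIDES = {"node_count": 2, "hdfs": {"dfs.replication": 1}}

cluster_config = merge_configs(BASE_TEMPLATE, USER_OVERRIDES)
```

The base template is left untouched, so the same template can back many clusters.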
Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for the HDFS data reliability issue
○ Configurable DataNode (DN) storage location
○ Configurable topology of DataNodes (DN), NameNode (NN), TaskTrackers (TT), and JobTracker (JT)
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management tooling
● Basic monitoring support
Phase 3 - Analytics as a Service
● API to execute Map/Reduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
Roadmap for Hadoop in Cloud
Phase 1 [Released - April 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
Phase 3 [Planned - October 15]
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
Further Roadmap
● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support
Architecture Overview
[Architecture diagram. Components shown:
● Savanna: Savanna Python Client, REST API, Cluster Configuration Manager, Auth, DAL, Provisioning Plugin, Instance Interop Helper, Image Registry
● OpenStack services: Horizon (with Savanna Pages), Keystone, Nova, Glance, Swift
● Four Hadoop VMs]
Hadoop vs. Virtualization
● HDFS Reliability
● Data Persistence
● I/O Performance
● etc.
HDFS Reliability: the issue
[Diagram: two Compute hosts, each running three DN VMs, and a Data Block placed on the DNs.]
HDFS Reliability: single DN per host
[Diagram: three Compute hosts running DN and TT | DN VMs that belong to Cluster A and Cluster B.]
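The reliability concern with several DataNode VMs per physical host, and the effect of limiting a cluster to one DN per host, can be sketched with a small check. This is an illustrative model only: the host names, DataNode names, and both helper functions are assumptions made for the example, not part of Savanna or HDFS.

```python
def hosts_holding_block(replica_datanodes, datanode_to_host):
    """Return the set of physical hosts that hold a replica of one block."""
    return {datanode_to_host[dn] for dn in replica_datanodes}


def survives_host_failure(replica_datanodes, datanode_to_host):
    """True if losing any single host still leaves at least one replica."""
    return len(hosts_holding_block(replica_datanodes, datanode_to_host)) > 1


# Several DN VMs per host: all three replicas can land on one machine.
stacked = {
    "dn1": "compute-1", "dn2": "compute-1", "dn3": "compute-1",
    "dn4": "compute-2", "dn5": "compute-2", "dn6": "compute-2",
}

# One DN VM per host: distinct DataNodes imply distinct machines.
single = {"dn1": "compute-1", "dn2": "compute-2", "dn3": "compute-3"}

# HDFS sees three different DataNodes in both layouts.
replicas = ["dn1", "dn2", "dn3"]

stacked_is_safe = survives_host_failure(replicas, stacked)
single_is_safe = survives_host_failure(replicas, single)
```

In the stacked layout the three replicas share one machine, so a single host failure removes every copy of the block.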
HDFS Reliability: HADOOP-8468
Hypervisor awareness for the HDFS scheduler
[Diagram: three Compute hosts, each running two DN VMs that together form one HDFS; a Data Block is marked on the DNs.]
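One way to picture hypervisor awareness is a topology path that carries an extra component for the physical host, so that VMs on the same hypervisor fall into one group. The sketch below assumes that representation and uses a deliberately simple greedy selection; it is not the actual HDFS block placement policy, and the path format, node names, and function names are hypothetical.

```python
def topology_path(rack, hypervisor):
    """Build a two-level path: rack, then the physical host group."""
    return f"/{rack}/{hypervisor}"


def pick_replica_targets(node_paths, count):
    """Greedily pick up to `count` nodes, no two in the same host group."""
    chosen, used_groups = [], set()
    for node, path in node_paths.items():
        if path not in used_groups:
            chosen.append(node)
            used_groups.add(path)
        if len(chosen) == count:
            break
    return chosen


# Six DN VMs spread over three hypervisors in one rack (illustrative).
node_paths = {
    "dn1": topology_path("rack1", "compute-1"),
    "dn2": topology_path("rack1", "compute-1"),
    "dn3": topology_path("rack1", "compute-2"),
    "dn4": topology_path("rack1", "compute-2"),
    "dn5": topology_path("rack1", "compute-3"),
    "dn6": topology_path("rack1", "compute-3"),
}

targets = pick_replica_targets(node_paths, 3)
```

With the group information available, the three replicas end up on three different physical hosts even though each host runs two DataNodes.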
HDFS Reliability: HADOOP-8545
Enables Swift for Hadoop
[Diagram: Swift supplies the initial input to Hadoop Job #1 and receives the final output of Hadoop Job #N; HDFS sits between Job #1, Job #2, ... Job #N.]
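The data flow in this slide, with Swift holding the initial input and final output and HDFS holding what passes between jobs, can be sketched as a simple planning function. The container name, the NameNode address, the stage naming, and `plan_job_chain` itself are assumptions made for illustration.

```python
def plan_job_chain(job_count, swift_in, swift_out, hdfs_tmp):
    """Return (input_uri, output_uri) for each job in a linear chain."""
    if job_count < 1:
        raise ValueError("job_count must be at least 1")
    plan = []
    current_input = swift_in
    for index in range(1, job_count + 1):
        is_last = index == job_count
        # Only the last job writes back to Swift; others write to HDFS.
        output = swift_out if is_last else f"{hdfs_tmp}/stage-{index}"
        plan.append((current_input, output))
        current_input = output
    return plan


# Placeholder URIs: container, provider and host names are made up.
plan = plan_job_chain(
    3,
    swift_in="swift://demo.savanna/input",
    swift_out="swift://demo.savanna/output",
    hdfs_tmp="hdfs://namenode:8020/tmp/chain",
)
```

Keeping intermediate data on HDFS means only the first read and the last write cross over to object storage.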
Configurable topology of DN, NN, TT, JT
● Master node(s)
● Worker nodes
[Diagram: master boxes JT | NN, JT, NN; worker boxes TT, TT | DN, DN; counts 10, 6, 8.]
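A configurable topology needs some validation before a cluster is launched. The sketch below assumes a Hadoop 1.x style cluster with exactly one NameNode and one JobTracker and at least one DataNode and TaskTracker; the node group structure, group names, and `validate_topology` are hypothetical.

```python
def validate_topology(node_groups):
    """Return a list of problems; an empty list means the layout is usable."""
    problems = []
    totals = {"NN": 0, "JT": 0, "DN": 0, "TT": 0}
    for group in node_groups:
        for process in group["processes"]:
            if process not in totals:
                problems.append(f"{group['name']}: unknown process {process}")
            else:
                totals[process] += group["count"]
    # Assumption: a single NameNode and a single JobTracker per cluster.
    for master in ("NN", "JT"):
        if totals[master] != 1:
            problems.append(
                f"expected exactly one {master}, found {totals[master]}"
            )
    for worker in ("DN", "TT"):
        if totals[worker] < 1:
            problems.append(f"expected at least one {worker}")
    return problems


# Masters combined on one node, workers running both TT and DN.
combined = [
    {"name": "master", "processes": ["JT", "NN"], "count": 1},
    {"name": "worker", "processes": ["TT", "DN"], "count": 4},
]

# Masters split, with separate compute-only and storage-only workers.
split = [
    {"name": "namenode", "processes": ["NN"], "count": 1},
    {"name": "jobtracker", "processes": ["JT"], "count": 1},
    {"name": "compute", "processes": ["TT"], "count": 2},
    {"name": "storage", "processes": ["DN"], "count": 3},
]

# Invalid: two NameNodes and no JobTracker.
broken = [
    {"name": "master", "processes": ["NN"], "count": 2},
    {"name": "worker", "processes": ["TT", "DN"], "count": 1},
]
```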
HDFS Placement Options
● Ephemeral drive
○ /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
● Block storage volume
○ Cinder Volume -> /mnt/volume
● Bare hard drive support
○ /dev/sdb -> /mnt/sdb
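Each placement option ends up as a mount point inside the VM, which a DataNode can then use as a storage directory. The sketch below maps the three mount points from the slide to a value for `dfs.data.dir`, the Hadoop 1.x property that takes a comma-separated list of DataNode directories. The option keys, the `hdfs/data` subdirectory, and the function names are assumptions for illustration.

```python
# Mount points as listed on the slide; the dict keys are made-up labels.
PLACEMENT_MOUNTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare_drive": "/mnt/sdb",
}


def datanode_data_dirs(placements, subdir="hdfs/data"):
    """Return a comma-separated list of DataNode storage directories."""
    dirs = []
    for placement in placements:
        if placement not in PLACEMENT_MOUNTS:
            raise ValueError(f"unknown placement option: {placement}")
        dirs.append(f"{PLACEMENT_MOUNTS[placement]}/{subdir}")
    return ",".join(dirs)


def hdfs_site_property(placements):
    """Return the hdfs-site.xml setting as a one-entry dict."""
    return {"dfs.data.dir": datanode_data_dirs(placements)}


volume_only = hdfs_site_property(["volume"])
mixed = hdfs_site_property(["ephemeral", "volume"])
```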
Q&A
We are hiring!
Phase 1 deployment mechanism
[Diagram: Savanna acts on four Hadoop VMs in two steps: "Provision VMs with pre-installed Hadoop" and "Configure Hadoop Cluster".]
Tool usage scenarios
● Scenario I: the Tool manages a Hadoop cluster running on existing Hadoop VMs
● Scenario II: the Tool provisions and manages the Hadoop cluster on plain VMs
Extensible Provisioning
● Plugin
○ get extra configs
○ validate input
○ launch/terminate cluster
○ add/remove nodes
● Instance Interop
○ launch/terminate VMs
○ get VM status
○ ssh/scp to VM
● Image registry
○ register image in Savanna
○ add/remove tags
○ get image by tag
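The plugin operations listed on this slide suggest a small contract that each Hadoop distribution could implement. The following is a sketch of such a contract, not Savanna's actual plugin interface: the class names, method signatures, and the dict-based cluster are all hypothetical.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific settings a user may supply."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the requested cluster is not acceptable."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Bring the cluster up."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Shut the cluster down."""

    @abstractmethod
    def add_nodes(self, cluster, count):
        """Grow the cluster by `count` nodes."""

    @abstractmethod
    def remove_nodes(self, cluster, count):
        """Shrink the cluster by `count` nodes."""


class InMemoryPlugin(ProvisioningPlugin):
    """Toy implementation that only tracks state in a dict."""

    def get_extra_configs(self):
        return {"dfs.replication": 3}

    def validate(self, cluster):
        if cluster.get("nodes", 0) < 1:
            raise ValueError("a cluster needs at least one node")

    def launch_cluster(self, cluster):
        self.validate(cluster)
        cluster["status"] = "active"

    def terminate_cluster(self, cluster):
        cluster["status"] = "terminated"

    def add_nodes(self, cluster, count):
        cluster["nodes"] += count

    def remove_nodes(self, cluster, count):
        if cluster["nodes"] - count < 1:
            raise ValueError("cannot remove the last node")
        cluster["nodes"] -= count


cluster = {"nodes": 3}
plugin = InMemoryPlugin()
plugin.launch_cluster(cluster)
plugin.add_nodes(cluster, 2)
```

Keeping distribution-specific logic behind one contract is what lets the core service stay unaware of vendor tooling.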
Provisioning Interaction
[Diagram: interactions between User, Savanna, and Plugin: get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes.]
Provisioning: Launching a Cluster
[Diagram: the Plugin gets an image by tag from the Image Registry, launches VMs through the Instance Interop Helper, then installs and configures Hadoop on four Hadoop VMs by passing commands via ssh and scp.]
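The launch sequence can be sketched end to end with in-memory stand-ins for the Image Registry and the Instance Interop Helper. Nothing here calls Nova, Glance, or ssh: the classes, method names, image id, and the `configure-hadoop` command are placeholders invented for the example.

```python
class ImageRegistry:
    """In-memory stand-in: maps image ids to sets of tags."""

    def __init__(self):
        self._tags = {}

    def register_image(self, image_id, tags=()):
        self._tags[image_id] = set(tags)

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_image_by_tag(self, tag):
        for image_id, tags in self._tags.items():
            if tag in tags:
                return image_id
        raise LookupError(f"no image tagged {tag!r}")


class InstanceInterop:
    """Stand-in that records actions instead of touching real VMs."""

    def __init__(self):
        self.log = []

    def launch_vms(self, image_id, count):
        self.log.append(("launch", image_id, count))
        return [f"vm-{i}" for i in range(1, count + 1)]

    def run_command(self, vm, command):
        # A real helper would pass the command over ssh.
        self.log.append(("ssh", vm, command))


def launch_cluster(registry, interop, tag, node_count):
    """Follow the slide: get image by tag, launch VMs, configure Hadoop."""
    image_id = registry.get_image_by_tag(tag)
    vms = interop.launch_vms(image_id, node_count)
    for vm in vms:
        interop.run_command(vm, "configure-hadoop")  # placeholder command
    return vms


registry = ImageRegistry()
registry.register_image("img-123", ["hadoop"])
interop = InstanceInterop()
vms = launch_cluster(registry, interop, "hadoop", 2)
```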
Q&A
We are hiring!

More Related Content

What's hot

Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno EditionOpenStack Foundation
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackMirantis
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...DataWorks Summit
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Saharaspinningmatt
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)Nicolas Poggi
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...Spark Summit
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Summit
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeDataWorks Summit
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaDataStax Academy
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesYousun Jeong
 

What's hot (20)

Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno Edition
 
Savanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStackSavanna: Hadoop on OpenStack
Savanna: Hadoop on OpenStack
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
 
20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Feeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and KafkaFeeding Cassandra with Spark-Streaming and Kafka
Feeding Cassandra with Spark-Streaming and Kafka
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 

Viewers also liked

Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap StructureOpen Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap StructureOpen Data Center Alliance
 
Product Release Road-map Guide
Product Release Road-map GuideProduct Release Road-map Guide
Product Release Road-map GuideBim Akinfenwa
 
WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2
 
Metalnox Product Overview
Metalnox Product OverviewMetalnox Product Overview
Metalnox Product OverviewDan Barefoot
 
Share point 2010 roadmap
Share point 2010 roadmapShare point 2010 roadmap
Share point 2010 roadmapctc TrainCanada
 
Roadmap for successful IT budgeting
Roadmap for successful IT budgetingRoadmap for successful IT budgeting
Roadmap for successful IT budgetingAbsoft Limited
 
Mobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devicesMobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devicesNuxeo
 
Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014Nuxeo
 
Gemtalk Product Roadmap
Gemtalk Product RoadmapGemtalk Product Roadmap
Gemtalk Product RoadmapESUG
 
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
Mr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in indiaMr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in india
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in indiadhaval2929
 
Introduction to GreenTouch
Introduction to GreenTouchIntroduction to GreenTouch
Introduction to GreenTouchgreentouch-org
 
New Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best PracticesNew Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best Practicessarjanacoid
 
Reverse Engineering for exploit writers
Reverse Engineering for exploit writersReverse Engineering for exploit writers
Reverse Engineering for exploit writersamiable_indian
 
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...Puppet
 
Change Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic TemplateChange Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic Templatedmdk12
 
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...Puppet
 
Tesla roadster
Tesla roadsterTesla roadster
Tesla roadsterdmyers1
 
Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap Alison J. Herzog, MBA
 

Viewers also liked (20)

Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap StructureOpen Data Center Alliance Workgroups, Usage Models and Roadmap Structure
Open Data Center Alliance Workgroups, Usage Models and Roadmap Structure
 
Product Release Road-map Guide
Product Release Road-map GuideProduct Release Road-map Guide
Product Release Road-map Guide
 
WSO2 Quarterly Technical Update
WSO2 Quarterly Technical UpdateWSO2 Quarterly Technical Update
WSO2 Quarterly Technical Update
 
Metalnox Product Overview
Metalnox Product OverviewMetalnox Product Overview
Metalnox Product Overview
 
Share point 2010 roadmap
Share point 2010 roadmapShare point 2010 roadmap
Share point 2010 roadmap
 
Roadmap for successful IT budgeting
Roadmap for successful IT budgetingRoadmap for successful IT budgeting
Roadmap for successful IT budgeting
 
Mobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devicesMobile ECM: Using the Nuxeo Platform from mobile devices
Mobile ECM: Using the Nuxeo Platform from mobile devices
 
Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014Technical roadmap 2015 - Nuxeo Tour 2014
Technical roadmap 2015 - Nuxeo Tour 2014
 
Windows azure overview
Windows azure overviewWindows azure overview
Windows azure overview
 
Gemtalk Product Roadmap
Gemtalk Product RoadmapGemtalk Product Roadmap
Gemtalk Product Roadmap
 
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
Mr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in indiaMr. Ravi Shankar Gopal |  Roadmap for  growth in nonwovens  industry  in india
Mr. Ravi Shankar Gopal | Roadmap for growth in nonwovens industry in india
 
Introduction to GreenTouch
Introduction to GreenTouchIntroduction to GreenTouch
Introduction to GreenTouch
 
New Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best PracticesNew Products - Template and Roadmap Best Practices
New Products - Template and Roadmap Best Practices
 
Reverse Engineering for exploit writers
Reverse Engineering for exploit writersReverse Engineering for exploit writers
Reverse Engineering for exploit writers
 
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
PuppetConf 2016: A Roadmap for a Platform: Mixing Metaphors for Fun and Profi...
 
Asap roadmap
Asap roadmapAsap roadmap
Asap roadmap
 
Change Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic TemplateChange Presented ad A Project Roadmap: Infographic Template
Change Presented ad A Project Roadmap: Infographic Template
 
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
PuppetConf 2016: Can You Manage Me Now? Humanizing Configuration Management a...
 
Tesla roadster
Tesla roadsterTesla roadster
Tesla roadster
 
Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap Mapping the Experience: How to Plan a Career Roadmap
Mapping the Experience: How to Plan a Career Roadmap
 

Similar to Savanna - Elastic Hadoop on OpenStack

5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in ProductionSean Cohen
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaMopuru Babu
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith SharmaNewton Alex
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...Wong Hoi Sing Edison
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?DataWorks Summit
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Sap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSerge Pagop
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개Seungdon Choi
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureLynn Langit
 

Similar to Savanna - Elastic Hadoop on OpenStack (20)

Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production3-2-1 Action! Running OpenStack Shared File System Service in Production
3-2-1 Action! Running OpenStack Shared File System Service in Production
 
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scalaSunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
Sunshine consulting mopuru babu cv_java_j2ee_spring_bigdata_scala
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
[HKOSCON][20180616][Containerized High Availability Virtual Hosting Deploymen...
 
Resume_VipinKP
Resume_VipinKPResume_VipinKP
Resume_VipinKP
 
What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?What's the Hadoop-la about Kubernetes?
What's the Hadoop-la about Kubernetes?
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Sap integration with_j_boss_technologies
Sap integration with_j_boss_technologiesSap integration with_j_boss_technologies
Sap integration with_j_boss_technologies
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 
Pivotal HAWQ 소개
Pivotal HAWQ 소개Pivotal HAWQ 소개
Pivotal HAWQ 소개
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
TechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANATechTalkThai webinar SAP HANA
TechTalkThai webinar SAP HANA
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
 

Savanna - Elastic Hadoop on OpenStack

  • 1. Savanna - Hadoop on OpenStack. Mirantis, 2013. Sergey Lukjanov, Savanna Technical Lead
  • 2. Agenda ● Savanna Overview ● Savanna Use Cases ● Roadmap & Current Status ● Architecture & Features Overview ● Hadoop vs. Virtualization
  • 4. Savanna - Elastic Hadoop on OpenStack ● Open source native OpenStack component ● Supports different Hadoop distributions ● Solves both the bare cluster provisioning use case and "analytics as a service" ● Managed through REST API ● Web UI as part of the OpenStack Dashboard ● Flexible templates of Hadoop configurations
  • 5. Savanna - Elastic Hadoop on OpenStack ● Project home - https://launchpad.net/savanna ○ bug tracking ○ blueprints ○ answers ● Code review (gerrit) - https://review.openstack.org ● Sources - https://github.com/stackforge/savanna ● Mailing list - savanna-all@lists.launchpad.net ● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
  • 6. Savanna - Participants ● Contributors: ○ large core team from Mirantis ○ teams from Red Hat, Hortonworks ○ several minor contributors ● Intel joined recently ● Several upcoming customers
  • 8. Savanna Use Cases ● Administrators - centralized cluster management and monitoring ● Dev and QA teams - fast cluster provisioning ● Data Scientists/Analysts - API to run analytic jobs with infrastructure provisioning happening under the hood ● Making resources dedicated to the IaaS cloud available for Hadoop workloads
  • 9. Administrators Use Case ● Central point of control over infrastructure ● Enables self-service capabilities, including choice of Hadoop distribution ● Integration with vendor tooling: ○ Ambari for Apache/Hortonworks ○ Cloudera Management Console ○ Intel Hadoop ● Utilization of free IaaS capacity for Hadoop tasks
  • 10. Dev and QA Use Cases ● Fast on-demand provisioning of environments ● Increased agility and speed of innovation ● Controlled access to data from production
  • 11. Analytics Use Cases ● Simplified task execution - complexity of provisioning and managing the cluster hidden under the hood ○ Access to higher-level interfaces (e.g. Pig, Hive) ● Bursty workloads: ad-hoc queries requiring significant resources for only a short time ● Utilization of free IaaS capacity for Hadoop tasks
  • 13. Roadmap for Hadoop in Cloud ● Phase 1: Basic cluster provisioning of Apache Hadoop ● Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.) ● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 14. Phase 1 - Basic Cluster Operation ● Cluster provisioning ● Deployment Engine implementation for pre-installed images ● Templates for Hadoop cluster configuration ● REST API for cluster startup and operations ● Web UI integrated into OpenStack Dashboard
  • 15. Roadmap for Hadoop in Cloud ● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop ● Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.) ● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 16. Phase 2 - Advanced Configuration ● Hadoop cluster configuration support: ○ Solutions for HDFS data reliability issue ○ Configurable DN storage location ○ Configurable topology of DN, NN, TT, JT ○ Add/remove nodes ○ More Hadoop parameters ● Integration with vendor deployment/management tooling ● Basic monitoring support
  • 17. Roadmap for Hadoop in Cloud ● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop ● Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.) ● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 18. Phase 3 - Analytics as a Service ● API to execute Map/Reduce jobs without exposing details of underlying infrastructure (similar to AWS EMR) ● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
  • 19. Roadmap for Hadoop in Cloud ● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop ● Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.) ● Phase 3 [Planned - October 15]: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 20. Further Roadmap ● Autoscaling ● HA for NameNode ● Deeper HDFS and Swift integration ○ Caching of Swift data on HDFS ● Integration with logging and error handling ● HBase support
  • 23. Hadoop vs. Virtualization ● HDFS Reliability ● Data Persistence ● I/O Performance ● etc.
  • 27. HDFS Reliability: the issue [Diagram labels: Compute (×2), DN (×6), Data Block]
  • 30. HDFS Reliability: single DN per host [Diagram labels: Compute (×3), DN, TT | DN, Cluster A, Cluster B]
  • 31. HDFS Reliability: Hadoop-8468 hypervisor-awareness for HDFS scheduler [Diagram labels: Compute (×3), DN (×6), HDFS Data Block]
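As I read slides 30 and 31, the concern is that several DataNode VMs can share one physical compute host, so replicas of a block should land on different hosts. A minimal Python sketch of that placement rule follows. The function name, the DN-to-host mapping, and the default of 3 replicas are assumptions made for illustration; this is not code from Savanna or from the Hadoop scheduler.

```python
from typing import Dict, List


def place_replicas(dn_to_host: Dict[str, str], replicas: int = 3) -> List[str]:
    """Pick DataNodes for one block so that no two share a physical host."""
    chosen: List[str] = []
    used_hosts = set()
    # Sorting is only here to keep the sketch deterministic.
    for dn in sorted(dn_to_host):
        host = dn_to_host[dn]
        if host in used_hosts:
            continue  # another replica already lives on this hypervisor
        chosen.append(dn)
        used_hosts.add(host)
        if len(chosen) == replicas:
            return chosen
    raise ValueError(
        "need %d distinct hosts, found %d" % (replicas, len(used_hosts))
    )


if __name__ == "__main__":
    # Hypothetical layout: six DN VMs spread over three compute hosts.
    layout = {
        "dn1": "compute-a", "dn2": "compute-a",
        "dn3": "compute-b", "dn4": "compute-b",
        "dn5": "compute-c", "dn6": "compute-c",
    }
    print(place_replicas(layout))
```

With fewer distinct hosts than requested replicas the sketch raises an error rather than silently co-locating replicas, which is one possible design choice among several.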
  • 32. HDFS Reliability: Hadoop-8545 enables Swift for Hadoop [Diagram labels: Swift, HDFS, Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N, initial input, final output]
  • 33. Configurable topology of DN, NN, TT, JT ● Master node(s) ● Worker nodes [Diagram labels: JT | NN, JT, NN, TT, TT | DN, DN; counts 10, 6, 8]
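A configurable topology can be pictured as a set of node groups, each listing the Hadoop processes it runs and how many instances it has. The Python sketch below checks such a description. The dictionary layout and the function names are hypothetical and are not Savanna's template schema; the rules (one NameNode, one JobTracker, at least one DataNode and TaskTracker) are assumptions based on a classic Hadoop 1.x cluster.

```python
from collections import Counter
from typing import Dict, List


def count_processes(template: Dict[str, dict]) -> Counter:
    """Total instances of each Hadoop process across all node groups."""
    totals: Counter = Counter()
    for group in template.values():
        for proc in group["processes"]:
            totals[proc] += group["count"]
    return totals


def validate(template: Dict[str, dict]) -> List[str]:
    """Return a list of problems; an empty list means the topology is usable."""
    totals = count_processes(template)
    errors = []
    for master in ("NN", "JT"):
        if totals[master] != 1:
            errors.append("expected exactly one %s, got %d" % (master, totals[master]))
    for worker in ("DN", "TT"):
        if totals[worker] < 1:
            errors.append("expected at least one %s" % worker)
    return errors


if __name__ == "__main__":
    # Hypothetical topology: a combined master plus mixed worker groups.
    template = {
        "master": {"processes": ["JT", "NN"], "count": 1},
        "workers": {"processes": ["TT", "DN"], "count": 10},
        "storage-only": {"processes": ["DN"], "count": 6},
    }
    print(validate(template))
```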
  • 34. HDFS Placement Options ● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral ● Block storage volume: Cinder Volume -> /mnt/volume ● Bare hard drive support: /dev/sdb -> /mnt/sdb
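The three placement options each end at a mount point inside the VM, which a DataNode can then use as a storage directory. A small Python sketch of turning a choice of options into a `dfs.data.dir` value (the Hadoop 1.x DataNode storage property) is shown below. The option keys, the `hdfs/data` subdirectory, and the function name are assumptions for illustration only; the mount points are the ones listed on the slide.

```python
import posixpath
from typing import Iterable

# Mount points taken from the slide; the dictionary keys are hypothetical.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def datanode_dirs(options: Iterable[str], subdir: str = "hdfs/data") -> str:
    """Build a comma-separated dfs.data.dir value for the chosen options."""
    dirs = []
    for option in options:
        if option not in MOUNT_POINTS:
            raise ValueError("unknown placement option: %s" % option)
        dirs.append(posixpath.join(MOUNT_POINTS[option], subdir))
    return ",".join(dirs)


if __name__ == "__main__":
    print(datanode_dirs(["ephemeral", "volume"]))
```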
  • 35. Q&A
  • 37. Phase 1 deployment mechanism [Diagram: Savanna and four Hadoop VMs; steps: Provision VMs with pre-installed Hadoop, Configure Hadoop Cluster]
  • 38. Tool usage scenarios ● Scenario I: Tool manages a Hadoop cluster on four Hadoop VMs ● Scenario II: Tool provisions and manages a Hadoop cluster on four VMs
  • 39. Extensible Provisioning ● Plugin: ○ get extra configs ○ validate input ○ launch/terminate cluster ○ add/remove nodes ● Savanna, Instance Interop: ○ launch/terminate VMs ○ get VM status ○ ssh/scp to VM ● Savanna, Image registry: ○ register image in Savanna ○ add/remove tags ○ get image by tag
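The plugin operations listed on this slide can be sketched as an abstract interface that each Hadoop distribution would implement. The Python below is a hedged illustration only: the class and method names are hypothetical and do not reproduce Savanna's actual plugin API, and `RecordingPlugin` is a toy implementation that just records which operations were called.

```python
import abc
from typing import List


class ProvisioningPlugin(abc.ABC):
    """Hypothetical plugin contract mirroring the slide's plugin operations."""

    @abc.abstractmethod
    def get_extra_configs(self) -> dict:
        """Return plugin-specific parameters a user may set."""

    @abc.abstractmethod
    def validate(self, cluster: dict) -> None:
        """Raise ValueError if the requested cluster is not acceptable."""

    @abc.abstractmethod
    def launch_cluster(self, cluster: dict) -> None:
        """Bring the cluster up."""

    @abc.abstractmethod
    def terminate_cluster(self, cluster: dict) -> None:
        """Tear the cluster down."""

    @abc.abstractmethod
    def add_nodes(self, cluster: dict, count: int) -> None:
        """Grow the cluster by count nodes."""

    @abc.abstractmethod
    def remove_nodes(self, cluster: dict, count: int) -> None:
        """Shrink the cluster by count nodes."""


class RecordingPlugin(ProvisioningPlugin):
    """Toy plugin that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def get_extra_configs(self) -> dict:
        self.calls.append("get_extra_configs")
        return {"replication": 3}  # illustrative parameter only

    def validate(self, cluster: dict) -> None:
        self.calls.append("validate")
        if cluster.get("nodes", 0) < 1:
            raise ValueError("cluster needs at least one node")

    def launch_cluster(self, cluster: dict) -> None:
        self.calls.append("launch_cluster")

    def terminate_cluster(self, cluster: dict) -> None:
        self.calls.append("terminate_cluster")

    def add_nodes(self, cluster: dict, count: int) -> None:
        self.calls.append("add_nodes")
        cluster["nodes"] += count

    def remove_nodes(self, cluster: dict, count: int) -> None:
        self.calls.append("remove_nodes")
        cluster["nodes"] -= count


if __name__ == "__main__":
    plugin = RecordingPlugin()
    cluster = {"nodes": 2}
    plugin.validate(cluster)
    plugin.launch_cluster(cluster)
    plugin.add_nodes(cluster, 3)
    print(cluster, plugin.calls)
```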
  • 40. Provisioning Interaction [Sequence diagram between User, Savanna, and Plugin; messages: get extra parameters, get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes]
  • 41. Provisioning: Launching a Cluster [Diagram: Plugin, Image Registry, Instance Interop Helper, four Hadoop VMs; steps: get image by tag, launch VMs, install and configure Hadoop, pass commands via ssh, scp]
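The launch steps named on this slide (get image by tag, launch VMs, then configure Hadoop over ssh) can be sketched as a short orchestration function. Everything in the Python below is a stand-in written for illustration: `ImageRegistry` is an in-memory dictionary, `FakeInterop` only logs what it is asked to do, and the `configure-hadoop` command string is a placeholder. None of it is Savanna code.

```python
from typing import Dict, List, Set, Tuple


class ImageRegistry:
    """In-memory stand-in: register images, tag them, look them up by tag."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = {}

    def register(self, image_id: str) -> None:
        self._tags.setdefault(image_id, set())

    def add_tag(self, image_id: str, tag: str) -> None:
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id: str, tag: str) -> None:
        self._tags[image_id].discard(tag)

    def get_by_tag(self, tag: str) -> str:
        for image_id in sorted(self._tags):
            if tag in self._tags[image_id]:
                return image_id
        raise LookupError("no image tagged %r" % tag)


class FakeInterop:
    """Stand-in for the instance helper; it records actions instead of doing them."""

    def __init__(self) -> None:
        self.log: List[Tuple] = []

    def launch_vms(self, image_id: str, count: int) -> List[str]:
        self.log.append(("launch", image_id, count))
        return ["vm-%d" % i for i in range(count)]

    def run(self, vm: str, command: str) -> None:
        self.log.append(("ssh", vm, command))


def launch_cluster(registry: ImageRegistry, interop: FakeInterop,
                   tag: str, count: int) -> List[str]:
    """Follow the slide's order: find image, launch VMs, configure each VM."""
    image_id = registry.get_by_tag(tag)
    vms = interop.launch_vms(image_id, count)
    for vm in vms:
        interop.run(vm, "configure-hadoop")  # placeholder command
    return vms


if __name__ == "__main__":
    registry = ImageRegistry()
    registry.register("img-1")
    registry.add_tag("img-1", "hadoop")
    interop = FakeInterop()
    print(launch_cluster(registry, interop, "hadoop", 4))
```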
  • 42. Q&A