© MIRANTIS 2013
The State of OpenStack
Data Processing: Sahara,
Now and in Juno
Sergey Lukjanov (Mirantis)
Matthew Farrellee (Red Hat)
John Speidel (Hortonworks)
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing
stack and associated management interfaces.
• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs
Hadoop - Big Data Platform
© http://hortonworks.com/hadoop/yarn/
Trends
http://www.google.com/trends/
Use cases
• Self-service provisioning of Hadoop clusters
• Utilization of unused compute capacity for
bursty workloads
• Dev -> Stage -> Prod lifecycle
• Run Hadoop workloads in a few clicks without
expertise in Hadoop ops
Architecture overview
[Architecture diagram. Components shown: Horizon, Savanna Pages, Savanna Python Client, REST API, Keystone, Auth, Cluster Configuration Manager, Resources Orchestration Manager, Job Manager, Job Sources, Data Sources, Data Access Layer, Vendors Plugins, Hadoop VMs, Swift, Heat, Nova, Glance, Cinder, Neutron, Trove DB]
Sahara status
• Official integrated OpenStack project
• Supported Hadoop distros:
• Vanilla Apache Hadoop
• Hortonworks Data Platform
• Intel Distribution
• Cloudera Distribution in blueprint
• Included in OpenStack distros:
• RDO - openstack.redhat.com
• Mirantis OpenStack - software.mirantis.com
Contributors
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
Icehouse release
• 142 bugs fixed
• 57 blueprints
• 32 people
• Standard process
• Dozens more in the client!
• Tempest helps us manage our API
• Sahara easily deployed with DevStack
• Hadoop 2 available via all plugins (© http://hortonworks.com/hadoop/yarn/)
• HBase (and Sqoop) available via HDP plugin
• Spark images w/ diskimage-builder (full plugin in review)
• Heat for provisioning
• i18n translation started
• Neutron namespaces w/ rootwrap
• Guest agent implementation started
Elastic Data Processing (EDP) is Sahara’s take on
data processing workflow management.
Goal - let end users (those w/ high value questions
to answer) get answers about data without having
to know a single thing about cluster management.
“Customers launch millions of Amazon EMR clusters every year.”
http://aws.amazon.com/elasticmapreduce/
Elastic Data Processing update
• Available with the Hortonworks Data Platform plugin
• Support for external HDFS
• MapReduce.Streaming and Java actions
• Job relaunch, with new data and parameters
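The relaunch idea can be illustrated with a small sketch. This is not Sahara code: the DataSource and JobRun classes, the relaunch helper, and the swift:// URLs are hypothetical names chosen for illustration. The sketch only shows the shape of the feature, where the same job is run again while its input, output, and parameters are swapped out.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for a named input or output location."""
    name: str
    url: str


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one execution of a job."""
    job_name: str
    input: DataSource
    output: DataSource
    params: Dict[str, str] = field(default_factory=dict)


def relaunch(run: JobRun, **changes) -> JobRun:
    """Return a new run of the same job with selected fields replaced."""
    return replace(run, **changes)


first = JobRun(
    job_name="wordcount",
    input=DataSource("jan-logs", "swift://logs/2014-01"),
    output=DataSource("jan-counts", "swift://results/2014-01"),
    params={"reducers": "4"},
)

# Same job, new data and new parameters; the first run is left untouched.
second = relaunch(
    first,
    input=DataSource("feb-logs", "swift://logs/2014-02"),
    output=DataSource("feb-counts", "swift://results/2014-02"),
    params={"reducers": "8"},
)

print(second.job_name, second.input.url, second.params)
```

Keeping the run record immutable and producing a new one on relaunch is a design choice of this sketch; it makes it easy to keep a history of runs side by side.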
Command line interface overview
If you can do it with the Dashboard, you
can do it from the command-line
Blueprint: python-savannaclient-cli
Command line interface overview
Image management
$ sahara
...
Positional arguments:
<subcommand>
image-add-tag Add a tag to an image.
image-list Print a list of available images.
image-register Register an image from the Image index.
image-remove-tag Remove a tag from an image.
image-show Show details of an image.
image-unregister Unregister an image.
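As a rough mental model of what these subcommands manage, the sketch below keeps a registry of images and their tags in memory. It is an illustration only, not the client's implementation: the ImageRegistry and RegisteredImage classes, the username field, and the example tag values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class RegisteredImage:
    """Hypothetical record for a registered image."""
    image_id: str
    username: str  # assumed: login user baked into the image
    tags: Set[str] = field(default_factory=set)


class ImageRegistry:
    """In-memory model mirroring the image-* subcommands listed above."""

    def __init__(self) -> None:
        self._images: Dict[str, RegisteredImage] = {}

    def register(self, image_id: str, username: str) -> RegisteredImage:
        image = RegisteredImage(image_id, username)
        self._images[image_id] = image
        return image

    def unregister(self, image_id: str) -> None:
        del self._images[image_id]

    def add_tag(self, image_id: str, tag: str) -> None:
        self._images[image_id].tags.add(tag)

    def remove_tag(self, image_id: str, tag: str) -> None:
        self._images[image_id].tags.discard(tag)

    def show(self, image_id: str) -> RegisteredImage:
        return self._images[image_id]

    def list_images(self, required_tags: Iterable[str] = ()) -> List[RegisteredImage]:
        # Keep only images that carry every requested tag.
        wanted = set(required_tags)
        return [img for img in self._images.values() if wanted <= img.tags]


registry = ImageRegistry()
registry.register("img-1", "ubuntu")
registry.add_tag("img-1", "vanilla")
registry.add_tag("img-1", "hadoop-2")
registry.register("img-2", "cloud-user")
registry.add_tag("img-2", "hdp")

matches = [img.image_id for img in registry.list_images({"vanilla", "hadoop-2"})]
print(matches)
```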
Command line interface overview
Node group, cluster and job templates
$ sahara
node-group-template-create Create a node group...
node-group-template-delete Delete a node group...
node-group-template-list Print a list of available...
node-group-template-show Show details of a node...
cluster-template-create Create a cluster template.
cluster-template-delete Delete a cluster template.
cluster-template-list Print a list of available...
cluster-template-show Show details of a cluster...
job-template-create Create a job template.
job-template-delete Delete a job template.
job-template-list Print a list of job...
job-template-show Show details of a job...
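To make the relationship between the two template kinds concrete, here is a minimal sketch in which a cluster template refers to node group templates by name and gives each a count. The dictionary layout and every field name and value below are illustrative assumptions, not the format the client or the REST API actually expects.

```python
from collections import Counter
from typing import Dict

# Hypothetical node group templates: what runs on one kind of node.
node_group_templates = {
    "master": {
        "flavor": "m1.medium",
        "node_processes": ["namenode", "resourcemanager"],
    },
    "worker": {
        "flavor": "m1.large",
        "node_processes": ["datanode", "nodemanager"],
    },
}

# Hypothetical cluster template: how many nodes of each kind to start.
cluster_template = {
    "name": "small-cluster",
    "node_groups": [
        {"template": "master", "count": 1},
        {"template": "worker", "count": 3},
    ],
}


def process_counts(cluster: dict, templates: dict) -> Dict[str, int]:
    """Count how many instances of each process the cluster would run."""
    counts: Counter = Counter()
    for group in cluster["node_groups"]:
        template = templates[group["template"]]
        for process in template["node_processes"]:
            counts[process] += group["count"]
    return dict(counts)


print(process_counts(cluster_template, node_group_templates))
```

Splitting the description this way means a node group template can be reused across clusters of different sizes by changing only the counts.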
Command line interface overview
Data sources and job binaries
$ sahara
...
<subcommand>
data-source-create Create a data source that provides
job input or receives job output.
data-source-delete Delete a data source.
data-source-list Print a list of available data...
data-source-show Show details of a data source.
job-binary-create Record a job binary.
job-binary-delete Delete a job binary.
job-binary-list Print a list of job binaries.
job-binary-show Show details of a job binary.
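A data source is essentially a named pointer to where job input or output lives. The sketch below builds such a record and checks that the URL matches the declared type. The make_data_source helper, the set of supported types, and the rule that the URL scheme must equal the type are assumptions made for this example, not behavior documented in the slides.

```python
from urllib.parse import urlparse

# Assumed for illustration: the two storage backends mentioned in this talk.
SUPPORTED_TYPES = {"swift", "hdfs"}


def make_data_source(name: str, source_type: str, url: str) -> dict:
    """Build a data source record, rejecting mismatched type and URL."""
    if source_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported data source type: {source_type}")
    scheme = urlparse(url).scheme
    if scheme != source_type:
        raise ValueError(f"URL scheme {scheme!r} does not match type {source_type!r}")
    return {"name": name, "type": source_type, "url": url}


job_input = make_data_source("raw-logs", "swift", "swift://logs/2014-05")
job_output = make_data_source("counts", "hdfs", "hdfs://namenode:8020/user/demo/counts")
print(job_input["type"], job_output["type"])
```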
Command line interface overview
Clusters and jobs
$ sahara
...
<subcommand>
cluster-create Create a cluster.
cluster-delete Delete a cluster.
cluster-list Print a list of available clusters.
cluster-show Show details of a cluster.
job-create Create a job.
job-delete Delete a job.
job-list Print a list of jobs.
job-show Show details of a job.
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
HDP Plugin Overview
• Full support for all Sahara functionality
• Nova and Neutron network
• Cluster Scaling
• Scale Up
• Swift Integration
• Cinder Support
• Data Locality
• EDP
• Apache Ambari REST APIs used for cluster
provisioning
• Monitoring/Management of clusters via Ambari
• Full support for multiple HDP stacks
• HDP pre-installed or generic VM images
HDP Plugin Stack Support
HDP 1.3.2 (Available)
● NameNode
● Secondary NameNode
● DataNode
● HDFS
● ZooKeeper
● Ambari Server/Agent
● HCatalog
● Sqoop
● Job Tracker
● Task Tracker
● MapReduce
● Hive
● MySQL
● Pig
● WebHCat Server
● Oozie
● Ganglia
● Nagios
● HBase
HDP 2.0.6 (Available)
● History Server
● MapReduce 2 / YARN
● Resource Manager
● YARN Client
HDP 2.1 (Coming Soon!)
● Storm
● Falcon
HDP 2.1 + (Roadmap)
● SOLR
● Cascading
HDP Disk Images
• Disk Image Builder offers a consistent approach for image creation
• HDP Plugin provides images and scripts for CentOS and RHEL:
• Plain
• 1.3.2
• 2.0.6
• 2.1 (coming soon)
• Pre-packaged images (1.3.2, 2.0.6) come with HDP packages pre-installed for accelerated provisioning and reduced network traffic
• Image Build Scripts allow images to be customized
• Security
• Custom Packages
• O/S Settings
Ambari Blueprints
• Two primary goals of Ambari Blueprints:
• Export a complete description of a
running cluster
• Provide API-based cluster installations based on
a self-contained cluster description
• Blueprints contain cluster topology and configuration
information
• Enable interesting use cases between physical and
virtual environments, including OpenStack/Sahara
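A blueprint can be pictured as a JSON document that names a stack and lists host groups with their components. The sketch below is a hedged illustration of that idea: the key names follow a common description of the Ambari blueprint layout but should be checked against the Ambari documentation, and the blueprint name, stack version, and component choices are made up for the example.

```python
import json
from typing import Dict, List

# Illustrative blueprint; all values are hypothetical.
blueprint = {
    "Blueprints": {
        "blueprint_name": "small-hdp",
        "stack_name": "HDP",
        "stack_version": "2.0.6",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        },
        {
            "name": "workers",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
    "configurations": [],
}


def topology(bp: dict) -> Dict[str, List[str]]:
    """Map each host group name to the components it would run."""
    return {
        group["name"]: [component["name"] for component in group["components"]]
        for group in bp["host_groups"]
    }


# The description is self-contained, so it survives a JSON round trip intact.
restored = json.loads(json.dumps(blueprint))
print(topology(restored))
```

Because the whole cluster description is one document, the same text could in principle be exported from one environment and replayed in another, which is the use case the slide points at.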
Agenda
• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans
Juno roadmap
• Further integration with OpenStack ecosystem:
• Distributed architecture
• Guest agents
• EDP enhancements
• Merge dashboard to Horizon
To be discussed and confirmed at Design Summit
Design Summit Sessions
7 Sessions: Thursday 1:30 - Friday 10:30
http://goo.gl/lQXtUS
Agenda
Q&A
Cluster and EDP workflows
[Diagram: workflow steps labeled by how often they occur: Rarely, Infrequently, Occasionally, Commonly, Occasionally, Frequently]

Atlanta OpenStack Summit: The State of OpenStack Data Processing: Sahara, Now and in Juno

  • 1. © MIRANTIS 2013 The State of OpenStack Data Processing: Sahara, Now and in Juno Sergey Lukjanov (Mirantis) Matthew Farrellee (Red Hat) John Speidel (Hortonworks)
  • 2. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  • 3. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  • 4. OpenStack Data Processing: Sahara Mission: To provide a scalable data processing stack and associated management interfaces. • provision and operate Hadoop clusters • schedule and operate Hadoop jobs
  • 5. Hadoop - Big Data Platform © http://hortonworks.com/hadoop/yarn/
  • 7. Use cases • Self-service provisioning of Hadoop clusters • Utilization of unused compute capacity for bursty workloads • Dev -> Stage -> Prod lifecycle • Run Hadoop workloads in a few clicks without expertise in Hadoop ops
  • 9. Sahara status • Official integrated OpenStack project • Supported Hadoop distros: • Vanilla Apache Hadoop • Hortonworks Data Platform • Intel Distribution • Cloudera Distribution in blueprint • Included in OpenStack distros: • RDO - openstack.redhat.com • Mirantis OpenStack - software.mirantis.com
  • 11. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  • 17. Icehouse release Tempest helps us manage our API
  • 18. Icehouse release Sahara easily deployed with DevStack
  • 19. Icehouse release Hadoop 2 available via all plugins © http://hortonworks.com/hadoop/yarn/
  • 20. Icehouse release • HBase (and Sqoop) available via HDP plugin • Spark images w/ diskimage-builder (full plugin in review) • Heat for provisioning • i18n translation started • Neutron namespaces w/ rootwrap • Guest agent implementation started
  • 21. Elastic Data Processing update Elastic Data Processing (EDP) is Sahara’s take on data processing workflow management. Goal - let end users (those w/ high value questions to answer) get answers about data without having to know a single thing about cluster management. “Customers launch millions of Amazon EMR clusters every year.” http://aws.amazon.com/elasticmapreduce/
  • 22. Elastic Data Processing update Available with the Hortonworks Data Platform plugin
  • 23. Elastic Data Processing update Support for external HDFS
  • 24. Elastic Data Processing update MapReduce.Streaming and Java actions
  • 25. Elastic Data Processing update Job relaunch, with new data and parameters
  • 26. Command line interface overview If you can do it with the Dashboard, you can do it from the command-line Blueprint: python-savannaclient-cli
  • 27. Command line interface overview Image management
      $ sahara ...
      Positional arguments: <subcommand>
        image-add-tag       Add a tag to an image.
        image-list          Print a list of available images.
        image-register      Register an image from the Image index.
        image-remove-tag    Remove a tag from an image.
        image-show          Show details of an image.
        image-unregister    Unregister an image.
  • 28. Command line interface overview Node group, cluster and job templates
      $ sahara
        node-group-template-create    Create a node group...
        node-group-template-delete    Delete a node group...
        node-group-template-list      Print a list of available...
        node-group-template-show      Show details of a node...
        cluster-template-create       Create a cluster template.
        cluster-template-delete       Delete a cluster template.
        cluster-template-list         Print a list of available...
        cluster-template-show         Show details of a cluster...
        job-template-create           Create a job template.
        job-template-delete           Delete a job template.
        job-template-list             Print a list of job...
        job-template-show             Show details of a job...
  • 29. Command line interface overview Data sources and job binaries
      $ sahara ... <subcommand>
        data-source-create    Create a data source that provides job input or receives job output.
        data-source-delete    Delete a data source.
        data-source-list      Print a list of available data...
        data-source-show      Show details of a data source.
        job-binary-create     Record a job binary.
        job-binary-delete     Delete a job binary.
        job-binary-list       Print a list of job binaries.
        job-binary-show       Show details of a job binary.
  • 30. Command line interface overview Clusters and jobs
      $ sahara ... <subcommand>
        cluster-create    Create a cluster.
        cluster-delete    Delete a cluster.
        cluster-list      Print a list of available clusters.
        cluster-show      Show details of a cluster.
        job-create
        job-delete        Delete a job.
        job-list          Print a list of jobs.
        job-show          Show details of a job.
  • 31. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  • 32. HDP Plugin Overview • Full support for all Sahara functionality • Nova and Neutron network • Cluster Scaling • Scale Up • Swift Integration • Cinder Support • Data Locality • EDP • Apache Ambari REST APIs used for cluster provisioning • Monitoring/Management of clusters via Ambari • Full support for multiple HDP stacks • HDP pre-installed or generic VM images
  • 33. HDP Plugin Stack Support
      HDP 1.3.2 (Available): ● NameNode ● Secondary NameNode ● DataNode ● HDFS ● ZooKeeper ● Ambari Server/Agent ● HCatalog ● Sqoop ● Job Tracker ● Task Tracker ● MapReduce ● Hive ● MySQL ● Pig ● WebHCat Server ● Oozie ● Ganglia ● Nagios ● HBase
      HDP 2.0.6 (Available): ● History Server ● MapReduce 2 / YARN ● Resource Manager ● YARN Client
      HDP 2.1 (Coming Soon!): ● Storm ● Falcon
      HDP 2.1+ (Roadmap): ● SOLR ● Cascading
  • 34. HDP Disk Images • Disk Image Builder offers consistent approach for image creation • HDP Plugin provides images and scripts for (CentOS, RHEL): • Plain • 1.3.2 • 2.0.6 • 2.1 (coming soon) • Pre-Packaged images (1.3.2, 2.0.6) provide images with HDP packages pre-installed for accelerated provisioning, reduced network traffic • Image Build Scripts allow images to be customized • Security • Custom Packages • O/S Settings
  • 35. Ambari Blueprints • Two primary goals of Ambari Blueprints • Ability to export a complete description of a running cluster • Provide API based cluster installations based on a self-contained cluster description • Blueprints contain cluster topology and configuration information • Enables interesting use cases between physical and virtual, including OpenStack/Sahara
  • 36. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  • 37. Juno roadmap • Further integration with OpenStack ecosystem: • Distributed architecture • Guest agents • EDP enhancements • Merge dashboard to Horizon To be discussed and confirmed at Design Summit
  • 38. Design Summit Sessions 7 Sessions: Thursday 1:30 - Friday 10:30 http://goo.gl/lQXtUS
  • 40. Cluster and EDP workflows [diagram; step frequency labels: Rarely, Infrequently, Occasionally, Commonly, Occasionally, Frequently]
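
Slide 35 describes an Ambari Blueprint as a self-contained cluster description holding topology and configuration. As a rough illustration only, the sketch below assembles such a description as a Python dictionary and serializes it to JSON. The helper `build_blueprint`, the blueprint name, and the host group layout are hypothetical and do not come from the slides; the top-level keys (`Blueprints`, `host_groups`, `configurations`) and the component names follow the Ambari blueprint format as commonly documented and should be checked against the Ambari version in use.

```python
import json


def build_blueprint(name, stack_version, host_groups, configurations=None):
    """Assemble a blueprint-style cluster description.

    host_groups maps a group name to the list of component names that
    should run on every host in that group (the cluster topology).
    All names passed in are illustrative, not taken from the slides.
    """
    return {
        # Identifies the blueprint and the HDP stack it targets.
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        # Topology: which components are placed on which group of hosts.
        "host_groups": [
            {
                "name": group,
                "components": [{"name": component} for component in components],
            }
            for group, components in host_groups.items()
        ],
        # Configuration overrides; left empty in this sketch.
        "configurations": configurations or [],
    }


if __name__ == "__main__":
    blueprint = build_blueprint(
        name="demo-cluster",  # hypothetical name
        stack_version="2.0",  # assumed stack version string
        host_groups={
            "master": ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"],
            "worker": ["DATANODE", "NODEMANAGER"],
        },
    )
    print(json.dumps(blueprint, indent=2))
```

Because the whole description is one JSON document, the same file could in principle be exported from one cluster and used to install another, which is the physical-to-virtual use case the slide mentions.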