The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Update on Sahara as of the OpenStack Icehouse release

Published in: Software, Technology, Business

  1. The State of OpenStack Data Processing: Sahara, Now and in Juno. Sergey Lukjanov (Mirantis), Matthew Farrellee (Red Hat), John Speidel (Hortonworks)
  2. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  3. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  4. OpenStack Data Processing: Sahara. Mission: To provide a scalable data processing stack and associated management interfaces. • provision and operate Hadoop clusters • schedule and operate Hadoop jobs
  5. Hadoop - Big Data Platform © http://hortonworks.com/hadoop/yarn/
  6. Trends http://www.google.com/trends/
  7. Use cases • Self-service provisioning of Hadoop clusters • Utilization of unused compute capacity for bursty workloads • Dev -> Stage -> Prod lifecycle • Run Hadoop workloads in a few clicks without expertise in Hadoop ops
  8. Architecture overview (diagram; labels only, "Savanna" being Sahara's former name): Data Sources, Savanna Python Client, REST API, Cluster Configuration Manager, Horizon, Keystone, Auth, Data Access Layer, Swift, Savanna Pages, Hadoop VM (×4), Vendors Plugins, Resources Orchestration Manager, Job Sources, Job Manager, Heat, Nova, Glance, Cinder, Neutron, Trove, DB
  9. Sahara status • Official integrated OpenStack project • Supported Hadoop distros: Vanilla Apache Hadoop, Hortonworks Data Platform, Intel Distribution, Cloudera Distribution (in blueprint) • Included in OpenStack distros: RDO (openstack.redhat.com), Mirantis OpenStack (software.mirantis.com)
  10. Contributors
  11. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  12. Icehouse release: 142 bugs fixed
  13. Icehouse release: 57 blueprints
  14. Icehouse release: 32 people
  15. Icehouse release: Standard process
  16. Icehouse release: Dozens more in the client!
  17. Icehouse release: Tempest helps us manage our API
  18. Icehouse release: Sahara easily deployed with DevStack
  19. Icehouse release: Hadoop 2 available via all plugins © http://hortonworks.com/hadoop/yarn/
  20. Icehouse release • HBase (and Sqoop) available via HDP plugin • Spark images w/ diskimage-builder (full plugin in review) • Heat for provisioning • i18n translation started • Neutron namespaces w/ rootwrap • Guest agent implementation started
  21. Elastic Data Processing update. Elastic Data Processing (EDP) is Sahara’s take on data processing workflow management. Goal - let end users (those w/ high value questions to answer) get answers about data without having to know a single thing about cluster management. “Customers launch millions of Amazon EMR clusters every year.” http://aws.amazon.com/elasticmapreduce/
  22. Elastic Data Processing update: Available with the Hortonworks Data Platform plugin
  23. Elastic Data Processing update: Support for external HDFS
  24. Elastic Data Processing update: MapReduce.Streaming and Java actions
  25. Elastic Data Processing update: Job relaunch, with new data and parameters
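
The "job relaunch, with new data and parameters" idea from slide 25 can be illustrated with a small data model. This is only a sketch under stated assumptions: the class names, field names, and URLs below are hypothetical and are not Sahara's actual API or schema. It shows a finished run being reused as a template for a new run that swaps the input and overrides one parameter.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in for an EDP data source (job input or output)."""
    name: str
    url: str  # illustrative location, e.g. in Swift or an external HDFS


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one execution of a job template."""
    job_template: str
    input: DataSource
    output: DataSource
    params: Dict[str, str] = field(default_factory=dict)


def relaunch(run: JobRun,
             *,
             input: Optional[DataSource] = None,
             output: Optional[DataSource] = None,
             params: Optional[Dict[str, str]] = None) -> JobRun:
    """Return a new run of the same job template with selected pieces replaced.

    Anything not overridden is carried over from the earlier run; the earlier
    run itself is left untouched.
    """
    merged = {**run.params, **(params or {})}
    return replace(run,
                   input=input or run.input,
                   output=output or run.output,
                   params=merged)


first = JobRun(
    job_template="wordcount",
    input=DataSource("logs-monday", "swift://demo/logs-monday"),
    output=DataSource("counts", "hdfs://external-nn/counts"),
    params={"reducers": "2"},
)
second = relaunch(
    first,
    input=DataSource("logs-tuesday", "swift://demo/logs-tuesday"),
    params={"reducers": "4"},
)
```

The point of the sketch is that the end user only names new data and parameters; nothing about the cluster appears in the relaunch call.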
  26. Command line interface overview. If you can do it with the Dashboard, you can do it from the command-line. Blueprint: python-savannaclient-cli
  27. Command line interface overview: Image management
      $ sahara ...
      Positional arguments: <subcommand>
      image-add-tag       Add a tag to an image.
      image-list          Print a list of available images.
      image-register      Register an image from the Image index.
      image-remove-tag    Remove a tag from an image.
      image-show          Show details of an image.
      image-unregister    Unregister an image.
  28. Command line interface overview: Node group, cluster and job templates
      $ sahara
      node-group-template-create    Create a node group...
      node-group-template-delete    Delete a node group...
      node-group-template-list      Print a list of available...
      node-group-template-show      Show details of a node...
      cluster-template-create       Create a cluster template.
      cluster-template-delete       Delete a cluster template.
      cluster-template-list         Print a list of available...
      cluster-template-show         Show details of a cluster...
      job-template-create           Create a job template.
      job-template-delete           Delete a job template.
      job-template-list             Print a list of job...
      job-template-show             Show details of a job...
  29. Command line interface overview: Data sources and job binaries
      $ sahara ... <subcommand>
      data-source-create    Create a data source that provides job input or receives job output.
      data-source-delete    Delete a data source.
      data-source-list      Print a list of available data...
      data-source-show      Show details of a data source.
      job-binary-create     Record a job binary.
      job-binary-delete     Delete a job binary.
      job-binary-list       Print a list of job binaries.
      job-binary-show       Show details of a job binary.
  30. Command line interface overview: Clusters and jobs
      $ sahara ... <subcommand>
      cluster-create    Create a cluster.
      cluster-delete    Delete a cluster.
      cluster-list      Print a list of available clusters.
      cluster-show      Show details of a cluster.
      job-create
      job-delete        Delete a job.
      job-list          Print a list of jobs.
      job-show          Show details of a job.
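
The subcommands on slides 27 to 30 follow a regular resource-plus-verb pattern, which makes them easy to drive from a script. The following is a minimal sketch, not part of the talk: it only builds argument lists for the subcommand names shown on the slides and does not run anything. The helper names `sahara_command` and `as_shell` are hypothetical, and no flags are included because the slides do not list any.

```python
import shlex

# Subcommand names exactly as listed on the slides: six image commands plus
# create/delete/list/show for each of seven resource types.
_IMAGE = {"image-add-tag", "image-list", "image-register",
          "image-remove-tag", "image-show", "image-unregister"}
_RESOURCES = ("node-group-template", "cluster-template", "job-template",
              "data-source", "job-binary", "cluster", "job")
_VERBS = ("create", "delete", "list", "show")

SAHARA_SUBCOMMANDS = frozenset(
    _IMAGE | {f"{res}-{verb}" for res in _RESOURCES for verb in _VERBS}
)


def sahara_command(subcommand, *args):
    """Build an argv list for the `sahara` CLI, rejecting unlisted subcommands.

    Extra positional arguments are passed through unchanged; which arguments
    each subcommand really accepts is not covered by the slides.
    """
    if subcommand not in SAHARA_SUBCOMMANDS:
        raise ValueError(f"subcommand not shown on the slides: {subcommand}")
    return ["sahara", subcommand, *args]


def as_shell(argv):
    """Render an argv list as a copy-pasteable, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


listing = sahara_command("cluster-list")
line = as_shell(listing)
```

An argv list like `listing` could be handed to `subprocess.run` on a machine where the client is installed; the sketch stops short of that so it stays runnable anywhere.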
  31. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  32. HDP Plugin Overview • Full support for all Sahara functionality • Nova and Neutron network • Cluster Scaling • Scale Up • Swift Integration • Cinder Support • Data Locality • EDP • Apache Ambari REST APIs used for cluster provisioning • Monitoring/Management of clusters via Ambari • Full support for multiple HDP stacks • HDP pre-installed or generic VM images
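
To make the "Cluster Scaling / Scale Up" bullet concrete, here is a rough sketch of what scaling up means in terms of node groups. It is an illustration only: the dictionary keys (`name`, `processes`, `count`) and the function `scale_up` are assumptions made for this example, not the plugin's real data model. The process names are taken from the HDP 1.3.2 component list on slide 33.

```python
def scale_up(node_groups, group_name, add):
    """Return a new node group list with `add` more instances in one group.

    The input list and its dictionaries are not modified.
    """
    if add <= 0:
        raise ValueError("scale up expects a positive instance count")
    scaled = []
    found = False
    for group in node_groups:
        if group["name"] == group_name:
            # Copy the group and bump its instance count.
            group = {**group, "count": group["count"] + add}
            found = True
        scaled.append(group)
    if not found:
        raise KeyError(group_name)
    return scaled


# Hypothetical two-group cluster layout.
cluster = [
    {"name": "master", "processes": ["NameNode", "Job Tracker"], "count": 1},
    {"name": "worker", "processes": ["DataNode", "Task Tracker"], "count": 3},
]
bigger = scale_up(cluster, "worker", 2)
```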
  33. HDP Plugin Stack Support • HDP 1.3.2 (Available): NameNode, Secondary NameNode, DataNode, HDFS, ZooKeeper, Ambari Server/Agent, HCatalog, Sqoop, Job Tracker, Task Tracker, MapReduce, Hive, MySQL, Pig, WebHCat Server, Oozie, Ganglia, Nagios, HBase • HDP 2.0.6 (Available): History Server, MapReduce 2 / YARN, Resource Manager, YARN Client • HDP 2.1 (Coming Soon!): Storm, Falcon • HDP 2.1+ (Roadmap): SOLR, Cascading
  34. HDP Disk Images • Disk Image Builder offers a consistent approach for image creation • HDP Plugin provides images and scripts for CentOS and RHEL: Plain, 1.3.2, 2.0.6, 2.1 (coming soon) • Pre-packaged images (1.3.2, 2.0.6) come with HDP packages pre-installed for accelerated provisioning and reduced network traffic • Image build scripts allow images to be customized: Security, Custom Packages, O/S Settings
  35. Ambari Blueprints • Two primary goals of Ambari Blueprints: • Ability to export a complete description of a running cluster • Provide API-based cluster installations based on a self-contained cluster description • Blueprints contain cluster topology and configuration information • Enables interesting use cases between physical and virtual, including OpenStack/Sahara
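
Slide 35 says a blueprint carries cluster topology and configuration. A small sketch of such a document follows. The layout (`Blueprints`, `host_groups`, `configurations`) reflects the Ambari blueprint JSON format as best recalled and should be treated as an assumption to check against the Ambari documentation; the group names, component selection, and the `fs.defaultFS` value are invented for illustration.

```python
import json

# Illustrative blueprint: one master group and one worker group on HDP 2.0.6.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0.6"},
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
    "configurations": [
        {"core-site": {"fs.defaultFS": "hdfs://master-host:8020"}},
    ],
}


def components_by_group(bp):
    """Map each host group name to the list of component names it hosts."""
    return {
        group["name"]: [component["name"] for component in group["components"]]
        for group in bp["host_groups"]
    }


topology = components_by_group(blueprint)
# A blueprint is plain JSON, so it survives a serialize/parse round trip.
round_tripped = json.loads(json.dumps(blueprint))
```

Because the description is self-contained, the same document could in principle describe a physical or a virtual cluster, which is the use case the slide points at.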
  36. Agenda • Sahara overview • Icehouse release • HDP plugin updates • Juno plans
  37. Juno roadmap • Further integration with OpenStack ecosystem: • Distributed architecture • Guest agents • EDP enhancements • Merge dashboard into Horizon. To be discussed and confirmed at Design Summit
  38. Design Summit Sessions. 7 Sessions: Thursday 1:30 - Friday 10:30 http://goo.gl/lQXtUS
  39. Agenda: Q&A
  40. Cluster and EDP workflows (figure; frequency labels only: Rarely, Infrequently, Occasionally, Commonly, Occasionally, Frequently)
