
Savanna: Hadoop on OpenStack

More details about the project and its current state can be found at:
http://savanna.readthedocs.org

  1. Savanna - Hadoop on OpenStack
     Ilya Elterman, Dmitry Mescheryakov
     Mirantis, 2013
  2. Agenda
     ● Savanna Overview
     ● Roadmap
     ● Phase 1 Live Demo
     ● Phase 2 Features and Architecture
  3. Savanna - Elastic Hadoop on OpenStack
     The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics:
     ● Open source
     ● Native for OpenStack
     ● Support for different Hadoop distributions
     ● Solves both the bare cluster provisioning use case and "analytics as a service"
  4. Savanna Architecture Principles
     ● Designed as an OpenStack component
     ● Managed through a REST API, with a UI available as part of Horizon
     ● Pluggable system of Hadoop installation engines
     ● Integration with Hadoop vendor-specific management tools
     ● Predefined templates of Hadoop configurations with the ability to modify parameters
  5. Use Cases
     ● Administrators - centralized cluster management and monitoring
     ● Dev and QA teams - fast cluster provisioning
     ● Data Scientists/Analysts - API to run analytic jobs with infrastructure provisioning happening under the hood
     ● Making resources dedicated to the IaaS cloud available for Hadoop workloads
  6. Administrators Use Case
     ● Central point of control over infrastructure
     ● Enables self-service capabilities, including choice of the Hadoop distribution to be used
     ● Integration with vendor tooling
       ○ Ambari for Apache/Hortonworks
       ○ Cloudera Management Console
     ● Utilization of free IaaS capacity for Hadoop tasks
  7. Dev and QA Use Cases
     ● Fast on-demand provisioning of environments
     ● Increased agility and speed of innovation
     ● Controlled access to data from production
  8. Analytics Use Cases
     ● Simplified task execution: the complexity of provisioning and managing a cluster is hidden under the hood
       ○ Access to higher-level interfaces (e.g. Pig, Hive)
     ● Bursty workload: ad-hoc queries requiring significant resources only for a short period
     ● Utilization of free IaaS capacity for Hadoop tasks
  10. Roadmap for Hadoop in Cloud
      ● Phase 1: Basic cluster provisioning
      ● Phase 2: Cluster operation support and integration with tooling
      ● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages
  11. Phase 1 - Basic Cluster Operation
      ● Cluster provisioning
      ● Deployment Engine implementation for pre-installed images
      ● Templates for Hadoop cluster configuration
      ● REST API for cluster startup and operations
      ● UI integrated into Horizon
  12. Phase 1 - Current Status
      ● All code and documentation open sourced
      ● Phase 1 completed, v0.1 released on 04/10
      ● Launchpad home page
        ○ https://launchpad.net/savanna
      ● Code on stackforge
        ○ Integrated with OpenStack CI/CD
        ○ https://github.com/stackforge/savanna
      ● New contributors: Red Hat and Hortonworks
  13. Phase 2 - Advanced Configuration
      ● Hadoop cluster configuration support:
        ○ Solutions for the HDFS data reliability issue
        ○ Configurable DN storage location
        ○ Configurable topology of DN, NN, TT, JT
        ○ Add/remove nodes
        ○ More Hadoop parameters
      ● Integration with vendor deployment/management tooling
      ● Basic monitoring support
  14. Phase 3 - Analytics as a Service
      ● API to execute Map/Reduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
      ● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
  15. Further Roadmap
      ● Autoscaling
      ● HBase support
      ● HA for NameNode
      ● HDFS and Swift integration
        ○ Caching of Swift data on HDFS
      ● Mahout as a service
      ● Integration with logging and error handling
  16. How to Contribute
      ● Download and install Savanna
      ● Provide feedback and report bugs
      ● Share more ideas via IRC sessions or the mailing list
      More details: https://wiki.openstack.org/wiki/Savanna/HowToParticipate
  19. Architecture Overview
      [Diagram. Components shown:]
      ● Horizon: Savanna Pages
      ● Keystone: Auth
      ● Swift
      ● Savanna Python Client
      ● Savanna: REST API, Cluster Configuration Manager, Provisioning Plugin, VM Manager, DAL, Image Registry
      ● Nova
      ● Glance
      ● Hadoop VMs
  20. Extensible Provisioning
      [Diagram. Savanna exposes three parts:]
      ● Plugin
        ○ get extra configs
        ○ validate input
        ○ launch/terminate cluster
        ○ add/remove nodes
      ● Image registry
        ○ register image in Savanna
        ○ add/remove tags
        ○ get image by tag
      ● VM manager
        ○ launch/terminate VMs
        ○ get VM status
        ○ ssh/scp to VM
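The split between the plugin and the image registry on slide 20 can be illustrated with a minimal Python sketch. This is not Savanna's actual code: the class names `ImageRegistry` and `ProvisioningPlugin`, every method signature, and the in-memory storage are assumptions chosen only to mirror the operation names listed on the slide.

```python
from abc import ABC, abstractmethod


class ImageRegistry:
    """Hypothetical in-memory registry: image id -> set of tags."""

    def __init__(self):
        self._tags = {}

    def register_image(self, image_id):
        # Registering twice keeps any tags that were already attached.
        self._tags.setdefault(image_id, set())

    def add_tag(self, image_id, tag):
        # Raises KeyError if the image was never registered.
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_image_by_tag(self, tag):
        # Return the first registered image carrying the tag, or None.
        for image_id, tags in self._tags.items():
            if tag in tags:
                return image_id
        return None


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract with the operations named on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific parameters the user may set."""

    @abstractmethod
    def validate_input(self, cluster):
        """Raise ValueError if the cluster description is not acceptable."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Start the cluster described by `cluster`."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Stop the cluster and release its VMs."""

    @abstractmethod
    def add_nodes(self, cluster, count):
        """Grow the cluster by `count` nodes."""

    @abstractmethod
    def remove_nodes(self, cluster, count):
        """Shrink the cluster by `count` nodes."""
```

A vendor-specific engine would subclass `ProvisioningPlugin`; the abstract base class makes an incomplete implementation fail at instantiation time rather than in the middle of provisioning.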
  21. Provisioning Interaction
      [Diagram: requests received by Savanna and the calls it makes to the Plugin]
      ● get extra parameters for the plugin -> get extra parameters
      ● launch cluster -> validate cluster parameters, launch cluster
      ● add/remove nodes -> add/remove nodes
  22. Provisioning: Launching a Cluster
      [Diagram: the Plugin, the Image Registry, the VM Manager and the Hadoop VMs. Labels shown:]
      ● get image by tag (Image Registry)
      ● launch VMs (VM Manager, Hadoop VMs)
      ● pass commands via ssh, scp (VM Manager)
      ● install and configure Hadoop
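The launch sequence on slide 22 (look up an image by tag, launch VMs, then configure Hadoop over ssh) could be sketched as below. The function name `launch_cluster`, its parameters, and the callables it receives are all hypothetical stand-ins for the Image Registry and VM Manager; the real component interfaces are not shown in the slides.

```python
def launch_cluster(get_image_by_tag, launch_vm, run_over_ssh, tag, node_count):
    """Sketch of the slide 22 flow using injected, hypothetical callables.

    get_image_by_tag(tag) -> image id or None   (Image Registry stand-in)
    launch_vm(image_id) -> VM handle            (VM Manager stand-in)
    run_over_ssh(vm, command) -> None           (VM Manager stand-in)
    """
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")

    # Step 1: get image by tag.
    image_id = get_image_by_tag(tag)
    if image_id is None:
        raise LookupError(f"no image registered with tag {tag!r}")

    # Step 2: launch all VMs from that image.
    vms = [launch_vm(image_id) for _ in range(node_count)]

    # Step 3: pass commands to each VM to install and configure Hadoop.
    # The command string is a placeholder, not a real command.
    for vm in vms:
        run_over_ssh(vm, "install-and-configure-hadoop")

    return vms
```

Passing the collaborators in as callables keeps the sketch independent of any particular registry or VM backend, which is the point of the pluggable design described above.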
  23. Q&A
  24. HDFS Reliability: the issue
      [Diagram: a Data Block and six DataNodes (DN) spread across two Compute hosts]
  27. HDFS Reliability: single DN per host
      [Diagram: three Compute hosts carrying DN and TT | DN instances that belong to Cluster A and Cluster B]
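HDFS keeps several replicas of each block on different DataNodes, so when several DN virtual machines of one cluster share a compute host, a single host failure can take out more than one replica. A minimal sketch of a check for the "single DN per host" rule follows; the record layout (`cluster`, `host`, `roles`) and the function name are assumptions, and treating the rule as per-cluster is an interpretation of the slide, which shows two clusters sharing hosts.

```python
from collections import Counter


def hosts_with_multiple_datanodes(placements):
    """Return sorted (cluster, host) pairs that carry more than one DN.

    `placements` is a list of dicts, one per VM, in a hypothetical layout:
    {"cluster": "A", "host": "compute-1", "roles": ["TT", "DN"]}
    """
    counts = Counter(
        (p["cluster"], p["host"]) for p in placements if "DN" in p["roles"]
    )
    return sorted(key for key, n in counts.items() if n > 1)
```

An empty result means every cluster has at most one DataNode on any given compute host.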
  28. HDFS Reliability: Hadoop-8468
      hypervisor-awareness for HDFS scheduler
      [Diagram: an HDFS Data Block and six DataNodes (DN) spread across three Compute hosts]
  29. HDFS Reliability: Hadoop-8545
      enables Swift for Hadoop
      [Diagram labels: Swift, HDFS, Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N, "initial input", "final output"]
  30. HDFS Placement Options
      ● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
      ● Block storage volume: Cinder Volume -> /mnt/volume
      ● Bare drive support: /dev/sdb -> /mnt/sdb
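The three placement options on slide 30 each end in a mount point inside the VM, which then has to reach the DataNode configuration. As a rough sketch, the mapping could be turned into a value for `dfs.data.dir`, the Hadoop 1.x property that takes a comma-separated list of DataNode storage directories. The option keys, the function name, and the `hdfs/data` subdirectory are assumptions made for illustration; only the mount points come from the slide.

```python
# Mount points from slide 30; the dictionary keys are hypothetical names.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",  # ephemeral drive
    "volume": "/mnt/volume",        # Cinder block storage volume
    "bare": "/mnt/sdb",             # bare drive, e.g. /dev/sdb
}


def dfs_data_dir(options, subdir="hdfs/data"):
    """Build a comma-separated dfs.data.dir value for the chosen options.

    `subdir` is an assumed directory layout, not taken from the slides.
    """
    unknown = [o for o in options if o not in MOUNT_POINTS]
    if unknown:
        raise ValueError(f"unknown placement option(s): {unknown}")
    return ",".join(f"{MOUNT_POINTS[o]}/{subdir}" for o in options)
```

For example, `dfs_data_dir(["ephemeral", "volume"])` yields a value that spreads DataNode storage over both the ephemeral drive and the Cinder volume.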
  31. Configurable topology of DN, NN, TT, JT
      ● Master node(s): JT | NN, JT + NN
      ● Worker nodes: TT | DN, TT, DN (counts shown in the diagram: 10, 6, 8)
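A configurable topology needs some sanity checking before a cluster is launched. The sketch below assumes a hypothetical description format (a list of node groups, each with `roles` and `count`) and assumes that a cluster needs exactly one NameNode and one JobTracker plus at least one DataNode and one TaskTracker, which matches a Hadoop 1.x setup without NameNode HA. None of the names come from Savanna itself.

```python
def count_roles(node_groups):
    """Sum how many instances of each role the node groups would create."""
    totals = {}
    for group in node_groups:
        for role in group["roles"]:
            totals[role] = totals.get(role, 0) + group["count"]
    return totals


def validate_topology(node_groups):
    """Return a list of problems; an empty list means the topology is valid."""
    totals = count_roles(node_groups)
    errors = []
    for role in ("NN", "JT"):  # assumed: exactly one of each master role
        if totals.get(role, 0) != 1:
            errors.append(f"expected exactly 1 {role}, got {totals.get(role, 0)}")
    for role in ("DN", "TT"):  # assumed: at least one of each worker role
        if totals.get(role, 0) < 1:
            errors.append(f"expected at least 1 {role}, got 0")
    return errors


# Illustrative topology: one combined master and ten combined workers.
combined = [
    {"roles": ["JT", "NN"], "count": 1},
    {"roles": ["TT", "DN"], "count": 10},
]
```

Separate TaskTracker-only and DataNode-only worker groups fit the same format by listing a single role per group.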