Savanna: Hadoop on OpenStack


More details about the project and its current state can be found there:

Published in: Technology

  1. 1. Savanna - Hadoop on OpenStack. Ilya Elterman, Dmitry Mescheryakov. Mirantis, 2013
  2. 2. Agenda ● Savanna Overview ● Roadmap ● Phase 1 Live Demo ● Phase 2 Features and Architecture
  3. 3. Savanna - Elastic Hadoop on OpenStack. The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics: ● Open source ● Native for OpenStack ● Support for different Hadoop distributions ● Solves both the bare cluster provisioning use case and "analytics as a service"
  4. 4. Savanna Architecture Principles ● Designed as an OpenStack component ● Managed through a REST API, with a UI available as part of Horizon ● Pluggable system of Hadoop installation engines ● Integration with Hadoop vendor-specific management tools ● Predefined templates of Hadoop configurations, with the ability to modify parameters
  5. 5. Use Cases ● Administrators - centralized cluster management and monitoring ● Dev and QA teams - fast cluster provisioning ● Data Scientists/Analysts - API to run analytic jobs, with infrastructure provisioning happening under the hood ● Making resources dedicated to the IaaS cloud available for Hadoop workloads
  6. 6. Administrators Use Case ● Central point of control over infrastructure ● Enables self-service capabilities, including choice of the Hadoop distribution to be used ● Integration with vendor tooling ○ Ambari for Apache/Hortonworks ○ Cloudera Management Console ● Utilization of free IaaS capacity for Hadoop tasks
  7. 7. Dev and QA Use Cases ● Fast on-demand provisioning of environments ● Increased agility and speed of innovation ● Controlled access to data from production
  8. 8. Analytics Use Cases ● Simplified task execution - the complexity of provisioning and managing a cluster is hidden under the hood ○ Access to higher-level interfaces (e.g. Pig, Hive) ● Bursty workload: ad-hoc queries requiring significant resources for only a short time period ● Utilization of free IaaS capacity for Hadoop tasks
  9. 9. Agenda ● Savanna Overview ● Roadmap ● Phase 1 Live Demo ● Phase 2 Features and Architecture
  10. 10. Roadmap for Hadoop in Cloud ● Phase 1: Basic cluster provisioning ● Phase 2: Cluster operation support and integration with tooling ● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages
  11. 11. Phase 1 - Basic Cluster Operation ● Cluster provisioning ● Deployment Engine implementation for pre-installed images ● Templates for Hadoop cluster configuration ● REST API for cluster startup and operations ● UI integrated into Horizon
  12. 12. Phase 1 - Current Status ● All code and documentation open sourced ● Phase 1 completed, v0.1 released on 04/10 ● Launchpad home page ● Code on StackForge ○ Integrated with OpenStack CI/CD ● New contributors: Red Hat and Hortonworks
  13. 13. Phase 2 - Advanced Configuration ● Hadoop cluster configuration support: ○ Solutions for the HDFS data reliability issue ○ Configurable DataNode (DN) storage location ○ Configurable topology of DataNodes (DN), NameNodes (NN), TaskTrackers (TT), and JobTrackers (JT) ○ Add/remove nodes ○ More Hadoop parameters ● Integration with vendor deployment/management tooling ● Basic monitoring support
  14. 14. Phase 3 - Analytics as a Service ● API to execute Map/Reduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR) ● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
  15. 15. Further Roadmap ● Autoscaling ● HBase support ● HA for NameNode ● HDFS and Swift integration ○ Caching of Swift data on HDFS ● Mahout as a service ● Integration with logging and error handling
  16. 16. How to Contribute ● Download and install Savanna ● Provide feedback and report bugs ● Share more ideas via IRC sessions or the mailing list. More details:
  17. 17. Agenda ● Savanna Overview ● Roadmap ● Phase 1 Live Demo ● Phase 2 Features and Architecture
  19. 19. Architecture Overview [Diagram. Components shown: Horizon (Savanna Pages), Python Client, REST API, Auth, Savanna (Cluster Configuration Manager, Provisioning Plugin, VM Manager, DAL, Image Registry), Keystone, Swift, Nova, Glance, Hadoop VMs]
  20. 20. Extensible Provisioning (components inside Savanna) ● Plugin ○ get extra configs ○ validate input ○ launch/terminate cluster ○ add/remove nodes ● Image registry ○ register image in Savanna ○ add/remove tags ○ get image by tag ● VM manager ○ launch/terminate VMs ○ get VM status ○ ssh/scp to VM
  21. 21. Provisioning Interaction [Diagram. Calls passed through Savanna to the plugin: get extra parameters for the plugin, validate cluster parameters, launch cluster, add/remove nodes]
  22. 22. Provisioning: Launching a Cluster [Diagram. The plugin gets an image by tag from the Image Registry, launches VMs through the VM Manager, then passes commands via ssh and scp to install and configure Hadoop on the Hadoop VMs]
  23. 23. Q&A
  24. 24. HDFS Reliability: the issue [Diagram: a data block, six DN instances, two Compute hosts]
  27. 27. HDFS Reliability: single DN per host [Diagram: three Compute hosts carrying DN and TT | DN instances that belong to Cluster A and Cluster B]
  28. 28. HDFS Reliability: HADOOP-8468, hypervisor awareness for the HDFS scheduler [Diagram: an HDFS data block, six DN instances, three Compute hosts]
  29. 29. HDFS Reliability: HADOOP-8545 enables Swift for Hadoop [Diagram: Swift, HDFS, and Hadoop Job #1, Job #2, ... Job #N, with arrows labeled "initial input" and "final output"]
  30. 30. HDFS Placement Options ● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral ● Block storage volume: Cinder Volume -> /mnt/volume ● Bare drive support: /dev/sdb -> /mnt/sdb
  31. 31. Configurable topology of DN, NN, TT, JT ● Master node(s): JT | NN, JT + NN ● Worker nodes: TT | DN, TT, DN (node counts shown on the slide: 10, 6, 8)
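
The split of responsibilities on slides 20-22 (image registry, VM manager, plugin) can be illustrated with a small sketch. This is not Savanna's actual code or API: every class, method, and command name below is hypothetical, the VM manager is an in-memory stand-in that never talks to Nova, and the "ssh" call only records the command it would send. It follows the launch order from slide 22: get an image by tag, launch VMs, then push configuration commands to each VM.

```python
class ImageRegistry:
    """Hypothetical in-memory registry: images are registered, tagged, found by tag."""

    def __init__(self):
        self._tags = {}  # image_id -> set of tags

    def register_image(self, image_id):
        self._tags.setdefault(image_id, set())

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_image_by_tag(self, tag):
        for image_id, tags in self._tags.items():
            if tag in tags:
                return image_id
        raise LookupError(f"no image tagged {tag!r}")


class FakeVMManager:
    """Stand-in for the VM manager: keeps state in memory instead of calling Nova."""

    def __init__(self):
        self.vms = {}       # vm_id -> status
        self.commands = []  # (vm_id, command) pairs that would be sent over ssh
        self._next_id = 0

    def launch_vms(self, image_id, count):
        ids = [f"vm-{self._next_id + i}" for i in range(count)]
        self._next_id += count
        for vm_id in ids:
            self.vms[vm_id] = "ACTIVE"
        return ids

    def terminate_vms(self, vm_ids):
        for vm_id in vm_ids:
            self.vms.pop(vm_id, None)

    def get_vm_status(self, vm_id):
        return self.vms[vm_id]

    def ssh(self, vm_id, command):
        self.commands.append((vm_id, command))


class SketchPlugin:
    """Hypothetical plugin exposing the operations listed on slide 20."""

    image_tag = "hadoop"  # assumed tag name, for illustration only

    def __init__(self, registry, vm_manager):
        self.registry = registry
        self.vm_manager = vm_manager

    def get_extra_configs(self):
        # Illustrative extra parameter a plugin might expose to the user.
        return {"dfs.replication": 3}

    def validate(self, cluster):
        if cluster["masters"] < 1 or cluster["workers"] < 1:
            raise ValueError("need at least one master and one worker")

    def _provision(self, count):
        image_id = self.registry.get_image_by_tag(self.image_tag)
        vm_ids = self.vm_manager.launch_vms(image_id, count)
        for vm_id in vm_ids:
            # Placeholder command; a real plugin would install and configure Hadoop.
            self.vm_manager.ssh(vm_id, "configure-hadoop")
        return vm_ids

    def launch_cluster(self, cluster):
        self.validate(cluster)
        return self._provision(cluster["masters"] + cluster["workers"])

    def add_nodes(self, cluster_vms, count):
        return cluster_vms + self._provision(count)

    def remove_nodes(self, cluster_vms, count):
        split = len(cluster_vms) - count
        self.vm_manager.terminate_vms(cluster_vms[split:])
        return cluster_vms[:split]
```

Keeping the VM manager and image registry behind small interfaces like these is what would let different Hadoop distributions plug in their own install logic while reusing the same VM and image handling, which is the design intent the slides describe.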