Savanna - Elastic Hadoop on OpenStack
Slide deck for the talk at local meetup.

Published in: Technology, Education
Transcript

  • 1. Savanna - Hadoop on OpenStack. Mirantis, 2013. Sergey Lukjanov, Savanna Technical Lead
  • 2. Agenda
      ● Savanna Overview
      ● Savanna Use Cases
      ● Roadmap & Current Status
      ● Architecture & Features Overview
      ● Hadoop vs. Virtualization
  • 4. Savanna - Elastic Hadoop on OpenStack
      ● Open source native OpenStack component
      ● Supports different Hadoop distributions
      ● Covers both the bare cluster provisioning use case and "analytics as a service"
      ● Managed through REST API
      ● Web UI as part of the OpenStack Dashboard
      ● Flexible templates of Hadoop configurations
  • 5. Savanna - Elastic Hadoop on OpenStack
      ● Project home - https://launchpad.net/savanna
        ○ bug tracking
        ○ blueprints
        ○ answers
      ● Code review (gerrit) - https://review.openstack.org
      ● Sources - https://github.com/stackforge/savanna
      ● Mailing list - savanna-all@lists.launchpad.net
      ● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
  • 6. Savanna - Participants
      ● Contributors:
        ○ large core team from Mirantis
        ○ teams from Red Hat, Hortonworks
        ○ several minor contributors
      ● Intel joined recently
      ● Several upcoming customers
  • 7. Agenda
      ● Savanna Overview
      ● Savanna Use Cases
      ● Roadmap & Current Status
      ● Architecture & Features Overview
      ● Hadoop vs. Virtualization
  • 8. Savanna Use Cases
      ● Administrators - centralized cluster management and monitoring
      ● Dev and QA teams - fast cluster provisioning
      ● Data Scientists/Analysts - API to run analytic jobs with infrastructure provisioning happening under the hood
      ● Making resources dedicated to the IaaS cloud available for Hadoop workloads
  • 9. Administrators Use Case
      ● Central point of control over infrastructure
      ● Enables self-service capabilities, including choice of the Hadoop distribution to be used
      ● Integration with vendor tooling:
        ○ Ambari for Apache/Hortonworks
        ○ Cloudera Management Console
        ○ Intel Hadoop
      ● Utilization of free IaaS capacity for Hadoop tasks
  • 10. Dev and QA Use Cases
      ● Fast on-demand provisioning of environments
      ● Increased agility and speed of innovation
      ● Controlled access to data from production
  • 11. Analytics Use Cases
      ● Simplified task execution: the complexity of provisioning and managing a cluster is hidden under the hood
        ○ Access to higher-level interfaces (e.g. Pig, Hive)
      ● Bursty workload: ad-hoc queries requiring significant resources only for a short period
      ● Utilization of free IaaS capacity for Hadoop tasks
  • 12. Agenda
      ● Savanna Overview
      ● Savanna Use Cases
      ● Roadmap & Current Status
      ● Architecture & Features Overview
      ● Hadoop vs. Virtualization
  • 13. Roadmap for Hadoop in Cloud
      Phase 1: Basic cluster provisioning of Apache Hadoop
      Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
      Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 14. Phase 1 - Basic Cluster Operation
      ● Cluster provisioning
      ● Deployment Engine implementation for pre-installed images
      ● Templates for Hadoop cluster configuration
      ● REST API for cluster startup and operations
      ● Web UI integrated into OpenStack Dashboard
  • 15. Roadmap for Hadoop in Cloud
      Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
      Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
      Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 16. Phase 2 - Advanced Configuration
      ● Hadoop cluster configuration support:
        ○ Solutions for the HDFS data reliability issue
        ○ Configurable DN storage location
        ○ Configurable topology of DN, NN, TT, JT
        ○ Add/remove nodes
        ○ More Hadoop parameters
      ● Integration with vendor deployment/management tooling
      ● Basic monitoring support
  • 17. Roadmap for Hadoop in Cloud
      Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
      Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
      Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 18. Phase 3 - Analytics as a Service
      ● API to execute Map/Reduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
      ● User-friendly UI for ad-hoc analytics queries based on Hive or Pig
  • 19. Roadmap for Hadoop in Cloud
      Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
      Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
      Phase 3 [Planned - October 15]: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS
  • 20. Further Roadmap
      ● Autoscaling
      ● HA for NameNode
      ● Deeper HDFS and Swift integration
        ○ Caching of Swift data on HDFS
      ● Integration with logging and error handling
      ● HBase support
  • 21. Agenda
      ● Savanna Overview
      ● Savanna Use Cases
      ● Roadmap & Current Status
      ● Architecture & Features Overview
      ● Hadoop vs. Virtualization
  • 22. Architecture Overview
      [Diagram. Labels: Savanna Python Client, REST API, Cluster Configuration Manager, Horizon, Savanna Pages, Keystone, Auth, DAL, Nova, Glance, Swift, Provisioning Plugin, Instance Interop Helper, Image Registry, Hadoop VM (x4)]
  • 23. Hadoop vs. Virtualization
      ● HDFS Reliability
      ● Data Persistence
      ● I/O Performance
      ● etc.
  • 27. HDFS Reliability: the issue
      [Diagram. Labels: Compute (x2), DN (x6), Data Block]
  • 30. HDFS Reliability: single DN per host
      [Diagram. Labels: Compute (x3), DN, TT | DN, Cluster A, Cluster B]
  • 31. HDFS Reliability: HADOOP-8468, hypervisor-awareness for HDFS scheduler
      [Diagram. Labels: Compute (x3), DN (x6), HDFS, Data Block]
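
The reliability concern on slides 27 to 31 is that several DataNode VMs can share one compute host, so replicas of a block may end up on the same physical machine. The sketch below illustrates the general idea of host-aware placement. It is not the HDFS block placement policy or the HADOOP-8468 implementation, and the function name, DataNode names, and host names are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def place_replicas(datanode_hosts, replication=3, seed=None):
    """Pick DataNodes for one block so that no two replicas share a host.

    datanode_hosts maps a DataNode name to the compute host it runs on.
    Both kinds of identifier are hypothetical, used only for illustration.
    """
    by_host = defaultdict(list)
    for datanode, host in datanode_hosts.items():
        by_host[host].append(datanode)

    if len(by_host) < replication:
        raise ValueError(
            f"need {replication} distinct hosts, only {len(by_host)} available"
        )

    rng = random.Random(seed)
    chosen_hosts = rng.sample(sorted(by_host), replication)
    # One DataNode per chosen host, so losing a host loses at most one replica.
    return [rng.choice(sorted(by_host[host])) for host in chosen_hosts]
```

With six DataNodes spread over three hosts, every replica set returned covers all three hosts. With only two hosts and a replication factor of three, the function refuses rather than co-locating replicas.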
  • 32. HDFS Reliability: HADOOP-8545, enables Swift for Hadoop
      [Diagram. Labels: Swift, Hadoop Job #1, HDFS, Hadoop Job #2, ..., Hadoop Job #N, initial input, final output]
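
One reading of the diagram on slide 32 is that only the initial input and the final output live in Swift, while intermediate results between chained jobs stay in HDFS. The following sketch plans the input and output locations for such a chain under that assumption. The function name and all URIs are hypothetical placeholders.

```python
def plan_job_chain(num_jobs, swift_input, swift_output, hdfs_tmp):
    """Return an (input_uri, output_uri) pair per job.

    Only the first input and the last output use Swift (an assumption
    drawn from the slide); intermediate data is written under hdfs_tmp.
    """
    if num_jobs < 1:
        raise ValueError("num_jobs must be at least 1")

    plan = []
    current_input = swift_input
    for index in range(1, num_jobs + 1):
        is_last = index == num_jobs
        output = swift_output if is_last else f"{hdfs_tmp}/job-{index}"
        plan.append((current_input, output))
        # The next job reads what this job wrote.
        current_input = output
    return plan
```

Keeping intermediates in HDFS means durable data survives in Swift even if the cluster VMs, and the HDFS they host, are torn down after the last job.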
  • 33. Configurable topology of DN, NN, TT, JT
      ● Master node(s)
      ● Worker nodes
      [Diagram. Labels: JT | NN, JT, NN+TT, TT | DN, DN, 10, 6, 8]
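
A configurable topology means the user chooses which Hadoop processes run on which group of nodes. The sketch below validates such a choice. The `(processes, count)` representation and the rules it enforces (a single NameNode, at most one JobTracker, TaskTrackers only when a JobTracker exists) are assumptions based on Hadoop 1.x without HA, not Savanna's template format.

```python
MASTER_PROCESSES = {"NN", "JT"}
WORKER_PROCESSES = {"DN", "TT"}


def validate_topology(node_groups):
    """Check (processes, count) node groups and return per-process totals."""
    totals = {"NN": 0, "JT": 0, "DN": 0, "TT": 0}
    for processes, count in node_groups:
        unknown = set(processes) - MASTER_PROCESSES - WORKER_PROCESSES
        if unknown:
            raise ValueError(f"unknown processes: {sorted(unknown)}")
        if count < 1:
            raise ValueError("each node group needs at least one node")
        for process in processes:
            totals[process] += count

    # Assumed rules for a non-HA Hadoop 1.x cluster.
    if totals["NN"] != 1:
        raise ValueError("exactly one NameNode is required")
    if totals["JT"] > 1:
        raise ValueError("at most one JobTracker is allowed")
    if totals["DN"] < 1:
        raise ValueError("at least one DataNode is required")
    if totals["TT"] > 0 and totals["JT"] == 0:
        raise ValueError("TaskTrackers need a JobTracker")
    return totals
```

For example, `validate_topology([({"NN", "JT"}, 1), ({"TT", "DN"}, 10)])` describes one combined master and ten combined workers, while `[({"JT"}, 1), ({"NN", "TT"}, 1), ({"DN"}, 8)]` splits the masters across two nodes.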
  • 34. HDFS Placement Options
      ● Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
      ● Block storage volume: Cinder Volume -> /mnt/volume
      ● Bare hard drive support: /dev/sdb -> /mnt/sdb
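
Each placement option ends up as a mount point inside the VM, and the DataNode then needs to be told to store blocks there. As a minimal sketch, the helper below turns a list of placement options into a comma-separated directory list of the kind Hadoop 1.x accepts in `dfs.data.dir`. The option keys, the `hdfs/data` subdirectory, and the function name are assumptions. Only the mount points come from the slide.

```python
# Mount points as listed on the slide; the dictionary keys are hypothetical.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def dfs_data_dirs(placements, subdir="hdfs/data"):
    """Build a comma-separated DataNode storage list for the chosen placements."""
    paths = []
    for placement in placements:
        try:
            mount = MOUNT_POINTS[placement]
        except KeyError:
            raise ValueError(f"unknown placement option: {placement!r}") from None
        paths.append(f"{mount}/{subdir}")
    return ",".join(paths)
```

Calling `dfs_data_dirs(["ephemeral", "volume"])` yields one directory on the ephemeral drive and one on the Cinder volume.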
  • 35. Q&A
  • 36. We are hiring!
  • 37. Phase 1 deployment mechanism
      [Diagram. Labels: Savanna, Hadoop VM (x4), "Provision VMs with pre-installed Hadoop", "Configure Hadoop Cluster"]
  • 38. Tool usage scenarios
      Scenario I: Tool -> Manage Hadoop Cluster (Hadoop VMs)
      Scenario II: Tool -> Provision & Manage Hadoop Cluster (VMs)
  • 39. Extensible Provisioning (diagram sides: Plugin, Savanna)
      Plugin:
        ● get extra configs
        ● validate input
        ● launch/terminate cluster
        ● add/remove nodes
      Instance Interop:
        ● launch/terminate VMs
        ● get VM status
        ● ssh/scp to VM
      Image registry:
        ● register image in Savanna
        ● add/remove tags
        ● get image by tag
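
The plugin operations listed on slide 39 can be read as an interface that each Hadoop distribution implements. The sketch below expresses that idea as a Python abstract base class with a toy in-memory implementation. Every class name, method name, and signature here is hypothetical and chosen to mirror the slide's wording; none of it is taken from Savanna's source.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific parameters the user may set."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the requested cluster cannot be built."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Bring the cluster up."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Tear the cluster down."""

    @abstractmethod
    def add_nodes(self, cluster, count):
        """Grow the cluster by count nodes."""

    @abstractmethod
    def remove_nodes(self, cluster, count):
        """Shrink the cluster by count nodes."""


class InMemoryPlugin(ProvisioningPlugin):
    """Toy plugin that only updates a dict describing the cluster."""

    def get_extra_configs(self):
        return {"replication": 3}

    def validate(self, cluster):
        if cluster["nodes"] < 1:
            raise ValueError("a cluster needs at least one node")

    def launch_cluster(self, cluster):
        self.validate(cluster)
        cluster["status"] = "active"

    def terminate_cluster(self, cluster):
        cluster["status"] = "terminated"

    def add_nodes(self, cluster, count):
        cluster["nodes"] += count

    def remove_nodes(self, cluster, count):
        if count >= cluster["nodes"]:
            raise ValueError("cannot remove every node")
        cluster["nodes"] -= count
```

Keeping the contract abstract lets the core service call the same operations regardless of which vendor tooling sits behind the plugin.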
  • 40. Provisioning Interaction
      [Sequence diagram. Participants: User, Savanna, Plugin. Messages: get extra parameters, get extra parameters for the plugin, launch cluster, validate cluster parameters, add/remove nodes]
  • 41. Provisioning: Launching a Cluster
      [Diagram. Labels: Plugin, Image Registry, Instance Interop Helper, Hadoop VM (x4), "get image by tag", "launch VMs", "install and configure Hadoop", "pass commands via ssh, scp"]
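
The launch flow on slide 41 reads as three steps: look up an image by tag, launch VMs from it, then configure Hadoop on each VM over ssh/scp. The sketch below walks through those steps with in-memory stand-ins. The class and method names are assumptions, the VM names are invented, and no real VMs are started or ssh commands run.

```python
class ImageRegistry:
    """Stand-in for the image registry: maps image ids to tags."""

    def __init__(self):
        self._tags = {}

    def register(self, image_id, tags):
        self._tags[image_id] = set(tags)

    def get_image_by_tag(self, tag):
        for image_id, tags in sorted(self._tags.items()):
            if tag in tags:
                return image_id
        raise LookupError(f"no image tagged {tag!r}")


class InstanceInterop:
    """Stand-in for the instance helper: records commands instead of using ssh."""

    def __init__(self):
        self.commands = []

    def launch_vms(self, image_id, count):
        return [f"vm-{index}" for index in range(1, count + 1)]

    def run(self, vm, command):
        self.commands.append((vm, command))


def launch_cluster(registry, interop, tag, node_count):
    """Follow the slide's order: get image by tag, launch VMs, configure Hadoop."""
    image_id = registry.get_image_by_tag(tag)
    vms = interop.launch_vms(image_id, node_count)
    for vm in vms:
        interop.run(vm, "install and configure Hadoop")
    return vms
```

In a real deployment the two stand-ins would be backed by the image and compute services and an ssh client; here they only record what would have been done.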
  • 42. Q&A
  • 43. We are hiring!
