Hadoop on OpenStack


Deploying Hadoop on OpenStack

Published in: Technology

  1. 1. Cloud Computing Course: Final Project Assessment. Guided by Dr. Dinkar Sitaram
  2. 2. Problem Statement. The specifics of the problem include:
     • Interoperability between Hadoop and OpenStack. Hadoop assumes that it has direct control over its resources. When it is installed on OpenStack, however, the compute and storage resources of a Hadoop node may be distributed remotely over the network, which introduces latency between the storage and compute components.
     • Minimizing the data transfer over iSCSI.
  3. 3. Literature Survey
     • Moving to the Cloud (Dr. Dinkar Sitaram et al.)
     • http://www.hastexo.com/resources/docs/installing-openstack-essex-20121-ubuntu-1204-precise-pangolin
     • http://devstack.org/guides/multinode-lab.html
     • https://github.com/mseknibilel/OpenStack-Folsom-Install-guide
     • OpenStack Compute Administration Manual (docs.openstack.org)
     • StackGeek OpenStack Guide (http://www.stackgeek.com/blog/kordless/guides/gettingstarted.html)
     • Hadoop Installation Guide (http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/)
  4. 4. Proposed Solution Description. The solution consists of the following stages:
     • Using the MRLU / Simple (Max Resource, Least Usage) scheduling algorithm for allocating VMs.
     • Disabling the option for live migration.
     • Using the OpenStack root disk for creating HDFS.
     • Using the Swift service to store user input data and results.
     • Writing bootstrap scripts to set up the IP addresses and perform other initialization tasks.
  5. 5. Solution Description
     • MRLU: The VMs spawned by Nova should be placed on the machine with the maximum resources and the least usage.
     • Live Migration: To minimize the traffic over iSCSI, the solution requires that live migration of VMs be disabled on OpenStack.
     • Root Disk: Instead of allocating Cinder storage for HDFS, we plan to use the root disk located at /var/lib/nova/instances/ on the local machine. This ensures that HDFS is not accessed over iSCSI.
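The MRLU placement rule described in this slide can be sketched in a few lines. This is only one possible reading of "maximum resource, least usage": it assumes free RAM is the resource being compared and uses the utilization ratio as a tie-breaker. The `Host` class, its fields, and the host names are hypothetical and are not part of Nova's scheduler API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of a compute host; field names are illustrative."""
    name: str
    total_ram_mb: int
    used_ram_mb: int

    @property
    def free_ram_mb(self) -> int:
        return self.total_ram_mb - self.used_ram_mb

    @property
    def utilization(self) -> float:
        return self.used_ram_mb / self.total_ram_mb


def pick_host(hosts: List[Host], requested_ram_mb: int) -> Optional[Host]:
    """Return the host with the most free RAM, preferring lower utilization on ties.

    Hosts that cannot fit the request are skipped; None means nothing fits.
    """
    candidates = [h for h in hosts if h.free_ram_mb >= requested_ram_mb]
    if not candidates:
        return None
    # Assumption: "max resource" = most free RAM, "least usage" = lowest utilization.
    return max(candidates, key=lambda h: (h.free_ram_mb, -h.utilization))


if __name__ == "__main__":
    hosts = [
        Host("compute1", total_ram_mb=32768, used_ram_mb=8192),
        Host("compute2", total_ram_mb=32768, used_ram_mb=4096),
    ]
    chosen = pick_host(hosts, requested_ram_mb=4096)
    print(chosen.name if chosen else "no host fits")
```

A real deployment would express this ranking through the scheduler configuration of the OpenStack release in use rather than through standalone code like this.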
  6. 6. Solution Description
     • Swift: To give users a flexible, abstracted way to interact with the service, we use Swift to store the user input. Hadoop reads this data, runs the computation, and stores the results back on Swift.
     • Bootstrapping: We define a set of tasks that need to be performed before or after spawning the VMs, such as assigning IP addresses to the Hadoop nodes. These tasks can be handled by simple bootstrap scripts.
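As an illustration of the bootstrapping step, the sketch below renders the two pieces of text a bootstrap script would typically need: name-to-address entries for /etc/hosts and the list of worker nodes. The 10.10.10.x addresses are the ones shown for the Hadoop nodes in the deployment slide; the hostnames and function names are hypothetical, and writing the output to /etc/hosts or to a Hadoop 1.x style conf/slaves file is an assumption about the setup, not something the slides specify.

```python
from typing import Dict

# Addresses taken from the deployment slide; hostnames are hypothetical.
NODES: Dict[str, str] = {
    "hadoop-master": "10.10.10.34",
    "hadoop-slave1": "10.10.10.35",
    "hadoop-slave2": "10.10.10.36",
    "hadoop-slave3": "10.10.10.37",
    "hadoop-slave4": "10.10.10.38",
}


def render_hosts_entries(nodes: Dict[str, str]) -> str:
    """Build "<ip><TAB><hostname>" lines, one per node, for a hosts file."""
    return "\n".join(f"{ip}\t{name}" for name, ip in nodes.items()) + "\n"


def render_slaves_file(nodes: Dict[str, str], master: str = "hadoop-master") -> str:
    """Build the worker list: every node name except the master, one per line."""
    workers = [name for name in nodes if name != master]
    return "\n".join(workers) + "\n"


if __name__ == "__main__":
    # Printed here instead of written to disk so the sketch has no side effects.
    print(render_hosts_entries(NODES), end="")
    print(render_slaves_file(NODES), end="")
```

Keeping the rendering separate from any file writes makes the script easy to dry-run before it touches a freshly spawned VM.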
  7. 7. Overview of the Solution (diagram): four VMs, one Hadoop master and three slaves, each labeled 32 GB and each with HDFS, alongside the Nova Controller, Horizon, and Swift; network 10.10.10.32/27.
  8. 8. Network Configuration of the Setup (diagram). Components: Nova Controller, Nova Compute 1, Nova Compute 2, public switch, private switch, college network, router. Address pairs shown: 192.168.0.66 / 10.10.10.5, 192.168.0.67 / 10.10.10.9, 192.168.0.65 / 10.10.10.6.
  9. 9. Hadoop Deployment on OpenStack (diagram), across Nova Controller, Nova Compute 1, and Nova Compute 2:
     • Hadoop Master: 192.168.0.33, 10.10.10.34
     • Hadoop Slave 1: 192.168.0.34, 10.10.10.35
     • Hadoop Slave 2: 192.168.0.36, 10.10.10.36
     • Hadoop Slave 3: 192.168.0.35, 10.10.10.37
     • Hadoop Slave 4: 192.168.0.38, 10.10.10.38
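The 10.10.10.x addresses of the Hadoop nodes can be checked against the 10.10.10.32/27 network shown in the solution overview. The sketch below uses Python's standard `ipaddress` module; the dictionary layout and the function name are illustrative choices.

```python
import ipaddress
from typing import Dict

NETWORK = ipaddress.ip_network("10.10.10.32/27")

# 10.10.10.x addresses of the Hadoop nodes as listed in the deployment slide.
NODE_IPS: Dict[str, str] = {
    "Hadoop Master": "10.10.10.34",
    "Hadoop Slave 1": "10.10.10.35",
    "Hadoop Slave 2": "10.10.10.36",
    "Hadoop Slave 3": "10.10.10.37",
    "Hadoop Slave 4": "10.10.10.38",
}


def outside_network(node_ips: Dict[str, str],
                    network: ipaddress.IPv4Network) -> Dict[str, str]:
    """Return the nodes whose address does not fall inside the given network."""
    return {
        name: ip
        for name, ip in node_ips.items()
        if ipaddress.ip_address(ip) not in network
    }


if __name__ == "__main__":
    # A /27 spans 32 addresses: 10.10.10.32 through 10.10.10.63.
    print(NETWORK.network_address, NETWORK.broadcast_address, NETWORK.num_addresses)
    print(outside_network(NODE_IPS, NETWORK))  # an empty dict means all nodes fit
```

All five node addresses fall inside the /27, which leaves room for further slaves before the range is exhausted.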
  10. 10. Future Enhancements
     • Explore Swift as the backend storage for HDFS.
     • Bootstrap scripts to auto-configure the Hadoop cluster using snapshots of the images.
  11. 11. Team Members
     • Akshay MS (1PI09IS010)
     • Sandeep Raju P (1PI09CS081)
     • Suhas Mohan (1PI09IS104)
     • Vijesh M (1PI09CS119)
     • Vivek P (1PI09IS119)