What we do
● Run an Indian Languages Search Engine
● Research
  ○ Information Extraction
  ○ Information Retrieval
  ○ Information Access
  ○ Virtualization and Cloud
● Users of
  ○ OpenStack
  ○ Hadoop
  ○ and a lot of other FOSS

Before OpenStack
Image source: http://www.codeproject.com/KB/threads/hxgrid/image4.jpg

Problems
● Provisioning
  ○ Ad hoc
  ○ Time consuming
  ○ Unmanaged
● User Management
  ○ No resource accounting
  ○ Access Control
  ○ Usage Restriction
● Storage
  ○ Data reliability
  ○ Duplication

More Problems...
● Cluster
  ○ Terrible Resource Utilization
  ○ New deployment => Too much time
  ○ Data Redundancy
  ○ Non-optimal deployments
● Academic
  ○ No cloud platform for experimentation
  ○ Large-scale sandboxed resource provisioning for students

Provisioning
● Pre-configured images to get started quickly
● VMs of any capacity available at any time (even 2 a.m. on a Sunday morning)
● Snapshots

User Management
● Resource restrictions using quotas
● Project-based collaboration and private resources
● Usage monitoring

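To make the quota idea concrete, here is a minimal sketch of per-project resource restriction. It is an illustration only and does not use the OpenStack API: `ProjectQuota`, `ProjectUsage`, `QuotaExceeded`, and `provision` are hypothetical names, and the limits shown are made-up values.

```python
from dataclasses import dataclass


@dataclass
class ProjectQuota:
    """Hypothetical per-project limits (not an OpenStack data structure)."""
    max_instances: int
    max_vcpus: int
    max_ram_mb: int


@dataclass
class ProjectUsage:
    """Resources the project currently holds."""
    instances: int = 0
    vcpus: int = 0
    ram_mb: int = 0


class QuotaExceeded(Exception):
    """Raised when a request would push a project past its quota."""


def can_provision(quota: ProjectQuota, usage: ProjectUsage,
                  vcpus: int, ram_mb: int) -> bool:
    # A request for one more VM fits only if every limit still holds.
    return (usage.instances + 1 <= quota.max_instances
            and usage.vcpus + vcpus <= quota.max_vcpus
            and usage.ram_mb + ram_mb <= quota.max_ram_mb)


def provision(quota: ProjectQuota, usage: ProjectUsage,
              vcpus: int, ram_mb: int) -> None:
    # Reject the request before touching usage, so a failed request
    # leaves the accounting unchanged.
    if not can_provision(quota, usage, vcpus, ram_mb):
        raise QuotaExceeded(f"request for {vcpus} vCPUs / {ram_mb} MB denied")
    usage.instances += 1
    usage.vcpus += vcpus
    usage.ram_mb += ram_mb


# Assumed example values for a student project.
quota = ProjectQuota(max_instances=2, max_vcpus=4, max_ram_mb=8192)
usage = ProjectUsage()
provision(quota, usage, vcpus=2, ram_mb=4096)
```

The same usage record doubles as the basis for usage monitoring: whatever a project holds is always visible in one place.
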
Storage
This wasn't easy. We experimented with:
● nova-volume
● Swift (Diablo)
● GlusterFS
● Swift (Folsom) (current)

Storage
● Hadoop-compatible distributed storage
● Glance image store
● Desktop backup utility using CloudFuse
● Data reliability
● No more data fragmentation

OpenStack in Academia
● Research
  ○ Inter-cloud migration
  ○ Inter-cloud scheduling
  ○ Performance evaluation
● Resource provisioning for course assignments and projects
  ○ 3 courses
  ○ 350+ students
  ○ 20+ projects

HadoopStack
● Big Data processing on demand
● Entire ecosystem for Big Data: Hadoop family, Spark, Mahout, R
● Multi-cloud: OpenStack and AWS

Conclusion
● Using OpenStack
● Working with and around OpenStack
● OpenStack is Awesome!!