2. What we do
● Run an Indian Languages Search Engine
● Research
○ Information Extraction
○ Information Retrieval
○ Information Access
○ Virtualization and Cloud
● Users of
○ OpenStack
○ Hadoop
○ and a lot of other FOSS
4. Before OpenStack
source: http://www.codeproject.com/KB/threads/hxgrid/image4.jpg
5. Problems
● Provisioning
○ Ad hoc
○ Time consuming
○ Unmanaged
● User Management
○ No resource accounting
○ Access Control
○ Usage Restriction
● Storage
○ Data reliability
○ Duplication
6. More Problems...
● Cluster
○ Terrible Resource Utilization
○ New deployment => Too much time
○ Data Redundancy
○ Non-optimal deployments
● Academic
○ No cloud platform for experimentation
○ Large-scale sandboxed resource provisioning for students
10. User Management
● Resource restrictions using Quota
● Project-based collaboration and private resources
● Usage monitoring
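
A minimal sketch of the quota idea on this slide: each project carries per-resource limits, and a request is refused if it would exceed any of them. This is a toy model for illustration, not OpenStack's quota implementation; the names `Quota`, `Project`, `QuotaExceeded`, and `demo`, and the three resources tracked, are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Quota:
    """Per-project limits, or current usage, for three illustrative resources."""
    instances: int = 0
    cores: int = 0
    ram_mb: int = 0


class QuotaExceeded(Exception):
    """Raised when a request would push a project past one of its limits."""


@dataclass
class Project:
    name: str
    limit: Quota
    used: Quota = field(default_factory=Quota)

    def allocate(self, cores: int, ram_mb: int) -> None:
        """Account for one new instance, or refuse if any limit would be exceeded."""
        wanted = Quota(self.used.instances + 1,
                       self.used.cores + cores,
                       self.used.ram_mb + ram_mb)
        for resource in ("instances", "cores", "ram_mb"):
            if getattr(wanted, resource) > getattr(self.limit, resource):
                raise QuotaExceeded(f"{self.name}: {resource} quota exceeded")
        # Usage is only updated once every limit has been checked.
        self.used = wanted


def demo() -> Project:
    # Hypothetical project: at most 2 instances, 4 cores, 8 GB of RAM.
    course = Project("course-assignments", Quota(instances=2, cores=4, ram_mb=8192))
    course.allocate(cores=2, ram_mb=4096)
    course.allocate(cores=2, ram_mb=4096)
    try:
        course.allocate(cores=1, ram_mb=512)  # a third instance is over the limit
    except QuotaExceeded as err:
        print("refused:", err)
    return course
```

In this sketch the `used` field doubles as a simple usage record per project, which is the kind of figure usage monitoring would report.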
11. Storage
This wasn't easy. We experimented with
● nova-volume
● Swift (Diablo)
● GlusterFS
● Swift (Folsom) (current)
12. Storage
● Hadoop compatible distributed storage
● Glance image store
● Desktop backup utility using CloudFuse
● Data reliability
● No more Data Fragmentation
13. OpenStack in Academia
● Research
○ Inter cloud migration
○ Inter cloud scheduling
○ Performance Evaluation
● Resource provisioning for course assignments and projects
○ 3 courses
○ 350+ students
○ 20+ projects
14. HadoopStack
● Big Data processing on demand
● Entire ecosystem for Big Data: Hadoop family, Spark, Mahout, R
● Multi-cloud: OpenStack and AWS