3. Painpoint
Wastage of idle resources
Manual scaling up and scaling down
No flexible framework for scaling
4. Objective
Rapidly provision Hadoop Clusters on
OpenStack by automating provisioning and
configuring of the machines
To provide a framework for scaling up and
scaling down the hadoop cluster
6. Hadoop
Hadoop is a framework that allows for the
distributed processing of large data sets across
clusters of computers using simple
programming model.
13. Distributed Systems Analytics
Ganglia Monitoring System
Ganglia is a scalable distributed system used to monitoring for
high performance systems such as clusters and grids.
Ceilometer
Ceilometer is a native software of Openstack that is used to
measure usage of system resources by client’s instances to
make billing simpler.
Ambari
The Apache Project aimed at making Hadoop management
simpler for developing software for provisioning, managing and
monitoring Hadoop.
14. Where are we now?
Auto clustering scripts
Scaling up scripts
Ganglia collecting metrics
Setup multinode Devstack and Openstack
Production environments(Grizzly)
15. What’s ahead?
GUI frontend for auto scaling
Support for Custom Hadoop Clusters
Abstraction of the cluster provisioning
framework to deploy any sort of cluster,
not just Hadoop
To allow easy integration with multiple
frameworks for enhanced monitoring