Apache Whirr
On-demand clusters in the cloud
Andrei Savu / @andreisavu / email@example.com
TechTuesday, Bucharest @ Adobe

Overview
● What is Apache Whirr?
● How can I use Whirr?
● Typical Cluster Config
● What's next?
● Using Whirr for Fault Injection Testing
● Test Cycle
● Resources

What is Apache Whirr?
● A set of libraries for running cloud services
● Cloud-neutral
● Common service API
● Provides smart defaults
● “The code that would become Whirr started out in 2007 as some bash scripts in Apache Hadoop for running Hadoop clusters on EC2.”
http://incubator.apache.org/whirr/

How can I use Whirr?
● Deploy clusters on demand for processing or for testing. Ideal if you are building applications on top of components of the Hadoop stack.
● Supported services: Cassandra, Hadoop, HBase, ZooKeeper (0.3.0)
● Cloud providers: EC2, Rackspace Cloud (using jclouds)
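
A cluster is described to Whirr in a properties file. The following is a minimal sketch that renders such a file from Python. The helper name `render_whirr_properties` is hypothetical, and the provider id and role names passed in are assumptions: they differ between Whirr releases, so check them against the recipes shipped with your release before use.

```python
def render_whirr_properties(cluster_name, provider, templates):
    """Return the text of a Whirr-style .properties file.

    templates maps a role string (roles joined with '+') to an instance
    count, e.g. {"zookeeper": 3}. Role names are release-specific.
    """
    # Instance templates are written as "<count> <roles>", comma-separated.
    template_spec = ",".join(
        f"{count} {roles}" for roles, count in templates.items()
    )
    lines = [
        f"whirr.cluster-name={cluster_name}",
        f"whirr.instance-templates={template_spec}",
        f"whirr.provider={provider}",
        # Credentials are read from the environment rather than hard-coded.
        "whirr.identity=${env:AWS_ACCESS_KEY_ID}",
        "whirr.credential=${env:AWS_SECRET_ACCESS_KEY}",
    ]
    return "\n".join(lines) + "\n"


# Assumed values for illustration only: a three-node ZooKeeper ensemble.
config_text = render_whirr_properties("demo-zookeeper", "aws-ec2", {"zookeeper": 3})
print(config_text)
```

The resulting text can be saved to a file and handed to the Whirr command-line tool when launching a cluster.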

More Recipes
● Check the recipes folder in the release
● Contains recipes for all the supported services and plenty of comments.

What's next?
● Support for private clouds: Eucalyptus or OpenStack
● New services: Flume, Kafka, MongoDB
● Many improvements and bug fixes
● Integration with Hudson CI for Hadoop and HBase (running YCSB)

Fault Injection Testing
● Discover bugs in existing systems by simulating faulty hardware and networking
● Inject faults on a small test cluster: if it keeps making progress without corruption or unrecoverable errors, the expectation is that it will also run without such errors on large clusters, where faults occur naturally.

Test Cycle
● Setup: use Apache Whirr to bring a cluster up
● Inject: faults based on a scenario
● Monitor: continuously – collect data for diagnostics on failure
● This is work in progress (M.Sc. research)

Resources
● http://incubator.apache.org/whirr/
● Deploy HBase in minutes: philwhln.com/run-the-latest-whirr-and-deploy-hbase-in-minutes
● Deploy Cassandra in minutes: philwhln.com/quickly-launch-a-cassandra-cluster-on-amazon-ec2