Apache Whirr
On-demand clusters in the cloud



 Andrei Savu / @andreisavu / asavu@apache.org
               TechTuesday, Bucharest @ Adobe
Overview
●   What is Apache Whirr?
●   How can I use Whirr?
●   Typical Cluster Config
●   What's next?
●   Using Whirr for Fault Injection Testing
●   Test Cycle
●   Resources
What is Apache Whirr?
●   A set of libraries for running cloud services
●   Cloud-neutral
●   Common service API
●   Provides smart defaults

●   “The code that would become Whirr
    started out in 2007 as some bash scripts
    in Apache Hadoop for running Hadoop
    clusters on EC2.”
    http://incubator.apache.org/whirr/
How can I use Whirr?
●   Deploy clusters on demand for processing
    or for testing. Ideal if you are building
    applications on top of components of the
    Hadoop stack.

●   Supported services (as of 0.3.0): Cassandra,
    Hadoop, HBase, ZooKeeper

●   Cloud providers: EC2, Rackspace Cloud
    (using jclouds)
Typical Cluster Config
whirr.cluster-name=hadoop
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,5 hadoop-datanode+hadoop-tasktracker

whirr.provider=ec2
whirr.identity=AWS_ACCESS_KEY_ID
whirr.credential=AWS_SECRET
whirr.hardware-id=c1.xlarge
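The `whirr.instance-templates` value packs the whole cluster layout into one line: comma-separated entries, each a node count followed by a `+`-joined list of the roles those nodes run. As a minimal sketch (assuming only this count/roles syntax; `InstanceTemplates.parse` is a hypothetical helper, not Whirr's own parser), the Java below loads the recipe with `java.util.Properties` and tallies how many nodes will carry each role.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class InstanceTemplates {

    // Hypothetical helper, NOT Whirr's parser: turns "1 a+b,5 c+d"
    // into {a=1, b=1, c=5, d=5}, i.e. the number of nodes per role.
    static Map<String, Integer> parse(String templates) {
        Map<String, Integer> nodesPerRole = new LinkedHashMap<>();
        for (String entry : templates.split(",")) {
            // Each entry is assumed to be "<count> <role>+<role>+..."
            String[] parts = entry.trim().split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed template: " + entry);
            }
            int count = Integer.parseInt(parts[0]);
            for (String role : parts[1].split("\\+")) {
                nodesPerRole.merge(role.trim(), count, Integer::sum);
            }
        }
        return nodesPerRole;
    }

    public static void main(String[] args) throws IOException {
        // The slide's recipe; a .properties value must stay on one line.
        String recipe = String.join("\n",
            "whirr.cluster-name=hadoop",
            "whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,"
                + "5 hadoop-datanode+hadoop-tasktracker",
            "whirr.provider=ec2");
        Properties props = new Properties();
        props.load(new StringReader(recipe));

        Map<String, Integer> roles = parse(props.getProperty("whirr.instance-templates"));
        System.out.println(roles);
    }
}
```

For this recipe it prints `{hadoop-namenode=1, hadoop-jobtracker=1, hadoop-datanode=5, hadoop-tasktracker=5}`: one master node running both master daemons and five workers running both worker daemons.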
More Recipes
●   Check the recipes folder in the release

●   Contains recipes for all the supported
    services and plenty of comments.
What's next?
●   Support for private clouds: Eucalyptus or
    OpenStack

●   New services: Flume, Kafka, MongoDB

●   Many improvements and bug fixes

●   Integration with Hudson CI for Hadoop
    and HBase (running YCSB)
Using Apache Whirr for
Fault Injection Testing
Fault Injection Testing
●   Discover bugs in existing systems by
    simulating faulty hardware and
    networking

●   Inject faults on a small test cluster: if
    the system keeps making progress without
    corruption or unrecoverable errors, it
    should also cope with the errors that
    occur naturally on large clusters.
Test Cycle
●   Setup: use Apache Whirr to bring a
    cluster up

●   Inject: introduce faults according to a
    scenario

●   Monitor: continuously collect data for
    diagnosing failures

●   This is a work in progress (M.Sc. research)
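The Setup / Inject / Monitor cycle can be sketched as a loop. This is a minimal illustration under loud assumptions: the cluster is simulated in memory (a real run would bring it up with Whirr from a recipe), a scenario is just an ordered list of node kills and restarts, and "healthy" is approximated by a hypothetical minimum number of live nodes. None of the names (`Cluster`, `Fault`, `killNode`, `restartNode`, `run`) come from Whirr or from the research described here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FaultInjectionCycle {

    // Hypothetical in-memory stand-in for a Whirr-launched cluster:
    // it only tracks whether each node is up or down.
    static final class Cluster {
        final boolean[] up;

        Cluster(int nodes) {
            up = new boolean[nodes];
            Arrays.fill(up, true);
        }

        int liveNodes() {
            int live = 0;
            for (boolean node : up) {
                if (node) live++;
            }
            return live;
        }
    }

    // One step of a fault scenario: a label plus the action it performs.
    static final class Fault {
        final String label;
        final Consumer<Cluster> action;

        Fault(String label, Consumer<Cluster> action) {
            this.label = label;
            this.action = action;
        }
    }

    static Fault killNode(int i) {
        return new Fault("kill node " + i, c -> c.up[i] = false);
    }

    static Fault restartNode(int i) {
        return new Fault("restart node " + i, c -> c.up[i] = true);
    }

    // Inject each fault in order and monitor after every step; stop and
    // record a diagnostic once the (hypothetical) health check fails.
    static List<String> run(Cluster cluster, List<Fault> scenario, int minLiveNodes) {
        List<String> log = new ArrayList<>();
        for (Fault fault : scenario) {
            fault.action.accept(cluster);          // Inject
            int live = cluster.liveNodes();        // Monitor
            log.add(fault.label + " -> live=" + live);
            if (live < minLiveNodes) {
                log.add("FAILURE: live nodes below " + minLiveNodes);
                break;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster(5);          // Setup (simulated)
        List<Fault> scenario = List.of(
            killNode(1), killNode(3), restartNode(1), killNode(4), killNode(0));
        run(cluster, scenario, 3).forEach(System.out::println);
    }
}
```

With five nodes and a threshold of three, the run logs every step and ends with a FAILURE entry once only two nodes remain. In the real setting the Monitor step would gather logs and metrics from the nodes rather than a single counter, so that the data is available for diagnosis when a failure occurs.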
Resources
●   http://incubator.apache.org/whirr/

●   Deploy HBase in minutes:
    philwhln.com/run-the-latest-whirr-and-deploy-hbase-in-minutes

●   Deploy Cassandra in minutes:
    philwhln.com/quickly-launch-a-cassandra-cluster-on-amazon-ec2
Resources (2)
●   http://hadoop.apache.org/

●   http://hbase.apache.org/

●   http://zookeeper.apache.org/
Thanks! Questions?

Andrei Savu – Whirr Committer
     asavu@apache.org