Apache Whirr

By Lars George

Published in: Technology
Transcript

  • 1. Apache Whirr
      Cloud Services Made Easy
      Lars George, Cloudera
      HUGUK Meetup, 12 May 2011
      Friday, May 20, 2011
  • 2. What is Apache Whirr?
      ▪ An Apache Incubator project for running services in the cloud
      ▪ A community for sharing code and configuration for running clusters
      ▪ People: 9 committers (6 orgs), more contributors and users
      ▪ Projects: Hadoop, HBase, ZooKeeper, Cassandra
      ▪ More coming: Voldemort, ElasticSearch, Mesos, Oozie
      ▪ Whirr is used for
          ▪ Testing
          ▪ Evaluation and proof of concept
          ▪ Production (future)
  • 3. High Level Overview
      ▪ Whirr uses low-level API libraries to talk to cloud providers
      ▪ Orchestrates clusters based on roles
      ▪ Configuration files define entire cluster
      ▪ Override anything you need to change on the command line
      ▪ Spins up minimal images, only base OS and SSH
      ▪ Ships your credentials
      ▪ Services are a combination of roles
      ▪ Define bootstrapping and configuration actions
      ▪ Adding your own service is (nearly) trivial
      ▪ No dealings with cloud provider internals
  • 4. Steps in writing a Whirr service
      ▪ 1. Identify service roles
      ▪ 2. Write a ClusterActionHandler for each role
      ▪ 3. Write scripts that run on cloud nodes
      ▪ 4. Package and install
      ▪ 5. Run
  • 5. 1. Identify service roles
      ▪ Flume, a service for collecting and moving large amounts of data (https://github.com/cloudera/flume)
      ▪ Flume Master
          ▪ The head node, for coordination
          ▪ Whirr role name: flumedemo-master
      ▪ Flume Node
          ▪ Runs agents (generate logs) or collectors (aggregate logs)
          ▪ Whirr role name: flumedemo-node
  • 6. 2. Write a ClusterActionHandler for each role

          public class FlumeNodeHandler extends ClusterActionHandlerSupport {

            public static final String ROLE = "flumedemo-node";

            @Override
            public String getRole() { return ROLE; }

            @Override
            protected void beforeBootstrap(ClusterActionEvent event)
                throws IOException, InterruptedException {
              addStatement(event, call("install_java"));
              addStatement(event, call("install_flumedemo"));
            }

            // more ...
          }
  • 7. Handlers can interact...

          public class FlumeNodeHandler extends ClusterActionHandlerSupport {

            // continued ...

            @Override
            protected void beforeConfigure(ClusterActionEvent event)
                throws IOException, InterruptedException {
              // firewall ingress authorization omitted
              Cluster cluster = event.getCluster();
              // Look up the single instance that runs the master role
              Instance master = cluster.getInstanceMatching(role(FlumeMasterHandler.ROLE));
              String masterAddress = master.getPrivateAddress().getHostAddress();
              // Pass the master's private address to the node's configure script
              addStatement(event, call("configure_flumedemo_node", masterAddress));
            }
          }
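
The role lookup on this slide can be pictured with a small standalone model. This is a sketch only and does not use Whirr's classes: `RoleLookupSketch`, `Node`, and `instanceMatching` are hypothetical names, and the addresses are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Standalone model of looking up a cluster instance by role; not Whirr's API. */
final class RoleLookupSketch {

    /** Hypothetical stand-in for a running instance: its roles and private address. */
    static final class Node {
        final String privateAddress;
        final Set<String> roles;

        Node(String privateAddress, String... roles) {
            this.privateAddress = privateAddress;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    /** Returns the one node carrying the role; fails if there is none or more than one. */
    static Node instanceMatching(List<Node> cluster, String role) {
        Node match = null;
        for (Node node : cluster) {
            if (node.roles.contains(role)) {
                if (match != null) {
                    throw new IllegalStateException("More than one instance has role " + role);
                }
                match = node;
            }
        }
        if (match == null) {
            throw new IllegalStateException("No instance has role " + role);
        }
        return match;
    }

    public static void main(String[] args) {
        List<Node> cluster = Arrays.asList(
            new Node("10.0.0.1", "flumedemo-master"),  // made-up addresses
            new Node("10.0.0.2", "flumedemo-node"));
        // The node handler needs the master's address before it can configure itself.
        String masterAddress = instanceMatching(cluster, "flumedemo-master").privateAddress;
        System.out.println("configure_flumedemo_node " + masterAddress);
    }
}
```

The lookup only makes sense once every instance exists, which is why it belongs in the configure phase rather than the bootstrap phase.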
  • 8. 3. Write scripts that run on cloud nodes
      ▪ install_java is built in
      ▪ Other functions are specified in individual files

          function install_flumedemo() {
            curl -O http://cloud.github.com/downloads/cloudera/flume/flume-0.9.3.tar.gz
            tar -C /usr/local/ -zxf flume-0.9.3.tar.gz
            echo "export FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf" >> /etc/profile
          }
  • 9. You can run as many scripts as you want
      ▪ This script takes an argument to specify the master

          function configure_flumedemo_node() {
            MASTER_HOST=$1
            # Write a site config that points this node at the master
            cat > /usr/local/flume-0.9.3/conf/flume-site.xml <<EOF
          <?xml version="1.0"?>
          <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
          <configuration>
            <property>
              <name>flume.master.servers</name>
              <value>$MASTER_HOST</value>
            </property>
          </configuration>
          EOF
            # Start the Flume node process in the background
            FLUME_CONF_DIR=/usr/local/flume-0.9.3/conf \
              nohup /usr/local/flume-0.9.3/bin/flume node > /var/log/flume.log 2>&1 &
          }
  • 10. 4. Package and install
      ▪ Each service is a self-contained JAR:

          functions/configure_flumedemo_master.sh
          functions/configure_flumedemo_node.sh
          functions/install_flumedemo.sh
          META-INF/services/org.apache.whirr.service.ClusterActionHandler
          org/apache/whirr/service/example/FlumeMasterHandler.class
          org/apache/whirr/service/example/FlumeNodeHandler.class

      ▪ Discovered using java.util.ServiceLoader facility
      ▪ META-INF/services/org.apache.whirr.service.ClusterActionHandler:

          org.apache.whirr.service.example.FlumeMasterHandler
          org.apache.whirr.service.example.FlumeNodeHandler

      ▪ Place JAR in Whirr’s lib directory
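
A minimal sketch of how discovery through java.util.ServiceLoader can work in general. The `RoleHandler` interface and `indexByRole` method are hypothetical simplifications, not Whirr's own types; only `ServiceLoader.load` is the real JDK call.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

/** Sketch of provider discovery with java.util.ServiceLoader; names are illustrative. */
final class HandlerDiscoverySketch {

    /** Hypothetical, simplified stand-in for a role handler. */
    interface RoleHandler {
        String getRole();
    }

    /** Indexes handlers by role name so a role named in a cluster spec can be resolved. */
    static Map<String, RoleHandler> indexByRole(Iterable<? extends RoleHandler> handlers) {
        Map<String, RoleHandler> byRole = new LinkedHashMap<>();
        for (RoleHandler handler : handlers) {
            byRole.put(handler.getRole(), handler);
        }
        return byRole;
    }

    public static void main(String[] args) {
        // ServiceLoader scans each JAR on the classpath for a META-INF/services file
        // named after the service interface and instantiates every class listed in it.
        Map<String, RoleHandler> handlers = indexByRole(ServiceLoader.load(RoleHandler.class));
        System.out.println("Discovered roles: " + handlers.keySet());
    }
}
```

With this mechanism, adding a service needs no change to the core: dropping a JAR with the right META-INF/services entry on the classpath is enough.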
  • 11. 5. Run
      ▪ Create a cluster spec file

          whirr.cluster-name=flumedemo
          whirr.instance-templates=1 flumedemo-master,1 flumedemo-node
          whirr.provider=aws-ec2
          whirr.identity=${env:AWS_ACCESS_KEY_ID}
          whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

      ▪ Then launch from the CLI

          % whirr launch-cluster --config flumedemo.properties

      ▪ or Java

          Configuration conf = new PropertiesConfiguration("flumedemo.properties");
          ClusterSpec spec = new ClusterSpec(conf);
          Service s = new Service();
          Cluster cluster = s.launchCluster(spec);
          // interact with cluster
          s.destroyCluster(spec);
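
The whirr.instance-templates value reads as a comma-separated list of "count roles" entries, with "+" joining several roles on one instance. The parser below is a sketch under that reading; `InstanceTemplateSketch` and `Template` are hypothetical names, and Whirr's real parsing may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of parsing an instance-templates value; not Whirr's actual parser. */
final class InstanceTemplateSketch {

    /** One template: how many instances to start and which roles each one runs. */
    static final class Template {
        final int count;
        final List<String> roles;

        Template(int count, List<String> roles) {
            this.count = count;
            this.roles = roles;
        }
    }

    /** Parses e.g. "1 flumedemo-master,1 flumedemo-node" into two templates. */
    static List<Template> parse(String value) {
        List<Template> templates = new ArrayList<>();
        for (String entry : value.split(",")) {
            // Each entry is "<count> <role>[+<role>...]"
            String[] parts = entry.trim().split("\\s+", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected '<count> <roles>': " + entry);
            }
            int count = Integer.parseInt(parts[0]);
            List<String> roles = Arrays.asList(parts[1].trim().split("\\+"));
            templates.add(new Template(count, roles));
        }
        return templates;
    }

    public static void main(String[] args) {
        for (Template t : parse("1 flumedemo-master,1 flumedemo-node")) {
            System.out.println(t.count + " x " + t.roles);
        }
    }
}
```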
  • 12. Demo
  • 13. Orchestration
      ▪ Instance templates are acted on independently in parallel
      ▪ Bootstrap phase
          ▪ start 1 instance for the flumedemo-master role and run its bootstrap script
          ▪ start 1 instance for the flumedemo-node role and run its bootstrap script
      ▪ Configure phase
          ▪ run the configure script on the flumedemo-master instance
          ▪ run the configure script on the flumedemo-node instance
      ▪ Note there is a barrier between the two phases, so nodes can get the master address in the configure phase
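
The two-phase flow with a barrier can be sketched with a plain thread pool. This models the idea only and is not Whirr's implementation: `PhaseBarrierSketch`, `runPhase`, and `launch` are hypothetical names, and each "script" is reduced to a log entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of parallel phases separated by a barrier; not Whirr's implementation. */
final class PhaseBarrierSketch {

    /** Runs one phase for all roles in parallel; returns only when every role is done. */
    static void runPhase(ExecutorService pool, String phase, List<String> roles, List<String> log)
            throws InterruptedException, ExecutionException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String role : roles) {
            tasks.add(() -> {
                log.add(phase + " " + role);  // stands in for running the role's script
                return null;
            });
        }
        // invokeAll blocks until all tasks have completed: this is the barrier.
        for (Future<Void> result : pool.invokeAll(tasks)) {
            result.get();  // surfaces any failure from a task
        }
    }

    /** Bootstraps every role, then configures every role, and returns the event log. */
    static List<String> launch(List<String> roles) {
        List<String> log = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, roles.size()));
        try {
            runPhase(pool, "bootstrap", roles, log);
            runPhase(pool, "configure", roles, log);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Cluster launch failed", e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        List<String> roles = new ArrayList<>();
        roles.add("flumedemo-master");
        roles.add("flumedemo-node");
        System.out.println(launch(roles));
    }
}
```

Within a phase the order of roles is not fixed, but every bootstrap entry precedes every configure entry, which is what lets a node ask for the master's address during configure.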
  • 14. Going further: Hadoop
      ▪ Service configuration
          ▪ Flume example was very simple
          ▪ In practice, you want to be able to override any service property
          ▪ For Hadoop, Whirr generates the service config file and copies it to the cluster
          ▪ E.g. hadoop-common.fs.trash.interval=1440
          ▪ Sets fs.trash.interval in the cluster configuration
      ▪ Service version
          ▪ Tarball is parameterized by whirr.hadoop.tarball.url
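
The override rule on this slide, where hadoop-common.fs.trash.interval sets fs.trash.interval, can be sketched as a prefix strip over the cluster spec. This assumes the mapping is nothing more than removing the prefix; `PrefixedOverrideSketch` and `serviceProperties` are hypothetical names, and the way Whirr actually generates the file is not shown in the slides.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Sketch of turning prefixed cluster-spec keys into service properties. */
final class PrefixedOverrideSketch {

    /** Collects every "<prefix>.<name>" key and returns it as "<name>" with its value. */
    static Map<String, String> serviceProperties(Properties clusterSpec, String prefix) {
        Map<String, String> result = new TreeMap<>();
        String dotted = prefix + ".";
        for (String key : clusterSpec.stringPropertyNames()) {
            if (key.startsWith(dotted)) {
                result.put(key.substring(dotted.length()), clusterSpec.getProperty(key));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties spec = new Properties();
        spec.setProperty("whirr.cluster-name", "demo");  // illustrative value
        spec.setProperty("hadoop-common.fs.trash.interval", "1440");
        // Only the hadoop-common.* keys end up in the generated service config.
        System.out.println(serviceProperties(spec, "hadoop-common"));
    }
}
```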
  • 15. Going further: HBase
      [Diagram: clients on the public side (Web Browser, REST Client, HTable) connect to the UI, RPC, and HTTP ports on which the Master, two RegionServers, and the RestServer listen inside the private network]
  • 16. Going further: HBase
      ▪ Service composition

          whirr.instance-templates=1 zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master, 5 hadoop-datanode+hadoop-tasktracker+hbase-regionserver

      ▪ HBase handlers will do the following in beforeConfigure():
          ▪ open ports
          ▪ pass in ZooKeeper quorum from zookeeper role
          ▪ for non-master nodes: pass in master address
      ▪ Notice that the Hadoop cluster is overlaid on the HBase cluster
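
Passing in the ZooKeeper quorum amounts to collecting the addresses of every instance that carries the zookeeper role. A minimal sketch, assuming a simple address-to-roles map; `QuorumSketch` and `quorum` are hypothetical names, and the addresses are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of deriving a quorum string from composed roles; not Whirr's API. */
final class QuorumSketch {

    /**
     * rolesByAddress maps an instance address to its "+"-joined roles, as written in an
     * instance template. Returns the comma-separated addresses that carry the given role.
     */
    static String quorum(Map<String, String> rolesByAddress, String role) {
        List<String> members = new ArrayList<>();
        for (Map.Entry<String, String> entry : rolesByAddress.entrySet()) {
            for (String candidate : entry.getValue().split("\\+")) {
                if (candidate.equals(role)) {
                    members.add(entry.getKey());
                    break;
                }
            }
        }
        return String.join(",", members);
    }

    public static void main(String[] args) {
        Map<String, String> cluster = new LinkedHashMap<>();
        // Made-up addresses; role strings follow the template on the slide.
        cluster.put("10.0.0.1", "zookeeper+hadoop-namenode+hadoop-jobtracker+hbase-master");
        cluster.put("10.0.0.2", "hadoop-datanode+hadoop-tasktracker+hbase-regionserver");
        System.out.println(quorum(cluster, "zookeeper"));
    }
}
```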
  • 17. Challenges
      ▪ Complexity
          ▪ Degrees of freedom
              ▪ #clouds × #OS × #hardware × #locations × #services × #configs = a big number!
          ▪ Known good configurations, recipes
          ▪ Regular automated testing
      ▪ Debugging
          ▪ What to do when the service hasn’t come up?
          ▪ Logs
          ▪ Recoverability
  • 18. What’s next?
      ▪ Release 0.5.0 coming soon
      ▪ Support local tarball upload (WHIRR-220)
      ▪ Allow component tests to run in memory (WHIRR-243)
      ▪ More services (ElasticSearch, Voldemort)
      ▪ More (tested) cloud providers
      ▪ Pool provider/BYON
      ▪ Cluster resizing
      ▪ https://cwiki.apache.org/confluence/display/WHIRR/RoadMap
  • 19. Questions
      ▪ Find out more at
          ▪ http://incubator.apache.org/whirr
          ▪ lars@cloudera.com
  • 20. provision
      [Diagram: four instances inside the private network, below the public side]
      ▪ hbase-master: Ubuntu 10.04, 8GB RAM, 8 Cores
      ▪ hbase-regionserver: Ubuntu 10.04, 8GB RAM, 8 Cores
      ▪ hbase-regionserver: Ubuntu 10.04, 8GB RAM, 8 Cores
      ▪ hbase-restserver: Ubuntu 10.04, 2GB RAM, 1 CPU
  • 21. install
      [Diagram: the same four instances inside the private network, after installation]
      ▪ hbase-master: login hbase, java 1.6.0_16, hbase 0.90.1
      ▪ hbase-regionserver: login hbase, java 1.6.0_16, hbase 0.90.1
      ▪ hbase-regionserver: login hbase, java 1.6.0_16, hbase 0.90.1
      ▪ hbase-restserver: login hbase, java 1.6.0_16, hbase 0.90.1
  • 22. configure
      [Diagram: inside the private network, the Master and two RegionServers each listen on UI and RPC ports, and the RestServer listens on an HTTP port; the ports face the public side]
  • 23. manage
      [Diagram: inside the private network, Master #2, Master, and a RegionServer each listen on UI and RPC ports facing the public side]
