Published in: Technology

  1. Taverna + ARC
       • Workflow management, command line tools and the grid
     Saturday, June 27, 2009
     Hajo Nils Krabbenhöft, University of Lübeck
  2. Why use a grid?
       • cpu power
       • storage space
       • node redundancy
  3. Why use a grid?
       • resource sharing
           • tit-for-tat
           • buy cpu power
       • shared maintenance
           • software development
           • package deployment
           • configuration
  4. Why use Taverna 2?
       • knowledge sharing
           • web services
           • plug-ins
           • myExperiment
  5. Why use Taverna 2?
           • myExperiment
  6. Why use Taverna 2?
       • dependency management
       • data management & conversion
       • easy tweaking
       • database integration
  7. shell script pitfalls
       • command line ambiguities
           • no specific program version
           • unaware of changes to syntax
  8. manual grid usage
       • upload & download data
       • xrsl difficult to write
       • no failure recovery
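To make the "xrsl difficult to write" point concrete, here is a minimal sketch that assembles an xRSL job description from plain Python values. It is an illustration, not part of the talk: the helper names (`quote`, `file_pairs`, `build_xrsl`), the storage host `se.example.org`, and the runtime environment name are all hypothetical, and only a handful of common xRSL attributes are covered.

```python
def quote(value):
    """Wrap a value in double quotes; embedded quotes are rejected to keep the sketch simple."""
    text = str(value)
    if '"' in text:
        raise ValueError(f"double quotes are not supported in this sketch: {text!r}")
    return f'"{text}"'


def file_pairs(files):
    """Render a {name: url} mapping as a sequence of ("name" "url") pairs."""
    return "".join(f"({quote(name)} {quote(url)})" for name, url in files.items())


def build_xrsl(executable, arguments=(), input_files=None, output_files=None,
               runtime_environments=(), job_name=None):
    """Assemble a one-line xRSL description from the given job properties."""
    parts = [f"(executable={quote(executable)})"]
    if arguments:
        parts.append("(arguments=" + " ".join(quote(a) for a in arguments) + ")")
    if input_files:
        parts.append("(inputFiles=" + file_pairs(input_files) + ")")
    if output_files:
        parts.append("(outputFiles=" + file_pairs(output_files) + ")")
    for name in runtime_environments:
        parts.append(f"(runTimeEnvironment={quote(name)})")
    if job_name:
        parts.append(f"(jobName={quote(job_name)})")
    return "&" + "".join(parts)


# Hypothetical job: host, file names and runtime environment are placeholders.
job = build_xrsl(
    "run.sh",
    arguments=["scan.dat"],
    input_files={"scan.dat": "gsiftp://se.example.org/data/scan.dat"},
    output_files={"result.dat": "gsiftp://se.example.org/results/result.dat"},
    runtime_environments=["APPS/EXAMPLE/TOOL-1.0"],
    job_name="demo",
)
print(job)
```

Even for this small job, the nested parentheses and quoting have to be exactly right, which is the kind of detail a workflow plug-in can hide from the user.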
  9. proposed solution
     (diagram; labels: Workflow, plug-in, ARC, use cases, runtime environments; connectors: submits, executes, out of, through)
  10. proposed solution
       • ARC grid middleware
           • homogeneous interface
           • security certificates
           • data management
           • scalable
  11. proposed solution
       • runtime environments
           • specify program & version
           • installation on demand
       • use cases
           • require RE
           • no command line
           • shared repository
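One way to picture a "use case" that requires a runtime environment and spares the user the command line is a small description object that holds the command template and the required RE together. The sketch below is an assumption about how such a description could look, not the plug-in's actual format: the `UseCase` class, its fields, the tool name `blurtool`, and the RE name are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class UseCase:
    """Hypothetical use case description: a named tool call tied to one runtime environment."""
    name: str
    runtime_environment: str   # program and version the grid node must provide
    command: str               # command template with $placeholders
    inputs: tuple              # names of the placeholders the user has to supply

    def render(self, **values):
        """Fill the template with shell-quoted values, refusing unknown or missing inputs."""
        missing = [key for key in self.inputs if key not in values]
        extra = [key for key in values if key not in self.inputs]
        if missing or extra:
            raise ValueError(f"missing inputs {missing}, unexpected inputs {extra}")
        quoted = {key: shlex.quote(str(value)) for key, value in values.items()}
        return Template(self.command).substitute(quoted)


# Hypothetical entry of a shared use case repository.
blur = UseCase(
    name="blur",
    runtime_environment="APPS/EXAMPLE/BLURTOOL-1.0",
    command="blurtool --radius $radius $image",
    inputs=("radius", "image"),
)
print(blur.render(radius=3, image="my scan.png"))
```

The user only supplies named inputs; the command syntax and the program version stay pinned in the shared description, which addresses the shell script pitfalls listed earlier.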
  12. proposed solution
       • Taverna plug-in
           • job submission
               • use cases => run in parallel
           • storage management
               • data references => fast
           • silent failover
           • SSH + local for testing
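The combination of parallel job submission and silent failover can be sketched in a few lines. This is a generic illustration under stated assumptions, not the plug-in's code: the backends are plain Python callables standing in for grid, SSH and local execution, and the names `run_with_failover` and `run_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_failover(job, backends):
    """Try each (name, submit) backend in order; return the first successful result."""
    errors = []
    for name, submit in backends:
        try:
            return name, submit(job)
        except Exception as exc:  # silent failover: remember the error, try the next backend
            errors.append((name, exc))
    raise RuntimeError(f"all backends failed for {job!r}: {errors}")


def run_all(jobs, backends, max_workers=4):
    """Run independent jobs in parallel threads; results keep the order of `jobs`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: run_with_failover(job, backends), jobs))


# Stand-in backends: the "grid" always fails here, the "local" one upper-cases its input.
def broken_grid(job):
    raise ConnectionError("grid unreachable")


def local(job):
    return job.upper()


results = run_all(["a", "b", "c"], [("grid", broken_grid), ("local", local)])
```

Each job falls through to the next backend on its own, so one unreachable cluster does not stop the remaining jobs; errors only surface when every backend has failed.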
  13. proposed solution
       • Taverna
           • easy to present
           • embedded workflows
           • parameter tweaking
           • managed dependencies
           • easy retry
           • easy parallelization
     X. Zhou et al.: An Easy Setup for Parallel Medical Image Processing: Using Taverna and ARC
  14. reality check
       • dynamic RE still experimental
           • use common tools
           • send binaries
           • call administrators
       • firewall
           • need LDAP and GSIFTP ports
           • proxy support
  15. reality check
       • disk caching since Taverna 2.0
       • programs not locally installable
           • use as web service
       • upload is slow
           • upload static files to SE
  16. neat toys
       • Taverna as web service
       • Taverna on grid node
       • embedded Taverna
       • NestedVM
           • package arbitrary C program into JAR
  17. neat toys
       • Amazon S3
           • upload from Taverna
           • grid URL
           • good for static data
       • Amazon EC2
           • on-demand grid nodes
           • control from Taverna
  18. neat toys
       • use case java API
           • submit, receive, monitor
           • data references
           • silent failover
           • but NO dependency management
       • GridRunnable
           • e.g. clustering based on different criteria
  19. Thank you for your attention http://