Taverna workflows in the cloud

  1. 1. Taverna workflows in the Cloud Robert Haines University of Manchester rhaines@manchester.ac.uk
  2. 2. Taverna* and workflows *Other workflow systems are available
  3. 3. Taverna workflows • Sophisticated analysis pipelines • A set of services to analyze or manage data (local or remote) • Workflows run through the workbench or via a server • Automation of data flow through services • Control of service invocation • Iteration over data sets • Provenance collection • Extensible and open source
  4. 4. Taverna Workbench • Desktop application • GUI • Plug-in Framework • Intermediate results views • Search for Web Services in catalogues • Search and publish to myExperiment
  5. 5. Taverna Server family • Taverna Server – Multiple clients, Multi-user – Local and large scale infrastructures – Site Replication • Taverna Server Amazon Image – Local R server – Multiple instances in Amazon Cloud and as required, for multiple users/uses and different security scenarios • Taverna Virtual Machine • Taverna Command Line • Bundled Servers, Services and Tools
  6. 6. Users are not the same… any one individual can be all of these • Pro Makers: Technical Experts – Rich power tools – Control, flexibility, expressivity • In the Field Users: Re-modellers – Simplified though limited tools – Revise variants, tweaking – Inspection and guidance • Vanilla Users: Pre-cooked workflows – Point and click / form fill / ambient configuration – Web based / Bespoke / Embedded launch • Figure label: Workbench Lite
  7. 7. Taverna Tool Spectrum • Figure labels: Technical Computational Scientist; Domain Scientist; Workbench; Workbench Lite; Components; Domain-Specific Website / Tool; Taverna Player; Command Line; Workflow Visibility; Concept Knowledge; Domain; High; Low
  8. 8. The Taverna Suite of Tools • Diagram labels: Client User Interfaces; Workflow Repository; Service Catalogue; Third Party Tools; Web Portals; Activity and Service Plug-in Manager; Workflow Provenance; Workflow Server; Secure Service Access; Credential Manager; Workflow Engine; Virtual Machine; Prog & APIs; Command Line; Taverna Lite; Player; Taverna Workbench
  9. 9. Freely available open source • Current Version 2.4 • 80,000+ downloads across versions • Part of the myGrid Toolkit • Windows / Mac OS X / Linux / Unix • Reference: Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble: “The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, Web or in the Cloud”, Nucleic Acids Res., May 2013. doi:10.1093/nar/gkt328 • Taverna – www.taverna.org.uk
  10. 10. Workflows in the Cloud Biodiversity Virtual e-Laboratory
  11. 11. Biodiversity Virtual e-Laboratory • BioVeL is an international network of experts – Connects two scientific communities: IT and biodiversity • “Pals” system – Roughly a three-way split between: • Biodiversity scientists, Biodiversity Informaticians, Computer Scientists – Shares expertise in workflow studies among BioVeL’s users and friends – Fosters an international community of researchers and partners on biodiversity issues
  12. 12. Biodiversity Virtual e-Laboratory • BioVeL users want to be able to: – Import data from their own research and/or from existing libraries – Use workflows to process vast amounts of data – Build their own workflows – Access a library of workflows and re-use existing workflows – Cut down research time and overhead expenses – Contribute to other initiatives, such as LifeWatch and GEO BON
  13. 13. Ecological niche modeling of an invasive species • Species occurrence • Environmental layers: salinity, bottom temperature, ice concentration, primary production
  14. 14. Semi-automatized ecological niche modeling workflow • Scientist’s PowerPoint workflow – Used every day • Came to Manchester • Two days with a Taverna developer – Not Scalable! • First iteration of workflow produced • Diagram labels: High quality occurrence data set; Select layers with environmental factors that are likely to influence the distribution of the species; Select algorithm; Select parameter values for the chosen algorithm; Create model; Assemble the model on CRIA server; Model test; Test the performance of the parameter in the model; Test performance of the distribution prediction on the model; Changing algorithm, parameter values, and set of layers; Model projection; Project Model with original layers; Select prediction layers (e.g. 2050); Project Model with prediction layers; Statistical analysis of the raster data
  15. 15. Ecological niche modeling workflow Scary!
  16. 16. Ecological niche modeling workflow Better?
  17. 17. Population Modelling • Is the population growing or declining? • What effect has exploitation or other stimulus had on the population? • Which stage should be the focus of conservation? • Year 1 data: stage, # flowers/fruits, other variables • Year 2 data: survival, stage, # flowers/fruits, # of seedlings recruited, other variables • Figure labels: S, J, V, G, D; SURVIVAL, GROWTH RATE, FECUNDITY, RECRUITMENT
  18. 18. Population Modelling Workflow
  19. 19. Simplifications for users • Pre-cooked workflows – In myExperiment • Run from the Web – Taverna Player • Wire into familiar tools – Spreadsheets – Community portals, e.g. ViBRANT Scratchpads • Packaging – Taverna VM
  20. 20. Making it “too simple” for users!? • Portal – Can handle many users – Makes it very easy to run workflows • So we see lots of workflow runs! – Which is GREAT! • Taverna has big requirements – BioVeL workflows are BIG – High CPU/Memory – Per running workflow • Taverna becomes the bottleneck
  21. 21. Scale workflows: More Taverna! • Scale and load-balance Taverna – Now we can run loads more workflows • Users are happy • Service providers are NOT! – Using services: Good – Overloading services: Bad * Please imagine loads of arrows here!
  22. 22. Scale workflows: More services? • We need to replicate services – Bundle local to Taverna? • But we don’t “own” all services – Too big/complex for us to replicate? (Data) – Closed source? • BioVeL has (some) funds to help service providers – Scale, redesign, re-engineer? • Partnerships/MOUs
  23. 23. Data: Local services • Data can be uploaded once • It is: – Within your firewall/DMZ/VPC – Secure – Easy to access by services – In the right place at the right time • Data can be read/written by services – Quickly – Without worrying about security – At no cost (£)
  24. 24. Data: Remote services • Data should be uploaded once • It is: – Within your firewall/DMZ/VPC – Secure • It is not: – Easy to access by services – In the right place at the right time • To pass data between services it must be moved – Need secure third-party access – Bandwidth costs in to and out of the Cloud – Need “pass by reference”
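The “pass by reference” idea on slide 24 can be illustrated with a minimal sketch. Everything named here is hypothetical: `DataStore` is an in-memory stand-in for shared cloud storage, the `store://` reference scheme is invented for illustration, and the two "services" are plain functions. The point is only that the data is uploaded once and services then exchange a small reference rather than the data itself.

```python
from typing import Dict


class DataStore:
    """Hypothetical in-memory stand-in for shared cloud storage."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self.bytes_uploaded = 0  # total bytes moved into the store

    def put(self, data: bytes) -> str:
        """Upload data once and return a small reference to it."""
        ref = f"store://object/{len(self._objects)}"  # invented scheme
        self._objects[ref] = data
        self.bytes_uploaded += len(data)
        return ref

    def get(self, ref: str) -> bytes:
        """Resolve a reference; a real deployment would enforce access control."""
        return self._objects[ref]


def count_records(store: DataStore, ref: str) -> int:
    """Hypothetical service A: receives a reference and reads the data in place."""
    return len(store.get(ref).splitlines())


def first_record(store: DataStore, ref: str) -> bytes:
    """Hypothetical service B: reuses the same reference, with no second upload."""
    return store.get(ref).splitlines()[0]


store = DataStore()
occurrences = b"species,lat,lon\nA,1.0,2.0\nB,3.0,4.0\n"
ref = store.put(occurrences)            # the only upload
n_records = count_records(store, ref)   # only the reference is passed
header = first_record(store, ref)
```

In this sketch the upload counter grows only on `put`, so chaining more services does not move the data again. The secure third-party access that the slide asks for is not modelled here.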
  25. 25. Workflows in the Cloud Cloud Analytics for Life Sciences
  26. 26. SNP annotation • Annotation task – Location, Gene, Transcript – Present in public databases (dbSNP, etc.) – Frequency in e.g. 1000 Genomes data – Conservation data (cross species)
  27. 27. Infrastructure Requirements • Execute analysis workflows • Accessible to clinicians and genetic testers • Cope with expanding demands on compute • Provide a secure environment • Collect provenance
  28. 28. Architecture overview • All user interaction via web interface • User data stored in the Cloud • Data for all tools and Web Services stored in the Cloud • Unified access to different workflow engines with our common REST API • Tools and Web Services for each workflow are installed together for easy replication • Diagram labels: Web interface; Inputs; Results; Storage (S3); Cache (S3); Ensembl (MySQL); Common API; Workflow engine orchestrator; Taverna Server (three instances); e-Hive; Other?; Application specific tools and Web Services (WS, Tool)
  29. 29. Orchestrating workflows in the Cloud • Flowchart labels: Input; Workflow; Data store; Find virtual machine for this workflow; Is one running? (Yes / No); Start one; Is there space on it? (Yes / No); Wait until ready; Run workflow; Status updates; Delete run; Is this instance empty? (Yes / No); Terminate it; Done
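The decision flow on slide 29 (find a virtual machine for the workflow, start one if none is running or none has space, wait until ready, run, delete the run, terminate an empty instance) could be sketched roughly as below. This is one reading of the flowchart, not the project's actual orchestrator: the `Instance` and `Orchestrator` classes, the `capacity` parameter, and the instant "boot" are all assumptions made for illustration, and no real cloud API is called.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare instances by identity, not by field values
class Instance:
    """Hypothetical stand-in for one cloud VM hosting a single workflow type."""
    workflow: str
    capacity: int
    ready: bool = False
    runs: List[str] = field(default_factory=list)

    def has_space(self) -> bool:
        return len(self.runs) < self.capacity


class Orchestrator:
    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity  # assumed maximum runs per instance
        self.instances: List[Instance] = []

    def _find(self, workflow: str) -> Optional[Instance]:
        """Is one running for this workflow, and is there space on it?"""
        for inst in self.instances:
            if inst.workflow == workflow and inst.has_space():
                return inst
        return None

    def submit(self, workflow: str, run_id: str) -> Instance:
        inst = self._find(workflow)
        if inst is None:
            # None running, or all full: start a new instance.
            inst = Instance(workflow, self.capacity)
            self.instances.append(inst)
        if not inst.ready:
            inst.ready = True  # placeholder for "wait until ready"
        inst.runs.append(run_id)  # run workflow
        return inst

    def finish(self, inst: Instance, run_id: str) -> None:
        inst.runs.remove(run_id)  # delete run
        if not inst.runs:
            # Instance is empty: terminate it.
            self.instances.remove(inst)


orc = Orchestrator(capacity=2)
a = orc.submit("niche-modelling", "run-1")
b = orc.submit("niche-modelling", "run-2")  # fits on the same instance
c = orc.submit("niche-modelling", "run-3")  # first is full, so a second starts
```

Keeping one workflow type per instance mirrors the architecture slide's note that tools and Web Services for each workflow are installed together. A real orchestrator would also poll for readiness and report status updates.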
  30. 30. The user’s view • Curated set of workflows – Designed, built and tested by domain experts – Quality assurance tested (if appropriate) • Workflows are presented as applications – The workflows themselves are hidden – Configured and run via a web interface • All user data stored securely in the Cloud – User separation • Workflows as a Service
  31. 31. Web interface: Getting started
  32. 32. Web interface: Creating a Run
  33. 33. Web interface: Checking run progress
  34. 34. Conclusions
  35. 35. The user’s view • “Science”, “Tools”, “Applications”, “Data” – Not workflows – Not infrastructure • But they ALL have workflows – On paper – In PowerPoint – In scripts – Run “by hand” – Too personal/specific – cannot share them – “Works on my machine”
  36. 36. Workflow as a Service • The workflow IS the service – Users do not see the Workflows – Run restricted sets of Taverna workflows in the cloud • Connects to other cloud based resources – storage, tools, etc. • Scale everything behind the scenes – Users can tweak parameters, but not design their own – Web portal access for scientists – Data passed by reference instead of by file – Pay as you go – cheap at the point of use
  37. 37. Supporting end-users • Make it easy – Automate workflows they are already using – Don’t get in the way of the science – Hide the infrastructure where possible • But it is really hard – So much has to be co-ordinated – Scale everything – Stay secure
  38. 38. Acknowledgements/Partners • University of Manchester • Cardiff University • European Commission 7th Framework Programme – 283359 - BioVeL • Eagle Genomics • Technology Strategy Board – 100932 - Cloud Analytics for Life Sciences • National Health Service • Amazon Web Services
  39. 39. Thanks • myGrid Team – Carole Goble (PI) – Shoaib Sufi – Alan Williams – Katy Wolstencroft • CA4LS – Abel Ureta-Vidal (PI) – Mike Cornell – Madhu Donapudi – Helen Hulme – Nick James • BioVeL – Alex Hardisty (PI) – Renato De Giovanni – Jonathan Giddy – Norman Morrison – Abraham Nieva de la Hidalga – Matthias Obst – Maria Paula Balcazar Vargas – Elisabeth Paymal – Hannu Saarenmaa …and many, many more…
