Taverna workflows can be run in the cloud to automate complex analysis pipelines and access remote data and services. This allows sophisticated computational analyses to be shared as web services. The BioVeL and CA4LS projects are developing cloud-based workflow systems to support life scientists and clinical researchers. Workflows are hidden from users, who access pre-configured analyses via a web interface. This "workflow as a service" approach scales easily and provides a secure environment for data-intensive biomedical research.
3. Taverna workflows
• Sophisticated analysis pipelines
• A set of services to analyze or manage data (local or remote)
• Workflows run through the workbench or via a server
• Automation of data flow through services
• Control of service invocation
• Iteration over data sets
• Provenance collection
• Extensible and open source
4. Taverna Workbench
• Desktop application
• GUI
• Plug-in Framework
• Intermediate results views
• Search for Web Services in catalogues
• Search and publish to myExperiment
5. Taverna Server family
• Taverna Server
– Multiple clients, Multi-user
– Local and large scale infrastructures
– Site Replication
• Taverna Server Amazon Image
– Local R server
– Multiple instances in the Amazon Cloud, as required, for multiple users/uses and different security scenarios
• Taverna Virtual Machine
• Taverna Command Line
• Bundled Servers, Services and Tools
6. Users are not the same…
Any one individual can be all of these
• Pro Makers: Technical Experts
– Rich power tools
– Control, flexibility, expressivity
• In the Field Users
– Re-modellers
  • Simplified though limited tools
  • Revise variants, tweaking
  • Inspection and guidance
– Vanilla Users: Pre-cooked workflows
  • Point and click / form fill / ambient configuration
  • Web based / Bespoke / Embedded launch
[Figure label: Workbench Lite]
7. Taverna Tool Spectrum
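[Figure: spectrum of Taverna tools, running from the Technical Computational Scientist to the Domain Scientist. Tools shown: Workbench, Workbench Lite, Components, Player, Command Line, Domain-Specific Website / Tool. Scale labels: Workflow Visibility, Taverna Concept Knowledge, Domain, with High and Low end markers.]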
8. The Taverna Suite of Tools
[Figure: component diagram of the Taverna suite. Labelled components:]
• Client User Interfaces: Taverna Workbench, Taverna Lite, Player, Command Line, Web Portals, Third Party Tools, Prog & APIs
• Workflow Repository
• Service Catalogue
• Activity and Service Plug-in Manager
• Workflow Provenance
• Workflow Server
• Secure Service Access: Credential Manager
• Workflow Engine
• Virtual Machine
9. Freely available open source
Current Version 2.4
80,000+ downloads across versions
Part of the myGrid Toolkit
Windows / Mac OS X / Linux / Unix
Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble: “The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, Web or in the Cloud”, Nucleic Acids Res., May 2013. doi:10.1093/nar/gkt328
Taverna – www.taverna.org.uk
11. Biodiversity Virtual e-Laboratory
• BioVeL is an international network of experts
– Connects two scientific communities: IT and biodiversity
• “Pals” system
– Roughly a three-way split between biodiversity scientists, biodiversity informaticians and computer scientists
– Shares expertise in workflow studies among BioVeL’s users and friends
– Fosters an international community of researchers and partners on biodiversity issues
12. Biodiversity Virtual e-Laboratory
• BioVeL users want to be able to:
– Import data from their own research and/or from existing libraries
– Use workflows to process vast amounts of data
– Build their own workflows
– Access a library of workflows and re-use existing workflows
– Cut down research time and overhead expenses
– Contribute to related initiatives such as LifeWatch and GEO BON
13. Ecological niche modeling of an invasive species
[Figure: species occurrence data alongside environmental layers: salinity, bottom temperature, ice concentration, primary production.]
14. Semi-automatized ecological niche modeling workflow
[Figure: workflow diagram. Stage labels: Create model, Model test, Model projection. Labelled steps:]
• High quality occurrence data set
• Select layers with environmental factors that are likely to influence the distribution of the species
• Select algorithm
• Select parameter values for the chosen algorithm
• Assemble the model on CRIA server
• Test the performance of the parameter in the model
• Test performance of the distribution prediction on the model
• Changing algorithm, parameter values, and set of layers
• Project Model with original layers
• Select prediction layers (e.g. 2050)
• Project Model with prediction layers
• Statistical analysis of the raster data
• Scientist’s PowerPoint workflow
– Used every day
• Came to Manchester
• Two days with a Taverna developer
– Not scalable!
• First iteration of workflow produced
17. Population Modelling
• Is the population growing or declining?
• What effect has exploitation or other stimulus had on the population?
• Which stage should be the focus of conservation?
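[Figure: life-cycle diagram with stages labelled S, J, V, G, D, built from two census years.]
Year 1
• Stage
• # flowers/fruits
• Other variable
Year 2
• Survival
• Stage 2
• # flowers/fruits 2
• # of seedlings recruited
• Other variables 2
Figure labels: survival, growth rate, fecundity, recruitment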
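A minimal sketch of how the first question (growing or declining?) might be answered from stage transitions. It assumes a stage-structured projection matrix, which the slides do not name as the method used; the two stages and all rates below are hypothetical illustration values, not BioVeL data. The growth rate is estimated by power iteration: a value above 1 suggests growth, below 1 decline.

```python
def growth_rate(matrix, iterations=200):
    """Estimate the long-run growth rate (dominant eigenvalue) of a
    non-negative projection matrix by repeated projection."""
    n = len(matrix)
    population = [1.0] * n  # arbitrary positive starting population
    rate = 0.0
    for _ in range(iterations):
        # Project the population one census interval forward.
        projected = [
            sum(matrix[i][j] * population[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(projected)
        if total == 0:
            return 0.0  # population has died out
        rate = total / sum(population)
        # Rescale so the numbers stay bounded between iterations.
        population = [x / total for x in projected]
    return rate


# Hypothetical two-stage example (juvenile, adult).
# Row = stage next year, column = stage this year.
example = [
    [0.50, 2.00],  # juveniles staying juvenile, recruits per adult
    [0.25, 0.50],  # juveniles maturing, adults surviving
]

rate = growth_rate(example)
print(f"growth rate: {rate:.3f}")  # prints "growth rate: 1.207"
```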
19. Simplifications for users
• Pre-cooked workflows
– In myExperiment
• Run from the Web
– Taverna Player
• Wire into familiar tools
– Spreadsheets
– Community portals, e.g. ViBRANT Scratchpads
• Packaging
– Taverna VM
20. Making it “too simple” for users!?
• Portal
– Can handle many users
– Makes it very easy to run workflows
• So we see lots of workflow runs!
– Which is GREAT!
• Taverna has big requirements
– BioVeL workflows are BIG
– High CPU/Memory
– Per running workflow
• Taverna becomes the bottleneck
21. Scale workflows: More Taverna!
• Scale and load-balance Taverna
– Now we can run loads more workflows
• Users are happy
• Service providers are NOT!
– Using services – Good
– Overloading services – Bad
[Figure note: please imagine loads of arrows here!]
22. Scale workflows: More services?
• We need to replicate services
– Bundle local to Taverna?
• But we don’t “own” all services
– Too big/complex for us to replicate? (Data)
– Closed source?
• BioVeL has (some) funds to help service providers
– Scale, redesign, re-engineer?
• Partnerships/MOUs
23. Data: Local services
• Data can be uploaded once
• It is:
– Within your firewall/DMZ/VPC
– Secure
– Easy to access by services
– In the right place at the right time
• Data can be read/written by services
– Quickly
– Without worrying about security
– At no cost (£)
24. Data: Remote services
• Data should be uploaded once
• It is:
– Within your firewall/DMZ/VPC
– Secure
• It is not:
– Easy to access by services
– In the right place at the right time
• To pass data between services it must be moved
– Need secure third-party access
– Bandwidth costs into and out of the Cloud
– Need “pass by reference”
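A minimal sketch of what "pass by reference" could look like: a service receives a small reference (location, checksum, size) and reads the data where it lives, instead of having the bytes shipped through the workflow engine. `DataRef`, `DataStore`, `count_records` and the `store://example` URI are hypothetical names for illustration; the in-memory store stands in for real shared cloud storage and leaves out access control.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    """What travels between services: a pointer, not the data itself."""
    uri: str
    sha256: str
    size_bytes: int


class DataStore:
    """In-memory stand-in for a shared store (hypothetical)."""

    def __init__(self, base_uri):
        self._base = base_uri.rstrip("/")
        self._blobs = {}

    def put(self, name, payload):
        """Upload once; hand back a reference that can be passed around."""
        uri = f"{self._base}/{name}"
        self._blobs[uri] = payload
        return DataRef(uri, hashlib.sha256(payload).hexdigest(), len(payload))

    def get(self, ref):
        """Resolve a reference, checking the data has not changed."""
        payload = self._blobs[ref.uri]
        if hashlib.sha256(payload).hexdigest() != ref.sha256:
            raise ValueError(f"checksum mismatch for {ref.uri}")
        return payload


def count_records(store, ref):
    """Hypothetical service: takes a reference and reads the data in place."""
    return len(store.get(ref).splitlines())


store = DataStore("store://example")
ref = store.put("occurrences.csv", b"a\nb\nc\n")
print(ref.uri, count_records(store, ref))
```

Only the small `DataRef` crosses service boundaries here, which is the property that keeps bandwidth costs down when services sit in different places.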
26. SNP annotation
Annotation task
• Location, Gene, Transcript
• Present in public databases (dbSNP, etc.)
• Frequency in, e.g., 1000 Genomes data
• Conservation data (cross species)
27. Infrastructure Requirements
• Execute analysis workflows
• Accessible to clinicians and genetic testers
• Cope with expanding demands on compute
• Provide a secure environment
• Collect provenance
29. Orchestrating workflows in the Cloud
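[Figure: flowchart. Boxes shown at the top: Input, Workflow, Data store. The diagram also shows status updates.]
• Running a workflow
– Find virtual machine for this workflow
– Is one running? If no: start one, wait until ready
– Is there space on it? If no: start one, wait until ready
– Run workflow
• Deleting a run
– Delete run
– Is this instance empty? If yes: terminate it. If no: done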
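The flowchart's decisions can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the project's implementation: `Orchestrator`, `Instance` and the fixed per-instance capacity are hypothetical, and starting or terminating an instance is reduced to adding or removing a list entry rather than calling a cloud API and waiting until the machine is ready.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Instance:
    """A virtual machine dedicated to one workflow (hypothetical model)."""
    instance_id: int
    workflow: str
    capacity: int
    runs: set = field(default_factory=set)

    def has_space(self):
        return len(self.runs) < self.capacity


class Orchestrator:
    def __init__(self, capacity_per_instance=2):
        self.capacity = capacity_per_instance
        self.instances = []
        self._ids = count(1)

    def _find_instance(self, workflow):
        # "Is one running?" and "Is there space on it?"
        for inst in self.instances:
            if inst.workflow == workflow and inst.has_space():
                return inst
        return None

    def _start_instance(self, workflow):
        # "Start one". A real system would also wait until it is ready.
        inst = Instance(next(self._ids), workflow, self.capacity)
        self.instances.append(inst)
        return inst

    def submit(self, workflow, run_id):
        """Place a run on a suitable instance and return that instance's id."""
        inst = self._find_instance(workflow) or self._start_instance(workflow)
        inst.runs.add(run_id)  # "Run workflow"
        return inst.instance_id

    def delete_run(self, run_id):
        """Remove a finished run; terminate the instance if it is now empty."""
        for inst in self.instances:
            if run_id in inst.runs:
                inst.runs.remove(run_id)
                if not inst.runs:
                    self.instances.remove(inst)  # "Terminate it"
                return
        raise KeyError(run_id)
```

With a capacity of two, a third run of the same workflow triggers a second instance, and deleting the last run on an instance removes it, so idle machines are not left running.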
30. The user’s view
• Curated set of workflows
– Designed, built and tested by domain experts
– Quality assurance tested (if appropriate)
• Workflows are presented as applications
– The workflows themselves are hidden
– Configured and run via a web interface
• All user data stored securely in the Cloud
– User separation
• Workflows as a Service
35. The user’s view
• “Science”, “Tools”, “Applications”, “Data”
– Not workflows
– Not infrastructure
• But they ALL have workflows
– On paper
– In PowerPoint
– In scripts
– Run “by hand”
– Too personal/specific – cannot share them
– “Works on my machine”
36. Workflow as a Service
• The workflow IS the service
– Users do not see the workflows
– Run restricted sets of Taverna workflows in the cloud
• Connects to other cloud-based resources – storage, tools, etc.
• Scale everything behind the scenes
– Users can tweak parameters, but not design their own
– Web portal access for scientists
– Data passed by reference instead of by file
– Pay as you go – cheap at the point of use
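One way to picture "tweak parameters, but not design their own" is a published workflow that exposes only a fixed set of bounded parameters. The sketch below is hypothetical throughout: `Parameter`, `PublishedWorkflow` and the example names are not part of Taverna or of the portals described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    """A single user-adjustable setting with a default and allowed range."""
    name: str
    default: float
    minimum: float
    maximum: float


@dataclass(frozen=True)
class PublishedWorkflow:
    """A curated workflow exposed as a service (hypothetical model)."""
    name: str
    parameters: tuple

    def configure(self, overrides=None):
        """Merge user overrides with defaults, rejecting anything that is
        out of range or not an exposed parameter."""
        overrides = dict(overrides or {})
        settings = {}
        for param in self.parameters:
            value = overrides.pop(param.name, param.default)
            if not param.minimum <= value <= param.maximum:
                raise ValueError(
                    f"{param.name}={value} outside "
                    f"[{param.minimum}, {param.maximum}]"
                )
            settings[param.name] = value
        if overrides:
            # Users cannot add settings the workflow's designers did not expose.
            raise KeyError(f"unknown parameter(s): {sorted(overrides)}")
        return settings


workflow = PublishedWorkflow(
    name="example-analysis",
    parameters=(Parameter("threshold", 0.5, 0.0, 1.0),),
)
print(workflow.configure({"threshold": 0.8}))
```

The workflow definition itself never leaves the server in this model; the web form only needs the parameter list.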
37. Supporting end-users
• Make it easy
– Automate workflows they are already using
– Don’t get in the way of the science
– Hide the infrastructure where possible
• But it is really hard
– So much has to be co-ordinated
– Scale everything
– Stay secure
38. Acknowledgements/Partners
• University of Manchester
• Cardiff University
• European Commission 7th Framework Programme
– 283359 - BioVeL
• Eagle Genomics
• Technology Strategy Board
– 100932 - Cloud Analytics for Life Sciences
• National Health Service
• Amazon Web Services
39. Thanks
• myGrid Team
– Carole Goble (PI)
– Shoaib Sufi
– Alan Williams
– Katy Wolstencroft
• CA4LS
– Abel Ureta-Vidal (PI)
– Mike Cornell
– Madhu Donapudi
– Helen Hulme
– Nick James
• BioVeL
– Alex Hardisty (PI)
– Renato De Giovanni
– Jonathan Giddy
– Norman Morrison
– Abraham Nieva de la Hidalga
– Matthias Obst
– Maria Paula Balcazar Vargas
– Elisabeth Paymal
– Hannu Saarenmaa
…and many, many more…