Taverna workflows in the Cloud
Robert Haines
University of Manchester
rhaines@manchester.ac.uk
Taverna* and workflows
*Other workflow systems are available
Taverna workflows
• Sophisticated analysis pipelines
• A set of services to analyze or
manage data (local or remote)
• Workflows run through the
workbench or via a server
• Automation of data flow
through services
• Control of service invocation
• Iteration over data sets
• Provenance collection
• Extensible and open source
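A minimal sketch of three of the ideas above (data flow through services, iteration over a data set, provenance collection). This is plain Python, not Taverna's API: the `Pipeline` class, the two "services" and the sample species names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ProvenanceRecord:
    service: str
    input_value: Any
    output_value: Any


@dataclass
class Pipeline:
    """A linear chain of services; each service is a plain callable here."""
    services: List[Callable[[Any], Any]]
    provenance: List[ProvenanceRecord] = field(default_factory=list)

    def run_one(self, value: Any) -> Any:
        # Data flow: the output of each service becomes the input of the next.
        for service in self.services:
            result = service(value)
            self.provenance.append(ProvenanceRecord(service.__name__, value, result))
            value = result
        return value

    def run(self, values: List[Any]) -> List[Any]:
        # Iteration over a data set: one pass through the chain per item.
        return [self.run_one(v) for v in values]


def strip_whitespace(name: str) -> str:
    return name.strip()


def to_binomial(name: str) -> str:
    # Hypothetical clean-up step: capitalise the genus, lower-case the rest.
    genus, _, rest = name.partition(" ")
    return f"{genus.capitalize()} {rest.lower()}".strip()


pipeline = Pipeline([strip_whitespace, to_binomial])
results = pipeline.run(["  mnemiopsis LEIDYI ", "crassostrea gigas"])
```

Each call is recorded in `pipeline.provenance`, so a result can be traced back through the services that produced it.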
Taverna Workbench
• Desktop application
• GUI
• Plug-in Framework
• Intermediate results
views
• Search for Web
Services in catalogues
• Search and publish to
myExperiment
Taverna Server family
• Taverna Server
– Multiple clients, Multi-user
– Local and large scale infrastructures
– Site Replication
• Taverna Server Amazon Image
– Local R server
– Multiple instances in the Amazon Cloud as required,
for multiple users/uses and different
security scenarios
• Taverna Virtual Machine
• Taverna Command Line
• Bundled Servers, Services and Tools
Users are not the same….
any one individual can be all of these
• Pro Makers: Technical Experts
– Rich power tools
– Control, flexibility, expressivity
• In the Field Users
– Re-modellers
• Simplified though limited tools
• Revise variants, tweaking
• Inspection and guidance
– Vanilla Users: Pre-cooked workflows
• Point and click / form fill / ambient configuration
• Web based / Bespoke / Embedded launch
Workbench
Lite
Taverna Tool Spectrum
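[Figure: spectrum of Taverna tools, from the technical computational scientist to the domain scientist]
• Tools shown: Workbench, Workbench Components, Lite, Player, Command Line, Domain-Specific Website / Tool
• Scales shown (high to low): workflow visibility, Taverna concept knowledge, domain knowledge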
The Taverna Suite of Tools
[Figure: components of the Taverna suite]
• Client User Interfaces
• User Interfaces
• Workflow Repository
• Service Catalogue
• Third Party Tools
• Web Portals
• Activity and Service Plug-in Manager
• Workflow Provenance
• Workflow Server
• Secure Service Access
• Credential Manager
• Workflow Engine
• Virtual Machine
• Prog & APIs
• Command Line
• Taverna Lite
• Player
• Taverna Workbench
• Freely available, open source
• Current version 2.4
• 80,000+ downloads across versions
• Part of the myGrid Toolkit
• Windows / Mac OS X / Linux / Unix
Katherine Wolstencroft, Robert Haines, Donal Fellows, Alan Williams, David Withers, Stuart Owen, Stian
Soiland-Reyes, Ian Dunlop, Aleksandra Nenadic, Paul Fisher, Jiten Bhagat, Khalid Belhajjame, Finn Bacall, Alex
Hardisty, Abraham Nieva de la Hidalga, Maria P. Balcazar Vargas, Shoaib Sufi, and Carole Goble:
“The Taverna workflow suite: designing and executing workflows of Web Services on the
desktop, Web or in the Cloud”, Nucleic Acids Res., May 2013. doi:10.1093/nar/gkt328
Taverna – www.taverna.org.uk
Workflows in the Cloud
Biodiversity Virtual e-Laboratory
Biodiversity Virtual e-Laboratory
• BioVeL is an international network of experts
– Connects two scientific communities: IT and
biodiversity
• “Pals” system
– Roughly a three-way split between:
• Biodiversity scientists, Biodiversity Informaticians, Computer
Scientists
– Shares expertise in workflow studies among BioVeL’s
users and friends
– Fosters an international community of researchers and
partners on biodiversity issues
Biodiversity Virtual e-Laboratory
• BioVeL users want to be able to:
– Import data from their own research and/or from existing
libraries
– Use workflows to process vast amounts of data
– Build their own workflows
– Access a library of workflows and re-use existing
workflows
– Cut down research time and overhead expenses
– Contribute to other initiatives, such as LifeWatch
and GEO BON
Ecological niche modeling of an
invasive species
[Figure: species occurrence data alongside environmental layers: salinity, bottom temperature, ice concentration, primary production]
Semi-automatized ecological niche
modeling workflow
• Scientist’s PowerPoint
workflow
– Used every day
• Came to
Manchester
• Two days with a
Taverna developer
– Not scalable!
• First iteration of
workflow produced
[Figure: the scientist’s workflow diagram. Recoverable labels:]
• High quality occurrence data set
• Select layers with environmental factors that are likely to influence the distribution of the species
• Create model
– Select algorithm
– Select parameter values for the chosen algorithm
– Assemble the model on CRIA server
• Model test
– Test the performance of the parameter in the model
– Test performance of the distribution prediction on the model
• Model projection
– Project model with original layers
– Select prediction layers (e.g. 2050)
– Project model with prediction layers
• Statistical analysis of the raster data
• Loop: changing algorithm, parameter values, and set of layers
Ecological niche modeling workflow
Scary!
Ecological niche modeling workflow
Better?
Population Modelling
• Is the population growing or declining?
• What effect has exploitation or other stimulus
had on the population?
• Which stage should be the focus of
conservation?
Year 1
• Stage
• # flowers/fruits
• Other variable
[Figure: diagram with stages labelled S, J, V, G, D]
Year 2
• Survival
• Stage 2
• # flowers/fruits 2
• # of seedlings recruited
• Other variables 2
SURVIVAL
GROWTH RATE
FECUNDITY
RECRUITMENT
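The first question above ("Is the population growing or declining?") is commonly answered with a stage-structured projection matrix built from survival, growth, fecundity and recruitment rates. The slides do not say which model the workflow uses, so the following is only a sketch under that assumption; the two-stage matrix and its numbers are made up for illustration.

```python
from typing import List

Matrix = List[List[float]]


def project(matrix: Matrix, population: List[float]) -> List[float]:
    """One time step: next[i] = sum over j of matrix[i][j] * population[j]."""
    return [sum(a * n for a, n in zip(row, population)) for row in matrix]


def growth_rate(matrix: Matrix, steps: int = 200) -> float:
    """Estimate the long-run growth rate (dominant eigenvalue) by power iteration."""
    population = [1.0 / len(matrix)] * len(matrix)  # stage proportions, sum to 1
    rate = 0.0
    for _ in range(steps):
        nxt = project(matrix, population)
        rate = sum(nxt) / sum(population)
        total = sum(nxt)
        population = [n / total for n in nxt]  # renormalise to proportions
    return rate


# Hypothetical two-stage example (juvenile, adult); the numbers are invented.
# Row 0: contributions to next year's juveniles; row 1: to next year's adults.
transitions = [
    [0.50, 2.00],  # juveniles staying juvenile, new juveniles per adult
    [0.25, 0.50],  # juveniles maturing, adults surviving
]

rate = growth_rate(transitions)
print("growing" if rate > 1.0 else "not growing", round(rate, 4))
```

For this matrix the dominant eigenvalue is (1 + √2)/2 ≈ 1.207, so the hypothetical population grows by roughly 21% per year. A rate below 1 would indicate decline.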
Population Modelling Workflow
Simplifications for users
• Pre-cooked workflows
– In myExperiment
• Run from the Web
– Taverna Player
• Wire into familiar tools
– Spreadsheets
– Community portals,
e.g. ViBRANT
Scratchpads
• Packaging
– Taverna VM
Making it “too simple” for users!?
• Portal
– Can handle many users
– Makes it very easy to run workflows
• So we see lots of workflow runs!
– Which is GREAT!
• Taverna has big requirements
– BioVeL workflows are BIG
– High CPU/Memory
– Per running workflow
• Taverna becomes the bottleneck
Scale workflows: More Taverna!
• Scale and load-balance Taverna
– Now we can run loads more
workflows
• Users are happy
• Service providers are NOT!
– Using services – Good
– Overloading services – Bad
*
* Please imagine loads of arrows here!
Scale workflows: More services?
• We need to replicate services
– Bundle local to Taverna?
• But we don’t “own” all services
– Too big/complex for us to
replicate? (Data)
– Closed source?
• BioVeL has (some) funds to
help service providers
– Scale, redesign, re-engineer?
• Partnerships/MOUs
Data: Local services
• Data can be uploaded once
• It is:
– Within your firewall/DMZ/VPC
– Secure
– Easy to access by services
– In the right place at the right time
• Data can be read/written by
services
– Quickly
– Without worrying about security
– At no cost (£)
Data: Remote services
• Data should be uploaded once
• It is:
– Within your firewall/DMZ/VPC
– Secure
• It is not:
– Easy to access by services
– In the right place at the right time
• To pass data between services it must
be moved
– Need secure third-party access
– Bandwidth costs in to and out of the
Cloud
– Need “pass by reference”
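A minimal sketch of "pass by reference": services exchange a small pointer to the data, and only the service that needs the bytes fetches them. The `DataRef` and `ObjectStore` names, the `store://` URI scheme and the sample file are assumptions for illustration; the in-memory store stands in for real cloud storage and does no access control.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class DataRef:
    """A small, serialisable pointer to data held in a shared store."""
    uri: str
    sha256: str
    size: int


class ObjectStore:
    """In-memory stand-in for cloud storage; a real store would be remote."""

    PREFIX = "store://"

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> DataRef:
        self._objects[key] = data
        return DataRef(self.PREFIX + key, hashlib.sha256(data).hexdigest(), len(data))

    def get(self, ref: DataRef) -> bytes:
        data = self._objects[ref.uri[len(self.PREFIX):]]
        if hashlib.sha256(data).hexdigest() != ref.sha256:
            raise ValueError("stored data does not match the reference")
        return data


def count_records(store: ObjectStore, ref: DataRef) -> int:
    # A downstream "service": it resolves the reference only when it needs the bytes.
    return len(store.get(ref).splitlines())


store = ObjectStore()
ref = store.put("run-1/occurrences.csv", b"id,lat,lon\n" * 10_000)

# The message handed from service to service is the reference, not the payload.
message = json.dumps(asdict(ref))
records = count_records(store, ref)
```

Here the payload is about 110 kB while the message passed between services stays well under 200 characters. The data itself moves only when a service actually reads it.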
Workflows in the Cloud
Cloud Analytics for Life Sciences
SNP annotation
Annotation task
• Location, Gene, Transcript
• Present in public databases, dbSNP, etc
• Frequency in e.g. 1000 genome data
• Conservation data (cross species)
Infrastructure Requirements
• Execute analysis workflows
• Accessible to clinicians and genetic testers
• Cope with expanding demands on compute
• Provide a secure environment
• Collect provenance
Architecture overview
[Figure: architecture diagram. Recoverable components:]
• Web interface (inputs, results)
• Storage (S3)
• Ensembl (MySQL)
• Cache (S3)
• Workflow engine orchestrator
• Common API over Taverna, e-Hive and other engines
• Three Taverna Server instances, each with its own application specific tools and Web Services (WS)
Annotations on the diagram:
• All user interaction via web interface
• User data stored in the Cloud
• Data for all tools and Web Services stored in the Cloud
• Unified access to different workflow engines with our common REST API
• Tools and Web Services for each workflow are installed together for easy replication
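One way to picture "unified access to different workflow engines" is an adapter per engine behind a single interface. The sketch below is not the project's REST API: the `WorkflowEngine`, `InMemoryEngine` and `CommonApi` names, the method signatures and the workflow name are all hypothetical, and the engines are in-memory stubs.

```python
from abc import ABC, abstractmethod
from typing import Dict


class WorkflowEngine(ABC):
    """Hypothetical common interface that each engine adapter implements."""

    @abstractmethod
    def submit(self, workflow: str, inputs: Dict[str, str]) -> str:
        """Start a run and return a run identifier."""

    @abstractmethod
    def status(self, run_id: str) -> str:
        """Return a coarse status such as 'running' or 'finished'."""


class InMemoryEngine(WorkflowEngine):
    """Stub adapter; a real one would talk to Taverna Server, e-Hive, etc."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._runs: Dict[str, str] = {}

    def submit(self, workflow: str, inputs: Dict[str, str]) -> str:
        run_id = f"{self.name}-{len(self._runs) + 1}"
        self._runs[run_id] = "running"
        return run_id

    def status(self, run_id: str) -> str:
        return self._runs[run_id]


class CommonApi:
    """Routes each workflow to the engine registered for it."""

    def __init__(self) -> None:
        self._engines: Dict[str, WorkflowEngine] = {}
        self._routes: Dict[str, str] = {}      # workflow name -> engine name
        self._run_engine: Dict[str, str] = {}  # run id -> engine name

    def register(self, engine_name: str, engine: WorkflowEngine) -> None:
        self._engines[engine_name] = engine

    def route(self, workflow: str, engine_name: str) -> None:
        self._routes[workflow] = engine_name

    def submit(self, workflow: str, inputs: Dict[str, str]) -> str:
        engine_name = self._routes[workflow]
        run_id = self._engines[engine_name].submit(workflow, inputs)
        self._run_engine[run_id] = engine_name
        return run_id

    def status(self, run_id: str) -> str:
        return self._engines[self._run_engine[run_id]].status(run_id)


api = CommonApi()
api.register("taverna", InMemoryEngine("taverna"))
api.register("ehive", InMemoryEngine("ehive"))
api.route("snp-annotation", "taverna")
run_id = api.submit("snp-annotation", {"variants": "store://inputs/sample.vcf"})
```

The web interface only ever calls `submit` and `status`, so adding another engine means writing one more adapter rather than changing the clients.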
Orchestrating workflows in the Cloud
[Figure: flowchart. Recoverable steps:]
• Input: workflow, data store
• Find virtual machine for this workflow
• Is one running?
– No: start one, wait until ready
– Yes: is there space on it?
• No: start one, wait until ready
• Yes: run workflow
• Run workflow (status updates)
• Done: delete run
• Is this instance empty?
– Yes: terminate it
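The decisions in the flowchart above can be sketched as a small in-memory simulation. It makes no cloud API calls: the `Instance` and `Orchestrator` classes, the per-instance capacity of two runs and the workflow name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Instance:
    workflow: str   # workflow this virtual machine is set up for
    capacity: int   # concurrent runs the virtual machine can hold
    runs: Set[str] = field(default_factory=set)

    def has_space(self) -> bool:
        return len(self.runs) < self.capacity


class Orchestrator:
    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self.instances: List[Instance] = []
        self._next_run = 0

    def _find_instance(self, workflow: str) -> Instance:
        # Is an instance for this workflow running, and is there space on it?
        for instance in self.instances:
            if instance.workflow == workflow and instance.has_space():
                return instance
        # Otherwise start one (a real system would then wait until it is ready).
        instance = Instance(workflow, self.capacity)
        self.instances.append(instance)
        return instance

    def run_workflow(self, workflow: str) -> str:
        instance = self._find_instance(workflow)
        self._next_run += 1
        run_id = f"run-{self._next_run}"
        instance.runs.add(run_id)
        return run_id

    def delete_run(self, run_id: str) -> None:
        for instance in self.instances:
            if run_id in instance.runs:
                instance.runs.remove(run_id)
                if not instance.runs:
                    # Is this instance empty? Then terminate it.
                    self.instances.remove(instance)
                return
        raise KeyError(run_id)


orchestrator = Orchestrator(capacity=2)
run_ids = [orchestrator.run_workflow("snp-annotation") for _ in range(3)]
instances_at_peak = len(orchestrator.instances)

for run_id in run_ids:
    orchestrator.delete_run(run_id)
instances_at_end = len(orchestrator.instances)
```

Three runs with a capacity of two need two instances at peak. Once every run has been deleted, no instances remain, which is what keeps pay-as-you-go costs tied to actual use.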
The user’s view
• Curated set of workflows
– Designed, built and tested by domain experts
– Quality assurance tested (if appropriate)
• Workflows are presented as applications
– The workflows themselves are hidden
– Configured and run via a web interface
• All user data stored securely in the Cloud
– User separation
• Workflows as a Service
Web interface: Getting started
Web interface: Creating a Run
Web interface: Checking run progress
Conclusions
The user’s view
• “Science”, “Tools”, “Applications”, “Data”
– Not workflows
– Not infrastructure
• But they ALL have workflows
– On paper
– In PowerPoint
– In scripts
– Run “by hand”
– Too personal/specific – cannot share them
– “Works on my machine”
Workflow as a Service
• The workflow IS the service
– Users do not see the Workflows
– Run restricted sets of Taverna workflows in the cloud
• Connects to other cloud based resources – storage, tools, etc.
• Scale everything behind the scenes
– Users can tweak parameters, but not design their own
– Web portal access for scientists
– Data passed by reference instead of by file
– Pay as you go – cheap at the point of use
Supporting end-users
• Make it easy
– Automate workflows they are already using
– Don’t get in the way of the science
– Hide the infrastructure where possible
• But it is really hard
– So much has to be co-ordinated
– Scale everything
– Stay secure
Acknowledgements/Partners
• University of Manchester
• Cardiff University
• European Commission 7th
Framework Programme
– 283359 - BioVeL
• Eagle Genomics
• Technology Strategy
Board
– 100932 - Cloud Analytics for
Life Sciences
• National Health Service
• Amazon Web Services
Thanks
• myGrid Team
– Carole Goble (PI)
– Shoaib Sufi
– Alan Williams
– Katy Wolstencroft
• CA4LS
– Abel Ureta-Vidal (PI)
– Mike Cornell
– Madhu Donapudi
– Helen Hulme
– Nick James
• BioVeL
– Alex Hardisty (PI)
– Renato De Giovanni
– Jonathan Giddy
– Norman Morrison
– Abraham Nieva de la Hidalga
– Matthias Obst
– Maria Paula Balcazar Vargas
– Elisabeth Paymal
– Hannu Saarenmaa
…and many, many more…
