Taverna workflow management system (2010 11-30 Bath Workflow Tools)


Presentation of Taverna from UKOLN DevSci "Workflow Tools" event in Bath, 2010-11-30

PPTX version:

Published in Technology
  • 1. Stian Soiland-Reyes, myGrid, School of Computer Science, University of Manchester, UK. UKOLN DevSci: Workflow Tools, Bath, 2010-11-30
  • 2. What is myGrid?
      • An e-Science collaboration since 2001
      • Not a grid!
      • Numerous partners involved:
          • University of Manchester
          • University of Southampton
          • University of Oxford
          • EMBL-EBI
      • Provides sustainable and production quality software
      • Supported by OMII-UK, EPSRC and BBSRC
      • Mixture of developers, bioinformaticians and researchers
      • Software | Services | Content | Skills | Community
  • 3. Motivation
      • Challenge: Bioinformatics
      • Large amounts of data
      • Many open questions
      • Numerous freely available public datasets and analysis tools
  • 4. Huge amounts of data
      • Microarray: 1000+ genes
      • QTL regions: 100+ genes
      • Next Gen Sequencing: 10,000+ genes
      • How do I look at all the genes systematically?
  • 5. Manual approach
      • Search using public web sites and databases
          • Pubmed
          • Uniprot
          • EBI BioMart
      • Copy and paste to web tools for analysis
          • NCBI Blast
          • EBI InterPro
      • Further processing locally
          • R
          • Perl
          • Python
  • 6. Manual: disadvantages
      • Scale of analysis task overwhelms researchers (lots of data)
      • User bias and premature filtering of datasets (cherry picking)
      • Hypothesis-driven approach to data analysis
      • Constant changes in data cause problems with re-analysis of data
      • Implicit methodologies (hyper-linking through web pages)
      • Error proliferation from any of the listed issues, notably human error
  • 7. Web services and workflows
      • Web services
          • Technology and standards for exposing code and data resources that can be programmatically consumed by a remote third party
          • Description of how to interact with the service, parameters, documentation
      • Workflows
          • General technique for describing and executing a process
          • Describe what you want to do, running which services
  • 8. Taverna workflows
      • A set of (local and remote) services to analyze or manage data
          • Nested workflows are also services
      • Data links connect services
          • i.e. output from service A is input to service B and C
      • Describes the desired dataflow instead of process coordination
      • Automatic iterations
      • Can customize list handling and control links
      • [Figure: example workflow diagram. Workflow inputs: chromosome_name, start_position, end_position. Workflow outputs: gene_descriptions, genes_pathways, merged_pathways, pathway_descriptions, pathway_ids, kegg_external_gene_reference, report, ensembl_database_release, kegg_pathway_release.]
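The dataflow ideas on this slide (data links, one output feeding several services, automatic iteration over lists, customizable list handling) can be illustrated with a minimal Python sketch. This is not Taverna code or a Taverna API: every function name and the `FAKE_PATHWAYS` table are hypothetical stand-ins, and the "services" are plain local functions rather than remote web services. The `cross` and `dot` helpers are a flattened simplification of cross-product and dot-product list handling.

```python
from itertools import product
from typing import Callable, Dict, List

# Made-up lookup table standing in for a remote pathway service.
FAKE_PATHWAYS: Dict[str, List[str]] = {
    "geneA": ["p1", "p2"],
    "geneB": ["p2"],
    "geneC": [],
}


def split_ids(text: str) -> List[str]:
    """Service A (hypothetical): split a comma-separated string into IDs."""
    return [part.strip() for part in text.split(",") if part.strip()]


def pathways_for_gene(gene_id: str) -> List[str]:
    """Service B (hypothetical): takes ONE gene ID, returns its pathways."""
    return FAKE_PATHWAYS.get(gene_id, [])


def iterate(service: Callable, values: List) -> List:
    """Automatic iteration: apply a single-item service to every list element."""
    return [service(value) for value in values]


def flatten(nested: List[List[str]]) -> List[str]:
    """Merge a list of lists into one flat list."""
    return [item for sub in nested for item in sub]


def remove_duplicates(items: List[str]) -> List[str]:
    """Drop repeated values while keeping first-seen order."""
    return list(dict.fromkeys(items))


def cross(service: Callable, xs: List, ys: List) -> List:
    """List handling, cross product: call the service on every combination."""
    return [service(x, y) for x, y in product(xs, ys)]


def dot(service: Callable, xs: List, ys: List) -> List:
    """List handling, dot product: call the service on matching positions."""
    return [service(x, y) for x, y in zip(xs, ys)]


def run_workflow(gene_text: str) -> Dict[str, object]:
    genes = split_ids(gene_text)  # output of service A
    # Data links: the same output feeds two downstream steps (B and C).
    pathways = remove_duplicates(flatten(iterate(pathways_for_gene, genes)))
    gene_count = len(genes)
    return {"genes": genes, "pathways": pathways, "gene_count": gene_count}


result = run_workflow("geneA, geneB, geneC")
print(result)
print(cross(lambda a, b: a + b, ["a", "b"], ["x", "y"]))
print(dot(lambda a, b: a + b, ["a", "b"], ["x", "y"]))
```

The point of the sketch is that `run_workflow` only states which outputs feed which inputs. The per-element looping lives in `iterate`, which is the role the slide assigns to automatic iteration.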
  • 9. What types of services?
      • Public/private/secured WSDL/SOAP web services
      • RESTful web services
      • Spreadsheet import
      • Command line tools (local/ssh)
      • Inline scripts (Beanshell, R)
      • Java APIs
      • Customizations:
          • BioMart, BioMoby / SADI
          • Soaplab
          • Grid services (Globus, EGEE gLite, caGrid)
          • … your tool (Plugin tutorial on wiki)
  • 10. Which services?
      • Taverna is general and can connect to standard web services for any domain
      • Bioinformatics:
          • From professional third-party organisations providing robust and open data/analysis services
          • To under-the-desk web services for one particular purpose, run by PhD students
      • 1730 services from 130 providers, crowd-sourced and quality-monitored
  • 12. Taverna workbench
      • Graphical desktop tool
      • No server installation required
      • Drag-and-drop services into diagram
      • Connect services, run, reconnect, rerun
      • Integrates diverse set of tools
  • 16. Sharing workflows
      • Allows users to share, find, download and rate workflows
      • “Facebook for the scientist”
      • 3000 members, 1100 workflows
  • 17. Extensible UI and engine
      • Plugins can provide new “perspectives”
          • e.g. BioCatalogue, myExperiment
      • Provide service-specific customization
          • BioMart interface replicates web site
      • Adding new functionality
          • Looping, branching, dynamic service resolution
          • New service types
          • Design helpers, “Find matching service”
  • 18. Taverna 3 “Next-gen”
      • Under development for 2011
      • Interactive, component-centric and data-centric workflow design
      • Pre-packaged workflow components
      • Searching for workflow components from BioCatalogue and myExperiment
      • New myGrid workflow components library
  • 19. Taverna command line
      • Executes from Windows/Linux/OSX shells
      • Takes a predefined workflow with files as inputs and outputs
      • Quick way to “productionize” a workflow
  • 20. Taverna Server
      • REST/SOAP interface to execute workflows
      • Client libraries for Ruby and Java
      • Two demonstration web interfaces
          • Ruby
          • Java Portlets
      • Future
          • Detailed execution support and control
          • Security delegation
  • 21. Taverna portlet
      • Example portlet implementation
      • Executes workflows using Taverna Server
  • 23. Ruby web interface
      • Example customized web interface
      • Uses Ruby gem t2-server
  • 24. Taverna on the cloud
      • Use-case:
          • SNP analysis and annotation of genomes sequenced from breeds of cows in Africa: why are some of them resistant to X?
      • Amazon EC2 with Taverna Server and local services
      • Custom (built-in-a-week) Ruby on Rails web interface
      • Runs through 31 chromosomes in 6.5 hours using 10 instances, for $26
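As a rough reading of the figures on this slide, the short Python sketch below breaks the $26 down per instance-hour and per chromosome. It assumes that $26 is the total bill and that all 10 instances ran for the full 6.5 hours. The slide states neither, so the derived rates are illustrative only, and the function name `cost_breakdown` is made up for this example.

```python
def cost_breakdown(total_cost_usd: float, instances: int,
                   hours: float, chromosomes: int) -> dict:
    """Derive simple unit costs from the slide's headline numbers.

    Assumes every instance ran for the whole duration (not stated on the slide).
    """
    instance_hours = instances * hours
    return {
        "instance_hours": instance_hours,
        "usd_per_instance_hour": total_cost_usd / instance_hours,
        "usd_per_chromosome": total_cost_usd / chromosomes,
    }


# Figures taken from the slide: $26, 10 instances, 6.5 hours, 31 chromosomes.
summary = cost_breakdown(total_cost_usd=26.0, instances=10,
                         hours=6.5, chromosomes=31)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the run used 65 instance-hours, which works out to about $0.40 per instance-hour and about $0.84 per chromosome.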
  • 26. Open source, open development
      • The Taverna suite of tools is all open source and free to use
      • Large user community, active mailing lists
      • Lead developers: myGrid in Manchester
      • Contributors from across the world
      • PAL programme
      • myGrid provides training, tutorials and documentation
  • 27. Acknowledgements
  • 29. More information