Taverna workflow management system (2010 11-30 Bath Workflow Tools)
Presentation of Taverna from UKOLN DevSci "Workflow Tools" event in Bath, 2010-11-30

PPTX version:

Published in: Technology

  • 1. Stian Soiland-Reyes, myGrid, School of Computer Science, University of Manchester, UK. UKOLN DevSci: Workflow Tools, Bath, 2010-11-30
  • 2. What is myGrid?
      • An e-Science collaboration since 2001
      • Not a grid!
      • Numerous partners involved:
          • University of Manchester
          • University of Southampton
          • University of Oxford
          • EMBL-EBI
      • Provides sustainable, production-quality software
      • Supported by OMII-UK, EPSRC and BBSRC
      • Mixture of developers, bioinformaticians and researchers
      • Software | Services | Content | Skills | Community
  • 3. Motivation
      • Challenge: bioinformatics
          • Large amounts of data
          • Many open questions
          • Numerous freely available public datasets and analysis tools
  • 4. Huge amounts of data
      • QTL regions: 100+ genes
      • Microarray: 1000+ genes
      • Next-gen sequencing: 10,000+ genes
      • How do I look at all the genes systematically?
  • 5. Manual approach
      • Search using public web sites and databases
          • PubMed
          • UniProt
          • EBI BioMart
      • Copy and paste to web tools for analysis
          • NCBI BLAST
          • EBI InterPro
      • Further processing locally
          • R
          • Perl
          • Python
  • 6. Manual: disadvantages
      • Scale of the analysis task overwhelms researchers (lots of data)
      • User bias and premature filtering of datasets (cherry-picking)
      • Hypothesis-driven approach to data analysis
      • Constant changes in data cause problems with re-analysis
      • Implicit methodologies (hyperlinking through web pages)
      • Error proliferation from any of the listed issues, notably human error
  • 7. Web services and workflows
      • Web services
          • Technology and standards for exposing code and data resources so that a remote third party can consume them programmatically
          • Description of how to interact with the service: parameters, documentation
      • Workflows
          • General technique for describing and executing a process
          • Describe what you want to do and which services to run
  • 8. Taverna workflows
      • A set of (local and remote) services to analyze or manage data
          • Nested workflows are also services
      • Data links connect services
          • e.g. output from service A is input to services B and C
      • Describes the desired dataflow instead of process coordination
      • Automatic iteration
      • List handling and control links can be customized
      • [Figure: workflow diagram "Get_pathways", showing workflow inputs, workflow outputs and the services linked between them]
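
The dataflow idea on slide 8 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Taverna's API: every name below (`iterate`, `cross_product`, `dot_product`, `normalise_id`, `label`, `id_length`, `run_workflow`) is hypothetical. The sketch shows a service that expects a single item being applied automatically to each element of a list, the output of one service feeding two others, and two possible ways to handle a pair of input lists.

```python
from typing import Any, Callable, Dict, List


def iterate(service: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item service to a value, mapping over (nested) lists."""
    if isinstance(value, list):
        return [iterate(service, item) for item in value]
    return service(value)


def cross_product(service: Callable[[Any, Any], Any],
                  xs: List[Any], ys: List[Any]) -> List[List[Any]]:
    """Call a two-input service on every combination of items."""
    return [[service(x, y) for y in ys] for x in xs]


def dot_product(service: Callable[[Any, Any], Any],
                xs: List[Any], ys: List[Any]) -> List[Any]:
    """Call a two-input service on items paired by position."""
    if len(xs) != len(ys):
        raise ValueError("pairwise iteration needs lists of equal length")
    return [service(x, y) for x, y in zip(xs, ys)]


# Three hypothetical single-item services (stand-ins for remote calls).
def normalise_id(gene_id: str) -> str:
    return gene_id.strip().upper()


def label(gene_id: str) -> str:
    return f"gene:{gene_id}"


def id_length(gene_id: str) -> int:
    return len(gene_id)


def run_workflow(gene_ids: List[str]) -> Dict[str, List[Any]]:
    """Service A (normalise_id) feeds both service B (label) and C (id_length)."""
    normalised = iterate(normalise_id, gene_ids)
    return {
        "labels": iterate(label, normalised),
        "lengths": iterate(id_length, normalised),
    }


if __name__ == "__main__":
    print(run_workflow([" brca1", "tp53 "]))
    print(cross_product(lambda x, y: x + y, ["a", "b"], ["1", "2"]))
    print(dot_product(lambda x, y: x + y, ["a", "b"], ["1", "2"]))
```

Note that `run_workflow` only states which outputs feed which inputs; the looping over list elements is handled by `iterate`, which is the sense in which the workflow describes dataflow rather than process coordination.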
  • 9. What types of services?
      • Public/private/secured WSDL/SOAP web services
      • RESTful web services
      • Spreadsheet import
      • Command line tools (local/ssh)
      • Inline scripts (Beanshell, R)
      • Java APIs
      • Customizations:
          • BioMart, BioMoby / SADI
          • Soaplab
          • Grid services (Globus, EGEE gLite, caGrid)
      • … your tool (plugin tutorial on wiki)
  • 10. Which services?
      • Taverna is general and can connect to standard web services for any domain
      • Bioinformatics:
          • From professional third-party organisations providing robust and open data/analysis services
          • To under-the-desk web services for one particular purpose, run by PhD students
      • 1730 services from 130 providers, crowd-sourced and quality-monitored
  • 11. [No slide text]
  • 12. Taverna workbench
      • Graphical desktop tool
      • No server installation required
      • Drag-and-drop services into diagram
      • Connect services, run, reconnect, rerun
      • Integrates a diverse set of tools
  • 13. [No slide text]
  • 14. [No slide text]
  • 15. [No slide text]
  • 16. Sharing workflows
      • Allows users to share, find, download and rate workflows
      • “Facebook for the scientist”
      • 3000 members, 1100 workflows
  • 17. Extensible UI and engine
      • Plugins can provide new “perspectives”
          • e.g. BioCatalogue, myExperiment
      • Provide service-specific customization
          • BioMart interface replicates web site
      • Adding new functionality
          • Looping, branching, dynamic service resolution
          • New service types
          • Design helpers, “Find matching service”
  • 18. Taverna 3 “Next-gen”
      • Under development for 2011
      • Interactive, component-centric and data-centric workflow design
      • Pre-packaged workflow components
      • Searching for workflow components from BioCatalogue and myExperiment
      • New myGrid workflow components library
  • 19. Taverna command line
      • Executes from Windows/Linux/OSX shells
      • Takes a predefined workflow with files as inputs and outputs
      • Quick way to “productionize” a workflow
  • 20. Taverna Server
      • REST/SOAP interface to execute workflows
      • Client libraries for Ruby and Java
      • Two demonstration web interfaces
          • Ruby
          • Java Portlets
      • Future
          • Detailed execution support and control
          • Security delegation
  • 21. Taverna portlet
      • Example portlet implementation
      • Executes workflows using Taverna Server
  • 22. [No slide text]
  • 23. Ruby web interface
      • Example customized web interface
      • Uses Ruby gem t2-server
  • 24. Taverna on the cloud
      • Use case:
          • SNP analysis and annotation of genome sequenced from breeds of cows in Africa: why are some of them resistant to X?
      • Amazon EC2 with Taverna Server and local services
      • Custom (built-in-a-week) Ruby on Rails web interface
      • Runs through 31 chromosomes in 6.5 hours using 10 instances, for $26
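
The figures on slide 24 can be broken down with some quick arithmetic. The sketch below assumes, which the slide does not state, that the $26 covers only compute time and that all 10 instances ran for the full 6.5 hours; the function name `cost_breakdown` is hypothetical.

```python
from typing import Dict


def cost_breakdown(total_cost_usd: float, instances: int,
                   hours: float, chromosomes: int) -> Dict[str, float]:
    """Derive per-unit costs from the totals quoted on the slide."""
    # Assumption: every instance is billed for the whole run.
    instance_hours = instances * hours
    return {
        "instance_hours": instance_hours,
        "usd_per_instance_hour": total_cost_usd / instance_hours,
        "usd_per_chromosome": total_cost_usd / chromosomes,
    }


# Totals quoted on the slide: $26, 10 instances, 6.5 hours, 31 chromosomes.
summary = cost_breakdown(total_cost_usd=26.0, instances=10,
                         hours=6.5, chromosomes=31)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the run amounts to 65 instance-hours, or about $0.40 per instance-hour and roughly $0.84 per chromosome.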
  • 25. [No slide text]
  • 26. Open source, open development
      • The Taverna suite of tools is open source and free to use
      • Large user community, active mailing lists
      • Lead developers: myGrid in Manchester
      • Contributors from across the world
      • PAL programme
      • myGrid provides training, tutorials and documentation
  • 27. Acknowledgements
  • 28. [No slide text]
  • 29. More information