Taverna workflows: provenance and reproducibility - STFC/NERC workshop 2013

Slides on Taverna (www.taverna.org.uk) from the talk given at the STFC/NERC workshop "Workflow approaches to investigation of biological complexity", 15-16 October 2013.

  • Speaker notes
    • Notes for slide 1: Title: Time well spent: Workflows for Environmental Omic Analysis. The contextual analysis of Environmental Omics data is computationally intensive (involving the processing and management of large quantities of data), highly integrative (spanning data from many different disciplines) and rapidly evolving (involving the continuous development of novel methods and technologies). This poses a number of challenges for researchers in the field, including access to appropriate infrastructure, taking advantage of recent advancements and communicating research activities.
      Scientific Workflow Management Systems, such as the Taverna Workflow Suite, are a particular class of computer application that manage the design, configuration and execution of repetitive, multi-step analysis processes, which are particularly prevalent in Environmental Omics. The system handles the awkward work of accessing the different software and platforms, managing the data and security, handling errors and documenting the process.
      Utilising HPC or cloud installations of Taverna also means that tools and data sources do not need to be installed locally, which reduces local infrastructure and maintenance costs and enables rapid workflow development and testing. Consequently, large-scale analyses can be performed regardless of local infrastructure.
      The Taverna Workflow Suite currently powers the Biodiversity Virtual eLaboratory project (www.biovel.eu), which is beginning to release a number of useful Environmental Omic workflows in collaboration with Genomic Observatories (http://genomicobservatories.blogspot.co.uk/) and MicroB3 (Ocean Sampling Day, http://www.microb3.eu/news/new-axis-collaboration-biovel-workflows-micro-b3-ocean-sampling-day).
      This talk will discuss aspects of workflows and the benefits that adopting workflows as an integral part of Environmental Omics analysis can offer the community, including reproducibility, knowledge exchange and easier access to high-performance infrastructure.
    • Notes for slide 11: http://purl.org/wf4ever/model. Research Objects (ROs) aggregate related resources, their provenance and annotations. An RO conveys "everything you need to know" about a study, experiment, analysis, dataset or workflow. Shareable, evolvable, contributable and citable, ROs have their own provenance and lifecycles.
    • Notes for slide 12: Hosted resource: no installation tears. Self-hosting distribution: locality fears. Services and/or the workflow engine can be hosted locally or remotely. HPC/cloud installations avoid the cost of installing on local infrastructure, although some like the comfort of local ownership. Deployment infrastructure of BioVeL.
    • Notes for slide 14: 2001, run by Manchester and Oxford.
  • Transcript

    • 1. Taverna workflows: provenance and reproducibility. Aleksandra Pawlik, The University of Manchester. Workflow approaches to investigation of biological complexity, STFC/NERC Workshop, 15-16 October 2013
    • 2. Workflows for improvement. Workflows are more than just pipelines… Scaling up automated execution; bringing together distributed and continually changing resources; dealing with different standards, interfaces and implementations; support for repeatable analysis.
    • 3. Taverna Engine. Executes workflows in Scufl2: functional dataflow, simple control flows, implicit iteration; linking services and tools; different data resources and formats; "in-workflow programming" (e.g. Beanshell scripting); provenance collection (W3C PROV-O, OPM). Plug-in framework: infrastructures (Web Services (SOAP, REST), Grid, HPC); common tools (Excel spreadsheets, Google Refine, R); OAuth security plug-in.
    • 4. Taverna Workbench. Desktop application to construct and visualise workflows, browse the list of services and run workflows on the workflow engine; intermediate results views; plug-in framework; customisable for domains (e.g. expose only services for biodiversity).
    • 5. Taverna User Spectrum (axes: Taverna concept knowledge; workflow visibility, from high to low). Workflow Engineer: Workbench. Computational Scientist: Workbench, Components. Domain Scientist (Workflow User): Taverna Lite, domain-specific website / tool / portal, Player.
    • 6. Right apps, right users; reuse. Layers: Apps: domain/task-specific apps that incorporate (an ecosystem of) workflows. Workflow: parameterised, integrative, multi-step (data) pipelines, analytics and computational protocols that can be repetitively reused. WFMS: supports design, configuration and execution of workflows; manages utility actions for data, logging, security, compute and errors; shields incompatibilities and complexity. Middleware: integrates legacy, others' and your own software, datasets, services, codes and platforms. Infrastructure: optimises and manages use of computing infrastructure. Commodity apps: Web, spreadsheets, R; customisation; mixed workflow / scripting. Deployment / portability: web-based or desktop; virtualised deployments; cloud-hosted service; a cloud-enabled local host; local ownership; capability building.
    • 7. Reuse and Reproducibility
    • 8. ~6,000 members, over 300 groups, over 3,000 workflows
    • 9. Taverna Components: workflow blocks made of a workflow. Well described; well behaved; well looked after; agreed failure behaviour; agreed formats in and out; agreed provenance. Deposited in myExperiment and grouped into families.
    • 10. Provenance: how did you do it? The link between computation and results; reporting at different scales/levels; collecting and using provenance. [Figure: (i) Trace A and (ii) Trace B, two provenance traces from inputs d1 and d2 through services S0, S1, S2 and S4 to final outputs df and df'.] PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al., 2011]
    • 11. Research Objects (http://www.researchobject.org/, http://www.w3.org/community/rosc/): bundle and relate the digital resources of a scientific experiment or investigation using standard mechanisms.
    • 12. Taverna in Galaxy. [Diagram: a Taverna workflow is uploaded to Galaxy and wrapped as a Galaxy tool; execution takes place on a Taverna Server.]
    • 13. The Taverna Suite of Tools: workflow repository; service catalogue; user interfaces (Workbench, Taverna Lite, web portals / gateways, Player, third-party tools, command line); client user interfaces; workflow engine; workflow provenance; activity and service plug-in manager; workflow components; workflow server; interaction server; virtual machine; programmatic APIs.
    • 14. Sustainability and user support: freely available; open source; current version 2.4; 80,000+ downloads across versions; Windows / Mac OS X / Linux / Unix; tutorials and workshops; active user forum and support; www.taverna.org.uk
    • 15. Taverna in other projects: BioDiversity Virtual e-Laboratory (www.biovel.eu); SCAPE (www.scape-project.eu); Wf4Ever (www.wf4ever-project.org); VPH-Share (www.vph-share.eu); HELIO (www.helio-vo.eu); iPlant Collaborative (www.iplantcollaborative.org); Pacific Northwest National Laboratory (www.pnnl.gov); KBase (www.kbase.us); Scientific Workflows and Provenance Working Group (www.dataone.org); SHIWA (www.shiwa-workflow.eu)
    • 16. Products and methods. Data-centric computation: scientific workflows over distributed cyber-infrastructure. Data sharing: libraries and catalogues for all types of scientific artefacts and all types of scientists. Knowledge management: metadata, semantics, digital exchange, preservation, publishing. Software engineering: software sustainability, software and data policy, training.
    • 17. For more information: Taverna http://www.taverna.org.uk; myExperiment http://www.myexperiment.org; myGrid http://www.mygrid.org.uk
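
Slide 3 lists implicit iteration as a feature of the Taverna engine: when a service that expects a single value is given a list, the engine calls it once per list element instead of requiring an explicit loop. As we understand Taverna 2, several list inputs are combined by cross product by default, and a dot (pairwise) product can be configured instead. The Python sketch below illustrates the two strategies only; it is not Taverna code, and `cross_iterate`, `dot_iterate` and the `label` service are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def cross_iterate(fn: Callable[[A, B], R], xs: Sequence[A], ys: Sequence[B]) -> List[List[R]]:
    """Cross product: call fn on every (x, y) pair.

    The result gains one level of list nesting per iterated input,
    so len(xs) * len(ys) calls give a len(xs) x len(ys) nested list.
    """
    return [[fn(x, y) for y in ys] for x in xs]


def dot_iterate(fn: Callable[[A, B], R], xs: Sequence[A], ys: Sequence[B]) -> List[R]:
    """Dot product: pair the i-th element of xs with the i-th element of ys."""
    if len(xs) != len(ys):
        # Design choice of this sketch: refuse unequal lists rather than silently truncate.
        raise ValueError("dot product needs lists of equal length")
    return [fn(x, y) for x, y in zip(xs, ys)]


def label(sample: str, marker: str) -> str:
    """Hypothetical single-value 'service' standing in for a real tool or web service."""
    return f"{sample}:{marker}"


samples = ["s1", "s2"]
markers = ["16S", "18S"]
print(cross_iterate(label, samples, markers))  # [['s1:16S', 's1:18S'], ['s2:16S', 's2:18S']]
print(dot_iterate(label, samples, markers))    # ['s1:16S', 's2:18S']
```

With two samples and two markers, the cross product makes four calls and returns a 2 x 2 nested list, while the dot product makes two calls. Cross product suits independent inputs; dot product suits lists that are already aligned element by element, such as each sample paired with its own parameter.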

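Slide 10, together with slide 3's mention of W3C PROV-O, concerns collecting provenance and using it to explain why two runs of a workflow gave different results. The sketch below records each data item with the activity that generated it and the items that activity used (loosely the PROV-O relations `prov:wasGeneratedBy` and `prov:used`, flattened into a Python dict), then walks two such traces backwards from a final output to find where they first diverge. It is a simplified illustration in the spirit of PDIFF, not the algorithm of Woodman et al. (2011) and not Taverna's provenance API; `Derivation`, `divergence_points` and the example traces are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Derivation:
    """How one data item came to be (hypothetical, PROV-inspired record).

    value    -- the data item's content (compared with ==)
    activity -- the process that generated it, e.g. "S2:v1" (roughly prov:wasGeneratedBy)
    inputs   -- ids of the items that activity used (roughly prov:used)
    """
    value: object
    activity: str
    inputs: Tuple[str, ...] = ()


Trace = Dict[str, Derivation]  # data item id -> its derivation in one run


def divergence_points(a: Trace, b: Trace, target: str) -> List[str]:
    """Ids of items where runs a and b start to differ on the way to `target`.

    A differing item is reported when its generating activity changed, or when
    all of its inputs agree (so the difference arose at that step). Differing
    inputs are always followed further upstream.
    """
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [target]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        da, db = a.get(item), b.get(item)
        if da is None or db is None:
            found.add(item)  # item exists in only one of the two runs
            continue
        if da.value == db.value:
            continue  # values agree here, so nothing upstream of this item explains a difference
        differing_inputs = [
            i for i in set(da.inputs) | set(db.inputs)
            if i not in a or i not in b or a[i].value != b[i].value
        ]
        if da.activity != db.activity or not differing_inputs:
            found.add(item)
        stack.extend(differing_inputs)
    return sorted(found)


# Hypothetical example loosely following the figure on slide 10.
shared = {
    "d1": Derivation("ACGT", "input"),
    "z": Derivation("filtered", "S1:v1", ("d1",)),
}
trace_a: Trace = {
    **shared,
    "d2": Derivation(0.05, "input"),
    "y": Derivation(42, "S2:v1", ("z", "d2")),
    "df": Derivation("report-42", "S4:v1", ("y",)),
}
trace_b: Trace = {
    **shared,
    "d2": Derivation(0.01, "input"),            # a different parameter value
    "y": Derivation(37, "S2:v2", ("z", "d2")),  # and an upgraded service
    "df": Derivation("report-37", "S4:v1", ("y",)),
}
print(divergence_points(trace_a, trace_b, "df"))  # ['d2', 'y']
```

Here the final report differs because the parameter `d2` changed and the service producing `y` was upgraded; the unchanged branch through `d1` and `z` is not reported. Comparing values with `==` is a simplification: real traces of large data would more likely compare checksums or identifiers.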