2013-05-29 Taverna Provenance (pptx source)


Slide deck presenting the provenance support of the Taverna workflow system, detailing the architecture, the ontologies, and how results are exported as Research Object bundles, including the PROV-O provenance of the workflow run.

This is the original PPTX version (PowerPoint 2013); for the PDF version, see http://www.slideshare.net/soilandreyes/20130529-taverna-provenance

Published in: Technology, Education

  • Provenance is captured in Taverna by plugging into the execution stack of processors (see "Taverna, reloaded"). While a workflow is running, data values and provenance traces are stored in an internal database. Provenance is captured for the workflow run (including a copy of the workflow definition), process iterations (start/stop), and parameter input/output bindings to value references.
  • PROV-O: the standard W3C ontology for provenance. We use it directly to record activity start/stop and the generation of values. wfprov: an extension of PROV-O for tracking workflow execution, parameter bindings and subprocesses. It relates execution to a higher-level view of the workflow structure (wfdesc). tavernaprov: an extension of wfprov for tracking Taverna-specific features, such as lists and error documents, and for embedding (not-so-large) byte content in RDF.
  • Within the Taverna Workbench, the internal provenance database is consulted to look up the intermediate input/output values of individual processor invocations. This is used for debugging and verification. Workflow runs within the Workbench can also be loaded from the database at a later stage.
  • Saving workflow results to a folder structure generates one file per value in the workflow outputs, named after the ports, with nested folders for list outputs. The provenance trace (in RDF, using the ontology stack) relates the output files to the execution and links to intermediate values (values passed between processors that did not make it to an output port).
  • The structure is similar to the folder export, but is now inside a ZIP file. It augments the previous structure by also including the workflow definition, the inputs used for execution, a description of the execution environment, external URI references (such as the project homepage) and attribution to the scientists who contributed to the bundle. This effectively forms a Research Object, all tied together by the RO Bundle manifest, which is in JSON-LD format (normal JSON that is also valid RDF).
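
The bundle layout described in the notes above can be sketched with Python's standard zipfile module. This is a minimal illustration under stated assumptions, not the Taverna implementation: the function name build_bundle and the manifest keys (@context, id, aggregates) are hypothetical choices for the example, and storing the mimetype entry first and uncompressed is a convention borrowed from similar ZIP-based container formats. The media type and the .ro/manifest.json path are taken from the slides.

```python
import io
import json
import zipfile

# Media type shown on the "Workflow results (bundle)" slide.
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(files):
    """Return the bytes of a ZIP archive laid out like an RO Bundle.

    `files` maps paths inside the bundle to their bytes content.
    The manifest keys below are illustrative assumptions, not a
    normative rendering of the RO Bundle manifest.
    """
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URL
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Write the media type first and uncompressed (assumed convention),
        # so that tools can identify the archive without inflating it.
        zf.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        # One entry per aggregated resource (outputs, trace, inputs, ...).
        for path, content in sorted(files.items()):
            zf.writestr(path, content)
        # The manifest ties the aggregated resources together.
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()
```

Calling build_bundle({"outputA.txt": b"...", "workflowrun.prov.ttl": b"..."}) returns bytes that can be written to a file. A real bundle would also carry the workflow definition, the inputs, attribution and the execution environment description.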

    1. TAVERNA PROVENANCE
       Stian Soiland-Reyes, University of Manchester
       https://github.com/wf4ever/taverna-prov
       This work is licensed under a Creative Commons Attribution 3.0 Unported License
       2013-05-29
    2. ARCHITECTURE
       [Diagram] Workflow execution; processor dispatch stack; layer injected by plugin; captures provenance trace
       Dispatch stack layers shown: Invoke, Retry, Failover, Loop, Error bounce, Provenance, Parallelise
       Provenance: Workflow, Workflow run, Process run (iteration), Parameter bindings
       Data: Lists, Values, References, Errors
       Processors shown: Process1 and Process2, with ports labelled A B C D E
       Reference: P Missier, S Soiland-Reyes, S Owen, W Tan, A Nenadic, I Dunlop, C Goble (2010, January). Taverna, reloaded. In Scientific and Statistical Database Management (pp. 471-481). Springer Berlin Heidelberg. DOI 10.1007/978-3-642-13818-8_33
    3. ONTOLOGY STACK
       • tavernaprov: Lists, errors, byte content, checksums (http://ns.taverna.org.uk/2012/tavernaprov/)
       • wfprov + wfdesc: Workflow execution, parameters, processes (http://purl.org/wf4ever/wfprov#)
       • PROV-O: Activity start/stop, generation of values (http://www.w3.org/ns/prov-o#)
    4. INTERMEDIATE RESULTS
       • Within the Taverna Workbench, the provenance database is used for showing intermediate results and previous runs
       [Screenshot] Clicking a processor; inputs and outputs of individual invocations
    5. WORKFLOW RESULTS (FOLDER)
       Folder structure:
         workflowrun.prov.ttl (RDF): provenance trace
         outputA.txt, outputC.jpg: workflow outputs, one file per value
         outputB/1.txt, outputB/2.txt, outputB/3.txt
         intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt: values from intermediate steps in workflow
    6. WORKFLOW RESULTS (BUNDLE)
       ZIP folder structure (RO Bundle), https://w3id.org/bundle, aggregating in Research Object:
         mimetype: application/vnd.wf4ever.robundle+zip
         .ro/manifest.json
         workflowrun.prov.ttl (RDF)
         outputA.txt, outputC.jpg
         outputB/1.txt, outputB/2.txt, outputB/3.txt
         intermediates/de/def2e58b-50e2-4949-9980-fd310166621a.txt
         inputA.txt
         workflow
         URI references, attribution, execution environment
    7. ACKNOWLEDGEMENTS
       • Paolo Missier – initial provenance engine for Taverna 2
       • Ian Dunlop – provenance capture execution layer
       • Khalid Belhajjame – ontologies
       • Alexandra Nenadic – intermediates, folder structure
       • W3C Provenance working group – PROV-O
       • Funded by European Commission’s 7th FWP FP7-ICT-2007-6270192 and EPSRC platform grant EP/G026238/1
    8. QUESTIONS?
       Twitter: @soilandreyes
       Skype: soiland
       Email: support@mygrid.org.uk
       http://soiland-reyes.com/stian/work/
       http://practicalprovenance.wordpress.com/
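
The ontology stack on slide 3 can be illustrated with a tiny provenance trace in Turtle. The sketch below builds one as a plain string using only the standard library. The function name describe_process_run and the example.org IRIs are hypothetical, and the wfprov names (WorkflowRun, ProcessRun, wasPartOfWorkflowRun) are used as recalled from the wfprov ontology, so they should be checked against it before reuse. PROV-O terms are written in the W3C term namespace http://www.w3.org/ns/prov#.

```python
from datetime import datetime, timezone

# Namespace IRIs. wfprov and tavernaprov are as listed on slide 3;
# PROV-O terms are defined in http://www.w3.org/ns/prov#.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "wfprov": "http://purl.org/wf4ever/wfprov#",
    "tavernaprov": "http://ns.taverna.org.uk/2012/tavernaprov/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def describe_process_run(run_iri, process_iri, output_iri, started, ended):
    """Return a small Turtle document for one process run (illustrative only)."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines += [
        "",
        # The workflow run as a whole (wfprov layer on top of PROV-O).
        f"<{run_iri}> a wfprov:WorkflowRun, prov:Activity .",
        "",
        # One process iteration, with start/stop recorded directly in PROV-O.
        f"<{process_iri}> a wfprov:ProcessRun, prov:Activity ;",
        f"    wfprov:wasPartOfWorkflowRun <{run_iri}> ;",
        f'    prov:startedAtTime "{started.isoformat()}"^^xsd:dateTime ;',
        f'    prov:endedAtTime "{ended.isoformat()}"^^xsd:dateTime .',
        "",
        # Generation of a value by that process iteration.
        f"<{output_iri}> a prov:Entity ;",
        f"    prov:wasGeneratedBy <{process_iri}> .",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    base = "http://example.org/run/1/"  # hypothetical run IRI
    print(describe_process_run(
        base,
        base + "process/1/",
        base + "outputA.txt",
        datetime(2013, 5, 29, 10, 0, 0, tzinfo=timezone.utc),
        datetime(2013, 5, 29, 10, 0, 5, tzinfo=timezone.utc),
    ))
```

Building the string by hand keeps the example free of dependencies; an RDF library would be the more robust choice for real traces, since it handles escaping and serialisation.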