2013-03-21 What can provenance do for me?


At the "Metagenomics, metagenetics and Phylogenetic workflows for Ocean Sampling Day" workshop
Max Planck Institute for Marine Microbiology, Bremen, Germany 2013-03-21

For PPTX source - download http://www.wf4ever-project.org/wiki/download/attachments/2064544/2013-03-21-OSD-Bremen-Stian-What+can+provenance+do+for+me.pptx

Published in: Technology

  1. 1. What can provenance do for me? Stian Soiland-Reyes, myGrid, University of Manchester. Ocean Sampling Day planning, Bremen, 2013-03-21. This work is licensed under a Creative Commons Attribution 3.0 Unported License.
  2. 2. Provenance of Stian Soiland-Reyes
     • Developer/researcher in myGrid team, School of Computer Science, University of Manchester since 2006
     • Involved with:
       • Taverna – scientific workflow system
       • myExperiment – sharing workflows and artefacts
       • Wf4Ever – digital preservation (of workflows and workflow runs)
       • W3C Provenance WG – standards for describing provenance
       • Open Annotation – standard for tracking who said what about something
     http://soiland-reyes.com/stian/work/
  3. 3. Overview
     What is provenance?
     • Attribution
     • Derivation
     • Activities
     • PROV model
     Aggregating and sharing
     Why you want provenance
  4. 4. What is provenance?
     • Attribution: who did it?
     • Abstraction levels: shallots, sign, photo or flickr page?
     • Activity: what happens to it?
     • Date and tool: when was it made? using what?
     • Derivation: how did it change?
     • Origin: where is it from?
     • Aggregation: what is it part of?
     • Annotations: what do others say about it?
     • Attributes: what is it?
     • Licensing: can I use it?
     Photo by Dr Stephen Dann, licensed under Creative Commons Attribution-ShareAlike 2.0 Generic: http://www.flickr.com/photos/stephendann/3375055368/
  5. 5. Attribution
     [Diagram: Data wasAttributedTo Alice; Alice actedOnBehalfOf The lab]
     • Who collected this sample? Who helped?
     • Which lab performed the sequencing?
     • Who did the data analysis?
     • Who curated the results?
     • Who produced the raw data this analysis is based on?
     • Who wrote the analysis workflow?
     Why do I need this?
     i. To be recognized for my work
     ii. Who should I give credits to?
     iii. Who should I complain to?
     iv. Can I trust them?
     v. Who should I make friends with?
     Roles: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...
     Agent types: Person, Organization, SoftwareAgent
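The attribution questions above can be answered mechanically once the statements are recorded. The following is only a minimal sketch in plain Python, assuming the statements are held as (subject, relation, object) triples; the relation names come from the slide, while the identifiers ("results", "alice", "the-lab") and both helper functions are invented for illustration.

```python
# Hypothetical attribution statements; only the relation names are from the slide.
statements = [
    ("results", "prov:wasAttributedTo", "alice"),
    ("results", "pav:curatedBy", "bob"),
    ("workflow", "pav:authoredBy", "carol"),
    ("alice", "prov:actedOnBehalfOf", "the-lab"),
]

# Relations that link a resource to an agent who deserves credit for it.
ATTRIBUTION_RELATIONS = {
    "prov:wasAttributedTo", "dct:creator", "pav:authoredBy",
    "pav:contributedBy", "pav:curatedBy", "pav:createdBy",
}


def agents_for(resource, statements):
    """Map each agent credited for `resource` to the relations that credit them."""
    agents = {}
    for subject, relation, obj in statements:
        if subject == resource and relation in ATTRIBUTION_RELATIONS:
            agents.setdefault(obj, set()).add(relation)
    return agents


def acted_on_behalf_of(agent, statements):
    """Follow prov:actedOnBehalfOf upwards and return the chain of responsible agents."""
    chain = []
    current = agent
    while True:
        superiors = [o for s, r, o in statements
                     if s == current and r == "prov:actedOnBehalfOf"]
        # Stop at the top of the chain, or if a cycle would repeat an agent.
        if not superiors or superiors[0] in chain:
            return chain
        chain.append(superiors[0])
        current = superiors[0]


print(agents_for("results", statements))
print(acted_on_behalf_of("alice", statements))
```

Here `agents_for` addresses "who should I give credits to?" and `acted_on_behalf_of` addresses "who should I complain to?"; a real system would more likely store these statements as RDF and query them there.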
  6. 6. Derivation
     [Diagram entities: Sample, Metagenome, Sequence, Old results, New results. Relations shown: wasDerivedFrom, wasQuotedFrom, wasInfluencedBy, wasRevisionOf]
     • Which sample was this metagenome sequenced from?
     • Which meta-genomes was this sequence extracted from?
     • Which sequence was the basis for the results?
     • What is the previous revision of the new results?
     Why do I need this?
     i. To verify consistency (did I use the correct sequence?)
     ii. To find the latest revision
     iii. To backtrack where a diversion appeared after a change
     iv. To credit work I depend on
     v. Auditing and defence for peer review
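Backtracking through derivations amounts to a graph walk. The sketch below is one possible illustration in plain Python, not the method used in the talk: the edges between the entities are assumptions (the slide names the entities and relations, but the exact arrows are not recoverable from the transcript), and the function names are invented. It relies on PROV-O defining prov:wasQuotedFrom and prov:wasRevisionOf as specialisations of prov:wasDerivedFrom.

```python
from collections import deque

# Relations treated as derivation steps.
DERIVATION_RELATIONS = {
    "prov:wasDerivedFrom", "prov:wasQuotedFrom", "prov:wasRevisionOf",
}

# Assumed edges between the entities named on the slide.
statements = [
    ("metagenome", "prov:wasDerivedFrom", "sample"),
    ("sequence", "prov:wasQuotedFrom", "metagenome"),
    ("old-results", "prov:wasDerivedFrom", "sequence"),
    ("new-results", "prov:wasRevisionOf", "old-results"),
]


def lineage(entity, statements):
    """Breadth-first walk back along derivation edges; nearest sources come first."""
    seen = []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for subject, relation, obj in statements:
            if (subject == current and relation in DERIVATION_RELATIONS
                    and obj not in seen):
                seen.append(obj)
                queue.append(obj)
    return seen


def latest_revision(entity, statements):
    """Follow prov:wasRevisionOf forwards until no newer revision is recorded."""
    newer = {obj: subject for subject, relation, obj in statements
             if relation == "prov:wasRevisionOf"}
    visited = {entity}
    while entity in newer and newer[entity] not in visited:
        entity = newer[entity]
        visited.add(entity)
    return entity


print(lineage("new-results", statements))
print(latest_revision("old-results", statements))
```

`lineage` lists everything a result ultimately depends on (useful for crediting and auditing), while `latest_revision` answers "is there something newer than what I am holding?".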
  7. 7. Activities
     [Diagram labels: Alice (hadRole: Lab technician), Sample, Sequencing (wasStartedAt "2012-06-21"), Metagenome, Workflow server, Workflow run, Workflow definition, Results. Relations shown: used, wasGeneratedBy, wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan]
     • What happened? When? Who?
     • What was used and generated?
     • Why was this workflow started?
     • Which workflow ran? Where?
     Why do I need this?
     i. To see which analysis was performed
     ii. To find out who did what
     iii. What was the metagenome used for?
     iv. To understand the whole process: “make me a Methods section”
     v. To track down inconsistencies
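The "make me a Methods section" idea can be sketched as a small report over activity records. This is an illustrative sketch only: the record layout, the field names and the workflow run date are assumptions, and only the sequencing date "2012-06-21" and the entity names come from the slide.

```python
# Hypothetical activity records; field names are invented for this sketch.
activities = [
    {
        "id": "workflow run",
        "started": "2012-07-02",          # assumed date, not from the slide
        "agent": "workflow server",
        "used": ["metagenome"],
        "generated": ["results"],
        "plan": "workflow definition",
    },
    {
        "id": "sequencing",
        "started": "2012-06-21",
        "agent": "Alice",
        "role": "lab technician",
        "used": ["sample"],
        "generated": ["metagenome"],
    },
]


def methods_section(activities):
    """Return one sentence per activity, ordered by start date."""
    lines = []
    for act in sorted(activities, key=lambda a: a["started"]):
        who = act["agent"]
        if "role" in act:
            who += f" ({act['role']})"
        line = (f"{act['started']}: {act['id']} by {who} "
                f"used {', '.join(act['used'])} "
                f"and generated {', '.join(act['generated'])}")
        if "plan" in act:
            line += f", following {act['plan']}"
        lines.append(line + ".")
    return lines


def used_by(entity, activities):
    """Answer 'what was this entity used for?' by listing the activities that used it."""
    return [act["id"] for act in activities if entity in act["used"]]


for sentence in methods_section(activities):
    print(sentence)
print(used_by("metagenome", activities))
```

ISO 8601 date strings sort correctly as plain strings, which is why no date parsing is needed here.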
  8. 8. Core PROV model
     [Diagram from the W3C Provenance Working Group. Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.]
     http://www.w3.org/TR/prov-primer/
  9. 9. How to find provenance data
     [Diagram: resource has_provenance provenance; resource has_query_service Provenance service]
     • Tracking provenance data
     • Querying provenance: “What was derived from X?”
     • Pingback of provenance: “Here’s new provenance data about X”
     Why do I need this?
     i. To propagate provenance data (e.g. when integrating data)
     ii. To include external provenance (e.g. for reference datasets)
     iii. To avoid black-box provenance (e.g. in workflows)
     iv. To merge provenance at different abstraction levels
     v. To see what has used the data (“Has someone done the analysis?”)
     http://www.w3.org/TR/prov-aq/
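One way PROV-AQ lets a client find provenance is through HTTP `Link` headers whose relation is `has_provenance` or `has_query_service` in the PROV namespace. The sketch below shows how such a header might be read; the parser is deliberately simplified (it is not a full Link header parser), and the example.org URLs and the function name are assumptions made for illustration.

```python
import re

PROV_NS = "http://www.w3.org/ns/prov#"


def provenance_links(link_header):
    """Group link targets by PROV relation name (e.g. 'has_provenance').

    Simplified: expects quoted parameter values and one relation per link.
    """
    links = {}
    # Each link is "<target>" followed by its parameters, up to the next "<".
    for target, params in re.findall(r'<([^>]*)>([^<]*)', link_header):
        attrs = dict(re.findall(r';\s*(\w+)\s*=\s*"([^"]*)"', params))
        rel = attrs.get("rel", "")
        if rel.startswith(PROV_NS):
            links.setdefault(rel[len(PROV_NS):], []).append(target)
    return links


# Hypothetical header value as a server might return it for a dataset.
example_header = (
    '<http://example.org/provenance/x>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"; '
    'anchor="http://example.org/data/x", '
    '<http://example.org/sparql>; '
    'rel="http://www.w3.org/ns/prov#has_query_service"; '
    'anchor="http://example.org/data/x"'
)

print(provenance_links(example_header))
```

A client would then fetch the `has_provenance` target to read the provenance record, or send a query such as "What was derived from X?" to the `has_query_service` target.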
  10. 10. Let’s talk about it
     [Diagram: Open Annotation Data Model. Copyright © 2012-2013 the Contributors to the Open Annotation Core Data Model Specification, published by the Open Annotation Community Group under the W3C Community Contributor License Agreement (CLA).]
     • The body is somewhat about or related to the target
     • Provenance: Who said that? When? Why?
     • E.g. describing, commenting, highlighting, bookmarking, tagging, classifying, identifying
     http://www.openannotation.org/spec/core/
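An annotation in this model ties a body to a target and carries its own provenance. The following sketch represents one annotation as a plain Python dictionary using property names from the Open Annotation core vocabulary (oa:hasBody, oa:hasTarget, oa:motivatedBy, oa:annotatedBy, oa:annotatedAt); the example.org URLs, the timestamp and the helper functions are hypothetical.

```python
# Hypothetical annotations; only the oa: property names come from the specification.
annotations = [
    {
        "@type": "oa:Annotation",
        "oa:hasBody": "http://example.org/comments/1",
        "oa:hasTarget": "http://example.org/results/table1",
        "oa:motivatedBy": "oa:commenting",
        "oa:annotatedBy": "http://example.org/people/alice",
        "oa:annotatedAt": "2013-03-21T10:00:00Z",
    },
    {
        "@type": "oa:Annotation",
        "oa:hasBody": "http://example.org/tags/metagenome",
        "oa:hasTarget": "http://example.org/data/sample42",
        "oa:motivatedBy": "oa:tagging",
        "oa:annotatedBy": "http://example.org/people/bob",
        "oa:annotatedAt": "2013-03-20T15:30:00Z",
    },
]


def annotation_provenance(annotation):
    """Answer 'who said that, when, and why?' for a single annotation."""
    return {
        "who": annotation.get("oa:annotatedBy"),
        "when": annotation.get("oa:annotatedAt"),
        "why": annotation.get("oa:motivatedBy"),
    }


def annotations_about(target, annotations):
    """Return every annotation whose target is the given resource."""
    return [a for a in annotations if a.get("oa:hasTarget") == target]


for found in annotations_about("http://example.org/results/table1", annotations):
    print(found["oa:hasBody"], annotation_provenance(found))
```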
  11. 11. Gathering everything
     • Research Objects (RO) aggregate related resources, their provenance and annotations
     • Conveys “everything you need to know” about a study/experiment/analysis/dataset/workflow
     • Shareable, evolvable, contributable, citable
     • ROs have their own provenance and lifecycles
     [Diagram: Research Object aggregates Hypothesis, Provenance, Raw data, Annotations, Workflow, Analysis tools, Results, Paper, Reference literature]
     http://purl.org/wf4ever/model
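To make the aggregation idea concrete, here is a minimal sketch of a Research Object as a plain Python dictionary: a list of aggregated resources plus annotations that point from a describing document (body) to the resource it describes (target). The layout, file names and the `undescribed` helper are assumptions for illustration and do not reproduce the Wf4Ever RO model's actual serialisation.

```python
# Hypothetical research object manifest; structure and file names are invented.
research_object = {
    "id": "http://example.org/ro/osd-analysis/",
    "aggregates": [
        "hypothesis.txt",
        "raw-data.fasta",
        "workflow.t2flow",
        "results.csv",
        "provenance.ttl",
    ],
    "annotations": [
        {"target": "workflow.t2flow", "body": "workflow-description.ttl"},
        {"target": "results.csv", "body": "provenance.ttl"},
    ],
}


def undescribed(ro):
    """List aggregated resources that no annotation describes.

    Resources that are themselves annotation bodies (descriptions of
    something else) are not reported.
    """
    targets = {a["target"] for a in ro["annotations"]}
    bodies = {a["body"] for a in ro["annotations"]}
    return [r for r in ro["aggregates"]
            if r not in targets and r not in bodies]


print(undescribed(research_object))
```

A check like this gives a rough indication of which parts of a study still lack the provenance or annotations needed before the object can convey "everything you need to know".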
  12. 12. Research Objects
     [Diagram: Research Object aggregates Hypothesis, Provenance, Raw data, Annotations, Workflow, Analysis tools, Results, Paper, Reference literature]
     Why do I need them?
     i. To share your research materials (RO as a social object)
     ii. To facilitate reproducibility and reuse of methods
     iii. To be recognized and cited (even for constituent resources)
     iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
  13. 13. myExperiment Research Objects
  14. 14. Why you want provenance
     i. To acknowledge sources you have based your work on
     ii. Receive credit when others use your work
     iii. Build trust (who did it?) and verify consistency (was it done correctly?)
     iv. To audit and defend for peer review
     v. Keep track of resources that change over time (versioning)
     vi. Investigate and compare data (where did that strange value come from?)
     vii. Gather everything you need for that Methods section
     viii. Facilitate reproducibility by tracking activities and their outcomes
     ix. To prevent decay by aggregating related resources and their descriptions
  15. 15. Thank you. Questions?
     Twitter: @soilandreyes
     Skype: soiland
     http://soiland-reyes.com/stian/work/
     http://www.wf4ever-project.org/