At "Metagenomics, metagenetics and Phylogenetic workflows for Ocean Sampling Day" Workshop
Max Planck Institute for Marine Microbiology, Bremen, Germany 2013-03-21
For PPTX source - download http://www.wf4ever-project.org/wiki/download/attachments/2064544/2013-03-21-OSD-Bremen-Stian-What+can+provenance+do+for+me.pptx
2013-03-21 What can provenance do for me?
1. What can provenance do for me?
Stian Soiland-Reyes
myGrid, University of Manchester
This work is licensed under a Creative Commons Attribution 3.0 Unported License
Ocean Sampling Day planning, Bremen, 2013-03-21
2. Provenance of Stian Soiland-Reyes
• Developer/researcher in myGrid team, School of Computer Science, University of Manchester since 2006
• Involved with:
  • Taverna – scientific workflow system
  • myExperiment – sharing workflows and artefacts
  • Wf4Ever – digital preservation (of workflows and workflow runs)
  • W3C Provenance WG – standards for describing provenance
  • Open Annotation – standard for tracking who said what about something
http://soiland-reyes.com/stian/work/
3. Overview
What is provenance?
• Attribution
• Derivation
• Activities
• PROV model
Aggregating and sharing
Why you want provenance
4. What is provenance?
• Attribution: who did it?
• Abstraction levels: shallots, sign, photo or Flickr page?
• Activity: what happens to it?
• Date and tool: when was it made? using what?
• Derivation: how did it change?
• Origin: where is it from?
• Aggregation: what is it part of?
• Annotations: what do others say about it?
• Attributes: what is it?
• Licensing: can I use it?
By Dr Stephen Dann
licensed under Creative Commons Attribution-ShareAlike 2.0 Generic
http://www.flickr.com/photos/stephendann/3375055368/
5. Attribution
• Who collected this sample? Who helped?
• Which lab performed the sequencing?
• Who did the data analysis?
• Who curated the results?
• Who produced the raw data this analysis is based on?
• Who wrote the analysis workflow?
Diagram: Data wasAttributedTo Alice; Alice actedOnBehalfOf The lab
Why do I need this?
i. To be recognized for my work
ii. Who should I give credit to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
Roles: prov:wasAttributedTo, prov:actedOnBehalfOf, dct:creator, dct:publisher, pav:authoredBy, pav:contributedBy, pav:curatedBy, pav:createdBy, pav:importedBy, pav:providedBy, ...
Agent types: Person, Organization, SoftwareAgent
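As a rough illustration of the "who should I give credit to?" question, the sketch below stores attribution statements as plain triples and follows the delegation chain. The `ex:` identifiers and the function name `agents_to_credit` are hypothetical; only the relation names prov:wasAttributedTo and prov:actedOnBehalfOf come from the slide.

```python
# Minimal sketch: provenance statements as (subject, relation, object) triples.
# The ex: identifiers are made up for illustration.
statements = [
    ("ex:data", "prov:wasAttributedTo", "ex:alice"),
    ("ex:alice", "prov:actedOnBehalfOf", "ex:lab"),
]


def agents_to_credit(entity, statements):
    """Return the agents an entity is attributed to, followed by
    everyone those agents acted on behalf of (directly or indirectly)."""
    credited = [o for s, p, o in statements
                if s == entity and p == "prov:wasAttributedTo"]
    queue = list(credited)
    while queue:
        agent = queue.pop(0)
        for s, p, o in statements:
            # Follow delegation links, skipping agents already collected.
            if s == agent and p == "prov:actedOnBehalfOf" and o not in credited:
                credited.append(o)
                queue.append(o)
    return credited


print(agents_to_credit("ex:data", statements))
```

A real deployment would more likely keep these statements in an RDF store and query them; the list of tuples is only meant to show the shape of the data.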
6. Derivation
• Which sample was this metagenome sequenced from?
• Which metagenomes was this sequence extracted from?
• Which sequence was the basis for the results?
• What is the previous revision of the new results?
Diagram: Sample, Metagenome, Sequence, Old results and New results, linked by wasDerivedFrom, wasQuotedFrom, wasInfluencedBy and wasRevisionOf
Why do I need this?
i. To verify consistency (did I use the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion appeared after a change
iv. To credit work I depend on
v. Auditing and defence for peer review
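The backtracking idea can be sketched as a walk over derivation statements. The example below is illustrative only: the `ex:` identifiers and the exact edges between them are assumptions, and so is the choice to treat derivation as transitive (PROV itself does not define derivation as transitive). In PROV, wasQuotedFrom and wasRevisionOf are specialisations of wasDerivedFrom, which is why all three are followed.

```python
# Relations treated as "derived from" in this sketch.
DERIVATION_RELATIONS = {
    "prov:wasDerivedFrom",
    "prov:wasQuotedFrom",
    "prov:wasRevisionOf",
}

# Hypothetical statements; the edges are an assumed reading of the slide.
statements = [
    ("ex:metagenome", "prov:wasDerivedFrom", "ex:sample"),
    ("ex:sequence", "prov:wasQuotedFrom", "ex:metagenome"),
    ("ex:old-results", "prov:wasDerivedFrom", "ex:sequence"),
    ("ex:new-results", "prov:wasRevisionOf", "ex:old-results"),
]


def sources_of(entity, statements):
    """Walk derivation links backwards and return every upstream entity,
    nearest first."""
    found = []
    stack = [entity]
    while stack:
        current = stack.pop()
        for s, p, o in statements:
            if s == current and p in DERIVATION_RELATIONS and o not in found:
                found.append(o)
                stack.append(o)
    return found


print(sources_of("ex:new-results", statements))
```

Running the same walk from an older revision and comparing the two lists is one simple way to see where a change entered the chain.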
7. Activities
• What happened? When? Who?
• What was used and generated?
• Why was this workflow started?
• Which workflow ran? Where?
Diagram: Alice (hadRole Lab technician), Sample, Sequencing (wasStartedAt "2012-06-21"), Metagenome, Workflow server, Workflow run, Workflow definition and Results, linked by wasAssociatedWith, used, wasGeneratedBy, wasInformedBy, wasStartedBy and hadPlan
Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome used for?
iv. To understand the whole process (“make me a Methods section”)
v. To track down inconsistencies
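The "make me a Methods section" idea could look roughly like the sketch below, which turns activity records into sentences. The record layout, the function names and the particular links between activities, agents and entities are all assumptions made for illustration; only the labels (Sequencing, Sample, Metagenome and so on) are taken from the slide.

```python
# Hypothetical activity records; which activity used or generated what
# is an assumed reading of the slide's diagram.
activities = [
    {
        "label": "Sequencing",
        "agent": "Alice",
        "role": "lab technician",
        "used": ["Sample"],
        "generated": ["Metagenome"],
        "started": "2012-06-21",
    },
    {
        "label": "Workflow run",
        "agent": "Workflow server",
        "plan": "Workflow definition",
        "used": ["Metagenome"],
        "generated": ["Results"],
    },
]


def describe(activity):
    """Turn one activity record into a plain-English sentence."""
    sentence = f"{activity['label']} used {', '.join(activity['used'])}"
    sentence += f" and generated {', '.join(activity['generated'])}"
    sentence += f"; it was associated with {activity['agent']}"
    if "role" in activity:
        sentence += f" (role: {activity['role']})"
    if "plan" in activity:
        sentence += f", following the plan '{activity['plan']}'"
    if "started" in activity:
        sentence += f", started {activity['started']}"
    return sentence + "."


def methods_section(activities):
    """Join the per-activity sentences, one per line."""
    return "\n".join(describe(a) for a in activities)


print(methods_section(activities))
```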
9. How to find provenance data
• Tracking provenance data
• Querying provenance: “What was derived from X?”
• Pingback of provenance: “Here’s new provenance data about X”
Diagram: resource has_provenance provenance; resource has_query_service Provenance service
Why do I need this?
i. To propagate provenance data (e.g. when integrating data)
ii. To include external provenance (e.g. for reference datasets)
iii. To avoid black-box provenance (e.g. in workflows)
iv. To merge provenance at different abstraction levels
v. To see what has used the data (“Has someone done the analysis?”)
http://www.w3.org/TR/prov-aq/
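One way a resource can advertise its provenance under PROV-AQ is an HTTP Link header whose relation is the has_provenance or has_query_service URI. The sketch below parses such a header with the standard library only. The example.org URIs are made up, and the parser is deliberately simplified (it assumes every parameter value is double-quoted), so treat it as an illustration and not a complete Link header implementation.

```python
import re

HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"
HAS_QUERY_SERVICE = "http://www.w3.org/ns/prov#has_query_service"

# <target> followed by zero or more ; name="value" parameters.
LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_PATTERN = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def parse_links(header):
    """Split a Link header value into (target, params) pairs."""
    links = []
    for target, raw_params in LINK_PATTERN.findall(header):
        links.append((target, dict(PARAM_PATTERN.findall(raw_params))))
    return links


def provenance_links(header, rel=HAS_PROVENANCE):
    """Return the link targets whose rel parameter includes the given relation."""
    return [target for target, params in parse_links(header)
            if rel in params.get("rel", "").split()]


# Hypothetical header value, as it might accompany a response for http://example.org/x
header = (
    '<http://example.org/prov/x>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"; '
    'anchor="http://example.org/x", '
    '<http://example.org/sparql>; '
    'rel="http://www.w3.org/ns/prov#has_query_service"'
)

print(provenance_links(header))
print(provenance_links(header, HAS_QUERY_SERVICE))
```

In practice the header value would come from an HTTP response for the resource; it is hard-coded here so the example runs without network access.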
11. Gathering everything
• Research Objects (RO) aggregate related resources, their provenance and annotations
• Convey “everything you need to know” about a study/experiment/analysis/dataset/workflow
• Shareable, evolvable, contributable, citable
• ROs have their own provenance and lifecycles
Diagram: a Research Object aggregates Hypothesis, Raw data, Workflow, Analysis tools, Results, Paper, Reference literature, Provenance and Annotations
http://purl.org/wf4ever/model
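To make the aggregation concrete, the sketch below writes a small Turtle description of a Research Object that aggregates a few files. The file names, the RO URI and the helper `research_object_turtle` are hypothetical; the `ro:` and `ore:` prefixes are assumed to be the namespaces used by the Wf4Ever model linked above, so check them against that model before relying on them.

```python
# Assumed namespaces for the Wf4Ever RO vocabulary and OAI-ORE terms.
PREFIXES = {
    "ro": "http://purl.org/wf4ever/ro#",
    "ore": "http://www.openarchives.org/ore/terms/",
}


def research_object_turtle(ro_uri, resources):
    """Return Turtle text stating that ro_uri is a Research Object
    aggregating the given (non-empty) list of resource URIs."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{ro_uri}> a ro:ResearchObject ;")
    aggregated = " ,\n        ".join(f"<{r}>" for r in resources)
    lines.append(f"    ore:aggregates {aggregated} .")
    return "\n".join(lines)


# Hypothetical contents, loosely following the slide's diagram.
turtle = research_object_turtle(
    "http://example.org/ro/osd-analysis/",
    ["hypothesis.txt", "raw-data.fasta", "workflow.t2flow", "results.csv"],
)
print(turtle)
```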
12. Research Objects
Diagram: a Research Object aggregates Hypothesis, Raw data, Workflow, Analysis tools, Results, Paper, Reference literature, Provenance and Annotations
Why do I need them?
i. To share your research materials (RO as a social object)
ii. To facilitate reproducibility and reuse of methods
iii. To be recognized and cited (even for constituent resources)
iv. To preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)
14. Why you want provenance
i. To acknowledge sources you have based your work on
ii. To receive credit when others use your work
iii. To build trust (who did it?) and verify consistency (was it done correctly?)
iv. To audit and defend for peer review
v. To keep track of resources that change over time (versioning)
vi. To investigate and compare data (where did that strange value come from?)
vii. To gather everything you need for that Methods section
viii. To facilitate reproducibility by tracking activities and their outcomes
ix. To prevent decay by aggregating related resources and their descriptions
15. Thank you
Questions?
Twitter: @soilandreyes
Skype: soiland
http://soiland-reyes.com/stian/work/
http://www.wf4ever-project.org/