2011 03-provenance-workshop-edingurgh

Linked Data + provenance requirements from #wf4ever is now online


  1. Provenance in the Dynamic, Collaborative New Science
     Dr Jun Zhao, Department of Zoology, University of Oxford
     jun.zhao@zoo.ox.ac.uk
  2. Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines
  3. Packaging, preserving and publishing
  4. Astronomy Use Case: A Repeater's Story
     ● Dealing with large amounts of tabular data
     ● A lot of small scripts, to avoid creating a black-box process
     ● Local resource sharing; public access only after publication
     ● Data must be frequently updated from external data repositories
     ● Data updates must be tested before being executed
     ● Data must be locally stored with versioning
     ● “... we don't like to spread [the tasks] and lose controls who is doing what ...”
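
The two storage requirements on slide 4 (updates tested before use, local storage with versioning) can be illustrated with a minimal Python sketch. Everything named here (`VersionedTable`, `propose_update`, `has_positions`, the `ra`/`dec` columns and the source label) is a hypothetical illustration, not part of Wf4Ever or of the astronomers' actual tooling.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VersionedTable:
    """Keeps every accepted version of a table (illustrative only)."""
    versions: List[dict] = field(default_factory=list)

    def propose_update(self, rows: List[dict], source: str,
                       check: Callable[[List[dict]], bool]) -> bool:
        """Store `rows` as a new version only if `check` accepts them."""
        if not check(rows):
            return False  # rejected: the previous version stays current
        digest = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
        self.versions.append({
            "version": len(self.versions) + 1,
            "source": source,      # where the update came from
            "sha256": digest,      # fingerprint of this version's content
            "rows": rows,
        })
        return True

    def current(self) -> dict:
        """Return the most recently accepted version."""
        return self.versions[-1]


def has_positions(rows: List[dict]) -> bool:
    """Hypothetical test: every row carries 'ra' and 'dec' columns."""
    return all("ra" in r and "dec" in r for r in rows)


store = VersionedTable()
ok = store.propose_update([{"ra": 10.68, "dec": 41.27}], "catalogue-A", has_positions)
bad = store.propose_update([{"ra": 10.68}], "catalogue-A", has_positions)
```

In this sketch the second update is rejected because a column is missing, so the first version remains the current one and older versions are never overwritten.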
  5. Research Objects (http://www.wf4ever-project.org)
     ● Aggregation: pointers or literals of internal and external content
     ● Identity: equivalence, equality
     ● Metadata: a reusable object
     ● Lifecycle: stages of development; impacts on available functionality
     ● Versioning: recording changes
     ● Security: access, authentication, ownership, trust
     ● Graceful Degradation of Understanding: opaque RO domain content
     ● Mixed stewardship
     ● Provenance
        ● Of compound objects that bundle things together
        ● Of evolutions
        ● Of dynamic objects and static objects
     ROs are Content Aware Objects
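
As a rough illustration of how a few of the properties on slide 5 (aggregation, metadata, lifecycle, versioning) might sit together in one object, here is a minimal Python sketch. The class, its field names, the lifecycle stage names and the `example.org` URIs are all assumptions made for illustration; this is not the Wf4Ever Research Object model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResearchObject:
    identifier: str
    lifecycle: str = "draft"                                  # stage of development
    resources: List[str] = field(default_factory=list)        # aggregation: URIs or local paths
    metadata: Dict[str, str] = field(default_factory=dict)    # descriptive annotations
    history: List[Tuple[int, str]] = field(default_factory=list)  # versioning: (number, note)

    def _record(self, note: str) -> None:
        """Append a numbered entry describing one change."""
        self.history.append((len(self.history) + 1, note))

    def aggregate(self, uri: str) -> None:
        """Add a pointer to internal or external content."""
        self.resources.append(uri)
        self._record(f"aggregated {uri}")

    def advance(self, stage: str) -> None:
        """Move the object to another lifecycle stage."""
        self.lifecycle = stage
        self._record(f"lifecycle -> {stage}")


ro = ResearchObject("http://example.org/ro/1", metadata={"creator": "a researcher"})
ro.aggregate("http://example.org/workflows/42")   # hypothetical workflow URI
ro.aggregate("data/expression.csv")               # hypothetical local file
ro.advance("published")
```

Each change is appended to `history` rather than applied silently, which is one simple way to keep the evolution of a compound object inspectable.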
  6. Biology Use Case: A Reuser's Story
     ● Takes a set of genes from the results of gene experiments performed by others, as read in a scientific paper
     ● Performs dry analysis to understand which genes and which biological processes were disturbed by which chemical compounds
        ● Basic Affymetrix data processing
        ● Statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)
        ● Find those pathways that are most prominent among the filtered genes
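
The statistical step on slide 6 can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the gene names and expression values are made up, Welch's t statistic stands in for whatever test the biologists actually used, and the cutoff is arbitrary. A real analysis would typically use p-values with multiple-testing correction.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with unequal variances.

    Assumes each sample has at least two values and nonzero spread.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def differentially_expressed(treated, control, t_cutoff=4.0):
    """Return genes whose |t| between conditions reaches the (arbitrary) cutoff."""
    hits = []
    for gene in treated:
        t = welch_t(treated[gene], control[gene])
        if abs(t) >= t_cutoff:
            hits.append(gene)
    return sorted(hits)


# Made-up expression values: with compound vs. without compound
treated = {"gene_a": [8.1, 8.3, 8.2], "gene_b": [6.0, 6.4, 5.8]}
control = {"gene_a": [5.0, 5.2, 5.1], "gene_b": [6.1, 5.9, 6.3]}

hits = differentially_expressed(treated, control)
```

With these toy values only `gene_a` passes the filter; the filtered list is what would then feed the pathway step.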
  7. Biology Use Case: A Reuser's Story
     ● Searches for existing experiments on myExperiment (http://myexperiment.org)
     ● Challenge: understand the workflow
        ● Perform test runs with test data and his own data
        ● Read others' logs
        ● Read annotations to workflows
     ● Reuses scripts from colleagues and performs tests that his colleagues are familiar with
  8. How Can It Be Supported?
     ● A reference to the source of the data and the people to acknowledge for it
     ● The initial hypothesis
     ● The conceptual workflow or a summary of the experiment plan
     ● References to workflows that were tested, with comments on their application to the user's use case
     ● The user's own workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)
     ● The runs of the user's own workflow, the results, and the recorded steps that led to the results, in some cases with comments for later reference (e.g. "here I used parameter A, next time I may try B")
     ● The final hypothesis, with comments
     ● A reference to the results of the workflow
     ● Design logs that record the user's considerations while making the workflow
     ● Run logs that record the user's considerations while running and interpreting the workflow
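
Several items on slide 8 (data source and acknowledgements, hypotheses, runs with parameters and comments) amount to a structured record kept alongside the workflow. The Python sketch below shows one possible shape for such a record. The class and field names, the `threshold` parameter and the `example.org` reference are hypothetical; the comment text echoes the slide's own example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Run:
    workflow_version: str
    parameters: Dict[str, str]
    result_ref: str                      # reference to where the results live
    comment: str = ""                    # the user's note for later reference
    started: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass
class ExperimentRecord:
    data_source: str                     # reference to the source of the data
    acknowledge: List[str]               # people to acknowledge for it
    initial_hypothesis: str
    final_hypothesis: str = ""
    runs: List[Run] = field(default_factory=list)

    def log_run(self, run: Run) -> None:
        """Append one run to the run log."""
        self.runs.append(run)

    def tried(self, name: str) -> List[str]:
        """List the values already tried for one parameter, in run order."""
        return [r.parameters[name] for r in self.runs if name in r.parameters]


record = ExperimentRecord(
    data_source="http://example.org/paper/123",   # hypothetical reference
    acknowledge=["original experimenters"],
    initial_hypothesis="compound X disturbs pathway Y",
)
record.log_run(Run("v1", {"threshold": "A"}, "results/run1",
                   comment="here I used parameter A, next time I may try B"))
record.log_run(Run("v2", {"threshold": "B"}, "results/run2"))
```

Calling `record.tried("threshold")` then shows which settings have been explored so far, which is the kind of question a run log is meant to answer.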
  9. Where is Linked Data?
  10. The Role of Linked Data in Wf4Ever
     ● Collaborative science
     ● Dynamic science
     ● Open science
  11. Provenance Challenge
     ● Identity
     ● Context
     ● Storage
     ● Retrieval
  12. Take home
     ● Provenance should be user-driven
     ● Linked Data should be a means to an end
     ● http://www.wf4ever-project.org
  13. Acknowledgement
     ● Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)
     ● Carole Goble of University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)
     ● Hui Hua and Jenny Molly of University of Oxford (UK)