
2011-03 Provenance Workshop, Edinburgh

Linked Data + provenance requirements from #wf4ever is now online

  1. Provenance in the Dynamic, Collaborative New Science
     Dr Jun Zhao, Department of Zoology, University of Oxford
  2. Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines
  3. Packaging, preserving and publishing
  4. Astronomy Use Case: A Repeater's Story
     ● Dealing with large amounts of tabular data
     ● Many small scripts, to avoid creating black-box processes
     ● Local resource sharing; public access only after publication
     ● Data must be frequently updated from external data repositories
     ● Data updates must be tested before being executed
     ● Data must be stored locally, with versioning
     ● "... we don't like to spread [the tasks] and lose control of who is doing what ..."
  5. Research Objects http:/
     ● Aggregation – pointers or literals of internal and external content
     ● Identity – equivalence, equality
     ● Metadata – a reusable object
     ● Lifecycle – stages of development; impacts on available functionality
     ● Versioning – recording changes
     ● Security – access, authentication, ownership, trust
     ● Graceful Degradation of Understanding – opaque RO domain content
     ● Mixed stewardship
     ● Provenance: ROs are Content Aware Objects
       ● of compound objects that bundle things together
       ● of evolutions
       ● of dynamic objects and static objects
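The slide lists requirements rather than a data model. As a rough illustration only, the sketch below shows how three of them (aggregation, identity and versioning, together with the provenance of an object's evolution) might fit together. The classes `Resource`, `Version` and `ResearchObject` and all their fields are hypothetical and are not the Wf4Ever Research Object model; the assumption here is simply that every change creates a new snapshot that records which version it came from.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Resource:
    uri: str        # identity: every aggregated item is named by a URI
    internal: bool  # True if the RO holds the content, False if it only points to it


@dataclass
class Version:
    number: int
    resources: Dict[str, Resource]  # snapshot of the aggregation at this version
    note: str                       # why the change was made
    derived_from: Optional[int]     # provenance of the evolution: the previous version
    created: datetime


class ResearchObject:
    """Hypothetical minimal RO: an aggregation in which every change creates a new version."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.versions: List[Version] = [
            Version(0, {}, "created", None, datetime.now(timezone.utc))
        ]

    @property
    def current(self) -> Version:
        return self.versions[-1]

    def _commit(self, resources: Dict[str, Resource], note: str) -> None:
        # Older versions are never modified; a change appends a new snapshot.
        prev = self.current
        self.versions.append(
            Version(prev.number + 1, resources, note, prev.number, datetime.now(timezone.utc))
        )

    def aggregate(self, resource: Resource, note: str) -> None:
        resources = dict(self.current.resources)  # copy, so earlier snapshots stay intact
        resources[resource.uri] = resource
        self._commit(resources, note)

    def remove(self, uri: str, note: str) -> None:
        resources = dict(self.current.resources)
        del resources[uri]
        self._commit(resources, note)

    def history(self) -> List[str]:
        lines = []
        for v in self.versions:
            origin = "" if v.derived_from is None else f" (from v{v.derived_from})"
            lines.append(f"v{v.number}: {v.note}{origin}")
        return lines
```

For example, aggregating a local script and then a pointer to an external data file yields versions v1 and v2, and `history()` reads back as `v0: created`, `v1: add script (from v0)`, and so on. Keeping whole snapshots is simple but wasteful for large aggregations; a real system would more likely store deltas.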
  6. Biology Use Case: A Reuser's Story
     ● Takes a set of genes from the results of gene experiments performed by others, as reported in a scientific paper
     ● Performs a dry analysis to understand which genes and which biological processes were disturbed by which chemical compounds:
       ● basic Affymetrix data processing
       ● statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)
       ● finding the pathways that are most prominent among the filtered genes
  7. Biology Use Case: A Reuser's Story
     ● Searches for existing experiments on myExperiment
       ● Challenge: understanding the workflow
       ● Performs test runs with test data and with his own data
       ● Reads others' logs
       ● Reads annotations on workflows
     ● Reuses scripts from colleagues and performs tests that his colleagues are familiar with
  8. How Can It Be Supported?
     ● A reference to the source of the data and the people to acknowledge for it
     ● The initial hypothesis
     ● The conceptual workflow or a summary of the experiment plan
     ● References to workflows that were tested, with comments on their application to the user's use case
     ● The user's workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)
     ● The runs of the user's own workflow: results and the recorded steps that led to them, in some cases with comments for later reference (e.g. "here I used parameter A, next time I may try B")
     ● The final hypothesis, with comments
     ● A reference to the results of the workflow
     ● Design logs that record the user's considerations while building the workflow
     ● Run logs that record the user's considerations while running the workflow and interpreting its results
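Two of the items above, recording the steps that led to a result and noting parameter choices between runs, can be sketched as plain records. This is a minimal illustration under assumed names: `Step`, `Run`, `lineage` and `parameter_changes` are hypothetical, and the step and file names in the usage example (loosely modelled on the biology use case) are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


@dataclass
class Run:
    workflow_version: str
    parameters: Dict[str, str]
    steps: List[Step] = field(default_factory=list)    # in execution order
    comments: List[str] = field(default_factory=list)  # notes kept for later reference


def lineage(run: Run, artifact: str) -> List[str]:
    """Names of the steps that led to `artifact`, in execution order."""
    wanted = {artifact}
    used: List[str] = []
    # Walk backwards: a step matters if it produced something we still need to explain.
    for step in reversed(run.steps):
        if wanted.intersection(step.outputs):
            used.append(step.name)
            wanted.update(step.inputs)
    return list(reversed(used))


def parameter_changes(before: Run, after: Run) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Parameters whose values differ between two runs, as (before, after) pairs."""
    keys = set(before.parameters) | set(after.parameters)
    return {
        k: (before.parameters.get(k), after.parameters.get(k))
        for k in sorted(keys)
        if before.parameters.get(k) != after.parameters.get(k)
    }
```

For a run with the steps `normalise`, `qc`, `stats` and `pathways`, `lineage(run, "pathways.tsv")` returns `["normalise", "stats", "pathways"]` and skips the unrelated `qc` step; comparing that run with a rerun that changed `threshold` from `"A"` to `"B"` gives `{"threshold": ("A", "B")}`.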
  9. Where Is Linked Data?
  10. The Role of Linked Data in Wf4Ever
      ● Collaborative science
      ● Dynamic science
      ● Open science
  11. Provenance Challenge
      ● Identity
      ● Context
      ● Storage
      ● Retrieval
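One way to read the four challenges in Linked Data terms: every resource gets a URI (identity), each statement is tagged with the graph it was asserted in (context), statements live in a store (storage), and pattern queries answer questions such as "what was this result derived from?" (retrieval). The sketch below is a hypothetical in-memory quad store in plain Python, not an RDF library; the `ex:` predicate names are placeholders rather than terms from a published vocabulary.

```python
from typing import Iterator, List, NamedTuple, Optional, Set


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # context: the run or RO version that asserted the statement


class QuadStore:
    """Hypothetical in-memory store of provenance statements (storage)."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, s: str, p: str, o: str, g: str) -> None:
        self._quads.add(Quad(s, p, o, g))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None, g: Optional[str] = None) -> Iterator[Quad]:
        """Retrieval by pattern; None acts as a wildcard."""
        for q in self._quads:
            if ((s is None or q.subject == s) and (p is None or q.predicate == p)
                    and (o is None or q.obj == o) and (g is None or q.graph == g)):
                yield q

    def ancestors(self, uri: str, predicate: str = "ex:derivedFrom") -> List[str]:
        """All resources `uri` was (transitively) derived from, sorted; safe on cycles."""
        found: Set[str] = set()
        frontier = [uri]
        while frontier:
            current = frontier.pop()
            for q in self.match(s=current, p=predicate):
                if q.obj not in found and q.obj != uri:
                    found.add(q.obj)
                    frontier.append(q.obj)
        return sorted(found)
```

A real deployment would more likely use an RDF store with named graphs and SPARQL for retrieval; the plain-Python version only shows the shape of the data.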
  12. Take home
      ● Provenance should be user-driven
      ● Linked Data should be a means to an end
  13. Acknowledgement
      ● Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)
      ● Carole Goble of the University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)
      ● Hui Hua and Jenny Molly of the University of Oxford (UK)