2011 03-provenance-workshop-edingurgh

Presents provenance requirements from the EU Wf4Ever project at the Edinburgh provenance workshop, March 2011.


Provenance in the Dynamic, Collaborative New Science
Dr Jun Zhao, Department of Zoology, University of Oxford, [email_address]

Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines

Packaging, preserving and publishing

Astronomy Use Case: A Repeater's Story
  • Dealing with large amounts of tabular data
  • Many small scripts, to avoid creating a black-box process
  • Local resource sharing; public access only after publication
  • Data must be frequently updated from external data repositories
  • Data updates must be tested before being executed
  • Data must be stored locally with versioning
  • “... we don't like to spread [the tasks] and lose controls who is doing what ...”

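The update requirements in this use case (test incoming data before applying it, and keep every version locally) might be sketched as below. This is only an illustration under stated assumptions, not something taken from Wf4Ever: `VersionedStore`, `propose_update`, the `flux` column and the validation rule are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Table = List[Dict[str, float]]  # a tabular dataset as a list of rows


@dataclass
class VersionedStore:
    """Hypothetical local store that keeps every accepted version of a table."""
    versions: List[Table] = field(default_factory=list)

    def latest(self) -> Table:
        """Return the most recent accepted version, or an empty table."""
        return self.versions[-1] if self.versions else []

    def propose_update(self, new_table: Table, check: Callable[[Table], bool]) -> bool:
        """Test the incoming data first; store it as a new version only if it passes."""
        if not check(new_table):
            return False  # rejected: the local copy is left untouched
        self.versions.append([dict(row) for row in new_table])  # store a copy
        return True


def has_positive_flux(table: Table) -> bool:
    """Example validation rule (an assumption): every row has a positive 'flux'."""
    return all(row.get("flux", 0.0) > 0 for row in table)


store = VersionedStore()
accepted = store.propose_update([{"flux": 1.2}, {"flux": 0.7}], has_positive_flux)
rejected = store.propose_update([{"flux": -3.0}], has_positive_flux)
```

Here the second update fails the check, so the store still holds exactly one version. Keeping the check as a plain function passed in by the caller leaves control of what counts as a valid update with the scientist.
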
Research Objects
  • Aggregation – pointers or literals of internal and external content
  • Identity – equivalence, equality
  • Metadata – a reusable object
  • Lifecycle – stages of development; impacts on available functionality
  • Versioning – recording changes
  • Security – access, authentication, ownership, trust
  • Graceful degradation of understanding – opaque RO domain content
  • Mixed stewardship
  • Provenance
      • Of compound objects
      • Of evolutions
      • Of dynamic objects and static objects
ROs are content-aware objects that bundle things together
http://www.wf4ever-project.org

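As a rough illustration of how aggregation, metadata, lifecycle, versioning and provenance of evolution could sit together in one structure, here is a minimal sketch. It is not the Wf4Ever Research Object model; the class, its field names, the file names and the `example.org` URL are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResearchObject:
    """Hypothetical bundle of pointers to content, plus metadata and a version link."""
    identifier: str
    resources: List[str] = field(default_factory=list)      # aggregation: URIs or local paths
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive annotations
    stage: str = "draft"                                    # lifecycle stage
    derived_from: Optional[str] = None                      # identifier of the previous version

    def new_version(self, identifier: str, extra_resources: List[str]) -> "ResearchObject":
        """Create a successor that records which object it evolved from."""
        return ResearchObject(
            identifier=identifier,
            resources=self.resources + list(extra_resources),
            metadata=dict(self.metadata),
            stage=self.stage,
            derived_from=self.identifier,
        )


def history(ro: ResearchObject, registry: Dict[str, ResearchObject]) -> List[str]:
    """Walk the derived_from links back to the first version."""
    chain = [ro.identifier]
    while ro.derived_from is not None:
        ro = registry[ro.derived_from]
        chain.append(ro.identifier)
    return chain


v1 = ResearchObject("ro-1", ["workflow.xml", "http://example.org/data.csv"],
                    {"creator": "example user"})
v2 = v1.new_version("ro-2", ["results.csv"])
registry = {ro.identifier: ro for ro in (v1, v2)}
lineage = history(v2, registry)
```

The earlier version is left unchanged when a new one is created, so the chain of `derived_from` links is a simple record of how the object evolved.
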
Biology Use Case: A Reuser's Story
  • Takes a set of genes from the results of gene experiments performed by others, as reported in a scientific paper
  • Performs 'dry' analysis to understand which genes and which biological processes were disturbed by which chemical compounds
      • Basic Affymetrix data processing
      • Statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)
      • Finding the pathways that are most prominent among the filtered genes

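The last two analysis steps (filtering differentially expressed genes, then ranking pathways among them) can be illustrated with a toy sketch. Everything here is assumed for the example: the gene and pathway names and the expression values are made up, the array pre-processing step is skipped, and a bare threshold on Welch's t statistic stands in for the p-values and multiple-testing correction a real analysis would use.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Made-up expression values: gene -> (without compound, with compound)
expression = {
    "geneA": ([5.0, 5.2, 4.9], [8.1, 7.9, 8.3]),
    "geneB": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.8]),
    "geneC": ([3.0, 3.1, 2.9], [1.0, 1.2, 0.9]),
}

# Made-up pathway membership
pathways = {
    "geneA": ["pathway1"],
    "geneB": ["pathway2"],
    "geneC": ["pathway1", "pathway2"],
}

THRESHOLD = 4.0  # illustrative cut-off on |t|, not a recommended value

# Step 1: keep genes whose expression differs strongly between conditions
selected = [
    gene
    for gene, (control, treated) in expression.items()
    if abs(welch_t(treated, control)) > THRESHOLD
]

# Step 2: count how often each pathway occurs among the selected genes
pathway_counts = Counter(p for gene in selected for p in pathways[gene])
ranked = pathway_counts.most_common()
```

With these toy values, `geneA` and `geneC` pass the filter and `pathway1` ranks first because both selected genes belong to it.
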
Biology Use Case: A Reuser's Story
  • Searches for existing experiments on myExperiment (http://myexperiment.org)
  • Challenge: understanding the workflow
      • Perform test runs with test data and his own data
      • Read others' logs
      • Read annotations to workflows
  • Reuses scripts from colleagues and performs tests that his colleagues are familiar with

How Can It Be Supported?
  • A reference to the source of the data and the people to acknowledge for it
  • The initial hypothesis
  • The conceptual workflow or a summary of the experiment plan
  • References to workflows that were tested, with comments on their applicability to the user's use case
  • The user's own workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)
  • The runs of the user's own workflow, the results, and the recorded steps that led to the results, in some cases with comments for later reference (e.g. 'here I used parameter A, next time I may try B')
  • The final hypothesis, with comments
  • A reference to the results of the workflow
  • Design logs that record the user's considerations while making the workflow
  • Run logs that record the user's considerations while running and interpreting the workflow

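Several of the items above (data source, hypotheses, runs with parameters, results and comments) could be captured in a simple serialisable log. The sketch below is one possible shape for such a log, not a format defined by the project; `ExperimentLog`, `RunRecord`, their fields and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class RunRecord:
    """One execution of the user's workflow, with free-text notes."""
    workflow_version: str
    parameters: Dict[str, str]
    result_ref: str                       # a reference to the results, not the data itself
    notes: List[str] = field(default_factory=list)


@dataclass
class ExperimentLog:
    """Hypothetical container for the elements a reuser wants to keep."""
    data_source: str
    initial_hypothesis: str
    runs: List[RunRecord] = field(default_factory=list)
    final_hypothesis: str = ""

    def record_run(self, run: RunRecord) -> None:
        self.runs.append(run)

    def to_json(self) -> str:
        """Serialise the whole log so it can be stored alongside the workflow."""
        return json.dumps(asdict(self), indent=2)


log = ExperimentLog(
    data_source="gene list taken from a published paper (placeholder)",
    initial_hypothesis="compound X disturbs pathway Y (placeholder)",
)
log.record_run(RunRecord(
    workflow_version="v1",
    parameters={"threshold": "A"},
    result_ref="results/run-001.csv",
    notes=["here I used parameter A, next time I may try B"],
))
restored = json.loads(log.to_json())
```

Storing notes next to the parameters of each run keeps the user's reasoning attached to the result it explains.
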
Where is Linked Data?

The Role of Linked Data in Wf4Ever
  • Collaborative science
  • Dynamic science
  • Open science

Provenance Challenge
  • Identity
  • Context
  • Storage
  • Retrieval

Take home
  • Provenance should be user-driven
  • Linked Data should be a means to an end
  • http://www.wf4ever-project.org

Acknowledgement
  • Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)
  • Carole Goble of University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)
  • Hui Hua and Jenny Molly of University of Oxford (UK)
