Astronomy Use Case: A Repeater's Story
● Dealing with large amounts of tabular data
● Many small scripts, to avoid creating a black-box process
● Local resource sharing; public access only after publication
● Data must be frequently updated from external data repositories
● Data updates must be tested before being executed
● Data must be stored locally with versioning
● “... we don't like to spread [the tasks] and lose controls who is doing what ...”
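The two requirements "updates must be tested before being executed" and "stored locally with versioning" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not part of the project's tooling: the `VersionedTable` class, the check functions, and the column names `id` and `flux` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Table = List[Row]

# Hypothetical column names, chosen only for this example.
REQUIRED_COLUMNS = {"id", "flux"}


def has_rows(table: Table) -> bool:
    """Reject an update that would replace the data with an empty table."""
    return len(table) > 0


def has_required_columns(table: Table) -> bool:
    """Reject an update in which any row lacks a required column."""
    return all(REQUIRED_COLUMNS.issubset(row) for row in table)


@dataclass
class VersionedTable:
    """Keeps every accepted snapshot so earlier versions stay retrievable."""
    versions: List[Table] = field(default_factory=list)

    def latest(self) -> Table:
        """Return the most recently accepted snapshot, or an empty table."""
        return self.versions[-1] if self.versions else []

    def update(self, candidate: Table,
               checks: List[Callable[[Table], bool]]) -> bool:
        """Store `candidate` as a new version only if every check passes."""
        if all(check(candidate) for check in checks):
            # Copy the rows so later edits to `candidate` cannot alter history.
            self.versions.append([dict(row) for row in candidate])
            return True
        return False


store = VersionedTable()
checks = [has_rows, has_required_columns]
accepted = store.update([{"id": 1, "flux": 2.5}], checks)
rejected = store.update([{"id": 2}], checks)  # missing "flux", so not stored
```

A failed check leaves the stored history untouched, so a bad download from an external repository never overwrites a known-good local version.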
Research Objects
http://www.wf4ever-project.org
● Aggregation – Pointers or literals of internal and external content;
● Identity – Equivalence, equality;
● Metadata – A reusable object;
● Lifecycle – Stages of development. Impacts on available functionality;
● Versioning – Recording changes;
● Security – Access, authentication, ownership, trust;
● Graceful Degradation of Understanding – Opaque RO domain content.
● Mixed stewardship
● Provenance
ROs are Content Aware Objects
● Of compound objects that bundle things together
● Of evolutions
● Of dynamic objects and static objects
Biology Use Case: A Reuser's Story
● Takes a set of genes from gene experiment results performed by others, as read in a scientific paper
● Performs dry analysis to understand which genes and which biological processes were disturbed by which chemical compounds
  ● Basic Affymetrix data processing
  ● Statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)
  ● Find those pathways that are most prominent among the filtered genes
Biology Use Case: A Reuser's Story
● Searches for existing experiments on myExperiment (http://myexperiment.org)
● Challenge: Understand the workflow
  ● Perform test runs with test data and his own data
  ● Read others' logs
  ● Read annotations to workflows
● Reuses scripts from colleagues and performs tests that his colleagues are familiar with
How Can It Be Supported?
● A reference to the source of the data and the people to acknowledge for it
● The initial hypothesis
● The conceptual workflow or a summary of the experiment plan
● References to workflows that were tested, with comments on their application to the user's use case
● The user's workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)
● The runs of the user's own workflow, the results and the recorded steps that led to the results, in some cases with comments for later reference (e.g. "here I used parameter A, next time I may try B")
● The final hypothesis, with comments
● A reference to the results of the workflow
● Design logs that record the user's considerations while making the workflow
● Run logs that record the user's considerations while running and interpreting the workflow
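One way to picture the items above is as a single record that bundles them. The sketch below is an illustration only and is not the Wf4Ever Research Object model: the class names `ExperimentBundle` and `WorkflowRun`, every field name, and the sample values are assumptions made for this example.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class WorkflowRun:
    """One run of the user's workflow, with its recorded settings and notes."""
    workflow_version: str
    parameters: Dict[str, str]
    results_ref: str
    comments: List[str] = field(default_factory=list)


@dataclass
class ExperimentBundle:
    """Hypothetical container with one field per item listed on the slide."""
    data_source: str = ""
    acknowledgements: List[str] = field(default_factory=list)
    initial_hypothesis: str = ""
    conceptual_workflow: str = ""
    # Maps a reference to a tested workflow onto the user's comment about it.
    tested_workflows: Dict[str, str] = field(default_factory=dict)
    workflow_versions: List[str] = field(default_factory=list)
    runs: List[WorkflowRun] = field(default_factory=list)
    final_hypothesis: str = ""
    results_ref: str = ""
    design_log: List[str] = field(default_factory=list)
    run_log: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Names of the items that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values, invented for the example.
bundle = ExperimentBundle(
    data_source="reference to the source paper",
    initial_hypothesis="the compounds disturb specific pathways",
)
bundle.runs.append(
    WorkflowRun(
        workflow_version="v1",
        parameters={"parameter": "A"},
        results_ref="reference to run 1 results",
        comments=["here I used parameter A, next time I may try B"],
    )
)
still_missing = bundle.missing_items()
```

Keeping the run comments next to the parameters and results they refer to is the point of the bundle: a later reader sees what was tried and why in one place.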
Take home
● Provenance should be user-driven
● Linked Data should be a means to an end
● http://www.wf4ever-project.org
Acknowledgement
● Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)
● Carole Goble of University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)
● Hui Hua and Jenny Molly of University of Oxford (UK)