This document discusses the need for infrastructure to preserve and share scientific workflows and data. It provides two use cases, one from astronomy and one from biology, to illustrate challenges around collaboration, data management and reproducibility. Research Objects are proposed as a way to bundle workflows and data with metadata, provenance and lifecycles. Linked data could help support collaborative science by linking research objects and enabling discovery. Ensuring user-driven provenance is also discussed as important for adoption.
2012-03-28 Wf4ever, preserving workflows as digital research objects
Stian Soiland-Reyes
Presented on 2012-03-28 at EGI Community Forum 2012, Munich.
http://www.wf4ever-project.org/
http://purl.org/wf4ever/model
http://cf2012.egi.eu/
https://www.egi.eu/indico/sessionDisplay.py?sessionId=66&confId=679#20120328
7. Astronomy Use Case: A Repeater's Story
● Dealing with large amounts of tabular data
● Many small scripts, to avoid creating a black-box process
● Local resource sharing; public access only after publication
● Data must be frequently updated from external data repositories
● Data updates must be tested before being executed
● Data must be stored locally with versioning
● “... we don't like to spread [the tasks] and lose controls who is doing what ...”
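The requirements above (test an update before it is executed, keep local versions) could be sketched roughly as follows. This is a minimal illustration, not the tooling used in the use case: `VersionedTable`, `try_update`, `has_required_columns` and the `id`/`flux` columns are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Table = List[Dict[str, object]]


@dataclass
class VersionedTable:
    """Local copy of an external table that keeps every accepted version."""
    versions: List[Table] = field(default_factory=list)

    def current(self) -> Table:
        # The newest accepted version, or an empty table if none exists yet.
        return self.versions[-1] if self.versions else []

    def try_update(self, candidate: Table, check: Callable[[Table], bool]) -> bool:
        """Store the candidate as a new version only if it passes the check."""
        if not check(candidate):
            return False
        # Copy the rows so later changes to the candidate do not alter history.
        self.versions.append([dict(row) for row in candidate])
        return True


def has_required_columns(table: Table) -> bool:
    # Hypothetical validation rule: non-empty, and every row has 'id' and 'flux'.
    return bool(table) and all({"id", "flux"}.issubset(row) for row in table)


store = VersionedTable()
ok = store.try_update([{"id": 1, "flux": 0.5}], has_required_columns)
rejected = store.try_update([{"id": 2}], has_required_columns)
```

Older versions stay in `store.versions`, so a rejected or later regretted update never overwrites data that earlier scripts relied on.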
8. Research Objects
http://www.wf4ever-project.org
● Aggregation – Pointers or literals of internal and external content
● Identity – Equivalence, equality
● Metadata – A reusable object
● Lifecycle – Stages of development; impacts on available functionality
● Versioning – Recording changes
● Security – Access, authentication, ownership, trust
● Graceful Degradation of Understanding – Opaque RO domain content
● Mixed stewardship
● Provenance
ROs are Content Aware Objects
● Of compound objects that bundle things together
● Of evolutions
● Of dynamic objects and static objects
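To make a few of these properties concrete (aggregation, metadata, lifecycle, versioning), a Research Object might be modelled along the following lines. This is only a sketch under assumptions: the class, its fields, the stage names and the file names are invented for illustration and are not the Wf4ever model itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    # Assumed lifecycle stages; the slide does not enumerate them.
    LIVE = "live"
    SNAPSHOT = "snapshot"
    ARCHIVED = "archived"


@dataclass
class ResearchObject:
    identifier: str                                         # identity
    aggregates: List[str] = field(default_factory=list)     # pointers to content
    metadata: Dict[str, str] = field(default_factory=dict)
    stage: Stage = Stage.LIVE                               # lifecycle
    history: List[str] = field(default_factory=list)        # versioning

    def aggregate(self, uri: str, note: str) -> None:
        # Lifecycle affects functionality: only a live object accepts changes.
        if self.stage is not Stage.LIVE:
            raise PermissionError("only a live research object can be changed")
        self.aggregates.append(uri)
        self.history.append(f"added {uri}: {note}")

    def snapshot(self, identifier: str) -> "ResearchObject":
        # A frozen copy with its own identity; the live object keeps evolving.
        return ResearchObject(identifier, list(self.aggregates),
                              dict(self.metadata), Stage.SNAPSHOT,
                              list(self.history))


ro = ResearchObject("ro-1", metadata={"title": "example"})
ro.aggregate("workflow.t2flow", "initial workflow")   # hypothetical file name
snap = ro.snapshot("ro-1-v1")
ro.aggregate("results.csv", "first run")              # hypothetical file name
```

The snapshot keeps only what was aggregated at the time it was taken, which is one way to read "dynamic objects and static objects".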
9. Biology Use Case: A Reuser's Story
● Takes a set of genes from the results of gene experiments performed by others, as read in a scientific paper
● Performs a 'dry' analysis to understand which genes and which biological processes were disturbed by which chemical compounds
  ● Basic Affymetrix data processing
  ● Statistical analysis to identify genes that are significantly differentially expressed under different conditions (with/without the compounds)
  ● Finding the pathways that are most prominent among the filtered genes
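The last two analysis steps could look roughly like the sketch below. The slides do not say which statistical test or tools were used, so Welch's t statistic is an assumption here, and the expression values, gene names, pathway labels and cutoff are all invented toy data. A real analysis would use p-values with multiple-testing correction rather than a fixed cutoff.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Invented toy values: gene -> (with compound, without compound).
expression = {
    "geneA": ([9.1, 9.3, 9.2], [5.0, 5.2, 5.1]),
    "geneB": ([6.0, 6.4, 5.8], [6.1, 5.9, 6.3]),
    "geneC": ([2.1, 2.0, 2.2], [4.9, 5.1, 5.0]),
}
# Invented gene-to-pathway annotation.
pathways = {"geneA": ["p1"], "geneB": ["p2"], "geneC": ["p1", "p3"]}

T_CUTOFF = 4.0  # arbitrary illustrative threshold, not a significance level

# Step 1: keep genes whose expression differs strongly between conditions.
selected = [gene for gene, (treated, control) in expression.items()
            if abs(welch_t(treated, control)) > T_CUTOFF]

# Step 2: count how often each pathway occurs among the selected genes.
pathway_counts = Counter(p for gene in selected for p in pathways[gene])
```

With this toy data, `geneA` and `geneC` pass the filter and pathway `p1` is the most frequent one among them.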
10. Biology Use Case: A Reuser's Story
● Search for existing experiments on myExperiment (http://myexperiment.org)
● Challenge: Understand the workflow
● Perform test runs with test data and his own data
● Read others' logs
● Read annotations to workflows
● Reuse scripts from colleagues and perform tests that his colleagues are familiar with
11. How Can It Be Supported?
● A reference to the source of the data and the people to acknowledge for it
● The initial hypothesis
● The conceptual workflow or a summary of the experiment plan
● References to workflows that were tested, with comments on their application to the user's use case
● The user's own workflow, possibly with a backlog of previous versions that the user wishes to keep for reference (with notes and comments)
● The runs of the user's own workflow, the results and the recorded steps that led to the results, in some cases with comments for later reference (e.g. 'here I used parameter A, next time I may try B')
● The final hypothesis, with comments
● A reference to the results of the workflow
● Design logs that record the user's considerations while making the workflow
● Run logs that record the user's considerations while running and interpreting the workflow
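One way to read the list above is as a checklist of parts that a research object for this use case should eventually contain. The sketch below checks a draft bundle against such a checklist. The key names and the example content are hypothetical, chosen to mirror the bullets, and are not a Wf4ever vocabulary.

```python
from typing import Dict, List

# Hypothetical keys, one per item in the list above.
EXPECTED_PARTS = [
    "data_source",
    "initial_hypothesis",
    "conceptual_workflow",
    "tested_workflows",
    "own_workflow",
    "workflow_runs",
    "final_hypothesis",
    "results",
    "design_logs",
    "run_logs",
]


def missing_parts(bundle: Dict[str, object]) -> List[str]:
    """Return the expected parts that are absent or empty in the bundle."""
    return [part for part in EXPECTED_PARTS if not bundle.get(part)]


# Invented example of a research object still under construction.
draft = {
    "data_source": "gene list read from a published paper",
    "initial_hypothesis": "compound X disturbs pathway Y",
    "own_workflow": ["v1", "v2"],
    "run_logs": [],
}
todo = missing_parts(draft)
```

Here `todo` lists everything still to be added, including `run_logs`, because an empty log counts as missing.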
15. Take home
● Provenance should be user-driven
● Linked Data should be a means to an end
● http://www.wf4ever-project.org
16. Acknowledgement
● Marco Roos of Leiden University (NL) and Jose Enrique Ruiz of Instituto de Astrofísica de Andalucía (Spain)
● Carole Goble of University of Manchester (UK) and Jose Manuel Gomez of iSOCO (Spain)
● Hui Hua and Jenny Molly of University of Oxford (UK)