Is preserving data enough? Towards the preservation of scientific methods

In recent years there have been many efforts towards the preservation of data produced by scientific research. Projects like the Virtual Observatory and journals like PLOS ONE, Geoscience Data Journal and Ecological Archives accept datasets that support, or were produced in, scientific publications. Other efforts, like Figshare, allow citing data from unpublished research and research in progress, so that authors are acknowledged and their work becomes easier to share. At the same time, many of the challenges associated with the preservation and sharing of data have been discussed in international initiatives like the Research Data Alliance, which, through its working and interest groups, aims to identify requirements and propose reference solutions for tasks such as data citation and the provision of adequate e-infrastructure for repositories.
However, data per se is often of little value without proper descriptive metadata, its provenance and the software used to create it. In fact, scientists are becoming increasingly concerned about preserving the software and methods used to deliver a particular scientific result. Reproducibility and inspectability are crucial for enabling the interpretation and reuse of a given dataset. In "in vitro" and "in vivo" sciences, protocols capture the methods necessary to reproduce an experiment. In computational sciences this is achieved with scientific workflows, which capture the method (i.e., the steps and data dependencies) used to obtain a specific result. In this short talk we introduce the set of checklists we have developed for the proper conservation of scientific workflows, encapsulated as Research Objects, by adapting existing standards for data preservation.

  1. Is preserving data enough? Towards the preservation of scientific methods
     Daniel Garijo, Oscar Corcho, Khalid Belhajjame, Lourdes Verdes-Montenegro, Julián Garrido, Raúl Palma, Cezary Mazurek and Kristina Hettne
     Ontology Engineering Group (Universidad Politécnica de Madrid); University Paris-Dauphine; AMIGA (Instituto de Astrofísica de Andalucía); Poznan Supercomputing and Networking Center; LUMC
     dgarijo@fi.upm.es
     Warsaw, May 28th 2015
  2. Where does data come from? Scientific workflows
     Benefits:
     • Sharing and reusing previous work
     • Time savings: re-execution of old experiments with different parameters
     • Teaching: new students can learn existing methods in the lab
     • Design for modularity, so that others can reuse it
     • Design for standardization, reducing heterogeneity
     • Debugging of executions
     • Paper writing: linking execution pipelines to publications
     • Reproducibility
     • Etc.
     (Figure labels: Experiment, Lab book, Digital Log, Workflow)
  3. How do we preserve workflows?
     Workflow repositories are great! But:
     • Manual annotation and documentation
     • Workflow conservation plan?
     • No clear link between data and method
     • How to reproduce a workflow?
     Workflows keep breaking!
     • Zhao et al.: Why Workflows Break - Understanding and Combating Decay in Taverna Workflows (more than 90 workflows analyzed)
     • Third-party resources not available/accessible
     • Missing example data
     • Lack of documentation
     • Incomplete metadata
     "Do I have to document everything again?? Didn't I just write a paper?"
  4. Our solution: Data + method = Context (Research Object)
     Research Object: an aggregation of resources that bundles together the contents of a research work
     (Figure labels: OAI-ORE, PROV, OA)
  5. How to preserve Research Objects?
     Three main ways/levels:
     • Descriptive reproducibility: documentation
     • Workflow execution reproducibility: can we run the workflow?
     • Workflow results reproducibility: can we get the same results?
     Checklists!
     • Corcho et al.: Checklist for workflow conservation
       http://dx.doi.org/10.6084/m9.figshare.1285011
       40 different aspects: documentation, goals, results, metadata, …
     • Corcho et al.: Checklist for a workflow conservation plan
       http://dx.doi.org/10.6084/m9.figshare.1285012
       Based on the DCC's data management plan
  6. Some examples
     (Figures: Levels of reproducibility; Workflow conservation plan)
  7. Conclusions
     • Research Objects help bundle data and methods (scientific workflows) together and bridge the gap between them
     • We need to preserve Research Objects as much as the data and the workflows used to obtain it:
       • Documentation
       • Ability to execute the experiment
       • Ability to obtain the same results
     • Checklists are a first step towards improving the documentation, archival and preservation of Research Objects
     http://www.researchobject.org/
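
Slide 4 describes a Research Object as an aggregation that bundles data and method, and slide 3 notes that there is often no clear link between the two. A minimal Python sketch of that idea follows, assuming a purely in-memory model: the class, field and method names are hypothetical and do not reproduce the actual OAI-ORE, PROV or OA vocabularies named on slide 4.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """One aggregated item, e.g. a dataset, a workflow definition or a paper."""
    uri: str
    kind: str  # free-text label such as "data" or "workflow" (illustrative only)


@dataclass
class ResearchObject:
    """Hypothetical in-memory sketch of a Research Object: an aggregation
    of resources plus their descriptions and data-to-method links."""
    uri: str
    aggregates: dict[str, Resource] = field(default_factory=dict)
    annotations: dict[str, list[str]] = field(default_factory=dict)
    generated_by: dict[str, str] = field(default_factory=dict)  # data URI -> workflow URI

    def add(self, resource: Resource) -> None:
        """Aggregate a resource into this Research Object."""
        self.aggregates[resource.uri] = resource

    def _require(self, uri: str) -> None:
        # Descriptions and links may only point at aggregated resources.
        if uri not in self.aggregates:
            raise ValueError(f"{uri} is not aggregated by {self.uri}")

    def annotate(self, target: str, text: str) -> None:
        """Attach a free-text description to an aggregated resource."""
        self._require(target)
        self.annotations.setdefault(target, []).append(text)

    def link(self, data_uri: str, workflow_uri: str) -> None:
        """Record which workflow (method) generated a dataset, loosely
        mirroring a provenance 'generated by' relation."""
        self._require(data_uri)
        self._require(workflow_uri)
        self.generated_by[data_uri] = workflow_uri

    def method_for(self, data_uri: str) -> Optional[str]:
        """Return the workflow recorded as producing a dataset, if any."""
        return self.generated_by.get(data_uri)

    def undocumented(self) -> list[str]:
        """URIs of aggregated resources that still have no description."""
        return [uri for uri in self.aggregates if uri not in self.annotations]
```

For example, after aggregating a dataset and the workflow that produced it, annotating only the workflow and linking the two, `undocumented()` returns the dataset's URI and `method_for()` on the dataset returns the workflow's URI: the data now carries a pointer to its method, and the missing description is easy to spot.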
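
Slide 5 proposes three levels of reproducibility, each assessed with checklists. One way to read them is as a graded checklist. The Python sketch below assumes the levels are cumulative (reproducing results presupposes a documented, executable workflow); its item names are hypothetical stand-ins loosely based on the decay causes listed on slide 3, not the 40 aspects of the published checklist.

```python
from enum import IntEnum


class Level(IntEnum):
    """Reproducibility levels from the talk, assumed here to be cumulative."""
    NONE = 0
    DESCRIPTIVE = 1  # is the workflow documented?
    EXECUTION = 2    # can we run the workflow?
    RESULTS = 3      # can we get the same results?


# Hypothetical checklist items, each tagged with the level it supports.
CHECKLIST: dict[str, Level] = {
    "has_description": Level.DESCRIPTIVE,
    "has_goals": Level.DESCRIPTIVE,
    "has_metadata": Level.DESCRIPTIVE,
    "example_data_available": Level.EXECUTION,
    "third_party_services_reachable": Level.EXECUTION,
    "expected_results_recorded": Level.RESULTS,
    "rerun_matches_results": Level.RESULTS,
}


def achieved_level(answers: dict[str, bool]) -> Level:
    """Highest level whose items, and all lower-level items, are satisfied.
    Items missing from `answers` count as not satisfied."""
    achieved = Level.NONE
    for level in (Level.DESCRIPTIVE, Level.EXECUTION, Level.RESULTS):
        items = [name for name, lvl in CHECKLIST.items() if lvl == level]
        if all(answers.get(name, False) for name in items):
            achieved = level
        else:
            break  # stop at the first level that is not fully covered
    return achieved


def missing_items(answers: dict[str, bool]) -> list[str]:
    """Checklist items not (yet) satisfied, in checklist order."""
    return [name for name in CHECKLIST if not answers.get(name, False)]
```

With every descriptive item and the example data in place but a third-party service unreachable, `achieved_level` returns `Level.DESCRIPTIVE`, and `missing_items` lists the unreachable service plus the two results-level items. Stopping at the first incomplete level is a design choice of the sketch: it reports the highest level that is fully supported rather than partial credit.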