Reproducible and citable data and models: an introduction.
Sep. 23, 2015
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Müller (HITS), and Dagmar Waltemath (University of Rostock) at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany, September 14th–16th 2015.
2. Introduction: What do we mean by citable data and reproducible models?
Carole Goble, Wolfgang Müller, Dagmar Waltemath
FAIRDOM Consortium
The University of Manchester, UK
carole.goble@manchester.ac.uk
EraSysAPP Workshop Data Citation and Model Reproducibility, Rostock, 14-16 Sept 2015
3. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research,” 1995
5. Why Unreproducible Research Happens
Scientific method:
• Tainted resources
• Black boxes
• Poor reporting
• Unavailable resources / results: data, software
• Bad maths
• Sins of omission
• Poor training, sloppiness
Social environment:
• Impact factor mania
• Pressure to publish
• Broken peer review
• Research never reported
• Disorganisation
• Time pressures
• Prep & curate costs
Adapted from https://www.sciencenews.org/article/12-reasons-research-goes-wrong
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, SCIENCE 340, May 2013
9. Packaging, Porting, Reusing
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
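At its core, a COMBINE archive (.omex) is a ZIP container whose manifest.xml lists every entry and its format, so a whole modeling project travels as one file. A minimal sketch, assuming hypothetical file names (model.xml, simulation.xml) and placeholder content:

```python
# Minimal sketch of a COMBINE archive: a ZIP container with a manifest.xml
# describing each entry's format. File names and contents are hypothetical.
import zipfile

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./model.xml"
           format="http://identifiers.org/combine.specifications/sbml"/>
  <content location="./simulation.xml"
           format="http://identifiers.org/combine.specifications/sed-ml"/>
</omexManifest>
"""

def pack(path):
    """Bundle a model and its simulation description into one archive."""
    with zipfile.ZipFile(path, "w") as omex:
        omex.writestr("manifest.xml", MANIFEST)
        omex.writestr("model.xml", "<sbml/>")        # placeholder SBML model
        omex.writestr("simulation.xml", "<sedML/>")  # placeholder SED-ML setup

def contents(path):
    """List what a collaborator would find inside the archive."""
    with zipfile.ZipFile(path) as omex:
        return sorted(omex.namelist())

pack("project.omex")
print(contents("project.omex"))  # ['manifest.xml', 'model.xml', 'simulation.xml']
```

Because the container is plain ZIP, any standard tooling can unpack it; the manifest is what turns a folder of files into a self-describing, portable modeling project.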
10. Citable Data
• Persistent Identifiers
• Resolution
• Citation attribution
• Credit lists
• Citation infrastructure
• Snapshots and versioning
• Link Data with Publications
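The checklist above can be made concrete: a persistent identifier (e.g. a DOI) resolves through a stable resolver service, and the citation records attribution plus the exact snapshot or version used. A sketch with an entirely made-up DOI and repository name:

```python
# Sketch of the citable-data checklist: a persistent identifier resolving
# via a resolver, plus attribution and versioning in the citation itself.
# The DOI and repository below are made-up examples.
from dataclasses import dataclass

RESOLVER = "https://doi.org/"  # DOIs resolve via a central resolver service

@dataclass
class DataCitation:
    authors: str
    year: int
    title: str
    repository: str
    doi: str
    version: str  # snapshots and versioning: cite the exact state used

    def url(self):
        """Resolution: persistent identifier -> landing page URL."""
        return RESOLVER + self.doi

    def cite(self):
        """Citation attribution: credit list, version, and resolvable link."""
        return (f"{self.authors} ({self.year}). {self.title} "
                f"(Version {self.version}) [Data set]. {self.repository}. "
                f"{self.url()}")

c = DataCitation("Doe, J.", 2015, "Yeast growth measurements",
                 "ExampleRepo", "10.1234/example.5678", "1.2")
print(c.cite())
```

The point of the version field is that a citation to "the dataset" is ambiguous once the data changes; citing a specific snapshot is what links the data to the publication that used it.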
11. [Diagram: Research Environment → Peer Review → Publication Environment; “submit article and move on…” vs “publish article”]
• Find, Access, Interoperate, Reuse
– Catalogues, Public Archives, Licensing, Guidelines, Standards
– Packaging models and data
• Data / Software Publishing
– Link Data / Software and Literature
• Credit and citation
12. Data [same diagram as slide 11]
• Citing data in research articles: principles, implementation, challenges – and the benefits of changing our ways (Johanna McEntyre, Mon)
• Publishing data and code openly (Tom Ingraham, Mon)
• The FAIRDOM Commons for Systems Biology (Carole Goble, Mon)
• The OpenAIRE infrastructure and RDA Data Publishing Working Group: results and vision (Paolo Manghi, Wed)
13. Models [same diagram as slide 11]
• Standards for reproducibility of model-based results (Dagmar Waltemath, Tues)
• Reproducible model construction, validation and simulation (Jacky Snoep, Tues)
• Capturing the context – one small(ish) step for modellers, one giant leap for mankind (Mihai Glont, Tues)
• Archiving modeling results (Finn Bacall, Stuart Owen, Martin Scharm, Weds)
14. Hands On [same diagram as slide 11]
Editor's Notes
Lots of research is incomparable.
EXECUTION
REPORTING
Gathered: scattered across different repositories/catalogues
Availability of dependencies: Know and have all necessary elements available, accessible, maybe open
Change management: Data? Services? Methods? Prevent, Detect, Repair.
Execution and Making Environments: Skills/Infrastructure to run it: Portability and the Execution Platform (which can be people…), authoring and reading
Description: Explicit: How, Why, What, Where, Who, When, Comprehensive: Just Enough, Comprehensible: Independent understanding
Purpose for doing it: reason and reward sensitivity
Reporting and Preserving
SOPs for methods
Current work on research reproducibility has focused on the creation of tools for packaging research artefacts such as data and software so that analyses can be run by others, and on the creation of domain-specific guidelines and checklists for the reporting of research.
How
Open software (inspection): reproduce
Closed software (execution, but not inspection) – VM!: replication
FAIR Model
Description left to right
Portability up and down
FAIRport* Reproducibility: Find, Access, Interoperate, Reuse, Port
Preservation – lots of copies keeps stuff safe
Stability dimension
Add two more dimensions to our classification of themes
A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Virtual machines are separated into two major classifications, based on their use and degree of correspondence to any real machine: system virtual machines and process virtual machines.
Overlap of course
Static vs dynamic.
GRANULARITY
This model for audit and target of your systems
overcoming data type silos
public integrative data sets
transparency matters
cloud
Recomputation.org
Reproducibility by Execution: Run It
Reproducibility by Inspection: Read It
Availability – coverage
Gathered: scattered across resources, across the paper and supplementary materials
Availability of dependencies: Know and have all necessary elements
Change management: Data? Services? Methods? Prevent, Detect, Repair.
Execution and Making Environments: Skills/Infrastructure to run it: Portability and the Execution Platform (which can be people…), Skills/Infrastructure for authoring and reading
Description: Explicit: How, Why, What, Where, Who, When, Comprehensive: Just Enough, Comprehensible: Independent understanding
Documentation vs Bits (VMs) reproducibility
Learn/understand (reproduce and validate, reproduce using different codes) vs Run (reuse, validate, repeat, reproduce under different configs/settings)