This presentation was delivered by Tomasz Miksa (SBA Research) at the final PERICLES project conference 'Acting on Change: New Approaches and Future Practices in LTDP' (Wellcome Collection Conference Centre, London, 30 November to 1 December 2016).
Tomasz Miksa joined Barbara Reed (Recordkeeping Innovation), Pip Laurenson (Tate/PERICLES), Simon Waddington (King's College London/PERICLES) and Patricia Falcao (Tate/PERICLES) in a thematic session on 'Risk assessment for preservation in the active life of complex digital objects'.
This session looked at how to characterise the different types of risk relevant to the preservation of a range of digital objects in different contexts, including those described by continuum theory. It also considered what information is available and required for accurate risk assessment within different preservation contexts, namely digital artworks, scientific data, records and archives. The focus of the session was largely on complex digital objects.
http://pericles-project.eu/
2. Tomasz Miksa tmiksa@sba-research.org
eScience and Research Infrastructures
Scientists exchange
- facilities
- resources
- services
- datasets
Research requires
- special tooling and software
- workflows to
• capture
• transform
• visualize
• interpret the data
Taverna Workflow
Workflows and Context
‘Workflows’ can be
- ad hoc commands and scripts executed manually
- well-structured processes executed within a controlled environment
Workflows
- share infrastructure with other processes
- delegate tasks to tools installed in the system
- require specific configurations
- can use distributed systems
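#!/bin/bash
set -euo pipefail  # abort at the first failing step
# fetch data (the web service client is expected to produce IQData.zip)
java -jar GestBarragensWSClientIQData.jar
unzip -o IQData.zip
# fix encoding: convert the R script from Latin-1 to UTF-8
# (R below reads iq_utf8.r, so this step must not be skipped)
iconv -f LATIN1 -t UTF-8 iq.r > iq_utf8.r
# generate references by running the converted R script
R --vanilla < iq_utf8.r > IQout.txt
# create pdf (pdflatex runs twice so that cross-references resolve)
pdflatex iq.tex
pdflatex iq.tex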
Script
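The script above delegates work to java, unzip, iconv, R and pdflatex, none of which it installs or checks for. A minimal sketch, assuming a bash shell, of listing missing tools before running such a workflow; `check_tools` is a hypothetical helper name, not part of the original script:

```bash
#!/bin/bash
# Hypothetical helper: report which external tools a workflow depends on
# are missing from the current system. Returns non-zero if any are missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Tools invoked by the IQ data script
if check_tools java unzip iconv R pdflatex; then
  echo "all dependencies found"
else
  echo "some dependencies are missing" >&2
fi
```

Checking up front makes the hidden dependencies explicit and fails before any partial output is produced.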
Reproducibility
Recent studies report very low reproducibility in
- medicine
- economics
- computer science
Reproducibility requires
- well-documented research workflows
- precise information on the experiment's environment
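Precise information on the environment can be captured automatically at run time. A minimal sketch, assuming bash and GNU coreutils (`sha256sum`); `capture_env` and the manifest format are hypothetical. Recording a checksum of each tool's executable pins down the exact binary, which a version string alone may not:

```bash
#!/bin/bash
# Hypothetical helper: write a plain-text manifest describing the execution
# environment, so a later run can be compared against it.
# Assumes GNU coreutils (sha256sum); the tool list is illustrative.
capture_env() {
  local out="$1"; shift
  {
    echo "os: $(uname -s) $(uname -r) $(uname -m)"
    echo "bash: $BASH_VERSION"
    local tool path
    for tool in "$@"; do
      path=$(command -v "$tool") || { echo "tool: $tool MISSING"; continue; }
      # the checksum identifies the exact binary, not just its version label
      echo "tool: $tool $path $(sha256sum "$path" | cut -d' ' -f1)"
    done
  } > "$out"
}

capture_env environment.txt bash iconv unzip R pdflatex
cat environment.txt
```

Storing `environment.txt` next to the results documents where they were produced.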
Reproducibility
Neuroanatomical studies
FreeSurfer software
- measures cortical thickness and volume of neuroanatomical structures
Measurements compared across different
- FreeSurfer versions
• v4.3.1, v4.5.0, v5.0.0
- workstations
• Mac, Hewlett-Packard
- operating system versions
• OSX 10.5, OSX 10.6
E. Gronenschild, P. Habets, H. I. L. Jacobs, R. Mengelers, N. Rozendaal, J. van Os, and M. Marcelis, “The effects of freesurfer version, workstation type, and macintosh operating system version on anatomical volume and cortical thickness measurements,” 2012.
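The FreeSurfer study shows that software version, workstation and operating system all matter when comparing results. A minimal sketch of checking recorded environments before comparing outputs, assuming each run stores a plain-text manifest with one `key: value` line per property; the manifest format, file names and `compare_env` are hypothetical, and the version values are taken from the study listed above:

```bash
#!/bin/bash
# Hypothetical helper: compare the environment manifests of two runs
# before comparing their results. Returns non-zero if they differ.
compare_env() {
  if diff -u "$1" "$2"; then
    echo "environments match"
  else
    echo "environments differ: results may not be comparable" >&2
    return 1
  fi
}

# Example: two runs that differ in software and OS version
printf 'freesurfer: 4.5.0\nos: OSX 10.5\n' > run_a.txt
printf 'freesurfer: 5.0.0\nos: OSX 10.6\n' > run_b.txt
compare_env run_a.txt run_b.txt || true
```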
Reproducibility
Computer Science
Study of 613 papers from 8 ACM conferences
C. Collberg and T. Proebsting, “Measuring reproducibility in computer systems research,” 2014. [Online]. Available: http://reproducibility.cs.arizona.edu/tr.pdf
Reproducibility
Computer Science
E-mail responses from authors asked to share their code
- Wrong version
- Code will be available soon
- Programmer left
- Bad backup practices
- Commercial code
- Proprietary academic code
- Intellectual property
- No intention to release
- …
TIMBUS: Risk mitigation strategies
Metadata and documentation
Migration
- File formats
- Storage media
- Alternative services
• Open source service
• In‐housing of services
Emulation
Virtualisation
Mock‐up of systems
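Migrating a file format changes the bytes of an object, so it is usually paired with documentation of what was done. A minimal sketch, assuming GNU iconv and coreutils: a Latin-1 text file (the same conversion that appears in the script above) is migrated to UTF-8, the original is kept, and a tab-separated log line records the date, the operation and SHA-256 checksums of both files. The function name `migrate_to_utf8` and the log layout are hypothetical:

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical helper: migrate a Latin-1 text file to UTF-8, keep the
# original, and append provenance metadata to a log file.
migrate_to_utf8() {
  local src="$1" dst="$2" log="$3"
  iconv -f LATIN1 -t UTF-8 "$src" > "$dst"
  # record when, what, and checksums of source and result
  printf '%s\ticonv LATIN1->UTF-8\t%s\t%s\t%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    "$src" "$(sha256sum "$src" | cut -d' ' -f1)" \
    "$dst" "$(sha256sum "$dst" | cut -d' ' -f1)" >> "$log"
}

printf 'S\xe3o Paulo\n' > sample_latin1.txt   # "São Paulo" encoded in Latin-1
migrate_to_utf8 sample_latin1.txt sample_utf8.txt migration.log
cat migration.log
```

Keeping both checksums lets a later audit confirm that neither the original nor the migrated file has changed since the migration.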
Summary
Scientific experiments
- workflows for data processing with software dependencies
Risks affecting reproducibility
- insufficient description of experiments keeps reproducibility low
Solutions for improving reproducibility
- improve data management, sharing and reuse
TIMBUS approach for process preservation
- based on risk management practices
- using context modelling to evaluate preservation alternatives
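The TIMBUS approach is described as based on risk management practices. A common risk-management convention, though not necessarily the one TIMBUS uses, is to score each risk as likelihood times impact and rank the results. A minimal sketch with a hypothetical risk register: the risks are drawn from this talk, but the 1 to 5 scores are illustrative placeholders, not assessments:

```bash
#!/bin/bash
# Hypothetical risk register: "risk;likelihood;impact", both on an
# illustrative 1-5 scale. Scores are placeholders, not measurements.
cat > risks.txt <<'EOF'
external web service used to fetch data disappears;3;5
software dependency changes version;4;3
only programmer who knows the scripts leaves;2;4
EOF

# Rank risks by score = likelihood x impact, highest first
rank_risks() {
  awk -F';' '{ printf "%d\t%s\n", $2 * $3, $1 }' "$1" | sort -rn
}

rank_risks risks.txt
```

The highest-ranked risks are the natural candidates for the mitigation strategies listed earlier, such as alternative services or virtualisation.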