Keynote speech - Carole Goble - Jisc Digital Festival 2015
Mar. 13, 2015
Carole Goble is a professor in the School of Computer Science at the University of Manchester.
In this keynote, Carole offered her insights into research data management and data centres.
2. RARE and FAIR Science:
Reproducibility and
Research Objects
Professor Carole Goble FREng FBCS
The University of Manchester, UK
The Software Sustainability Institute
carole.goble@manchester.ac.uk
Jisc Digital Festival, 9-10 March 2015, ICC Birmingham, UK
3. Knowledge Turning, Flow
Barriers to Cure
» Access to scientific
resources
» Coordination and
Collaboration
» Flow of Information
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
[Josh Sommer]
6. Virtual Witnessing*
Scientific publications:
» announce a result
» convince readers the result is correct
“papers in experimental [and computational
science] should describe the results and
provide a clear enough protocol [algorithm]
to allow successful repetition and extension”
Jill Mesirov, Broad Institute, 2010**
**Accessible Reproducible Research, Science 22 January 2010, Vol. 327 no. 5964 pp. 415-416, DOI: 10.1126/science.1179653
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
7. Bramhall et al., Quality of Methods Reporting in Animal Models of
Colitis, Inflammatory Bowel Diseases, 2015
“Only one of the 58 papers reported all essential
criteria on our checklist. Animal age, gender, housing
conditions and mortality/morbidity were all poorly
reported…..”
http://www.nature.com/news/male-researchers-stress-out-rodents-1.15106
8. “An article about computational science in a
scientific publication is not the scholarship
itself, it is merely advertising of the
scholarship. The actual scholarship is the
complete software development
environment, [the complete data] and the
complete set of instructions which generated
the figures.”
David Donoho, “Wavelab and Reproducible
Research,” 1995
datasets
data collections
standard operating
procedures
software
algorithms
configurations
tools and apps
codes
workflows, scripts
code libraries
services
system software
infrastructure
compilers, hardware
Morin et al Shining Light into Black Boxes
Science 13 April 2012: 336(6078) 159-160
Ince et al., The case for open computer programs, Nature 482,
2012
9. Of 50 papers randomly chosen from 378 manuscripts in 2011 that use the
Burrows-Wheeler Aligner for mapping Illumina reads:
7 studies listed necessary details
26 no access to primary data sets, broken links to home websites
31 no software version, parameters, or exact version of the genomic reference
sequence
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Genetics 13 (2012)
10. Broken software Broken science
» Geoffrey Chang, Scripps Institute
» Homemade data-analysis program
inherited from another lab
» Flipped two columns of data,
inverting the electron-density map
used to derive protein structure
» Retracted 3 Science papers and 2
papers in other journals
» One paper had been cited 364 times
The structures of MsbA (purple) and
Sav1866 (green) overlap little (left)
until MsbA is inverted (right).
Miller, A Scientist's Nightmare: Software Problem Leads to Five Retractions, Science 22 December 2006: Vol. 314 no. 5807, 1856-1857
http://www.software.ac.uk/blog/2014-12-04-its-impossible-conduct-research-without-software-say-7-out-10-uk-researchers
11. Software making practices
“As a general rule,
researchers do not
test or document their
programs rigorously,
and they rarely
release their codes,
making it almost
impossible to
reproduce and verify
published results
generated by
scientific software”
2000 scientists. J.E. Hannay et al., “How Do Scientists Develop and Use Scientific Software?” Proc. ICSE Workshop on Software Eng. for
Computational Science and Eng., 2009, pp. 1–8.
15. republic of science*
regulation of science
institution cores, libraries
public services
*Merton’s four norms of scientific behaviour (1942)
16. Honest Error (inherent): Science is messy
Reinhart/Rogoff austerity economics
Thomas Herndon
Zoë Corbyn, Nature Oct ’12
Fraud
17. “I can’t immediately reproduce the research in my own laboratory.
It took an estimated 280 hours for an average user to approximately
reproduce the paper.”
Prof Phil Bourne
Associate Director, NIH Big Data 2 Knowledge Program
18. When research goes “wrong”
»Tainted resources
»Black boxes
»Poor Reporting
»Unavailable resources /
results: data, software
»Bad maths
»Sins of omission
»Poor training, sloppiness
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Ioannidis, Why Most Published Research Findings Are False, August 2005
Joppa et al., Troubling Trends in Scientific Software Use, Science 340, May 2013
Scientific method
20. Social environment
» Impact factor mania
» Pressure to publish
» Broken peer review
» Research never reported
» Disorganisation
» Time pressures
» Prep & curate costs
When research goes “wrong”
https://www.sciencenews.org/article/12-reasons-research-goes-wrong (adapted)
Nick D Kim, strange-matter.net
Norman Morrison
Do a Replication Study?
No thanks! Not FAIR.
Hard. Resource intensive.
Unrecognised. Trolled.
Just gathering the bits.
21. Cross-Institutional e-Laboratory
Scattered parts, Subject specific / General resources
Fragmented Landscape
101 Innovations in Scholarly Communication - the Changing Research Workflow, Bosman and Kramer, 2015,
http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
25. Research Objects
Compound Investigations, Research Products
Multi-various Products,
Platforms/Resources
Units of exchange, commons, contextual metadata
http://www.researchobject.org
26. http://www.researchobject.org
First class citizens - data, software, methods
- id, manage, credit, track, profile, focus
A Framework to Bundle and Relate (scattered) resources
Metadata Objects that carry Research Context
Research Objects
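The framework idea above can be sketched in miniature: at its simplest, a Research Object is a bundle of scattered resources plus a manifest recording what each one is, so the research context travels with the bundle. The sketch below is a hypothetical illustration using only the Python standard library; the file names, roles, and manifest fields are invented, and the real model at researchobject.org is considerably richer.

```python
import io
import json
import zipfile

# Hypothetical resources aggregated into one Research Object.
# Paths and role names are invented for illustration; they are not
# the formal researchobject.org vocabulary.
resources = [
    {"path": "data/input.csv", "role": "input-data"},
    {"path": "code/analysis.py", "role": "software"},
    {"path": "results/figure1.png", "role": "output-data"},
]

manifest = {
    "id": "example-research-object",
    "aggregates": resources,  # contextual metadata travels with the bundle
}

# Bundle the manifest into a zip archive (in memory here; the payload
# files themselves would be added alongside it in a real bundle).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as ro:
    ro.writestr("manifest.json", json.dumps(manifest, indent=2))

# A consumer can reopen the bundle and discover what it aggregates.
with zipfile.ZipFile(buf) as ro:
    loaded = json.loads(ro.read("manifest.json"))
print(len(loaded["aggregates"]))
```

Because the manifest is carried inside the archive, the bundle can be identified, exchanged, and unpacked without reference to the platform that produced it.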
27. • closed <-> open
• local <-> alien
• embed <-> refer
• fixed <-> fluid
• nested
• multi-typed, stewarded,
sited, authored
• span research, researchers,
platforms, time
• cite? resolve? steward?
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013
means
ends
driver
29. Research Object packages codes, study,
and metadata to exchange descriptions of
clinical study cohorts, statistical scripts,
data (CKAN for the Farr Commons).
STELAR Asthma e-Lab: Study Team for
Early Life Asthma Research
ClinicalCodes.org coded patient cohorts
exchanged with NHS FARSITE system
MRC funded multi-site collaboration to
support safe use of patient and research
data for medical research
STELAR e-Lab
Platform 1
Platform 2
Platform 3
30. Focus, Pivot and Profile
Profile around methods, workflows, scripts, software, data, figures….
31. Focus on the figure: F1000Research Living Figures,
versioned articles, in-article data manipulation
R Lawrence, Force2015 Vision Award Runner Up, http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of
a figure, and ultimately the
journal article
Colomb J and Brembs B.
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1;
ref status: indexed, http://f1000r.es/3is]
F1000Research 2014, 3:176
Other labs can replicate the study, or
contribute their data to a meta-analysis
or disease model - the figure
automatically updates.
Data updates time-stamped.
New conclusions added via versions.
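The “simply data + code” point can be made concrete with a toy sketch: if the figure is just a function of the data, contributed observations regenerate it automatically. This is an invented illustration of the principle, not the F1000Research mechanism; the function and values are hypothetical.

```python
# Toy "living figure": the figure is recomputed from the data,
# so new contributions update it automatically.
def make_figure(data):
    # A real living figure would redraw a plot; here we return the
    # summary statistics the figure would display.
    return {"n": len(data), "mean": sum(data) / len(data)}

observations = [2.0, 4.0, 6.0]
v1 = make_figure(observations)   # version 1, as first published

observations.append(8.0)         # another lab contributes its data
v2 = make_figure(observations)   # the figure updates automatically

print(v1["mean"], v2["mean"])
```

Versioning the data updates (time-stamped, as the slide notes) then gives readers the figure's full history rather than a single frozen image.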
Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like Release paradigm
Not a static document paradigm
Reproduce looks backwards -> Release looks forwards
» Science, methods, data
change -> agile
evolution
» Comparisons, versions,
forks & merges,
dependencies
» Id & Citations
» Interlinked ROs
36. Aggregated Commons Infrastructure
Consistent, Comparative Reporting
• Design, protocols, samples,
software, models….
• Just Enough Results Model
• Common and specific elements
http://www.seek4science.org http://www.fair-dom.org http://isatools.org
39. RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config
Parameters
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
40. Public data sets
My algorithm
ROWorkflow as Instrument
BioSTIF
My data set
Public software
42. 1. Science Changes. So does the Lab.
“The questions don’t
change but the
answers do”
Dan Reed
The lab is not fixed
Updated resources
Uncertainty
Zhao et al., Why workflows break - Understanding and combating decay in
Taverna workflows, 8th Intl Conf e-Science 2012
2. Instruments Break, Labs Decay
materials become unavailable, technicians leave
Reproducibility Window
» Bit rot, Black boxes
» Proprietary Licenses
» Clown services
» Partial replication
» Prepare to Repair
› form or function?
› preserve or sustain?
44. RO as Instrument, Materials, Method
Input Data
Software
Output Data
Config
Parameters
Methods
(techniques, algorithms,
spec. of the steps)
Materials
(datasets, parameters,
algorithm seeds)
Experiment
Instruments
(codes, services, scripts,
underlying libraries)
Laboratory
(sw and hw infrastructure,
systems software,
integrative platforms)
Setup
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
50. The IT Crowd, Series 3, Episode 4
The eLab Virtual Machine* (or Docker Image**)
* a black box though
**docker.com
Reproduce by Running:
Active Instrument
Retain the bits
55. Workflow definition
Data (inputs, outputs)
Parameter configs
Provenance log
Hettne et al., Structuring research methods and data with the research object model: genomics
workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf
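The slide's four ingredients (workflow definition, data, parameter configs, provenance log) can be sketched as one record per executed step. The field names below are invented for illustration; the Research Object model builds on richer standards such as W3C PROV.

```python
import json
from datetime import datetime, timezone

# Hypothetical provenance entry for one workflow step; the field
# names are invented, not the W3C PROV or RO vocabulary.
def prov_entry(step, inputs, outputs, params):
    return {
        "step": step,
        "inputs": inputs,
        "outputs": outputs,
        "parameters": params,
        "recorded": datetime.now(timezone.utc).isoformat(),
    }

# The provenance log is then the ordered list of entries, bundled
# alongside the workflow definition, the data, and the configs.
log = [
    prov_entry("fetch-data", [], ["raw.csv"], {"source": "example"}),
    prov_entry("analyse", ["raw.csv"], ["results.csv"], {"threshold": 0.05}),
]
print(json.dumps([e["step"] for e in log]))
```

Keeping such a log beside the inputs and outputs is what lets a later reader trace exactly which data and parameters produced which result.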
myRDM
58. Method Matters
Make reproducible ->
Born reproducible
Be smart about
reproducibility
Think Commons not
Repository
Best Practices for Scientific Computing http://arxiv.org/abs/1210.0530
Stodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13, 2009
RARE & FAIR Knowledge Turns with Research Objects
64. Training
56% of UK researchers develop their own
research software or scripts
73% of UK researchers have had no formal
software engineering training
Survey of researchers from 15 Russell Group universities conducted by SSI between August - October 2014.
406 respondents covering representative range of funders, discipline and seniority.
66. BUT…
in two years’ time, when the paper is written:
reviewers want additional work
the statistician wants more runs
the analysis may need to be repeated
the post-doc leaves, a student arrives
new data, revised data
updated versions of algorithms/codes
the sample was contaminated
67. Inspired by Bob Harrison
• Incremental shift for infrastructure
providers.
• Moderate shift for policy makers and
stewards.
• Paradigm shift for researchers and their
institutions.
The Challenge
68. All the members of the Wf4Ever team
Colleagues in Manchester’s Information
Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
http://myexperiment.org
http://www.biovel.eu
Alan Williams
Norman Morrison
Stian Soiland-Reyes
Paul Groth
Tim Clark
Juliana Freire
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Barend Mons
Sean Bechhofer
Philip Bourne
Matthew Gamble
Raul Palma
Jun Zhao
Neil Chue Hong
Josh Sommer
Matthias Obst
Jacky Snoep
David Gavaghan
Rebecca Lawrence
69. Contact…
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
carole.goble@manchester.ac.uk
https://sites.google.com/site/carolegoble
http://www.mygrid.org.uk