results may vary
reproducibility, open science
and all that jazz
Professor Carole Goble
The University of Manchester, UK
carole.goble@manchester.ac.uk
@caroleannegoble
Keynote ISMB/ECCB 2013 Berlin, Germany, 23 July 2013
“knowledge turning”
New Insight

• life sciences
• systems biology
• translational medicine
• biodiversity
• chemistry
• heliophysics
• astronomy
• social science
• digital libraries
• language analysis

[Josh Sommer, Chordoma Foundation]

Goble et al Communications in Computer and Information Science 348, 2013
automate: workflows,
pipeline & service
integrative frameworks

scientific
software
engineering

CS
SE

pool, share &
collaborate
web systems

semantics & ontologies
machine readable documentation

nanopub
coordinated execution of services,
codes, resources
transparent, step-wise methods
auto documentation, logging
reuse variants
http://www.seek4science.org
store/organise/link
data, models, sops,
experiments,
publications

explore/annotate
data, models, sops

yellow pages, find
peers and experts

open and controlled curation & data
pooling & credit
mgt support

catalogue and gateway to
local and public resources
APIs

simulate models

governance & policies
• PALS
reproducibility
a principle of the
scientific method
separates scientists
from other researchers
and normal people

http://xkcd.com/242/
datasets
data collections
algorithms
configurations
tools and apps
codes
workflows
scripts
code libraries
services,
system software
infrastructure,
compilers
hardware

“An article about
computational science in a
scientific publication is not the
scholarship itself, it is merely
advertising of the scholarship.
The actual scholarship is the
complete software
development environment,
[the complete data] and the
complete set of instructions
which generated the figures.”
David Donoho, “Wavelab and
Reproducible Research,” 1995

Morin et al Shining Light into Black Boxes
Science 13 April 2012: 336(6078) 159-160

Ince et al The case for open computer
programs, Nature 482, 2012
• Workshop Track (WK03) What
Bioinformaticians need to know
about digital publishing beyond the
PDF
• Workshop Track (WK02):
Bioinformatics Cores Workshop,
• ICSB Public Policy Statement on
Access to Data
hope over experience
“an experiment is reproducible until
another laboratory tries to repeat it.”
Alexander Kohn

even computational ones
hand-wringing,
weeping, wailing,
gnashing of teeth.
Nature checklist.
Science
requirements for
data and code
availability.
attacks on authors,
editors, reviewers,
publishers, funders,
and just about
everyone.
http://www.nature.com/nature/focus/reproducibility/index.html
47/53 “landmark” publications
could not be replicated
[Begley, Ellis Nature, 483, 2012]
Nekrutenko & Taylor, Next-generation sequencing data interpretation:
enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)

59% of papers in the 50 highest-IF journals comply with
(often weak) data sharing rules.
Alsheikh-Ali et al Public Availability of Published Research Data in High-Impact Journals.
PLoS ONE 6(9) 2011
170 journals, 2011-2012
Required as condition of publication
Required but may not affect decisions
Explicitly encouraged may be reviewed
and/or hosted
Implied
No mention

Stodden V, Guo P, Ma Z (2013) Toward Reproducible Computational
Research: An Empirical Analysis of Data and Code Policy Adoption by
Journals. PLoS ONE 8(6): e67111. doi:10.1371/journal.pone.0067111
replication gap
Out of 18 microarray papers, results
from 10 could not be reproduced

More retractions:
>15X increase in last decade
At the current rate of increase, by 2045 as many papers
will be retracted as published
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
“When I use a word," Humpty Dumpty
said in rather a scornful tone, "it means
just what I choose it to mean - neither
more nor less.”
[Lewis Carroll]
conceptual replication
“show A is true by doing B
rather than doing A again”
verify but not falsify
[Yong, Nature 485, 2012]

regenerate
the figure

replicate
rerun

repeat

re-compute
recreate
revise
regenerate

redo

restore
recycle
reuse

re-examine

reconstruct

review

repurpose
repeat
same
experiment
same lab

replicate

test

same
experiment
different set up

reproduce

same
experiment
different lab
different
experiment
some of same

reuse
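
One way to read the four terms above is as a decision over what stayed the same between the original run and the new one. The Python sketch below encodes that reading. The function name `classify_rerun` and its three boolean parameters are assumptions made for illustration, and the slide does not say which lab performs a replication, so the sketch assumes the same lab.

```python
def classify_rerun(same_experiment: bool, same_lab: bool, same_setup: bool) -> str:
    """Name the kind of re-run, following the slide's four definitions."""
    if not same_experiment:
        # different experiment that borrows some of the same parts
        return "reuse"
    if not same_lab:
        # same experiment carried out by a different lab
        return "reproduce"
    if not same_setup:
        # same experiment, same lab (assumed), different set up
        return "replicate"
    # same experiment, same lab, same set up
    return "repeat"


print(classify_rerun(same_experiment=True, same_lab=False, same_setup=False))
```

The order of the checks matters: once the experiment itself differs, the lab and the set up no longer decide the label.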

Drummond C Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
validation

verification

assurance meets the
needs of a
stakeholder
e.g. error
measurement,
documentation

complies with a
regulation,
requirement,
specification, or
imposed condition
e.g. a model

science review: articles, algorithms, methods
technical review: code, data, systems
V. Stodden, “Trust Your Science? Open Your Data and Code!”
Amstat News, 1 July 2011
defend
repeat

Sound

Design

Collection

review 1 / certify
replicate

Peer Review

Prediction

Peer Reuse

Execution
Publish
Result Analysis

make & run & document

report & review & support

review 2 / compare
reproduce

transfer
reuse

* Adapted from Mesirov, J. Accessible Reproducible Research Science 327(5964), 415-416 (2010)
disorganisation

“I can’t immediately reproduce the
research in my own laboratory. It
took an estimated 280 hours for
an average user to approximately
reproduce the paper.
Data/software versions. Workflows
are maturing and becoming
helpful”
Phil Bourne

Garijo et al. 2013 Quantifying Reproducibility in Computational Biology:
The Case of the Tuberculosis Drugome PLOS ONE under review.

fraud Corbyn, Nature Oct 2012

inherent
rigour reporting & experimental design
cherry picking data
misapplication use of black box software*
software misconfigurations, random seed reporting
non-independent bias, poor positive and negative controls
dodgy normalisation, arbitrary cut-offs, premature data triage
un-validated materials, improper statistical analysis, poor
statistical power, stop when “get to the right answer”
*8% validation Joppa, et al, Troubling Trends in Scientific Software Use SCIENCE 340 May 2013
http://www.nature.com/authors/policies/checklist.pdf
• anyone anything
anytime
• publication access, data,
models, source codes,
resources, transparent
methods, standards,
formats, identifiers, apis,
licenses, education,
policies

• “accessible, intelligible,
assessable, reusable”

http://royalsociety.org/policy/projects/science-public-enterprise/report/
G8 open data charter

http://opensource.com/government/13/7/open-data-charter-g8
regulation of science
institution cores public services
libraries
republic of science*
*Merton’s four norms of scientific behaviour (1942)
a meta-manifesto (I)

• all X should be available and assessable
forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s
creators
• credit should be assessable and count in all
assessments
• X should be curated, available, linked to all
necessary materials, and intelligible

What’s the real issue?
we do pretty well
• major public data repositories
• multiple declarations for depositing data
• thriving open source community
• plethora of data standardisation efforts
• core facilities
• heroic data campaigns
• international and national bioinformatics coordination
• diy biology movement

• great stories- Shiga-Toxin strain of E. coli, Hamburg, May
2011, China BGI Open data crowd sourcing effort.
• Oh, wait…University of Münster/University of Göttingen
squabble http://www.nature.com/news/2011/110721/full/news.2011.430.html
hard: patient data
(inter)national complications
bleeding heart paternalism
defensive research
informed consent
fortresses

[John Wilbanks]

http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf

Kotz, J. SciBX 5(25) 2012
massive centralisation –
clouds, curated core
facilities
long tail massive
decentralisation –
investigator held datasets
fragmentation & fragility
a data scarcity at point of
delivery
RIP data
quality/trust/utility
Acta Crystallographica
section B or C

data/code as
first class citizen
we are not bad people
we make progress
there was never a golden age
there never is
a reproducibility paradox
big, fast,
complicated,
multi-step,
multi-type
multi-field

expectations
of
reproducibility

diy publishing
greater access
pretty stories shiny results feedback loop
announce a result, convince us it's correct
novel, attention grabbing
neat, only positive
review: the direction of
science, the next paper,
how I would do it.
reject papers purely based
on public data
obfuscate to avoid scrutiny
PLoS and F1000 counter
the scientific sweatshop
no resources, time, accountability

getting it published not getting it right
game changing benefit to justify disruption
citation distortion

Clark et al Micropublications 2013 arXiv:1305.3506

[Tim Clark]

Greenberg How citation distortions create unfounded authority: analysis of a citation network.
British Medical Journal 2009, 339:b2680.
Simkin, Roychowdhury Stochastic modeling of citation slips. Scientometrics 2005, 62(3):367-384.
independent replication studies
self-correcting science

“blue collar science”
John Quackenbush

• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators
independent review
self-correcting science

“blue collar science”
John Quackenbush

• hostility
• hard
• resource intensive
• no funding, time, recognition, place to publish
• invisible to originators
what is the point: “no one will want it”
“the questions don’t change but the answers do”*
• two years' time when the paper is written
• reviewers want additional work
• statistician wants more runs
• analysis may need to be repeated
• post-doc leaves, student arrives
• new data, revised data
• updated versions of algorithms/codes
quid pro quo citizenship
• trickle down theory: more open more use more credit*
others might
• meta-analysis
• novel discovery
• other methods
* Dan Reed
emerging reproducible system ecosystem
App Store needed!
instrumented desktop tools
hosted services
packaging and archiving
repositories, catalogues
online sharing platforms
integrated authoring
integrative frameworks

XworX

ReproZip

Sweave
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and  who gets the credit?
integrated database and journal
http://www.gigasciencejournal.com

copy editing computational workflows
from 10 scripts + 4 modules + >20 parameters
to Galaxy workflows

2-3 months
2-3 weeks

made reproducible

galaxy.cbiit.cuhk.edu.hk
[Peter Li]
supporting data reproducibility
Open-Paper
DOI:10.1186/2047-217X-1-18
>11000 accesses

(linked to DOI)

Open-Data
Data sets, 78GB CC0 data
DOI:10.5524/100038

Analyses

Open-Pipelines
Open-Workflows
DOI:10.5524/100044

Open-Review
8 reviewers tested data in ftp server & named reports published

Open-Code

Enabled code to be picked apart by bloggers in a wiki
http://homolog.us/wiki/index.php?title=SOAPdenovo2

Code in sourceforge under GPLv3:
>5000 downloads http://soapdenovo2.sourceforge.net/

[Scott Edmunds]
Here is What I Want – The Paper
As Experiment

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

[Phil Bourne]

1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
"A single pass approach to reducing sampling variation, removing errors, and
scaling de novo assembly of shotgun sequences"
http://arxiv.org/abs/1203.4802

born reproducible
http://ged.msu.edu/papers/2012-diginorm/
http://ivory.idyll.org/blog/replication-i.html

[C. Titus Brown]
made reproducible
http://getutopia.com

[Pettifer, Attwood]
The Research Lifecycle
Authoring
Tools
Lab
Notebooks

Data
Capture

Software
Repositories

Analysis
Tools

Scholarly
Communication

Visualization

IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION

Commercial &
Public Tools

Discipline-Based Metadata
Standards

Git-like
Resources
By Discipline

Community Portals

Data Journals

New Reward
Systems

Training
Institutional Repositories
Commercial Repositories

[Phil Bourne]
message #1: lower friction
born reproducible

Process = (Interest / Friction) x Number of people reached

the neylon equation
Cameron Neylon, BOSC 2013, http://cameronneylon.net/
4+1 architecture of reproducibility
“development” view

“logical” view

social scenarios

“process” view

“physical” view
“logical view”

rigour
reporting
reassembly

recognition
review
reuse

resources
responsibility
reskilling
reporting
documentation

availability
observations
• the strict letter of the law
• (methods) modeller/ workflow makers vs (data)
experimentalists
• young researchers, support from PIs
• buddy reproducibility testing, curation help
• just enough just in time
• staff leaving and project ends
• public scrutiny, competition
• decaying local systems
• long term safe haven commitment
• funder commitment from the start
(Lusch, Vargo 2008)
(Harris and Miller 2011)

(Nowak 2006)
(Clutton-Brock 2009)
(Tenopir et al 2011)
(Borgman 2012)

(Malone 2010)
(Benkler 2011)

[Kristian Garza]

(Thomson, Perry, and Miller 2009)
(Wood and Gray 1991)
(Roberts and Bradley 1991)
(Shrum and Chompalov 2007)
scientific ego-system
trust, reciprocity, collaboration to compete
blame
scooped
uncredited
misinterpretation
scrutiny
cost
loss
distraction
left behind
Merton’s four norms of scientific behaviour (1942)
dependency

fame
competitive
advantage
productivity
credit
adoption
kudos
for love

Fröhlich’s principles of scientific communication (1998)

Malone, Laubacher & Dellarocas The Collective Intelligence Genome, Sloan Management Review,(2010)
local asset economies
economics of scarce prized
commodities

• local investment
– protective

• collective purchasing
trade

– share

• sole provider
– broadcast

[Nielson] [Roffel]

(Lusch, Vargo 2008)
(Harris and Miller 2011)
asymmetrical reciprocity
• hugging
• flirting
• voyeurism
• inertia
• sharing creep
• credit drift
• local control
• code throwaway

family
friends
acquaintances strangers
rivals
ex-friends

Tenopir et al. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6) 2011
Borgman The conundrum of sharing research data, JASIST 2012
10 January 2013 | Vol 493 | Nature | 159

recognition

“all research products and all scholarly labour
are equally valued except by promotion and
review committees”
message #2
visible reciprocity contract

citation is like ♥ not $
large data providers
infrastructure codes
“click and run”
instrument platforms
make credit count
Rung, Brazma Reuse of public genome-wide gene expression data Nature Reviews Genetics 2012
Duck et al bioNerDS: exploring bioinformatics' database and software use through literature mining.
BMC Bioinformatics. 2013
Piwowar et al Sharing Detailed Research Data Is Associated with Increased Citation Rate PLoS
ONE 2007
Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/,
Workshop: Reproducible Research: Tools and Strategies for Scientific Computing
Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)
in perpetuity
“it’s not ready yet”, “I need another publication”
shame
“it’s too ugly”, “I didn’t work out the details”
effort
“we don’t have the skills/resources”, “the reviewers
don’t need it”
loss
“the student left”, “we can’t find it”
insecurity
“you wouldn’t understand it”, “I made it so no one
could understand it”.
Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway), April
2013 SIAM News
the goldilocks paradox
“the description needed
to make an experiment
reproducible is too much
for the author and too
little for the reader”
just enough just in time
Galaxy Luminosity Profiling
José Enrique Ruiz (IAA-CSIC)
http://www.rightfield.org.uk

1. Enrich Spreadsheet Template

reducing the
friction of
curation

2. Use in Excel or OpenOffice

3. Extract and Process

RDF Graph

:
anonymous reuse is hard
nearly always negotiated
reskilling: software making practices
Zeeya Merali , Nature 467, 775-777 (2010) | doi:10.1038/467775a
Computational science: ...Error…why scientific programming does not compute.

“As a general rule,
researchers do not
test or document
their programs
rigorously, and they
rarely release their
codes, making it
almost impossible
to reproduce and
verify published
results generated
by scientific
software”
http://matt.might.net/articles/crapl/

http://sciencecodemanifesto.org/
Greg
Wilson

better
software

C Titus Brown

better
research

data
carpentry
a word on reinventing
Sean Eddy

author HMMER and Infernal software
suites for sequence analysis

innovation is algorithms and methodology.
rediscovery of profile stochastic context-free grammars
(re)coding is reproducing.
reinvent what is innovative.
reuse what is utility.

Goble, seven deadly sins of bioinformatics, 35.5K views
http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics
message #3
placing value on reproducibility
take action
Organisation

Culture

Execution

Metrics

Process
[Daron Green]
(re)assembly
Gather the bits together
Find and get the bits
Bits broken/changed/lost
Have other bits
Understand the bits and
how to put together
Bits won’t work together
What bit is critical?
Can I use a different
tool?
Can’t operate the tool
Whose job is this?
specialist codes

gateways

libraries, platforms, tools

data collections
catalogues
commodity
platforms

my data
my process
my codes

integrative
frameworks

service based

software
repositories
(cloud)
hosted
services
Diff

Orig

repeat
(re-run)

replicate reproduce
(regenerate)

(recreate)

reuse
(repurpose/extend)

Actors
Results
Experiment
Materials

(datasets,
parameters, seeds)

Methods
(techniques, algorithms,
spec of the steps)

Setup
Instruments

(codes, services, scripts,
underlying libraries)

Laboratory
(sw and hw infrastructure,
systems software,
integrative platforms)
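
The layers above (materials, instruments, laboratory) suggest what a run would need to record in order to be repeated later. The Python sketch below writes such a record next to a toy analysis. Everything in it is an assumption made for illustration: the manifest field names, the helper names `make_manifest` and `run_experiment`, and the toy analysis itself. It is not a format used by any tool named in the talk.

```python
import hashlib
import json
import platform
import random


def make_manifest(dataset: bytes, parameters: dict, seed: int) -> dict:
    """Describe one run using the slide's layers (field names are hypothetical)."""
    return {
        "materials": {
            "dataset_sha256": hashlib.sha256(dataset).hexdigest(),
            "parameters": parameters,
            "seed": seed,
        },
        "instruments": {"python_version": platform.python_version()},
        "laboratory": {"platform": platform.platform()},
    }


def run_experiment(dataset: bytes, parameters: dict, seed: int) -> float:
    """Toy stand-in for an analysis: mean of a seeded random sample of the bytes."""
    rng = random.Random(seed)  # private generator, so the seed alone fixes the sample
    sample = [rng.choice(dataset) for _ in range(parameters["draws"])]
    return sum(sample) / len(sample)


dataset = bytes(range(100))
parameters = {"draws": 10}
manifest = make_manifest(dataset, parameters, seed=42)
result = run_experiment(dataset, parameters, seed=42)
print(json.dumps(manifest, indent=2))
```

Recording the seed and a checksum of the dataset is what lets a later run claim it used the same materials. The interpreter and platform entries only describe the instruments and laboratory; they do not preserve them.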

snapshot spectrum
materials

use workflows
capture the steps

method

instruments and laboratory

standardised pipelines
auto record of
experiment and set-up
report & variant reuse
buffered infrastructure

BioSTIF

interactive
local & 3rd party independent resources
shielded heterogeneous infrastructures
use provenance
the link between computation and results
static verifiable record
track changes
repair
partially repeat/reproduce
carry citation
calc data quality/trust
select data to keep/release
compare diffs/discrepancies

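[Figure: two provenance traces, (i) Trace A and (ii) Trace B, built from steps S0, S1, S2/S'2 and S4 over data d1/d1', d2, w, z, y/y' and df/df'; primed items mark where the traces differ]

W3C PROV standard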

PDIFF: comparing provenance traces to
diagnose divergence across experimental
results [Woodman et al, 2011]
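
Comparing two recorded runs step by step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the PDIFF algorithm and not the W3C PROV data model: the `StepRecord` structure, its fields and the `diff_traces` helper are all hypothetical, and artefacts are identified by made-up checksums.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StepRecord:
    step: str                # step name, e.g. "S2"
    version: str             # version of the code or service that ran
    inputs: Dict[str, str]   # artefact name -> checksum
    outputs: Dict[str, str]  # artefact name -> checksum


def diff_traces(a: List[StepRecord], b: List[StepRecord]) -> List[str]:
    """List, in trace A's step order, every place where trace B differs."""
    by_step_b = {record.step: record for record in b}
    findings = []
    for rec_a in a:
        rec_b = by_step_b.get(rec_a.step)
        if rec_b is None:
            findings.append(f"{rec_a.step}: missing from trace B")
            continue
        if rec_a.version != rec_b.version:
            findings.append(f"{rec_a.step}: version {rec_a.version} -> {rec_b.version}")
        for name in sorted(set(rec_a.inputs) | set(rec_b.inputs)):
            if rec_a.inputs.get(name) != rec_b.inputs.get(name):
                findings.append(f"{rec_a.step}: input {name} differs")
        for name in sorted(set(rec_a.outputs) | set(rec_b.outputs)):
            if rec_a.outputs.get(name) != rec_b.outputs.get(name):
                findings.append(f"{rec_a.step}: output {name} differs")
    return findings


trace_a = [
    StepRecord("S0", "1.0", {"d1": "aa"}, {"w": "w1"}),
    StepRecord("S2", "1.0", {"w": "w1"}, {"y": "y1"}),
]
trace_b = [
    StepRecord("S0", "1.0", {"d1": "aa"}, {"w": "w1"}),
    StepRecord("S2", "1.1", {"w": "w1"}, {"y": "y2"}),
]
print(diff_traces(trace_a, trace_b))
```

In this toy pair the inputs to S2 match but its version and output do not, which points at the step itself rather than at upstream data as the source of the divergence.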
“an experiment is as transparent
as the visibility of its steps”
black boxes
closed codes &
services, proprietary
licences, magic cloud
services, manual
manipulations, poor
provenance/version
reporting, unknown
peer review, mis-use,
platform calculation
dependencies
Joppa et al SCIENCE 340 May 2013; Morin et al Science 336 2012
dependencies & change
degree of self-contained preservation
open world, distributed, alien hosted
data/software versions and accessibility hamper replication
spin-rate of versions

[Zhao et al. e-Science 2012]

“all you need to do is copy the box that the internet is in”
preservation & distribution
portability / packaging
VM

availability
open

[Adapted Freire, 2013]

gather
dependencies
capture
steps

variability
sameness

description
intelligibility

Reproducibility
framework
packaging bickering
byte execution
virtual machine
black box

description
archived record
white box

data+compute
co-location cloud
packaging
ELIXIR Embassy Cloud

reproduce

repeat

“in-nerd-tia”
big data big compute
community facilities
cloud host costs and confidence
data scales
dump and file
capability
message #4:
“the reproducible window”
all experiments become less reproducible over time
icanhascheezburger.com

how, why and what matters
benchmarks for codes
plan to preserve
repair on demand
description persists
use frameworks

results may vary

partial replication
approximate reproduction
verification

Sandve, Nekrutenko, Taylor, Hovig Ten simple rules for reproducible in silico research,
PLoS Comp Bio submitted
message #5: puppies aren’t free
long term reliability of hosts
multiple stewardship
fragmented
business models
reproducibility service industry

24% NAR services unmaintained after three years Schultheiss et al. (2010) PLoS Comp
the meta-manifesto

• all X should be available and assessable forever
• the copyright of X should be clear
• X should have citable, versioned identifiers
• researchers using X should visibly credit X’s creators
• credit should be assessable and count in all assessments
• X should be curated, available, linked to all necessary materials, and
intelligible

• making X reproducible/open should be from
cradle to grave, continuous, routine, and
easier
• tools/repositories should be made to help, be
maintained and be incorporated into working
practices
• researchers should be able to adapt their
working practices, use resources, and be
trained to reproduce
• cost and responsibility should be transparent,
planned for, accounted and borne collectively
• we all should start small, be imperfect but
take action. Today.
http://www.force11.org
• evolution of a body
• fork, pull, merge
• subpart different cycles,
stewardship, authors
• refactored granularity
• software release
practices for
workflows, scripts,
services, data and
articles
• thread the salami across
parts, repositories and
journals
• chop up and microattribute

research is like
software

Faculty1000

Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
http://www.researchobject.org/

http://www.w3.org/community/rosc/

bundles and relates digital resources of a
scientific experiment or investigation using
standard mechanisms
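
The idea of bundling and relating the resources of an investigation can be illustrated with a small manifest builder. The Python sketch below is an assumption-laden toy, not the Research Object model or any of its serialisations: the helper names `describe_resource` and `build_bundle`, the field names, the relation labels and the file names are all hypothetical.

```python
import hashlib
import json


def describe_resource(name: str, role: str, content: bytes) -> dict:
    """Summarise one digital resource by name, role, size and checksum."""
    return {
        "name": name,
        "role": role,
        "bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def build_bundle(title: str, resources: list, relations: list) -> dict:
    """Bundle resources and the relations between them into one manifest."""
    names = {resource["name"] for resource in resources}
    for source, _, target in relations:
        # every relation must point at resources that are actually in the bundle
        if source not in names or target not in names:
            raise ValueError(f"relation refers to unknown resource: {source} -> {target}")
    return {
        "title": title,
        "resources": resources,
        "relations": [
            {"from": source, "relation": relation, "to": target}
            for source, relation, target in relations
        ],
    }


bundle = build_bundle(
    "toy investigation",
    [
        describe_resource("input.csv", "data", b"a,b\n1,2\n"),
        describe_resource("analyse.py", "script", b"print('hello')\n"),
        describe_resource("figure1.txt", "result", b"3\n"),
    ],
    [("analyse.py", "uses", "input.csv"), ("figure1.txt", "generatedBy", "analyse.py")],
)
print(json.dumps(bundle, indent=2))
```

Refusing relations to resources outside the bundle is a deliberate choice in this sketch: it keeps the manifest self-describing, at the cost of not being able to point at externally hosted material.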
towards a release app store
• checklists for
descriptive
reproducibility
• packaging for multi-hosted research
(executable)
components
• exchange between
tools and researchers
• framework for research
release and threaded
publishing using core
standards

TT43 Lounge 81
those messages again
• lower friction, born reproducible
• credit is like love
• take action, use (workflow) frameworks
• prepare for the reproducible window
• puppies aren’t free
final message

The revolution is not
an apple that falls
when it is ripe. You
have to make it drop.
acknowledgements
• David De Roure
• Tim Clark
• Sean Bechhofer
• Robert Stevens
• Christine Borgman
• Victoria Stodden
• Marco Roos
• Jose Enrique Ruiz del Mazo
• Oscar Corcho
• Ian Cottam
• Steve Pettifer
• Magnus Rattray
• Chris Evelo
• Katy Wolstencroft
• Robin Williams
• Pinar Alper
• C. Titus Brown
• Greg Wilson
• Kristian Garza

• Wf4ever, SysMO, BioVeL, UTOPIA and myGrid teams

• Juliana Freire
• Jill Mesirov
• Simon Cockell
• Paolo Missier
• Paul Watson
• Gerhard Klimeck
• Matthias Obst
• Jun Zhao
• Pinar Alper
• Daniel Garijo
• Yolanda Gil
• James Taylor
• Alex Pico
• Sean Eddy
• Cameron Neylon
• Barend Mons
• Kristina Hettne
• Stian Soiland-Reyes
• Rebecca Lawrence
• Mr Cottam
10th anniversary today!
summary

[Jenny Cham]

https://twitter.com/csmcr/status/361835508994813954
Further Information

• myGrid
– http://www.mygrid.org.uk
• Taverna
– http://www.taverna.org.uk
• myExperiment
– http://www.myexperiment.org
• BioCatalogue
– http://www.biocatalogue.org
• SysMO-SEEK
– http://www.sysmo-db.org
• Rightfield
– http://www.rightfield.org.uk
• UTOPIA Documents
– http://www.getutopia.com
• Wf4ever
– http://www.wf4ever-project.org
• Software Sustainability Institute
– http://www.software.ac.uk
• BioVeL
– http://www.biovel.eu
• Force11
– http://www.force11.org

http://reproducibleresearch.net
http://reproduciblescience.org

More Related Content

What's hot

Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
Carole Goble
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
Paul Groth
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
Carole Goble
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
Duncan Hull
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Carole Goble
 
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
FAIRDOM - FAIR Asset management and sharing experiences in Systems and Synthe...
Carole Goble
 
Bioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big DataBioinformatics in the Era of Open Science and Big Data
Bioinformatics in the Era of Open Science and Big Data
Philip Bourne
 
UKON 2014
UKON 2014UKON 2014
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
Paul Groth
 
Aspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth ScienceAspects of Reproducibility in Earth Science
Aspects of Reproducibility in Earth Science
Raul Palma
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
Carole Goble
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
Carole Goble
 
CSHALS 2013
CSHALS 2013CSHALS 2013
Scott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data PublishingScott Edmunds ISMB talk on Big Data Publishing
Scott Edmunds ISMB talk on Big Data Publishing
GigaScience, BGI Hong Kong
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
Alejandra Gonzalez-Beltran
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
Scott Edmunds
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
Carole Goble
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
Bryan Heidorn
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 

What's hot (20)

Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
The Rhetoric of Research Objects
The Rhetoric of Research ObjectsThe Rhetoric of Research Objects
The Rhetoric of Research Objects
 
The Seven Deadly Sins of Bioinformatics
6. Physiological Disorder of fruits and vegetables.pptx
 
FINAL MATATAG Science CG 2023 Grades 3-10.pdf
FINAL MATATAG Science CG 2023 Grades 3-10.pdfFINAL MATATAG Science CG 2023 Grades 3-10.pdf
FINAL MATATAG Science CG 2023 Grades 3-10.pdf
 

ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do open science and who gets the credit?

  • 1. results may vary reproducibility, open science and all that jazz Professor Carole Goble The University of Manchester, UK carole.goble@manchester.ac.uk @caroleannegoble Keynote ISMB/ECCB 2013 Berlin, Germany, 23 July 2013
  • 2. “knowledge turning” New Insight • life sciences • systems biology • translational medicine • biodiversity • chemistry • heliophysics • astronomy • social science • digital libraries • language analysis [Josh Sommer, Chordoma Foundation] Goble et al Communications in Computer and Information Science 348, 2013
  • 3. automate: workflows, pipeline & service integrative frameworks scientific software engineering CS SE pool, share & collaborate web systems semantics & ontologies machine readable documentation nanopub
  • 4. coordinated execution of services, codes, resources transparent, step-wise methods auto documentation, logging reuse variants
  • 5. http://www.seek4science.org store/organise/link data, models, sops, experiments, publications explore/annotate data, models, sops yellow pages, find peers and experts open and controlled curation & data pooling & credit mgt support catalogue and gateway to local and public resources APIs simulate models governance & policies
  • 7. reproducibility a principle of the scientific method separates scientists from other researchers and normal people http://xkcd.com/242/
  • 8. datasets data collections algorithms configurations tools and apps codes workflows scripts code libraries services, system software infrastructure, compilers hardware “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 Morin et al Shining Light into Black Boxes Science 13 April 2012: 336(6078) 159-160 Ince et al The case for open computer programs, Nature 482, 2012
  • 9. • Workshop Track (WK03) What Bioinformaticians need to know about digital publishing beyond the PDF • Workshop Track (WK02): Bioinformatics Cores Workshop, • ICSB Public Policy Statement on Access to Data
  • 10. hope over experience “an experiment is reproducible until another laboratory tries to repeat it.” Alexander Kohn even computational ones
  • 11. hand-wringing, weeping, wailing, gnashing of teeth. Nature checklist. Science requirements for data and code availability. attacks on authors, editors, reviewers, publishers, funders, and just about everyone. http://www.nature.com/nature/focus/reproducibility/index.html
  • 12. 47/53 “landmark” publications could not be replicated [Begley, Ellis Nature, 483, 2012]
  • 13. Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012) 59% of papers in the 50 highest-IF journals comply with (often weak) data sharing rules. Alsheikh-Ali et al Public Availability of Published Research Data in High-Impact Journals. PLoS ONE 6(9) 2011
  • 14. 170 journals, 2011-2012 Required as condition of publication Required but may not affect decisions Explicitly encouraged may be reviewed and/or hosted Implied No mention Stodden V, Guo P, Ma Z (2013) Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals. PLoS ONE 8(6): e67111. doi:10.1371/journal.pone.0067111
  • 15. replication gap Out of 18 microarray papers, results from 10 could not be reproduced More retractions: >15X increase in last decade At current rate of increase, by 2045 as many papers published as retracted 1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14 2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html 3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
  • 16. “When I use a word,” Humpty Dumpty said in rather a scornful tone, “it means just what I choose it to mean - neither more nor less.” [Lewis Carroll] conceptual replication “show A is true by doing B rather than doing A again” verify but not falsify [Yong, Nature 485, 2012] regenerate the figure replicate rerun repeat re-compute recreate revise regenerate redo restore recycle reuse re-examine reconstruct review repurpose
  • 17. repeat same experiment same lab replicate test same experiment different set up reproduce same experiment different lab different experiment some of same reuse Drummond C Replicability is not Reproducibility: Nor is it Good Science, online Peng RD, Reproducible Research in Computational Science Science 2 Dec 2011: 1226-1227.
  • 18. validation verification assurance meets the needs of a stakeholder e.g. error measurement, documentation complies with a regulation, requirement, specification, or imposed condition e.g. a model science review: articles, algorithms, methods technical review: code, data, systems V. Stodden, “Trust Your Science? Open Your Data and Code!” Amstat News, 1 July 2011
  • 20. disorganisation “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper. Data/software versions. Workflows are maturing and becoming helpful” Phil Bourne Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review. fraud Corbyn, Nature Oct 2012 inherent
  • 21. rigour reporting & experimental design cherry picking data misapplication use of black box software* software misconfigurations, random seed reporting non-independent bias, poor positive and negative controls dodgy normalisation, arbitrary cut-offs, premature data triage un-validated materials, improper statistical analysis, poor statistical power, stop when “get to the right answer” *8% validation Joppa, et al, Troubling Trends in Scientific Software Use SCIENCE 340 May 2013
  • 23. • anyone anything anytime • publication access, data, models, source codes, resources, transparent methods, standards, formats, identifiers, apis, licenses, education, policies • “accessible, intelligible, assessable, reusable” http://royalsociety.org/policy/projects/science-public-enterprise/report/
  • 24. G8 open data charter http://opensource.com/government/13/7/open-data-charter-g8
  • 25. regulation of science institution cores public services libraries republic of science* *Merton’s four norms of scientific behaviour (1942)
  • 26. a meta-manifesto (I) • all X should be available and assessable forever • the copyright of X should be clear • X should have citable, versioned identifiers • researchers using X should visibly credit X’s creators • credit should be assessable and count in all assessments • X should be curated, available, linked to all necessary materials, and intelligible What’s the real issue?
  • 27. we do pretty well • major public data repositories • multiple declarations for depositing data • thriving open source community • plethora of data standardisation efforts • core facilities • heroic data campaigns • international and national bioinformatics coordination • diy biology movement • great stories: Shiga-Toxin strain of E. coli, Hamburg, May 2011, China BGI Open data crowd sourcing effort. • Oh, wait…University of Münster/University of Göttingen squabble http://www.nature.com/news/2011/110721/full/news.2011.430.html
  • 28. hard: patient data (inter)national complications bleeding heart paternalism defensive research informed consent fortresses [John Wilbanks] http://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf Kotz, J. SciBX 5(25) 2012
  • 29. massive centralisation – clouds, curated core facilities long tail massive decentralisation – investigator held datasets fragmentation & fragility a data scarcity at point of delivery RIP data quality/trust/utility Acta Crystallographica section B or C data/code as first class citizen
  • 30. we are not bad people we make progress there was never a golden age there never is
  • 31. a reproducibility paradox big, fast, complicated, multi-step, multi-type multi-field expectations of reproducibility diy publishing greater access
  • 32. pretty stories shiny results feedback loop announce a result, convince us it’s correct novel, attention grabbing neat, only positive review: the direction of science, the next paper, how I would do it. reject papers purely based on public data obfuscate to avoid scrutiny PLoS and F1000 counter
  • 33. the scientific sweatshop no resources, time, accountability getting it published not getting it right game changing benefit to justify disruption
  • 34. citation distortion Micropublications arxive reference Clark et al Micropublications 2013 arXiv:1305.3506 [Tim Clark] Greenberg How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680. Simkin, Roychowdhury Stochastic modeling of citation slips. Scientometrics 2005, 62(3):367-384.
  • 35. independent replication studies self-correcting science “blue collar • hostility • hard • resource intensive • no funding, time, recognition, place to publish • invisible to science” originators John Quackenbush
  • 36. independent review self-correcting science “blue collar • hostility • hard • resource intensive • no funding, time, recognition, place to publish • invisible to science” originators John Quackenbush
  • 37. what is the point: “no one will want it” “the questions don’t change but the answers do”* • two years time when the paper is written • reviewers want additional work • statistician wants more runs • analysis may need to be repeated • post-doc leaves, student arrives • new data, revised data • updated versions of algorithms/codes quid pro quo citizenship • trickle down theory: more open more use more credit* others might • meta-analysis • novel discovery • other methods * Dan Reed
  • 38. emerging reproducible system ecosystem App Store needed! instrumented desktop tools hosted services packaging and archiving repositories, catalogues online sharing platforms integrated authoring integrative frameworks XworX ReproZip Sweave
  • 41. integrated database and journal http://www.gigasciencejournal.com copy editing computational workflows from 10 scripts + 4 modules + >20 parameters to Galaxy workflows 2-3 months 2-3 weeks made reproducible galaxy.cbiit.cuhk.edu.hk [Peter Li]
  • 42. supporting data reproducibility Open-Paper Linked to DOI Open-Data Data sets 78GB CC0 data Linked to DOI DOI:10.1186/2047-217X-1-18 >11000 accesses DOI:10.5524/100038 Analyses Open-Pipelines Open-Workflows DOI:10.5524/100044 Open-Review 8 reviewers tested data in ftp server & named reports published Open-Code Enabled code to be picked apart by bloggers in wiki http://homolog.us/wiki/index.php?title=SOAPdenovo2 Code in sourceforge under GPLv3: >5000 downloads http://soapdenovo2.sourceforge.net/ [Scott Edmunds]
  • 43. Here is What I Want – The Paper As Experiment 0. Full text of PLoS papers stored in a database 1. A link brings up figures from the paper 2. Clicking the paper figure retrieves data from the PDB which is analyzed 3. A composite view of journal and database content results 4. The composite view has links to pertinent blocks of literature text and back to the PDB [Phil Bourne] 1. User clicks on thumbnail 2. Metadata and a webservices call provide a renderable image that can be annotated 3. Selecting a feature provides a database/literature mashup 4. That leads to new papers PLoS Comp. Biol. 2005 1(3) e34
  • 44. "A single pass approach to reducing sampling variation, removing errors, and scaling de novo assembly of shotgun sequences" http://arxiv.org/abs/1203.4802 born reproducible http://ged.msu.edu/papers/2012-diginorm/ http://ivory.idyll.org/blog/replication-i.html [C. Titus Brown]
  • 47. The Research Lifecycle Authoring Tools Lab Notebooks Data Capture Software Repositories Analysis Tools Scholarly Communication Visualization IDEAS – HYPOTHESES – EXPERIMENTS – DATA - ANALYSIS - COMPREHENSION - DISSEMINATION Commercial & Public Tools Discipline-Based Metadata Standards Git-like Resources By Discipline Community Portals Data Journals New Reward Systems Training Institutional Repositories Commercial Repositories [Phil Bourne]
  • 48. message #1: lower friction born reproducible Process = (Interest / Friction) × Number of people reached (the neylon equation) Cameron Neylon, BOSC 2013, http://cameronneylon.net/
  • 49. 4+1 architecture of reproducibility “development” view “logical” view social scenarios “process” view “physical” view
  • 52. observations • the strict letter of the law • (methods) modeller/ workflow makers vs (data) experimentalists • young researchers, support from PIs • buddy reproducibility testing, curation help • just enough just in time • staff leaving and project ends • public scrutiny, competition • decaying local systems • long term safe haven commitment • funder commitment from the start
  • 53. (Lusch, Vargo 2008) (Harris and Miller 2011) (Nowak 2006) (Clutton-Brock 2009) (Tenopir et al 2011) (Borgman, 2012) (Malone 2010) (Benkler 2011) [Kristian Garza] (Thomson, Perry, and Miller 2009) (Wood and Gray 1991) (Roberts and Bradley 1991) (Shrum and Chompalov 2007)
  • 54. scientific ego-system trust, reciprocity, collaboration to compete blame scooped uncredited misinterpretation scrutiny cost loss distraction left behind Merton’s four norms of scientific behaviour (1942) dependency fame competitive advantage productivity credit adoption kudos for love Fröhlich’s principles of scientific communication (1998) Malone, Laubacher & Dellarocas The Collective Intelligence Genome, Sloan Management Review,(2010)
  • 55. local asset economies economics of scarce prized commodities • local investment – protective • collective purchasing trade – share • sole provider – broadcast [Nielson] [Roffel] (Lusch, Vargo 2008) (Harris and Miller 2011)
  • 56. asymmetrical reciprocity • hugging • flirting • voyeurism • inertia • sharing creep • credit drift • local control • code throwaway family friends acquaintances strangers rivals ex-friends Tenopir, et al. Data Sharing by Scientists: Practices and Perceptions. PLoS ONE 6(6) 2011 Borgman The conundrum of sharing research data, JASIST 2012
  • 57. 10 JANUARY 2013 | VOL 493 | NATURE | 159 recognition “all research products and all scholarly labour are equally valued except by promotion and review committees”
  • 58. message #2 visible reciprocity contract citation is like ♥ not $ large data providers infrastructure codes “click and run” instrument platforms make credit count Rung, Brazma Reuse of public genome-wide gene expression data Nature Reviews Genetics 2012 Duck et al bioNerDS: exploring bioinformatics' database and software use through literature mining. BMC Bioinformatics. 2013 Piwowar et al Sharing Detailed Research Data Is Associated with Increased Citation Rate PLoS ONE 2007
  • 59. Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/, Workshop: Reproducible Research: Tools and Strategies for Scientific Computing Special Issue Reproducible Research Computing in Science and Engineering July/August 2012, 14(4)
  • 60. in perpetuity “it’s not ready yet”, “I need another publication” shame “it’s too ugly”, “I didn’t work out the details” effort “we don’t have the skills/resources”, “the reviewers don’t need it” loss “the student left”, “we can’t find it” insecurity “you wouldn’t understand it”, “I made it so no one could understand it”. Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM News
  • 61. the goldilocks paradox “the description needed to make an experiment reproducible is too much for the author and too little for the reader” just enough just in time Galaxy Luminosity Profiling José Enrique Ruiz (IAA-CSIC)
  • 62. http://www.rightfield.org.uk 1. Enrich Spreadsheet Template reducing the friction of curation 2. Use in Excel or OpenOffice 3. Extract and Process RDF Graph :
  • 63. anonymous reuse is hard nearly always negotiated
  • 64. reskilling: software making practices Zeeya Merali, Nature 467, 775-777 (2010) | doi:10.1038/467775a Computational science: ...Error…why scientific programming does not compute. “As a general rule, researchers do not test or document their programs rigorously, and they rarely release their codes, making it almost impossible to reproduce and verify published results generated by scientific software”
  • 67. a word on reinventing Sean Eddy author HMMER and Infernal software suites for sequence analysis innovation is algorithms and methodology. rediscovery of profile stochastic context-free grammars (re)coding is reproducing. reinvent what is innovative. reuse what is utility. Goble, seven deadly sins of bioinformatics, 35.5K views http://www.slideshare.net/dullhunk/the-seven-deadly-sins-of-bioinformatics
  • 68. message #3 placing value on reproducibility take action Organisation Culture Execution Metrics Process [Daron Green]
  • 69. (re)assembly Gather the bits together Find and get the bits Bits broken/changed/lost Have other bits Understand the bits and how to put together Bits won’t work together What bit is critical? Can I use a different tool? Can’t operate the tool Whose job is this?
  • 70. specialist codes gateways libraries, platforms, tools data collections catalogues commodity platforms my data my process my codes integrative frameworks service based software repositories (cloud) hosted services
  • 71. Diff Orig repeat (re-run) replicate reproduce (regenerate) (recreate) reuse (repurpose/extend) Actors Results Experiment Materials (datasets, parameters, seeds) Methods (techniques, algorithms, spec of the steps) Setup Instruments (codes, services, scripts, underlying libraries) Laboratory (sw and hw infrastructure, systems software, integrative platforms) snapshot spectrum
  • 72. materials use workflows capture the steps method instruments and laboratory standardised pipelines auto record of experiment and set-up report & variant reuse buffered infrastructure BioSTIF interactive local & 3rd party independent resources shielded heterogeneous infrastructures
  • 73. use provenance the link between computation and results static verifiable record track changes repair partially repeat/reproduce carry citation calc data quality/trust select data to keep/release compare diffs/discrepancies [Figure: two provenance traces, (i) Trace A and (ii) Trace B] W3C PROV standard PDIFF: comparing provenance traces to diagnose divergence across experimental results [Woodman et al, 2011]
  • 74. “an experiment is as transparent as the visibility of its steps” black boxes closed codes & services, proprietary licences, magic cloud services, manual manipulations, poor provenance/version reporting, unknown peer review, mis-use, platform calculation dependencies Joppa et al SCIENCE 340 May 2013; Morin et al Science 336 2012
  • 75. dependencies & change degree of self-contained preservation open world, distributed, alien hosted data/software versions and accessibility hamper replication spin-rate of versions [Zhao et al. e-Science 2012] “all you need to do is copy the box that the internet is in”
  • 76. preservation & distribution portability / packaging VM availability open [Adapted Freire, 2013] gather dependencies capture steps variability sameness description intelligibility Reproducibility framework
  • 77. packaging bickering byte execution virtual machine black box description archived record white box data+compute co-location cloud packaging ELIXIR Embassy Cloud reproduce repeat “in-nerd-tia”
  • 78. big data big compute community facilities cloud host costs and confidence data scales dump and file capability
  • 79. message #4: “the reproducible window” all experiments become less reproducible over time icanhascheezburger.com how, why and what matters benchmarks for codes plan to preserve repair on demand description persists use frameworks results may vary partial replication approximate reproduction verification Sandve, Nekrutenko, Taylor, Hovig Ten simple rules for reproducible in silico research, PLoS Comp Bio submitted
  • 80. message #5: puppies aren’t free long term reliability of hosts multiple stewardship fragmented business models reproducibility service industry 24% NAR services unmaintained after three years Schultheiss et al. (2010) PLoS Comp
  • 81. the meta-manifesto • all X should be available and assessable forever • the copyright of X should be clear • X should have citable, versioned identifiers • researchers using X should visibly credit X’s creators • credit should be assessable and count in all assessments • X should be curated, available, linked to all necessary materials, and intelligible • making X reproducible/open should be from cradle to grave, continuous, routine, and easier • tools/repositories should be made to help, be maintained and be incorporated into working practices • researchers should be able to adapt their working practices, use resources, and be trained to reproduce • cost and responsibility should be transparent, planned for, accounted and borne collectively • we all should start small, be imperfect but take action. Today. http://www.force11.org
  • 82. • evolution of a body • fork, pull, merge • subpart different cycles, stewardship, authors • refactored granularity • software release practices for workflows, scripts, services, data and articles • thread the salami across parts, repositories and journals • chop up and microattribute research is like software Faculty1000 Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
  • 83. http://www.researchobject.org/ http://www.w3.org/community/rosc/ bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms
  • 84. towards a release app store • checklists for descriptive reproducibility • packaging for multihosted research (executable) components • exchange between tools and researchers • framework for research release and threaded publishing using core standards TT43 Lounge 81
  • 85. those messages again • lower friction, born reproducible • credit is like love • take action, use (workflow) frameworks • prepare for the reproducible window • puppies aren’t free
  • 86. final message The revolution is not an apple that falls when it is ripe. You have to make it drop.
  • 87. acknowledgements • David De Roure • Tim Clark • Sean Bechhofer • Robert Stevens • Christine Borgman • Victoria Stodden • Marco Roos • Jose Enrique Ruiz del Mazo • Oscar Corcho • Ian Cottam • Steve Pettifer • Magnus Rattray • Chris Evelo • Katy Wolstencroft • Robin Williams • Pinar Alper • C. Titus Brown • Greg Wilson • Kristian Garza • Wf4ever, SysMO, BioVel, UTOPIA and myGrid teams • Juliana Freire • Jill Mesirov • Simon Cockell • Paolo Missier • Paul Watson • Gerhard Klimeck • Matthias Obst • Jun Zhao • Pinar Alper • Daniel Garijo • Yolanda Gil • James Taylor • Alex Pico • Sean Eddy • Cameron Neylon • Barend Mons • Kristina Hettne • Stian Soiland-Reyes • Rebecca Lawrence
  • 90. Further Information • myGrid – http://www.mygrid.org.uk • Taverna – http://www.taverna.org.uk • myExperiment – http://www.myexperiment.org • BioCatalogue – http://www.biocatalogue.org • SysMO-SEEK – http://www.sysmo-db.org • Rightfield – http://www.rightfield.org.uk • UTOPIA Documents – http://www.getutopia.com • Wf4ever – http://www.wf4ever-project.org • Software Sustainability Institute – http://www.software.ac.uk • BioVeL – http://www.biovel.eu • Force11 – http://www.force11.org • http://reproducibleresearch.net • http://reproduciblescience.org

Editor's Notes

  1. {"93":"Added afterwards\n","82":"capacity to do it\ndump and file\nback to negotiated reuse\n","71":"Sean Eddy Howard Hughes Medical Institute's Janelia Farm \nseveral computational tools for sequence analysis \n","60":"Created and shared large, valuable dataset which is highly regarded by peers or\nPublication in J. Big Useful Datasets, impact factor X \n","49":"So we had better be sure we know why we are doing it and make it easier\nCost benefit inbalance….\n","38":"So if you don’t want to do it for “them” do it for you\n","27":"More open more citations\n100,000 genome\n100 genome\nEncode, ENA etc\nBetter than chemists and social scientists\nhttp://www.nature.com/news/2011/110721/full/news.2011.430.html\nPublished online 21 July 2011 | Nature | doi:10.1038/news.2011.430 \nNews\nE. coli outbreak strain in genome race\nSequence data reveal pathogen's deadly origins.\nMarian Turner \nThe collaborative atmosphere that surrounded the public release of genome sequences in the early weeks of this year's European Escherichia coli outbreak has turned into a race for peer-reviewed publication.\nA paper published in PLoS One today, by Dag Harmsen from the University of Münster, Germany, and his colleagues, contains the first comparative analysis of the sequence of this year's E. coli outbreak strain (called LB226692 in the publication) and a German isolate from 2001 (called 01-09591), which was held by the E. coli reference laboratory at the University of Münster, headed by Helge Karch. The scientists also compared the two strains with the publicly available genome of strain 55989, isolated in central Africa in the 1990s.\nThe LB226692 and 01-09591 genomes were sequenced using an Ion Torrent PGM sequencer from Life Technologies of Carlsbad, California (see 'Chip chips away at the cost of a genome'). The authors say that their publication is the first example of next-generation, whole-genome sequencing being used for real-time outbreak analysis. 
"This represents the birth of a new discipline — prospective genomics epidemiology," says Harmsen. He predicts that this method will rapidly become routine public-health practice for outbreak surveillance.\nBut Harmsen's group was pipped to the publishing post by Rolf Daniel and his colleagues at the University of Göttingen in Germany, who published a comparison of the sequence of two isolates from the outbreak with the 55989 strain in Archives of Microbiology on 28 June. Harmsen says that this competition is why his group did not release the 2001 strain sequence before today's PLoS One publication.\nBoth groups say that their genomic sequencing and analysis were conducted independently. But their findings don't really differ from sequence analyses that other scientists were simultaneously documenting in the public domain, following the release, on 2 June, by China's BGI (formerly known as the Beijing Genomics Institute) of a full genome sequence of the outbreak strain — also generated using Ion Torrent sequencing. These scientists say that there is very little information in either publication that was not previously available on their website. "The crowd-sourcing efforts arrived at almost all of the scientific conclusions about the strain comparisons first," says Mark Pallen from the University of Birmingham, UK, "so we're surprised and disappointed that these findings are not referred to in these papers."\nEveryone agrees that the Münster laboratory released information on defining genetic features of the 2011 outbreak strain that allowed accurate patient diagnosis and strain tracking as soon as they had the information. The current squabbling revolves around genomic details that point to how the unusual strain evolved.\nSo what have the combined analyses revealed so far? All of the strains have a similar enteroaggregative E. 
coli (EAEC) genetic background, but the 2011 outbreak strain contains plasmid- and chromosome-encoded genes that differ both from the 2001 German and from the earlier African strain. The 2011 and 2001 strains, but not the African strain, carry the important stx gene for Shiga-toxin production — the cause of so many people's sickness — although the African strain carries an intact stx integration site, suggesting it may have evolved from a strain that did once carry it. The African strain also does not contain a tellurite-resistance gene that the other two strains do. The 2011 and 2001 also have different genes for fimbriae — the cell protrusions that make EAEC bacteria particularly sticky.\nThe authors of the paper in PLoS One hypothesize that the strains all derive from a common Shiga-toxin producing EAEC progenitor. They say the genetic steps between the three strains are suggestive of a 'common ancestor model'. It is evolutionarily more likely that bacteria lose genetic elements than gain them, and Harmsen cites the large ter genetic island as an example of a genetic element more likely to have been lost from a common progenitor than gained by subsequently appearing strains.\n"All of these analyses are an example of bacterial evolution being in constant flux," says Pallen, reiterating that this outbreak highlighted the importance of establishing more flexible diagnostic frameworks for E. coli strains.\nHarmsen says he expects at least two further papers on analysis of the genome sequences to be published by independent groups in the next few weeks. \n","16":"The letter or the spirit of the experiment\nindirect and direct reproducibility\nReproduce the same affect? Or same result?\nConcept drift towards bottom.\nAs an old ontologist I wanted an ontology or a framework or some sort of property based classification. 
\n","88":"ENCODE threads\n","77":"Simplify\nTrack\nVersions and retractions\nError propagation\nContributions and credits\nFix\nWorkflow repair, alternate component discovery, Black box annotation\nRerun and Replay\nPartial reproducibility: Replay some of the workflow\nA verifiable, reviewable trace in people terms\nAnalyse \nCalculate data quality & trust, \nDecide what data to keep or release\nCompare to find differences and discrepancies\nS. Woodman, H. Hiden, P. Watson, P. Missier Achieving Reproducibility by Combining Provenance with Service and Workflow Versioning. In: The 6th Workshop on Workflows in Support of Large-Scale Science. 2011, Seattle\n","55":"Of the SEEK project\n","33":"Less true of specialist journals and things that aren’t Nature.\nSloan Digital Sky Survey generated more papers by independents based on its data than the data collectors\nhttp://www.scribedevil.com/dedicated-digital-research-and-development-budget/\n“When we review papers we’re often making authors prove that their findings are novel or interesting. We’re not often making them prove that their findings are true”. Joseph Simmons, in Nature 485, Bad Copy\nPeer review references.\n","83":"Partial – over proprietary steps or difficult-to-reproduce subparts, or through examining the log\nThe law of decline\nall set-ups need to refresh or they stagnate\nthe world changes, \nnew results not the same ones.\nhow long is enough?\nThe lab is not fixed\nPredictive models\nUpdated resources\nNew versions\nDeal with uncertainty\nReproducibility is not fossilization.\nStability – people want to use the same set-up at the end of their project as they did at the beginning. 
Same for paper reviews\nWhen does that matter and when doesn’t it?\nA change of an API won’t matter to the result but it will to the workflow machinery.\nA change of an algorithm won’t impact the workflow but will impact what the experiment means.\nSame API?\nSame code?\nSame version of code?\nSame dataset?\nSame version of dataset?\nSame method?\n“the questions don’t change but the answers do” [Dan Reed]\n","72":"“faster/easier/intelligible/rewarded to reinvent”\nSociologically:\nAn end to “build it and they will come”\nAlternative metrics accepted by the community\nAlternative reward systems that recognize the realities of today’s scholarship, namely:\nOpen data availability\nSoftware availability\nCollaborative research\n","61":"“faster/easier/intelligible/rewarded to reinvent”\nWe have seen the enemy and he is us.\nBMC Bioinformatics. 2013 Jun 15;14:194. doi: 10.1186/1471-2105-14-194.\nbioNerDS: exploring bioinformatics' database and software use through literature mining.\nDuck G, Nenadic G, Brass A, Robertson DL, Stevens R.\n1. there's a lot of stuff out there and the world is quite dynamic in some respects 2. the top ten are interesting in themselves 3. it appears that a lot of tools/db have little reported use (note "reported") 4. I would tentatively say "bio types are more conservative".... 
the paper just reports on a survey of Genome Biology and BMC Bioinformatics; Geraint has figures for a survey of all of PMC 2013 version (1/2 million articles)\nCredit is like love\nA reproducibility contract needs to be backed by a reciprocity contract\n","39":"The HOW \nBolt on, Built in\nhttp://www.zenodo.org/\nhttps://olivearchive.org\nmake reproducible -> born reproducible\nComputational experiments have three advantages: explicit, auto-tracking, portability\n","28":"ELIXIR http://www.out-law.com/en/articles/2013/june/international-biomedical-research-data-sharing-standards-to-be-created/\nhttp://www.broadinstitute.org/files/news/pdfs/GAWhitePaperJune3.pdf\nhttp://blog.ted.com/2012/06/29/unreasonable-people-unite-john-wilbanks-at-tedglobal-2012/\nhttp://www.nature.com/scibx/journal/v5/n25/fig_tab/scibx.2012.644_F1.html\n","6":"social worker\n","89":"Standards are the key to reproducibility\nMost of the time people don’t care\n","78":"Science 13 April 2012: 336(6078) 159-160 \n","67":"Access to expertise\ncompetency, capacity, resources\nauthors to make in silico experiments reproducible \nreviewers and readers to be able to reproduce or reuse them\n","45":"Galaxy pages (30K users, 1K new users/month)\n","34":"I bought the rights to this image\n","23":"“if it isn’t open it isn’t science” Mike Ashburner\n","12":"Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124. doi:10.1371/journal.pmed.0020124 \nhttp://www.reuters.com/article/2012/03/28/us-science-cancer-idUSBRE82R12P20120328\n","1":"reproducibility \nreuse and reinvention\nopenness\nhow we undertake and reward \ncomputational science\nHow could we evaluate research and researchers? Reproducibility underpins the scientific method: at least in principle if not practice. The willing exchange of results and the transparent conduct of research can only be expected up to a point in a competitive environment. 
Contributions to science are acknowledged, but not if the credit is for data curation or software. From a bioinformatics view point, how far could our results be reproducible before the pain is just too high? Is open science a dangerous, utopian vision or a legitimate, feasible expectation? How do we move bioinformatics from one where results are post-hoc "made reproducible", to pre-hoc "born reproducible"? And why, in our computational information age, do we communicate results through fragmented, fixed documents rather than cohesive, versioned releases? I will explore these questions drawing on 20 years of experience in both the development of technical infrastructure for Life Science and the social infrastructure in which Life Science operates. \n","84":"research costs\nSelf-promotion\nI can publish every new monolithic thing and I can’t publish if I reuse someone else’s thing.\nNovelty vs Standards\nStandards are boring “blue collar” science (Quackenbush)\nResearch vs Production Confusion\nHow do you get funding for production software other than claiming to be researching stuff? 
\nHow do you get a publication out of a bit of research software without claiming a potential user-base?\nI don’t want to be a long-term service provider!\nlifeboats, \npuppies and the\nrepublic of openness\nHow long is forever?\nsmall short funding cycles\n58% built by students, 24% unmaintained after three years*\nlarge\nsustain through reinvention\nfunding policies and business Models\nsustainability of suppliers & hosts\ncost of sustaining your home mades\npreparing for reproducibility is not a negligible cost.\ntransparently account for the cost\ndrive down & spread the cost\nResearch Management and Scholarship Vendors and service providers\nThe workflow fixer collective!\nMendeley, FigShare … \nResearch Management Systems (PURE, Symplectic)\nLab management systems (Labguru)\nLibraries, communities….\n","73":"Documentation to reassemble\ngovernance\nAnatomy of an experiment\nSubtly sometimes but it’s pretty important when it comes to tractability, intent and practicality.\nVMs enable you to reproduce a lab\nBy making the experiment portable. That’s the point of portability. It’s the instrument aspect. \nAeronautical engineering codes (proprietary) – multiple codes in business, without being inspected.\nDoesn’t help if running over a supercomputer or the cloud\n","51":"The only equation I have in the talk.\n","40":"Sharecropping.\nWhy research objects are external.\n","7":"I\n","79":"A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Virtual machines are separated into two major classifications, based on their use and degree of correspondence to any real machine: System \nZhao, Gomez-Perez, Belhajjame, Klyne, Garcia-Cuesta, Garrido, Hettne, Roos, De Roure and Goble. 
Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science 2012\n“Reproducibility success is proportional to the number of dependent components and your control over them”\nMany reasons why. \nChange / Availability\nUpdates to public datasets, changes to services / codes\nAvailability/Access to components / execution environment\nPlatform differences on simulations, code ports\nVolatile third-party resources (50%): Not available, available but inaccessible, changed\nPrevent, Detect, Repair\n","68":"Time. Money. No papers (skipped in talk)\n","57":"Barriers\nPerceived norms. Me, you, them.\nTemporal construal. Have values of getting it right – concrete overcomes abstract goal\nMotivated reasoning\nMinimal accountability\nI am busy – can’t append things to the workflow. Must integrate into the workflow. The benefit must be game changing to justify disruption. \nWork with existing incentives and nudge them\nTop down – fast but narrow \nBottom up – slow but comprehensive\nLeverage norms\n“with transparency comes accountability” Mark Borkum\n","35":"Citation Distortion\nGreenberg’s [3, 4] analysis of the distortion and fabrication of claims in the biomedical literature demonstrates why citable Claims are necessary. In his analysis, it is straightforward to see how citation distortions may contribute to non-reproducible results in a pharmaceutical context, as reported by [8]. \n3. Greenberg SA: How citation distortions create unfounded authority: analysis of a citation network. British Medical Journal 2009, 339:b2680.\n4. Greenberg SA: Understanding belief using citation networks. Journal of Evaluation in Clinical Practice 2011, 17(2):389-393.\n8. Begley CG, Ellis LM: Drug development: Raise standards for preclinical cancer research. 
Nature 2012, 483(7391):531-533.\nSimkin and Roychowdhury showed that, in the sample of publications they studied, a majority of scientific citations were merely copied from the reference lists in other publications [31, 32]. The increasing interest in direct data citation of datasets, deposited in robust repositories, is another result of this growing concern with the evidence behind assertions in the literature [33]. \nSimkin MV, Roychowdhury VP: Stochastic modeling of citation slips. Scientometrics 2005, 62(3):367-384.\n32.Simkin MV, Roychowdhury VP: A mathematical theory of citing. Journal of the American Society for Information Science and Technology 2007, 58(11):1661-1673.\n33.Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JP: Public availability of published research data in high-impact journals. PLoS ONE 2011, 6(9):e24357.\n","2":"When I was 17 years old I took a career capability test \nFailed computer science, recommended social work\nKnowledge Discovery, Knowledge Engineering and Knowledge Management \nCommunications in Computer and Information Science Volume 348, 2013, pp 3-25 \nAccelerating Scientists’ Knowledge Turns\nCarole Goble, David De Roure, Sean Bechhofer \nAbstract\nA “knowledge turn” is a cycle of a process by a professional, including the learning generated by the experience, deriving more good and leading to advance. The majority of scientific advances in the public domain result from collective efforts that depend on rapid exchange and effective reuse of results. We have powerful computational instruments, such as scientific workflows, coupled with widespread online information dissemination to accelerate knowledge cycles. However, turns between researchers continue to lag. In particular method obfuscation obstructs reproducibility. 
The exchange of “Research Objects” rather than articles proposes a technical solution; however the obstacles are mainly social ones that require the scientific community to rethink its current value systems for scholarship, data, methods and software.\n","85":"spreading the cost\ncradle to grave reproducibility\ntools, processes, standards\ncombine making & reporting\njust enough, imperfect\ncost in\ntrain up and support\nplanning\nWe cannot sacrifice the youth\nProtect them….a new generation\nEcosystem of support tools navigation\n","74":"the reproducibility ecosystem\nFor peer and author\ncomplicated and scattered - super fragmentation – supplementary materials, multi-hosted, multi-stewarded. \nwe must use the right platforms for the right tools \nThe trials and tribulations of review\nIt’s Complicated\nwww.biostars.org/\nApache\nService based Science\nScience as a Service\n","63":"Randall J. LeVeque, Top Ten Reasons To Not Share Your Code (and why you should anyway) April 2013 SIAM News\nToo ugly to show anyone else. \nI didn't work out all the details. \nMy ex-student wrote the code \nMy competitors would be unfair. \nIt’s valuable intellectual property. \nIt would make papers longer. \nReferees won’t check the code. \nThe code is too sophisticated for you\nMy code invokes other code with unpublished (proprietary) code. \nReaders who have access to my code will want user support.\n","80":"Preservation - Lots of copies keeps stuff safe\nStability dimension\nAdd two more dimensions to our classification of themes\nA virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Virtual machines are separated into two major classifications, based on their use and degree of correspondence to any real machine: System \nOverlap of course\nStatic vs dynamic. 
\nGRANULARITY\nThis model for audit and target of your systems\novercoming data type silos\npublic integrative data sets\ntransparency matters\ncloud\nRecomputation.org \nReproducibility by Execution\nRun It\nReproducibility by Inspection\nRead It\nAvailability – coverage\nGathered: scattered across resources, across the paper and supplementary materials \nAvailability of dependencies: Know and have all necessary elements\nChange management: Data? Services? Methods? Prevent, Detect, Repair.\nExecution and Making Environments: Skills/Infrastructure to run it: Portability and the Execution Platform (which can be people…), Skills/Infrastructure for authoring and reading\nDescription: Explicit: How, Why, What, Where, Who, When, Comprehensive: Just Enough, Comprehensible: Independent understanding\nDocumentation vs Bits (VMs) reproducibility\nLearn/understand (reproduce and validate, reproduce using different codes) vs Run (reuse, validate, repeat, reproduce under different configs/settings)\n","58":"And economics changes\nLocal Asset Economies\nScarcity of Prized Commodity (e.g. Instrument / Data / Model / Knowledge)\nEquipment, data, method, analysis\nTrade\nReward\nPenalty\nCost\nLove or Money\n","36":"Not true in other disciplines – like physics\nreluctance and invisibility\n","25":"Pressure from top, pressure from below\nsqueeze\n","14":"Added afterwards. \n1. Required as condition of publication, certain exceptions permitted (e.g. preserving confidentiality of human subjects)\n2. Required but may not affect editorial/publication decisions\n3. Explicitly encouraged/addressed; may be reviewed and/or hosted\n4. Implied\n5. No mention\n","3":"In May myExperiment\n * 14,660 page views  * 3,076 unique visitors  * 67% new visitors, 33% returning visitors \n","86":"last model\nthe pdf not sole focus\n","75":"Reproducibility is like pornography – hard to define but you know it when you see it.\nStatic vs Dynamic\nReproduce the method and result. 
Reuse the method (reusing the result is just using the result)\n(techniques,\nalgorithms, spec of the \nsteps – pref. executable\n","64":"preparing for reproducibility is not negligible cost.\ntransparently account for the cost\ndrive down & spread the cost\nWhen necessary\nBewildering range of standards for formats, terminologies and checklists\nThe Economics of Curation\nCurate Incrementally\nEarly and When Worthwhile\nRamps: Automation & Integrated Tools\nCopy editing Methods\n35 different kinds of annotations\n5 Main Workflows, 14 Nested Workflows, 25 Scripts, 11 Configuration files, 10 Software dependencies, 1 Web Service, Dataset: 90 galaxies observed in 3 bands \nDifficult and time consuming\nIntrinsic worth\nPoor Reward economy.\nCapability vs Capacity\nSustaining the commons\n","53":"Red dominated by social\nBlack dominated by technical (?)\n","92":"Special thanks – 10th anniversary today! And yes, that is Mike Bada\n","81":"In-nerd-tia\nThe tendency of people, esp techy people, to get bogged down in over thinking, over engineering and trivia to the point that we can do nothing. Julie McMurray\n","70":"http://software-carpentry.org/\nPrlić A, Procter JB (2012) Ten Simple Rules for the Open Development of Scientific Software. PLoS Comput Biol 8(12): e1002802. doi:10.1371/journal.pcbi.1002802 Installing and running someone else’s code, understanding it….\nBest Practices for Scientific Computing http://arxiv.org/abs/1210.0530\nWorkshop on Maintainable Software Practices in e-Science – e-Science Conferences \nStodden, Reproducible Research Standard, Intl J Comm Law & Policy, 13 2009 \n","59":"Less likely to have personal sharing less likely to get credit\n","48":"This article by Phil Bourne et al doesn’t have any data sets deposited in repositories, but does include data in tables in the PDF, which are also available in the XML provided by PLoS. Here, Utopia has spotted that there’s a table of data (notice the little blue table icon to the left of the table). 
Clicking on the icon opens a window with a simple ‘spreadsheet’ of the data extracted from the paper, which you can then export in CSV to a proper spreadsheet of your choice. You can also scatter-plot the data to get a quick-and-dirty overview of what’s in the table. \n","37":"Not true in other disciplines – like physics\nreluctance and invisibility\n","4":"myExperiment currently has 9183 members, 335 groups, 2869 workflows, 772 files and 341 packs \n21 different systems.\n","87":"To share your research materials (RO as a social object)\nTo facilitate reproducibility and reuse of methods\nTo be recognized and cited (even for constituent resources)\nTo preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun)\n","76":"platform\nlibraries, plugins\nInfrastructure\ncomponents, services\ninfrastructure\n","65":"JERM spreadsheets for data integration and data exchange\nIncluding data and their metadata \nStandards-conformant\n","43":"Took 6 months in total\nThe SOAPdenovo2 paper uses 10 executables/scripts - see attachment. SOAPdenovo2 itself contains 4 modules and each can be called individually. Complexity in reproducing a result also comes from the number of parameters that can be configured in a tool. For example, the SOAPdenovo2 tool allows you to configure over 20 parameters.\nThe paper was SOAPdenovo2: http://www.gigasciencejournal.com/content/1/1/18 It took about 3 months part-time work as I first had to learn how to use and deploy the Galaxy workflow system here in BGI. This was followed by wrapping of SOAPdenovo2 and its supporting tools as Galaxy tools. A lot of effort was required to understand how the analyses in the paper were implemented as Bash and Perl scripts before the Galaxy workflow could be developed that replicated some of the analyses in the paper. This was due to the fact that most of the executable tools had their own configuration file and then there was a global configuration file on top of this. 
Now that our Galaxy system is deployed and the editor understands how to use it, the re-implementation of the paper's analyses will probably take 2-3 weeks. This included some extra work the editor did which showed how to visualise the genome assembly results of a SOAPdenovo2 process. \n"}