Failures in reproducibility and robustness of scientific findings are explored from statistical, historical, and argumentation theory perspectives. The impact of false positives in the literature is connected to failures in T1 and T2 biomedical translation, and is shown to have a significant impact on the costs of therapeutic development and availability of needed treatments to the public. Technological and social approaches to resolve these issues are presented. "Reproducibility" initiatives are critiqued as unsustainable and non-authoritative; improved requirements and methods for scientific communication of findings including data, methods and material are supported as the best approaches for improved reproducibility.
2. “It has become apparent that an alarming number of
published results cannot be reproduced by other people.
That is what caused John Ioannidis to write his now
famous paper, Why Most Published Research Findings Are
False [1].
That sounds very strong. But in some areas of science it is
probably right.”
- David Colquhoun [2]
1. Ioannidis, J.P.A. (2005) Why Most Published Research Findings Are False, PLoS Med, 2, e124.
2. Colquhoun, D. (2014) An investigation of the false discovery rate and the misinterpretation of p-values, Royal Society Open Science, 1.
3. Outline
• The translation gap
• The false reported discovery rate
• Attrition in pharmaceutical pipelines
• Historical background on reproducibility
• Logical status of scientific articles
• Coping strategies at the ecosystem level
• The global argument graph
• Conclusions & postscript
5. [Figure, labeled T2] Scannell et al. (2012) Nat Rev Drug Discov, 11(3), 191–200.
6. • ~80% to 90% of top-tier academic research is
non-reproducible in pharma target discovery labs.
• All phases of pharma discovery, development,
preclinical and clinical have significant attrition.
• ~90% attrition in clinical trials has huge financial
and social impact, and encourages risk avoidance.
T1
Hay et al. (2014) Nature Biotechnology, 32, 40–51.
Begley and Ellis (2012) Nature, 483, 531-533.
Prinz et al. (2011) Nat Rev Drug Discov, 10, 712.
11. • The Obokata et al. work received extraordinary scrutiny
because of its surprising conclusions.
But what proportion of more “ordinary” papers receive
this type of scrutiny?
It received further scrutiny because, upon examination,
it turned out to involve fraud.
What about non-fraudulent but incorrect papers?
12. • Furthermore…
(1) It seems possible that Obokata’s fraudulent use of
data came from her inability to reproduce the
original Vacanti lab experiments in the RIKEN
environment.
(2) We do not know whether the technique began with
fraud at Harvard, or was simply “reproduced by
fraud” when legitimate reproduction failed at RIKEN.
13. Colquhoun 2014
• “Almost universal failure of biomedical papers to
appreciate what governs the false discovery rate.”
• “If you use p=0.05 to suggest that you have made a
discovery, you will be wrong at least 30% of the time.”
• “If, as is often the case, experiments are underpowered,
you will be wrong most of the time.”
• “To keep your false discovery rate below 5%, you need
to use a three-sigma rule, or to insist on p≤0.001.”
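For reference, the two thresholds in the last quote can be related under a simplifying assumption of ours: a normally distributed test statistic and a two-sided test (Colquhoun's own simulations use t-tests). The sketch uses only Python's standard library; the function names are illustrative.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_sided_p(sigma: float) -> float:
    """Two-sided p-value for |z| >= sigma under a standard normal null."""
    return erfc(sigma / sqrt(2))


def sigma_for_p(p: float) -> float:
    """z threshold whose two-sided tail probability equals p."""
    return NormalDist().inv_cdf(1 - p / 2)


for s in (1.96, 3.0):
    print(f"{s} sigma -> p = {two_sided_p(s):.4f}")
for p in (0.05, 0.001):
    print(f"p = {p} -> {sigma_for_p(p):.2f} sigma")
```

Under these assumptions three sigma corresponds to p ≈ 0.0027, and p = 0.001 to about 3.29 sigma, so the two rules in the quote are close but not identical.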
14. False discovery rate in
diagnostic tests
• For disorder X, a test correctly diagnoses
• 95% of people without X as “false(X)” (specificity =
.95) and
• 80% with X as “true(X)” (sensitivity = .80).
• Prevalence of X in the population = 1%
15. Diagnostic tests (contd.)
Colquhoun 2014, “An investigation of the false discovery rate and the misinterpretation of p-values”, Royal Society Open Science, 1.
False discovery rate: 495/(495+80) ≈ 86% (per 10,000 people tested: 80 true positives, 495 false positives)
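The 86% figure follows from counting outcomes in a population. A minimal sketch, assuming a hypothetical 10,000 people tested (any population size gives the same ratio); `diagnostic_fdr` is an illustrative name, not an established API:

```python
def diagnostic_fdr(population: int, prevalence: float,
                   sensitivity: float, specificity: float) -> float:
    """Fraction of positive test results that are false positives."""
    with_x = population * prevalence            # people who have X
    without_x = population - with_x             # people who do not
    true_pos = with_x * sensitivity             # correctly flagged as true(X)
    false_pos = without_x * (1 - specificity)   # wrongly flagged as true(X)
    return false_pos / (false_pos + true_pos)


# Slide 14 values: 100 with X -> 80 true positives;
# 9,900 without X -> 495 false positives.
fdr = diagnostic_fdr(10_000, 0.01, 0.80, 0.95)
print(f"False discovery rate: {fdr:.0%}")
```

Even a test with 95% specificity produces mostly false alarms when only 1% of those tested have the disorder.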
16. Drug screening
• Assume drug candidates work in 10% of cases.
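• Power = 0.8, significance level = 0.05
• Per 1,000 candidates: 80 true positives (0.8 × 100) and 45 false positives (0.05 × 900)
• False discovery rate = 45/(45+80) = 36%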
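The same expected-count reasoning applies to screening, with the prior fraction of working candidates in place of prevalence, power in place of sensitivity, and the significance level in place of (1 − specificity). This sketch reproduces the slide's 36% and then, for illustration only, varies the prior (the 0.5 and 0.01 values are our assumptions, not the slide's):

```python
def screening_fdr(prior: float, power: float, alpha: float) -> float:
    """Expected fraction of 'significant' hits that are false positives.

    prior: fraction of candidates that truly work
    power: probability that a working candidate reaches p <= alpha
    alpha: significance level (false positive rate for inactive candidates)
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return false_pos / (false_pos + true_pos)


print(f"{screening_fdr(0.10, 0.8, 0.05):.0%}")  # slide 16 values

# Illustrative priors (not from the slide): rarer true effects raise the FDR.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(screening_fdr(prior, 0.8, 0.05), 3))
```

With these inputs the FDR is about 6% when half the candidates work, 36% at 10%, and about 86% at 1%.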
17. “We optimistically estimate the median statistical
power of studies in the neuroscience field to be
between about 8% and about 31%.”
Button et al. 2013 Nature Reviews Neuroscience 14: 365-376
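Substituting the power range quoted by Button et al. into the slide 16 set-up (10% of tested hypotheses true, α = 0.05) shows why underpowered studies are a problem. This is a sketch under those assumptions, re-using the same expected-count formula; it ignores bias and multiple testing:

```python
def fdr(prior: float, power: float, alpha: float = 0.05) -> float:
    """Expected false discovery rate among results with p <= alpha."""
    false_pos = (1 - prior) * alpha
    true_pos = prior * power
    return false_pos / (false_pos + true_pos)


# 8% and 31% are the bounds quoted above; 80% is the slide 16 value.
for power in (0.08, 0.31, 0.80):
    print(f"power {power:.0%}: FDR = {fdr(0.10, power):.0%}")
```

At 8% power roughly 85% of “significant” results would be false; at 31% power, about 59%; at 80% power, 36%.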
19. Pharma attrition &
productivity
attrition = 95.9%
$1.78 billion per new drug
Paul, S.M., et al. (2010) How to improve R&D productivity: the pharmaceutical industry's grand challenge, Nat Rev Drug Discov, 9, 203-214.
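An attrition of 95.9% means that only about 4.1% of candidates entering the pipeline become approved drugs, so roughly 1/0.041 ≈ 24 candidates must enter for each launch. The sketch below does that arithmetic; the per-phase success rates in the second part are hypothetical placeholders, not Paul et al.'s figures, and only illustrate that overall attrition compounds multiplicatively across phases.

```python
from math import prod

attrition = 0.959
success = 1 - attrition
print(f"candidates needed per approved drug: {1 / success:.1f}")

# Hypothetical per-phase success probabilities (NOT from Paul et al. 2010).
phase_success = {
    "preclinical": 0.6,
    "phase I": 0.5,
    "phase II": 0.3,
    "phase III": 0.6,
    "submission": 0.9,
}
overall = prod(phase_success.values())
print(f"overall success {overall:.1%}, attrition {1 - overall:.1%}")
```

Because overall success is a product of per-phase rates, improving the success rate of any single step, such as target selection, raises overall success by the same factor.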
20. Pharma attrition &
productivity
target selection?
21. “Improving the quality of target selection is
the single most important factor to transform
industry productivity and bring innovative
new medicines to patients.”
Bunnage, M.E. (2011) Getting pharmaceutical R&D back on target, Nat Chem Biol, 7, 335-339.
23. Reproducibility
“Virtual witnessing” for those not present,
using the new information technology of
the scientific journal &
the scientific article.
c. 1660: Robert Boyle and colleagues were
concerned with the scientific validity of claims,
e.g. “transformation of lead into gold”…
Scientific facts will now be established
by reproducible demonstration before a
“jury of one’s peers”.
24. adapted from Steven Shapin (1984),
Pump and Circumstance:
Robert Boyle’s Literary Technology,
Social Studies of Science 14(4):481–520
25. BOYLE: “We took a large and lusty frog and having
included him in a small receiver we drew out the air
not very much and left him very much swelled and
able to move his throat from time to time - though
not so fast as when he freely breathed before the
exsuction (extraction) of the air. He continued alive
about two hours that we took notice of, sometimes
removing from one side of the receiver to the other,
but he swelled more than before, and did not
appear by any motion of his throat or thorax (chest)
to exercise respiration. But his head was not very
much swelled, nor his mouth forced open. After he
had remained there somewhat above 3 hours, for it
was not 3 hours and an half, perceiving noe signe of
life in him, we let in the air upon him, at which the
formerly tumid (swelled) body shrunk very much,
but seemed not to have any other change wrought
in it and though we took him out of the receiver yet
in the free air it self, he continued to appear stark
dead nevertheless to see the utmost of the
experiment having caused him to be carried into a
garden and layd upon the grass all night, the next
morning we found him perfectly alive again.” (BP 18,
fol. 127r)
adapted from Carusi 2015, “Virtual Witnessing”, in Future of Research Communications
& eScholarship, Mathematical Institute, Oxford UK, 11-12 January 2015.
26. Logical status of a scientific article
Definition: A scientific article is a
1. defeasible argument for claims, supported by
2. exhibited, reproducible data and methods, and
3. explicit references to other work in the domain;
4. described using domain-agreed technical terminology.
5. It exists in a complex ecosystem of technologies,
people and activities.
29. Efforts to improve the
ecosystem
• Mandatory open access
• Direct data citation & archiving
• Methods cataloging & ID
• Open annotation (W3C OA)
• Micro- & nano-publications μPub
• Reproducibility initiative
30. Joint Declaration of Data Citation Principles
endorsed by over 80 scholarly organizations
32. “Micropublications” may be used to
construct a graph of the discussion
and evidence including challenges.
Clark, Ciccarese & Goble: Micropublications: a Semantic Model of
Claims, Evidence, Argument and Annotation for Biomedical
Communication. Journal of Biomedical Semantics 2014 5:28
(http://www.jbiomedsem.com/content/5/1/28/abstract).
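As an illustration only (this is not the Micropublications ontology or its RDF vocabulary), a graph of claims in which each node may be supported or challenged by others can be sketched in a few lines. The `Node` class, the `supported_by`/`challenged_by` fields, the example statements, and the rule “a node is defeated if any undefeated challenge stands against it” are our assumptions, chosen to echo the defeasible-argument definition on slide 26.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A claim or piece of evidence in a hypothetical argument graph."""
    text: str
    # Support is recorded for completeness; the toy rule below only
    # inspects challenges.
    supported_by: List["Node"] = field(default_factory=list)
    challenged_by: List["Node"] = field(default_factory=list)


def is_defeated(node: Node) -> bool:
    """A node is defeated if any of its challengers is itself undefeated.

    A simple recursive rule for acyclic graphs, not a full
    argumentation semantics.
    """
    return any(not is_defeated(c) for c in node.challenged_by)


claim = Node("Compound A inhibits target T")
claim.supported_by.append(Node("Assay results, n = 3"))
failed_repro = Node("Independent lab: no inhibition observed")
claim.challenged_by.append(failed_repro)
print(is_defeated(claim))  # True: the challenge stands unanswered

rebuttal = Node("Independent lab used a different cell line")
failed_repro.challenged_by.append(rebuttal)
print(is_defeated(claim))  # False: the challenge is itself challenged
```

Representing the failed reproduction as a node that can itself be challenged keeps open the question of which experiment is invalidated.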
41. But is this really such a great idea?
Does failure to reproduce
invalidate the original experiment,
or the reproduction experiment?
42. Transparency vs. Reproducibility
• Both require significant effort to achieve progress, but transparency
is more pragmatic.
• Transparency should naturally lead to more rapid
correction/validation/responsibility.
• Open licenses will facilitate assessment of reproducibility in
transparent content.
• Innovation and standardization needed in filtering and
identification of most reproducible works.
adapted, with thanks, from a talk by Iain Hrynaszkiewicz, Nature Publishing Group,
on “Transparency vs. Reproducibility”, Mathematical Institute, Oxford UK, Jan. 11, 2015
43. Should Scholarly Research Aim for
Reproducibility or Robustness?
Reproducibility: The ability of an entire experiment or study to be
reproduced, ideally according to the same reproducible experimental
description and procedure
Robustness: A characteristic of a phenomenon / finding that can be
detected effectively while the variables of a test system are altered
A robust concept can be observed without failure under a variety of
conditions
A robust finding may be (biologically) more relevant than a merely reproducible one
⇨ Robustness of data may be key
adapted, with thanks, from a talk by Thomas Steckler, Janssen Pharmaceuticals,
on “Reproducibility vs. Robustness”, Mathematical Institute, Oxford UK, Jan. 11, 2015
44. Conclusions
• False reported discovery rate (FRDR) is a systemic
problem in biomedical research and communication.
• FRDR drives up pharmaceutical attrition, cost of
health care; negatively impacts translation T1-T4.
• There are statistical, ethical, informatics and social
components to scientific reproducibility, all of which
need to be addressed.
46. Ernest Rutherford: “All science is either physics
or stamp collecting.”
Paraphrase: Physics is the best and most rigorous
of all scientific enterprises, i.e., the “gold standard”.
47. Historical values of the speed of light
• pre-17th century: ∞ (instantaneous)
• 1638 Galileo: at least 10 times faster than sound
• 1675 Ole Roemer: 200,000 km/s
• 1728 James Bradley: 301,000 km/s
• 1849 Hippolyte Louis Fizeau: 313,300 km/s
• 1862 Léon Foucault: 298,000 km/s
• Today: 299,792.458 km/s
Boyle has in common with NGSP:
• Creating a multiplicity of witnesses. In principle ‘all men …’
• Universal language / display of trustworthiness (diligence and modesty): elaborate sentences, plus a lot of circumstantial detail.
• Matters of fact: this is what could be agreed upon.
• Management of dispute (supported by / refutes).
• Distinction of roles: direct witnessing; potential replicators; virtual witnessing (not aimed at replication / reproduction).