Fallibility in science: Responsible ways to handle mistakes

Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
@deevybee

Thought experiment #1
• PhD student, David, has run a series of studies trying to
find an impact of brain stimulation on language
comprehension in stroke patients
• After three studies with null findings, he has changed
the design in various ways and is overjoyed when the 4th
study gives a significant effect
• The paper is published, with David as first author and
his eminent supervisor as last author, in Nature.
• The university press office features the study and it is
highlighted on the BBC Radio 4 Today programme.
• Two weeks later, when preparing slides for a talk at
Society for Neuroscience, David finds the groups were
miscoded, and in fact the sham treatment group
obtained higher post-training scores

Questions
• What should David do?
• If disclosed, what impact will this have on David’s
career and that of his supervisor?
• If undisclosed, what impact will this have on David’s
career and that of his supervisor?
• Could this mistake have been avoided?

http://prawnsandprobability.blogspot.co.uk/2013/03/rethinking-retractions.html?m=1
• I was now due to give an hour long seminar in ~3 days that focused on some
completely false results.
• The paper I had been writing with Mike and David was now floundering without a
data set, and my contribution had been wiped out
• Worst of all: I had to tell my co-authors on the original paper that our results were
invalid, that we would have to retract the paper and that it was ALL MY FAULT for
not checking the code well enough.
Michael: Hey, are you ready for some news
Richard Mann: bring it
Michael: Dave reckons you only used 1/100th of the data in the .m files you sent
us, rather than 1/2 as it seems you intended
Basically just data from a single trial
Richard Mann: ... ......... um, ok

https://www.statnews.com/2017/06/01/shrimp-study-error/
What did Richard Mann do?
• Confessed to PI in ‘extremely drunk Skype conversation’
• Wrote apologetic letter to retract the paper
• Didn’t sleep much for several months
• Reanalysed the data correctly and published a paper – in the
same journal
His advice:
• Can’t rely on reviewers to catch errors like this
• Sharing code and data is best way to avoid such errors
• We need a system whereby retractions don’t carry stigma

https://whatsinjohnsfreezer.com/2014/05/10/co-rex-ions/
• Studied growth rates in Tyrannosaurus
• Amateur paleontologist Nathan Myhrvold found irregularities
in the data
• On reexamination Hutchinson agreed estimates were ‘not
good enough for firm conclusions’
• Retracted all aspects of growth rates from that paper
• Blogged about his experience
John Hutchinson

“My message … is to get out in front of problems like this, as
an author. Don’t wait for someone else to point it out. If you
find mistakes, correct them ASAP. Especially if they
(1) involve inaccurate data in the paper (in text, figures, tables,
whatever),
(2) would lead others to be unable to reproduce your work in
any way, even if they had all your original methods and data,
or
(3) alter your conclusions.
It is far less excruciating to do it this way then to have
someone else force you to do it, which will almost inevitably
involve more formality, deeper probing, exhaustion and
embarrassment. And there is really no excuse that you don’t
have time to do it.”
https://whatsinjohnsfreezer.com/2014/05/10/co-rex-ions/
John Hutchinson

https://www.statnews.com/2017/05/05/dirt-award-cleaning-scientific-literature/
http://retractionwatch.com/category/by-reason-for-retraction/doing-the-right-thing/

http://retractionwatch.com/2017/03/27/authors-retract-honest-error-say-arent-penalized-result/#more-48973
Interviews with 14 scientists who
retracted papers for honest errors
between 2010 and 2015
• Authors who retract for honest error say
they are not penalised
• Indeed, may get kudos for integrity
• But notes that if authors ask to correct a
paper, journal often decides on retraction
• Important to de-stigmatise retraction.
• Usual focus is on negative examples where
papers retracted for fraud, etc. ECRs need
to hear about retraction for honest error
and realise it is OK

• A doctoral student, Helen, has run a study using
auditory event-related potentials (ERP) to compare
discrimination of certain sounds in people with dyslexia
vs nondyslexic controls
• She has published the data in PLOS One and has
deposited the anonymised raw EEG files on the
Dataverse public repository
• Three years later, a researcher from Iran contacts her to
say that he has reanalysed her EEG files and is unable
to reproduce her results. He has requested her analysis
scripts. He has no publication record and has very poor
English.
• Helen is now working on a different project and is
under intense time pressure to produce publications for
a fellowship proposal. She cannot find the scripts.

Questions
• What should Helen do?
• Would this kind of experience deter you from
making your data open?

http://www.russpoldrack.org/2013/02/anatomy-of-coding-error.html
”None of us likes to admit mistakes, but it's clear that
they happen often, and the only way to learn from them
is to talk about them. This is why I strongly encourage my
students to tell me about their mistakes and discuss
them in our lab meeting.”

Fallibility in science: overview
• Mistakes are everywhere
• They are not career-destroying
• Open science and collaboration can help avoid
errors but they will still occur
• Need to share code as well as data
• Important to talk about mistakes
• Correcting the record is painful and takes time, but
is important for science and for scientists
• If uncorrected, others may try to apply or build on
erroneous work

Replication
• How to respond when:
• Someone else fails to replicate your result
• You can’t replicate someone else’s study

Reasons for replication failure
• Initial result was a false positive
• Results are sensitive to contextual factors
• Lack of expertise (‘flair’) of replicator
• Initial results obtained using questionable research
practices – p-hacking etc
• Researcher used fraudulent practices
If you agree to work with replicators, it demonstrates that you are
genuinely interested in getting to the truth, and not fraudulent or sloppy

• 7 preregistered replication studies: none found predicted effects of power
pose on behavioural or hormonal measures.
• Dana Carney, who was first author on original power pose papers, advised
on design. Has subsequently concluded power pose effect is not real.
http://www.tandfonline.com/doi/full/10.1080/23743603.2017.1309876?src=recsys

Replication
• How to respond when:
• Your own study does not replicate
• You can’t replicate someone else’s study
http://deevybee.blogspot.co.uk/2014/08/replication-and-reputation-whose-career.html

• Simon, a graduate student who works as a
demonstrator, has been trying to replicate a well-
known social priming effect in his undergraduate
lab classes. Over three years, he has not been able
to replicate the main finding.
• He has submitted a paper based on all three
studies to Psychological Science, who published the
initial paper reporting the result, but it is rejected
because of lack of novelty.
• He writes a blogpost about his experiences, casting
doubt on the original finding, but is then accused of
being an incompetent researcher who is using
social media to bully the authors of the original
study

Questions
• Was Simon’s response reasonable?
• What else could he do?
• What should he do now?

Appropriate response to finding problems in
others’ work depends on two things
Was it caused by:
• Honest error
• Questionable research practices
• p-hacking
• Suppressing inconvenient data
• Outright fraud: data manipulation
or invention
Key question:
Does it require:
• Correction
• Discussion
• Retraction
Should you go public with concerns, and if so, how & when?

Anne Weil
…my first prominent publication was a note tearing down
someone else’s work. That work had appeared in a major journal
and caused quite a stir — but the apparent results were the
product of a careless (not dishonest, just careless) mistake in
the analysis.
The note pointing this out was not derogatory in tone, nor was it
intended to shame, but was doubtless embarrassing to the
authors.
Now that I am much older, a little wiser, and a little kinder (and
a lot more employed, and thus less vulnerable to jerks) I would
send the authors my analysis of their math first and give them
the opportunity to correct.
And I hope that my colleagues would give me the same
consideration if (when?) I make a stupid mistake.
Comment on: https://whatsinjohnsfreezer.com/2014/05/10/co-rex-ions/
Honest error

Appropriate response to finding problems in
others’ work depends on nature of problem
Concerns re research
design/analysis/interpretation
Usually due to ignorance rather than
deliberate malpractice:
• e.g. study does not have a crucial
control group
Key question:
Does it require:
• Correction
• Discussion
• Retraction
Usually needs DISCUSSION, but how/where?

Concerns re research design etc
PubMed Commons provides forum for post-publication peer review and
provides a way of starting a discussion
Commentators have to have published in a journal covered by PubMed and
are not anonymous
Comment should focus on the design flaw and its implications, not on the
researchers
PubMed Commons gives opportunity to email author to alert them to your
comment and reply – though a personal message may be more effective

Spiro Pantazatos 2016 Oct 19 01:18 a.m. Mind the distance: spatial proximity
confounds tissue-tissue gene expression correlations reported in this study.
This is a novel and very interesting study. However, the authors do not adequately
control for spatial proximity, which, contrary to the authors’ claims in the original
article, accounts entirely for high within-network strength fraction according to
our recent replication/reanalysis of these same data. Furthermore, “null
networks”, (i.e. contiguous clusters with center coordinates randomly placed
throughout cortex), also have significantly high strength fractions, indicating that
high within-network strength fraction is not related to resting-state networks
identified by fMRI.
Here is a link to the full technical commentary and replication/reanalysis write-up
with additional supplementary discussion:
http://biorxiv.org/content/early/2016/10/04/079202
And here is a link to the replication/reanalysis code on Github:
https://github.com/spiropan/ABA_functional_networks
The lead authors are aware of these findings and concerns (I notified them via
personal email in March, 2016) and they have let me know they plan to respond.
Note: Commentator is (a) highly
specific; (b) provides links to
reanalysis; (c) has raised
concerns with authors

Appropriate response to finding problems in
others’ work depends on nature of problem
• Questionable research practices
• p-hacking
• Suppressing inconvenient data
Key question:
Does it require:
• Correction
• Discussion
• Retraction
Harder to detect; again PubMed Commons can be useful
These have been so common in our discipline
that they can be normative – often
recommended by editors/reviewers!
“Drop those results – they aren’t interesting”

https://www.ncbi.nlm.nih.gov/myncbi/franck.ramus.1/comments/
……..
Similarly, with 12 dyslexic individuals, only huge correlations greater than 0.576 could
be significant. Luckily this study observed a correlation of 0.588 between left V5/MT-
LGN connectivity and RAN (using a one-tailed test and correcting for two tests), but not
with reading comprehension. But what about the other behavioural variables, spelling
and reading speed? Are they not core symptoms of dyslexia, even more so than RAN?
Do they not rely on visual abilities? Were the a priori predictions so specific to RAN and
reading comprehension, that correlations with spelling and reading speed were not
even tested? If those predictions had been preregistered, this might be credible.
Alternatively, were those correlations tested, but not taken into account in the
correction for multiple tests? (not even mentioning correlations within the control
group, or across the two groups)
Section from comment by Franck Ramus on
Draws attention to probable p-hacking but avoids personal attack on authors
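The 0.576 threshold quoted in the comment above can be checked with a short calculation. The following is a minimal sketch, not the commenter's own method: it assumes a Pearson correlation, bivariate-normal data, and that "one-tailed, corrected for two tests" amounts to a one-tailed alpha of .025, which gives the same cut-off as a two-tailed test at .05. The function names `two_tailed_p` and `critical_r` are illustrative only.

```python
import math


def two_tailed_p(r: float, n: int, steps: int = 2000) -> float:
    """Two-tailed p-value for a Pearson r from n pairs under the null of zero correlation.

    Uses the exact null density of r, proportional to (1 - r^2)^((n - 4) / 2),
    integrated numerically with Simpson's rule (steps must be even).
    """
    if n < 4:
        raise ValueError("this sketch needs at least 4 pairs")
    k = (n - 4) / 2
    # Normalising constant: the Beta function B(1/2, (n - 2) / 2).
    norm = math.gamma(0.5) * math.gamma((n - 2) / 2) / math.gamma((n - 1) / 2)

    def density(x: float) -> float:
        return max(0.0, 1.0 - x * x) ** k

    a = abs(r)
    h = (1.0 - a) / steps
    total = density(a) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(a + i * h)
    upper_tail = total * h / 3
    return 2 * upper_tail / norm


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that reaches two-tailed significance at alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, n) > alpha:
            lo = mid  # not yet significant, so the threshold is higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    print(round(critical_r(12), 3))
    print(round(two_tailed_p(0.588, 12), 3))
```

With 12 participants the threshold comes out at about 0.576, so a reported correlation of 0.588 clears it only narrowly. Each extra untested or unreported variable would tighten the corrected threshold further, which is the point the comment is making.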

More serious problems can be tackled via PubMed Commons
See comment in PubMed Commons below
Clin Sci (Lond). 2008 Feb;114(3):221-30.
Normal-sodium diet compared with low-sodium diet in compensated
congestive heart failure: is sodium an old enemy or a new friend?
David Nunan 2017 May 31 11:16 a.m.
Readers may not be aware of concerns with duplicate data in this paper
and another paper (Parrinello G, 2009) by the same group published in the
Journal of Cardiac Failure in 2009. Both these papers were also included in
a 2012 systematic review published in BMJ Open Heart which was
subsequently retracted. A notice of concern was raised with the Journal of
Cardiac Failure paper. No such notice has been made for this paper and
neither individual papers have been retracted.
https://www.ncbi.nlm.nih.gov/myncbi/david.nunan.1/comments/

• A postdoctoral fellow, Susan, is conducting a meta-
analysis of studies on autistic behaviours in mice
with a particular genetic modification
• She finds suspicious similarities between results in
three papers by one research group, even though
they are described as involving different animals
• She emails the senior author to ask whether they
were the same animals in the three studies but gets
no reply

Questions
• What should Susan do?

Response to suspicion of fraud
• Check your facts and then check again
• Look for a pattern: a single dodgy result is never enough
• Discuss with author
• Discuss with journal
• Seek support from senior colleagues you trust
• N.B. Direct confrontation: important, but not for the
inexperienced or faint-hearted
Good advice here:
• Simonsohn (2013) Just post it: The lesson from two cases of
fabricated data detected by statistics alone. Psychological
Science, 24(10) 1875–1888

https://medium.com/@jamesheathers/the-buck-stops-nowhere-8284a57c88c9
”Criticism isn’t measured; in fact, it is not even considered ‘service’, a catch-all
term for unpaid yet necessary sideline tasks to academic life. It is not
considered at all.
An additional perspective is also instructive. Imagine reading the following:
• “responsible for three corrections and two retractions of terrible work
which wasted hundreds of thousands of $ / thousands of work hours”
• “hounded Journal XYZ into upholding their stated publication standards”
• “author of at least thirty angry letters to editors, resulting in etc. etc.”
Of course, it isn’t exactly easy to measure, but that is not the point here — the
point is that the above is simply unthinkable for someone inside the academic
tent. These sound like the career achievements of a curmudgeon, a thug or a
crank. Even to me, these points, this reads as the brag sheet of a five-year-old
boy who is proud of how many blocks he can kick over, wantonly destructive
and oddly specific.”
James Heathers, 2017
Tackling bad science takes up a lot of time and emotional energy
– for little reward

Need to change the incentives!
• Funders already alerted to this and working to
reward reproducible science rather than sexy
science
• ‘Bullied into Bad Science’ campaign – formed by
group of early-career researchers who were fed up
with being pressured to publish in Science, Nature
etc. – see @LoganCorina
• Need more institutional change: hiring policies to
value reproducible science

Overview: How to approach errors in others’ work
• Computational error/failure to replicate ≠ bad science
• Make contact with authors to express concerns at an early stage
• If no response from senior author, can raise with other authors
• Do not comment on social media unless and until direct approach to
authors has failed
• Take advice from senior colleagues you respect
• Red flags:
• Defensiveness and other-blaming (though these are natural human responses)
• Unwillingness to share data (though widespread!)
• Failure to respond when serious concerns are raised
• Avoid inflammatory language, mockery, attacks on individuals.
Objective statement of facts is more effective

Technical postscript
Statistical methods for detection of error or fraud
Brown & Heathers: GRIM (Granularity-Related Inconsistency of
Means): mathematical methods for verifying summary statistics of
published research reports in psychology.
Epskamp & Nuijten (2016) R package “statcheck”: Extract statistics
from articles and recompute p values
Simonsohn (2013) Just post it: The lesson from two cases of
fabricated data detected by statistics alone. Psychological Science,
24(10) 1875–1888
Carlisle et al (2015) Calculating the probability of random sampling
for continuous variables in submitted or published randomised
controlled trials. Anaesthesia 2015, 70, 848–858
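To give a flavour of how a granularity check such as GRIM works, here is a minimal sketch. It is not Brown & Heathers' published code. It assumes every raw score is an integer (for example a single Likert item), so the sum of n scores must be a whole number, and a reported mean is only possible if some whole-number sum divided by n rounds to it. The function name `grim_consistent` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total of n integer scores rounds to the reported mean.

    The mean is passed as a string so that the number of reported decimals
    ("5.19" has two, "5.190" has three) is preserved.
    """
    reported = Decimal(reported_mean)
    # Rounding unit implied by the reported precision, e.g. 0.01 for "5.19".
    quantum = Decimal(1).scaleb(reported.as_tuple().exponent)
    centre = int((reported * n).to_integral_value())
    # Only totals next to reported_mean * n can possibly round to the reported value.
    for total in range(centre - 1, centre + 2):
        exact = Decimal(total) / Decimal(n)
        # Accept either convention for rounding exact halves.
        for mode in (ROUND_HALF_UP, ROUND_HALF_DOWN):
            if exact.quantize(quantum, rounding=mode) == reported:
                return True
    return False


if __name__ == "__main__":
    print(grim_consistent("5.19", 28))  # 145/28 and 146/28 both miss 5.19
    print(grim_consistent("5.18", 28))  # 145/28 rounds to 5.18
```

A failed check shows only that the reported mean, sample size and integer-score assumption cannot all be right. A typo or an unreported exclusion is a far more common explanation than fraud, so, in keeping with the advice above, the first step is to ask the authors.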

In the end, being a good scientist
isn’t easy, but we can try!
