Insights from psychology
on lack of reproducibility
Dorothy V. M. Bishop
Professor of Developmental Neuropsychology
University of Oxford
Talk given at All Souls seminar on Reproducibility and Open Research, 31/10/18
The four horsemen of the Apocalypse
Publication bias
Low power
Simple explainer using poker
Probability from
unbiased deck of
cards = 1 in 50
• If magician tells you he’ll deal you ‘3 of a kind’,
and he does so, you should be impressed
3 of a kind
• If magician deals 50 hands, and one of them is ‘3 of a
kind’, you should not be impressed
‘Surprisingness’ of a result only interpretable in context of full dataset
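The poker point can be checked with a short calculation. The sketch below assumes standard five-card hands from a 52-card deck and independent deals (my assumptions, not stated on the slide). The exact probability comes out near 1 in 47, consistent with the rounded "1 in 50", while the chance of seeing at least one such hand among 50 deals is roughly two in three.

```python
from math import comb

# Count 5-card hands that are exactly three of a kind:
# pick the rank of the triple, 3 of its 4 suits,
# then 2 other distinct ranks, each in any of 4 suits.
three_kind_hands = 13 * comb(4, 3) * comb(12, 2) * 4 ** 2
total_hands = comb(52, 5)

p_single = three_kind_hands / total_hands   # one pre-announced hand
p_any_of_50 = 1 - (1 - p_single) ** 50      # at least one among 50 independent hands

print(f"P(three of a kind in one hand) = {p_single:.4f}")
print(f"P(at least one in 50 hands)    = {p_any_of_50:.2f}")
```

The same outcome is surprising when predicted in advance and unremarkable when picked out of 50 attempts.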
De Groot
Failure to distinguish between hypothesis-testing and
hypothesis-generating (exploratory) research
-> misuse of statistical tests
Historical timeline: concerns about reproducibility
Describes P-hacking (though that term not used)
Situation when 10 statistical tests done in a study
“….when N=10 it is as if one participates… in a game of chance
with “probability of losing” α for each “draw” or “throw”. The
probability that we do not lose a single time in 10 draws can be
calculated in the case that the draws are independent; it equals (1
− α)^10. For α = 0.05, the traditional 5% level, this becomes 0.95^10
= 0.60. This means, therefore, that we have a 40% chance of
rejecting at least one of our 10 null hypotheses — falsely”
De Groot
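De Groot's arithmetic can be reproduced in a few lines. This is a minimal sketch that assumes, as the quotation does, that the tests are independent; the function name is my own.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection among n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# How the chance of at least one false positive grows with the number of tests
for n_tests in (1, 5, 10, 20):
    print(n_tests, round(familywise_error(0.05, n_tests), 2))
```

With 10 tests at α = 0.05 the result is about 0.40, matching the 40% in the quotation.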
Low power
Sample size too
small to reliably
detect a true effect
of interest
The “file drawer” problem
Prejudice against the null
“As it is functioning in at least some areas of
behavioral science research, the research-
publication system may be regarded as a
device for systematically generating and
propagating anecdotal information.”
Publication bias
Nonsignificant findings not
published: literature gives distorted picture
• “Presenting post hoc hypotheses in a research report
as if they were, in fact, a priori hypotheses.”
• A way of “translating type I errors into theory”
In survey by Kerr & Harris (1998), 52% of respondents said they knew
of editors/reviewers encouraging HARKing
If we’ve known about this for
decades, why haven’t the problems
been fixed?
No one cause: need to consider research
environment and incentives
This talk: focus on cognitive biases that make it
hard to do science well
Idea: doing good science is in opposition to many
of our natural ways of thinking
Cognitive biases that make it hard
to do science well
• Failure to understand probability
• Tendency to see patterns in things
• Confirmation bias
• Errors of omission seen as acceptable
• Need for narrative
A certain town is served by two hospitals. In the larger hospital about 45
babies are born each day, and in the smaller hospital about 15 babies are
born each day. As you know, about 50% of all babies are boys. However, the
exact percentage varies from day to day. Sometimes it may be higher than
50%, sometimes lower. For a period of 1 year, each hospital recorded the
days on which more than 60% of the babies born were boys. Which hospital
do you think recorded more such days?
1.The larger hospital
2.The smaller hospital
3.About the same (that is, within 5% of each other)
Example from Daniel Kahneman & Amos Tversky
Expected value:
Hosp15 = 57 days
Hosp45 = 26 days
Figure: Hospital 15 vs Hospital 45
Small sample gives noisier estimates: red line bounces
around much more than blue line
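The expected number of days can be worked out from the binomial distribution. The sketch below assumes exactly 15 and 45 births every day, a boy probability of exactly 0.5, and a 365-day year (all simplifications of mine). It gives about 55 days for the smaller hospital and roughly 25 for the larger, close to but not identical with the 57 and 26 on the slide, which may have come from simulation or different assumptions.

```python
from math import comb

def p_more_than_60pct_boys(n_births: int, p_boy: float = 0.5) -> float:
    """Exact binomial probability that strictly more than 60% of births are boys."""
    return sum(
        comb(n_births, k) * p_boy ** k * (1 - p_boy) ** (n_births - k)
        for k in range(n_births + 1)
        if 5 * k > 3 * n_births          # integer form of k / n_births > 0.6
    )

days_small = 365 * p_more_than_60pct_boys(15)
days_large = 365 * p_more_than_60pct_boys(45)

print(f"Hospital 15: {days_small:.1f} expected days")
print(f"Hospital 45: {days_large:.1f} expected days")
```

The smaller hospital records more than twice as many such days, purely because small samples vary more.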
Insensitivity to sample size
• People have strong intuitions about random sampling
• These intuitions are wrong in fundamental respects
• These intuitions are shared by naive subjects and
by trained scientists
• Intuitions are applied with unfortunate
consequences in the course of scientific inquiry
Tversky, A., & Kahneman, D. (1971). Belief in the law of small
numbers. Psychological Bulletin, 76, 105-110.
Work in progress: The experimenter game
• You have a budget to improve reading ability across Oxfordshire –
potential for roll-out to hundreds of schools. You’ve been offered
a remedy that claims to boost children’s reading ability by half a
standard deviation
• If you buy it and it turns out useless, you’ll lose a lot of money
• If you buy it and it really works, it will be worth a lot of money
• You’re not sure whether to trust the vendor – you think there’s a
50:50 chance that it really works
• You can run some tests on samples of children, but it costs
money – the more children you test, the more expensive.
• You have an optimization problem!
• So what’s your experimental strategy?
You decide to run a study with two groups of N children
What value of N should you start with? - let’s try 20 per group
Here’s a sample of data: do you think this is sufficient to decide
whether to adopt/reject the intervention?
This sample was drawn from population with no real difference
Here’s another sample of data: do you think this is sufficient to
decide whether to adopt/reject the intervention?
This time the sample was drawn from a population with a true effect
With small sample, difference can look small when there is a
true effect – this illustrates problem of LOW POWER
You decide to increase sample per group to 50
This time, the impression from the sample of data gives a better
indication of the true effect in the population
But how reliable this impression is depends on the effect size, i.e.
the separation in the means of the population distributions
effect size
= .2
effect size
= .5
Separation between red dots (drawn from population with true
effect) and grey dots (drawn from population with no effect)
shows the sample size at which a true effect can be reliably detected
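The link between sample size, effect size and power can be approximated without simulation. The sketch below is not the method behind the slides. It assumes a two-sided comparison of two group means at α = 0.05, with scores standardized to SD = 1, and uses a normal approximation, so the values are slightly optimistic compared with a t-test at small N.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power for a two-sided, two-group comparison of means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected z statistic when the true standardized difference is effect_size
    shift = effect_size * sqrt(n_per_group / 2)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

for d in (0.2, 0.5):
    for n in (20, 50, 100):
        print(f"effect size {d}, N per group {n}: power ~ {approx_power(d, n):.2f}")
```

Under these assumptions, an effect size of .5 is detected only about a third of the time with 20 per group and about 70% of the time with 50 per group. An effect size of .2 is usually missed even at 50 per group.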
Failure to appreciate power of ‘the prepared mind’
Tendency to see patterns in things
Example from Lazic, S. (2016) Experimental Design for Laboratory Biologists
Position of bomb hits:
General has map of bomb hits and wants to know if bombs were
dropped at random or whether some sites are being targeted.
Which map suggests targeting? Blue, red or neither?
General message: we tend to assume random data are regular, and
so try to interpret patterns when there is irregularity
The blue map may look as if there is targeting, especially if there are
potential targets at A or B.
In fact, blue X and Y co-ordinates were selected at random.
The red map does not suggest targeting, but it is not random. The co-
ordinates were selected to be evenly distributed, and then jittered
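The two kinds of map can be generated and compared numerically. This is a rough sketch, not a reconstruction of Lazic's figure: the 10 x 10 grid, the 100 hits, the jitter range and the seed are all my assumptions. It counts hits per grid cell, and truly random co-ordinates leave some cells empty and others crowded, whereas the jittered grid has exactly one hit per cell.

```python
import random
from statistics import pvariance

def cell_counts(points, grid=10):
    """Count points falling in each cell of a grid x grid partition of the unit square."""
    counts = [0] * (grid * grid)
    for x, y in points:
        col = min(int(x * grid), grid - 1)
        row = min(int(y * grid), grid - 1)
        counts[row * grid + col] += 1
    return counts

rng = random.Random(1)   # fixed seed so the sketch is repeatable
grid = 10

# "Blue" map: 100 hits with independent uniform random co-ordinates
blue = [(rng.random(), rng.random()) for _ in range(grid * grid)]

# "Red" map: one hit per cell, jittered but kept inside its own cell
red = [((col + rng.uniform(0.05, 0.95)) / grid, (row + rng.uniform(0.05, 0.95)) / grid)
       for row in range(grid) for col in range(grid)]

blue_var = pvariance(cell_counts(blue, grid))
red_var = pvariance(cell_counts(red, grid))
print(f"variance of hits per cell: random = {blue_var:.2f}, jittered grid = {red_var:.2f}")
```

Both maps average one hit per cell. The clumps and gaps that invite interpretation appear only in the random one.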
But! seeing novel patterns in complex data is one of the most
important and exciting aspects of science!
Consider Brodmann (1909): identified brain regions with different cell
types – not obvious: required expertise and painstaking study
Bailey and von Bonin (1951) noted problems in Brodmann's approach
— lack of observer independency, reproducibility and objectivity
Yet Brodmann’s areas stood test of time: still used today
Special expertise or Jesus in toast?
How to decide
• Eradicate subjectivity from methods
• Adopt standards from industry for checking/double-checking
• Automate data collection and analysis as far as possible
• Make recordings of methods (e.g. Journal of Visualized Experiments)
• Make data and analysis scripts open
Confirmation bias
How to do good science
That is the idea that we all hope you have learned in
studying science in school…
It’s a kind of scientific integrity, a principle of scientific
thought that corresponds to a kind of utter honesty—a
kind of leaning over backwards.
For example, if you’re doing an experiment, you should
report everything that you think might make it invalid—
not only what you think is right about it: other causes
that could possibly explain your results; and things you
thought of that you’ve eliminated by some other
experiment, and how they worked—to make sure the
other fellow can tell they have been eliminated.
Richard Feynman,
Caltech 1974 commencement address
Wason task:
a way of thinking about experimental design
Each card has a number on one side and a patch of colour on the other.
You are asked to test the hypothesis that – for these 4 cards – if an
even number appears on one side, then the opposite side is red.
• Are any of the cards irrelevant to the hypothesis?
• Are any of the cards critical to the hypothesis?
• Which card(s) would you turn over to test the hypothesis?
• Usual response is B & C are critical.
• But C is not critical (we’re testing ‘if P then Q’, not ‘if Q then P’)
• D is critical as it has potential to disconfirm hypothesis – but usually
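The logic of the task can be spelled out by asking, for each card, whether any possible hidden face could falsify the rule. The card faces below are hypothetical, because the slide's images are not in the transcript. I have assumed A shows an odd number, B an even number, C red and D a non-red colour, which matches the roles the bullets give to B, C and D.

```python
NUMBERS = range(1, 10)        # hypothetical set of numbers a hidden face could show
COLOURS = ("red", "blue")     # hypothetical set of colours a hidden face could show

def violates(number: int, colour: str) -> bool:
    """A card breaks 'if even then red' only when it is even and not red."""
    return number % 2 == 0 and colour != "red"

def worth_turning(visible) -> bool:
    """True if some possible hidden face would falsify the rule."""
    if isinstance(visible, int):
        return any(violates(visible, colour) for colour in COLOURS)
    return any(violates(number, visible) for number in NUMBERS)

cards = {"A": 7, "B": 4, "C": "red", "D": "blue"}   # hypothetical visible faces
critical = [name for name, face in cards.items() if worth_turning(face)]
print(critical)
```

Only B and D can disconfirm the hypothesis. Turning C can at best produce a result consistent with it.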
Wason task:
Shows how confirmation bias can affect
experimental design
We need to design experiments to look for disconfirmation of a theory.
In practice: "To test a hypothesis, we think of a result that would be found if the
hypothesis were true and then look for that result" (J. Baron, 1988, p. 231).
In a survey of 84 scientists (physicists, biologists, psychologists,
sociologists), Mahoney (1976) found that fewer than 10% correctly identified
the critical cards
“The self-deception comes
in that over the next 20
years, people believed
they saw specks of light
that corresponded to what
they thought Vulcan
should look during an
eclipse: round objects
crossing the face of the
sun, which were
interpreted as transits of
Confirmation bias at level of observations:
Seeing what you expect to see
• Cherry-picking may not be deliberate
• We find it much easier to process and remember information
that agrees with our viewpoint
Confirmation bias affects how we remember
and process information
Twin studies of SLI
same-sex twins
Lewis & Thompson, 1992 .86 .48
Bishop et al, 1995 .70 .46
Tomblin & Buckwalter, 1998 .96 .69
Hayiou-Thomas et al, 2005 .36 .33
A personal example: Slide from talks I gave on genetics
of language disorder
Twin concordance
points to genetic
influence when
I continued to use the original slide after 2005, despite
this additional study I had co-authored
I failed to mention this in talks for several years – I literally forgot about it –
presumably because it did not fit!
Example also illustrates how we will do further research
to try to make sense of data that does not fit our ideas –
but look far less closely when data does fit.
For denouement of this story, see
Confirmation bias affects literature reviews
Most literature reviews cherry-pick the evidence
(that’s why I’m not identifying this specific e.g.)
“Regardless of etiology, cerebellar neuropathology
commonly occurs in autistic individuals. Cerebellar
hypoplasia and reduced cerebellar Purkinje cell
numbers are the most consistent neuropathologies
linked to autism [8, 9, 10, 11, 12, 13]. MRI studies
report that autistic children have smaller cerebellar
vermal volume in comparison to typically developing
children [14].”
Example: Study published in 2013
• I was surprised by this introduction to a paper, as it did not fit my impression of the
literature on neuropathology in autism: but the authors seemed to cite a lot of
supportive evidence
• I checked to see if there was a relevant meta-analysis: there was….
Standardized mean difference is +ve when cerebellar volume is greater in ASD
Meta-analysis: Traut et al (2018)
Though Webb et al (ref 14) did find area of vermis smaller in ASD after covarying cerebellum size
Ref [14] –
Other studies mostly found no difference or increase –
opposite of what claimed in 2013 paper
Confirmation bias tends to produce errors of
omission – these are generally thought to be less
serious than errors of commission (i.e. making
stuff up)
But consequences can be major
Errors of omission in reporting research
“[I]t is a truly gross ethical violation for a researcher to
suppress reporting of difficult-to-explain or embarrassing
data in order to present a neat and attractive package
to a journal editor.” (Greenwald, 1975, p. 19)
“Failure to report results from a clinical trial is equivalent
to fraud.” Iain Chalmers, personal communication
Consequence of omission errors in literature reviews
• When we read a peer-reviewed paper, we tend to trust the
citations that back up a point
• When we come to write our own paper, we cite the same
• A good scientist won’t cite papers without reading them, but
even this won’t save you from bias – you inherit it from prior papers
• If prior papers only cite materials agreeing with a viewpoint,
that viewpoint gets entrenched
• You won’t know – unless you explicitly search – that there are
other papers that give a different picture
The (partial*) solution
Always start with a systematic review
• Systematic review
• Collect and summarise all empirical evidence that fits
pre-specified eligibility criteria to address a specific question
• Meta-analysis
• Use statistical methods to summarise the results of
these studies
*But depends on finding all relevant papers
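One common way to summarise results statistically is a fixed-effect, inverse-variance weighted average. The sketch below shows that technique only, and the slides do not say which method any particular meta-analysis used. The effect sizes and standard errors are made-up numbers for illustration, not real studies.

```python
from math import sqrt

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance weighted mean of study effects and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical standardized mean differences and their standard errors
effects = [0.40, 0.10, -0.05, 0.20]
std_errors = [0.30, 0.15, 0.10, 0.25]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

Precise studies get more weight than imprecise ones, so one striking result from a small study cannot dominate the summary.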
Example from Lazic, S. (2016) Experimental Design for Laboratory Biologists
100 relevant studies on gene/disease association
95 studies find no association.
Negative findings tend not to
be mentioned in Abstracts
5 false positive results. Disease
and gene mentioned in Abstract
Pubmed search for
disease AND gene
5 supporting and one
negative study found
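The distortion in this example is simple arithmetic, sketched below with the counts from the slide. The variable names are mine.

```python
n_studies = 100
n_positive = 5        # false positives; gene and disease named in the Abstract
n_negative = n_studies - n_positive

found_positive = 5    # all positive studies are retrieved by the search
found_negative = 1    # only one negative study is retrieved

share_in_literature = n_positive / n_studies
share_in_search = found_positive / (found_positive + found_negative)

print(f"supporting studies in full literature: {share_in_literature:.0%}")
print(f"supporting studies in search results:  {share_in_search:.0%}")
```

A literature in which 5% of studies support the association looks like one in which 83% do, without anyone misreporting a result.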
Omission errors, commission errors and paltering
Paltering differs from
• Lying by omission (the passive omission of relevant information)
• Lying by commission (the active use of false statements)
I have no data on this, but personal experience suggests paltering is
common in literature reviews – and in reporting results
Let’s take another look at that cerebellum paper:
statements that are not untrue, but are misleading
“Regardless of etiology, cerebellar neuropathology
commonly occurs in autistic individuals. Cerebellar
hypoplasia and reduced cerebellar Purkinje cell
numbers are the most consistent neuropathologies
linked to autism [8, 9, 10, 11, 12, 13]. MRI studies
report that autistic children have smaller cerebellar
vermal volume in comparison to typically developing
children [14].”
Impression of large body of work, but
mostly reviews of same few studies
Study 14 by Webb et al: found overall increase in cerebellum size:
smaller vermis effect only after adjusting total cerebellar volume
In terms of ethical behaviour, rank order the following
• Omission of relevant studies
• Stating that a study found something that it didn’t
• Stating a study result that was true, but in a misleading way
I don’t know of studies looking at this in science reporting,
but analogous behaviour rated in studies of negotiation by
Rogers et al (2016)
• Omission of relevant information
• Lying (untrue statement)
• Stating something that is true, but in a misleading way
*Rogers, T. et al (2016). Artful paltering: The risks and rewards of using truthful statements to
mislead others. Journal of Personality and Social Psychology, 112(3), 456-473.
Neither omission of information nor paltering seen as
honest, but both are more acceptable than lying
How common are these in literature reviews? Does it
• Omission of relevant studies
• Stating that a study found something that it didn’t
• Stating a study result that was true, but in a misleading way
My view:
Adoption of these behaviours in science is likely to depend on:
(a) Is it rewarded?
(b) Will it be detected?
(c) If it is, could you avoid blame?
(d) Are there obvious victims?
(e) Is ‘everyone doing it’?
Overlooked victims:
• Potential users (patients, etc)
• Researchers trying to build on results
• Funders
A further, overarching problem
The need for narrative
“Another reason why HARKed research reports may fare better in
the review and publication process is that they not only provide a
better fit to a specific good science script, they may also provide a
better fit to the more general good story script.
Positing a theory serves as an effective "initiating event." It gives
certain events significance and justifies the investigators'
subsequent purposeful activities directed at the goal of testing the
hypotheses. And, when one HARKs, a "happy ending” (i.e.,
confirmation) is guaranteed.”
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196-217.
Darwinian processes in survival of ideas
“Examples of memes are tunes, ideas, catch-
phrases, clothes fashions, ways of making
pots or of building arches. Just as genes
propagate themselves in the gene pool by
leaping from body to body via sperms or
eggs, so memes propagate themselves in the
meme pool by leaping from brain to brain via
a process which, in the broad sense, can be
called imitation.”
R. Dawkins
Successful meme
• Easy to understand, remember, and communicate
to others
• Not helped by reporting everything!
• Not helped by reporting null results!
• May be influenced by whether confers advantage to
the person communicating
• Survival does not depend on whether they are
useful, true, or potentially harmful
Cognitive biases pervade every
step of the research process
Reading literature: Confirmation bias, Omissions
Experimental design: Confirmation bias, Law of small numbers
Experimental observations: Seeing patterns, Confirmation bias
Data analysis: Confirmation bias, Seeing patterns, Law of small numbers, Omissions
Scientific reporting: Confirmation bias, Omissions, Need for narrative
Will anything change?
“It really is striking just for how long there have been
reports about the poor quality of research
methodology, inadequate implementation of research
methods and use of inappropriate analysis procedures
as well as lack of transparency of reporting. All have
failed to stir researchers, funders, regulators,
institutions or companies into action”. Bustin, 2014
Reasons for optimism
• Concern from those who use research:
• Doctors and patients
• Pharma companies
• Concern from funders
• Increase in studies quantifying the problem
• Social media
Professor Dorothy Bishop, FRS, FMedSci, FBA,
Wellcome Trust Principal Research Fellow,
Department of Experimental Psychology,
Anna Watts Building,
Woodstock Road,
OX2 6GG. @deevybee

Otitis media with effusion: an illustration of ascertainment bias
Dorothy Bishop
What are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEFWhat are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEF
Dorothy Bishop
Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?
Dorothy Bishop
Data simulation basics
Data simulation basicsData simulation basics
Data simulation basics
Dorothy Bishop
Simulating data to gain insights into power and p-hacking
Simulating data to gain insights intopower and p-hackingSimulating data to gain insights intopower and p-hacking
Simulating data to gain insights into power and p-hacking
Dorothy Bishop
Talk on reproducibility in EEG research
Talk on reproducibility in EEG researchTalk on reproducibility in EEG research
Talk on reproducibility in EEG research
Dorothy Bishop
What is Developmental Language Disorder
What is Developmental Language DisorderWhat is Developmental Language Disorder
What is Developmental Language Disorder
Dorothy Bishop
Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...
Dorothy Bishop
Fallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakesFallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakes
Dorothy Bishop
Introduction to simulating data to improve your research
Introduction to simulating data to improve your researchIntroduction to simulating data to improve your research
Introduction to simulating data to improve your research
Dorothy Bishop
Southampton: lecture on TEF
Southampton: lecture on TEFSouthampton: lecture on TEF
Southampton: lecture on TEF
Dorothy Bishop
Reading list: What’s wrong with our universities
Reading list: What’s wrong with our universitiesReading list: What’s wrong with our universities
Reading list: What’s wrong with our universities
Dorothy Bishop
IJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : ReferencesIJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : References
Dorothy Bishop
What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...
Dorothy Bishop
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016
Dorothy Bishop
Language, sex chromosomes and autism: unravelling the mystery
Language, sex chromosomes and autism: unravelling the mysteryLanguage, sex chromosomes and autism: unravelling the mystery
Language, sex chromosomes and autism: unravelling the mystery
Dorothy Bishop

More from Dorothy Bishop (20)

Exercise/fish oil intervention for dyslexia
Exercise/fish oil intervention for dyslexiaExercise/fish oil intervention for dyslexia
Exercise/fish oil intervention for dyslexia
Open Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill PandemicOpen Research Practices in the Age of a Papermill Pandemic
Open Research Practices in the Age of a Papermill Pandemic
Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.Language-impaired preschoolers: A follow-up into adolescence.
Language-impaired preschoolers: A follow-up into adolescence.
Journal club summary: Open Science save lives
Journal club summary: Open Science save livesJournal club summary: Open Science save lives
Journal club summary: Open Science save lives
Otitis media with effusion: an illustration of ascertainment bias
Otitis media with effusion: an illustration of ascertainment biasOtitis media with effusion: an illustration of ascertainment bias
Otitis media with effusion: an illustration of ascertainment bias
What are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEFWhat are metrics good for? Reflections on REF and TEF
What are metrics good for? Reflections on REF and TEF
Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?Biomarkers for psychological phenotypes?
Biomarkers for psychological phenotypes?
Data simulation basics
Data simulation basicsData simulation basics
Data simulation basics
Simulating data to gain insights into power and p-hacking
Simulating data to gain insights intopower and p-hackingSimulating data to gain insights intopower and p-hacking
Simulating data to gain insights into power and p-hacking
Talk on reproducibility in EEG research
Talk on reproducibility in EEG researchTalk on reproducibility in EEG research
Talk on reproducibility in EEG research
What is Developmental Language Disorder
What is Developmental Language DisorderWhat is Developmental Language Disorder
What is Developmental Language Disorder
Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...Developmental language disorder and auditory processing disorder: 
Same or di...
Developmental language disorder and auditory processing disorder: 
Same or di...
Fallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakesFallibility in science: Responsible ways to handle mistakes
Fallibility in science: Responsible ways to handle mistakes
Introduction to simulating data to improve your research
Introduction to simulating data to improve your researchIntroduction to simulating data to improve your research
Introduction to simulating data to improve your research
Southampton: lecture on TEF
Southampton: lecture on TEFSouthampton: lecture on TEF
Southampton: lecture on TEF
Reading list: What’s wrong with our universities
Reading list: What’s wrong with our universitiesReading list: What’s wrong with our universities
Reading list: What’s wrong with our universities
IJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : ReferencesIJLCD Winter Lecture 2016-7 : References
IJLCD Winter Lecture 2016-7 : References
What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...What's wrong with our Universities, and will the Teaching Excellence Framewor...
What's wrong with our Universities, and will the Teaching Excellence Framewor...
Bishop reproducibility references nov2016
Bishop reproducibility references nov2016Bishop reproducibility references nov2016
Bishop reproducibility references nov2016
Language, sex chromosomes and autism: unravelling the mystery
Language, sex chromosomes and autism: unravelling the mysteryLanguage, sex chromosomes and autism: unravelling the mystery
Language, sex chromosomes and autism: unravelling the mystery

Recently uploaded

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx

Recently uploaded (20)

Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx

Insights from psychology on lack of reproducibility

  • 1. Insights from psychology on lack of reproducibility Dorothy V. M. Bishop Professor of Developmental Neuropsychology University of Oxford @deevybee Talk given at All Souls seminar on Reproducibility and Open Research, 31/10/18
  • 2. The four horsemen of the Apocalypse: P-hacking, Publication bias, Low power, HARKing
  • 3. P-hacking: simple explainer using poker. Probability of ‘3 of a kind’ from an unbiased deck of cards = 1 in 50 • If magician tells you he’ll deal you ‘3 of a kind’, and he does so, you should be impressed • If magician deals 50 hands, and one of them is ‘3 of a kind’, you should not be impressed. ‘Surprisingness’ of a result only interpretable in context of full dataset
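The poker explainer can be checked with a little counting. The sketch below is an illustration rather than anything shown in the talk: it counts five-card hands that are exactly three of a kind (assuming a standard 52-card deck and independent deals) and then asks how likely it is to see at least one such hand in 50 deals.

```python
from math import comb

# Count five-card hands that are exactly three of a kind:
# pick the rank of the triple, 3 of its 4 suits, then 2 other distinct
# ranks (so there is no pair or fourth card), each in any of 4 suits.
three_of_a_kind = comb(13, 1) * comb(4, 3) * comb(12, 2) * 4 ** 2
all_hands = comb(52, 5)
p_single = three_of_a_kind / all_hands

# Chance of at least one such hand when 50 independent hands are dealt.
n_hands = 50
p_at_least_once = 1 - (1 - p_single) ** n_hands

print(f"P(three of a kind in one hand)   = {p_single:.4f}")
print(f"P(at least one in {n_hands} hands) = {p_at_least_once:.2f}")
```

A single pre-announced hand is a roughly 1-in-50 event, whereas one such hand somewhere among 50 deals is more likely than not, which is why the result is only interpretable in the context of everything that was dealt.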
  • 6. Historical timeline: concerns about reproducibility. 1956 De Groot: Failure to distinguish between hypothesis-testing and hypothesis-generating (exploratory) research -> misuse of statistical tests. Describes P-hacking (though that term not used). Situation when 10 statistical tests done in a study: “….when N=10 it is as if one participates… in a game of chance with “probability of losing” α for each “draw” or “throw”. The probability that we do not lose a single time in 10 draws can be calculated in the case that the draws are independent; it equals (1 − α)^10. For α = 0.05, the traditional 5% level, this becomes 0.95^10 = 0.60. This means, therefore, that we have a 40% chance of rejecting at least one of our 10 null hypotheses – falsely”
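De Groot's arithmetic generalises to any number of tests. The following is a minimal sketch, assuming (as the quotation does) that the tests are independent and that every null hypothesis is true; the function name is made up for illustration.

```python
def chance_of_false_rejection(alpha: float, n_tests: int) -> float:
    """Probability of at least one false rejection across n independent
    tests, each run at significance level alpha, when all nulls are true."""
    return 1 - (1 - alpha) ** n_tests

# How the risk grows with the number of tests at the 5% level.
for n_tests in (1, 5, 10, 20):
    risk = chance_of_false_rejection(0.05, n_tests)
    print(f"{n_tests:>2} tests: {risk:.2f}")
```

With 10 tests the function reproduces the 40% figure in the quotation; correlated tests would give a different value, so this is best read as a rough guide.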
  • 7. Timeline: 1956 De Groot; 1969 Cohen. Low power: sample size too small to reliably detect a true effect of interest
  • 8. Timeline: 1956 De Groot; 1969 Cohen; 1975 Greenwald (prejudice against the null); 1979 Rosenthal (the “file drawer” problem). Publication bias: nonsignificant findings not published, so literature gives distorted impression. “As it is functioning in at least some areas of behavioral science research, the research-publication system may be regarded as a device for systematically generating and propagating anecdotal information.”
  • 9. Timeline: 1956 De Groot; 1969 Cohen; 1975 Greenwald; 1979 Rosenthal; 1998 Kerr. HARKing: • “Presenting post hoc hypotheses in a research report as if they were, in fact, a priori hypotheses.” • A way of “translating type I errors into theory”. In survey by Kerr & Harris (1998), 52% of respondents said they knew of editors/reviewers encouraging HARKing
  • 10. If we’ve known about this for decades, why haven’t the problems been fixed? No one cause: need to consider research environment and incentives. This talk: focus on cognitive biases that make it hard to do science well. Idea: doing good science is in opposition to many of our natural ways of thinking
  • 11. Cognitive biases that make it hard to do science well • Failure to understand probability • Tendency to see patterns in things • Confirmation bias • Errors of omission seen as acceptable • Need for narrative
  • 12. Consider this problem (example from Daniel Kahneman & Amos Tversky): A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days? 1. The larger hospital 2. The smaller hospital 3. About the same (that is, within 5% of each other)
  • 13. (Hospital problem from slide 12, repeated.) Expected value: Hosp15 = 57 days; Hosp45 = 26 days
  • 14. (Hospital problem from slide 12, repeated.) Expected value: Hosp15 = 57 days; Hosp45 = 26 days. Plot of daily values for Hospital 15 and Hospital 45: small sample gives noisier estimates, so the red line bounces around much more than the blue line
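The hospital problem can be worked through with an exact binomial calculation. This is a sketch under simplifying assumptions that are not in the slides: exactly 15 or 45 births every day, each birth independently a boy with probability 0.5, and a 365-day year. The function name is hypothetical.

```python
from math import comb

def p_more_than_60_percent_boys(n_births: int) -> float:
    """Exact probability that strictly more than 60% of n_births are boys,
    assuming each birth is independently a boy with probability 0.5."""
    # The integer comparison 10*k > 6*n avoids floating-point edge cases.
    favourable = sum(
        comb(n_births, k)
        for k in range(n_births + 1)
        if 10 * k > 6 * n_births
    )
    return favourable / 2 ** n_births

days_small = 365 * p_more_than_60_percent_boys(15)
days_large = 365 * p_more_than_60_percent_boys(45)

print(f"Smaller hospital (15 births/day): about {days_small:.0f} days")
print(f"Larger hospital  (45 births/day): about {days_large:.0f} days")
```

Under these assumptions the smaller hospital records roughly twice as many such days, in line with the expected values given on the slide, although the exact figures need not match the slide's numbers.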
  • 15. Insensitivity to sample size • People have strong intuitions about random sampling; • These intuitions are wrong in fundamental respects; • These intuitions are shared by naive subjects and by trained scientists; • Intuitions are applied with unfortunate consequences in the course of scientific inquiry Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110.
  • 16. Work in progress: The experimenter game • You have a budget to improve reading ability across Oxfordshire – potential for roll-out to hundreds of schools. You’ve been offered a remedy that claims to boost children’s reading ability by half a standard deviation • If you buy it and it turns out useless, you’ll lose a lot of money • If you buy it and it really works, it will be worth a lot of money • You’re not sure whether to trust the vendor – you think there’s a 50:50 chance that it really works • You can run some tests on samples of children, but it costs money – the more children you test, the more expensive. • You have an optimization problem! • So what’s your experimental strategy?
  • 17. You decide to run a study with two groups of N children What value of N should you start with? - let’s try 20 per group Here’s a sample of data: do you think this is sufficient to decide whether to adopt/reject the intervention?
  • 18. This sample was drawn from population with no real difference
  • 19. Here’s another sample of data: do you think this is sufficient to decide whether to adopt/reject the intervention?
  • 20. This time the sample was drawn from a population with a true effect. With a small sample, the difference can look small when there is a true effect: this illustrates the problem of LOW POWER
  • 21. You decide to increase sample per group to 50
  • 22. This time, the impression from the sample of data gives a better indication of the true effect in the population. But how reliable this impression is depends on the effect size, i.e. the separation in the means of the population distributions
  • 23. Two panels: population effect size = .2 and population effect size = .5. Separation between red dots (drawn from population with true effect) and grey dots (drawn from population with no effect) shows the sample size at which a true effect can be reliably detected
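The dependence of power on sample size and effect size can be approximated in a few lines. The sketch below is not the simulation behind the slides: it uses a normal approximation to a two-sided, two-group comparison at alpha = 0.05 (an assumption), and the function name is illustrative. Exact t-test power is slightly lower for small groups.

```python
from statistics import NormalDist

def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test comparing two
    independent groups of equal size, for a standardised effect size."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for effect_size in (0.2, 0.5):
    for n_per_group in (20, 50, 100, 200):
        power = approx_power(effect_size, n_per_group)
        print(f"d = {effect_size}, N = {n_per_group:>3} per group: power = {power:.2f}")
```

With 20 children per group, an effect of half a standard deviation would be detected only about a third of the time, and an effect of .2 needs far larger groups before detection becomes reliable.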
  • 24. Failure to appreciate power of ‘the prepared mind’ Tendency to see patterns in things
  • 26. Example from Lazic, S. (2016) Experimental Design for Laboratory Biologists Position of bomb hits: General has map of bomb hits and wants to know if bombs were dropped at random or whether some sites are being targeted. Which map suggests targeting? Blue, red or neither?
  • 27. General message: we tend to assume random data are regular, and so try to interpret patterns when there is irregularity. The blue map may look as if there is targeting, especially if there are potential targets at A or B. In fact, blue X and Y co-ordinates were selected at random. The red map does not suggest targeting, but it is not random. The co-ordinates were selected to be evenly distributed, and then jittered
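The contrast between the two maps can be reproduced with a short simulation. This is a sketch only: the 10 x 10 grid, the number of points, the jitter size and the seed are all assumptions chosen for illustration, not Lazic's actual procedure. It compares how unevenly points fall across grid cells in a purely random map and in an evenly spaced, jittered map.

```python
import random
from statistics import pvariance

GRID = 10  # assumed: 10 x 10 cells over the unit square, 100 points per map

def cell_count_variance(points):
    """Variance of the number of points per grid cell; higher means clumpier."""
    counts = [0] * (GRID * GRID)
    for x, y in points:
        col = min(int(x * GRID), GRID - 1)
        row = min(int(y * GRID), GRID - 1)
        counts[row * GRID + col] += 1
    return pvariance(counts)

rng = random.Random(1)

# "Blue" map: X and Y co-ordinates drawn uniformly at random.
random_map = [(rng.random(), rng.random()) for _ in range(GRID * GRID)]

# "Red" map: one point per cell centre, then jittered within its cell.
jittered_map = [
    ((col + 0.5 + rng.uniform(-0.3, 0.3)) / GRID,
     (row + 0.5 + rng.uniform(-0.3, 0.3)) / GRID)
    for row in range(GRID)
    for col in range(GRID)
]

print("random map, variance of cell counts:  ", cell_count_variance(random_map))
print("jittered map, variance of cell counts:", cell_count_variance(jittered_map))
```

The random map leaves some cells empty and puts several points in others, which is exactly the kind of clustering that tempts us to infer targeting; the jittered map is far more even than chance would produce.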
  • 28. But! Seeing novel patterns in complex data is one of the most important and exciting aspects of science! Consider Brodmann (1909): identified brain regions with different cell types. This was not obvious: it required expertise and painstaking study. Bailey and von Bonin (1951) noted problems in Brodmann's approach: lack of observer independency, reproducibility and objectivity. Yet Brodmann’s areas stood the test of time: still used today
  • 29. Special expertise or Jesus in toast? How to decide • Eradicate subjectivity from methods • Adopt standards from industry for checking/double-checking • Automate data collection and analysis as far as possible • Make recordings of methods (e.g. Journal of Visualized Experiments) • Make data and analysis scripts open
  • 31. How to do good science That is the idea that we all hope you have learned in studying science in school… …. It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty—a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid— not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated. Richard Feynman, Caltech 1974 commencement address
  • 32. Wason task: a way of thinking about experimental design Each card has a number on one side and a patch of colour on the other. You are asked to test the hypothesis that – for these 4 cards - if an even number appears on one side, then the opposite side is red. • Are any of the cards irrelevant to the hypothesis? • Are any of the cards critical to the hypothesis? • Which card(s) would you turn over to test the hypothesis? A B C D
  • 33. Wason task: a way of thinking about experimental design Each card has a number on one side and patch of colour on the other. You are asked to test the hypothesis that – for these 4 cards - if an even number appears on one side, then the opposite side is red. • Usual response is B & C are critical. • But C is not critical (we’re testing ‘if P then Q’, not ‘if Q then P’) • D is critical as it has potential to disconfirm hypothesis – but usually overlooked A B C D
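The logic of the Wason task can be spelled out by brute force: a card is critical only if some possible hidden face would disconfirm the rule. In the sketch below the visible faces are hypothetical, since the card images are not in the transcript; they are chosen to be consistent with the slide (B shows an even number, C shows red, D shows a colour other than red).

```python
# Hypothetical visible faces, consistent with the slide's description.
cards = {"A": 3, "B": 8, "C": "red", "D": "brown"}

NUMBERS = range(1, 10)      # assumed possible numbers on a hidden side
COLOURS = ["red", "brown"]  # assumed possible colours on a hidden side

def violates(number: int, colour: str) -> bool:
    """True if a card breaks 'if even number, then the other side is red'."""
    return number % 2 == 0 and colour != "red"

def can_disconfirm(visible) -> bool:
    """True if some hidden face would make this card violate the rule."""
    if isinstance(visible, int):
        return any(violates(visible, colour) for colour in COLOURS)
    return any(violates(number, visible) for number in NUMBERS)

critical = [name for name, face in cards.items() if can_disconfirm(face)]
print(critical)  # ['B', 'D']
```

Card C never appears in the result: whatever number is behind a red card, the rule survives, so turning it over can only ever look like confirmation.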
  • 34. Wason task: Shows how confirmation bias can affect experimental design. We need to design experiments to look for disconfirmation of a theory. In practice: "To test a hypothesis, we think of a result that would be found if the hypothesis were true and then look for that result" (J. Baron, 1988, p. 231). In survey of 84 scientists (physicists, biologists, psychologists, sociologists) Mahoney (1976) found fewer than 10% correctly identified the critical cards
  • 35. “The self-deception comes in that over the next 20 years, people believed they saw specks of light that corresponded to what they thought Vulcan should look during an eclipse: round objects crossing the face of the sun, which were interpreted as transits of Vulcan.” Confirmation bias at level of observations: Seeing what you expect to see
  • 36. • Cherry-picking may not be deliberate • We find it much easier to process and remember information that agrees with our viewpoint Confirmation bias affects how we remember and process information
  • 37. A personal example: slide from talks I gave on genetics of language disorder. Twin studies of SLI, probandwise concordance in same-sex twins (MZ, DZ): Lewis & Thompson, 1992: .86, .48; Bishop et al, 1995: .70, .46; Tomblin & Buckwalter, 1998: .96, .69; Hayiou-Thomas et al, 2005: .36, .33. Twin concordance points to genetic influence when MZ > DZ
  • 38. Twin studies of SLI, probandwise concordance in same-sex twins (MZ, DZ): Lewis & Thompson, 1992: .86, .48; Bishop et al, 1995: .70, .46; Tomblin & Buckwalter, 1998: .96, .69; Hayiou-Thomas et al, 2005: .36, .33. I continued to use the original slide after 2005, despite this additional study, which I had co-authored. I failed to mention it in talks for several years. I literally forgot about it, presumably because it did not fit!
  • 39. Example also illustrates how we will do further research to try to make sense of data that does not fit our ideas – but look far less closely when data does fit. For denouement of this story, see
  • 40. Confirmation bias affects literature reviews
  • 41. Most literature reviews cherry-pick the evidence (that’s why I’m not identifying this specific e.g.). Example: Study published in 2013: “Regardless of etiology, cerebellar neuropathology commonly occurs in autistic individuals. Cerebellar hypoplasia and reduced cerebellar Purkinje cell numbers are the most consistent neuropathologies linked to autism [8, 9, 10, 11, 12, 13]. MRI studies report that autistic children have smaller cerebellar vermal volume in comparison to typically developing children [14].” • I was surprised by this introduction to a paper, as it did not fit my impression of the literature on neuropathology in autism: but the authors seemed to cite a lot of supportive evidence • I checked to see if there was a relevant meta-analysis: there was….
  • 42. Meta-analysis: Traut et al (2018). Standardized mean difference is +ve when cerebellar volume is greater in ASD. Ref [14] – larger cerebellum (though Webb et al (ref 14) did find area of vermis smaller in ASD after covarying cerebellum size). Other studies mostly found no difference or increase – opposite of what was claimed in 2013 paper
  • 43. Confirmation bias tends to produce errors of omission. These are generally thought to be less serious than errors of commission (i.e. making stuff up), but the consequences can be major.
  • 44. Errors of omission in reporting research “[I]t is a truly gross ethical violation for a researcher to suppress reporting of difficult-to-explain or embarrassing data in order to present a neat and attractive package to a journal editor.” (Greenwald, 1975, p. 19) “Failure to report results from a clinical trial is equivalent to fraud.” Iain Chalmers, personal communication
  • 45. Consequence of omission errors in literature reviews • When we read a peer-reviewed paper, we tend to trust the citations that back up a point • When we come to write our own paper, we cite the same materials • A good scientist won’t cite papers without reading them, but even this won’t save you from bias – you inherit it from prior papers • If prior papers only cite materials agreeing with a viewpoint, that viewpoint gets entrenched • You won’t know – unless you explicitly search – that there are other papers that give a different picture
  • 46. The (partial*) solution: always start with a systematic review • Systematic review: collect and summarise all empirical evidence that fits pre-specified eligibility criteria to address a specific question • Meta-analysis: use statistical methods to summarise the results of these studies *But depends on finding all relevant papers
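As an illustration of what "statistical methods to summarise the results" can mean in practice, here is a minimal Python sketch of one common approach, inverse-variance (fixed-effect) pooling. The talk does not say which method is intended, so the choice of method, the function name `fixed_effect_pool` and the four effect sizes are all assumptions made for illustration; the numbers are hypothetical and not taken from any study.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # Each study is weighted by the inverse of its variance (1 / SE^2),
    # so more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical standardized mean differences and standard errors (NOT real data).
effects = [0.40, -0.10, 0.05, 0.00]
std_errors = [0.20, 0.10, 0.10, 0.20]

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled SMD = {pooled:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

In this made-up example a reviewer who cited only the first study would report a sizeable positive effect, while the pooled estimate across all four is close to zero.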
  • 47. Example from Lazic, S. (2016) Experimental Design for Laboratory Biologists: 100 relevant studies on a gene/disease association. 95 studies find no association; negative findings tend not to be mentioned in abstracts. 5 are false positive results; disease and gene are mentioned in the abstract. A PubMed search for disease AND gene finds 5 supporting studies and one negative study.
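The arithmetic behind this example can be sketched in a few lines of Python. The counts (100 studies, 5 false positives, one negative study retrieved) come from the slide. The function name `abstract_search_bias` and the 5% significance level used to derive the 5 false positives are assumptions for illustration.

```python
def abstract_search_bias(n_studies, alpha, negatives_found):
    """Compare the share of supporting studies in the full literature
    with the share among studies retrieved by an abstract search."""
    # Expected number of false positives when no true association exists
    # (alpha = 0.05 is an assumed significance level).
    false_positives = round(n_studies * alpha)
    true_negatives = n_studies - false_positives
    # The search finds every positive study plus the few negatives
    # that happen to mention both disease and gene in the abstract.
    retrieved = false_positives + negatives_found
    return {
        "false_positives": false_positives,
        "true_negatives": true_negatives,
        "share_supporting_all": false_positives / n_studies,
        "share_supporting_retrieved": false_positives / retrieved,
    }


result = abstract_search_bias(n_studies=100, alpha=0.05, negatives_found=1)
for name, value in result.items():
    print(f"{name}: {value}")
```

Under these assumptions 5% of the full literature supports the association, but 5 of the 6 retrieved studies do, so the search result gives a badly distorted picture.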
  • 48. Omission errors, commission errors and paltering. Paltering (stating something that is true, but in a misleading way) differs from: • Lying by omission (the passive omission of relevant information) • Lying by commission (the active use of false statements). I have no data on this, but personal experience suggests paltering is common in literature reviews, and in reporting results.
  • 49. Let’s take another look at that cerebellum paper: statements that are not untrue, but are misleading. “Regardless of etiology, cerebellar neuropathology commonly occurs in autistic individuals. Cerebellar hypoplasia and reduced cerebellar Purkinje cell numbers are the most consistent neuropathologies linked to autism [8, 9, 10, 11, 12, 13]. MRI studies report that autistic children have smaller cerebellar vermal volume in comparison to typically developing children [14].” Refs [8-13]: impression of a large body of work, but mostly reviews of the same few studies. Study 14, by Webb et al, found an overall increase in cerebellum size: the smaller vermis effect appeared only after adjusting for total cerebellar volume.
  • 50. In terms of ethical behaviour, rank order the following behaviours: • Omission of relevant studies • Stating that a study found something that it didn’t • Stating a study result that was true, but in a misleading way
  • 51. I don’t know of studies looking at this in science reporting, but analogous behaviour was rated in studies of negotiation by Rogers et al (2016). Honesty judgement in negotiation study*: • Omission of relevant information: 23% • Lying (untrue statement): 5% • Stating something that is true, but in a misleading way (paltering): 32%. Neither omission of information nor paltering is seen as honest, but both are more acceptable than lying. *Rogers, T. et al (2016). Artful paltering: The risks and rewards of using truthful statements to mislead others. Journal of Personality and Social Psychology, 112(3), 456-473.
  • 52. How common are these in literature reviews? Does it matter? • Omission of relevant studies • Stating that a study found something that it didn’t • Stating a study result that was true, but in a misleading way My view: Adoption of these behaviours in science is likely to depend on: (a) Is it rewarded? (b) Will it be detected? (c) If it is, could you avoid blame? (d) Are there obvious victims? (e) Is ‘everyone doing it’? Overlooked victims: • Potential users (patients, etc) • Researchers trying to build on results • Funders
  • 53. A further, overarching problem: the need for narrative. “Another reason why HARKed research reports may fare better in the review and publication process is that they not only provide a better fit to a specific good science script, they may also provide a better fit to the more general good story script. Positing a theory serves as an effective "initiating event." It gives certain events significance and justifies the investigators' subsequent purposeful activities directed at the goal of testing the hypotheses. And, when one HARKs, a "happy ending" (i.e., confirmation) is guaranteed.” Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
  • 54. Darwinian processes in survival of ideas. “Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of making pots or of building arches. Just as genes propagate themselves in the gene pool by leaping from body to body via sperms or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation.” R. Dawkins
  • 55. Successful meme • Easy to understand, remember, and communicate to others • Not helped by reporting everything! • Not helped by reporting null results! • May be influenced by whether it confers an advantage on the person communicating it • Survival does not depend on whether it is useful, true, or potentially harmful
  • 56. Cognitive biases pervade every step of the research process • Reading literature: Confirmation bias, Omissions • Experimental design: Confirmation bias, Law of small numbers • Experimental observations: Seeing patterns, Confirmation bias • Data analysis: Confirmation bias, Seeing patterns, Law of small numbers, Omissions • Scientific reporting: Confirmation bias, Omissions, Need for narrative
  • 57. Will anything change? “It really is striking just for how long there have been reports about the poor quality of research methodology, inadequate implementation of research methods and use of inappropriate analysis procedures as well as lack of transparency of reporting. All have failed to stir researchers, funders, regulators, institutions or companies into action”. Bustin, 2014. Reasons for optimism: • Concern from those who use research: doctors and patients, pharma companies • Concern from funders • Increase in studies quantifying the problem • Social media
  • 58. Professor Dorothy Bishop, FRS, FMedSci, FBA, Wellcome Trust Principal Research Fellow, Department of Experimental Psychology, Anna Watts Building, Woodstock Road, Oxford, OX2 6GG. @deevybee updated-24th-nov.html