EXCURSIONS INTO THE GARDEN OF THE
FORKING PATHS
P-VALUE FETISHISATION, REPLICATION CRISIS, AND THE
TENSION BETWEEN INNOVATION AND CONFIRMATION
http://bit.ly/helmholtzdirnagl
Personal motivation:
Decades of futile translational stroke research
• Millions of animals killed
• Hundreds (thousands?) of neutral or negative clinical trials
• Thousands of researchers and clinicians globally
• Many billions spent on preclinical research?
Take home I:
The garden of the forking paths
http://bit.ly/2q2gtXq
http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
http://bit.ly/2JzblTR
Take home II
No scientific progress without reproducibility failures
To boldly go where no man…
The Good: essential non-reproducibility (Kuhn)
• Exploration at low base rate
• Innovation
• ‘Paradigm shift’
The Bad: detrimental non-reproducibility (Popper)
• Incompetence
• Bad designs
• Tacit knowledge (bad reporting)
• Low validity (bias)
• Misconduct
Take home III
Confirmation – weeding out the false positives of exploration
Jonathan Kimmelman
PLoS Biol. (2014) 12:e1001863.
>6000 cit.
PLoS Med. 2005;2:e124
Modified after Gary Larson
Bias: Subjective reality informed by one's preferences
Macleod MR, et al. (2015) Risk of Bias in Reports of In Vivo Research:
A Focus for Improvement. PLoS Biol 13: e1002273.
Low prevalence of methods to prevent bias
[Figure: effect of blinding on reported effect sizes. Panels: Alzheimer's disease models (blinded conduct of experiment; blinded assessment of outcome) and stroke models (NXY-059; blinded assessment of outcome). x-axis: blinded assessment of behavioural outcome (No / Yes). y-axes: improvement in behavioural outcome (standardised effect size, 0.0 to 1.2) and reduction in infarct size. > 30 studies, > 500 animals.]
Bias inflates effect sizes
PLoS Biol. 2016;14:e1002331
Effects of attrition in experimental
biomedical research
PLoS Biol. 2016;14:e1002331
Bias produces false positives and inflates
effect sizes
Statistical power
Overall median power of 730 primary neuroscience studies: 21 %
Power failure in neuroscience!
Not knowing what is false and what is not, the researcher sees 95 hypotheses as true, 45 of which are not.
α = 0.05; β = 0.5
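The arithmetic behind these numbers can be sketched as follows. The slide gives only α and β; the total of 1000 tested hypotheses and the 10 % base rate of true hypotheses are assumptions, chosen here because they reproduce the 95 and 45 quoted above. The function name is hypothetical.

```python
def expected_positives(n_hypotheses, base_rate, alpha, beta):
    """Expected numbers of true and false positive findings."""
    n_true = n_hypotheses * base_rate      # hypotheses that are really true
    n_false = n_hypotheses - n_true        # hypotheses that are really false
    true_pos = n_true * (1 - beta)         # true effects detected (power = 1 - beta)
    false_pos = n_false * alpha            # null effects wrongly declared significant
    return true_pos, false_pos


# ASSUMPTION: 1000 hypotheses, 10 % of them true (not stated on the slide).
true_pos, false_pos = expected_positives(1000, 0.10, alpha=0.05, beta=0.5)
declared_true = true_pos + false_pos

print(f"declared true: {declared_true:.0f}, of which false: {false_pos:.0f}")
print(f"share of false findings: {false_pos / declared_true:.0%}")
```

Under these assumptions almost half of the "positive" findings are false. Raising the power or the base rate lowers that share.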
Mean group size n ≈ 8
Mean statistical power ≈ 45 %
False positive rate (p ≤ 0.05): ≈ 50 %
Overestimation of true effects: ≈ 50 %
"Low sample size bias" leads to false positives and effect size inflation
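A small Monte Carlo sketch can illustrate how small groups inflate the effects that reach significance. It is not the analysis behind the slide. It assumes normally distributed outcomes with unit variance, a two-sample t-test, and a true effect of one standard deviation, a value picked because it gives a power of roughly 45 % at n = 8 per group. All names are hypothetical.

```python
import random
import statistics

N_PER_GROUP = 8
T_CRIT_DF14 = 2.145  # two-sided 5 % critical value of Student's t, 14 degrees of freedom


def simulate(true_effect=1.0, n_experiments=5000, seed=1):
    """Return (estimated power, mean estimated effect among significant experiments)."""
    rng = random.Random(seed)
    significant_estimates = []
    for _ in range(n_experiments):
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(N_PER_GROUP)]
        diff = statistics.mean(treated) - statistics.mean(control)
        # Pooled variance for equal group sizes is the mean of the two variances.
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        t = diff / (2 * pooled_var / N_PER_GROUP) ** 0.5
        if abs(t) > T_CRIT_DF14:
            significant_estimates.append(diff)
    power = len(significant_estimates) / n_experiments
    return power, statistics.mean(significant_estimates)


power, mean_significant_effect = simulate()
print(f"estimated power: {power:.2f}")
print(f"true effect: 1.00, mean effect among significant results: {mean_significant_effect:.2f}")
```

Only experiments that happen to overestimate the effect cross the significance threshold, so the average "significant" estimate exceeds the true effect. The size of that inflation depends on the assumed effect size and group size.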
Beyond bias: HARKing (hypothesizing after the results are known)
http://xkcd.com/882/
Beyond bias: p-hacking
http://xkcd.com/882/
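Why trying many analyses is dangerous can be sketched with the family-wise error rate. The example assumes independent tests of true null hypotheses and uses 20 comparisons, as in the linked comic. Neither the formula nor the simulation is taken from the slides, and the function names are hypothetical.

```python
import random


def familywise_error(n_tests, alpha=0.05):
    """Probability of at least one p < alpha among n independent null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulated_familywise_error(n_tests, alpha=0.05, n_studies=20000, seed=1):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Under the null hypothesis each p-value is uniform on [0, 1].
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


print(f"analytic:  {familywise_error(20):.3f}")
print(f"simulated: {simulated_familywise_error(20):.3f}")
```

With 20 looks at pure noise, the chance of at least one "significant" result is well above one half. Real forking paths are usually correlated rather than independent, so this is an illustration, not an estimate.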
Exploration vs Confirmation
In exploratory investigation, researchers should aim at generating robust pathophysiological theories of disease.
In confirmatory investigation, researchers should aim at demonstrating strong and reproducible treatment effects in relevant animal models.
Currently we often see a mix-up of both modes. This prevents us from tailoring our study designs accordingly.
Criterion                                                   Exploratory   Confirmatory
Hypothesis                                                  (+)           +++
Establish pathophysiology                                   +++           (+)
Sequence and details of experiments established at onset    (+)           +++
Primary endpoint                                            -             ++
Sample size calculation                                     (+)           +++
Blinding                                                    +++           +++
Randomization                                               +++           +++
External validity (aging, comorbidities, etc.)              -             ++
In/Exclusion criteria                                       ++            +++
Test statistics                                             +             +++
Preregistration                                             (-)           +++
Sensitivity (Type II error): find what might work           ++            +
Specificity (Type I error): weed out false positives        +             +++
Stroke 2016; 47:2148-2153
PLoS Biol. (2014) 12:e1001863.
Exploration vs Confirmation
Katharina Fritsch
https://www.timeshighereducation.com/
Replication (crisis?)
‘… non-reproducible single occurrences are of no significance to science …’
The Logic of Scientific Discovery (1934)
Sir Karl Popper (1902-1994)
“We do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them. Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable.”
The lexicon of reproducibility
• Methods reproducibility: Same data, same tools, same results? Adds no additional evidence!
• Results reproducibility (aka "replication"): Technically competent repetition, i.e. a new study. Could be strict (identical conditions) or conceptual (altered conditions: does the causal claim extend to previously unsampled settings?).
• Inferential reproducibility: Same conclusions from study replication or re-analysis? Not all scientists come to the same conclusions from the same results, and they may make different analytic choices. What is concluded or recommended from a study is often the only thing that matters!
Adapted from Goodman et al. Sci Transl Med. 2016;8:341ps12.
What do we mean by 'reproducible'?
• Significance and P values: evaluating the replication effect against the null hypothesis of no effect.
• Evaluating the replication effect against the original effect size: is the original effect size within the 95% CI of the effect size estimate from the replication? Alternatively: comparing original and replication effect sizes.
• Meta-analysis combining original and replication effects: combining original and replication effect sizes for cumulative evidence.
• Subjective assessment of "Did it replicate?"
From the Open Science Collaboration, Psychology Replication, Science. 2015;349(6251):aac4716
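The first three criteria can give different answers for the same pair of studies. The sketch below shows this with made-up numbers. It assumes approximately normal effect estimates with known standard errors and a fixed-effect (inverse-variance) meta-analysis. The function name, the inputs, and these modelling choices are illustrative assumptions, not the Open Science Collaboration's actual procedure.

```python
from statistics import NormalDist


def assess_replication(orig_effect, orig_se, rep_effect, rep_se, alpha=0.05):
    """Apply three replication criteria to one original/replication pair."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)

    # 1. Significance: replication effect against the null of no effect.
    rep_p = 2 * (1 - norm.cdf(abs(rep_effect / rep_se)))

    # 2. Is the original effect inside the replication's confidence interval?
    ci_low = rep_effect - z_crit * rep_se
    ci_high = rep_effect + z_crit * rep_se
    orig_in_ci = ci_low <= orig_effect <= ci_high

    # 3. Fixed-effect meta-analysis: weight each estimate by 1 / SE^2.
    w_orig, w_rep = 1 / orig_se**2, 1 / rep_se**2
    pooled = (w_orig * orig_effect + w_rep * rep_effect) / (w_orig + w_rep)
    pooled_se = (w_orig + w_rep) ** -0.5
    pooled_p = 2 * (1 - norm.cdf(abs(pooled / pooled_se)))

    return {
        "replication_significant": rep_p < alpha,
        "original_within_replication_ci": orig_in_ci,
        "pooled_effect": pooled,
        "pooled_significant": pooled_p < alpha,
    }


# HYPOTHETICAL numbers, for illustration only.
result = assess_replication(orig_effect=0.8, orig_se=0.4, rep_effect=0.3, rep_se=0.2)
for criterion, value in result.items():
    print(criterion, value)
```

In this made-up case the replication "fails" by the first two criteria, while the pooled evidence still indicates an effect. A yes/no verdict on replication therefore depends on which criterion is applied.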
A false dichotomy
Replication Non-Replication
The emptiness of failed replication (?)
Mitchell J (2014) On the evidentiary emptiness of failed replications
http://jasonmitchell.fas.harvard.edu/Papers/Mitchell_failed_science_2014.pdf
The emptiness of failed replication
Does a failure to replicate mean that the original
result was a false positive? Or was the failed
replication a false negative?
Does successful replication mean that the original
result was correct? Or are both results false positives?
Hidden moderators - Contextual
sensitivity – Tacit knowledge
‘We analyzed 100 replication attempts in psychology and found that the extent to which the research topic was likely to be contextually sensitive (varying in time, culture, or location) was associated with replication success. This relationship remained a significant predictor of replication success even after adjusting for characteristics of the original and replication studies that previously had been associated with replication success (e.g., effect size, statistical power).’
Proc Natl Acad Sci. 2016;113:6454-9.
"Standardization fallacy":
Low external validity, poor reproducibility
Nat Methods. 2009;6:257-61.
Trends Pharmacol Sci. 2016;37:509-10
How likely is strict replication?
p = 0.049 (p < α = 0.05)
Assume that the experimental result is correct, i.e. the measured difference equals the (unknown) treatment effect.
Repeat the experiment under identical conditions (i.e. 'strict replication').
What is the probability of reproducing the significant finding?
50 %!
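The 50 % figure can be checked with a short calculation. The sketch assumes a z-test (normal approximation), the same sample size in the replication, a true effect exactly equal to the observed one, and counts only significant results in the original direction. The function name is hypothetical.

```python
from statistics import NormalDist


def replication_probability(p_original, alpha=0.05):
    """Chance that an identical repeat is significant in the same direction,
    assuming the true effect equals the originally observed effect."""
    norm = NormalDist()
    z_obs = norm.inv_cdf(1 - p_original / 2)  # z-score matching the original two-sided p
    z_crit = norm.inv_cdf(1 - alpha / 2)      # significance threshold
    # The replication z-score is distributed Normal(z_obs, 1) under this assumption.
    return 1 - norm.cdf(z_crit - z_obs)


print(f"p = 0.049 -> {replication_probability(0.049):.2f}")
print(f"p = 0.001 -> {replication_probability(0.001):.2f}")
```

An original result that sits just below the threshold puts the expected replication statistic right at the threshold, so the repeat lands on either side about equally often. Only much smaller original p-values make strict replication likely under these assumptions.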
Replication failure as an indicator of
cutting edge research?
Dirnagl (2017) How likely are your hypotheses, really?
https://dirnagl.com/2017/04/13/how-original-are-your-scientific-hypotheses-really/
The garden of the forking paths
http://bit.ly/2q2gtXq
http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf
http://bit.ly/2JzblTR
Fig. 6y demonstrates…
Brandt et al. Cell Metabolism 27, 2018, 118-135.e8
Resolving the tension:
Discovery & Replication
Suggested reading:
Wagenmakers EJ, Dutilh G, Sarafoglou A.
Perspect Psychol Sci. 2018 Jul;13(4):418-427
Chang and Eng Bunker circa 1865. Photo: Hulton/Getty
No scientific progress without nonreproducibility
To boldly go where no man…
The Good: essential non-reproducibility (Kuhn)
• Exploration at low base rate
• Innovation
• ‘Paradigm shift’
The Bad: detrimental non-reproducibility (Popper)
• Incompetence
• Bad designs
• Tacit knowledge (bad reporting)
• Low validity (bias)
• Misconduct
Reduce Bias!
• Use blinding, randomization, and in/exclusion criteria.
• Report results according to guidelines (e.g. ARRIVE).
Increase Power!
• Check your power. Achieve at least 80%.
• Do a priori sample size calculations.
• You probably need to increase your n's.
• Replicate.
Use statistics sensibly!
• P-values do not provide evidence regarding a model or hypothesis.
• Test statistics are overrated (and overused) in exploration.
• Think biological significance, think effect size.
• Replicate.
Practice Open Science
• Preregister.
• Publish NULL results.
• Make the original data available.
Don’t get lost in the garden of the forking paths
https://dirnagl.com/2018/05/16/can-non-replication-be-a-sin/
https://dirnagl.com/2017/04/13/how-original-are-your-scientific-hypotheses-really/
http://bit.ly/helmholtzdirnagl
@dirnagl

Editor's Notes

  1. Lots of potential biases are lurking that may affect our experimental results. From a very recent study from Malcolm's group we know that randomization, blinding, and conflict of interest statements are still not as prevalent as we might hope; in preclinical stroke research their prevalence is in fact below 40%. I should mention that stroke research is not doing any worse than other fields in neuroscience, for example MS research.
  2. 9
  3. This is particularly true when group sizes are small. This is in fact the case in experimental stroke research....
  4. Let me explain.
  5. Would this have any practical consequence? How different are these designs?
  6. Kant was perhaps the first to realize that the objectivity of scientific statements is closely connected with the construction of theories — with the use of hypotheses and universal statements. Only when certain events recur in accordance with rules or regularities, as is the case with repeatable experiments, can our observations be tested — in principle — by anyone. We do not take even our own observations quite seriously, or accept them as scientific observations, until we have repeated and tested them. Only by such repetitions can we convince ourselves that we are not dealing with a mere isolated ‘coincidence’, but with events which, on account of their regularity and reproducibility, are in principle inter-subjectively testable. Every experimental physicist knows those surprising and inexplicable apparent ‘effects’ which in his laboratory can perhaps even be reproduced for some time, but which finally disappear without trace. Of course, no physicist would say in such a case that he had made a scientific discovery (though he might try to rearrange his experiments so as to make the effect reproducible). Indeed the scientifically significant physical effect may be defined as that which can be regularly reproduced by anyone who carries out the appropriate experiment in the way prescribed. No serious physicist would offer for publication, as a scientific discovery, any such ‘occult effect,’ as I propose to call it — one for whose reproduction he could give no instructions. The ‘discovery’ would be only too soon rejected as chimerical, simply because attempts to test it would lead to negative results. (It follows that any controversy over the question whether events which are in principle unrepeatable and unique ever do occur cannot be decided by science: it would be a metaphysical controversy.) – Karl Popper (1959/2002), The Logic of Scientific Discovery, pp. 23-24.
  7. • Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value. • Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them. • Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena. • Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output. • The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their “degrees of freedom,” for example, by specifying designs in advance. • Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.
  8. Power is irrelevant here, as the experiment is reproduced under identical conditions.
  9. mTORC1 Inactivation Promotes Colitis-Induced Colorectal Cancer but Protects from APC Loss-Dependent Tumorigenesis. Marta Brandt, Tatiana P. Grazioso, Mohamad-Ali Fawal, Krishna S. Tummala, Raul Torres-Ruiz, Sandra Rodriguez-Perales, Cristian Perna, Nabil Djouder (lead contact). DOI: http://dx.doi.org/10.1016/j.cmet.2017.11.006
  10. The Amazing American Story of the Original Siamese Twins: Few newcomers to the U.S. have crossed more daunting barriers than Chang and Eng Bunker.
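
The 50% figure on slide 32 (see also note 8) can be checked with a short calculation. The sketch below is an illustration, not the author's own derivation: it assumes a two-sided z-test with known variance, takes the observed effect as the true effect, and counts a replication as successful if it is significant in the same direction. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist


def replication_probability(p_observed: float, alpha: float = 0.05) -> float:
    """Chance that a strict replication is again significant (same direction).

    Assumes the true effect equals the observed one, so the replicate's
    z statistic is normally distributed around the observed z with unit variance.
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_observed / 2)  # z behind the original p
    z_critical = std_normal.inv_cdf(1 - alpha / 2)       # significance threshold
    # P(replicate z > z_critical) when replicate z ~ N(z_observed, 1)
    return 1 - std_normal.cdf(z_critical - z_observed)


if __name__ == "__main__":
    for p in (0.049, 0.01, 0.001):
        prob = replication_probability(p)
        print(f"original p = {p}: replication probability ~ {prob:.2f}")
```

For an original p of 0.049 the observed z sits almost exactly on the significance threshold, so the replicate lands above or below it with roughly equal probability, hence about 50%. Smaller original p-values give higher, but still far from certain, replication probabilities under the same assumptions.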