"‘Repligate’: reproducibility in statistical
studies. What does it mean and in what
sense does it matter?"
Stephen Senn
(c) Stephen Senn 1
Acknowledgements
Thanks to Rogier Kievit and Richard Morey for the invitation and to APS for support
This work is partly supported by the European Union’s 7th Framework Programme for
research, technological development and demonstration under grant agreement no.
602552. “IDEAL”
Outline
• The crisis of replication?
• A brief history of P-values
• What are we looking for in replication?
• Empirical evidence
• Conclusions
The Crisis of Replication
In countless tweets….The “replication police” were described as “shameless little
bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not
designed to find truth,” “second stringers” who were incapable of making novel
contributions of their own to the literature, and—most succinctly—“assholes.”
Why Psychologists’ Food Fight Matters
“Important findings” haven’t been replicated, and science may have to change
its ways.
By Michelle N. Meyer and Christopher Chabris, Science
Ioannidis (2005)
• Claimed that most
published research
findings are wrong
– By finding he means a
‘positive’ result
• 2764 citations by 18
May 2015 according to
Google Scholar
Colquhoun’s Criticisms
“If you want to avoid making a fool of yourself very often, do not regard anything greater than
p <0.001 as a demonstration that you have discovered something. Or, slightly less stringently,
use a three-sigma rule.”
Royal Society Open Science 2014
A Common Story
• Scientists were treading the path of Bayesian
reason
• Along came RA Fisher and persuaded them into a
path of P-value madness
• This is responsible for a lot of unrepeatable
nonsense
• We need to return them to the path of Bayesian
virtue
• In fact the history is not like this and
understanding this is a key to understanding the
problem
Fisher, Statistical Methods for Research Workers, 1925
The real history
• Scientists before Fisher were using tail area
probabilities to calculate posterior probabilities
• Fisher pointed out that this interpretation was
unsafe and offered a more conservative one
• Jeffreys, influenced by CD Broad’s criticism, was
unsatisfied with the Laplacian framework and
used a lump prior probability on a point
hypothesis being true
• It is Bayesian Jeffreys versus Bayesian Laplace
that makes the dramatic difference, not
frequentist Fisher versus Bayesian Laplace
Why the difference?
• Imagine a point estimate of two standard
errors
• Now consider the likelihood ratio for a given
value of the parameter, θ, under the
alternative to one under the null
– Dividing hypothesis (smooth prior): for any given
value θ = θ′ compare to θ = −θ′
– Plausible hypothesis (lump prior): for any given
value θ = θ′ compare to θ = 0
A speculation of mine
• Scientists had noticed that for dividing
hypotheses they could get ‘significance’ rather
easily
– The result is the 1/20 rule
• However when deciding to choose a new
parameter or not in terms of probability it is 50%
not 5% that is relevant
• This explains the baffling finding that significance
tests are actually more conservative than AIC (and
sometimes than BIC)
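A rough way to see the comparison with AIC and BIC is to ask what significance level each criterion implies when it decides whether to add one parameter. The sketch below assumes nested models differing by a single parameter and the usual large-sample chi-square (1 df) approximation for the likelihood-ratio statistic. AIC then adds the parameter when the statistic exceeds 2, and BIC when it exceeds ln(n). The function names are illustrative.

```python
import math
from statistics import NormalDist

def two_sided_p_from_chi2_1df(chi2):
    """Tail probability of a chi-square (1 df) statistic, via the normal distribution."""
    z = math.sqrt(chi2)
    return 2.0 * (1.0 - NormalDist().cdf(z))

def aic_implied_alpha():
    """AIC prefers the larger model when the LR statistic exceeds 2."""
    return two_sided_p_from_chi2_1df(2.0)

def bic_implied_alpha(n):
    """BIC prefers the larger model when the LR statistic exceeds ln(n)."""
    return two_sided_p_from_chi2_1df(math.log(n))

print("AIC implied level:", round(aic_implied_alpha(), 3))
for n in (20, 50, 100, 1000):
    print("BIC implied level, n =", n, ":", round(bic_implied_alpha(n), 3))
```

On this approximation the level implied by AIC is well above 5%, and the level implied by BIC is above 5% only for small samples, since ln(n) is below the 5% critical value of about 3.84 only when n is under roughly 47.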
The situations compared
Goodman’s Criticism
• What is the probability of repeating a result
that is just significant at the 5% level (p=0.05)?
• Answer 50%
– If true difference is observed difference
– If uninformative prior for true treatment effect
• Therefore P-values are unreliable as inferential
aids
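The 50% figure can be checked with a short sketch. It assumes a normally distributed test statistic, a second experiment of exactly the same size, and that "repeating" means reaching two-sided significance in the same direction. The function name and its arguments are illustrative.

```python
from statistics import NormalDist

def replication_probability(z_obs, alpha=0.05, flat_prior=False):
    """Probability that a same-sized second experiment is significant in the
    same direction, given the first experiment's z statistic.

    flat_prior=False: the true effect is taken to equal the observed one,
                      so the second z statistic is Normal(z_obs, 1).
    flat_prior=True:  an uninformative prior on the true effect, so the
                      predictive distribution is Normal(z_obs, sqrt(2)).
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    predictive_sd = 2.0 ** 0.5 if flat_prior else 1.0
    return 1.0 - NormalDist(mu=z_obs, sigma=predictive_sd).cdf(z_crit)

z_just_significant = NormalDist().inv_cdf(0.975)  # two-sided p = 0.05
print(replication_probability(z_just_significant))
print(replication_probability(z_just_significant, flat_prior=True))
```

In both cases the predictive distribution is centred on the critical value itself, so half of its mass lies beyond it. That is why the answer is 50% under either assumption when the first result is only just significant.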
Sauce for the Goose and Sauce for the
Gander
• This property is shared by Bayesian
statements
– It follows from the Martingale property of
Bayesian forecasts
• Hence, either
– The property is undesirable and hence is a
criticism of Bayesian methods also
– Or it is desirable and is a point in favour of
frequentist methods
Three Possible Questions
• Q1 What is the probability that in a future experiment,
taking that experiment's results alone, the estimate for B
would after all be worse than that for A?
• Q2 What is the probability, having conducted this
experiment, and pooled its results with the current one,
we would show that the estimate for B was, after all,
worse than that for A?
• Q3 What is the probability that having conducted a future
experiment and then calculated a Bayesian posterior using
a uniform prior and the results of this second experiment
alone, the probability that B would be worse than A
would be less than or equal to 0.05?
Why Goodman’s Criticism is Irrelevant
“It would be absurd if our inferences about the world, having just completed a
clinical trial, were necessarily dependent on assuming the following. 1. We are
now going to repeat this experiment. 2. We are going to repeat it only once. 3.
It must be exactly the same size as the experiment we have just run. 4. The
inferential meaning of the experiment we have just run is the extent to which
it predicts this second experiment.”
Senn, 2002
A Paradox of Bayesian Significance
Tests
Two scientists start with the same probability 0.5 that a drug is effective.
Given that it is effective they have the same prior for how effective it is.
If it is not effective A believes that it will be useless but B believes that it may
be harmful.
Having seen the same data B now believes that it is useful with probability
0.95 and A believes that it is useless with probability 0.95.
A Tale of Two priors
In Other Words
The probability is 0.95
And the probability is also 0.05
Both of these probabilities can be simultaneously true.
NB This is not illogical but it is illogical to regard this sort of thing as proving that
p-values are illogical
‘…would require that a procedure is dismissed because, when combined
with information which it doesn’t require and which may not exist, it
disagrees with a procedure that disagrees with itself’
Senn, 2000
Are most research findings false?
• A dram of data is worth a pint of pontification
• Two interesting studies recently
– The Many Labs replication project
• This raised the Twitter storm alluded to earlier
– Jager & Leek, Biostatistics 2014
Many Labs Replication Project
Klein et al, Social
Psychology, 2014
Jager & Leek, 2014
• Text-mining of 77,410
abstracts yielded 5,322
P-values
• Considered a mixture
model truncated at 0.05
• Estimated that amongst
‘discoveries’ 14% are
false
$$f(p \mid a, b, \pi_0) = \pi_0\,\mathrm{uniform}(0, 0.05) + (1 - \pi_0)\,\mathrm{tBeta}(a, b, 0.05)$$
Estimation using the EM algorithm
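A minimal sketch of the mixture density and of the quantity an EM E-step would compute (the probability that a given significant P-value comes from the null component) is given below. It is not Jager and Leek's code. The parameter values are arbitrary illustrations rather than their estimates, the function names are hypothetical, and the truncated Beta is normalised by simple numerical integration.

```python
import math

THRESHOLD = 0.05  # P-values are only modelled below this cut-off

def beta_kernel_integral(upper, a, b, steps=20000):
    """Integral of p**(a-1) * (1-p)**(b-1) over (0, upper).

    Uses the substitution u = p**a, which removes the singularity at 0
    when a < 1, followed by the midpoint rule.
    """
    u_max = upper ** a
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (1.0 - u ** (1.0 / a)) ** (b - 1.0)
    return total * h / a

def truncated_beta_pdf(p, a, b, norm):
    """Beta(a, b) kernel truncated to (0, THRESHOLD); norm is its integral there."""
    if not 0.0 < p < THRESHOLD:
        return 0.0
    return p ** (a - 1.0) * (1.0 - p) ** (b - 1.0) / norm

def mixture_pdf(p, a, b, pi0, norm):
    """pi0 * uniform(0, 0.05) + (1 - pi0) * tBeta(a, b, 0.05)."""
    uniform_pdf = 1.0 / THRESHOLD if 0.0 < p < THRESHOLD else 0.0
    return pi0 * uniform_pdf + (1.0 - pi0) * truncated_beta_pdf(p, a, b, norm)

def null_responsibility(p, a, b, pi0, norm):
    """E-step quantity: probability that p arose from the null (uniform) component."""
    return pi0 * (1.0 / THRESHOLD) / mixture_pdf(p, a, b, pi0, norm)

# Illustrative parameter values only (assumptions, not fitted estimates).
a, b, pi0 = 0.5, 10.0, 0.14
norm = beta_kernel_integral(THRESHOLD, a, b)
for p in (0.001, 0.01, 0.04):
    print(p, round(null_responsibility(p, a, b, pi0, norm), 3))
```

With a < 1 the truncated Beta component piles up near zero, so under these assumed values a very small P-value is attributed mostly to the alternative component and one close to 0.05 is attributed more to the null.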
But one must be careful
• These studies suggest that a common
threshold of 5% seems to be associated with a
manageable false positive rate
• This does not mean that the threshold is right
– It might reflect (say) that most P-values are either
>0.05 or <<0.05
– The situation might be capable of improvement
using a different threshold
• Also, are false negatives without cost?
My Conclusion
• P-values per se are not the problem
• There may be a harmful culture of ‘significance’
however this is defined
• P-values have a limited use as rough and ready
tools using little structure
• Where you have more structure you can often do
better
– Likelihood, Bayes etc
– Point estimates and standard errors are extremely
useful for future research synthesizers and should be
provided regularly

More Related Content

What's hot

Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performancejemille6
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversiesjemille6
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...jemille6
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."jemille6
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyjemille6
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversyjemille6
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayojemille6
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...jemille6
 
Exploratory Research is More Reliable Than Confirmatory Research
Exploratory Research is More Reliable Than Confirmatory ResearchExploratory Research is More Reliable Than Confirmatory Research
Exploratory Research is More Reliable Than Confirmatory Researchjemille6
 
Discussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardDiscussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardChristian Robert
 
beyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperbeyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperChristian Robert
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma jemille6
 
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...jemille6
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsificationjemille6
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)jemille6
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Sciencejemille6
 
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...jemille6
 
D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1jemille6
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Warsjemille6
 

What's hot (20)

Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversy
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayo
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
 
Exploratory Research is More Reliable Than Confirmatory Research
Exploratory Research is More Reliable Than Confirmatory ResearchExploratory Research is More Reliable Than Confirmatory Research
Exploratory Research is More Reliable Than Confirmatory Research
 
Discussion a 4th BFFF Harvard
Discussion a 4th BFFF HarvardDiscussion a 4th BFFF Harvard
Discussion a 4th BFFF Harvard
 
beyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paperbeyond objectivity and subjectivity; a discussion paper
beyond objectivity and subjectivity; a discussion paper
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
 
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
D. G. Mayo: The Replication Crises and its Constructive Role in the Philosoph...
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
 
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
Mayo: Evidence as Passing a Severe Test (How it Gets You Beyond the Statistic...
 
D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1D.G. Mayo Slides LSE PH500 Meeting #1
D.G. Mayo Slides LSE PH500 Meeting #1
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Wars
 
Bayes rpp bristol
Bayes rpp bristolBayes rpp bristol
Bayes rpp bristol
 

Similar to Senn repligate

P values and replication
P values and replicationP values and replication
P values and replicationStephen Senn
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...StephenSenn2
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 
What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliryStephen Senn
 
P values and the art of herding cats
P values  and the art of herding catsP values  and the art of herding cats
P values and the art of herding catsStephen Senn
 
Trends towards significance
Trends towards significanceTrends towards significance
Trends towards significanceStephenSenn2
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3Stephen Senn
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
Statistics in clinical and translational research common pitfalls
Statistics in clinical and translational research  common pitfallsStatistics in clinical and translational research  common pitfalls
Statistics in clinical and translational research common pitfallsPavlos Msaouel, MD, PhD
 
The Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansThe Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansStephen Senn
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500jemille6
 
Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. EPINOR
 
Real world modified
Real world modifiedReal world modified
Real world modifiedStephen Senn
 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyMatti Heino
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets PopperStephen Senn
 
Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxStephenSenn3
 

Similar to Senn repligate (20)

P values and replication
P values and replicationP values and replication
P values and replication
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliry
 
P values and the art of herding cats
P values  and the art of herding catsP values  and the art of herding cats
P values and the art of herding cats
 
Trends towards significance
Trends towards significanceTrends towards significance
Trends towards significance
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
Statistics in clinical and translational research common pitfalls
Statistics in clinical and translational research  common pitfallsStatistics in clinical and translational research  common pitfalls
Statistics in clinical and translational research common pitfalls
 
P value wars
P value warsP value wars
P value wars
 
The Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansThe Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective Statisticians
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500
 
Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error.
 
On being Bayesian
On being BayesianOn being Bayesian
On being Bayesian
 
P-values in crisis
P-values in crisisP-values in crisis
P-values in crisis
 
Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis
 
Real world modified
Real world modifiedReal world modified
Real world modified
 
Crisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychologyCrisis of confidence, p-hacking and the future of psychology
Crisis of confidence, p-hacking and the future of psychology
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
 
Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptx
 

More from jemille6

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 

More from jemille6 (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Senn repligate

  • 1. "‘Repligate’: reproducibility in statistical studies. What does it mean and in what sense does it matter?" Stephen Senn (c) Stephen Senn 1
  • 2. Acknowledgements: Thanks to Rogier Kievit and Richard Morey for the invitation and to APS for support. This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
  • 3. Outline • The crisis of replication? • A brief history of P-values • What are we looking for in replication? • Empirical evidence • Conclusions
  • 4. The Crisis of Replication In countless tweets…. The “replication police” were described as “shameless little bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not designed to find truth,” “second stringers” who were incapable of making novel contributions of their own to the literature, and—most succinctly—“assholes.” Why Psychologists’ Food Fight Matters: “Important findings” haven’t been replicated, and science may have to change its ways. By Michelle N. Meyer and Christopher Chabris, Science
  • 5. Ioannidis (2005) • Claimed that most published research findings are wrong – By finding he means a ‘positive’ result • 2764 citations by 18 May 2015 according to Google Scholar
  • 6. Colquhoun’s Criticisms “If you want to avoid making a fool of yourself very often, do not regard anything greater than p <0.001 as a demonstration that you have discovered something. Or, slightly less stringently, use a three-sigma rule.” Royal Society Open Science 2014
  • 7. A Common Story • Scientists were treading the path of Bayesian reason • Along came RA Fisher and persuaded them into a path of P-value madness • This is responsible for a lot of unrepeatable nonsense • We need to return them to the path of Bayesian virtue • In fact the history is not like this and understanding this is a key to understanding the problem
  • 8. Fisher, Statistical Methods for Research Workers, 1925
  • 9. The real history • Scientists before Fisher were using tail area probabilities to calculate posterior probabilities • Fisher pointed out that this interpretation was unsafe and offered a more conservative one • Jeffreys, influenced by CD Broad’s criticism, was unsatisfied with the Laplacian framework and used a lump prior probability on a point hypothesis being true • It is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic difference, not frequentist Fisher versus Bayesian Laplace
  • 11. Why the difference? • Imagine a point estimate of two standard errors • Now consider the likelihood ratio for a given value of the parameter under the alternative to one under the null – Dividing hypothesis (smooth prior): for any given parameter value, compare it to the value of equal size and opposite sign – Plausible hypothesis (lump prior): for any given parameter value, compare it to zero
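The comparison on slide 11 can be sketched numerically. This is a minimal illustration, not the author's own calculation: it assumes a normally distributed estimate with known standard error 1, and the names `theta`, `lr_dividing` and `lr_plausible` are hypothetical labels introduced here.

```python
import math

def normal_log_lik(estimate, theta, se=1.0):
    """Normal log-likelihood of theta, up to an additive constant (assumed model)."""
    return -0.5 * ((estimate - theta) / se) ** 2

def lr_dividing(estimate, theta, se=1.0):
    """Likelihood ratio of theta versus -theta (dividing hypothesis)."""
    return math.exp(normal_log_lik(estimate, theta, se)
                    - normal_log_lik(estimate, -theta, se))

def lr_plausible(estimate, theta, se=1.0):
    """Likelihood ratio of theta versus 0 (plausible point null)."""
    return math.exp(normal_log_lik(estimate, theta, se)
                    - normal_log_lik(estimate, 0.0, se))

estimate = 2.0  # a point estimate of two standard errors
for theta in (0.5, 1.0, 2.0, 4.0):
    print(theta,
          round(lr_dividing(estimate, theta), 2),
          round(lr_plausible(estimate, theta), 2))
```

Under these assumptions the ratio against the mirror-image value equals exp(4·theta) and grows without bound, whereas the ratio against zero peaks at exp(2), roughly 7.4, when theta equals the estimate. The same data therefore look far less decisive against a point null.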
  • 12. A speculation of mine • Scientists had noticed that for dividing hypotheses they could get ‘significance’ rather easily – The result is the 1/20 rule • However when deciding to choose a new parameter or not in terms of probability it is 50% not 5% that is relevant • This explains the baffling finding that significance tests are actually more conservative than AIC (and sometimes than BIC)
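The remark about AIC and BIC on slide 12 can be checked with a short calculation. The sketch below assumes two nested models differing by one parameter and the usual large-sample chi-square (1 df) approximation for the likelihood-ratio statistic; the function names are hypothetical. AIC then prefers the larger model when the statistic exceeds 2, and BIC when it exceeds ln(n).

```python
import math

def chi2_1_tail(c):
    """P(chi-square with 1 df > c), computed from the normal tail."""
    return math.erfc(math.sqrt(c / 2.0))

def implied_alpha_aic():
    """Type I error rate implied by adding one parameter whenever AIC improves."""
    return chi2_1_tail(2.0)

def implied_alpha_bic(n):
    """Same for BIC, whose penalty per parameter is ln(n) for sample size n."""
    return chi2_1_tail(math.log(n))

print(round(implied_alpha_aic(), 3))
for n in (20, 50, 100, 1000):
    print(n, round(implied_alpha_bic(n), 3))
```

On these assumptions AIC behaves like a test at a level of about 0.157, well above 0.05, and BIC is less conservative than a 5% test only for small samples (n below about 47, where ln(n) is less than 3.84).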
  • 13. The situations compared
  • 14. Goodman’s Criticism • What is the probability of repeating a result that is just significant at the 5% level (p=0.05)? • Answer 50% – If true difference is observed difference – If uninformative prior for true treatment effect • Therefore P-values are unreliable as inferential aids
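The 50% answer on slide 14 follows from a symmetry argument that a few lines of code make explicit. This sketch assumes a standardized test statistic z with unit standard error, a second experiment of the same size, and a two-sided 5% critical value of 1.96; `replication_prob` is a hypothetical helper, not something taken from Goodman or from the slides.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def replication_prob(z_obs, z_crit=1.96, prior="fixed"):
    """P(second experiment's z >= z_crit) given the first experiment's z_obs.

    prior="fixed":   true effect assumed equal to the observed one, so z2 ~ N(z_obs, 1)
    prior="uniform": uninformative prior on the effect, so z2 ~ N(z_obs, 2)
    """
    if prior == "fixed":
        sd = 1.0
    elif prior == "uniform":
        sd = math.sqrt(2.0)
    else:
        raise ValueError("prior must be 'fixed' or 'uniform'")
    return 1.0 - phi((z_crit - z_obs) / sd)

for prior in ("fixed", "uniform"):
    print(prior, replication_prob(1.96, prior=prior))
```

In both cases the predictive distribution of the second statistic is centred on the observed value, so a result sitting exactly on the threshold is repeated with probability one half, whatever the spread.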
  • 15. Sauce for the Goose and Sauce for the Gander • This property is shared by Bayesian statements – It follows from the Martingale property of Bayesian forecasts • Hence, either – The property is undesirable and hence is a criticism of Bayesian methods also – Or it is desirable and is a point in favour of frequentist methods
  • 16. Three Possible Questions • Q1 What is the probability that in a future experiment, taking that experiment's results alone, the estimate for B would after all be worse than that for A? • Q2 What is the probability, having conducted this experiment, and pooled its results with the current one, we would show that the estimate for B was, after all, worse than that for A? • Q3 What is the probability that having conducted a future experiment and then calculated a Bayesian posterior using a uniform prior and the results of this second experiment alone, the probability that B would be worse than A would be less than or equal to 0.05?
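The three questions on slide 16 have different answers even under identical assumptions, which a small calculation can show. The sketch below is an illustration under assumptions not stated on the slide: the first experiment gives a standardized estimate `z1` of B minus A with unit standard error that is just significant one-sided at 5%, the prior is uniform, and the future experiment has the same size, so its estimate has predictive distribution N(z1, 2). All function names are hypothetical.

```python
import math

Z_05 = 1.6448536269514722  # one-sided 5% point of the standard normal

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def q1_future_estimate_reversed(z1):
    """Q1: P(future estimate alone is below zero)."""
    return phi(-z1 / math.sqrt(2.0))

def q2_pooled_estimate_reversed(z1):
    """Q2: P(equal-weight pooled estimate is below zero), i.e. P(z2 < -z1)."""
    return phi(-2.0 * z1 / math.sqrt(2.0))

def q3_future_posterior_significant(z1):
    """Q3: P(future posterior P(B worse than A) <= 0.05), i.e. P(z2 >= Z_05)."""
    return 1.0 - phi((Z_05 - z1) / math.sqrt(2.0))

z1 = Z_05  # first result just significant (assumed for illustration)
print(round(q1_future_estimate_reversed(z1), 3))
print(round(q2_pooled_estimate_reversed(z1), 3))
print(round(q3_future_posterior_significant(z1), 3))
```

With these assumptions the three probabilities come out at roughly 0.12, 0.01 and 0.5 respectively, so which one is quoted as "the probability of replication" depends entirely on which question is being asked.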
  • 18. Why Goodman’s Criticism is Irrelevant “It would be absurd if our inferences about the world, having just completed a clinical trial, were necessarily dependent on assuming the following. 1. We are now going to repeat this experiment. 2. We are going to repeat it only once. 3. It must be exactly the same size as the experiment we have just run. 4. The inferential meaning of the experiment we have just run is the extent to which it predicts this second experiment.” Senn, 2002
  • 19. A Paradox of Bayesian Significance Tests Two scientists start with the same probability 0.5 that a drug is effective. Given that it is effective they have the same prior for how effective it is. If it is not effective A believes that it will be useless but B believes that it may be harmful. Having seen the same data B now believes that it is useful with probability 0.95 and A believes that it is useless with probability 0.95.
  • 20. A Tale of Two priors
  • 21. In Other Words The probability is 0.95 And the probability is also 0.05 Both of these probabilities can be simultaneously true. NB This is not illogical but it is illogical to regard this sort of thing as proving that p-values are illogical ‘…would require that a procedure is dismissed because, when combined with information which it doesn’t require and which may not exist, it disagrees with a procedure that disagrees with itself’ Senn, 2000
  • 22. Are most research findings false? • A dram of data is worth a pint of pontification • Two interesting studies recently – The many labs replications project • This raised the Twitter storm alluded to earlier – Jager & Leek, Biostatistics 2014
  • 23. Many Labs Replication Project Klein et al, Social Psychology, 2014
  • 24. Jager & Leek, 2014 • Text-mining of 77,410 abstracts yielded 5,322 P-values • Considered a mixture model truncated at 0.05 • Estimated that amongst ‘discoveries’ 14% are false • $f(p \mid a, b, \pi_0) = \pi_0\,\mathrm{uniform}(0, 0.05) + (1 - \pi_0)\,\mathrm{tBeta}(a, b, 0.05)$ • Estimation using the EM algorithm
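The mixture on slide 24 combines a uniform density on (0, 0.05] with weight π0 and a Beta(a, b) density truncated at 0.05 with weight 1 − π0. The sketch below evaluates that density and the responsibility of the uniform (false discovery) component that an EM E-step would use. It is an illustrative implementation rather than Jager & Leek's code: the series used for the truncation constant, the helper names, and the parameter values in the usage line are all assumptions made here.

```python
import math

T = 0.05  # truncation point of the mixture

def log_beta_fn(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_cdf_small_x(x, a, b, terms=200):
    """Beta(a, b) CDF at x via a hypergeometric series; adequate for small x."""
    series, term = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) * (1.0 - b + n) / ((a + 1.0 + n) * (n + 1.0)) * x
        series += term
    return math.exp(a * math.log(x) - log_beta_fn(a, b)) / a * series

def tbeta_pdf(p, a, b):
    """Density of Beta(a, b) truncated to (0, T]."""
    if not 0.0 < p <= T:
        return 0.0
    log_pdf = (a - 1.0) * math.log(p) + (b - 1.0) * math.log1p(-p) - log_beta_fn(a, b)
    return math.exp(log_pdf) / beta_cdf_small_x(T, a, b)

def mixture_pdf(p, a, b, pi0):
    """pi0 * uniform(0, T) + (1 - pi0) * tBeta(a, b, T), evaluated at p."""
    if not 0.0 < p <= T:
        return 0.0
    return pi0 / T + (1.0 - pi0) * tbeta_pdf(p, a, b)

def prob_false_given_p(p, a, b, pi0):
    """E-step responsibility: P(p came from the uniform component | p)."""
    return (pi0 / T) / mixture_pdf(p, a, b, pi0)

# Illustrative values only: a and b are made up, pi0 = 0.14 echoes the slide.
print(prob_false_given_p(0.04, a=0.5, b=10.0, pi0=0.14))
```

A full EM fit would alternate this E-step with a weighted maximum-likelihood update of `a`, `b` and `pi0`; that M-step is left out here to keep the sketch short.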
  • 25. But one must be careful • These studies suggest that a common threshold of 5% seems to be associated with a manageable false positive rate • This does not mean that the threshold is right – It might reflect (say) that most P-values are either >0.05 or <<0.05 – The situation might be capable of improvement using a different threshold • Also, are false negatives without cost?
  • 26. My Conclusion • P-values per se are not the problem • There may be a harmful culture of ‘significance’ however this is defined • P-values have a limited use as rough and ready tools using little structure • Where you have more structure you can often do better – Likelihood, Bayes etc – Point estimates and standard errors are extremely useful for future research synthesizers and should be provided regularly