P Value wars
Stephen Senn
(c) Stephen Senn 2017 1P value wars
Acknowledgements
Thanks to the EMA for inviting me and to Olivier Collignon for organizing it
This work is partly supported by the European Union’s 7th Framework Programme for
research, technological development and demonstration under grant agreement no.
602552. “IDEAL”
Outline
• Basics
– The difference between probability and statistics
– A simple example
• Recent criticisms of P-values/significance
– Ioannidis
– Repligate and other psychology fights
– ASA statement
• A brief history of P-values
– The real (hidden) reason why P-values are so
controversial
• Conclusions
BASICS
P-values, Bayes The meaning of life
Probability versus Statistics
an example
Probability theory
• Q. This die is fair, what is
the probability that I will get
two sixes in two rolls of the
die
• A. $\left(\frac{1}{6}\right)^{2} = \frac{1}{36}$
Statistical theory
• Q I rolled this die twice and
got two sixes. What is the
probability that the die is
fair?
• A. Nobody knows
Probabilists versus Statisticians:
the difference
Probabilists
• Are mathematicians
• Deal with direct probability
statements
• Play a formal mathematical
game
• Are involved in the divine
– God knows that the die is fair;
let’s work out the
consequences
Statisticians
• Are scientists
• Deal with inverse
probability statements
• Are dealing with the real
world
• Are involved in the human
– We are down here trying to
work out the mind of God
An Example
My compact disc (CD) player* allowed me to
play tracks in sequential order by pressing play
or in random order by pressing shuffle.
One day I was playing the CD Hysteria by Def Leppard. This CD has 12 tracks.
I thought that I had pressed the shuffle button but the first track played was
‘Women’, which is the first track on the CD.
Q. What is the probability that I did, in fact, press the shuffle button as
intended?
*I now have an iPod nano
That Heavy Metal Problem
The Bayesian Solution
We have two basic hypotheses:
1) I pressed shuffle.
2) I pressed play.
First we have to establish a so-called prior probability for these
hypotheses: a probability before seeing the evidence.
Suppose that the probability that I press the shuffle button when I
mean to press the shuffle button is 9/10. The probability of
making a mistake and pressing the play button is then 1/10.
Next we establish probabilities of events given theories. These
particular sorts of probabilities are referred to as likelihoods, a
term due to RA Fisher (1890-1962).
If I pressed shuffle, then the probability that the first track will be
‘Women’ (W) is 1/12. If I pressed play, then the probability that
the first track is W is 1.
For completeness (although it is not necessary for the solution)
we consider the likelihoods had any other track apart from
‘Women’ (say X) been played.
If I pressed shuffle then the probability of X is 11/12. If I pressed
play then this probability is 0.
We can put this together as follows
Hypothesis   Prior probability P   Evidence   Likelihood L   P x L
Shuffle      9/10                  W          1/12           9/120
Shuffle      9/10                  X          11/12          99/120
Play         1/10                  W          1              12/120
Play         1/10                  X          0              0
TOTAL                                                        120/120 = 1
After seeing (hearing) the evidence, however, only
two rows remain
Hypothesis   Prior probability P   Evidence   Likelihood L   P x L
Shuffle      9/10                  W          1/12           9/120
Shuffle      9/10                  X          11/12          99/120   (ruled out)
Play         1/10                  W          1              12/120
Play         1/10                  X          0              0        (ruled out)
TOTAL (W rows only)                                          21/120
The probabilities of the two cases which remain do not add up
to 1.
However, since these two cases cover all the possibilities which
remain, their combined probability must be 1.
Therefore we rescale the individual probabilities to make them
add to 1.
We can do this without changing their relative value by dividing
by their total, 21/120.
This has been done in the table below.
So we rescale by dividing by the total probability
Hypothesis   Prior probability P   Evidence   Likelihood L   P x L     Posterior probability
Shuffle      9/10                  W          1/12           9/120     (9/120)/(21/120) = 9/21
Shuffle      9/10                  X          11/12          99/120    -
Play         1/10                  W          1              12/120    (12/120)/(21/120) = 12/21
Play         1/10                  X          0              0         -
TOTAL                                                        21/120    21/21 = 1
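
The rescaling step can be checked with exact fractions. The following is a minimal sketch using the prior (9/10, 1/10) and likelihoods (1/12, 1) stated in the slides; the variable names are illustrative only.

```python
from fractions import Fraction

# Prior probabilities from the slides: shuffle 9/10, play 1/10.
prior = {"shuffle": Fraction(9, 10), "play": Fraction(1, 10)}

# Likelihood of hearing track 1 (W) first under each hypothesis.
likelihood_w = {"shuffle": Fraction(1, 12), "play": Fraction(1)}

# Prior x likelihood for the two rows that are consistent with the evidence.
joint = {h: prior[h] * likelihood_w[h] for h in prior}

# Total probability of the evidence: 9/120 + 12/120 = 21/120.
total = sum(joint.values())

# Rescale so that the remaining probabilities add to 1.
posterior = {h: joint[h] / total for h in joint}

print(total)                                     # 7/40, i.e. 21/120
print(posterior["shuffle"], posterior["play"])   # 3/7 4/7, i.e. 9/21 and 12/21
```

Fractions are printed in lowest terms, so 9/21 appears as 3/7 and 12/21 as 4/7.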
What is the difference between the
Bayesian probability and the P-value?
Bayesian
• It is the probability that the
null hypothesis is true given
the evidence
– It is my posterior
probability that I pressed
shuffle
• The value is 9/21
P-value
• It is the probability of the
evidence given the null
hypothesis
– Usually more extreme
evidence must be
included
• The value is 1/12
Extremism
• For many practical
applications every result
is unlikely
• Example: the probability
that the mean difference
in blood pressure is
exactly 5mmHg is
effectively zero
• This has led to calculation
of probability of result as
extreme or more extreme
to act as ‘p-value’
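
As a rough illustration of the "as extreme or more extreme" idea, the sketch below computes a two-sided tail area. It assumes, purely for illustration, that the test statistic is a standard Normal z-score; the slide names no particular distribution, and `two_sided_p` is a hypothetical helper.

```python
import math

def two_sided_p(z):
    """Tail area P(|Z| >= |z|) for a standard Normal Z (assumed model)."""
    return math.erfc(abs(z) / math.sqrt(2))

# The probability of any exact value of a continuous statistic is zero,
# so the tail area beyond the observed value is reported instead.
print(two_sided_p(1.96))  # close to 0.05
print(two_sided_p(2.0))   # roughly 0.0455
```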
NB Probability statements are not
reversible
• Is the Pope a Catholic?
– Yes
• Is a Catholic the Pope?
– Probably not
• The probability of the evidence given the
hypothesis is not the same as the probability
of the hypothesis given the evidence
– As we saw from the CD player example
To sum up
• The Bayesian approach provides a complete
and logical solution
• But it requires prior probabilities
• You can’t just use arguments of symmetry
• Different people will come to different
conclusions
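
To see how the conclusion depends on the prior, here is a small sketch that repeats the CD player calculation for several prior probabilities of having pressed shuffle. Only 9/10 comes from the slides; 1/2 and 99/100 are hypothetical values chosen for illustration, and `posterior_shuffle` is an illustrative helper.

```python
from fractions import Fraction

def posterior_shuffle(prior_shuffle, n_tracks=12):
    """Posterior probability of 'shuffle' given that track 1 played first."""
    joint_shuffle = prior_shuffle * Fraction(1, n_tracks)  # likelihood 1/12
    joint_play = (1 - prior_shuffle) * 1                    # likelihood 1
    return joint_shuffle / (joint_shuffle + joint_play)

# Same evidence, different priors, different posteriors.
for p in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    print(p, "->", posterior_shuffle(p))
# 1/2 -> 1/13
# 9/10 -> 3/7
# 99/100 -> 33/37
```

The probability of the evidence under the shuffle hypothesis stays at 1/12 throughout; only the posterior moves with the prior.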
RECENT CRITICISMS
Ioannidis, Repligate and the ASA statement
Ioannidis (2005)
• Claimed that most
published research
findings are wrong
– By finding he means a
‘positive’ result
• 4380 citations by 16
February 2017 according
to Google Scholar
$$TPR = \frac{R(1-\beta)/(1+R)}{\left[\alpha + R(1-\beta)\right]/(1+R)} = \frac{R(1-\beta)}{\alpha + R(1-\beta)}$$
(TPR = True Positive Rate)
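
A short numerical sketch of this formula follows. It reads R as the prior odds that a tested relationship is real, alpha as the type I error rate and beta as the type II error rate, as in Ioannidis's paper; the function name and the example values are assumptions made for illustration.

```python
def true_positive_rate(alpha, beta, R):
    """Share of 'positive' findings that are true, per the formula above."""
    true_pos = R * (1 - beta) / (1 + R)   # real effects that are detected
    false_pos = alpha / (1 + R)           # null effects declared significant
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs: alpha = 0.05, power = 0.8, varying prior odds R.
for R in (1.0, 0.1, 0.01):
    print(R, round(true_positive_rate(0.05, 0.2, R), 3))
# 1.0 0.941
# 0.1 0.615
# 0.01 0.138
```

The factor 1/(1+R) cancels, which is why the simplified right-hand side gives the same values.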
The Crisis of Replication
In countless tweets….The “replication police” were described as “shameless little
bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not
designed to find truth,” “second stringers” who were incapable of making novel
contributions of their own to the literature, and—most succinctly—“assholes.”
Why Psychologists’ Food Fight Matters
“Important findings” haven’t been replicated, and science may have to change
its ways.
By Michelle N. Meyer and Christopher Chabris, Science
Many Labs Replication Project
Klein et al., Social Psychology, 2014
The ASA statement
1. P-values can indicate how incompatible the data are with a specified
statistical model.
2. P-values do not measure the probability that the studied hypothesis is true,
or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based
only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or
the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a
model or hypothesis.
Obligatory purloined cartoon
Sympathy for Ron Wasserstein
In summary
• There has been mounting criticism of ‘null-
hypothesis-significance-testing’ (NHST) and P-
values in particular
• Some people think that ‘science is broke’
• Some people claim that P-values are to blame
– They give ‘significance’ far too easily
• There is a lot of advice, much of it conflicting,
flying around
An Example of the Problem
“If you want to avoid making
a fool of yourself very often,
do not regard anything
greater than p <0.001 as a
demonstration that you
have discovered something.
Or, slightly less stringently,
use a three-sigma rule.”
David Colquhoun
Royal Society Open Science
2014
In general, P values larger
than 0.01 should be
reported to two decimal
places, those between 0.01
and 0.001 to three decimal
places; P values smaller
than 0.001 should be
reported as P<0.001
New England Journal of
Medicine guidelines to
authors
A BRIEF HISTORY OF P-VALUES
Did Fisher really teach scientists to fish for significance?
The collective noun for statisticians
is “a quarrel”
John Tukey
A Common Story
• Scientists were treading the path of Bayesian
reason
• Along came RA Fisher and persuaded them into a
path of P-value madness
• This is responsible for a lot of unrepeatable
nonsense
• We need to return them to the path of Bayesian
virtue
• In fact the history is not like this and
understanding this is a key to understanding the
problem
Fisher, Statistical Methods for Research Workers, 1925
How Student sees it
How Fisher sees it
Both points of view
The real history
• Scientists before Fisher were using tail area probabilities to
calculate posterior probabilities
– This was following Laplace’s use of uninformative prior
distributions
• Fisher pointed out that this interpretation was unsafe and
offered a more conservative one
• Jeffreys, influenced by CD Broad’s criticism, was unsatisfied
with the Laplacian framework and used a lump prior
probability on a point hypothesis being true
– Etz and Wagenmakers have claimed that Haldane 1932 anticipated
Jeffreys
• It is Bayesian Jeffreys versus Bayesian Laplace that makes the
dramatic difference, not frequentist Fisher versus Bayesian
Laplace
What Jeffreys Understood
Theory of Probability, 3rd edition, p. 128
CD Broad 1887*-1971
• Graduated Cambridge 1910
• Fellow of Trinity 1911
• Lectured at St Andrews &
Bristol
• Returned to Cambridge
1926
• Knightbridge Professor of
Philosophy 1933-1953
• Interested in epistemology
and psychic research
*NB Harold Jeffreys born 1891
CD Broad, 1918
pp. 393-394
What Jeffreys concluded
• If you have an uninformative prior distribution
the probability of a precise hypothesis is very
low
• It will remain low even if you have lots of data
consistent with it
• You need to allocate a solid lump of
probability that it is true
• Nature has decided, other things being equal,
that simpler hypotheses are more likely
Why the difference?
• Imagine a point estimate of two standard
errors
• Now consider the likelihood ratio for a given
value of the parameter, θ, under the
alternative to one under the null
– Dividing hypothesis (smooth prior): for any given
value θ = θ’, compare to θ = -θ’
– Plausible hypothesis (lump prior): for any given
value θ = θ’, compare to θ = 0
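
The two comparisons can be made concrete with a small sketch. It assumes a Normal likelihood with the estimate and the parameter both measured in standard-error units, which the slide does not state explicitly; `theta` stands for the parameter and `normal_likelihood` is an illustrative helper.

```python
import math

def normal_likelihood(estimate, theta):
    """Normal likelihood of theta (up to a constant), in standard-error units."""
    return math.exp(-0.5 * (estimate - theta) ** 2)

z = 2.0  # point estimate of two standard errors, as on the slide

for theta in (0.5, 1.0, 2.0):
    # Dividing hypothesis: theta versus its mirror image -theta.
    lr_dividing = normal_likelihood(z, theta) / normal_likelihood(z, -theta)
    # Plausible hypothesis: theta versus the point null theta = 0.
    lr_plausible = normal_likelihood(z, theta) / normal_likelihood(z, 0.0)
    print(theta, round(lr_dividing, 1), round(lr_plausible, 2))
```

Under these assumptions the first ratio equals exp(4·theta) and grows without bound, while the second equals exp(2·theta − theta²/2) and can never exceed exp(2) ≈ 7.4. The same data therefore look far more impressive against the mirror-image alternative than against a point null.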
In summary
• The major disagreement is not between P-values
and Bayes using informative prior distributions
• It’s between two Bayesian approaches
– Using uninformative prior distributions
– Using a highly informative one
• The conflict is not going to go away by banning P-
values
• There is no automatic Bayesianism
– You have to do it for real
CONCLUSION
Statistics is difficult and statisticians should be paid more
What you should know
• P=0.05 is a weak standard of evidence
• Requiring two trials is a good idea
– NB If you replace them by a single larger trial or a
meta-analysis ask for P=1/800
– Variation in results from trial to trial is natural
• Don’t just rely on P-values
• Bayes is harder to do than many people think
• Pay attention to the ASA advice
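
The slide does not spell out where 1/800 comes from. One common reading, sketched here as an assumption, is that each of two independent trials must be significant at the two-sided 5% level in the favourable direction, and that the combined standard is then converted back to a two-sided level.

```python
from fractions import Fraction

two_sided_alpha = Fraction(1, 20)    # conventional P = 0.05 per trial
one_sided = two_sided_alpha / 2      # significant in the right direction: 1/40
both_trials = one_sided ** 2         # two independent trials: 1/1600
two_sided_equiv = 2 * both_trials    # back to a two-sided standard: 1/800

print(one_sided, both_trials, two_sided_equiv)  # 1/40 1/1600 1/800
```

On this reading, a single larger trial or meta-analysis would need P = 1/800 to match the evidential standard of two conventionally significant trials.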
In Conclusion
The Solid Six?
1. P-values can indicate how incompatible the data are with a specified
statistical model.
2. P-values do not measure the probability that the studied hypothesis is true,
or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based
only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or
the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a
model or hypothesis.
However
Proponents of the “Bayesian revolution” should be wary of chasing yet
another chimera: an apparently universal inference procedure. A better
path would be to promote both an understanding of the various devices in
the “statistical toolbox” and informed judgment to select among these.
Gigerenzer and Marewski,
Journal of Management, Vol. 41 No. 2, February 2015 421–440
(c) Stephen Senn 2017P value wars

More Related Content

What's hot

Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2
Stephen Senn
 
Question study design
Question study designQuestion study design
Question study design
Anisur Rahman
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
StephenSenn2
 
6 asbin gimire jornal club presentation
6 asbin gimire  jornal club presentation6 asbin gimire  jornal club presentation
6 asbin gimire jornal club presentation
Pokhara University, Pokhara, Nepal
 
Critical Appraisal of systematic review and meta analysis articles
Critical Appraisal of systematic review and meta analysis articlesCritical Appraisal of systematic review and meta analysis articles
Critical Appraisal of systematic review and meta analysis articles
Dr. Majdi Al Jasim
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
Maarten van Smeden
 
Meta-analysis and systematic reviews
Meta-analysis and systematic reviews Meta-analysis and systematic reviews
Meta-analysis and systematic reviews coolboy101pk
 
Critical appraisal of meta-analysis
Critical appraisal of meta-analysisCritical appraisal of meta-analysis
Critical appraisal of meta-analysisSamir Haffar
 
Biases in meta-analysis
Biases in meta-analysisBiases in meta-analysis
Biases in meta-analysis
Rizwan S A
 
Why I hate minimisation
Why I hate minimisationWhy I hate minimisation
Why I hate minimisation
Stephen Senn
 
Choosing Regression Models
Choosing Regression ModelsChoosing Regression Models
Choosing Regression Models
Stephen Senn
 
Has modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurtHas modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurt
Stephen Senn
 
Superiority, Equivalence, and Non-Inferiority Trial Designs
Superiority, Equivalence, and Non-Inferiority Trial DesignsSuperiority, Equivalence, and Non-Inferiority Trial Designs
Superiority, Equivalence, and Non-Inferiority Trial Designs
Kevin Clauson
 
Critical appraisal presentation by mohamed taha 2
Critical appraisal presentation by mohamed taha  2Critical appraisal presentation by mohamed taha  2
Critical appraisal presentation by mohamed taha 2Cairo University
 
5.1.1 sufficient component cause model
5.1.1 sufficient component cause model5.1.1 sufficient component cause model
5.1.1 sufficient component cause model
A M
 
Cochrane
CochraneCochrane
Cochrane
Hesham Al-Inany
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
Amandeep Kaur
 
27 shweta lamsal journal-club-presentation
27 shweta lamsal journal-club-presentation27 shweta lamsal journal-club-presentation
27 shweta lamsal journal-club-presentation
Pokhara University, Pokhara, Nepal
 
Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptx
StephenSenn2
 
Ethics in health research
Ethics in health researchEthics in health research
Ethics in health research
Achyut Raj Pandey
 

What's hot (20)

Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2
 
Question study design
Question study designQuestion study design
Question study design
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
6 asbin gimire jornal club presentation
6 asbin gimire  jornal club presentation6 asbin gimire  jornal club presentation
6 asbin gimire jornal club presentation
 
Critical Appraisal of systematic review and meta analysis articles
Critical Appraisal of systematic review and meta analysis articlesCritical Appraisal of systematic review and meta analysis articles
Critical Appraisal of systematic review and meta analysis articles
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
 
Meta-analysis and systematic reviews
Meta-analysis and systematic reviews Meta-analysis and systematic reviews
Meta-analysis and systematic reviews
 
Critical appraisal of meta-analysis
Critical appraisal of meta-analysisCritical appraisal of meta-analysis
Critical appraisal of meta-analysis
 
Biases in meta-analysis
Biases in meta-analysisBiases in meta-analysis
Biases in meta-analysis
 
Why I hate minimisation
Why I hate minimisationWhy I hate minimisation
Why I hate minimisation
 
Choosing Regression Models
Choosing Regression ModelsChoosing Regression Models
Choosing Regression Models
 
Has modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurtHas modelling killed randomisation inference frankfurt
Has modelling killed randomisation inference frankfurt
 
Superiority, Equivalence, and Non-Inferiority Trial Designs
Superiority, Equivalence, and Non-Inferiority Trial DesignsSuperiority, Equivalence, and Non-Inferiority Trial Designs
Superiority, Equivalence, and Non-Inferiority Trial Designs
 
Critical appraisal presentation by mohamed taha 2
Critical appraisal presentation by mohamed taha  2Critical appraisal presentation by mohamed taha  2
Critical appraisal presentation by mohamed taha 2
 
5.1.1 sufficient component cause model
5.1.1 sufficient component cause model5.1.1 sufficient component cause model
5.1.1 sufficient component cause model
 
Cochrane
CochraneCochrane
Cochrane
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
27 shweta lamsal journal-club-presentation
27 shweta lamsal journal-club-presentation27 shweta lamsal journal-club-presentation
27 shweta lamsal journal-club-presentation
 
Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptx
 
Ethics in health research
Ethics in health researchEthics in health research
Ethics in health research
 

Similar to P value wars

P values and the art of herding cats
P values  and the art of herding catsP values  and the art of herding cats
P values and the art of herding cats
Stephen Senn
 
P values and replication
P values and replicationP values and replication
P values and replication
Stephen Senn
 
And thereby hangs a tail
And thereby hangs a tailAnd thereby hangs a tail
And thereby hangs a tail
Stephen Senn
 
On being Bayesian
On being BayesianOn being Bayesian
On being Bayesian
Stephen Senn
 
Senn repligate
Senn repligateSenn repligate
Senn repligate
jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
jemille6
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
Stephen Senn
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
jemille6
 
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper ConceptsExcursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
jemille6
 
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
nszakir
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
jemille6
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
Alberto Labarga
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3
Stephen Senn
 
Statistics in clinical and translational research common pitfalls
Statistics in clinical and translational research  common pitfallsStatistics in clinical and translational research  common pitfalls
Statistics in clinical and translational research common pitfalls
Pavlos Msaouel, MD, PhD
 
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
jemille6
 
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
NUI Galway
 
2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical AnalysisNUI Galway
 
Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis
Centre for Social Initiative and Management
 
Making Sense of Statistics in HCI: part 2 - doing it
Making Sense of Statistics in HCI: part 2 - doing itMaking Sense of Statistics in HCI: part 2 - doing it
Making Sense of Statistics in HCI: part 2 - doing it
Alan Dix
 

Similar to P value wars (20)

P values and the art of herding cats
P values  and the art of herding catsP values  and the art of herding cats
P values and the art of herding cats
 
P values and replication
P values and replicationP values and replication
P values and replication
 
And thereby hangs a tail
And thereby hangs a tailAnd thereby hangs a tail
And thereby hangs a tail
 
On being Bayesian
On being BayesianOn being Bayesian
On being Bayesian
 
Senn repligate
Senn repligateSenn repligate
Senn repligate
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
 
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in ScienceD. Mayo: Philosophy of Statistics & the Replication Crisis in Science
D. Mayo: Philosophy of Statistics & the Replication Crisis in Science
 
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper ConceptsExcursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
 
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3
 
Statistics in clinical and translational research common pitfalls
Statistics in clinical and translational research  common pitfallsStatistics in clinical and translational research  common pitfalls
Statistics in clinical and translational research common pitfalls
 
bayesjaw.ppt
bayesjaw.pptbayesjaw.ppt
bayesjaw.ppt
 
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019Meeting #1 Slides Phil 6334/Econ 6614 SP2019
Meeting #1 Slides Phil 6334/Econ 6614 SP2019
 
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
2013.03.26 An Introduction to Modern Statistical Analysis using Bayesian Methods
 
2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis2013.03.26 Bayesian Methods for Modern Statistical Analysis
2013.03.26 Bayesian Methods for Modern Statistical Analysis
 
Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis Reporting Results of Statistical Analysis
Reporting Results of Statistical Analysis
 
Making Sense of Statistics in HCI: part 2 - doing it
Making Sense of Statistics in HCI: part 2 - doing itMaking Sense of Statistics in HCI: part 2 - doing it
Making Sense of Statistics in HCI: part 2 - doing it
 

More from Stephen Senn

What is your question
What is your questionWhat is your question
What is your question
Stephen Senn
 
Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19
Stephen Senn
 
To infinity and beyond v2
To infinity and beyond v2To infinity and beyond v2
To infinity and beyond v2
Stephen Senn
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
Stephen Senn
 
Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?
Stephen Senn
 
A century of t tests
A century of t testsA century of t tests
A century of t tests
Stephen Senn
 
Is ignorance bliss
Is ignorance blissIs ignorance bliss
Is ignorance bliss
Stephen Senn
 
What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliry
Stephen Senn
 
In search of the lost loss function
In search of the lost loss function In search of the lost loss function
In search of the lost loss function
Stephen Senn
 
To infinity and beyond
To infinity and beyond To infinity and beyond
To infinity and beyond
Stephen Senn
 
Understanding randomisation
Understanding randomisationUnderstanding randomisation
Understanding randomisation
Stephen Senn
 
In Search of Lost Infinities: What is the “n” in big data?
In Search of Lost Infinities: What is the “n” in big data?In Search of Lost Infinities: What is the “n” in big data?
In Search of Lost Infinities: What is the “n” in big data?
Stephen Senn
 
NNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresNNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measures
Stephen Senn
 
Seventy years of RCTs
Seventy years of RCTsSeventy years of RCTs
Seventy years of RCTs
Stephen Senn
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
Stephen Senn
 
The revenge of RA Fisher
The revenge of RA Fisher The revenge of RA Fisher
The revenge of RA Fisher
Stephen Senn
 
The story of MTA/02
The story of MTA/02The story of MTA/02
The story of MTA/02
Stephen Senn
 
Confounding, politics, frustration and knavish tricks
Confounding, politics, frustration and knavish tricksConfounding, politics, frustration and knavish tricks
Confounding, politics, frustration and knavish tricks
Stephen Senn
 
The revenge of RA Fisher
The revenge of RA FisherThe revenge of RA Fisher
The revenge of RA Fisher
Stephen Senn
 
Minimally important differences
Minimally important differencesMinimally important differences
Minimally important differences
Stephen Senn
 

P value wars

  • 1. P Value wars Stephen Senn (c) Stephen Senn 2017 1P value wars
  • 2. Acknowledgements: Thanks to the EMA for inviting me and to Olivier Collignon for organizing it. This work is partly supported by the European Union’s 7th Framework Programme for research, technological development and demonstration under grant agreement no. 602552 (“IDEAL”).
  • 3. Outline
      • Basics
          – The difference between probability and statistics
          – A simple example
      • Recent criticisms of P-values/significance
          – Ioannidis
          – Repligate and other psychology fights
          – ASA statement
      • A brief history of P-values
          – The real (hidden) reason why P-values are so controversial
      • Conclusions
  • 4. BASICS: P-values, Bayes, the meaning of life
  • 5. Probability versus Statistics: an example
      Probability theory
      • Q. This die is fair. What is the probability that I will get two sixes in two rolls of the die?
      • A. $\left(\frac{1}{6}\right)^2 = \frac{1}{36}$
      Statistical theory
      • Q. I rolled this die twice and got two sixes. What is the probability that the die is fair?
      • A. Nobody knows.
  • 6. Probabilists versus Statisticians: the difference
      Probabilists
      • Are mathematicians
      • Deal with direct probability statements
      • Play a formal mathematical game
      • Are involved in the divine
          – God knows that the die is fair; let’s work out the consequences
      Statisticians
      • Are scientists
      • Deal with inverse probability statements
      • Are dealing with the real world
      • Are involved in the human
          – We are down here trying to work out the mind of God
  • 7. An Example: My compact disc (CD) player* allowed me to play tracks in sequential order by pressing play or in random order by pressing shuffle. One day I was playing the CD Hysteria by Def Leppard. This CD has 12 tracks. I thought that I had pressed the shuffle button, but the first track played was ‘Women’, which is the first track on the CD. Q. What is the probability that I did, in fact, press the shuffle button as intended? (*I now have an iPod nano.)
  • 8. That Heavy Metal Problem: The Bayesian Solution. We have two basic hypotheses: 1) I pressed shuffle; 2) I pressed play. First we have to establish a so-called prior probability for these hypotheses: a probability before seeing the evidence. Suppose that the probability that I press the shuffle button when I mean to press the shuffle button is 9/10. The probability of making a mistake and pressing the play button is then 1/10.
  • 9. Next we establish probabilities of events given theories. These particular sorts of probabilities are referred to as likelihoods, a term due to RA Fisher (1890-1962). If I pressed shuffle, then the probability that the first track will be ‘Women’ (W) is 1/12. If I pressed play, then the probability that the first track is W is 1. For completeness (although it is not necessary for the solution) we consider the likelihoods had any other track apart from ‘Women’ (say X) been played. If I pressed shuffle, then the probability of X is 11/12. If I pressed play, then this probability is 0.
  • 10. We can put this together as follows
      Hypothesis  Prior probability P  Evidence  Likelihood L  P x L
      Shuffle     9/10                 W         1/12          9/120
      Shuffle     9/10                 X         11/12         99/120
      Play        1/10                 W         1             12/120
      Play        1/10                 X         0             0
      TOTAL                                                    120/120 = 1
  • 11. After seeing (hearing) the evidence, however, only two rows remain (those with evidence W)
      Hypothesis  Prior probability P  Evidence  Likelihood L  P x L
      Shuffle     9/10                 W         1/12          9/120
      Shuffle     9/10                 X         11/12         99/120
      Play        1/10                 W         1             12/120
      Play        1/10                 X         0             0
      TOTAL (rows with evidence W)                             21/120
  • 12. The probabilities of the two cases which remain do not add up to 1. However, since these two cases cover all the possibilities which remain, their combined probability must be 1. Therefore we rescale the individual probabilities to make them add to 1. We can do this without changing their relative value by dividing by their total, 21/120. This has been done in the table below.
  • 13. So we rescale by dividing by the total probability
      Hypothesis  Prior probability P  Evidence  Likelihood L  P x L    Posterior probability
      Shuffle     9/10                 W         1/12          9/120    (9/120)/(21/120) = 9/21
      Shuffle     9/10                 X         11/12         99/120
      Play        1/10                 W         1             12/120   (12/120)/(21/120) = 12/21
      Play        1/10                 X         0             0
      TOTAL                                                    21/120   21/21 = 1
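
The rescaling in the CD-player table can be checked with a few lines of Python. This is a minimal sketch using exact fractions; the priors and likelihoods are the ones given on the slides, while the function and variable names are illustrative only.

```python
from fractions import Fraction

# Prior probabilities from the slides: shuffle was pressed as intended with 9/10.
prior = {"shuffle": Fraction(9, 10), "play": Fraction(1, 10)}

# Likelihood of hearing the first track ('Women', W) first under each hypothesis.
likelihood_w = {"shuffle": Fraction(1, 12), "play": Fraction(1)}


def posterior(prior, likelihood):
    """Multiply prior by likelihood, then rescale so the remaining cases sum to 1."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(joint.values())  # 21/120 for the slide's numbers
    return {h: joint[h] / total for h in joint}


post = posterior(prior, likelihood_w)
print(post["shuffle"], post["play"])  # fractions equal to 9/21 and 12/21
```

Exact fractions are used rather than floats so that the result can be compared directly with the 9/21 and 12/21 in the table.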
  • 14. What is the difference between the Bayesian probability and the P-value?
      Bayesian
      • It is the probability that the null hypothesis is true given the evidence
          – It is my posterior probability that I pressed shuffle
      • The value is 9/21
      P-value
      • It is the probability of the evidence given the null hypothesis
          – Usually more extreme evidence must be included
      • The value is 1/12
  • 15. Extremism
      • For many practical applications every result is unlikely
      • Example: the probability that the mean difference in blood pressure is exactly 5 mmHg is effectively zero
      • This has led to calculating the probability of a result as extreme or more extreme to act as the ‘P-value’
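
As an illustration of the "as extreme or more extreme" idea, the sketch below computes a two-sided tail-area P-value for a standardized test statistic. It assumes a standard normal reference distribution, and the standard error of 2.5 mmHg is a hypothetical value chosen for the example; neither assumption comes from the slides.

```python
import math


def two_sided_p(z):
    """Tail area P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: observed mean difference of 5 mmHg, standard error 2.5 mmHg.
observed_difference = 5.0
standard_error = 2.5  # assumption for illustration only
z = observed_difference / standard_error

p_value = two_sided_p(z)
print(round(p_value, 4))
```

The probability of observing exactly 5 mmHg is zero for a continuous outcome, which is why the tail area is used instead.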
  • 16. NB Probability statements are not reversible
      • Is the Pope a Catholic?
          – Yes
      • Is a Catholic the Pope?
          – Probably not
      • The probability of the evidence given the hypothesis is not the same as the probability of the hypothesis given the evidence
          – As we saw from the CD player example
  • 17. To sum up
      • The Bayesian approach provides a complete and logical solution
      • But it requires prior probabilities
      • You can’t just use arguments of symmetry
      • Different people will come to different conclusions
  • 18. RECENT CRITICISMS: Ioannidis, Repligate and the ASA statement
  • 19. Ioannidis (2005)
      • Claimed that most published research findings are wrong
          – By finding he means a ‘positive’ result
      • 4380 citations by 16 February 2017 according to Google Scholar
      $$\mathrm{TPR} = \frac{\dfrac{R(1-\beta)}{1+R}}{\dfrac{\alpha + R(1-\beta)}{1+R}} = \frac{R(1-\beta)}{\alpha + R(1-\beta)}$$
      (TPR = True Positive Rate)
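
The TPR formula can be explored numerically with a short sketch. It assumes, following Ioannidis's paper, that R is the pre-study odds that a tested relationship is true, alpha is the type I error rate and 1 - beta is the power. The specific input values are hypothetical.

```python
def true_positive_rate(R, alpha, beta):
    """Share of 'positive' findings that are true: R(1-beta) / (alpha + R(1-beta))."""
    return R * (1 - beta) / (alpha + R * (1 - beta))


# Hypothetical inputs: conventional alpha = 0.05 and power = 0.8 (beta = 0.2).
for R in (1.0, 0.1, 0.01):
    print(R, round(true_positive_rate(R, alpha=0.05, beta=0.2), 3))
```

With the error rates held fixed, the rate falls as the pre-study odds R fall, which is the mechanism behind the claim on the slide.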
  • 21. The Crisis of Replication. In countless tweets… The “replication police” were described as “shameless little bullies,” “self-righteous, self-appointed sheriffs” engaged in a process “clearly not designed to find truth,” “second stringers” who were incapable of making novel contributions of their own to the literature, and, most succinctly, “assholes.” (From “Why Psychologists’ Food Fight Matters: ‘Important findings’ haven’t been replicated, and science may have to change its ways”, by Michelle N. Meyer and Christopher Chabris, Science.)
  • 22. Many Labs Replication Project (Klein et al., Social Psychology, 2014)
  • 25. The ASA statement
      1. P-values can indicate how incompatible the data are with a specified statistical model.
      2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
      3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
      4. Proper inference requires full reporting and transparency.
      5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
      6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
  • 26. Obligatory purloined cartoon
  • 27. Sympathy for Ron Wasserstein
  • 28. In summary
      • There has been mounting criticism of ‘null-hypothesis-significance-testing’ (NHST) and P-values in particular
      • Some people think that ‘science is broke’
      • Some people claim that P-values are to blame
          – They give ‘significance’ far too easily
      • There is a lot of advice, much of it conflicting, flying around
  • 29. An Example of the Problem
      • “If you want to avoid making a fool of yourself very often, do not regard anything greater than p < 0.001 as a demonstration that you have discovered something. Or, slightly less stringently, use a three-sigma rule.” (David Colquhoun, Royal Society Open Science, 2014)
      • “In general, P values larger than 0.01 should be reported to two decimal places, those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P<0.001.” (New England Journal of Medicine guidelines to authors)
  • 30. A BRIEF HISTORY OF P-VALUES: Did Fisher really teach scientists to fish for significance?
  • 31. “The collective noun for statisticians is ‘a quarrel’” (John Tukey)
  • 32. A Common Story
      • Scientists were treading the path of Bayesian reason
      • Along came RA Fisher and persuaded them into a path of P-value madness
      • This is responsible for a lot of unrepeatable nonsense
      • We need to return them to the path of Bayesian virtue
      • In fact the history is not like this, and understanding this is a key to understanding the problem
  • 33. [Figure: Fisher, Statistical Methods for Research Workers, 1925]
  • 34. [Figure: How Student sees it]
  • 35. [Figure: How Fisher sees it]
  • 36. [Figure: Both points of view]
  • 37. The real history
      • Scientists before Fisher were using tail area probabilities to calculate posterior probabilities
          – This was following Laplace’s use of uninformative prior distributions
      • Fisher pointed out that this interpretation was unsafe and offered a more conservative one
      • Jeffreys, influenced by CD Broad’s criticism, was unsatisfied with the Laplacian framework and used a lump prior probability on a point hypothesis being true
          – Etz and Wagenmakers have claimed that Haldane 1932 anticipated Jeffreys
      • It is Bayesian Jeffreys versus Bayesian Laplace that makes the dramatic difference, not frequentist Fisher versus Bayesian Laplace
  • 38. What Jeffreys Understood [Figure: excerpt from Theory of Probability, 3rd edition, p. 128]
  • 39. CD Broad 1887*-1971
      • Graduated Cambridge 1910
      • Fellow of Trinity 1911
      • Lectured at St Andrews & Bristol
      • Returned to Cambridge 1926
      • Knightbridge Professor of Philosophy 1933-1953
      • Interested in epistemology and psychic research
      *NB Harold Jeffreys born 1891
  • 40. CD Broad, 1918 [Figure: excerpts from pp. 393-394]
  • 41. What Jeffreys concluded
      • If you have an uninformative prior distribution the probability of a precise hypothesis is very low
      • It will remain low even if you have lots of data consistent with it
      • You need to allocate a solid lump of probability that it is true
      • Nature has decided, other things being equal, that simpler hypotheses are more likely
  • 43. Why the difference?
      • Imagine a point estimate of two standard errors
      • Now consider the likelihood ratio for a given value of the parameter, τ, under the alternative to one under the null
          – Dividing hypothesis (smooth prior): for any given value τ = τ′, compare to τ = −τ′
          – Plausible hypothesis (lump prior): for any given value τ = τ′, compare to τ = 0
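
The two comparisons on this slide can be made concrete with a small sketch. It assumes a normally distributed estimate with known standard error, measured in standard-error units, and calls the parameter tau (an assumed name). The choice tau' = 2 is a hypothetical value for illustration.

```python
import math


def normal_likelihood(estimate, tau, se=1.0):
    """Normal likelihood of tau given the estimate, up to a constant that cancels in ratios."""
    z = (estimate - tau) / se
    return math.exp(-0.5 * z * z)


estimate = 2.0   # point estimate of two standard errors, as on the slide
tau_prime = 2.0  # hypothetical value of the parameter under the alternative

# Dividing hypothesis (smooth prior): compare tau = tau' with tau = -tau'.
lr_dividing = normal_likelihood(estimate, tau_prime) / normal_likelihood(estimate, -tau_prime)

# Plausible hypothesis (lump prior): compare tau = tau' with tau = 0.
lr_plausible = normal_likelihood(estimate, tau_prime) / normal_likelihood(estimate, 0.0)

print(lr_dividing, lr_plausible)
```

Under these assumptions the first ratio is exp(8) and the second is exp(2), so the same data look far more decisive when the comparison is with the mirror-image value than when it is with a point null.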
  • 44. In summary
      • The major disagreement is not between P-values and Bayes using an informative prior distribution
      • It’s between two Bayesian approaches
          – Using uninformative prior distributions
          – Using a highly informative one
      • The conflict is not going to go away by banning P-values
      • There is no automatic Bayesianism
          – You have to do it for real
  • 45. CONCLUSION: Statistics is difficult and statisticians should be paid more
  • 46. What you should know
      • P=0.05 is a weak standard of evidence
      • Requiring two trials is a good idea
          – NB If you replace them by a single larger trial or a meta-analysis ask for P=1/800
          – Variation in results from trial to trial is natural
      • Don’t just rely on P-values
      • Bayes is harder to do than many people think
      • Pay attention to the ASA advice
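
One way to arrive at a figure of 1/800 is sketched below. The slide does not give the derivation, so this is an assumed reconstruction: each of two independent trials must be significant at the two-sided 5% level, both in the same direction, and the combined requirement is then expressed as a two-sided P-value.

```python
from fractions import Fraction

two_sided_alpha = Fraction(5, 100)      # conventional two-sided level per trial
one_sided_alpha = two_sided_alpha / 2   # 1/40 in the favourable direction

# Assumption: both independent trials must be significant in the same direction.
both_trials_one_sided = one_sided_alpha ** 2  # 1/1600

# Doubling converts the one-sided requirement back to a two-sided P-value.
single_trial_equivalent = 2 * both_trials_one_sided

print(single_trial_equivalent)  # a fraction equal to 1/800
```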
  • 47. In Conclusion: The Solid Six?
      1. P-values can indicate how incompatible the data are with a specified statistical model.
      2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone. ✓
      3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. ✓
      4. Proper inference requires full reporting and transparency. ✓
      5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. ✓
      6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. ✓
  • 48. However: “Proponents of the ‘Bayesian revolution’ should be wary of chasing yet another chimera: an apparently universal inference procedure. A better path would be to promote both an understanding of the various devices in the ‘statistical toolbox’ and informed judgment to select among these.” (Gigerenzer and Marewski, Journal of Management, Vol. 41 No. 2, February 2015, 421–440)