4. “History of science teaches us that scientific endeavor has often
in the past wasted effort in fields with absolutely no yield of
true scientific information.”
“Research is not most appropriately represented and
summarized by p-value.”
“The claimed effect sizes are simply measuring nothing else but
the net bias that has been involved in the generation of this
scientific literature.”
John P.A. Ioannidis, PLoS Medicine 2005; 2(8): e124
5. How inference from two-group comparisons goes
An example:
Two randomized groups with placebo and treatment
Estimated difference in survival time between placebo and
treatment group is 1 year
Compute p-value:
P(Difference at least this large|Genuinely no effect)
Claim that treatment has an effect if p-value is low
8. Looking at the right probability
Event A: treatment has a genuine effect
Event B: treatment seems to have an effect in one experiment
p(A|B) and p(B|A) have no reason to be equal (not even close)
On its own, p(B|A) does not tell us p(A|B)
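A small numerical illustration of the gap between the two conditional probabilities, using Bayes' rule. The three input values (prior probability of a genuine effect, and the chance of an apparent effect with and without a genuine one) are assumptions chosen for illustration only, not figures from the slides.

```python
# Illustrative numbers only (assumptions, not taken from the slides).
p_A = 0.01              # p(A): treatment has a genuine effect
p_B_given_A = 0.80      # p(B|A): effect seen in the experiment when it is genuine
p_B_given_not_A = 0.05  # p(B|not A): effect seen although there is none

# Law of total probability, then Bayes' rule.
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B

print(f"p(B|A) = {p_B_given_A:.3f}")
print(f"p(A|B) = {p_A_given_B:.3f}")
```

Under these assumed inputs, p(B|A) is 0.8 while p(A|B) is about 0.14.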
10. Overview
Points out the need to consider the right probability
Explains why evidence about treatment effect can be low
even when p-value is low
Aspects considered:
significance threshold
power
relevance of questions investigated
publication “bias”
number of studies carried out
Analyses sources of issues in various scientific contexts
Proposes solutions for improvement
11. Modelling framework
A simplified three-step context
1. Select a research hypothesis
2. Obtain data
3. Make claim based on data
12. 1. Select a research hypothesis
Research hypothesis typically: Does drug A have a genuine effect?
The hypothesis is either true or false:
Drug A has genuinely an effect
Drug A has genuinely no effect
The choice of a research question is treated here as a random
decision
Makes most sense in exploratory studies like genetic association
studies
13. Select a research hypothesis (cont’)
R: ratio of the number of “true hypotheses” to “false hypotheses”
R/(R + 1): pre-study probability for a hypothesis to be true
14. 2. Obtain data and make claim based on data
All very classical here:
Data subject to variability
Variability accounted for by statistical model
Claim follows from statistical testing procedure
Claim based on p(Data|Truth)
15. Obtain data and make claim based on data (cont’)
The process by which a claim is made depends on many things, most
conveniently summarised by
Type I error rate α
Power 1 − β
16. The Positive Predictive Value
PPV = p(Genuine effect|Claim effect)
Complementary probability of what Wacholder et al. (2004)
have called the false positive report probability.
PPV is a more relevant quantity to look at than a p-value
(previously pointed out by many, including Sterne and
Davey-Smith 2001)
17. PPV and p-value
PPV = p(Genuine effect|Claim effect)
1 − PPV = p(Genuinely no effect|Claim effect)
pval = p(Claim effect|Genuinely no effect)
1-PPV is the reverse probability of the p-value
1-PPV is p-value put the right way
. . . a common argument of Bayesian against frequentist statisticians.
18. How large is PPV in general?
PPV can be expressed in terms of R, α and β as:
$$\mathrm{PPV} = \frac{1}{1 + \dfrac{\alpha}{(1-\beta)R}}$$
PPV increases as α decreases
PPV increases as 1 − β increases
PPV increases as R increases
A research finding is more likely true than false if PPV > 0.5
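For reference, the expression can be recovered from Bayes' rule. This is a sketch assuming the identifications p(CE|GE) = 1 − β, p(CE|GNE) = α and p(GE) = R/(R + 1), where GE stands for genuine effect, GNE for genuinely no effect and CE for claimed effect. The last line gives the condition under which PPV exceeds 0.5.

```latex
\begin{align*}
\mathrm{PPV} = p(\mathrm{GE}\mid\mathrm{CE})
 &= \frac{p(\mathrm{CE}\mid\mathrm{GE})\,p(\mathrm{GE})}
         {p(\mathrm{CE}\mid\mathrm{GE})\,p(\mathrm{GE})
          + p(\mathrm{CE}\mid\mathrm{GNE})\,p(\mathrm{GNE})} \\
 &= \frac{(1-\beta)\frac{R}{R+1}}{(1-\beta)\frac{R}{R+1} + \alpha\frac{1}{R+1}}
  = \frac{(1-\beta)R}{(1-\beta)R + \alpha}
  = \frac{1}{1 + \frac{\alpha}{(1-\beta)R}}, \\
\mathrm{PPV} > 0.5 &\iff (1-\beta)R > \alpha .
\end{align*}
```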
19. How large is PPV in general? (cont’)
Some orders of magnitude
α      1 − β   R       PPV
0.05   0.8     0.500   0.889
0.05   0.8     0.100   0.615
0.05   0.8     0.010   0.138
0.05   0.8     0.001   0.016
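The orders of magnitude above can be recomputed directly from the bias-free PPV expression. A minimal sketch: the function name `ppv` and its argument names are illustrative choices, while the input values are the ones listed in the table.

```python
def ppv(alpha: float, power: float, R: float) -> float:
    """PPV = (1 - beta) R / ((1 - beta) R + alpha), with power = 1 - beta."""
    return power * R / (power * R + alpha)

# Same inputs as the table: alpha = 0.05, power = 0.8, decreasing R.
rows = [(0.05, 0.8, R) for R in (0.5, 0.1, 0.01, 0.001)]

print(f"{'alpha':>6} {'1-beta':>7} {'R':>6} {'PPV':>6}")
for alpha, power, R in rows:
    print(f"{alpha:>6.2f} {power:>7.1f} {R:>6.3f} {ppv(alpha, power, R):>6.3f}")
```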
24. PPV in presence of “bias”
u: proportion of probed analyses that would not have been
“research findings,” but nevertheless end up presented and
reported as such.
Bias = diplomatic wording for more or less intentional scientific
mistake
In presence of a level u of bias, PPV becomes
$$\mathrm{PPV} = \frac{1}{1 + \dfrac{\alpha + u - u\alpha}{(1-\beta)R + u\beta R}}$$
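A minimal sketch of how the bias level u erodes PPV. The function name `ppv_with_bias` is an illustrative choice, and the u values in the loop are assumptions. The other inputs (α = 0.05, β = 0.2, R = 0.5) match the first row of the orders-of-magnitude table. Two sanity checks follow from the formula: u = 0 gives back the bias-free PPV, and u = 1 (everything gets reported) gives the pre-study probability R/(R + 1).

```python
def ppv_with_bias(alpha: float, beta: float, R: float, u: float) -> float:
    """PPV = 1 / (1 + (alpha + u - u*alpha) / ((1 - beta)*R + u*beta*R))."""
    false_claims = alpha + u - u * alpha        # proportional to p(CE|GNE)
    true_claims = (1 - beta) * R + u * beta * R  # proportional to p(CE|GE) * R
    return 1 / (1 + false_claims / true_claims)

# u values are illustrative assumptions.
for u in (0.0, 0.1, 0.3, 1.0):
    print(f"u = {u:.1f}  PPV = {ppv_with_bias(0.05, 0.2, 0.5, u):.3f}")
```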
25. Deriving the expression of PPV in presence of bias
We keep the distinction between whether there is genuinely an
effect or not and whether an effect is claimed or not.
Same notation as before (GE: Genuine Effect, GNE: Genuinely
No Effect, CE: Claim Effect)
Now, among claimed effects, some should not have been
claimed (“bias”)
u = proportion of probed analyses that would not have been
“research findings,” but nevertheless end up presented and
reported as such, because of bias.
In other words, u = p(CE|Not claimable)
26. Deriving the expression of PPV in presence of bias
(cont’)
PPV = p(GE|CE) = p(CE|GE)p(GE)/p(CE)
When a claim is made, it is either a legitimate claim (Claimable) or
an illegitimate claim (NClaimable).
p(CE|GE) = p(CE, Claimable|GE) + p(CE, NClaimable|GE)
= p(CE, Claimable|GE) +
p(CE|NClaimable, GE)p(NClaimable|GE)
= 1 − β + uβ (1)
27. Deriving the expression of PPV in presence of bias
(cont’)
p(CE) = p(CE|GE)p(GE) + p(CE|GNE)p(GNE)
Regarding p(CE|GNE) we have:
p(CE|GNE) = p(CE, Claimable|GNE) + p(CE, NClaimable|GNE)
= p(CE, Claimable|GNE) +
p(CE|NClaimable, GNE)p(NClaimable|GNE)
= α + u(1 − α)
As in the absence of bias, p(GE) = R/(R + 1)
28. Deriving the expression of PPV in presence of bias
(cont’)
Putting p(CE|GE), p(GE) and p(CE) together we get:
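$$
\begin{aligned}
\mathrm{PPV} &= [(1-\beta)+u\beta]\,\frac{R}{R+1} \times \frac{1}{[(1-\beta)+u\beta]\,\frac{R}{R+1} + [\alpha+u(1-\alpha)]\,\frac{1}{R+1}} \\
&= \frac{[(1-\beta)+u\beta]R}{[(1-\beta)+u\beta]R + \alpha + u(1-\alpha)} \\
&= \frac{1}{1 + \dfrac{\alpha+u(1-\alpha)}{[(1-\beta)+u\beta]R}}
\end{aligned}
$$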
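The closed form can be checked against a direct simulation of the model used in the derivation. This is a sketch under stated assumptions: the function names, the sample size, the seed and the parameter values are all illustrative. Each simulated hypothesis is genuine with probability R/(R + 1), is legitimately claimable with probability 1 − β (genuine) or α (not genuine), and is claimed anyway with probability u when it is not claimable.

```python
import random

def ppv_formula(alpha, beta, R, u):
    """Closed form: 1 / (1 + (alpha + u(1-alpha)) / (((1-beta) + u*beta) R))."""
    return 1 / (1 + (alpha + u * (1 - alpha)) / (((1 - beta) + u * beta) * R))

def ppv_simulated(alpha, beta, R, u, n=200_000, seed=0):
    """Estimate p(GE|CE) by simulating n hypotheses."""
    rng = random.Random(seed)
    claims = true_claims = 0
    for _ in range(n):
        genuine = rng.random() < R / (R + 1)  # p(GE) = R/(R+1)
        # Legitimately claimable: power if genuine, type I error otherwise.
        claimable = rng.random() < ((1 - beta) if genuine else alpha)
        # Non-claimable analyses are still reported with probability u.
        claimed = claimable or rng.random() < u
        if claimed:
            claims += 1
            true_claims += genuine
    return true_claims / claims

# Illustrative parameter values (assumptions).
params = dict(alpha=0.05, beta=0.2, R=0.5, u=0.1)
print(f"formula:    {ppv_formula(**params):.3f}")
print(f"simulation: {ppv_simulated(**params):.3f}")
```

The two numbers should agree up to Monte Carlo noise, which shrinks as `n` grows.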
29. Corollaries
The smaller the studies conducted in a scientific field, the less
likely the research findings are to be true.
The smaller the effect sizes in a scientific field, the less likely
the research findings are to be true.
[. . . ]
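Both corollaries listed above act through power: smaller studies and smaller effect sizes both lower 1 − β, and PPV falls with it. A minimal sketch using the bias-free expression. The power values are assumptions chosen for illustration, and α = 0.05 and R = 0.1 match one row of the orders-of-magnitude table.

```python
def ppv(alpha: float, power: float, R: float) -> float:
    """Bias-free PPV, with power = 1 - beta."""
    return power * R / (power * R + alpha)

alpha, R = 0.05, 0.1
# Decreasing power, e.g. from shrinking sample size or effect size.
for power in (0.8, 0.5, 0.2):
    print(f"power = {power:.1f}  PPV = {ppv(alpha, power, R):.3f}")
```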
30. Solutions?
Better powered evidence
Do not emphasize the statistically significant findings of any
single team.
Improve our understanding of the range of R values
32. References
Benjamin, Daniel J, James O Berger, Magnus Johannesson, Brian A
Nosek, E-J Wagenmakers, Richard Berk, Kenneth A Bollen, et al.
2018. “Redefine Statistical Significance.” Nature Human Behaviour
2 (1). Nature Publishing Group:6.
Sterne, JAC, and G Davey-Smith. 2001. “Sifting the
Evidence—What’s Wrong with Significance Tests?” Physical
Therapy 81 (8). Oxford University Press:1464–9.
Wacholder, S, S Chanock, M Garcia-Closas, L El Ghormli, and N
Rothman. 2004. “Assessing the Probability That a Positive Report
Is False: An Approach for Molecular Epidemiology Studies.” Journal
of the National Cancer Institute 96 (6). Oxford University
Press:434–42.