Null hypothesis
p-value
Chi-square test
G-test
t-test
ANOVA
Non-parametric tests
Statistical power
Multiple test corrections
1,000 tests: Pr(real) = 0.1, power = 80%, significance level = 5%
Real effect in 10% → 100 tests:
• 80% detected → 80 true positives
• 20% not detected → 20 false negatives
No effect in 90% → 900 tests:
• 95% tested negative → 855 true negatives
• 5% “detected” → 45 false positives
False discovery rate
𝐹𝐷𝑅 = false positives / discoveries = 45 / (45 + 80) = 0.36
Colquhoun D., 2014, “An investigation of the false discovery rate
and the misinterpretation of p-values”, R. Soc. open sci. 1: 140216.
False positive rate
𝐹𝑃𝑅 = false positives / no effect = 45 / 900 = 0.05
If you publish a 𝑝 < 0.05
result, you have a 36%
chance of making a fool
of yourself
Colquhoun D., 2014, “An investigation of the false discovery rate
and the misinterpretation of p-values”, R. Soc. open sci. 1: 140216.
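The tree above is pure arithmetic, so it can be checked with a short sketch (using the slide's prior, power, and significance level):

```python
def fdr_fpr(n_tests, prior, power, alpha):
    """False discovery rate and false positive rate for a batch of tests."""
    real = n_tests * prior          # tests where the effect is real
    null = n_tests * (1 - prior)    # tests where there is no effect
    tp = real * power               # true positives: real effects detected
    fp = null * alpha               # false positives: null "detected" at level alpha
    fdr = fp / (fp + tp)            # fraction of discoveries that are false
    fpr = fp / null                 # fraction of null tests that come up positive
    return fdr, fpr

fdr, fpr = fdr_fpr(1000, prior=0.1, power=0.8, alpha=0.05)
print(round(fdr, 2), round(fpr, 2))  # 0.36 0.05
```

Calling the same function with `prior=0.5` gives FDR ≈ 0.06, close to the 𝐹𝐷𝑅 ≈ 0.05 shown later for a 50/50 split of real and null effects.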
What’s wrong with p-values?
Hand-outs available at http://is.gd/statlec
Marek Gierliński
Division of Computational Biology
Statistical testing
Statistical model: null hypothesis H0 (no effect) plus all other assumptions
Significance level 𝛼 = 0.05
Data → statistical test against H0 → p-value: probability that the observed effect is random
𝑝 < 𝛼: reject H0 (at your own risk); effect is real
𝑝 ≥ 𝛼: insufficient evidence
p-value:
Given that H0 is true, the probability of
observed, or more extreme, data
It is not the probability that H0 is true
P-value is the degree to which the data are
embarrassed by the null hypothesis
Nicholas Maxwell
“All other assumptions”
Null hypothesis H0: no effect; significance level 𝛼 = 0.05
𝑝 < 𝛼: reject H0; 𝑝 ≥ 𝛼: do not reject H0
Rejecting H0 also assumes:
• experimental protocols followed
• instruments calibrated
• data collected correctly
• all other assumptions about biology are correct
• no other effects
• no silly mistakes
1. p-values test not only the null hypothesis, but everything else in the experiment
𝐹𝐷𝑅 = 45 / (45 + 80) = 0.36
Why large false discovery rate?
Simulated population of mice:
• No effect (97%): 𝜇C = 20 g, 𝜎 = 5 g
• Real effect (3%): 𝜇F = 30 g, 𝜎 = 5 g
Power analysis: effect size Δ𝑚 = 10 g, power 𝒫 = 0.9, significance level 𝛼 = 0.05 → sample size 𝑛 = 5
Null hypothesis H0: 𝜇 = 20 g; one-sample t-test
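This Gedankenexperiment can be sketched in a few lines. For simplicity the sketch uses a two-sided z-test with the known 𝜎 = 5 g instead of the slide's one-sample t-test, so its power is slightly above 0.9 and the simulated FDR lands a little under the ≈ 0.63 quoted for the t-test; treat the exact number as illustrative.

```python
import math
import random

def p_value_z(sample, mu0=20.0, sigma=5.0):
    # Two-sided z-test for H0: mu = mu0 with known sigma
    # (a simplification; the slide uses a one-sample t-test).
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(1)
n, alpha, trials = 5, 0.05, 50_000
tp = fp = 0
for _ in range(trials):
    real = random.random() < 0.03              # 3% of mice have a real effect
    mu = 30.0 if real else 20.0                # mu_F = 30 g, mu_C = 20 g
    sample = [random.gauss(mu, 5.0) for _ in range(n)]
    if p_value_z(sample) < alpha:
        if real:
            tp += 1
        else:
            fp += 1

fdr = fp / (fp + tp)
print(round(fdr, 2))  # roughly 0.6: most "discoveries" are false
```

Even though each individual test is run correctly at 𝛼 = 0.05, the rarity of real effects makes the majority of positives false.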
Gedankenexperiment: distribution of p-values; positives are those below 𝛼 = 0.05
Gedankenexperiment: “significant” p-values at 𝛼 = 0.05; positives split into true positives (real effect) and false positives (no effect):
𝐹𝐷𝑅 = 𝐹𝑃 / (𝐹𝑃 + 𝑇𝑃) ≈ 0.63
Small 𝛼 doesn’t help: at 𝛼 = 0.001,
𝐹𝐷𝑅 = 𝐹𝑃 / (𝐹𝑃 + 𝑇𝑃) ≈ 0.20
2. The chance of making a fool of yourself is much larger than 𝛼 = 0.05
FDR depends on the probability of real effect: with no effect in 50% and a real effect in 50%, at 𝛼 = 0.05, 𝐹𝐷𝑅 ≈ 0.05
3. When the effect is rare, you are screwed
What does a p-value ~ 0.05 really mean? Counting only tests with 𝑝 ~ 0.05 (rather than all 𝑝 < 0.05) gives 𝐹𝐷𝑅 = 0.21
Bayesian approach: consider all prior distributions
Berger & Sellke (Bayesian approach): 𝑝 ~ 0.05 ⇒ 𝐹𝐷𝑅 ≥ 0.3
3-sigma approach: 𝑝 ~ 0.003 ⇒ 𝐹𝐷𝑅 ≥ 0.04
Simulation
Berger J.O., Sellke T., “Testing a point null hypothesis: the irreconcilability of P values and evidence”, 1987, JASA, 82, 112-122
4. When you get a 𝑝 ~ 0.05, you are screwed
Gedankenexperiment: reliability of p-values
Normal population, 100% real effect; one-sample t-test
Sample size = 3, power = 0.18 vs sample size = 10, power = 0.80: p-values can be unreliable
Underpowered studies lead to unreliable p-values
Inflation of the effect size: real mean effect size = 5 g, estimated mean effect size = 7.3 g
Underpowered studies lead to unreliable p-values and overestimated effect size
5. When your experiment is underpowered, you are screwed
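This inflation of the effect size (sometimes called the winner's curse) is easy to reproduce. A minimal sketch, again substituting a z-test with known 𝜎 = 5 g for the slide's t-test at 𝑛 = 3, so the inflated estimate differs somewhat from the slide's 7.3 g:

```python
import math
import random

random.seed(2)
true_effect, sigma, n = 5.0, 5.0, 3
se = sigma / math.sqrt(n)               # standard error of the mean at n = 3
z_crit = 1.959963984540054              # two-sided 5% critical value
significant_effects = []
for _ in range(20_000):
    est = random.gauss(true_effect, se)  # estimated effect from one experiment
    if abs(est) / se > z_crit:           # keep only "significant" experiments
        significant_effects.append(est)

inflated = sum(significant_effects) / len(significant_effects)
print(round(inflated, 1))  # well above the true 5.0 g
```

Only “significant” experiments survive selection, and at low power those are exactly the experiments whose noise happened to exaggerate the effect, so the published mean overshoots the truth.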
Neuroscience: most studies underpowered
Button et al. (2013) “Power failure: why small sample size undermines the
reliability of neuroscience”, Nature Reviews Neuroscience 14, 365-376
The effect size: 𝑝 = 0.004 (𝑛F = 775, 𝑛Q = 392)
With sample size large enough, everything is “significant”
Effect size is more important
Looking at the whole data is even more important
6. When you have lots of replicates, p-values are useless
7. Statistical significance does not imply biological relevance
Multiple test corrections can be tricky
10,000 genes → 10,000 tests → Benjamini-Hochberg correction → RESULT
But with a complex experiment and multi-dimensional data:
Searching... nothing. Searching... nothing. Searching... nothing. Searching... RESULT
Batch effects? No
8. It is not always obvious how to correct p-values
What’s wrong with p-values? A lot, because statistics:
1. P-values test not only the targeted null hypothesis, but everything else in the experiment
2. The chance of making a fool of yourself is much larger than 𝛼 = 0.05
3. When the effect is rare, you are screwed
4. When you get a 𝑝 ~ 0.05, you are screwed
5. When your experiment is underpowered, you are screwed
6. When you have lots of replicates, p-values are useless
7. Statistical significance does not imply biological relevance
8. Multiple test corrections are tricky
The plain fact is that 70 years ago
Ronald Fisher gave scientists a
mathematical machine for turning
baloney into breakthroughs, and flukes
into funding. It is time to pull the plug.
Robert Matthews, Sunday Telegraph,
13 September 1998.
Null hypothesis significance testing is a potent
but sterile intellectual rake who leaves in his
merry path a long train of ravished maidens
but no viable scientific offspring.
Paul Meehl, 1967, Philosophy of Science, 34,
103-115
The widespread use of “statistical
significance” as a license for
making a claim of a scientific
finding leads to considerable
distortion of the scientific process.
ASA statement on statistical
significance and p-values (2016)
By Jim Borgman, first published by the Cincinnati Enquirer, 27 April 1997
What’s wrong with us?
Journal of the American Statistical Association,
Vol. 54, No. 285 (Mar., 1959), pp. 30-34
“There is some evidence that [...] research which yields nonsignificant results is not
published. Such research being unknown to other investigators may be repeated
independently until eventually by chance a significant result occurs [...] The
possibility thus arises that the literature [...] consists in substantial part of false
conclusions [...].”
Canonization of false facts
Nissen S.B., et al., “Research: Publication bias and the
canonization of false facts”, eLife 2016;5:e21451
Canonization of false facts: probability of canonizing a false claim as fact, as a function of the negative publication rate (𝛼 = 0.05, power = 0.8)
If you don’t publish negative results,
science is screwed
but...
there is a thin line between “negative
result” and “no result”
Data dredging, p-hacking: searching until you find the result you were looking for
• Massaging data
• Post-hoc hypotheses
• Unaccounted multiple experiments/tests (𝑝 = 0.06? Let’s try again)
• Ignoring confounding effects
• Not reporting non-significant results
Evidence of p-hacking: distribution of p-values reported in publications, comparing counts on either side of 0.05 (H0: 𝑛F = 𝑛Q)
Head M.L., et al., “The Extent and Consequences of P-Hacking in Science”, PLoS Biol 13(3)
Reproducibility crisis
Open Science Collaboration, “Estimating the reproducibility
of psychological science”, Science, 349 (2015)
Only 39% of results could be reproduced
Reproducibility crisis
Nature's survey of 1,576 researchers
The great reproducibility experiment
Are referees more likely to give red cards to black players?
• one data set
• 29 teams
• 61 scientists
• task: find the odds ratio
Silberzahn et al., “Many analysts, one dataset: Making
transparent how variations in analytical choices affect
results”, https://osf.io/j5v8f
P-values are broken
We are broken
What do we do?
What the hell do we do?
Before you do the experiment
talk to us
The Data Analysis Group
http://www.compbio.dundee.ac.uk/dag.html
Specify the null hypothesis
Design the experiment
• randomization
• statistical power
Quality control: some crap comes out in statistics
We assumed the null hypothesis: never, ever say that large 𝑝 supports H0
Ditch the 𝜶 limit: use p-values as a continuous measure of data incompatibility with H0; 𝑝 ~ 0.05 only means ‘worth a look’
Reporting a discovery based only on 𝑝 < 0.05 is wrong: use the three-sigma rule, that is 𝑝 < 0.003, to demonstrate a discovery
Reporting
• Always report the effect size and its confidence limits
• Show data (not dynamite plots)
• Don’t use the word ‘significant’
• Don’t use asterisks to mark ‘significant’ results in figures
Validation: follow-up experiments to confirm discoveries
Publication: publish negative results
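The three-sigma rule above corresponds to the two-sided tail probability of a standard normal at 3𝜎, which is about 0.0027 (the slide rounds it to 0.003):

```python
import math

def two_sided_p(z):
    # Two-sided tail probability of a standard normal at |z| sigma
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

print(round(two_sided_p(3), 4))  # 0.0027
```

The same function gives ~0.05 at 1.96𝜎, the conventional significance threshold this rule is meant to replace for discovery claims.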
ASA Statement on Statistical Significance and P-Values
1. P-values can indicate how incompatible the data are with a specified statistical
model
2. P-values do not measure the probability that the studied hypothesis is true, or
the probability that the data were produced by random chance alone
3. Scientific conclusions and business or policy decisions should not be based only
on whether a p-value passes a specific threshold
4. Proper inference requires full reporting and transparency
5. A p-value, or statistical significance, does not measure the size of an effect or
the importance of a result
6. By itself, a p-value does not provide a good measure of evidence regarding a
model or hypothesis
https://is.gd/asa_stat
Hand-outs available at
http://is.gd/statlec