SlideShare a Scribd company logo
1 of 43
Download to read offline
The ASA president Task Force Statement on
Statistical Significance and Replicability
Yoav Benjamini
Dept. of Statistics & Operations Research
School of Math Sciences and Sagol School for Neuroscience
Tel Aviv University
www. math.tau.ac.il/~ybenja
January 11, 2022
Research supported by NIH-BSF grant
The Reproducibility and Replicability Concern
In Medicine, Soric (‘89)
“There is a danger that a large part of science can thus be
made untrue”
In Genomics, Lander and Kruglyak (‘95) warned
“There is danger that a substantial proportion of our claims
cannot be replicated”
The beginning of the Industrial revolution in science
The industrialization of the scientific process
1888 1999
1950 2010
Their common argument: select the significant results
20 true effects
180 false effects
Power 0.8
a =0.05
On the average : 9 false discoveries 16 true ones
False Discovery Proportion 9/(9+16)=0.36
20 true effects
180 false effects
Power 0.8
a =0.05
On the average : 9 false discoveries 16 true ones
False Discovery Proportion 36/(36+16)=0.68
720
36
Lander& Kruglyak: use Bonferroni 0.05->.05/740
Soric: motivated Hochberg and YB work on FDR and BH
Their common argument: select the significant results
The Reproducibility and Replicability Crisis
Psychological Science Journal:
Do not trust any p value. …
Routinely report 95% CIs…
Basic and Applied Social Psychology (‘15):
Bans the Null Hypothesis Statistical Testing…
American Stat. Assoc. (ASA) Board’s statement about p-values 1 (‘16)
Opens: The p-value “can be useful”
Then comes: a list of “do not” ”is not” and “should not”
“leads to distortion” –
All warnings phrased about the p-value
7
Is it the p-value’s fault?
Bayesian vs Frequentists rivalry focusing on p-value,/test/selection
It concludes: “In view of the prevalent misuses of and
misconceptions concerning p-values, some statisticians prefer to
supplement or even replace p-values with other approaches. “
It is the p-value’s fault
“We’re finally starting to get rid of the p-value tyranny”
8
Bayesian vs Frequentist rivalry focusing on p-value/test/selection
What other approaches were mentioned in ASA statement?
Confidence intervals
Prediction intervals
Estimation
Likelihood ratios
Bayesian methods
Bayes factor
Credibility intervals
9
The American Statistician March 2019 Issue
43 papers by participants
An editorial by Ron Wasserstein, Allen Schirm & Nicole Lazar
Moving to a world
beyond ‘p<.05’
Summary of the 43
papers by
Wasseerstein, Schirm &
Lazar
• Don’t use p<.05
• Don’t say
“statistically
significant”
• or use any bright-line
rule
There are many Do’s
Accept Uncertainty.
Be Thoughtful, Open
And Modest
A Replicability Crisis turned into a Statistical Crisis
The American Statistical Association (ASA) has recently issued
two statements on the “p-value < 0.05” criterion for statistical
significance widely used in many fields of science: see
Wasserstein and Lazar (2016) and Wasserstein et al. (2019). The
statements are mainly concerned with the abuse and misuse of
the p-value criterion,
Kim, Review of managerial science (2021)
1/2020 8/’20 10/’20 11/’20 12/’20 4/2021 6/’21
C
o
n
v
e
n
e
d
D
r
a
f
t
s
u
b
m
i
t
t
e
d
t
o
A
S
A
B
o
a
r
d
M
o
d
i
f
i
c
a
t
i
o
n
r
e
q
u
e
s
t
e
d
2
n
d
D
r
a
f
t
N
o
t
a
d
o
p
t
e
d
b
y
A
S
A
B
o
a
r
d
1
s
t
s
u
b
m
i
s
s
i
o
n
f
o
r
p
u
b
l
i
c
a
t
i
o
n
P
u
b
l
i
s
h
e
d
i
n
A
O
A
S
The indispensable p-value
• It’s the first defense line against being fooled by randomness –
needs minimal modeling assumptions
Which can be guaranteed
In a properly designed and executed experiment
• The meaning of p-value is shared across fields of science
(after adjustment)
• In some emerging branches of science it’s the only way to compare
across conditions: GWAS, fMRI, Brain Networks, or make a totally
new discovery (Higgs Bosson)
Goes back to the 17th century Robert Boyle, during his debate with
Huygens over his air pump and the nature of vacuum
The gold standard of science: A discovery should be replicable by others
Replicability with significance
“We may say that a phenomenon is
experimentally demonstrable when we
know how to conduct an experiment
which will rarely fail to give us
statistically significant results.”
Fisher (1935) “The Design of Experiments”.
Inference on a selected subset of the parameters that turned out to
be of interest after viewing the data!
Relevant to all statistical methods
1. Out-of-study selection - not evident in the published work
File drawer problem / publication bias
The garden of forking paths, p-hacking, cherry picking
significance chasing, Data dredging,…
All are widely discussed and
Addressed by Transparent and Reproducible research
Selective inference
In-study selection - evident in the published work:
Selection by the Abstract , Table, Figure
Selection by highlighting those passing a threshold
p<.05, p<.005, p<5*10-8,2 fold
Confidence / Credible Interval not covering 0 or 1
Selection by modeling: AIC, Cp, BIC, LASSO,…
Selection by size of estimated parameter
In complex research problems - in-study selection is unavoidable!
Selective inference
• Giovannucci et al. (‘95) look for relationships between more
than 100 types of food intakes and the risk of prostate cancer
• The abstract reports three (marginal) 95% confidence intervals
(CIs), apparently only for those relative risks whose CIs do not
cover 1.
“Eat Ketchup and Pizza and avoid Prostate Cancer”
22
Selection by the Abstract
“…this association has remained nonsignificant
since 2000 after the addition of 7 studies…”
Rowles et al (‘17)
Schnall et al ’08, presented subjects with 6 moral dilemmas asking
“how wrong each action was”
The research goal: Does priming for cleanliness affect the response?
1 Assessment of wrong-doing & 9 emotions rating per dilemma;
Methods of priming verbal & physical (separate experiments) (m>84)
Results: Only a contrast for disgust, a summary of moral judgment
over all dilemmas, and 3 particular dilemmas priming made a
significant difference. All tests at 0.05;
Their Conclusion: The findings support the idea that moral judgment
is affected by priming for cleanliness.
The replication attempt: failed
Selection by highlighting
Analysis of the 100 in the Reproducibility Project in Psychology:
# inferences per study (4-700, average 72); 11 partially adjusted.
56/88 results were not replicable.
After adjusting for selection using hierarchical FDR method
22 should not have been reported as significant (padj>.05)
21 non-replicable results appropriately screened
1 replicable discovery was lost
Zeevi, Estachenko, Mudrik, YB (‘21+)
The status: Experimental psychology
NEJM editorial July 18, 2019
“….including a requirement to replace P values with estimates
of effects or association and 95% confidence intervals when
neither the protocol nor the statistical analysis plan has
specified methods used to adjust for multiplicity. “
NEJM editorial (describing Manson et al 2018)
“The n−3 fatty acids did not significantly reduce the
rate of either the primary cardiovascular outcome or
the cancer outcome. If reported as independent
findings, the P values for two of the secondary
outcomes would have been less than 0.05;
The Abstract of Manson et al 2018
Wu et al, citing results of Manson et al NEJM 2018
Nature Reviews Cardiology (2019)
Fish oil supplementation …
had no significant effect on the composite primary end point of
CHD, stroke or death from CVD
but reduced the risk of
total CHD* (HR 0.83, 95% CI 0.71–0.97),
percutaneous corona intervention (HR 0.78, 95% CI 0.63–0.95),
total myocardial infarction* (HR 0.72, 95% CI 0.59–0.90),
fatal myocardial infarction (HR 0.50, 95% CI 0.26–0.97).
The only 4 out of 22 that excluded 1. These are no longer 95% CI !
20 parameters to be estimated with 90% CIs
● ● ●
●
● ●
● ● ●
●
● ●
●
●
● ● ●
● ● ●
5 10 15 20
−4
−2
0
2
4
Index
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
● ●
● ● ●
●
● ●
●
●
● ● ●
● ● ●
5 10 15 20
−4
−2
0
2
4
Index
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
● ●
● ● ●
●
● ●
●
●
● ● ●
● ● ●
5 10 15 20
−4
−2
0
2
4
Index
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
● ● ●
●
● ●
● ● ●
●
● ●
●
●
● ● ●
● ● ●
5 10 15 20
−4
−2
0
2
4
Index
● ● ●
●
● ●
● ● ●
●
● ●
●
●
● ● ●
● ● ●
5 10 15 20
−4
−2
0
2
4
Index
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
● ●
●
●
●
●
●
3/20 do not cover
3/4 do not cover
when selected
These so selected 4
will tend to fail,
or shrink back,
when replicated.
FCR CIs have level
(1-.1*4/20)100%
30
Are Bayesian intervals immune from selection’s harms?
Assumed Prior µi~N(0,0.52); yi~N(µi ,1); i=1,2,…,106 (Gelman’s Ex.)
Parameters generated by N(0,.52) 0.999*N(0,.52)+0.001*N(0,.52+32)
Type of 95%
confidence/credence intervals
Marginal Bayesian
Credibility
Marginal
FCR-
adjusted
Bayesian
Credibility
BH-Selected FCR-
adjusted
Intervals not covering
their parameter
5.0% 5.0% 5.1% 2.1%
Intervals not covering 0:
Selected
7.3% 0.01% 0.03% 0.03%
Intervals not covering their
parameter: Out of the Selected
48% 3.4% 1.0% 71.5% 2.1%
Reporting the Pfizer-BioNTech COVID-19 Vaccine trial
Threshold for decision – action –or for selection
(Israel4 Cloud seeding experiment)
and selection is essential in modern science
Likelihood ratio, posterior odds,…, are all practically subject
to selection at a (sometimes) arbitrary threshold
Nature Magaszine
'Scientists rise up
against statistical
significance’
Amrhein, Greenland & McShane (’19)
They object to ‘Statistical Significance’ and eventually
to any bright line
Relying on The Am. Statistician & Hurlbert , Lavine & Utts therein:
Coup de Grâce for a Tough Old Bull:
“Statistically Significant” Expires
They ‘ask’: “how can we address multiple comparisons without a
thrshold?” And answer : “We can’t. And should not try”.
Recommending instead :
“nuanced reporting” & “no need for bright line” as in Reifel et al ‘07
Reifel et al ‘17
Influence of river inflows on plankton distribution
Around the southern perimeter of the Salton Sea,
California
Only results with p ≤ 0.1
Are specifically discussed
in the Abstract
Out of 41 results
Ban the use of Abstracts!
Mouse phenotyping example: opposite single lab results
Kafkafi et al (’17 Nature Methods)
Addressing the relevant variability
GxL interaction is “a fact of life”
Genotype-by-Lab effect for a genotype in a new lab is not known; but
If its variability s2
GxLcan be estimated, use
Mean(MG1) – Mean(MG2) .
(s2
Within (1/n+1/n)+ 2s2
GxL )1/2
We call it GxL- adjustment
It’s the relevant “yardstick” against which genetic differences should
be compared, when concerned with replicability across laboratories.
Mouse phenotyping example: opposite single lab results
Kafkafi et al (’17 Nature Methods)
Addressing the relevant variability
Single-lab analyses in all known replication studies
6
6
7
7
#
Labs
#
Genotypes
#
Phenotypes
Datasets
Characteristics
Giessen
Mannheim
Muenster
Utrecht
Munich
Zurich
6 Laboratories 8 Multi-lab Datasets
2 Genotypes
a b c d
Figure 1|Genotype-by-Laboratory interaction (G×L).Comparing2genotypesacross6laboratories(codedbycolor),usingthreephenotypesoutofdataset1(Supplementary
Table 1). Each line connects genotype means within the same laboratory, so its slope reflects their difference. Dashed/ thin lines denote within-lab non-significance/significance
using the standard t-test. Bold lines denote significance after GxL-adjustment (all at 0.05). a. illustrates significant genotype effect according to the Random Lab Model (RLM),
with similar slopes indicating a small G×L effect. b illustrates more variation of the laboratory lines, yet the genotype effect appears fairly replicable, and is significant according
to the RLM. c exhibits substantial G×L: using the standard single-lab analysis Giessen would have reported DBA/2 significantly larger than C57BL/6, while Mannheim, Muenster
and Munich would have reported the opposite significant discovery. Such“opposite significant”(Supplementary Methods S1.1.3) cases were not rare using the standard
method, but disappeared after GxL-adjustment. d. GxL-adjustment decreases non-replicable discoveries in 8 multi-lab datasets: average single-labType-I error rate
.using the standard t-test is much higher than the prescribed 5%. The GxL-adjustment brings it close to 5%, see Supplementary Table 1
C57BL/6 DBA/2
RLM-NS
C57BL/6 DBA/2
pRLM
≤ 0.01
C57BL/6 DBA/2
pRLM
≤ 0.002
Utilizing large database to get sGxL
Scientists conducting experiments in their lab can get an
estimate of the relevant GxL variability from public Mouse
Phenotyping Database e.g. Jackson Labs JAX, Bar Harbor)
“Replicability Adjuster” Implemented there
In current 3-labs experiment in TAU and Jax the proportion of
non-replicable results reduced from 60% to 12%
YB
Thanks!
www.replicability.tau.ac.il
The industrialization of the scientific process
1888 1999
1950 2010
Yoav Benjamini
Pearson ISI @ IBC 2020

More Related Content

What's hot

The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling Maarten van Smeden
 
Development and evaluation of prediction models: pitfalls and solutions (Part...
Development and evaluation of prediction models: pitfalls and solutions (Part...Development and evaluation of prediction models: pitfalls and solutions (Part...
Development and evaluation of prediction models: pitfalls and solutions (Part...BenVanCalster
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyondMaarten van Smeden
 
Meta analysis
Meta analysisMeta analysis
Meta analysisSethu S
 
Meta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManMeta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManGaurav Kamboj
 
Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianLaure Wynants
 
Bias in covid 19 models
Bias in covid 19 modelsBias in covid 19 models
Bias in covid 19 modelsLaure Wynants
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great againBenVanCalster
 
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AIStr-AI-ght to heaven? Pitfalls for clinical decision support based on AI
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AIBenVanCalster
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altmanrgveroniki
 
Statistics in meta analysis
Statistics in meta analysisStatistics in meta analysis
Statistics in meta analysisDr Shri Sangle
 
演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪Beckett Hsieh
 
Sudhakar singh meta analysis
Sudhakar singh meta analysisSudhakar singh meta analysis
Sudhakar singh meta analysisSudhakarSingh66
 
Meta analysis
Meta analysisMeta analysis
Meta analysisJunaidAKG
 
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisday1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisrgveroniki
 
Meta analysis ppt
Meta analysis pptMeta analysis ppt
Meta analysis pptSKVA
 
Machine learning in medicine: calm down
Machine learning in medicine: calm downMachine learning in medicine: calm down
Machine learning in medicine: calm downBenVanCalster
 

What's hot (20)

The basics of prediction modeling
The basics of prediction modeling The basics of prediction modeling
The basics of prediction modeling
 
Development and evaluation of prediction models: pitfalls and solutions (Part...
Development and evaluation of prediction models: pitfalls and solutions (Part...Development and evaluation of prediction models: pitfalls and solutions (Part...
Development and evaluation of prediction models: pitfalls and solutions (Part...
 
Clinical prediction models: development, validation and beyond
Clinical prediction models:development, validation and beyondClinical prediction models:development, validation and beyond
Clinical prediction models: development, validation and beyond
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Meta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevManMeta analysis: Made Easy with Example from RevMan
Meta analysis: Made Easy with Example from RevMan
 
Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatistician
 
Bias in covid 19 models
Bias in covid 19 modelsBias in covid 19 models
Bias in covid 19 models
 
Make clinical prediction models great again
Make clinical prediction models great againMake clinical prediction models great again
Make clinical prediction models great again
 
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AIStr-AI-ght to heaven? Pitfalls for clinical decision support based on AI
Str-AI-ght to heaven? Pitfalls for clinical decision support based on AI
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman
 
Statistics in meta analysis
Statistics in meta analysisStatistics in meta analysis
Statistics in meta analysis
 
Seminar in Meta-analysis
Seminar in Meta-analysisSeminar in Meta-analysis
Seminar in Meta-analysis
 
演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪演講-Meta analysis in medical research-張偉豪
演講-Meta analysis in medical research-張偉豪
 
Sudhakar singh meta analysis
Sudhakar singh meta analysisSudhakar singh meta analysis
Sudhakar singh meta analysis
 
Introduction to meta analysis
Introduction to meta analysisIntroduction to meta analysis
Introduction to meta analysis
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisday1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
 
Meta analysis ppt
Meta analysis pptMeta analysis ppt
Meta analysis ppt
 
Machine learning in medicine: calm down
Machine learning in medicine: calm downMachine learning in medicine: calm down
Machine learning in medicine: calm down
 

Similar to The ASA president Task Force Statement on Statistical Significance and Replicability

The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."jemille6
 
25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.pptPriyankaSharma89719
 
RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptxFrancois MAIGNEN
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Evangelos Kritsotakis
 
COM 301 INFERENTIAL STATISTICS SLIDES.ppt
COM 301   INFERENTIAL STATISTICS SLIDES.pptCOM 301   INFERENTIAL STATISTICS SLIDES.ppt
COM 301 INFERENTIAL STATISTICS SLIDES.pptdanielayo912
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Ahmed Negida
 
MethodUnderstandingStatisticalSignificanceMatthew .docx
MethodUnderstandingStatisticalSignificanceMatthew .docxMethodUnderstandingStatisticalSignificanceMatthew .docx
MethodUnderstandingStatisticalSignificanceMatthew .docxARIV4
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxMrs S Sen
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test SelectionVaggelis Vergoulas
 
1559221358_week 6-MCQ in EBP-1.pdf
1559221358_week 6-MCQ in EBP-1.pdf1559221358_week 6-MCQ in EBP-1.pdf
1559221358_week 6-MCQ in EBP-1.pdfSajidaCheema1
 
Sample size calculation - Animal experimentation
Sample size calculation - Animal experimentationSample size calculation - Animal experimentation
Sample size calculation - Animal experimentationDr Anudeep Reddy
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...GaryCollins74
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Evangelos Kritsotakis
 
Biostatistics clinical research & trials
Biostatistics clinical research & trialsBiostatistics clinical research & trials
Biostatistics clinical research & trialseclinicaltools
 
Choosing statistical tests
Choosing statistical testsChoosing statistical tests
Choosing statistical testsAkiode Noah
 

Similar to The ASA president Task Force Statement on Statistical Significance and Replicability (20)

The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
 
25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt25_Anderson_Biostatistics_and_Epidemiology.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt
 
Copenhagen 2008
Copenhagen 2008Copenhagen 2008
Copenhagen 2008
 
Copenhagen 23.10.2008
Copenhagen 23.10.2008Copenhagen 23.10.2008
Copenhagen 23.10.2008
 
BIOSTATISTICS
BIOSTATISTICSBIOSTATISTICS
BIOSTATISTICS
 
RCT to causal inference.pptx
RCT to causal inference.pptxRCT to causal inference.pptx
RCT to causal inference.pptx
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
COM 301 INFERENTIAL STATISTICS SLIDES.ppt
COM 301   INFERENTIAL STATISTICS SLIDES.pptCOM 301   INFERENTIAL STATISTICS SLIDES.ppt
COM 301 INFERENTIAL STATISTICS SLIDES.ppt
 
Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2Systematic review and meta analaysis course - part 2
Systematic review and meta analaysis course - part 2
 
MethodUnderstandingStatisticalSignificanceMatthew .docx
MethodUnderstandingStatisticalSignificanceMatthew .docxMethodUnderstandingStatisticalSignificanceMatthew .docx
MethodUnderstandingStatisticalSignificanceMatthew .docx
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptx
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test Selection
 
Lund 2009
Lund 2009Lund 2009
Lund 2009
 
1559221358_week 6-MCQ in EBP-1.pdf
1559221358_week 6-MCQ in EBP-1.pdf1559221358_week 6-MCQ in EBP-1.pdf
1559221358_week 6-MCQ in EBP-1.pdf
 
Sample size calculation - Animal experimentation
Sample size calculation - Animal experimentationSample size calculation - Animal experimentation
Sample size calculation - Animal experimentation
 
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
QUANTIFYING THE IMPACT OF DIFFERENT APPROACHES FOR HANDLING CONTINUOUS PREDIC...
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)
 
Biostatistics clinical research & trials
Biostatistics clinical research & trialsBiostatistics clinical research & trials
Biostatistics clinical research & trials
 
Choosing statistical tests
Choosing statistical testsChoosing statistical tests
Choosing statistical tests
 

More from jemille6

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (jemille6
 

More from jemille6 (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 

Recently uploaded

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 

Recently uploaded (20)

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 

The ASA president Task Force Statement on Statistical Significance and Replicability

  • 1. The ASA president Task Force Statement on Statistical Significance and Replicability Yoav Benjamini Dept. of Statistics & Operations Research School of Math Sciences and Sagol School for Neuroscience Tel Aviv University www. math.tau.ac.il/~ybenja January 11, 2022 Research supported by NIH-BSF grant
  • 2. The Reproducibility and Replicability Concern In Medicine, Soric (‘89) “There is a danger that a large part of science can thus be made untrue” In Genomics, Lander and Kruglyak (‘95) warned “There is danger that a substantial proportion of our claims cannot be replicated” The beginning of the Industrial revolution in science
  • 3. The industrialization of the scientific process 1888 1999 1950 2010
  • 4. Their common argument: select the significant results 20 true effects 180 false effects Power 0.8 a =0.05 On the average : 9 false discoveries 16 true ones False Discovery Proportion 9/(9+16)=0.36
  • 5. 20 true effects 180 false effects Power 0.8 a =0.05 On the average : 9 false discoveries 16 true ones False Discovery Proportion 36/(36+16)=0.68 720 36 Lander& Kruglyak: use Bonferroni 0.05->.05/740 Soric: motivated Hochberg and YB work on FDR and BH Their common argument: select the significant results
  • 6. The Reproducibility and Replicability Crisis Psychological Science Journal: Do not trust any p value. … Routinely report 95% CIs… Basic and Applied Social Psychology (‘15): Bans the Null Hypothesis Statistical Testing…
  • 7. American Stat. Assoc. (ASA) Board’s statement about p-values 1 (‘16) Opens: The p-value “can be useful” Then comes: a list of “do not” ”is not” and “should not” “leads to distortion” – All warnings phrased about the p-value 7 Is it the p-value’s fault? Bayesian vs Frequentists rivalry focusing on p-value,/test/selection
  • 8. It concludes: “In view of the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches. “ It is the p-value’s fault “We’re finally starting to get rid of the p-value tyranny” 8 Bayesian vs Frequentist rivalry focusing on p-value/test/selection
  • 9. What other approaches were mentioned in ASA statement? Confidence intervals Prediction intervals Estimation Likelihood ratios Bayesian methods Bayes factor Credibility intervals 9
  • 10. The American Statistician March 2019 Issue 43 papers by participants An editorial by Ron Wasserstein, Allen Schirm & Nicole Lazar
  • 11. Moving to a world beyond ‘p<.05’ Summary of the 43 papers by Wasseerstein, Schirm & Lazar • Don’t use p<.05 • Don’t say “statistically significant” • or use any bright-line rule There are many Do’s Accept Uncertainty. Be Thoughtful, Open And Modest
  • 12. A Replicability Crisis turned into a Statistical Crisis The American Statistical Association (ASA) has recently issued two statements on the “p-value < 0.05” criterion for statistical significance widely used in many fields of science: see Wasserstein and Lazar (2016) and Wasserstein et al. (2019). The statements are mainly concerned with the abuse and misuse of the p-value criterion, Kim, Review of managerial science (2021)
  • 13. 1/2020 8/’20 10/’20 11/’20 12/’20 4/2021 6/’21 C o n v e n e d D r a f t s u b m i t t e d t o A S A B o a r d M o d i f i c a t i o n r e q u e s t e d 2 n d D r a f t N o t a d o p t e d b y A S A B o a r d 1 s t s u b m i s s i o n f o r p u b l i c a t i o n P u b l i s h e d i n A O A S
  • 14.
  • 15. The indispensable p-value • It’s the first defense line against being fooled by randomness – needs minimal modeling assumptions Which can be guaranteed In a properly designed and executed experiment • The meaning of p-value is shared across fields of science (after adjustment) • In some emerging branches of science it’s the only way to compare across conditions: GWAS, fMRI, Brain Networks, or make a totally new discovery (Higgs Bosson)
  • 16.
  • 17. Goes back to the 17th century Robert Boyle, during his debate with Huygens over his air pump and the nature of vacuum The gold standard of science: A discovery should be replicable by others
  • 18. Replicability with significance “We may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us statistically significant results.” Fisher (1935) “The Design of Experiments”.
  • 19.
  • 20. Inference on a selected subset of the parameters that turned out to be of interest after viewing the data! Relevant to all statistical methods 1. Out-of-study selection - not evident in the published work File drawer problem / publication bias The garden of forking paths, p-hacking, cherry picking significance chasing, Data dredging,… All are widely discussed and Addressed by Transparent and Reproducible research Selective inference
  • 21. In-study selection - evident in the published work: Selection by the Abstract , Table, Figure Selection by highlighting those passing a threshold p<.05, p<.005, p<5*10-8,2 fold Confidence / Credible Interval not covering 0 or 1 Selection by modeling: AIC, Cp, BIC, LASSO,… Selection by size of estimated parameter In complex research problems - in-study selection is unavoidable! Selective inference
  • 22. • Giovannucci et al. (‘95) look for relationships between more than 100 types of food intakes and the risk of prostate cancer • The abstract reports three (marginal) 95% confidence intervals (CIs), apparently only for those relative risks whose CIs do not cover 1. “Eat Ketchup and Pizza and avoid Prostate Cancer” 22 Selection by the Abstract “…this association has remained nonsignificant since 2000 after the addition of 7 studies…” Rowles et al (‘17)
  • 23. Schnall et al ’08, presented subjects with 6 moral dilemmas asking “how wrong each action was” The research goal: Does priming for cleanliness affect the response? 1 Assessment of wrong-doing & 9 emotions rating per dilemma; Methods of priming verbal & physical (separate experiments) (m>84) Results: Only a contrast for disgust, a summary of moral judgment over all dilemmas, and 3 particular dilemmas priming made a significant difference. All tests at 0.05; Their Conclusion: The findings support the idea that moral judgment is affected by priming for cleanliness. The replication attempt: failed Selection by highlighting
  • 24. Analysis of the 100 in the Reproducibility Project in Psychology: # inferences per study (4-700, average 72); 11 partially adjusted. 56/88 results were not replicable. After adjusting for selection using hierarchical FDR method 22 should not have been reported as significant (padj>.05) 21 non-replicable results appropriately screened 1 replicable discovery was lost Zeevi, Estachenko, Mudrik, YB (‘21+) The status: Experimental psychology
  • 25.
  • 26. NEJM editorial July 18, 2019 “….including a requirement to replace P values with estimates of effects or association and 95% confidence intervals when neither the protocol nor the statistical analysis plan has specified methods used to adjust for multiplicity. “
  • 27. NEJM editorial (describing Manson et al 2018) “The n−3 fatty acids did not significantly reduce the rate of either the primary cardiovascular outcome or the cancer outcome. If reported as independent findings, the P values for two of the secondary outcomes would have been less than 0.05;
  • 28. The Abstract of Manson et al 2018
  • 29. Wu et al, citing results of Manson et al NEJM 2018 Nature Reviews Cardiology (2019) Fish oil supplementation … had no significant effect on the composite primary end point of CHD, stroke or death from CVD but reduced the risk of total CHD* (HR 0.83, 95% CI 0.71–0.97), percutaneous corona intervention (HR 0.78, 95% CI 0.63–0.95), total myocardial infarction* (HR 0.72, 95% CI 0.59–0.90), fatal myocardial infarction (HR 0.50, 95% CI 0.26–0.97). The only 4 out of 22 that excluded 1. These are no longer 95% CI !
  • 30. 20 parameters to be estimated with 90% CIs ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 −4 −2 0 2 4 Index ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 −4 −2 0 2 4 Index ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 −4 −2 0 2 4 Index ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 −4 −2 0 2 4 Index ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 5 10 15 20 −4 −2 0 2 4 Index ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 3/20 do not cover 3/4 do not cover when selected These so selected 4 will tend to fail, or shrink back, when replicated. FCR CIs have level (1-.1*4/20)100% 30
  • 31. Are Bayesian intervals immune from selection’s harms? Assumed Prior µi~N(0,0.52); yi~N(µi ,1); i=1,2,…,106 (Gelman’s Ex.) Parameters generated by N(0,.52) 0.999*N(0,.52)+0.001*N(0,.52+32) Type of 95% confidence/credence intervals Marginal Bayesian Credibility Marginal FCR- adjusted Bayesian Credibility BH-Selected FCR- adjusted Intervals not covering their parameter 5.0% 5.0% 5.1% 2.1% Intervals not covering 0: Selected 7.3% 0.01% 0.03% 0.03% Intervals not covering their parameter: Out of the Selected 48% 3.4% 1.0% 71.5% 2.1% Reporting the Pfizer-BioNTech COVID-19 Vaccine trial
  • 32. Threshold for decision – action –or for selection (Israel4 Cloud seeding experiment) and selection is essential in modern science Likelihood ratio, posterior odds,…, are all practically subject to selection at a (sometimes) arbitrary threshold
  • 33. Nature Magaszine 'Scientists rise up against statistical significance’ Amrhein, Greenland & McShane (’19)
  • 34. They object to ‘Statistical Significance’ and eventually to any bright line Relying on The Am. Statistician & Hurlbert , Lavine & Utts therein: Coup de Grâce for a Tough Old Bull: “Statistically Significant” Expires They ‘ask’: “how can we address multiple comparisons without a thrshold?” And answer : “We can’t. And should not try”. Recommending instead : “nuanced reporting” & “no need for bright line” as in Reifel et al ‘07
  • 35. Reifel et al ‘17 Influence of river inflows on plankton distribution Around the southern perimeter of the Salton Sea, California Only results with p ≤ 0.1 Are specifically discussed in the Abstract Out of 41 results Ban the use of Abstracts!
  • 36.
  • 37. Mouse phenotyping example: opposite single lab results Kafkafi et al (’17 Nature Methods) Addressing the relevant variability
  • 38. GxL interaction is “a fact of life” Genotype-by-Lab effect for a genotype in a new lab is not known; but If its variability s2 GxLcan be estimated, use Mean(MG1) – Mean(MG2) . (s2 Within (1/n+1/n)+ 2s2 GxL )1/2 We call it GxL- adjustment It’s the relevant “yardstick” against which genetic differences should be compared, when concerned with replicability across laboratories.
  • 39. Mouse phenotyping example: opposite single lab results Kafkafi et al (’17 Nature Methods) Addressing the relevant variability
  • 40. Single-lab analyses in all known replication studies 6 6 7 7 # Labs # Genotypes # Phenotypes Datasets Characteristics Giessen Mannheim Muenster Utrecht Munich Zurich 6 Laboratories 8 Multi-lab Datasets 2 Genotypes a b c d Figure 1|Genotype-by-Laboratory interaction (G×L).Comparing2genotypesacross6laboratories(codedbycolor),usingthreephenotypesoutofdataset1(Supplementary Table 1). Each line connects genotype means within the same laboratory, so its slope reflects their difference. Dashed/ thin lines denote within-lab non-significance/significance using the standard t-test. Bold lines denote significance after GxL-adjustment (all at 0.05). a. illustrates significant genotype effect according to the Random Lab Model (RLM), with similar slopes indicating a small G×L effect. b illustrates more variation of the laboratory lines, yet the genotype effect appears fairly replicable, and is significant according to the RLM. c exhibits substantial G×L: using the standard single-lab analysis Giessen would have reported DBA/2 significantly larger than C57BL/6, while Mannheim, Muenster and Munich would have reported the opposite significant discovery. Such“opposite significant”(Supplementary Methods S1.1.3) cases were not rare using the standard method, but disappeared after GxL-adjustment. d. GxL-adjustment decreases non-replicable discoveries in 8 multi-lab datasets: average single-labType-I error rate .using the standard t-test is much higher than the prescribed 5%. The GxL-adjustment brings it close to 5%, see Supplementary Table 1 C57BL/6 DBA/2 RLM-NS C57BL/6 DBA/2 pRLM ≤ 0.01 C57BL/6 DBA/2 pRLM ≤ 0.002
  • 41. Utilizing large database to get sGxL Scientists conducting experiments in their lab can get an estimate of the relevant GxL variability from public Mouse Phenotyping Database e.g. Jackson Labs JAX, Bar Harbor) “Replicability Adjuster” Implemented there In current 3-labs experiment in TAU and Jax the proportion of non-replicable results reduced from 60% to 12% YB
  • 42.
  • 43. Thanks! www.replicability.tau.ac.il The industrialization of the scientific process 1888 1999 1950 2010 Yoav Benjamini Pearson ISI @ IBC 2020