A Bayesian Perspective On
The Reproducibility Project:
Psychology
Alexander Etz & Joachim Vandekerckhove
@alxetz ← My Twitter (no 'e' in alex)
alexanderetz.com ← My website/blog
Purpose
•  Revisit Reproducibility Project: Psychology (RPP)
•  Compute Bayes factors
•  Account for publication bias in original studies
•  Evaluate and compare levels of statistical evidence
TLDR: Conclusions first
•  75% of studies find qualitatively similar levels of evidence
in original and replication
•  64% find weak evidence (BF < 10) in both attempts
•  11% of studies find strong evidence (BF > 10) in both attempts
•  10% find strong evidence in replication but not original
•  15% find strong evidence in original but not replication
The RPP
•  270 scientists attempt to closely replicate 100 psychology
studies
•  Use original materials (when possible)
•  Work with original authors
•  Pre-registered to avoid bias
•  Analysis plan specified in advance
•  Guaranteed to be published regardless of outcome
The RPP
•  2 main criteria for grading replication
•  Is the replication result statistically significant (p < .05) in
the same direction as the original?
•  39% success rate
•  Does the replication's confidence interval capture the original
reported effect?
•  47% success rate
The RPP
•  Neither of these metrics is any good
•  (at least not as used)
•  Neither makes predictions about out-of-sample data
•  Comparing significance levels is bad
•  “The difference between significant and not significant is not
necessarily itself significant”
•  -Gelman & Stern (2006)
The RPP
•  Nevertheless, .51 correlation between original &
replication effect sizes
•  Indicates at least some level of robustness
[Figure: scatterplot of replication effect size vs. original effect size,
points colored by replication p-value (significant vs. not significant)
and shaded by replication power (0.6–0.9).
Source: Open Science Collaboration. (2015). Estimating the reproducibility
of psychological science. Science, 349(6251), aac4716.]
What can explain the discrepancies?
Moderators
•  Two study attempts are in different contexts
•  Texas vs. California
•  Different context = different results?
•  Conservative vs. Liberal sample
Low power in original studies
•  Statistical power:
•  The frequency with which a study will yield a statistically significant
effect in repeated sampling, assuming that the underlying effect is
of a given size.
•  Low powered designs undermine credibility of statistically
significant results
•  Button et al. (2013)
•  Type M / Type S errors (Gelman & Carlin, 2014)
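The power definition above can be made concrete with a short sketch using SciPy's noncentral t distribution. The one-sample design and the noncentrality convention (delta times the square root of n) are illustrative assumptions, not taken from the slides:

```python
import numpy as np
from scipy import stats

def power_one_sample_t(d, n, alpha=0.05):
    """Power of a two-sided one-sample t test when the true
    standardized effect size is d (noncentrality = d * sqrt(n))."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    nc = d * np.sqrt(n)
    # Probability the test statistic lands in either rejection region
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)
```

With d = 0 this returns approximately alpha, and it climbs toward 1 as the sample size grows.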
Low power in original studies
•  Replications planned to have minimum 80% power
•  Report average power of 92%
Publication bias
•  Most published results are “positive” findings
•  Statistically significant results
•  Most studies designed to reject H0
•  Most published studies succeed
•  Selective preference = bias
•  “Statistical significance filter”
Statistical significance filter
•  Incentive to have results that reach p < .05
•  “Statistically significant”
•  Evidential standard
•  Studies with large effect size achieve significance
•  Get published
Statistical significance filter
•  Studies with smaller effect size don’t reach significance
•  Get suppressed
•  Published average effect sizes inevitably inflate
•  Replication power calculations based on them become meaningless
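The inflation mechanism is easy to demonstrate with a small simulation. The numbers here (true d = 0.2, n = 20 per study, one-sided publication filter) are illustrative choices, not values from the slides:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2016)
n, true_d, n_studies = 20, 0.2, 10_000

# Simulate many one-sample studies with a small true effect
x = rng.normal(true_d, 1.0, size=(n_studies, n))
obs_d = x.mean(axis=1) / x.std(axis=1, ddof=1)  # observed effect sizes
t = obs_d * np.sqrt(n)                          # one-sample t statistics
t_crit = stats.t.ppf(0.975, n - 1)

# The significance filter: only significant results (in the predicted
# direction) make it into the literature
published = t > t_crit
print(f"mean d, all studies:      {obs_d.mean():.2f}")
print(f"mean d, 'published' only: {obs_d[published].mean():.2f}")
```

The published-only mean is substantially larger than the true effect, because with low power only overestimates reach significance.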
Can we account for this bias?
•  Consider publication as part of data collection process
•  This enters through likelihood function
•  Data generating process
•  Sampling distribution
Mitigation of publication bias
•  Remember the statistical significance filter
•  We try to build a statistical model of it
Mitigation of publication bias
•  We formally model 4 possible significance filters
•  4 models comprise overall H0
•  4 models comprise overall H1
•  If result consistent with bias, then Bayes factor penalized
•  Raise the evidence bar
Mitigation of publication bias
•  Expected distribution of test statistics that make it to the
literature.
Mitigation of publication bias
•  Probably none of these is exactly right
•  (Definitely all wrong)
•  But it is a reasonable start
•  Doesn’t matter really
•  We’re going to mix and mash them all together
•  “Bayesian Model Averaging”
The Bayes Factor
•  How the data shift the balance of evidence
•  Ratio of predictive success of the models
BF10 = p(data | H1) / p(data | H0)
The Bayes Factor
•  H0: Null hypothesis
•  H1: Alternative hypothesis
BF10 = p(data | H1) / p(data | H0)
The Bayes Factor
•  BF10 > 1 means evidence favors H1
•  BF10 < 1 means evidence favors H0
•  Need to be clear what H0 and H1 represent
BF10 = p(data | H1) / p(data | H0)
The Bayes Factor
•  H0: d = 0
•  H1: d ≠ 0 (BAD)
•  Too vague
•  Doesn’t make predictions
BF10 = p(data | H1) / p(data | H0)
The Bayes Factor
•  H0: d = 0
•  H1: d ~ Normal(0, 1)
•  The effect is probably small
•  Almost certainly -2 < d < 2
BF10 = p(data | H1) / p(data | H0)
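Under these two hypotheses the face-value Bayes factor can be computed by numerical integration. A minimal sketch for a one-sample t test; the noncentrality convention (delta times the square root of n) is an assumption of this illustration:

```python
import numpy as np
from scipy import stats, integrate

def face_value_bf10(t, n, prior_sd=1.0):
    """BF10 for a one-sample t test.
    H0: d = 0            -> central t density at the observed t
    H1: d ~ Normal(0, 1) -> noncentral t density averaged over the prior
    """
    df = n - 1
    m0 = stats.t.pdf(t, df)  # p(data | H0)
    # p(data | H1): integrate the noncentral-t likelihood over the prior
    integrand = lambda d: (stats.nct.pdf(t, df, d * np.sqrt(n))
                           * stats.norm.pdf(d, 0.0, prior_sd))
    m1, _ = integrate.quad(integrand, -8.0, 8.0)
    return m1 / m0
```

A t statistic near zero favors H0 (BF10 < 1), and the Bayes factor grows as |t| grows.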
Statistical evidence
•  Do independent study attempts obtain similar amounts of
evidence?
•  Same prior distribution for both attempts
•  Measuring general evidential content
•  We want to evaluate evidence from outsider perspective
Interpreting evidence
•  “How convincing would these data be to a neutral
observer?”
•  1:1 prior odds for H1 vs. H0
•  50% prior probability for each
Interpreting evidence
•  BF > 10 is sufficiently evidential
•  10:1 posterior odds for H1 vs. H0 (or vice versa)
•  91% posterior probability for H1 (or vice versa)
•  BF of 3 is too weak
•  3:1 posterior odds for H1 vs. H0 (or vice versa)
•  Only 75% posterior probability for H1 (or vice versa)
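The conversion used above is just the odds form of Bayes' rule, starting from the neutral observer's 1:1 prior odds:

```python
def posterior_prob_h1(bf10, prior_odds=1.0):
    """Posterior probability of H1 given a Bayes factor and prior odds.
    posterior odds = BF10 * prior odds; probability = odds / (1 + odds)."""
    post_odds = bf10 * prior_odds
    return post_odds / (1.0 + post_odds)

print(round(posterior_prob_h1(10), 2))  # 0.91
print(round(posterior_prob_h1(3), 2))   # 0.75
```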
Interpreting evidence
•  It depends on context (of course)
•  You can have higher or lower standards of evidence
Interpreting evidence
•  How do p values stack up?
•  American Statistical Association:
•  “Researchers should recognize that a p-value … near 0.05 taken
by itself offers only weak evidence against the null hypothesis.”
Interpreting evidence
•  How do p values stack up?
•  p < .05 is weak standard
•  p = .05 corresponds to BF ≤ 2.5 (at BEST)
•  p = .01 corresponds to BF ≤ 8 (at BEST)
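These "at best" figures match a well-known upper bound on the Bayes factor as a function of the p value (Sellke, Bayarri, & Berger, 2001); that this is the slides' source is an assumption, but the bound reproduces both numbers:

```python
import math

def max_bf10(p):
    """Upper bound on BF10 against a point null, valid for p < 1/e:
    BF10 <= 1 / (-e * p * ln p)   (Sellke, Bayarri, & Berger, 2001)."""
    if not (0.0 < p < 1.0 / math.e):
        raise ValueError("bound requires 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))

print(round(max_bf10(0.05), 1))  # 2.5
print(round(max_bf10(0.01), 1))  # 8.0
```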
Face-value BFs
•  Standard Bayes factor
•  Bias free
•  Results taken at face-value
Bias-mitigated BFs
•  Bayes factor accounting for possible bias
Illustrative Bayes factors
•  Study 27
•  t(31) = 2.27, p = .03
•  Maximum BF10 = 3.4
•  Face-value BF10 = 2.9
•  Bias-mitigated BF10 = .81
Illustrative Bayes factors
•  Study 71
•  t(373) = 4.4, p < .001
•  Maximum BF10 = ~2300
•  Face-value BF10 = 947
•  Bias-mitigated BF10 = 142
RPP Sample
•  N = 72 studies
•  All univariate tests (t test, ANOVA with 1 model df, etc.)
Results
•  Original studies, face-value
•  Ignoring pub bias
•  43% obtain BF10 > 10
•  57% obtain 1/10 < BF10 < 10
•  0 obtain BF10 < 1/10
Results
•  Original studies, bias-corrected
•  26% obtain BF10 > 10
•  74% obtain 1/10 < BF10 < 10
•  0 obtain BF10 < 1/10
Results
•  Replication studies, face value
•  No chance for bias, no need for correction
•  21% obtain BF10 > 10
•  79% obtain 1/10 < BF10 < 10
•  0 obtain BF10 < 1/10
Consistency of results
•  No alarming inconsistencies
•  46 cases where both original and replication show only
weak evidence
•  Only 8 cases where both show BF10 > 10
Consistency of results
•  11 cases where original BF10 > 10, but not replication
•  7 cases where replication BF10 > 10, but not original
•  In every case, the study obtaining strong evidence had
the larger sample size
Moderators?
•  As Laplace would say, we have no need for that
hypothesis
•  Results adequately explained by:
•  Publication bias in original studies
•  Generally weak standards of evidence
Take home message
•  Recalibrate our intuitions about statistical evidence
•  Reevaluate expectations for replications
•  Given weak evidence in original studies
Thank you
@alxetz ← My Twitter (no 'e' in alex)
alexanderetz.com ← My website/blog
@VandekerckhoveJ ← Joachim's Twitter
joachim.cidlab.com ← Joachim's website
Want to learn more Bayes?
•  My blog:
•  alexanderetz.com/understanding-bayes
•  “How to become a Bayesian in eight easy steps”
•  http://tinyurl.com/eightstepsdraft
•  JASP summer workshop
•  https://jasp-stats.org/
•  Amsterdam, August 22-23, 2016
Technical Appendix
Mitigation of publication bias
•  Model 1: No bias
•  Every study has same chance of publication
•  Regular t distributions
•  H0 true: Central t
•  H0 false: Noncentral t
•  These are used in standard BFs
Mitigation of publication bias
•  Model 2: Extreme bias
•  Only statistically significant results published
•  t distributions but…
•  Zero density in the middle
•  Spikes in significant regions
Mitigation of publication bias
•  Model 3: Constant bias
•  Nonsignificant results published x% as often as significant results
•  t distributions but…
•  Central regions downweighted
•  Large spikes over significance regions
Mitigation of publication bias
•  Model 4: Exponential bias
•  “Marginally significant” results have a chance to be published
•  Harder as p gets larger
•  t likelihoods but…
•  Spikes over significance regions
•  Quick decay to zero density as
(p – α) increases
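One way to render these four filters concretely is as weight functions on the t likelihood. The functional forms and parameter names below (theta for the publication rate / decay scale) are illustrative guesses, not the paper's exact specification:

```python
import math
from scipy import stats

def bias_weight(t, df, model, theta=0.1, alpha=0.05):
    """Relative publication probability of a result with statistic t.
    model 1: no bias; 2: extreme; 3: constant; 4: exponential decay.
    (Sketch only: forms and parameters are assumptions.)"""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    significant = abs(t) > t_crit
    if model == 1:
        return 1.0            # every study published
    if significant:
        return 1.0            # models 2-4 always publish significant results
    if model == 2:
        return 0.0            # extreme: nonsignificant results never published
    if model == 3:
        return theta          # constant: published theta times as often
    # model 4: chance of publication decays quickly as p exceeds alpha
    p = 2 * stats.t.sf(abs(t), df)
    return math.exp(-(p - alpha) / theta)
```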
Calculate face-value BF
•  Take the likelihood: tn(x | δ)
•  Integrate with respect to prior distribution, p(δ)
•  “What is the average likelihood of the data given this model?”
•  Result is marginal likelihood, M
Calculate face-value BF
•  For H1: δ ~ Normal(0, 1)
•  For H0: δ = 0
M− = tn(x | δ = 0)
M+ = ∫Δ tn(x | δ) p(δ) dδ
BF10 = p(data | H1) / p(data | H0) = M+ / M−
Calculate mitigated BF
•  Start with the regular t likelihood function: tn(x | δ)
•  Multiply it by a bias function for model w = {1, 2, 3, 4}
•  Where w = 1 is no bias, w = 2 is extreme bias, etc.
•  E.g., when w = 3 (constant bias): tn(x | δ) × w(x | θ)
Calculate mitigated BF
•  Too messy
•  Rewrite as a new function
•  H1: δ ~ Normal(0, 1):
pw+(x | n, δ, θ) ∝ tn(x | δ) × w(x | θ)
•  H0: δ = 0:
pw−(x | n, θ) = pw+(x | n, δ = 0, θ)
Calculate mitigated BF
•  Integrate w.r.t. p(θ) and p(δ)
•  “What is the average likelihood of the data given each bias model?”
•  p(data | bias model w):
Mw+ = ∫Δ ∫Θ pw+(x | n, δ, θ) p(δ) p(θ) dδ dθ
Mw− = ∫Θ pw−(x | n, θ) p(θ) dθ
Calculate mitigated BF
•  Take Mw+ and Mw− and multiply by the weights of the
corresponding bias model, then sum within each
hypothesis
•  For H1: p(w=1)M1+ + p(w=2)M2+ + p(w=3)M3+ + p(w=4)M4+
•  For H0: p(w=1)M1− + p(w=2)M2− + p(w=3)M3− + p(w=4)M4−
Calculate mitigated BF
•  Messy, so we restate as sums:
•  For H1: Σw p(w) Mw+
•  For H0: Σw p(w) Mw−
BF10 = p(data | H1) / p(data | H0) = Σw p(w) Mw+ / Σw p(w) Mw−
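Given the four pairs of marginal likelihoods, the model-averaged Bayes factor is just a weighted sum over numerator and denominator. A small sketch (the function and argument names are my own):

```python
def model_averaged_bf10(m_plus, m_minus, prior_w):
    """Bayesian-model-averaged BF10 over the four bias models.
    m_plus[w]  = p(data | H1, bias model w)
    m_minus[w] = p(data | H0, bias model w)
    prior_w[w] = prior probability of bias model w (sums to 1)."""
    numerator = sum(p * m for p, m in zip(prior_w, m_plus))
    denominator = sum(p * m for p, m in zip(prior_w, m_minus))
    return numerator / denominator

# With equal model weights, if every model gives M+ twice M-, BF10 = 2
print(model_averaged_bf10([2, 2, 2, 2], [1, 1, 1, 1], [0.25] * 4))  # 2.0
```

Results consistent with a significance filter load the mixture onto the biased models, which shrinks the numerator relative to the denominator and penalizes the Bayes factor.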
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 

Bayes rpp bristol

  • 1. A Bayesian Perspective On The Reproducibility Project: Psychology Alexander Etz & Joachim Vandekerckhove @alxetz ← My Twitter (no ‘e’ in alex) alexanderetz.com ← My website/blog
  • 2. Purpose •  Revisit Reproducibility Project: Psychology (RPP) •  Compute Bayes factors •  Account for publication bias in original studies •  Evaluate and compare levels of statistical evidence
  • 3. TLDR: Conclusions first •  75% of studies find qualitatively similar levels of evidence in original and replication •  64% find weak evidence (BF < 10) in both attempts •  11% of studies find strong evidence (BF > 10) in both attempts
  • 4. TLDR: Conclusions first •  75% of studies find qualitatively similar levels of evidence in original and replication •  64% find weak evidence (BF < 10) in both attempts •  11% of studies find strong evidence (BF > 10) in both attempts •  10% find strong evidence in replication but not original •  15% find strong evidence in original but not replication
  • 5. The RPP •  270 scientists attempt to closely replicate 100 psychology studies •  Use original materials (when possible) •  Work with original authors •  Pre-registered to avoid bias •  Analysis plan specified in advance •  Guaranteed to be published regardless of outcome
  • 6. The RPP •  2 main criteria for grading replication
  • 7. The RPP •  2 main criteria for grading replication •  Is the replication result statistically significant (p < .05) in the same direction as original? •  39% success rate
  • 8. The RPP •  2 main criteria for grading replication •  Is the replication result statistically significant (p < .05) in the same direction as original? •  39% success rate •  Does the replication’s confidence interval capture original reported effect? •  47% success rate
  • 9. The RPP •  Neither of these metrics is any good •  (at least not as used) •  Neither makes predictions about out-of-sample data •  Comparing significance levels is bad •  “The difference between significant and not significant is not necessarily itself significant” •  -Gelman & Stern (2006)
  • 10. The RPP •  Nevertheless, .51 correlation between original & replication effect sizes •  Indicates at least some level of robustness
  • 11. [Scatter plot: replication effect size vs. original effect size, points marked by replication p-value (significant / not significant) and scaled by replication power (0.6–0.9).] Source: Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  • 12. What can explain the discrepancies?
  • 13. Moderators •  Two study attempts are in different contexts •  Texas vs. California •  Different context = different results? •  Conservative vs. Liberal sample
  • 14. Low power in original studies •  Statistical power: •  The frequency with which a study will yield a statistically significant effect in repeated sampling, assuming that the underlying effect is of a given size. •  Low powered designs undermine credibility of statistically significant results •  Button et al. (2013) •  Type M / Type S errors (Gelman & Carlin, 2014)
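The power definition above can be put into a few lines of code. A minimal sketch, assuming a two-sided, two-sample design and a normal approximation to the t test (the function name and the approximation are mine, not from the slides):

```python
from statistics import NormalDist

def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of a true
    standardized effect d, using a normal approximation to the t test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)      # critical value, e.g. 1.96
    ncp = d * (n_per_group / 2) ** 0.5      # noncentrality of the test statistic
    # P(reject) = P(|Z + ncp| > z_crit)
    return (1 - nd.cdf(z_crit - ncp)) + nd.cdf(-z_crit - ncp)

# The textbook rule of thumb: d = 0.5 needs ~64 per group for ~80% power
print(round(power_two_sample(0.5, 64), 2))
```

Low power means this number is small for the effect sizes actually studied, which is exactly the condition under which Type M and Type S errors thrive.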
  • 15. Low power in original studies •  Replications planned to have minimum 80% power •  Report average power of 92%
  • 16. Publication bias •  Most published results are “positive” findings •  Statistically significant results •  Most studies designed to reject H0 •  Most published studies succeed •  Selective preference = bias •  “Statistical significance filter”
  • 17. Statistical significance filter •  Incentive to have results that reach p < .05 •  “Statistically significant” •  Evidential standard •  Studies with large effect size achieve significance •  Get published
  • 18. Statistical significance filter •  Studies with smaller effect size don’t reach significance •  Get suppressed •  Average effect size inevitably inflates •  Replication power calculations meaningless
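The inflation just described is easy to demonstrate by simulation. A toy sketch (the true effect, sample size, and number of studies are made-up illustration values): simulate many underpowered studies, "publish" only the significant ones, and compare the published average to the truth.

```python
import random
from statistics import mean

random.seed(1)
true_d, n = 0.2, 20                 # small true effect, 20 per group
se = (2 / n) ** 0.5                 # approx. standard error of d-hat
published = []
for _ in range(20_000):
    d_hat = random.gauss(true_d, se)
    if abs(d_hat / se) > 1.96:      # the statistical significance filter
        published.append(d_hat)

# The filtered literature overstates the effect by a wide margin
print(f"true d = {true_d}, published mean = {mean(published):.2f}")
```

With these values the published mean lands several times above the true effect, which is why power calculations based on published effect sizes are meaningless.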
  • 19. Can we account for this bias? •  Consider publication as part of data collection process •  This enters through likelihood function •  Data generating process •  Sampling distribution
  • 20. Mitigation of publication bias •  Remember the statistical significance filter •  We try to build a statistical model of it
  • 21. Mitigation of publication bias •  We formally model 4 possible significance filters •  4 models comprise overall H0 •  4 models comprise overall H1 •  If result consistent with bias, then Bayes factor penalized •  Raise the evidence bar
  • 22. Mitigation of publication bias •  Expected distribution of test statistics that make it to the literature.
  • 23. Mitigation of publication bias •  None of these are probably right •  (Definitely all wrong) •  But it is a reasonable start •  Doesn’t matter really •  We’re going to mix and mash them all together •  “Bayesian Model Averaging”
  • 24. The Bayes Factor •  How the data shift the balance of evidence •  Ratio of predictive success of the models •  BF10 = p(data | H1) / p(data | H0)
  • 25. The Bayes Factor •  H0: Null hypothesis •  H1: Alternative hypothesis •  BF10 = p(data | H1) / p(data | H0)
  • 26. The Bayes Factor •  BF10 > 1 means evidence favors H1 •  BF10 < 1 means evidence favors H0 •  Need to be clear what H0 and H1 represent •  BF10 = p(data | H1) / p(data | H0)
  • 27. The Bayes Factor •  H0: d = 0 •  BF10 = p(data | H1) / p(data | H0)
  • 28. The Bayes Factor •  H0: d = 0 •  H1: d ≠ 0 •  BF10 = p(data | H1) / p(data | H0)
  • 29. The Bayes Factor •  H0: d = 0 •  H1: d ≠ 0 (BAD) •  Too vague •  Doesn’t make predictions •  BF10 = p(data | H1) / p(data | H0)
  • 30. The Bayes Factor •  H0: d = 0 •  H1: d ~ Normal(0, 1) •  The effect is probably small •  Almost certainly -2 < d < 2 •  BF10 = p(data | H1) / p(data | H0)
  • 31. Statistical evidence •  Do independent study attempts obtain similar amounts of evidence?
  • 32. Statistical evidence •  Do independent study attempts obtain similar amounts of evidence? •  Same prior distribution for both attempts •  Measuring general evidential content •  We want to evaluate evidence from outsider perspective
  • 33. Interpreting evidence •  “How convincing would these data be to a neutral observer?” •  1:1 prior odds for H1 vs. H0 •  50% prior probability for each
  • 34. Interpreting evidence •  BF > 10 is sufficiently evidential •  10:1 posterior odds for H1 vs. H0 (or vice versa) •  91% posterior probability for H1 (or vice versa)
  • 35. Interpreting evidence •  BF > 10 is sufficiently evidential •  10:1 posterior odds for H1 vs. H0 (or vice versa) •  91% posterior probability for H1 (or vice versa) •  BF of 3 is too weak •  3:1 posterior odds for H1 vs. H0 (or vice versa) •  Only 75% posterior probability for H1 (or vice versa)
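The odds-to-probability arithmetic on these slides is just Bayes' rule on the odds scale. A minimal sketch (the function name is mine):

```python
def posterior_prob_h1(bf10, prior_odds=1.0):
    """Posterior probability of H1 from a Bayes factor and prior odds (H1:H0)."""
    post_odds = bf10 * prior_odds       # posterior odds = BF x prior odds
    return post_odds / (1 + post_odds)  # convert odds to probability

print(round(posterior_prob_h1(10), 2))  # BF = 10 -> 0.91
print(round(posterior_prob_h1(3), 2))   # BF = 3  -> 0.75
```

The default prior_odds=1.0 encodes the neutral observer's 50/50 starting point; a skeptic would pass prior_odds < 1 and need a correspondingly larger BF.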
  • 36. Interpreting evidence •  It depends on context (of course) •  You can have higher or lower standards of evidence
  • 37. Interpreting evidence •  How do p values stack up? •  American Statistical Association: •  “Researchers should recognize that a p-value … near 0.05 taken by itself offers only weak evidence against the null hypothesis.”
  • 38. Interpreting evidence •  How do p values stack up? •  p < .05 is weak standard •  p = .05 corresponds to BF ≤ 2.5 (at BEST) •  p = .01 corresponds to BF ≤ 8 (at BEST)
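The slides don't name the calibration, but the quoted numbers match the well-known upper bound BF ≤ 1/(-e · p · ln p) (Sellke, Bayarri, & Berger, 2001). A sketch:

```python
import math

def max_bf10(p):
    """Upper bound on the Bayes factor against H0 implied by a p-value
    (the -e * p * ln(p) bound; valid for p < 1/e)."""
    return 1.0 / (-math.e * p * math.log(p))

print(round(max_bf10(0.05), 2))  # ~2.46: "BF <= 2.5 at BEST"
print(round(max_bf10(0.01), 2))  # ~7.99: "BF <= 8 at BEST"
```

These are best-case values; a realistic prior on the effect size usually yields a smaller BF than the bound.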
  • 39. Face-value BFs •  Standard Bayes factor •  Bias free •  Results taken at face-value
  • 40. Bias-mitigated BFs •  Bayes factor accounting for possible bias
  • 41. Illustrative Bayes factors •  Study 27 •  t(31) = 2.27, p=.03 •  Maximum BF10 = 3.4
  • 42. Illustrative Bayes factors •  Study 27 •  t(31) = 2.27, p=.03 •  Maximum BF10 = 3.4 •  Face-value BF10 = 2.9
  • 43. Illustrative Bayes factors •  Study 27 •  t(31) = 2.27, p=.03 •  Maximum BF10 = 3.4 •  Face-value BF10 = 2.9 •  Bias-mitigated BF10 = .81
  • 44. Illustrative Bayes factors •  Study 71 •  t(373) = 4.4, p < .001 •  Maximum BF10 = ~2300
  • 45. Illustrative Bayes factors •  Study 71 •  t(373) = 4.4, p < .001 •  Maximum BF10 = ~2300 •  Face-value BF10 = 947
  • 46. Illustrative Bayes factors •  Study 71 •  t(373) = 4.4, p < .001 •  Maximum BF10 = ~2300 •  Face-value BF10 = 947 •  Bias-mitigated BF10 = 142
  • 47. RPP Sample •  N = 72 •  All univariate tests (t test, ANOVA with 1 model df, etc.)
  • 48. Results •  Original studies, face-value •  Ignoring pub bias •  43% obtain BF10 > 10 •  57% obtain 1/10 < BF10 < 10 •  0 obtain BF10 < 1/10
  • 49. Results •  Original studies, bias-corrected •  26% obtain BF10 > 10 •  74% obtain 1/10 < BF10 < 10 •  0 obtain BF10 < 1/10
  • 50. Results •  Replication studies, face value •  No chance for bias, no need for correction •  21% obtain BF10 > 10 •  79% obtain 1/10 < BF10 < 10 •  0 obtain BF10 < 1/10
  • 51.
  • 52. Consistency of results •  No alarming inconsistencies •  46 cases where both original and replication show only weak evidence •  Only 8 cases where both show BF10 > 10
  • 53. Consistency of results •  11 cases where original BF10 > 10, but not replication •  7 cases where replication BF10 > 10, but not original •  In every case, the study obtaining strong evidence had the larger sample size
  • 54.
  • 55. Moderators? •  As Laplace would say, we have no need for that hypothesis •  Results adequately explained by: •  Publication bias in original studies •  Generally weak standards of evidence
  • 56. Take home message •  Recalibrate our intuitions about statistical evidence •  Reevaluate expectations for replications •  Given weak evidence in original studies
  • 58. Thank you @alxetz ← My Twitter (no ‘e’ in alex) alexanderetz.com ← My website/blog @VandekerckhoveJ ← Joachim’s Twitter joachim.cidlab.com ← Joachim’s website
  • 59. Want to learn more Bayes? •  My blog: •  alexanderetz.com/understanding-bayes •  “How to become a Bayesian in eight easy steps” •  http://tinyurl.com/eightstepsdraft •  JASP summer workshop •  https://jasp-stats.org/ •  Amsterdam, August 22-23, 2016
  • 61. Mitigation of publication bias •  Model 1: No bias •  Every study has same chance of publication •  Regular t distributions •  H0 true: Central t •  H0 false: Noncentral t •  These are used in standard BFs
  • 62. Mitigation of publication bias •  Model 2: Extreme bias •  Only statistically significant results published •  t distributions but… •  Zero density in the middle •  Spikes in significant regions
  • 63. Mitigation of publication bias •  Model 3: Constant-bias •  Nonsignificant results published x% as often as significant results •  t distributions but… •  Central regions downweighted •  Large spikes over significance regions
  • 64. Mitigation of publication bias •  Model 4: Exponential bias •  “Marginally significant” results have a chance to be published •  Harder as p gets larger •  t likelihoods but… •  Spikes over significance regions •  Quick decay to zero density as (p – α) increases
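The four filters can be sketched as weight functions over the (two-sided) p-value, giving the relative chance that a result with that p-value reaches the literature. The parameter values below are illustrative only; the slides leave θ unspecified.

```python
import math

ALPHA = 0.05

def w_no_bias(p):                       # Model 1: everything published
    return 1.0

def w_extreme(p):                       # Model 2: only significant results
    return 1.0 if p < ALPHA else 0.0

def w_constant(p, theta=0.2):           # Model 3: nonsignificant results
    return 1.0 if p < ALPHA else theta  #   published theta times as often

def w_exponential(p, theta=20.0):       # Model 4: publication chance decays
    if p < ALPHA:                       #   as p moves past the threshold
        return 1.0
    return math.exp(-theta * (p - ALPHA))

for w in (w_no_bias, w_extreme, w_constant, w_exponential):
    print(f"{w.__name__}: w(.01) = {w(0.01)}, w(.20) = {w(0.20):.3f}")
```

Multiplying the t likelihood by one of these weights is what produces the "zero density in the middle" and "spikes over significance regions" shapes described above.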
  • 65. Calculate face-value BF •  Take the likelihood: t_n(x | δ) •  Integrate with respect to prior distribution, p(δ)
  • 66. Calculate face-value BF •  Take the likelihood: t_n(x | δ) •  Integrate with respect to prior distribution, p(δ) •  “What is the average likelihood of the data given this model?” •  Result is marginal likelihood, M
  • 67. Calculate face-value BF •  For H1: δ ~ Normal(0, 1) •  For H0: δ = 0 •  M− = t_n(x | δ = 0) •  M+ = ∫_Δ t_n(x | δ) p(δ) dδ
  • 68. Calculate face-value BF •  For H1: δ ~ Normal(0, 1) •  For H0: δ = 0 •  M− = t_n(x | δ = 0) •  M+ = ∫_Δ t_n(x | δ) p(δ) dδ •  BF10 = p(data | H1) / p(data | H0) = M+ / M−
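For intuition, the face-value Bayes factor can be approximated without the noncentral t at all: under H0 the t statistic is roughly N(0, 1), and under H1 with δ ~ Normal(0, 1) it is roughly N(0, 1 + n_eff), since t ≈ δ·√n_eff + noise. This is only a crude stand-in for the exact computation on the slides, and the effective sample sizes below are my guesses:

```python
import math

def bf10_normal_approx(t, n_eff):
    """Crude face-value BF10 using a normal approximation to the t likelihood.
    H0: t ~ N(0, 1).  H1 (delta ~ Normal(0, 1)): t ~ N(0, 1 + n_eff)."""
    def npdf(x, var):
        return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)
    return npdf(t, 1 + n_eff) / npdf(t, 1)

# Same order of magnitude as the illustrative studies above
print(f"t(373) = 4.40: BF10 ~ {bf10_normal_approx(4.40, 374):.0f}")
print(f"t(31)  = 2.27: BF10 ~ {bf10_normal_approx(2.27, 32):.1f}")
```

Even this rough version reproduces the qualitative pattern: a large t with a large sample is overwhelming, while a just-significant t yields only weak evidence.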
  • 69. Calculate mitigated BF •  Start with regular t likelihood function: t_n(x | δ) •  Multiply it by bias function: w = {1, 2, 3, 4} •  Where w = 1 is no bias, w = 2 is extreme bias, etc.
  • 70. Calculate mitigated BF •  Start with regular t likelihood function: t_n(x | δ) •  Multiply it by bias function: w = {1, 2, 3, 4} •  Where w = 1 is no bias, w = 2 is extreme bias, etc. •  E.g., when w = 3 (constant bias): t_n(x | δ) × w(x | θ)
  • 71. Calculate mitigated BF •  Too messy •  Rewrite as a new function
  • 72. Calculate mitigated BF •  Too messy •  Rewrite as a new function •  H1: δ ~ Normal(0, 1): p_w+(x | n, δ, θ) ∝ t_n(x | δ) × w(x | θ)
  • 73. Calculate mitigated BF •  Too messy •  Rewrite as a new function •  H1: δ ~ Normal(0, 1): p_w+(x | n, δ, θ) ∝ t_n(x | δ) × w(x | θ) •  H0: δ = 0: p_w−(x | n, θ) = p_w+(x | n, δ = 0, θ)
  • 74. Calculate mitigated BF •  Integrate w.r.t. p(θ) and p(δ) •  “What is the average likelihood of the data given each bias model?” •  p(data | bias model w):
  • 75. Calculate mitigated BF •  Integrate w.r.t. p(θ) and p(δ) •  “What is the average likelihood of the data given each bias model?” •  p(data | bias model w): M_w+ = ∫_Δ ∫_Θ p_w+(x | n, δ, θ) p(δ) p(θ) dδ dθ
  • 76. Calculate mitigated BF •  Integrate w.r.t. p(θ) and p(δ) •  “What is the average likelihood of the data given each bias model?” •  p(data | bias model w): M_w+ = ∫_Δ ∫_Θ p_w+(x | n, δ, θ) p(δ) p(θ) dδ dθ •  M_w− = ∫_Θ p_w−(x | n, θ) p(θ) dθ
  • 77. Calculate mitigated BF •  Take M_w+ and M_w− and multiply by the weights of the corresponding bias model, then sum within each hypothesis
  • 78. Calculate mitigated BF •  Take M_w+ and M_w− and multiply by the weights of the corresponding bias model, then sum within each hypothesis •  For H1: p(w=1)M1+ + p(w=2)M2+ + p(w=3)M3+ + p(w=4)M4+
  • 79. Calculate mitigated BF •  Take M_w+ and M_w− and multiply by the weights of the corresponding bias model, then sum within each hypothesis •  For H1: p(w=1)M1+ + p(w=2)M2+ + p(w=3)M3+ + p(w=4)M4+ •  For H0: p(w=1)M1− + p(w=2)M2− + p(w=3)M3− + p(w=4)M4−
  • 80. Calculate mitigated BF •  Messy, so we restate as sums:
  • 81. Calculate mitigated BF •  Messy, so we restate as sums: •  For H1: Σ_w p(w) M_w+
  • 82. Calculate mitigated BF •  Messy, so we restate as sums: •  For H1: Σ_w p(w) M_w+ •  For H0: Σ_w p(w) M_w−
  • 83. Calculate mitigated BF •  Messy, so we restate as sums: •  BF10 = p(data | H1) / p(data | H0) = Σ_w p(w) M_w+ / Σ_w p(w) M_w−
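The model-averaged Bayes factor in this last formula is just a weighted sum in the numerator and denominator. A sketch with hypothetical marginal likelihoods and uniform model weights (all numbers made up):

```python
def averaged_bf10(m_plus, m_minus, weights):
    """Bayesian-model-averaged BF10: sum_w p(w)*M_w+ over sum_w p(w)*M_w-."""
    numerator = sum(p * m for p, m in zip(weights, m_plus))
    denominator = sum(p * m for p, m in zip(weights, m_minus))
    return numerator / denominator

# Hypothetical marginal likelihoods under the four bias models (w = 1..4)
m_plus = [0.060, 0.090, 0.080, 0.085]   # p(data | H1, bias model w)
m_minus = [0.030, 0.010, 0.020, 0.015]  # p(data | H0, bias model w)
weights = [0.25, 0.25, 0.25, 0.25]      # uniform prior over bias models

print(round(averaged_bf10(m_plus, m_minus, weights), 2))  # 4.2
```

Because the bias models that downweight significant results shrink M_w+ more than M_w−, a result consistent with the significance filter ends up with a smaller averaged BF, which is exactly the "raise the evidence bar" penalty described earlier.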