LSE Research Seminar PH500
Current Controversies in Phil Stat or
The Statistics Wars (and their casualties)
Deborah G. Mayo
Virginia Tech
The Statistics Wars
• Role of Probability: performance or
probabilism?
(Frequentist vs. Bayesian)
• End of foundations? (Unifications and
Eclecticism)
• Long-standing battles still simmer below the
surface (agreement on numbers)
Let’s brush the dust off the pivotal debates,
walk into the museums to hear the founders:
Fisher, Neyman, Pearson, Savage and
many others in relation to today’s statistical
crisis in science (xi)
Stat Sci/Phil Sci
Statistical inference as severe
testing
• Main source of the statistical crisis in science?
• We set sail with a simple tool: if little or nothing
has been done to rule out flaws in inferring a
claim, it has not passed a severe test
• Sufficiently general to apply to any methods
now in use
• Excavation tool: You needn’t accept this philosophy
to use it to get beyond today’s statistical wars and
appraise reforms
Statistical reforms
• Several are welcome: preregistration, avoidance of
cookbook statistics, calls for more replication
research
• Others are quite radical, and even violate our minimal
principle of evidence
The statistics wars have had serious casualties: self-
defeating “reforms” and unthinking bandwagon effects
Chutzpah, No Proselytizing
“You will need to critically evaluate …brilliant
leaders, high priests, maybe even royalty. Are
they asking the most unbiased questions in
examining methods, or are they like admen
touting their brand, dragging out howlers to
make their favorite method look good? (I am not
sparing any of the statistical tribes here.)” (p. 12)
• Statistics wars as proxy wars
A philosophical excursion
“I will not be proselytizing for a given statistical
school…they all have shortcomings, insofar as one
can even glean a clear statement of a given
‘school’” (p.12)
“Taking the severity principle, and the aim that we
desire to find things out… let’s set sail on a
philosophical excursion to illuminate statistical
inference.”
Rejected cruise ship
Excursion 1: How to Tell What’s
True About Statistical Inference
Tour I: Beyond Probabilism and
Performance
Most often used tools are most
criticized
“Several methodologists have pointed out that the
high rate of nonreplication of research discoveries is
a consequence of the convenient, yet ill-founded
strategy of claiming conclusive research findings
solely on the basis of a single study assessed by
formal statistical significance, typically for a p-value
less than 0.05. …” (Ioannidis 2005, 696)
Do researchers do that?
R.A. Fisher
“[W]e need, not an isolated record, but a reliable
method of procedure. In relation to the test of
significance, we may say that a phenomenon is
experimentally demonstrable when we know how to
conduct an experiment which will rarely fail to give
us a statistically significant result.” (Fisher 1947, 14)
Simple significance tests (Fisher)
p-value. …to test the conformity of the particular data
under analysis with H0 in some respect:
…we find a function d(X) of the data, the test statistic,
such that
• the larger the value of d(X) the more inconsistent
are the data with H0;
• d(X) has a known probability distribution when H0 is
true.
…the p-value corresponding to any d(x) (or dobs):
p = p(t) = Pr(d(X) ≥ d(x); H0)
(Mayo and Cox 2006, 81; d for t, x for y)
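The tail-area definition above can be restated as a minimal sketch (not from the slides), assuming that under H0 the test statistic d(X) is standard normal:

```python
from math import erf, sqrt

def p_value(d_obs: float) -> float:
    """One-sided p-value p = Pr(d(X) >= d_obs; H0), assuming that
    under H0 the test statistic d(X) ~ N(0, 1)."""
    # Upper-tail area of the standard normal beyond the observed d_obs.
    return 1 - 0.5 * (1 + erf(d_obs / sqrt(2)))

# The larger d_obs, the more inconsistent the data with H0 (smaller p).
print(round(p_value(1.96), 3))  # 0.025
print(p_value(0.5) > p_value(1.96))  # True
```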
Testing Reasoning
• If even larger differences than dobs occur fairly
frequently under H0 (i.e., P-value is not small),
there’s scarcely evidence of incompatibility
with H0
• Small P-value indicates some underlying
discrepancy from H0 because very probably
you would have seen a less impressive
difference than dobs were H0 true.
• This still isn’t evidence of a genuine statistical
effect H1, let alone a scientific conclusion H*
Stat-Sub fallacy: H ⇒ H*
Don’t let the tail wag the dog
“It would be a mistake to allow the tail to wag the
dog by being overly influenced by flawed statistical
inferences” (Cook et al. 2019)
In response to the suggestion that the concept of
statistical significance be dropped (Amrhein et al. 2019)
Fallacy of rejection
• H* makes claims that haven’t been probed by the
statistical test
• The moves from experimental interventions to H*
don’t get enough attention–but your statistical
account should block them
Neyman-Pearson (N-P) tests:
A null and an alternative hypothesis, H0 and H1,
that are exhaustive*
H0: μ ≤ 0 vs. H1: μ > 0
“no effect” vs. “some positive effect”
• So this fallacy of rejection (H1 ⇒ H*) is blocked
• Rejecting H0 only indicates statistical alternatives
H1 (how discrepant from the null)
*(introduces the Type II error)
Casualties from (Pathological)
Fisher-N-P Battles
• Neyman Pearson (N-P) give a rationale for
Fisherian tests
• All was hunky-dory until Fisher and Neyman
began feuding (from 1935), largely due to
professional and personality disputes
• A huge philosophical difference is read into
their in-fighting
Get beyond the inconsistent hybrid
charge
• Major casualties from the “inconsistent hybrid”:
Fisher–inferential; N-P–long run performance
• Fisherians can’t use power
• N-P testers must adhere to fixed error
probabilities (P <); can’t report P-values (P =)
Erich Lehmann (Neyman’s student)
“It is good practice to determine … the smallest
significance level … at which the hypothesis would
be rejected for the given observation. … the P-value
gives an idea of how strongly the data contradict
the hypothesis [and] enables others to reach a
verdict based on the significance level of their
choice.”
(Lehmann and Romano 2005, 63-4)
Error Statistics
• Fisher and N-P both fall under tools for “appraising
and bounding the probabilities (under respective
hypotheses) of seriously misleading interpretations
of data” (Birnbaum 1970, 1033)–error probabilities
• I place all under the rubric of error statistics
• Confidence intervals, N-P and Fisherian tests,
resampling, randomization
Both Fisher & N-P: it’s easy to lie
with biasing selection effects
• Sufficient finagling—cherry-picking, significance
seeking, multiple testing, post-data subgroups,
trying and trying again—may practically
guarantee a preferred claim H gets support,
even if it’s unwarranted by evidence
• Violates severity
Severity Requirement:
If the test had little or no capability of finding
flaws with H (even if H is incorrect), then
agreement between data x0 and H provides
poor (or no) evidence for H
• Such a test fails a minimal requirement for
a stringent or severe test
• N-P and Fisher did not put it in these terms
but our severe tester does
Need to Reformulate Tests
Severity function: SEV(Test T, data x, claim C)
• Tests are reformulated in terms of a discrepancy γ
from H0
• Instead of a binary cut-off (significant or not) the
particular outcome is used to infer discrepancies
that are or are not warranted
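A hedged sketch of how such a discrepancy assessment might be computed, assuming a one-sided normal test of H0: μ ≤ 0 with known standard error (the function name and numbers below are illustrative, not the book's notation): severity for the claim μ > γ, given an observed mean, is the probability of a less impressive result were μ = γ.

```python
from math import erf, sqrt

def Phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def severity_mu_greater(xbar: float, gamma: float, se: float) -> float:
    """SEV for the claim mu > gamma after observing sample mean xbar
    in a one-sided normal test of H0: mu <= 0 with standard error se:
    Pr(a less impressive result; mu = gamma)."""
    return Phi((xbar - gamma) / se)

# Observed mean 2 SE above 0: "mu > 0" passes severely; "mu > 2 SE" does not.
print(round(severity_mu_greater(2.0, 0.0, 1.0), 3))  # 0.977
print(round(severity_mu_greater(2.0, 2.0, 1.0), 3))  # 0.5
```

The same outcome thus warrants small discrepancies from the null while failing to warrant larger ones, rather than issuing a binary reject/accept verdict.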
Requires a third role for probability
Probabilism. To assign a degree of probability,
confirmation, support or belief in a hypothesis, given
data x0 (absolute or comparative)
(e.g., Bayesian, likelihoodist, Fisher (at times))
Performance. Ensure long-run reliability of methods,
coverage probabilities (frequentist, behavioristic
Neyman-Pearson, Fisher (at times))
Only probabilism is thought to be inferential or
evidential
What happened to using
probability to assess error probing
capacity?
• Neither “probabilism” nor “performance” directly
captures it
• Good long-run performance is a necessary, not
a sufficient, condition for severity
Key to solving a key problem for
frequentists
• Why is good performance relevant for
inference in the case at hand?
• What bothers you with selective reporting,
cherry picking, stopping when the data look
good, P-hacking
• Not problems about long-runs—
We cannot say the case at hand has done a
good job of avoiding the sources of
misinterpreting data
A claim C is not warranted _______
• Probabilism: unless C is true or probable (gets
a probability boost, made comparatively firmer)
• Performance: unless it stems from a method
with low long-run error
• Probativism (severe testing) unless something
(a fair amount) has been done to probe ways we
can be wrong about C
Rogues Gallery of informal
Examples of BENT
Bad Evidence No Test (BENT): Texas marksman
(Having drawn a bull’s eye around tightly clustered
shots, claims marksman ability) (p. 19)
A severe test: My weight
Informal example: To test if I’ve gained weight
between the time I left for London and my return, I
use a series of well-calibrated and stable scales,
both before leaving and upon my return.
All show an over 4 lb gain; none shows a difference
when weighing a known stable object (my copy of
EGEK). I infer:
H: I’ve gained at least 4 pounds
• Properties of the scales are akin to the properties of
statistical tests (performance).
• No one claims the justification is merely long-run
performance, telling us nothing about my weight.
• We infer something about the source of the
readings from the high capability to reveal if any
scales were wrong
• “lift-off”: an overall inference is more reliable than
the premises
The severe tester is assumed to be in a
context of wanting to find things out
• I could insist all the scales are wrong—though they
work fine weighing known objects—but this would
prevent correctly finding out about my weight…
• What sort of extraordinary circumstance could
cause them all to go astray just when we don’t
know the weight of the test object?
• So we have weak and strong severity (p. 23)
Jump to Tour II: Error Probing Tools
vs. Logics of Evidence (p. 30)
To understand the stat wars, start with the holy
grail–a purely formal (syntactical) logic of
evidence
It should be like deductive logic but with
probabilities
Engine behind probabilisms (e.g., Carnapian
confirmation theories, likelihood accounts,
Bayesian posteriors)
“According to modern logical empiricist orthodoxy,
in deciding whether hypothesis h is confirmed by
evidence e, . . . we must consider only the
statements h and e, and the logical relations
[C(h,e)] between them. It is quite irrelevant
whether e was known first and h proposed to
explain it, or whether e resulted from testing
predictions drawn from h.”
(the Popperian Alan Musgrave 1974, p. 2)
Battles about roles of probability
trace to philosophies of inference
Likelihood Principle (LP)
In logics of induction, like probabilist accounts (as
I’m using the term) the import of the data is via the
ratios of likelihoods of hypotheses
Pr(x0;H0)/Pr(x0;H1)
The data x0 are fixed, while the hypotheses vary
A pivotal disagreement in the philosophy of
statistics battles
Comparative logic of support
• Ian Hacking (1965) “Law of Likelihood”:
data x support hypothesis H0 less well than H1 if
Pr(x;H0) < Pr(x;H1)
(Hacking rejects it in 1980)
• Any hypothesis that perfectly fits the data is
maximally likely (even if data-dredged)
• “there always is such a rival hypothesis viz., that
things just had to turn out the way they actually
did” (Barnard 1972, 129).
Royall’s trick deck
“According to the [LL], the hypothesis that the
deck consists of 52 aces of diamonds is better
supported than the hypothesis that the deck is
normal [by the factor 52]” (p. 38)
“even if the deck is normal we will always claim to
have found strong evidence that it is not”
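Royall's factor of 52 is one line of arithmetic; the sketch below just restates the slide's numbers in code (nothing beyond them is assumed):

```python
# Observation x: the single card drawn is the ace of diamonds.
pr_x_given_trick = 1.0      # a deck of 52 aces of diamonds always yields it
pr_x_given_normal = 1 / 52  # a normal deck yields it once in 52 draws

# Law of Likelihood: x supports "trick deck" over "normal" by this factor.
likelihood_ratio = pr_x_given_trick / pr_x_given_normal
print(likelihood_ratio)  # 52.0
```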
Error Probability:
• Pr(H0 is less well supported than H1; H0) is high
for some H1 or other
“In order to fix a limit between ‘small’ and ‘large’
values of [the likelihood ratio] we must know how
often such values appear when we deal with a
true hypothesis.” (Pearson and Neyman 1967,
106)
On the LP, error probabilities
appeal to something irrelevant
“Sampling distributions, significance levels,
power, all depend on something more [than
the likelihood function]–something that is
irrelevant in Bayesian inference–namely the
sample space”
(Lindley 1971, 436)
Optional Stopping:
Error probing capacities are altered not just by
data dredging, but also via data dependent
stopping rules:
H0: no effect vs. H1: some effect
Xi ~ N(μ, σ²), 2-sided H0: μ = 0 vs. H1: μ ≠ 0.
Instead of fixing the sample size n in advance, in
some tests, n is determined by a stopping rule:
• Keep sampling until H0 is rejected at the
(“nominal”) 0.05 level:
keep sampling until the sample mean |M| > 1.96 SE
• Trying and trying again: having failed to rack up
a 1.96 SE difference after 10 trials, go to 20, 30,
and so on until obtaining a 1.96 SE difference
SE = s/√n
Nominal vs. Actual
significance levels:
• With n fixed the Type 1 error probability is 0.05
• With this stopping rule the actual significance
level differs from, and will be greater than 0.05
(proper stopping rule)
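A minimal simulation makes the gap vivid; it is my illustrative sketch, with assumed checkpoints every 10 observations and a 100-observation cap, not an example from the slides. H0 is true throughout, yet rejections pile up far beyond 5%:

```python
import random

def rejects_somewhere(max_n: int, rng: random.Random) -> bool:
    """Sample from N(0, 1) -- so H0 (mu = 0) is true -- and test
    |M| > 1.96 SE at n = 10, 20, ..., max_n, stopping at first rejection."""
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        # Check the "nominal" 0.05-level rejection rule at each look.
        if n % 10 == 0 and abs(total / n) > 1.96 / n ** 0.5:
            return True
    return False

rng = random.Random(1)
trials = 2000
actual = sum(rejects_somewhere(100, rng) for _ in range(trials)) / trials
print(actual)  # well above the nominal 0.05
```

With more looks (no cap), the rejection probability approaches 1, as the Edwards, Lindman, and Savage quote below notes.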
Optional Stopping
• “if an experimenter uses this [optional stopping]
procedure, then with probability 1 he will
eventually reject any sharp null hypothesis,
even though it be true”
(Edwards, Lindman, and Savage 1963, 239)
• Understandably, they observe, the significance
tester frowns on this, or at least requires
adjustment of the P-values
• “Imagine instead if an account advertised itself as
ignoring stopping rules” (43)
• “[the] irrelevance of stopping rules to statistical
inference restores a simplicity and freedom to
experimental design that had been lost by
classical emphasis on significance levels (in the
sense of Neyman and Pearson).” (Edwards,
Lindman, and Savage 1963, 239)
• “…these same authors who warn that to ignore
stopping rules is to guarantee rejecting the null
hypothesis even if it’s true” (43) declare the
stopping rule irrelevant.
What counts as cheating depends
on statistical philosophy
• Are they contradicting themselves?
• “No. It is just that what looks to be, and
indeed is, cheating from the significance
testing perspective is not cheating from
[their] Bayesian perspective.” (43)
21 Word Solution
• Replication researchers (re)discovered that data-
dependent hypotheses and stopping are a major
source of spurious significance levels.
• Statistical critics, Simmons, Nelson, and
Simonsohn (2011) place at the top of their list:
“Authors must decide the rule for terminating data
collection before data collection begins and report
this rule in the articles” (Simmons, Nelson, and
Simonsohn 2011, 1362).
Berger and Wolpert: LP
• “It seems very strange that a frequentist could not
analyze a given set of data… if the stopping rule is not
given…. Data should be able to speak for itself.”
(Berger and Wolpert 1988, 78)
• “A significance test inference, therefore, depends not
only on the outcome that a trial produced, but also on
the outcomes that it could have produced but did not”
(Howson and Urbach 1993, 212; SIST p. 49)
• “intentions” should be irrelevant
• “How Stopping Rules Drop Out” (ibid., 45) is derived
Can probabilists still block intuitively
unwarranted inferences
(without error probabilities)?
• Supplement with subjective beliefs: What do I
believe? As opposed to What is the evidence?
(Royall)
• Likelihoodists + prior probabilities
Richard Royall
1. What do I believe, given x
2. What should I do, given x
3. How should I interpret this observation x as
evidence? (comparing 2 hypotheses)
For #1–degrees of belief, Bayesian posteriors
For #2–frequentist performance
For #3–LL (p. 33)
Don’t confuse evidence and belief (in the case of
the trick deck), p. 38
Bayes Rule
Pr(H|x) = Pr(x|H)Pr(H) / [Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H)]
(p. 24)
• Requires an exhaustive set of hypotheses
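The rule on this slide, restated as a short sketch (the prior and likelihoods below are made-up illustrative numbers):

```python
def posterior(pr_h: float, pr_x_given_h: float, pr_x_given_not_h: float) -> float:
    """Bayes' rule for the exhaustive pair {H, ~H}:
    Pr(H|x) = Pr(x|H)Pr(H) / [Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H)]."""
    numerator = pr_x_given_h * pr_h
    return numerator / (numerator + pr_x_given_not_h * (1 - pr_h))

# Illustrative numbers: even-odds prior, data 4 times likelier under H.
print(round(posterior(0.5, 0.8, 0.2), 3))  # 0.8
```

The denominator is only a probability when {H, ~H} exhaust the possibilities, which is why the exhaustiveness requirement matters.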
Problems with appealing to priors to
block inferences based on
selection effects
• Could work in some cases, but still wouldn’t show
what researchers had done wrong—battle of
beliefs
• The believability of data-dredged hypotheses is
what makes them so seductive
• Priors are an additional source of flexibility, on
top of biasing selection effects
No help with the severe tester’s
key problem
• How to distinguish the warrant for a single
hypothesis H with different methods
(e.g., one has biasing selection effects,
another, pre-registered results and
precautions)?
• Since there’s a single H, its prior would be the
same
Casualties: Criticisms of data-
dredgers lose force
If a Bayesian critic turns to giving H0 (no effect) a
high prior probability to mount a criticism, the data-
dredger can deflect this by saying “you can always
counter any effect this way”
Current State of Play in
Bayesian-Frequentist Wars
1.3 View from a Hot-Air Balloon (p. 23)
How can a discipline, central to science and to critical
thinking, have two methodologies, two logics, two
approaches that frequently give substantively different
answers to the same problems? Is complacency in the
face of contradiction acceptable for a central discipline
of science? (Donald Fraser 2011, p. 329)
Bayesian-Frequentist Wars
The Goal of Frequentist Inference: Construct
procedures with frequentist guarantees
good long-run performance
The Goal of Bayesian Inference: Quantify and
manipulate your degrees of beliefs
belief probabilism
Larry Wasserman (p. 24)
But now we have marriages and reconciliations
(pp. 25-8)
Most Bayesians (last decade) use
“default” priors: unification (pp 25-8)
• ‘Eliciting’ subjective priors is too difficult; scientists are
reluctant for subjective beliefs to overshadow the data
“[V]irtually never would different experts give prior
distributions that even overlapped” (J. Berger 2006,
392)
• Default priors are supposed to prevent prior beliefs
from influencing the posteriors–data dominant
Marriages of Convenience
Why?
For subjective Bayesians, to be less
subjective
For frequentists, to have an inferential or
epistemic interpretation of error probabilities
How should we interpret them?
• “The priors are not to be considered expressions
of uncertainty, ignorance, or degree of belief.
Conventional priors may not even be
probabilities…” (Cox and Mayo 2010, 299)
• No agreement on rival systems for default/non-
subjective priors
Unificationist Bayesians
Contemporary nonsubjective Bayesians concede they
“have to live with some violations of the likelihood and
stopping rule principles” (Ghosh et al. 2006, 148), since
their prior probability distributions are influenced by the
sampling distribution.
Is it because ignoring stopping rules can wreak havoc
with the well-testedness of inferences?
If that is their aim too, then that is very welcome. Stay
tuned (SIST p. 51)
Some Bayesians reject probabilism
(Falsificationist Bayesians)
• “[C]rucial parts of Bayesian data analysis, such
as model checking, can be understood as ‘error
probes’ in Mayo’s sense” which might be seen as
using modern statistics to implement the
Popperian criteria of severe tests.
(Andrew Gelman and Cosma Shalizi 2013, 10).
Decoupling
Break off stat methods from their traditional
philosophies
Can Bayesian methods find a new foundation in
error statistical ideas? (p. 27)
Excursion 6: (probabilist) foundations lost;
(probative) foundations found
Sum-up
To get beyond today’s statistics wars, we need to
understand the jumble of philosophical, statistical,
historical and other debates
These are largely hidden in today’s debates & the
reforms put forward for restoring integrity
There are age-old debates: the significance test
controversy, Bayes–frequentist battles
Newer debates: reconciliations among (default vs.
subjective vs. empirical) Bayesians
To evaluate the consequences of reforms, we need
to excavate the jungle
• As impossible as it seems, I set out to do this
• I begin with a simple tool, underwritten by today’s
handwringing: the minimal requirement for evidence
(evidence for C only if it has been subjected to and passes a
test it probably would have failed if false)
• Biasing selection effects make it easy to find impressive-
looking effects erroneously
• They alter a method’s error probing capacities
• They may not alter evidence (in traditional probabilisms):
Likelihood Principle
• Members of different tribes talk past each other
• To the LP holder: worry about what could have happened
but didn’t is to consider “imaginary data” and “intentions”
• To the severe tester, probabilists are robbed of
a main way to block spurious results
• Constructive role of replication crisis:
Biasing selection effects impinge on error
probabilities
Error probabilities impinge on well-testedness
• It directs the reinterpretation of significance tests
and other methods
• Probabilists may block inferences without appeal
to error probabilities: high prior to H0 (no effect)
can result in a high posterior probability to H0
• Gives a life-raft to the P-hacker and cherry picker;
puts blame in the wrong place
• Severe probing (formal or informal) must take place at
every level: from data to statistical hypothesis; from
there to substantive claims
• A silver lining to distinguishing highly probable and
highly probed–can use different methods for different
contexts
• Some Bayesian tribes may find their foundations in
error statistics
• Last excursion: (probabilist) foundations lost;
(probative) foundations found
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
 
“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learning
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualties
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)
 
Hypothesis....Phd in Management, HR, HRM, HRD, Management
Hypothesis....Phd in Management, HR, HRM, HRD, ManagementHypothesis....Phd in Management, HR, HRM, HRD, Management
Hypothesis....Phd in Management, HR, HRM, HRD, Management
 
Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error.
 
8. Hypothesis Testing.ppt
8. Hypothesis Testing.ppt8. Hypothesis Testing.ppt
8. Hypothesis Testing.ppt
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively
 
Hypothesis 151221131534
Hypothesis 151221131534Hypothesis 151221131534
Hypothesis 151221131534
 
Hypothesis
HypothesisHypothesis
Hypothesis
 
Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2
 

More from jemille6

D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...jemille6
 
D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides jemille6
 

More from jemille6 (20)

D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...
 
D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides
 

Recently uploaded

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 

Recently uploaded (20)

Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 

D.G. Mayo Slides LSE PH500 Meeting #1

  • 9. A philosophical excursion “I will not be proselytizing for a given statistical school…they all have shortcomings, insofar as one can even glean a clear statement of a given ‘school’” (p. 12). Taking the severity principle, and the aim that we desire to find things out, let’s set sail on a philosophical excursion to illuminate statistical inference. 9
  • 11. Excursion 1: How to Tell What’s True About Statistical Inference. Tour I: Beyond Probabilism and Performance 11
  • 12. Most often used tools are most criticized “Several methodologists have pointed out that the high rate of nonreplication of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. …” (Ioannidis 2005, 696) Do researchers do that? 12
  • 13. R.A. Fisher “[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher 1947, 14) 13
  • 14. Simple significance tests (Fisher) p-value. …to test the conformity of the particular data under analysis with H0 in some respect: …we find a function d(X) of the data, the test statistic, such that • the larger the value of d(X) the more inconsistent are the data with H0; • d(X) has a known probability distribution when H0 is true. …the p-value corresponding to any d(x) (or d_obs): p = p(t) = Pr(d(X) ≥ d(x); H0) (Mayo and Cox 2006, 81; d for t, x for y) 14
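The p-value definition on this slide can be sketched in code. A minimal illustration, assuming the test statistic d(X) follows a standard normal distribution under H0 (that distributional choice, and the function name, are illustrative assumptions, not from the slides):

```python
import math

def one_sided_p_value(d_obs: float) -> float:
    """p = Pr(d(X) >= d_obs; H0) for a test statistic that is N(0, 1)
    under H0: the probability, computed under the null, of a difference
    at least as large as the one observed."""
    # Survival function of the standard normal via the complementary
    # error function: Pr(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(d_obs / math.sqrt(2))

# A standardized difference of 1.96 yields p ≈ 0.025; larger d_obs
# means data more inconsistent with H0, hence a smaller p-value.
print(round(one_sided_p_value(1.96), 3))  # 0.025
```

Note that the computation is entirely under H0, matching the slide's point that d(X) must have a known distribution when H0 is true.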
  • 15. Testing Reasoning • If even larger differences than d_obs occur fairly frequently under H0 (i.e., the P-value is not small), there’s scarcely evidence of incompatibility with H0 • A small P-value indicates some underlying discrepancy from H0, because very probably you would have seen a less impressive difference than d_obs were H0 true. • This still isn’t evidence of a genuine statistical effect H1, let alone a scientific conclusion H*. Stat-Sub fallacy: H => H* 15
  • 16. Don’t let the tail wag the dog: “It would be a mistake to allow the tail to wag the dog by being overly influenced by flawed statistical inferences” (Cook et al. 2019), in response to the suggestion that the concept of statistical significance be dropped (Amrhein et al. 2019) 16
  • 17. Fallacy of rejection • H* makes claims that haven’t been probed by the statistical test • The moves from experimental interventions to H* don’t get enough attention, but your statistical account should block them 17
  • 18. Neyman-Pearson (N-P) tests: Null and alternative hypotheses H0, H1 that are exhaustive* H0: μ ≤ 0 vs. H1: μ > 0 “no effect” vs. “some positive effect” • So this fallacy of rejection (H1 => H*) is blocked • Rejecting H0 only indicates statistical alternatives H1 (how discrepant from the null) *(introduces the Type 2 error) 18
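The explicit alternative in an N-P test is what introduces the Type 2 error, and with it the notion of power. A hedged sketch for the one-sided normal test above, with known σ (the α cutoff of 1.645 and the numeric choices are illustrative assumptions):

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def power(mu1: float, sigma: float, n: int, z_alpha: float = 1.645) -> float:
    """Power of the one-sided test of H0: mu <= 0 vs H1: mu > 0 against
    the point alternative mu = mu1; the Type 2 error is 1 - power."""
    se = sigma / math.sqrt(n)
    # Reject H0 when the sample mean exceeds z_alpha * se
    # (alpha ≈ 0.05 for z_alpha = 1.645).
    return 1 - Phi(z_alpha - mu1 / se)

# Against mu1 = 0.3 with sigma = 1, n = 100, power ≈ 0.91,
# so the Type 2 error probability is about 0.09.
print(round(power(0.3, 1.0, 100), 2))  # 0.91
```

At mu1 = 0 the power reduces to α itself, which is one way to see that error probabilities of both types are fixed by the test specification, not by the data.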
  • 19. Casualties from (Pathological) Fisher–N-P Battles • Neyman and Pearson (N-P) give a rationale for Fisherian tests • All was hunky dory until Fisher and Neyman began feuding (from 1935), largely due to professional and personality disputes • A huge philosophical difference is read into their in-fighting 19
  • 20. Get beyond the inconsistent hybrid charge • Major casualties from the “inconsistent hybrid”: Fisher–inferential; N-P–long run performance • Fisherians can’t use power • N-P testers must adhere to fixed error probabilities (P <); can’t report P-values (P =) 20
  • 21. Erich Lehmann (Neyman’s student): “It is good practice to determine … the smallest significance level…at which the hypothesis would be rejected for the given observation. …the P-value gives an idea of how strongly the data contradict the hypothesis [and] enables others to reach a verdict based on the significance level of their choice.” (Lehmann and Romano 2005, 63-4) 21
  • 22. Error Statistics • Fisher and N-P both fall under tools for “appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data” (Birnbaum 1970, 1033)–error probabilities • I place all under the rubric of error statistics • Confidence intervals, N-P and Fisherian tests, resampling, randomization 22
  • 23. Both Fisher & N-P: it’s easy to lie with biasing selection effects • Sufficient finagling—cherry-picking, significance seeking, multiple testing, post-data subgroups, trying and trying again—may practically guarantee a preferred claim H gets support, even if it’s unwarranted by evidence • Violates severity 23
  • 24. Severity Requirement: If the test had little or no capability of finding flaws with H (even if H is incorrect), then agreement between data x0 and H provides poor (or no) evidence for H • Such a test fails a minimal requirement for a stringent or severe test • N-P and Fisher did not put it in these terms but our severe tester does 24
  • 25. Need to Reformulate Tests Severity function: SEV(Test T, data x, claim C) • Tests are reformulated in terms of a discrepancy γ from H0 • Instead of a binary cut-off (significant or not) the particular outcome is used to infer discrepancies that are or are not warranted 25
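The severity function SEV(Test T, data x, claim C) on this slide can be given a concrete shape for the normal-mean test used earlier. A minimal sketch, assuming the one-sided test of H0: μ ≤ 0 with known σ and a claim C: μ > γ (the formula Pr(X̄ ≤ x̄_obs; μ = γ) is the standard severity computation for this setup, but the function names and numbers are illustrative):

```python
import math

def Phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def severity(x_bar: float, gamma: float, sigma: float, n: int) -> float:
    """SEV for the claim C: mu > gamma, given observed mean x_bar in a
    one-sided normal test of H0: mu <= 0 with known sigma.

    SEV(C) = Pr(X_bar <= x_bar; mu = gamma): the probability the test
    would have produced a *less* impressive result were C false, i.e.
    were mu at the boundary value gamma."""
    se = sigma / math.sqrt(n)
    return Phi((x_bar - gamma) / se)

# With x_bar = 0.4, sigma = 1, n = 100 (se = 0.1): the data warrant
# mu > 0.2 with high severity, but mu > 0.4 with severity only 0.5.
print(round(severity(0.4, 0.2, 1.0, 100), 3))  # 0.977
print(round(severity(0.4, 0.4, 1.0, 100), 3))  # 0.5
```

This illustrates the slide's point: instead of a binary significant/not verdict, the particular outcome licenses some discrepancies γ and not others.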
  • 26. Requires a third role for probability Probabilism. To assign a degree of probability, confirmation, support or belief in a hypothesis, given data x0 (absolute or comparative) (e.g., Bayesian, likelihoodist, Fisher (at times)) Performance. Ensure long-run reliability of methods, coverage probabilities (frequentist, behavioristic Neyman-Pearson, Fisher (at times)) Only probabilism is thought to be inferential or evidential 26
  • 27. What happened to using probability to assess error probing capacity? • Neither “probabilism” nor “performance” directly captures it • Good long-run performance is a necessary, not a sufficient, condition for severity 27
  • 28. Key to solving a key problem for frequentists • Why is good performance relevant for inference in the case at hand? • What bothers you with selective reporting, cherry picking, stopping when the data look good, P-hacking • Not problems about long-runs— 28
  • 29. We cannot say the case at hand has done a good job of avoiding the sources of misinterpreting data
  • 30. A claim C is not warranted _______ • Probabilism: unless C is true or probable (gets a probability boost, made comparatively firmer) • Performance: unless it stems from a method with low long-run error • Probativism (severe testing) unless something (a fair amount) has been done to probe ways we can be wrong about C 30
  • 31. Rogues Gallery of informal examples of BENT: Bad Evidence No Test: Texas marksman (having drawn a bull’s eye around tightly clustered shots, claims marksman ability) (p. 19) 31
  • 32. A severe test: My weight Informal example: To test if I’ve gained weight between the time I left for London and my return, I use a series of well-calibrated and stable scales, both before leaving and upon my return. All show an over 4 lb. gain; none shows a difference when weighing an object of known weight (my book EGEK). I infer: H: I’ve gained at least 4 pounds 32
  • 33. 33 • Properties of the scales are akin to the properties of statistical tests (performance). • No one claims the justification is merely long run, which could say nothing about my weight. • We infer something about the source of the readings from the high capability to reveal if any scales were wrong • “lift-off”: an overall inference is more reliable than the premises
  • 34. 34 The severe tester is assumed to be in a context of wanting to find things out • I could insist all the scales are wrong (though they work fine weighing known objects), but this would prevent correctly finding out about weight… • What sort of extraordinary circumstance could cause them all to go astray just when we don’t know the weight of the test object? • So we have weak and strong severity (p. 23)
  • 35. Jump to Tour II: Error Probing Tools vs. Logics of Evidence (p. 30) To understand the stat wars, start with the holy grail–a purely formal (syntactical) logic of evidence It should be like deductive logic but with probabilities Engine behind probabilisms (e.g., Carnapian confirmation theories, likelihood accounts, Bayesian posteriors) 35
  • 36. 36 “According to modern logical empiricist orthodoxy, in deciding whether hypothesis h is confirmed by evidence e, . . . we must consider only the statements h and e, and the logical relations [C(h,e)] between them. It is quite irrelevant whether e was known first and h proposed to explain it, or whether e resulted from testing predictions drawn from h”. (the Popperian Alan Musgrave 1974, p. 2) Battles about roles of probability trace to philosophies of inference
  • 37. Likelihood Principle (LP) In logics of induction, like probabilist accounts (as I’m using the term) the import of the data is via the ratios of likelihoods of hypotheses Pr(x0;H0)/Pr(x0;H1) The data x0 are fixed, while the hypotheses vary A pivotal disagreement in the philosophy of statistics battles 37
  • 38. Comparative logic of support • Ian Hacking (1965) “Law of Likelihood”: x supports hypothesis H0 less well than H1 if Pr(x;H0) < Pr(x;H1) (Hacking rejected the view in 1980) • Any hypothesis that perfectly fits the data is maximally likely (even if data-dredged) • “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (Barnard 1972, 129). 38
  • 39. Royall’s trick deck “According to the [LL], the hypothesis that the deck consists of 52 aces of diamonds is better supported than the hypothesis that the deck is normal [by the factor 52].” (p. 38) “even if the deck is normal we will always claim to have found strong evidence that it is not”. 39
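Royall’s example can be made concrete in a few lines. The numbers below simply restate the slide’s factor of 52, assuming a single card is drawn:

```python
from fractions import Fraction

# You draw the ace of diamonds from a shuffled deck. Compare two hypotheses:
#   H_normal: a standard 52-card deck      -> Pr(ace of diamonds) = 1/52
#   H_trick:  all 52 cards are that ace    -> Pr(ace of diamonds) = 1
p_normal = Fraction(1, 52)
p_trick = Fraction(1)

# Law of Likelihood: the data favor H_trick over H_normal by the ratio
likelihood_ratio = p_trick / p_normal
print(likelihood_ratio)  # 52
```

Whatever card is drawn, a “trick deck” hypothesis rigged to that very card is maximally likely, which is why comparative likelihood alone cannot block such data-dredged rivals without an error-probability supplement.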
  • 40. Error Probability: • Pr(H0 is less well supported than H1; H0) is high for some H1 or other “In order to fix a limit between ‘small’ and ‘large’ values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis.” (Pearson and Neyman 1967, 106) 40
  • 41. On the LP, error probabilities appeal to something irrelevant “Sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space” (Lindley 1971, 436) 41
  • 42. Optional Stopping: Error probing capacities are altered not just by data dredging, but also via data-dependent stopping rules: Xi ~ N(μ, σ²), two-sided test H0: μ = 0 (no effect) vs. H1: μ ≠ 0 (some effect). Instead of fixing the sample size n in advance, in some tests, n is determined by a stopping rule: 42
  • 43. • Keep sampling until H0 is rejected at the (“nominal”) 0.05 level, i.e., keep sampling until the sample mean |M| > 1.96 SE, where SE = s/√n • Trying and trying again: having failed to rack up a 1.96 SE difference after 10 trials, go to 20, 30 and so on until obtaining a 1.96 SE difference 43
  • 44. Nominal vs. actual significance levels: • With n fixed the Type I error probability is 0.05 • With this stopping rule the actual significance level differs from, and will be greater than, 0.05 (it is a “proper” stopping rule: sampling eventually stops with probability 1) 44
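The gap between nominal and actual levels can be checked with a quick simulation. This is a sketch under assumed details not given on the slides: Normal data with H0 true, peeking after every draw from n = 10 up to a cap of 100, and a 1.96 cutoff using the estimated standard deviation.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

def ever_rejects(max_n, z=1.96):
    """Draw from N(0,1), so H0 (mu = 0) is true. From n = 10 on, after
    each new draw test |mean| > z * SE; return True if we ever reject."""
    xs = []
    for n in range(1, max_n + 1):
        xs.append(random.gauss(0, 1))
        if n >= 10:
            se = statistics.stdev(xs) / n ** 0.5
            if abs(statistics.fmean(xs)) > z * se:
                return True
    return False

trials = 1000
actual = sum(ever_rejects(100) for _ in range(trials)) / trials
print(f"nominal level 0.05, simulated rejection rate under H0: {actual:.2f}")
```

The simulated rate comes out far above the nominal 0.05, and raising the cap pushes it higher still, in line with the Edwards, Lindman, and Savage observation quoted on the next slide that with unlimited trying rejection is guaranteed.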
  • 45. Optional Stopping 45 • “if an experimenter uses this [optional stopping] procedure, then with probability 1 he will eventually reject any sharp null hypothesis, even though it be true” (Edwards, Lindman, and Savage 1963, 239) • Understandably, they observe, the significance tester frowns on this, or at least requires adjustment of the P-values
  • 46. • “Imagine instead if an account advertised itself as ignoring stopping rules” (43) • “[the] irrelevance of stopping rules to statistical inference restores a simplicity and freedom to experimental design that had been lost by classical emphasis on significance levels (in the sense of Neyman and Pearson).” (Edwards, Lindman, and Savage 1963, 239) • “…these same authors who warn that to ignore stopping rules is to guarantee rejecting the null hypothesis even if it’s true” (43) declare it irrelevant. 46
  • 47. What counts as cheating depends on statistical philosophy • Are they contradicting themselves? • “No. It is just that what looks to be, and indeed is, cheating from the significance testing perspective is not cheating from [their] Bayesian perspective.” (43) 47
  • 48. 21 Word Solution • Replication researchers (re)discovered that data-dependent hypotheses and stopping rules are a major source of spurious significance levels. • Statistical critics Simmons, Nelson, and Simonsohn (2011) place it at the top of their list: “Authors must decide the rule for terminating data collection before data collection begins and report this rule in the articles” (Simmons, Nelson, and Simonsohn 2011, 1362). 48
  • 50. Berger and Wolpert: LP • It seems very strange that a frequentist could not analyze a given set of data…if the stopping rule is not given….Data should be able to speak for itself. (Berger and Wolpert 1988, 78) • “A significance test inference, therefore, depends not only on the outcome that a trial produced, but also on the outcomes that it could have produced but did not” (Howson and Urbach 1993, 212; SIST p. 49) • “intentions” should be irrelevant • “How Stopping Rules Drop Out” (ibid., 45) is derived 50
  • 51. Probabilists can still block intuitively unwarranted inferences (without error probabilities)? • Supplement with subjective beliefs: What do I believe? As opposed to What is the evidence? (Royall) • Likelihoodists + prior probabilities 51
  • 52. Richard Royall 1. What do I believe, given x 2. What should I do, given x 3. How should I interpret this observation x as evidence? (comparing 2 hypotheses) For #1–degrees of belief, Bayesian posteriors For #2–frequentist performance For #3–LL (p. 33) Don’t confuse evidence and belief (in the case of the trick deck), p. 38 52
  • 53. Bayes Rule Pr(H|x) = Pr(x|H)Pr(H) / [Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H)] (p. 24) • Requires an exhaustive set of hypotheses 53
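A one-line numeric illustration of the rule, with made-up inputs (the prior 0.2 and likelihoods 0.8 and 0.1 are arbitrary, chosen only to show the mechanics):

```python
# Bayes' Rule with hypothetical numbers:
# prior Pr(H) = 0.2, likelihoods Pr(x|H) = 0.8 and Pr(x|~H) = 0.1.
pr_H, pr_x_given_H, pr_x_given_notH = 0.2, 0.8, 0.1

# Posterior = Pr(x|H)Pr(H) / [Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H)]
posterior = (pr_x_given_H * pr_H) / (
    pr_x_given_H * pr_H + pr_x_given_notH * (1 - pr_H)
)
print(round(posterior, 3))  # 0.667
```

Note that the denominator requires H and ~H to exhaust the possibilities, which is exactly the exhaustiveness requirement on the slide.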
  • 54. Problems with appealing to priors to block inferences based on selection effects • Could work in some cases, but still wouldn’t show what researchers had done wrong—battle of beliefs • The believability of data-dredged hypotheses is what makes them so seductive • Additional source of flexibility, priors and biasing selection effects 54
  • 55. No help with the severe tester’s key problem • How to distinguish the warrant for a single hypothesis H with different methods (e.g., one has biasing selection effects, another, pre-registered results and precautions)? • Since there’s a single H, its prior would be the same 55
  • 56. Casualties: Criticisms of data- dredgers lose force If a Bayesian critic turns to giving H0 (no effect) a high prior probability to mount a criticism, the data- dredger can deflect this saying “you can always counter any effect this way” 56
  • 57. Current State of Play in Bayesian-Frequentist Wars 1.3 View from a Hot-Air Balloon (p. 23) How can a discipline, central to science and to critical thinking, have two methodologies, two logics, two approaches that frequently give substantively different answers to the same problems? Is complacency in the face of contradiction acceptable for a central discipline of science? (Donald Fraser 2011, p. 329) 57
  • 58. Bayesian-Frequentist Wars The Goal of Frequentist Inference: Construct procedures with frequentist guarantees (good long-run performance) The Goal of Bayesian Inference: Quantify and manipulate your degrees of belief (probabilism) Larry Wasserman (p. 24) But now we have marriages and reconciliations (pp. 25-8) 58
  • 59. Most Bayesians (last decade) use “default” priors: unification (pp 25-8) • ‘Eliciting’ subjective priors too difficult, scientists reluctant for subjective beliefs to overshadow data “[V]irtually never would different experts give prior distributions that even overlapped” (J. Berger 2006, 392) • Default priors are supposed to prevent prior beliefs from influencing the posteriors–data dominant 59
  • 60. Marriages of Convenience Why? For subjective Bayesians, to be less subjective For frequentists, to have an inferential or epistemic interpretation of error probabilities 60
  • 61. How should we interpret them? • “The priors are not to be considered expressions of uncertainty, ignorance, or degree of belief. Conventional priors may not even be probabilities…” (Cox and Mayo 2010, 299) • No agreement on rival systems for default/non-subjective priors 61
  • 62. Unificationist Bayesians Contemporary nonsubjective Bayesians concede they “have to live with some violations of the likelihood and stopping rule principles” (Ghosh et al. 2006, 148), since their prior probability distributions are influenced by the sampling distribution. Is it because ignoring stopping rules can wreak havoc with the well-testedness of inferences? If that is their aim too, then that is very welcome. Stay tuned (SIST p. 51) 62
  • 63. Some Bayesians reject probabilism (Falsificationist Bayesians) • “[C]rucial parts of Bayesian data analysis, such as model checking, can be understood as ‘error probes’ in Mayo’s sense” which might be seen as using modern statistics to implement the Popperian criteria of severe tests. (Andrew Gelman and Cosma Shalizi 2013, 10). 63
  • 64. Decoupling Break off stat methods from their traditional philosophies Can Bayesian methods find a new foundation in error statistical ideas? (p. 27) Excursion 6: (probabilist) foundations lost; (probative) foundations found 64
  • 65. Sum-up To get beyond today’s statistics wars, we need to understand the jumble of philosophical, statistical, historical and other debates These are largely hidden in today’s debates & the reforms put forward for restoring integrity There are age-old debates: the significance test controversy, Bayes-Frequentist battles Newer debates: reconciliations among (default vs. subjective vs. empirical) Bayesians To evaluate the consequences of reforms we need to excavate the jungle 65
  • 66. • As impossible as it seems, I set out to do this • I begin with a simple tool, underwritten by today’s handwringing: the minimal requirement for evidence (evidence for C only if it has been subjected to and passes a test it probably would have failed if false) • Biasing selection effects make it easy to find impressive-looking effects erroneously • They alter a method’s error probing capacities • They may not alter evidence (in traditional probabilisms): Likelihood Principle • Members of different tribes talk past each other • To the LP holder, worrying about what could have happened but didn’t is to consider “imaginary data” and “intentions” 66
  • 67. • To the severe tester, probabilists are robbed of a main way to block spurious results • Constructive role of the replication crisis: Biasing selection effects impinge on error probabilities Error probabilities impinge on well-testedness • It directs the reinterpretation of significance tests and other methods • Probabilists may block inferences without appeal to error probabilities: a high prior on H0 (no effect) can result in a high posterior probability on H0 67
  • 68. • Gives a life-raft to the P-hacker and cherry picker; puts blame in the wrong place • Severe probing (formal or informal) must take place at every level: from data to statistical hypothesis; from there to substantive claims • A silver lining to distinguishing highly probable and highly probed–can use different methods for different contexts • Some Bayesian tribes may find their foundations in error statistics • Last excursion: (probabilist) foundations lost; (probative) foundations found 68

Editor's Notes

  1. They are at once ancient and up to the minute. They reflect disagreements on one of the deepest, oldest, philosophical questions: How do humans learn about the world despite threats of error? At the same time, they are the engine behind current controversies surrounding high-profile failures of replication in the social and biological sciences.
  2. Should probability enter to ensure we won’t reach mistaken interpretations of data too often in the long run of experience? Or to capture degrees of belief about claims? (performance or probabilism) The field has been marked by disagreements between competing tribes of frequentists and Bayesians that have been so contentious that everyone wants to believe we are long past them. We now enjoy unifications and reconciliations between rival schools, it will be said, and practitioners are eclectic, prepared to use whatever method “works.” The truth is, long-standing battles still simmer below the surface of today’s debates about scientific integrity and irreproducibility. Reluctance to reopen wounds from old battles has allowed them to fester. The reconciliations and unifications have been revealed to have serious problems, and there’s little agreement on which to use or how to interpret them. As for eclecticism, it’s often not clear what is even meant by “works.” The presumption that all we need is an agreement on numbers–never mind if they’re measuring different things–leads to statistical schizophrenia.
  3. What’s behind the constant drum beat today that science is in crisis? The problem is that high powered methods can make it easy to uncover impressive-looking findings even if they are false: spurious correlations and other errors have not been severely probed. We set sail with a simple tool: If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. In the severe testing view, probability arises in scientific contexts to assess and control how capable methods are at uncovering and avoiding erroneous interpretations of data. That’s what it means to view statistical inference as severe testing. In saying we may view statistical inference as severe testing, I’m not saying statistical inference is always about formal statistical testing. The concept of severe testing is sufficiently general to apply to any of the methods now in use, whether for exploration, estimation, or prediction. Regardless of the type of claim, you don’t have evidence for it if nothing has been done to have found it flawed.
  4. Can likewise ensure that 0 is excluded from the confidence intervals
  5. Problem with unifications; third problem