SlideShare a Scribd company logo
1 of 63
1
PHIL 6334/ ECON 6614: Spring 2019
Current Debates on Statistical Inference and
Modeling
Wed. 4-6:30 Derring 1076
Deborah G. Mayo Aris Spanos
2
What is the Philosophy of
Statistics (PhilStat)?
At one level of analysis, statisticians and
philosophers of science ask many of the same
questions:
What should be observed and what may justifiably
be inferred from the resulting data?
How well do data confirm or fit a model?
What is a good test?
3
Must predictions be “novel” in some sense? (selection
effects, double counting, data mining)
How can spurious relationships be distinguished from
genuine regularities? from causal regularities?
How can we infer more accurate and reliable
observations from less accurate ones?
When does a fitted model account for regularities in
the data?
• These very general questions are entwined with long
standing debates in philosophy of science
• No wonder the field of statistics tends to cross over
so often into philosophical territory.
• Statistics is a kind of “applied philosophy of science”
(Kempthorne, 1976).
4
5
Statistics  Philosophy
3 ways statistical accounts are used in philosophy
of science
(1) Model Scientific Inference—capture the actual or
rational ways to arrive at evidence and inference
(2) Resolve Philosophical Problems about scientific
inference, observation, experiment;
(problem of induction, objectivity, underdetermination).
(3) Perform a Metamethodological Critique
-scrutinize methodological rules (e.g., novel data)
6
Philosophy  Statistics
job is to help resolve the conceptual, logical, and
methodological discomforts of scientists
In tackling the statistics wars one also makes
progress on:
the philosopher's problems of induction, objective
evidence, underdetermination.
“2-way street”
Start from the preface:
1. The Statistics Wars
7
2. Role of Probability: performance
or probabilism?
(Frequentist vs. Bayesian)
• End of foundations? (Unifications and
Eclecticism)
• Long-standing battles still simmer below the
surface (agreement on numbers)
8
Let’s brush the dust off the pivotal debates,
walk into the museums to hear the founders:
Fisher, Neyman, Pearson, Savage and
many others in relation to today’s statistical
crisis in science (xi)
9
Stat Sci/Phil Sci
3. Statistical inference as severe
testing
• Main source of the statistical crisis in science?
• We set sail with a simple tool: if little or nothing
has been done to rule out flaws in inferring a
claim, it has not passed a severe test
• Sufficiently general to apply to any methods
now in use
• You needn’t accept this philosophy to use it to
excavate the statistics wars
10
4. A philosophical excursion
“Taking the severity principle, along with the aim
that we desire to find things out… let’s set sail on a
philosophical excursion to illuminate statistical
inference.” (8)
“…and engage with a host of tribes marked by
family quarrels, peace treaties, and shifting
alliances” (xiv)
11
5. Getting Beyond the Statistics
Wars
Revisit some taboos: problems of induction &
falsification, science vs. pseudoscience
12
Excursion 1 How to Tell What’s
True About Statistical inference
Tour I: Beyond Probabilism and
Performance
13
Most findings are false?
“Several methodologists have pointed out that the
high rate of nonreplication of research discoveries
is a consequence of the convenient, yet ill-
founded strategy of claiming conclusive research
findings solely on the basis of a single study
assessed by formal statistical significance,
typically for a p-value less than 0.05. …
It can be proven that most claimed research
findings are false.” (John Ioannidis 2005, 0696)
14
15
R.A. Fisher
“[W]e need, not an isolated record, but a reliable
method of procedure. In relation to the test of
significance, we may say that a phenomenon is
experimentally demonstrable when we know how to
conduct an experiment which will rarely fail to give
us a statistically significant result.” (Fisher 1947, 14)
16
Simple significance tests (Fisher)
p-value. …to test the conformity of the particular data
under analysis with H0 in some respect:
…we find a function d(X) of the data, the test statistic,
such that
• the larger the value of d(X) the more inconsistent
are the data with H0;
• d(X) has a known probability distribution when H0 is
true.
…the p-value corresponding to any d(x) (or d0bs)
p = p(t) = Pr(d(X) ≥ d(x); H0)
(Mayo and Cox 2006, 81; d for t, x for y) 17
Testing Reasoning
• If even larger differences than d0bs occur fairly
frequently under H0 (i.e., P-value is not small),
there’s scarcely evidence of incompatibility
with H0
• Small P-value indicates some underlying
discrepancy from H0 because very probably
you would have seen a less impressive
difference than d0bs were H0 true.
• This still isn’t evidence of a genuine statistical
effect H1, let alone a scientific conclusion H*
Stat-Sub fallacy H => H*
18
Fallacy of rejection
• H* makes claims that haven’t been probed by the
statistical test
• The moves from experimental interventions to H*
don’t get enough attention–but your statistical
account should block them
19
Neyman-Pearson (N-P) tests:
A null and alternative hypotheses H0, H1
that are exhaustive*
H0: μ ≤ 0 vs. H1: μ > 0
“no effect” vs. “some positive effect”
• So this fallacy of rejection H1H* is blocked
• Rejecting H0 only indicates statistical alternatives
H1 (how discrepant from null)
*(introduces Type 2 error)
20
Not an incompatible hybrid
• They both fall under tools for “appraising and
bounding the probabilities (under respective
hypotheses) of seriously misleading
interpretations of data” (Birnbaum 1970, 1033)–
error probabilities
• Can place all under the rubric of error statistics
• Confidence intervals, N-P and Fisherian tests,
resampling, randomization
• Details of tests do not enter until Excursion 3 21
We must get beyond N-P/ Fisher
incompatibilist caricature
• Beyond “inconsistent hybrid”: Fisher–inferential; N-P–
long run performance
• “Incompatibilism” results in P-value users robbed of
features from N-P tests they need (power)
• Wrongly supposes N-P testers have only fixed error
probabilities & are not allowed to report P-values
22
Both Fisher and N-P methods: it’s
easy to lie with statistics
by selective reporting
• Sufficient finagling—cherry-picking, significance
seeking, multiple testing, post-data subgroups,
trying and trying again—may practically
guarantee a preferred claim H gets support,
even if it’s unwarranted by evidence
23
Severity Requirement:
If the test had little or no capability of finding
flaws with H (even if H is incorrect), then
agreement between data x0 and H provides
poor (or no) evidence for H
• Such a test fails a minimal requirement for
a stringent or severe test
• N-P and Fisher did not put it in these terms
but our severe tester does
24
Need to Reformulate Tests
Severity function: SEV(Test T, data x, claim C)
• Tests are reformulated in terms of a discrepancy γ
from H0
• Instead of a binary cut-off (significant or not) the
particular outcome is used to infer discrepancies
that are or are not warranted
25
This alters the role of probability:
Probabilism. To assign a degree of probability,
confirmation, support or belief in a hypothesis,
given data x0 (absolute or comparative)
(e.g., Bayesian, likelihoodist, Fisher (at times))
Performance. Ensure long-run reliability of
methods, coverage probabilities (frequentist,
behavioristic Neyman-Pearson, Fisher (at times))
26
What happened to using
probability to assess error probing
capacity?
• Neither “probabilism” nor “performance” directly
captures it
• Good long-run performance is a necessary, not
a sufficient, condition for severity
27
Key to revising the role of error
probabilities
• What bothers you with selective reporting,
cherry picking, stopping when the data look
good, P-hacking
• Not problems about long-runs—
28
We cannot say the case at hand has done a
good job of avoiding the sources of
misinterpreting data
A claim C is not warranted _______
• Probabilism: unless C is true or probable (gets
a probability boost, made comparatively firmer)
• Performance: unless it stems from a method
with low long-run error
• Probativism (severe testing) unless something
(a fair amount) has been done to probe ways we
can be wrong about C
30
Spurious P-Value
He reports: Such results would be difficult to
achieve under the assumption of H0
When in fact such results are common under the
assumption of H0
There are many more ways to be wrong with
hunting (different sample space)
(Formally):
He says Pr(P-value ≤ Pobs; H0) ~ Pobs = small
But in fact it is high
31
Informal Examples
Bad Evidence No Test (BENT): Texas marksman
(Having drawn a bull’s eye around tightly clustered
shots, claims marksman ability)
Strong Severity (argument from coincidence to
weight gain):
Lift-off (p. 14)
arguing from error (p. 16)
32
Jump to Tour II: Error Probing Tools
vs. Logics of Evidence (p. 30)
To understand the stat wars, start with the holy
grail–a purely formal (syntactical) logic of
evidence
It should be like deductive logic but with
probabilities
Engine behind probabilisms (e.g., Carnapian
confirmation theories Likelihood accounts,
Bayesian posteriors)
33
Likelihood Principle (LP)
In probabilisms, the import of the data is via the
ratios of likelihoods of hypotheses
Pr(x0;H0)/Pr(x0;H1)
The data x0 are fixed, while the hypotheses vary
A pivotal disagreement in the philosophy of
statistics battles
34
A note on Likelihoods
• To see the difference between the likelihood of
a hypothesis and its probability, any hypothesis
that perfectly fits the data gets maximal
likelihood (1)
• Two rival hypotheses can both have likelihood 1
Lik(H0) = Pr(x0;H0)
35
Logic of Support
• Ian Hacking (1965) “Law of Likelihood”: x
support hypothesis H0 less well than H1 if,
Pr(x;H0) < Pr(x;H1)
(rejects in 1980)
• “there always is such a rival hypothesis viz., that
things just had to turn out the way they actually
did” (Barnard 1972, 129).
36
Error Probability:
• Pr(H0 is less well supported than H1 ;H0 ) is high
for some H1 or other
“In order to fix a limit between ‘small’ and ‘large’
values of [the likelihood ratio] we must know how
often such values appear when we deal with a
true hypothesis.” (Pearson and Neyman 1967,
106)
37
All error probabilities
violate the LP
(even without selection effects):
Sampling distributions, significance levels, power,
all depend on something more [than the likelihood
function]–something that is irrelevant in Bayesian
inference–namely the sample space
(Lindley 1971, 436)
The LP implies…the irrelevance of predesignation,
of whether a hypothesis was thought of before
hand or was introduced to explain known effects
(Rosenkrantz 1977, 122) 38
Optional Stopping:
Error probing capacities are altered not just by data
dredging, but also via data dependent stopping rules:
H0: no effect vs. H1: some effect
Xi ~ N(μ, σ2), 2-sided H0: μ = 0 vs. H1: μ ≠ 0.
Instead of fixing the sample size n in advance, in some
tests, n is determined by a stopping rule:
39
• Keep sampling until H0 is rejected at (“nominal”)
0.05 level
Keep sampling until sample mean M > 1.96 SE
• Trying and trying again: Having failed to rack up a
1.96SE difference after 10 trials, go to 20, 30 and
so on until obtaining a 1.96 SE difference
SE = s/√n
40
Nominal vs. Actual
significance levels again:
• With n fixed the Type 1 error probability is 0.05
• With this stopping rule the actual significance
level differs from, and will be greater than 0.05
(proper stopping rule)
41
Optional Stopping
42
• “if an experimenter uses this [optional stopping]
procedure, then with probability 1 he will
eventually reject any sharp null hypothesis,
even though it be true”
(Edwards, Lindman, and Savage 1963, 239)
• Understandably, they observe, the significance
tester frowns on this, or at least requires
adjustment of the P-values
• “Imagine instead if an account advertised itself as
ignoring stopping rules” (43)
• “[the] irrelevance of stopping rules to statistical
inference restores a simplicity and freedom to
experimental design that had been lost by
classical emphasis on significance levels (in the
sense of Neyman and Pearson).” (Edwards,
Lindman, and Savage 1963, 239)
• “…these same authors who warn that to ignore
stopping rules is to guarantee rejecting the null
hypothesis even if it’s true” (43) declare it
irrelevant.
43
What counts as cheating depends
on statistical philosophy
• Are they contradicting themselves?
• “No. It is just that what looks to be, and
indeed is, cheating from the significance
testing perspective is not cheating from
[their] Bayesian perspective.” (43)
44
21 Word Solution
• Replication researchers (re)discovered that data-
dependent hypotheses and stopping are a major
source of spurious significance levels.
• Statistical critics, Simmons, Nelson, and
Simonsohn (2011) place at the top of their list the
need to block flexible stopping
“Authors must decide the rule for terminating data
collection before data collection begins and report
this rule in the articles” (Simmons, Nelson, and
Simonsohn 2011, 1362).
45
Berger and Wolpert: “Likelihood
Principle”
• It seems very strange that a frequentist could
not analyze a given set of data…if the stopping
rule is not given….Data should be able to speak
for itself. (Berger and Wolpert 1988, 78)
• “intentions” should be irrelevant
• Inference by Bayes Theorem satisfies this:
“How Stopping Rules Drop Out” (ibid., 45) is
derived in 1 page.
46
Jimmy Savage on the LP:
“According to Bayes' theorem,…. if y is the
datum of some other experiment, and if it
happens that P(x|µ) and P(y|µ) are
proportional functions of µ (that is,
constant multiples of each other), then
each of the two data x and y have exactly
the same thing to say about the values of
µ…” (Savage 1962, 17)
47
Probabilists can still block intuitively
unwarranted inferences
(without error probabilities)?
• Supplement with subjective beliefs: What do I
believe? As opposed to What is the evidence?
(Royall)
• Likelihoodists + prior probabilities
48
Richard Royall
1. What do I believe, given x
2. What should I do, given x
3. How should I interpret this observation x as
evidence? (comparing 2 hypotheses)
For #1–degrees of belief, Bayesian posteriors
For #2–frequentist performance
For #3–LL
p. 33 49
Bayesian accounts
• Statistical hypotheses assign probabilities to data
Pr(x0;H1), but it’s rare to assign frequentist
probabilities to hypotheses
Hypotheses
• The deflection of light due to the sun is 1.75 degrees
• Deficiency in vitamin D increases chances of
fractures
• HRT decreases age-related dementia
• IQ is more variable in men than women
You might assign degrees of belief: betting
50
Bayes Rule
Pr(H|x) = Pr(x|H)Pr(H)
Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H)
(p. 24)
• Requires an exhaustive set of
hypotheses
51
Problems with appealing to priors to
block inferences based on
selection effects
• Could work in some cases, but still wouldn’t show
what researchers had done wrong—battle of
beliefs
• The believability of data-dredged hypotheses is
what makes them so seductive
• Additional source of flexibility, priors and biasing
selection effects
52
No help with the severe tester’s
key problem
• How to distinguish the warrant for a single
hypothesis H with different methods
(e.g., one has biasing selection effects,
another, pre-registered results and
precautions)?
• Since there’s a single H, its prior would be the
same
53
Current State of Play in
Bayesian-Frequentist Wars
1.3 View from a Hot-Air Balloon (p. 23)
How can a discipline, central to science and to critical
thinking, have two methodologies, two logics, two
approaches that frequently give substantively different
answers to the same problems? Is complacency in the
face of contradiction acceptable for a central discipline
of science? (Donald Fraser 2011, p. 329)
54
Bayesian-Frequentist Wars
The Goal of Frequentist Inference: Construct
procedures with frequentist guarantees
good long-run performance
The Goal of Bayesian Inference: Quantify and
manipulate your degrees of beliefs
belief probabilism
Larry Wasserman (p. 24)
But now we have marriages and reconciliations (pp.
25-8) 55
Most Bayesians (last decade) use
“default” priors: unification
• ‘Eliciting’ subjective priors too difficult, scientists
reluctant for subjective beliefs to overshadow data
“[V]irtually never would different experts give prior
distributions that even overlapped” (J. Berger 2006,
392)
• Default priors are supposed to prevent prior beliefs
from influencing the posteriors–data dominant
56
Marriages of Convenience
Why?
For subjective Bayesians, to be less
subjective
For frequentists, to have an inferential or
epistemic interpretation of error probabilities
57
How should we interpret them?
• “The priors are not to be considered expressions
of uncertainty, ignorance, or degree of belief.
Conventional priors may not even be
probabilities…” (Cox and Mayo 2010, 299)
• No agreement on rival systems for default/non-
subjective priors
58
Some Bayesians reject probabilism
(Falsificationist Bayesians)
• “[C]rucial parts of Bayesian data analysis, such
as model checking, can be understood as ‘error
probes’ in Mayo’s sense” which might be seen as
using modern statistics to implement the
Popperian criteria of severe tests.
(Andrew Gelman and Cosma Shalizi 2013, 10).
59
Decoupling
Break off stat methods from their traditional
philosophies
Can Bayesian methods find a new foundation in
error statistical ideas? (p. 27)
Excursion 6: (probabilist) foundations lost;
(probative) foundations found
60
Choose Your Stat War: Old or New
• Replication crisis: should replication be
required? What kind?
• Does science need fixing?
• Change “perverse incentives” (publishing, merit
awarding, open science), badges? Less
competition?
• Should P-values be redefined? (lower p-value)
• Does data dredging, selective reporting alter the
import of data (always? Sometimes?)
61
• What is the role of probability in inference (probabilism,
performance, probativism, or something else?)
• Different roles for different contexts? Or a unification?
• Issues in the philosophy and history of statistics,
confirmation, experimentation, replication
• Neyman-Fisher wars: what were they really about?
• Frequentist vs Bayesian wars: then and now?
• What’s missing from current stat wars?
Underlying philosophical disputes
Need to test assumptions of statistical models
Relating statistical to substantive claims, problems of
measurement
62
• Should tests be replaced by confidence
intervals?
• How to interpret non-significant results?
• New Eclecticism: model selection, Bayes
Factors, empirical Bayes posterior predictive
checking, AI, ML
• Is experimental philosophy a better way to do
philosophy?
• What about experimental economics?
• What about the ethics of research & Big Data
Modeling?
63

More Related Content

What's hot

Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13jemille6
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversiesjemille6
 
Spanos lecture 7: An Introduction to Bayesian Inference
Spanos lecture 7: An Introduction to Bayesian Inference Spanos lecture 7: An Introduction to Bayesian Inference
Spanos lecture 7: An Introduction to Bayesian Inference jemille6
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)jemille6
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correctionjemille6
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...jemille6
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma jemille6
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyjemille6
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Warsjemille6
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performancejemille6
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayojemille6
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statisticsjemille6
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...jemille6
 
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...jemille6
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversyjemille6
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talkjemille6
 
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...jemille6
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16jemille6
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaperjemille6
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2jemille6
 

What's hot (20)

Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13Phil6334 day#4slidesfeb13
Phil6334 day#4slidesfeb13
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
 
Spanos lecture 7: An Introduction to Bayesian Inference
Spanos lecture 7: An Introduction to Bayesian Inference Spanos lecture 7: An Introduction to Bayesian Inference
Spanos lecture 7: An Introduction to Bayesian Inference
 
Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)Byrd statistical considerations of the histomorphometric test protocol (1)
Byrd statistical considerations of the histomorphometric test protocol (1)
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correction
 
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
D. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics WarsD. Mayo: Philosophical Interventions in the Statistics Wars
D. Mayo: Philosophical Interventions in the Statistics Wars
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayo
 
Philosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of StatisticsPhilosophy of Science and Philosophy of Statistics
Philosophy of Science and Philosophy of Statistics
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
 
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
D. Mayo: Putting the brakes on the breakthrough: An informal look at the argu...
 
Controversy Over the Significance Test Controversy
Controversy Over the Significance Test ControversyControversy Over the Significance Test Controversy
Controversy Over the Significance Test Controversy
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talk
 
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...
Mayo: 2nd half “Frequentist Statistics as a Theory of Inductive Inference” (S...
 
Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaper
 
Gelman psych crisis_2
Gelman psych crisis_2Gelman psych crisis_2
Gelman psych crisis_2
 

Similar to Statistical Inference: An Excursion into Philosophy and Methodology

The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualtiesjemille6
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500jemille6
 
“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilismjemille6
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learningjemille6
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsificationjemille6
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualtiesjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)jemille6
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...jemille6
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy jemille6
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”jemille6
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)jemille6
 
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper ConceptsExcursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper Conceptsjemille6
 
8. Hypothesis Testing.ppt
8. Hypothesis Testing.ppt8. Hypothesis Testing.ppt
8. Hypothesis Testing.pptABDULRAUF411
 

Similar to Statistical Inference: An Excursion into Philosophy and Methodology (20)

The Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and CasualtiesThe Statistics Wars: Errors and Casualties
The Statistics Wars: Errors and Casualties
 
D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500D.g. mayo 1st mtg lse ph 500
D.g. mayo 1st mtg lse ph 500
 
Mayod@psa 21(na)
Mayod@psa 21(na)Mayod@psa 21(na)
Mayod@psa 21(na)
 
“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
D. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &LearningD. G. Mayo Columbia slides for Workshop on Probability &Learning
D. G. Mayo Columbia slides for Workshop on Probability &Learning
 
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and FalsificationP-Value "Reforms": Fixing Science or Threat to Replication and Falsification
P-Value "Reforms": Fixing Science or Threat to Replication and Falsification
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualties
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)Mayo minnesota 28 march 2 (1)
Mayo minnesota 28 march 2 (1)
 
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
Statistical "Reforms": Fixing Science or Threats to Replication and Falsifica...
 
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo: Replication Research Under an Error Statistical Philosophy
 
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”"The Statistical Replication Crisis: Paradoxes and Scapegoats”
"The Statistical Replication Crisis: Paradoxes and Scapegoats”
 
Statistics
StatisticsStatistics
Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)Mayo O&M slides (4-28-13)
Mayo O&M slides (4-28-13)
 
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper ConceptsExcursion 3 Tour III, Capability and Severity: Deeper Concepts
Excursion 3 Tour III, Capability and Severity: Deeper Concepts
 
Bayesian intro
Bayesian introBayesian intro
Bayesian intro
 
8. Hypothesis Testing.ppt
8. Hypothesis Testing.ppt8. Hypothesis Testing.ppt
8. Hypothesis Testing.ppt
 

More from jemille6

D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severityjemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...jemille6
 
D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides jemille6
 

More from jemille6 (20)

D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...The ASA president Task Force Statement on Statistical Significance and Replic...
The ASA president Task Force Statement on Statistical Significance and Replic...
 
D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides D. G. Mayo jan 11 slides
D. G. Mayo jan 11 slides
 

Recently uploaded

Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 

Recently uploaded (20)

Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 

Statistical Inference: An Excursion into Philosophy and Methodology

  • 1. 1 PHIL 6334/ ECON 6614: Spring 2019 Current Debates on Statistical Inference and Modeling Wed. 4-6:30 Derring 1076 Deborah G. Mayo Aris Spanos
  • 2. 2 What is the Philosophy of Statistics (PhilStat)? At one level of analysis, statisticians and philosophers of science ask many of the same questions: What should be observed and what may justifiably be inferred from the resulting data? How well do data confirm or fit a model? What is a good test?
  • 3. 3 Must predictions be “novel” in some sense? (selection effects, double counting, data mining) How can spurious relationships be distinguished from genuine regularities? from causal regularities? How can we infer more accurate and reliable observations from less accurate ones? When does a fitted model account for regularities in the data?
  • 4. • These very general questions are entwined with long standing debates in philosophy of science • No wonder the field of statistics tends to cross over so often into philosophical territory. • Statistics is a kind of “applied philosophy of science” (Kempthorne, 1976). 4
  • 5. 5 Statistics  Philosophy 3 ways statistical accounts are used in philosophy of science (1) Model Scientific Inference—capture the actual or rational ways to arrive at evidence and inference (2) Resolve Philosophical Problems about scientific inference, observation, experiment; (problem of induction, objectivity, underdetermination). (3) Perform a Metamethodological Critique -scrutinize methodological rules (e.g., novel data)
  • 6. 6 Philosophy  Statistics job is to help resolve the conceptual, logical, and methodological discomforts of scientists In tackling the statistics wars one also makes progress on: the philosopher's problems of induction, objective evidence, underdetermination. “2-way street”
  • 7. Start from the preface: 1. The Statistics Wars 7
  • 8. 2. Role of Probability: performance or probabilism? (Frequentist vs. Bayesian) • End of foundations? (Unifications and Eclecticism) • Long-standing battles still simmer below the surface (agreement on numbers) 8
  • 9. Let’s brush the dust off the pivotal debates, walk into the museums to hear the founders: Fisher, Neyman, Pearson, Savage and many others in relation to today’s statistical crisis in science (xi) 9 Stat Sci/Phil Sci
  • 10. 3. Statistical inference as severe testing • Main source of the statistical crisis in science? • We set sail with a simple tool: if little or nothing has been done to rule out flaws in inferring a claim, it has not passed a severe test • Sufficiently general to apply to any methods now in use • You needn’t accept this philosophy to use it to excavate the statistics wars 10
  • 11. 4. A philosophical excursion “Taking the severity principle, along with the aim that we desire to find things out… let’s set sail on a philosophical excursion to illuminate statistical inference.” (8) “…and engage with a host of tribes marked by family quarrels, peace treaties, and shifting alliances” (xiv) 11
  • 12. 5. Getting Beyond the Statistics Wars Revisit some taboos: problems of induction & falsification, science vs. pseudoscience 12
  • 13. Excursion 1 How to Tell What’s True About Statistical inference Tour I: Beyond Probabilism and Performance 13
  • 14. Most findings are false? “Several methodologists have pointed out that the high rate of nonreplication of research discoveries is a consequence of the convenient, yet ill- founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. … It can be proven that most claimed research findings are false.” (John Ioannidis 2005, 0696) 14
  • 15. 15
  • 16. R.A. Fisher “[W]e need, not an isolated record, but a reliable method of procedure. In relation to the test of significance, we may say that a phenomenon is experimentally demonstrable when we know how to conduct an experiment which will rarely fail to give us a statistically significant result.” (Fisher 1947, 14) 16
  • 17. Simple significance tests (Fisher) p-value. …to test the conformity of the particular data under analysis with H0 in some respect: …we find a function d(X) of the data, the test statistic, such that • the larger the value of d(X) the more inconsistent are the data with H0; • d(X) has a known probability distribution when H0 is true. …the p-value corresponding to any d(x) (or d0bs) p = p(t) = Pr(d(X) ≥ d(x); H0) (Mayo and Cox 2006, 81; d for t, x for y) 17
  • 18. Testing Reasoning • If even larger differences than d0bs occur fairly frequently under H0 (i.e., P-value is not small), there’s scarcely evidence of incompatibility with H0 • Small P-value indicates some underlying discrepancy from H0 because very probably you would have seen a less impressive difference than d0bs were H0 true. • This still isn’t evidence of a genuine statistical effect H1, let alone a scientific conclusion H* Stat-Sub fallacy H => H* 18
  • 19. Fallacy of rejection • H* makes claims that haven’t been probed by the statistical test • The moves from experimental interventions to H* don’t get enough attention–but your statistical account should block them 19
  • 20. Neyman-Pearson (N-P) tests: A null and alternative hypotheses H0, H1 that are exhaustive* H0: μ ≤ 0 vs. H1: μ > 0 “no effect” vs. “some positive effect” • So this fallacy of rejection H1H* is blocked • Rejecting H0 only indicates statistical alternatives H1 (how discrepant from null) *(introduces Type 2 error) 20
  • 21. Not an incompatible hybrid • They both fall under tools for “appraising and bounding the probabilities (under respective hypotheses) of seriously misleading interpretations of data” (Birnbaum 1970, 1033)– error probabilities • Can place all under the rubric of error statistics • Confidence intervals, N-P and Fisherian tests, resampling, randomization • Details of tests do not enter until Excursion 3 21
  • 22. We must get beyond N-P/ Fisher incompatibilist caricature • Beyond “inconsistent hybrid”: Fisher–inferential; N-P– long run performance • “Incompatibilism” results in P-value users robbed of features from N-P tests they need (power) • Wrongly supposes N-P testers have only fixed error probabilities & are not allowed to report P-values 22
  • 23. Both Fisher and N-P methods: it’s easy to lie with statistics by selective reporting • Sufficient finagling—cherry-picking, significance seeking, multiple testing, post-data subgroups, trying and trying again—may practically guarantee a preferred claim H gets support, even if it’s unwarranted by evidence 23
  • 24. Severity Requirement: If the test had little or no capability of finding flaws with H (even if H is incorrect), then agreement between data x0 and H provides poor (or no) evidence for H • Such a test fails a minimal requirement for a stringent or severe test • N-P and Fisher did not put it in these terms but our severe tester does 24
  • 25. Need to Reformulate Tests Severity function: SEV(Test T, data x, claim C) • Tests are reformulated in terms of a discrepancy γ from H0 • Instead of a binary cut-off (significant or not) the particular outcome is used to infer discrepancies that are or are not warranted 25
  • 26. This alters the role of probability: Probabilism. To assign a degree of probability, confirmation, support or belief in a hypothesis, given data x0 (absolute or comparative) (e.g., Bayesian, likelihoodist, Fisher (at times)) Performance. Ensure long-run reliability of methods, coverage probabilities (frequentist, behavioristic Neyman-Pearson, Fisher (at times)) 26
  • 27. What happened to using probability to assess error probing capacity? • Neither “probabilism” nor “performance” directly captures it • Good long-run performance is a necessary, not a sufficient, condition for severity 27
  • 28. Key to revising the role of error probabilities • What bothers you with selective reporting, cherry picking, stopping when the data look good, P-hacking • Not problems about long-runs— 28
  • 29. We cannot say the case at hand has done a good job of avoiding the sources of misinterpreting data
  • 30. A claim C is not warranted _______ • Probabilism: unless C is true or probable (gets a probability boost, made comparatively firmer) • Performance: unless it stems from a method with low long-run error • Probativism (severe testing) unless something (a fair amount) has been done to probe ways we can be wrong about C 30
  • 31. Spurious P-Value He reports: Such results would be difficult to achieve under the assumption of H0 When in fact such results are common under the assumption of H0 There are many more ways to be wrong with hunting (different sample space) (Formally): He says Pr(P-value ≤ Pobs; H0) ~ Pobs = small But in fact it is high 31
  • 32. Informal Examples Bad Evidence No Test (BENT): Texas marksman (Having drawn a bull’s eye around tightly clustered shots, claims marksman ability) Strong Severity (argument from coincidence to weight gain): Lift-off (p. 14) arguing from error (p. 16) 32
  • 33. Jump to Tour II: Error Probing Tools vs. Logics of Evidence (p. 30) To understand the stat wars, start with the holy grail–a purely formal (syntactical) logic of evidence It should be like deductive logic but with probabilities Engine behind probabilisms (e.g., Carnapian confirmation theories Likelihood accounts, Bayesian posteriors) 33
  • 34. Likelihood Principle (LP) In probabilisms, the import of the data is via the ratios of likelihoods of hypotheses Pr(x0;H0)/Pr(x0;H1) The data x0 are fixed, while the hypotheses vary A pivotal disagreement in the philosophy of statistics battles 34
  • 35. A note on Likelihoods • To see the difference between the likelihood of a hypothesis and its probability, any hypothesis that perfectly fits the data gets maximal likelihood (1) • Two rival hypotheses can both have likelihood 1 Lik(H0) = Pr(x0;H0) 35
  • 36. Logic of Support • Ian Hacking (1965) “Law of Likelihood”: x support hypothesis H0 less well than H1 if, Pr(x;H0) < Pr(x;H1) (rejects in 1980) • “there always is such a rival hypothesis viz., that things just had to turn out the way they actually did” (Barnard 1972, 129). 36
  • 37. Error Probability: • Pr(H0 is less well supported than H1 ;H0 ) is high for some H1 or other “In order to fix a limit between ‘small’ and ‘large’ values of [the likelihood ratio] we must know how often such values appear when we deal with a true hypothesis.” (Pearson and Neyman 1967, 106) 37
  • 38. All error probabilities violate the LP (even without selection effects): Sampling distributions, significance levels, power, all depend on something more [than the likelihood function]–something that is irrelevant in Bayesian inference–namely the sample space (Lindley 1971, 436) The LP implies…the irrelevance of predesignation, of whether a hypothesis was thought of before hand or was introduced to explain known effects (Rosenkrantz 1977, 122) 38
  • 39. Optional Stopping: Error probing capacities are altered not just by data dredging, but also via data dependent stopping rules: H0: no effect vs. H1: some effect Xi ~ N(μ, σ2), 2-sided H0: μ = 0 vs. H1: μ ≠ 0. Instead of fixing the sample size n in advance, in some tests, n is determined by a stopping rule: 39
  • 40. • Keep sampling until H0 is rejected at (“nominal”) 0.05 level Keep sampling until sample mean M > 1.96 SE • Trying and trying again: Having failed to rack up a 1.96SE difference after 10 trials, go to 20, 30 and so on until obtaining a 1.96 SE difference SE = s/√n 40
  • 41. Nominal vs. Actual significance levels again: • With n fixed the Type 1 error probability is 0.05 • With this stopping rule the actual significance level differs from, and will be greater than 0.05 (proper stopping rule) 41
  • 42. Optional Stopping 42 • “if an experimenter uses this [optional stopping] procedure, then with probability 1 he will eventually reject any sharp null hypothesis, even though it be true” (Edwards, Lindman, and Savage 1963, 239) • Understandably, they observe, the significance tester frowns on this, or at least requires adjustment of the P-values
  • 43. • “Imagine instead if an account advertised itself as ignoring stopping rules” (43) • “[the] irrelevance of stopping rules to statistical inference restores a simplicity and freedom to experimental design that had been lost by classical emphasis on significance levels (in the sense of Neyman and Pearson).” (Edwards, Lindman, and Savage 1963, 239) • “…these same authors who warn that to ignore stopping rules is to guarantee rejecting the null hypothesis even if it’s true” (43) declare it irrelevant. 43
  • 44. What counts as cheating depends on statistical philosophy • Are they contradicting themselves? • “No. It is just that what looks to be, and indeed is, cheating from the significance testing perspective is not cheating from [their] Bayesian perspective.” (43) 44
  • 45. 21 Word Solution • Replication researchers (re)discovered that data- dependent hypotheses and stopping are a major source of spurious significance levels. • Statistical critics, Simmons, Nelson, and Simonsohn (2011) place at the top of their list the need to block flexible stopping “Authors must decide the rule for terminating data collection before data collection begins and report this rule in the articles” (Simmons, Nelson, and Simonsohn 2011, 1362). 45
  • 46. Berger and Wolpert: “Likelihood Principle” • It seems very strange that a frequentist could not analyze a given set of data…if the stopping rule is not given….Data should be able to speak for itself. (Berger and Wolpert 1988, 78) • “intentions” should be irrelevant • Inference by Bayes Theorem satisfies this: “How Stopping Rules Drop Out” (ibid., 45) is derived in 1 page. 46
  • 47. Jimmy Savage on the LP: “According to Bayes' theorem,…. if y is the datum of some other experiment, and if it happens that P(x|µ) and P(y|µ) are proportional functions of µ (that is, constant multiples of each other), then each of the two data x and y have exactly the same thing to say about the values of µ…” (Savage 1962, 17) 47
  • 48. Probabilists can still block intuitively unwarranted inferences (without error probabilities)? • Supplement with subjective beliefs: What do I believe? As opposed to What is the evidence? (Royall) • Likelihoodists + prior probabilities 48
  • 49. Richard Royall 1. What do I believe, given x 2. What should I do, given x 3. How should I interpret this observation x as evidence? (comparing 2 hypotheses) For #1–degrees of belief, Bayesian posteriors For #2–frequentist performance For #3–LL p. 33 49
  • 50. Bayesian accounts • Statistical hypotheses assign probabilities to data Pr(x0;H1), but it’s rare to assign frequentist probabilities to hypotheses Hypotheses • The deflection of light due to the sun is 1.75 degrees • Deficiency in vitamin D increases chances of fractures • HRT decreases age-related dementia • IQ is more variable in men than women You might assign degrees of belief: betting 50
  • 51. Bayes Rule Pr(H|x) = Pr(x|H)Pr(H) Pr(x|H)Pr(H) + Pr(x|~H)Pr(~H) (p. 24) • Requires an exhaustive set of hypotheses 51
  • 52. Problems with appealing to priors to block inferences based on selection effects • Could work in some cases, but still wouldn’t show what researchers had done wrong—battle of beliefs • The believability of data-dredged hypotheses is what makes them so seductive • Additional source of flexibility, priors and biasing selection effects 52
  • 53. No help with the severe tester’s key problem • How to distinguish the warrant for a single hypothesis H with different methods (e.g., one has biasing selection effects, another, pre-registered results and precautions)? • Since there’s a single H, its prior would be the same 53
  • 54. Current State of Play in Bayesian-Frequentist Wars 1.3 View from a Hot-Air Balloon (p. 23) How can a discipline, central to science and to critical thinking, have two methodologies, two logics, two approaches that frequently give substantively different answers to the same problems? Is complacency in the face of contradiction acceptable for a central discipline of science? (Donald Fraser 2011, p. 329) 54
  • 55. Bayesian-Frequentist Wars The Goal of Frequentist Inference: Construct procedures with frequentist guarantees good long-run performance The Goal of Bayesian Inference: Quantify and manipulate your degrees of beliefs belief probabilism Larry Wasserman (p. 24) But now we have marriages and reconciliations (pp. 25-8) 55
  • 56. Most Bayesians (last decade) use “default” priors: unification • ‘Eliciting’ subjective priors too difficult, scientists reluctant for subjective beliefs to overshadow data “[V]irtually never would different experts give prior distributions that even overlapped” (J. Berger 2006, 392) • Default priors are supposed to prevent prior beliefs from influencing the posteriors–data dominant 56
  • 57. Marriages of Convenience Why? For subjective Bayesians, to be less subjective For frequentists, to have an inferential or epistemic interpretation of error probabilities 57
  • 58. How should we interpret them? • “The priors are not to be considered expressions of uncertainty, ignorance, or degree of belief. Conventional priors may not even be probabilities…” (Cox and Mayo 2010, 299) • No agreement on rival systems for default/non- subjective priors 58
  • 59. Some Bayesians reject probabilism (Falsificationist Bayesians) • “[C]rucial parts of Bayesian data analysis, such as model checking, can be understood as ‘error probes’ in Mayo’s sense” which might be seen as using modern statistics to implement the Popperian criteria of severe tests. (Andrew Gelman and Cosma Shalizi 2013, 10). 59
  • 60. Decoupling Break off stat methods from their traditional philosophies Can Bayesian methods find a new foundation in error statistical ideas? (p. 27) Excursion 6: (probabilist) foundations lost; (probative) foundations found 60
  • 61. Choose Your Stat War: Old or New • Replication crisis: should replication be required? What kind? • Does science need fixing? • Change “perverse incentives” (publishing, merit awarding, open science), badges? Less competition? • Should P-values be redefined? (lower p-value) • Does data dredging, selective reporting alter the import of data (always? Sometimes?) 61
  • 62. • What is the role of probability in inference (probabilism, performance, probativism, or something else?) • Different roles for different contexts? Or a unification? • Issues in the philosophy and history of statistics, confirmation, experimentation, replication • Neyman-Fisher wars: what were they really about? • Frequentist vs Bayesian wars: then and now? • What’s missing from current stat wars? Underlying philosophical disputes Need to test assumptions of statistical models Relating statistical to substantive claims, problems of measurement 62
  • 63. • Should tests be replaced by confidence intervals? • How to interpret non-significant results? • New Eclecticism: model selection, Bayes Factors, empirical Bayes posterior predictive checking, AI, ML • Is experimental philosophy a better way to do philosophy? • What about experimental economics? • What about the ethics of research & Big Data Modeling? 63

Editor's Notes

  1. They are at once ancient and up to the minute. They reflect disagreements on one of the deepest, oldest, philosophical questions: How do humans learn about the world despite threats of error? At the same time, they are the engine behind current controversies surrounding high-profile failures of replication in the social and biological sciences.
  2. Should probability enter to ensure we won’t reach mistaken interpretations of data too often in the long run of experience? Or to capture degrees of belief about claims? (performance or probabilism) The field has been marked by disagreements between competing tribes of frequentists and Bayesians that have been so contentious that everyone wants to believe we are long past them. We now enjoy unifications and reconciliations between rival schools, it will be said, and practitioners are eclectic, prepared to use whatever method “works.” The truth is, long-standing battles still simmer below the surface of today’ s debates about scientific integrity, irreproducibility Reluctance to reopen wounds from old battles has allowed them to fester. The reconciliations and unifications have been revealed to have serious problems, and there’s little agreement on which to use or how to interpret them. As for eclecticism, it’s often not clear what is even meant by “works.” The presumption that all we need is an agreement on numbers–never mind if they’re measuring different things–leads to statistical schizophrenia.
  3. What’s behind the constant drum beat today that science is in crisis? The problem is that high powered methods can make it easy to uncover impressive-looking findings even if they are false: spurious correlations and other errors have not been severely probed. We set sail with a simple tool: If little or nothing has been done to rule out flaws in inferring a claim, then it has not passed a severe test. In the severe testing view, probability arises in scientific contexts to assess and control how capable methods are at uncovering and avoiding erroneous interpretations of data. That’s what it means to view statistical inference as severe testing. In saying we may view statistical inference as severe testing, I’m not saying statistical inference is always about formal statistical testing. The concept of severe testing is sufficiently general to apply to any of the methods now in use, whether for exploration, estimation, or prediction. Regardless of the type of claim, you don’t have evidence for it if nothing has been done to have found it flawed.
  4. Can likewise ensure that 0 is excluded from the confidence intervals
  5. ; problem with unifications ----- Meeting Notes (11/24/18 20:22) ----- third problem