Mayo ASA -1-
Statistical Skepticism: How to use significance tests effectively
Deborah Mayo
Virginia Tech
October 12, 2017
If you are to use statistical significance tests effectively you must
be in a position to respond to expected critical challenges:
Why are you using a statistical test of significance rather than
some other tool?
•It makes sense to reject Unitarianism.
•What type of context would ever make statistical (significance)
testing just the ticket?
Mayo ASA -2-
(1) Why use a tool that infers from a single (arbitrary) P-
value that pertains to a statistical hypothesis H0 to a
research claim H*?
Mayo ASA -3-
(1) Why use a tool that infers from a single (arbitrary) P-
value that pertains to a statistical hypothesis H0 to a
research claim H*?
RESPONSE: We don’t.
“[W]e need, not an isolated record, but a reliable method of
procedure. In relation to the test of significance, we may say
that a phenomenon is experimentally demonstrable when we
know how to conduct an experiment which will rarely fail to
give us a statistically significant result.” (Fisher 1947, p. 14)
•This is the basis for falsification in science.
Mayo ASA -4-
•Were H0 a reasonable description of the process, then with very
high probability you would not be able to regularly produce
statistically significant results.
•So if you do, it’s evidence H0 is false in the particular manner
probed (argument from coincidence).
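Fisher's demonstrability criterion can be made concrete with a quick calculation (a minimal sketch with invented numbers; the alpha level and replication count are my assumptions, not from the slides):

```python
# If H0 were true, each independent replication would reach statistical
# significance with probability only alpha, so *regularly* producing
# significant results would be an extraordinary coincidence.
from math import comb

alpha = 0.05          # significance threshold for each replication (assumed)
n_reps = 5            # number of independent replications (assumed)

# P(all n_reps replications significant | H0 true)
p_all = alpha ** n_reps

# P(at least 4 of 5 significant | H0 true), via the binomial distribution
p_at_least_4 = sum(comb(n_reps, k) * alpha**k * (1 - alpha)**(n_reps - k)
                   for k in range(4, n_reps + 1))

print(f"P(all {n_reps} significant under H0)   = {p_all:.2e}")
print(f"P(>=4 of {n_reps} significant under H0) = {p_at_least_4:.2e}")
```

Both probabilities are tiny, which is why regularly producing significant results licenses the argument from coincidence against H0.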
Mayo ASA -5-
•Even a genuine statistical effect isn’t automatically evidence
for a substantive H*.
•H* makes claims that haven’t been probed by the statistical test:
cliché: correlation isn’t causation.
•Neyman-Pearson (N-P) tests restrict the inference to an
alternative statistical claim, an improvement on Fisher.
Mayo ASA -6-
(2) Why use an incompatible hybrid (of Fisher and N-P)?
Mayo ASA -7-
(2) Why use an incompatible hybrid (of Fisher and N-P)?
RESPONSE: They fall under the umbrella of tools that use a
method’s sampling distribution to assess and control its error
probing capacities.
•confidence intervals, N-P and Fisherian tests, resampling,
randomization.
•there’s an inference and it’s qualified by how well or poorly
tested that inference is.
Incompatibilist positions keep to caricatures:
•Fisherians can’t use power (or sensitivity).
•N-P testers can’t use P-values; in fact, they all used attained
significance levels.
Mayo ASA -8-
“It is good practice to determine … the smallest significance
level…at which the hypothesis would be rejected for the given
observation. This number, the so-called P-value gives an idea
of how strongly the data contradict the hypothesis. It also
enables others to reach a verdict based on the significance
level of their choice.” (Lehmann and Romano 2005, pp. 63-4)
An attained P-value is also an error probability:
“…the P-value “is the probability that we would mistakenly
declare there to be evidence against H0, were we to regard the
data under analysis as being just decisive” for issuing such a
report. (Cox and Hinkley 1974, p. 66)
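The Lehmann-Romano point, that the attained P-value lets each reader apply the level of their choice, can be sketched with a small example (the data are invented; a one-sided z-test of H0: mu = 0 with known sigma is assumed):

```python
# The attained P-value is the smallest significance level at which H0
# would be rejected for the observed data.
from statistics import NormalDist
from math import sqrt

sigma, n = 1.0, 25               # assumed known sigma and sample size
xbar = 0.4                       # hypothetical observed sample mean
z = (xbar - 0.0) / (sigma / sqrt(n))   # test statistic d(x0)

# P-value = P(Z >= z; H0): probability of a result at least as extreme
p_value = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, attained P-value = {p_value:.4f}")

# Others can reach a verdict at the significance level of their choice:
for alpha in (0.05, 0.025, 0.01):
    print(f"  reject at alpha={alpha}? {p_value <= alpha}")
```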
Mayo ASA -9-
•Some say: Fisher was evidential, Neyman cared only for long-run
error control.
•Both tribes use error probabilities to measure performance and
evidence.
•The two schools mathematically dovetail; it’s up to us to interpret
them.
•Drop personality labels and the NHST acronym: “statistical tests”
or “error statistical tests” will do.
Mayo ASA -10-
(3) Why apply a method that uses error probabilities, the
sampling distribution, researcher “intentions” and violates the
likelihood principle (LP)? You should condition on the data.
Mayo ASA -11-
(3) Why apply a method that uses error probabilities, the
sampling distribution, researcher “intentions”, and violates the
likelihood principle (LP)? You should condition on the data.
RESPONSE 1: If I condition on the actual data, this precludes
error probabilities.
•I use them to assess how well or poorly tested claims are by the
data at hand.
•What bothers you when cherry pickers selectively report?
•It’s not merely a problem about long runs: you can’t say of the
case at hand that it has done a good job of avoiding the sources
of misinterpreting data.
Mayo ASA -12-
You need a principle of reasoning that makes sense of this.
Informally:
I haven’t been given evidence for a genuine effect if the
method makes it very easy to find some impressive-looking
effect, even if spurious.
Not enough that data x fit H; the method must have had a
reasonable ability to uncover flaws, if present.
Minimal requirement for severe testing.
Mayo ASA -13-
So you can readily agree with those who say:
“P values can only be computed once the sampling plan is fully
known and specified in advance.” (Wagenmakers 2007, p. 784)
“In fact, Bayes factors can be used in the complete absence of a
sampling plan… .” (Bayarri, Benjamin, Berger, Sellke 2016, p.
100)
RESPONSE 2: We’re in very different contexts. I’m in one that
led to the advice of a “21 word solution”:
Mayo ASA -14-
“We report how we determined our sample size, all data
exclusions (if any), all manipulations, and all measures in the
study.” (Simmons, Nelson, and Simonsohn 2012, p. 4)
•I’m in the setting of a (skeptical) consumer of statistics.
•Have you given yourself lots of extra chances in the “forking
paths” (Gelman and Loken 2014) between data and inference?
Mayo ASA -15-
•The value of preregistered reports–key strategy to promote
replication–is that your appraisal of the evidence is altered when
you see the history (e.g., they had 6 primary endpoints, but the
one reported is not among them).
•There’s a tension between the preregistration mindset and the
position,
It makes no sense that frequentists adjust the P-value in the face
of data dredging and other biasing selection effects–these
considerations have nothing to do with the evidence.
•Many ways to correct P-values, appropriate for different contexts
(Bonferroni, false discovery rates).
•The main thing is to have an alert that the reported P-values are
invalid or questionable.
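A minimal sketch of the two corrections named above, Bonferroni (family-wise error) and Benjamini-Hochberg (false discovery rate); the P-values are invented for illustration, and this is not code from the talk:

```python
# Adjusting P-values for multiple testing.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]   # hypothetical results
m = len(pvals)

# Bonferroni: multiply each P-value by the number of tests (cap at 1).
bonferroni = [min(1.0, p * m) for p in pvals]

# Benjamini-Hochberg: with FDR level q, reject the hypotheses with the
# k_max smallest P-values, where k_max is the largest k with
# p_(k) <= k * q / m.
q = 0.05
ranked = sorted(enumerate(pvals), key=lambda t: t[1])
k_max = max((k + 1 for k, (_, p) in enumerate(ranked)
             if p <= (k + 1) * q / m), default=0)
bh_reject = {i for i, _ in ranked[:k_max]}

print("Bonferroni-adjusted:", [round(p, 3) for p in bonferroni])
print("BH rejections (indices):", sorted(bh_reject))
```

Note how the two alerts differ: Bonferroni leaves only the smallest P-value under .05 after adjustment, while BH still flags the two smallest.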
Mayo ASA -16-
RESPONSE 3: I don’t want to relinquish my strongest criticism
against researchers whose findings are the result of fishing
expeditions.
•Wanting to lambast them, while promoting an account that
downplays error probabilities, critics turn to other means–e.g.,
give the claims low prior degrees of belief.
•Might work in some cases.
•The researcher deserving criticism can deflect this: saying a
Bayesian can always counter an effect this way.
•Post-data subgroups are often believable; that’s what makes
them seductive.
•Need to be able to say “that’s a plausible claim but this is a lousy
test of it.”
Mayo ASA -17-
(4) Why use methods that overstate evidence against a null
hypothesis?
Mayo ASA -18-
(4) Why use methods that overstate evidence against a null
hypothesis?
RESPONSE: Whether P-values exaggerate,
“depends on one’s philosophy of statistics and the precise
meaning given to the terms involved. …
that P values overstate evidence against test hypotheses, based
on directly comparing P values against certain quantities
(likelihood ratios and Bayes factors) that play a central role
as evidence measures in Bayesian analysis … Nonetheless,
many other statisticians do not accept these quantities as
gold standards, … Thus, from this frequentist perspective, P
values do not overstate evidence ... .” (Greenland, Senn,
Rothman, Carlin, Poole, Goodman, Altman 2016. p. 342)
Mayo ASA -19-
RESPONSE (cont.)
•From the error statistical perspective, it’s the Bayesian or Bayes
factor assessment that exaggerates.
•P-values can also agree with the posterior: in short, there is wide
latitude for Bayesian assignments.
•In the cases of disagreement, many charge the problem is not P-
values but the high prior.
“concentrating mass on the point null hypothesis is biasing the
prior in favor of H0 as much as possible… .” (Casella and R.
Berger 1987a, p. 111)
“The testing of a point null hypothesis is one of the most misused
statistical procedures. …[Most of the time] there is a direction of
interest in many experiments, and saddling an experimenter with
a two-sided test would not be appropriate.”(ibid. p. 106)
Mayo ASA -20-
•You need neither rule out an assessment from a different
school nor assume your significance level assessments are
debunked.
•They’re measuring different things.
•The danger is when terms are confused.
Mayo ASA -21-
(5) Why do you use a method that presupposes the underlying
statistical model?
Mayo ASA -22-
(5) Why do you use a method that presupposes the underlying
statistical model?
RESPONSE: Au contraire. I use (simple) significance tests
because I want to test my statistical assumptions and perhaps
falsify them.
Here’s George Box, a Bayesian eclecticist:
“Some check is needed on [the fact that] some pattern or other
can be seen in almost any set of data or facts. This is the
object of diagnostic checks [which] require frequentist
theory [of] significance tests… .” (Box 1983, p. 57)
Mayo ASA -23-
Andrew Gelman develops “‘falsificationist Bayesianism’, …
with an interpretation of probability that can be seen as
frequentist in a wide sense and with an error statistical
approach to testing assumptions… .” (Gelman and Hennig
2017, p. 25)
There’s a Bayesian P-value research program that some turn to in
checking accordance of a model (no explicit alternative).
Mayo ASA -24-
(6) Why use a measure that doesn’t report effect sizes?
Mayo ASA -25-
(6) Why use a measure that doesn’t report effect sizes?
RESPONSE:
•Here, supplements to statistical tests are needed.
•We interpret statistically significant results, or “reject H0”, in
terms of indications of a discrepancy γ from H0.
•Could also use confidence distributions (confidence intervals at
several levels).
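One way to sketch such a supplement is a set of confidence intervals at several levels, read as a confidence distribution (the sample is invented; a normal mean with known sigma is assumed):

```python
# Confidence intervals for mu at several levels from one sample.
from statistics import NormalDist
from math import sqrt

xbar, sigma, n = 0.4, 1.0, 25      # hypothetical data
se = sigma / sqrt(n)

for level in (0.50, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lo, hi = xbar - z * se, xbar + z * se
    print(f"{level:.0%} CI for mu: ({lo:.3f}, {hi:.3f})")
```

Reporting the whole family, rather than a single reject/accept verdict, indicates which discrepancies from H0 are and are not warranted.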
Mayo ASA -26-
(7) Why do you use a method that doesn’t provide posterior
probabilities?
Mayo ASA -27-
(7) Why do you use a method that doesn’t provide posterior
probabilities?
Here you can answer a question with a question:
RESPONSE 1: Which notion of a posterior do you recommend?
Note that posteriors aren’t provided by comparative accounts:
Bayes Factors, likelihood ratios or model selections.
•The most prevalent Bayesian accounts are default/non-subjective
(with data dominant in some sense).
•There is no agreement on suitable priors.
•Even the ordering of parameters will yield different priors.
Mayo ASA -28-
•Default priors are not expressing uncertainty or degree of belief;
“…in most cases, they are not even proper probability
distributions in that they often do not integrate [to] one.”
(Bernardo 1997, pp. 159-160)
If priors are not probabilities, “what interpretation is justified
for the posterior?” (Cox 2006, p. 77)
•Coherent updating goes by the board.
Mayo ASA -29-
RESPONSE 2 (for testers)
•Scientists don’t seek hypotheses that are highly probable (in the
formal sense).
•They want methods that very probably would have unearthed
discrepancies and improvements–the probability is on the
method.
•A Bayesian posterior requires a catchall factor: all hypotheses
that could explain the data, and I just want to split off a piece (or
variant of a theory) to test.
•A silver lining to a difference in aims (well-probed hypotheses vs
highly probable ones): use different tools depending on context.
•Better than trying to reconcile very different goals.
Mayo ASA -30-
Upshot: I’m dealing with a problem where I need to scrutinize
what is warranted to infer – normative.
Methods must be
•able to block inferences that violate minimal severity,
•directly altered by biasing selection effects (e.g., cherry-
picking),
•able to falsify claims statistically,
•able to test statistical model assumptions.
“A world beyond p < .05” suggests a world without error statistical
testing; if you’re using such a test, you’ll need to respond to
expected critical challenges.
Mayo ASA -31-
NOTE, SLIDE 15: Benjamini and Hochberg (1995) develop what’s called the false discovery rate (FDR): the expected
proportion of the N hypotheses tested that are falsely rejected. The same term is sometimes used in a different manner
requiring prior probabilities. Westfall and Young (1993), one of the earliest treatments on P-value adjustments, develop
resampling methods to adjust for both multiple testing and multiple modeling.
NOTE 1, SLIDE 16:
“Whenever the null hypothesis is sharply defined but the prior distribution on the alternative hypothesis is diffused over a
wide range of values, as it is in … it boosts the probability that any observed data will be higher under the null hypothesis than
under the alternative. This is known as the Lindley-Jeffreys paradox: A frequentist analysis that yields strong evidence in
support of the experimental hypothesis can be contradicted by a misguided Bayesian analysis that concludes that the same data
are more likely under the null.” (Bem et al 2011, p. 717)
NOTE 2, SLIDE 16: They can be developed from either Fisherian or N-P perspectives. FEV: Frequentist Principle of Evidence (Mayo and
Cox 2006/2010); SEV: severity (Mayo and Spanos 2006, 2011). For a one-sided test of a mean:
• FEV/SEV: an insignificant result: A moderate (non-small) P‐value indicates the absence of a discrepancy δ from H0, just to
the extent that it’s probable the test would have given a worse fit with H0 (i.e., d(X) > d(x0)) were a discrepancy δ to exist.
• FEV/SEV: a significant result is evidence of discrepancy δ from H0, if and only if, there is a high probability the test
would have yielded a less significant result, were a discrepancy δ absent.
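The SEV computation for a significant result can be sketched numerically (my own invented numbers; a one-sided test of H0: mu ≤ 0 against mu > 0, normal data with known sigma):

```python
# Severity with which "mu > mu1" passes, given a significant result:
# SEV(mu > mu1) = P(d(X) <= d(x0); mu = mu1) = Phi((xbar - mu1) / se).
from statistics import NormalDist
from math import sqrt

xbar, sigma, n = 0.4, 1.0, 25      # hypothetical observed data
se = sigma / sqrt(n)

def severity(mu1: float) -> float:
    # probability the test would have given a worse fit, were mu = mu1
    return NormalDist().cdf((xbar - mu1) / se)

for mu1 in (0.0, 0.2, 0.4, 0.6):
    print(f"SEV(mu > {mu1}) = {severity(mu1):.3f}")
```

The inference "mu > 0" passes with high severity here, while "mu > 0.4" does not: the same significant result warrants small discrepancies but not large ones.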
NOTE, SLIDE 19: Casella and R. Berger (1987b) argue, “We would be surprised if most researchers would place even a 10% prior
probability of H0. We hope that the casual reader of Berger and Delampady realizes that the big discrepancies between P-values and
P(H0|x) … are due to a large extent to the large value of [the prior of .5 to H0] that was used.” The most common uses of a point null,
asserting the difference between means is 0, or that a regression coefficient is 0, merely describe a potentially interesting
feature of the population, with no special prior believability. “J. Berger and Delampady admit…, P-values are reasonable measures of
evidence when there is no a priori concentration of belief about H0” (ibid., p. 345). Thus, “the very argument that Berger and
Delampady use to dismiss P-values can be turned around to argue for P-values” (ibid., p. 346).
NOTE, SLIDE 28: Diagnostic Screening: In a very different paradigm, the prior stems from considering your research
hypothesis as randomly selected from a population of null hypotheses to be tested. Where relevant, the posterior might give
a long-run performance assessment (perhaps over all science, or journals); it is not an assessment of well-testedness.
Mayo ASA -32-
REFERENCES:
Bayarri, M., Benjamin, D., Berger, J., and Sellke, T. (2016). “Rejection Odds and Rejection Ratios:
A Proposal for Statistical Practice in Testing Hypotheses”, Journal of Mathematical
Psychology, 72, 90-103.
Bem, D., Utts, J. and Johnson, W. (2011). “Must Psychologists Change the Way They Analyze Their
Data?” Journal of Personality and Social Psychology, 101(4), 716-719.
Benjamini Y. and Hochberg Y. (1995). “Controlling the False Discovery Rate: A Practical and
Powerful Approach to Multiple Testing”, Journal of the Royal Statistical Society, B, 57, 289-
300.
Bernardo, J. (1997). “Non-informative Priors Do Not Exist: A Discussion”, Journal of Statistical
Planning and Inference 65, 159–89.
Box, G. (1983). “An Apology for Ecumenism in Statistics”, in Box, G., Leonard, T. and Wu, D.
(eds.), Scientific Inference, Data Analysis, and Robustness, New York: Academic Press, 51-84.
Casella, G., and Berger, R. (1987a). “Reconciling Bayesian and Frequentist Evidence in the One-
sided Testing Problem”, Journal of the American Statistical Association 82(397), 106–11.
Casella, G., and Berger, R. (1987b). “Comment on Testing Precise Hypotheses by J. O. Berger and
M. Delampady”, Statistical Science 2(3), 344–347.
Cox, D. R. (2006). Principles of Statistical Inference, Cambridge: Cambridge University Press.
Cox, D. R. and Hinkley, D. (1974). Theoretical Statistics. London: Chapman and Hall.
Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference”, in Mayo,
D. and Spanos. A. (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning,
Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University
Press, 276–304.
Fisher, R. A. (1947). The Design of Experiments (4th ed.), Edinburgh: Oliver and Boyd.
Mayo ASA -33-
Gelman, A. and Hennig, C. (2017). “Beyond Subjective and Objective in Statistics” (with discussion),
Journal of the Royal Statistical Society (in press).
Gelman, A. and Loken, E. (2014). “The Statistical Crisis in Science”, American Scientist 2, 460-65.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., et al. (1989; 1990). Empire of Chance: How
Probability Changed Science and Everyday Life, Cambridge: Cambridge University Press.
Greenland, S., Senn, S., Rothman, K., Carlin, J., Poole, C. Goodman, S., Altman, D. (2016).
“Statistical Tests, P values, Confidence Intervals, and Power: A Guide to Misinterpretations”,
European Journal of Epidemiology, 31(4), 337-350.
Goodman, S. (1999). “Toward Evidence-Based Medical Statistics. 2: The Bayes Factor”, Annals of
Internal Medicine, 130(12), 1005-13.
Lehmann, E. and Romano, J. (2005). Testing Statistical Hypotheses. 3rd ed., New York: Springer.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge, Chicago: University of
Chicago Press.
Mayo, D. (2018, forthcoming). Statistical Inference as Severe Testing: How to Get Beyond the
Statistics Wars, Cambridge: Cambridge University Press.
Mayo, D. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive Inference”, in
Rojo, J. (ed.) Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes-
Monograph series, Institute of Mathematical Statistics (IMS), 49: 77-97. Reprinted in Mayo, D.
and Spanos. A. (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning,
Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University
Press, 247-275.
Mayo, D. and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman–Pearson
Philosophy of Induction”, British Journal for the Philosophy of Science 57(2), 323–57.
Mayo ASA -34-
Mayo, D. and Spanos, A. (2011). “Error Statistics”, in Bandyopadhyay, P. and Forster, M. (eds.),
Philosophy of Statistics, 7, Handbook of the Philosophy of Science, The Netherlands: Elsevier,
152–198.
Senn, S. (2002). “A Comment on Replication, P-values and Evidence S. N. Goodman, Statistics in
Medicine 1992; 11:875-879”, Statistics in Medicine 21(16), 2437–2444.
Simmons, J. Nelson, L. and Simonsohn, U. (2012). “A 21 Word Solution”, Dialogue: The Official
Newsletter of the Society for Personality and Social Psychology, 26(2), 4–7.
Wagenmakers, J. (2007). “A Practical Solution to the Pervasive Problems of P values”, Psychonomic
Bulletin & Review 14(5), 779-804.
Westfall, P. and Young, S. (1993). Resampling-Based Multiple Testing: Examples and Methods for
P-Value Adjustment. New York: Wiley.

Hypothesis TestingThe Right HypothesisIn business, or an.docxHypothesis TestingThe Right HypothesisIn business, or an.docx
Hypothesis TestingThe Right HypothesisIn business, or an.docx
 
Prague 2008
Prague 2008Prague 2008
Prague 2008
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 

More from jemille6

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”jemille6
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfjemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfjemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inferencejemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?jemille6
 
What's the question?
What's the question? What's the question?
What's the question? jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metasciencejemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Twojemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testingjemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredgingjemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probabilityjemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)jemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 

More from jemille6 (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 

Recently uploaded

Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxdhanalakshmis0310
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 

Recently uploaded (20)

Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 

Statistical skepticism: How to use significance tests effectively

probing capacities.
•Confidence intervals, N-P and Fisherian tests, resampling, randomization.
•There's an inference, and it's qualified by how well or poorly tested that inference is.
Incompatibilist positions keep to caricatures:
•Fisherians can't use power (or sensitivity).
•N-P testers can't use P-values; yet all used achieved significance levels.
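The shared error-control idea behind these tools can be illustrated with a small simulation (a hypothetical Python sketch of my own, not from the slides): the sampling distribution under H0 is what licenses the claim that a test rejecting at the 0.05 level does so only about 5% of the time when H0 is true.

```python
import random
import statistics
from statistics import NormalDist

# Illustrative sketch: under H0 (mu = 0, sigma known), a one-sided z-test
# that rejects when P <= 0.05 does so in roughly 5% of repeated samples.
# The sampling distribution is what lets us state and control this rate.
def one_sided_p(sample, mu0=0.0, sigma=1.0):
    n = len(sample)
    z = (statistics.fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)  # P(Z >= z) under H0

rng = random.Random(1)
trials = 2000
rejections = sum(
    one_sided_p([rng.gauss(0.0, 1.0) for _ in range(25)]) <= 0.05
    for _ in range(trials)
)
print(rejections / trials)  # close to the nominal 0.05
```

The empirical rejection rate hovers near the nominal level, which is the "error probing capacity" being assessed and controlled.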
Mayo ASA -8-
“It is good practice to determine … the smallest significance level…at which the hypothesis would be rejected for the given observation. This number, the so-called P-value gives an idea of how strongly the data contradict the hypothesis. It also enables others to reach a verdict based on the significance level of their choice.” (Lehmann and Romano 2005, pp. 63-4)
An attained P-value is also an error probability: the P-value “is the probability that we would mistakenly declare there to be evidence against H0, were we to regard the data under analysis as being just decisive” for issuing such a report. (Cox and Hinkley 1974, p. 66)
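The Lehmann-Romano point can be made concrete with a toy one-sided z-test (numbers are my own, purely for illustration): the attained P-value is the smallest significance level at which this observation would lead to rejection, so readers can apply whatever alpha they choose.

```python
from statistics import NormalDist

# Sketch: the attained P-value as the smallest level at which the observed
# result would be rejected. One-sided z-test, sigma known, made-up numbers.
def attained_p(xbar, mu0, sigma, n):
    z = (xbar - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(z)  # P(Z >= z) under H0

p = attained_p(xbar=10.4, mu0=10.0, sigma=1.0, n=25)  # z = 2.0
for alpha in (0.01, 0.025, 0.05):
    print(f"alpha={alpha}: {'reject' if p <= alpha else 'do not reject'} H0")
```

Here p is about 0.023, so a reader holding alpha = 0.01 does not reject while one holding alpha = 0.05 does, exactly the "verdict based on the significance level of their choice."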
Mayo ASA -9-
•Some say: Fisher was evidential, Neyman cared only for long-run error control.
•Both tribes use error probabilities to measure performance and evidence.
•The two schools mathematically dovetail; it's up to us to interpret them.
•Drop personality labels and the NHST acronym: “statistical tests” or “error statistical tests” will do.
Mayo ASA -10-
(3) Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions” and violates the likelihood principle (LP)? You should condition on the data.
Mayo ASA -11-
(3) Why apply a method that uses error probabilities, the sampling distribution, researcher “intentions”, and violates the likelihood principle (LP)? You should condition on the data.
RESPONSE 1: If I condition on the actual data, this precludes error probabilities.
•I use them to assess how well or poorly tested claims are by the data at hand.
•What bothers you when cherry pickers selectively report?
•Not a problem with long-runs: you can't say about the case at hand that it has done a good job of avoiding the sources of misinterpreting data.
Mayo ASA -12-
You need a principle of reasoning that makes sense of this.
Informally: I haven't been given evidence for a genuine effect if the method makes it very easy to find some impressive-looking effect, even if spurious.
Not enough that data x fit H; the method must have had a reasonable ability to uncover flaws, if present.
Minimal requirement for severe testing.
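The "very easy to find an impressive-looking effect" worry can be simulated directly (a hypothetical sketch of my own): if a method lets you search 20 independent null endpoints and report only the best-looking one, a nominally significant result turns up most of the time even though every null is true.

```python
import random
from statistics import NormalDist, fmean

# Sketch of the minimal severity point: searching 20 null endpoints and
# reporting the smallest P-value makes "significance" easy to find even
# when there is no effect at all -- the method probes nothing severely.
def one_sided_p(sample, sigma=1.0):
    z = fmean(sample) / (sigma / len(sample) ** 0.5)
    return 1 - NormalDist().cdf(z)

rng = random.Random(7)
trials, endpoints = 1000, 20
hits = sum(
    min(one_sided_p([rng.gauss(0, 1) for _ in range(25)])
        for _ in range(endpoints)) <= 0.05
    for _ in range(trials)
)
print(hits / trials)  # roughly 1 - 0.95**20, about 0.64
```

A method that "rejects" about two-thirds of the time under a true global null has essentially no capacity to uncover its own flaws, which is exactly what the severity requirement rules out.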
Mayo ASA -13-
So you can readily agree with those who say:
“P values can only be computed once the sampling plan is fully known and specified in advance.” (Wagenmakers 2007, p. 784)
“In fact, Bayes factors can be used in the complete absence of a sampling plan… .” (Bayarri, Benjamin, Berger, Sellke 2016, p. 100)
RESPONSE 2: We're in very different contexts. I'm in one that led to advising a “21 word solution”:
Mayo ASA -14-
“We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.” (Simmons, Nelson, and Simonsohn 2012, p. 4)
•I'm in the setting of a (skeptical) consumer of statistics.
•Have you given yourself lots of extra chances in the “forking paths” (Gelman and Loken 2014) between data and inference?
Mayo ASA -15-
•The value of preregistered reports–key strategy to promote replication–is that your appraisal of the evidence is altered when you see the history (e.g., they had 6 primary endpoints, but the one reported is not among them).
•There's a tension between the preregistration mindset and the position: it makes no sense that frequentists adjust the P-value in the face of data dredging and other biasing selection effects–these considerations have nothing to do with the evidence.
•Many ways to correct P-values, appropriate for different contexts (Bonferroni, false discovery rates).
•The main thing is to have an alert that the reported P-values are invalid or questionable.
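The two corrections named on this slide can be sketched in a few lines (an illustrative implementation, with made-up P-values): Bonferroni controls the familywise error rate by testing each hypothesis at alpha/m; Benjamini-Hochberg controls the false discovery rate with a stepped-up threshold and is typically less conservative.

```python
# Sketch of two standard P-value corrections (not tied to any one study).
def bonferroni(pvals, alpha=0.05):
    # Reject only P-values below alpha divided by the number of tests.
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg(pvals, alpha=0.05):
    # Reject the k smallest P-values, where k is the largest rank whose
    # sorted P-value clears its step-up threshold rank * alpha / m.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

ps = [0.001, 0.008, 0.015, 0.041, 0.30]  # hypothetical reported P-values
print(bonferroni(ps))          # [True, True, False, False, False]
print(benjamini_hochberg(ps))  # [True, True, True, False, False]
```

Which correction is appropriate depends on context, the slide's point: the alert matters more than the particular adjustment.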
Mayo ASA -16-
RESPONSE 3: I don't want to relinquish my strongest criticism against researchers whose findings are the result of fishing expeditions.
•Wanting to lambast them, while promoting an account that downplays error probabilities, critics turn to other means–e.g., give the claims low prior degrees of belief.
•Might work in some cases.
•The researcher deserving criticism can deflect this, saying a Bayesian can always counter an effect this way.
•Post-data subgroups are often believable; that's what makes them seductive.
•Need to be able to say “that's a plausible claim but this is a lousy test of it.”
Mayo ASA -17-
(4) Why use methods that overstate evidence against a null hypothesis?
Mayo ASA -18-
(4) Why use methods that overstate evidence against a null hypothesis?
RESPONSE: Whether P-values exaggerate “depends on one’s philosophy of statistics and the precise meaning given to the terms involved. … that P values overstate evidence against test hypotheses, based on directly comparing P values against certain quantities (likelihood ratios and Bayes factors) that play a central role as evidence measures in Bayesian analysis … Nonetheless, many other statisticians do not accept these quantities as gold standards, … Thus, from this frequentist perspective, P values do not overstate evidence ... .” (Greenland, Senn, Rothman, Carlin, Poole, Goodman, Altman 2016, p. 342)
Mayo ASA -19-
RESPONSE (cont.)
•From the error statistical perspective, it's the Bayesian or Bayes factor assessment that exaggerates.
•P-values can also agree with the posterior: in short, there is wide latitude for Bayesian assignments.
•In the cases of disagreement, many charge that the problem is not P-values but the high prior.
“concentrating mass on the point null hypothesis is biasing the prior in favor of H0 as much as possible… .” (Casella and Roger Berger 1987a, p. 111)
“The testing of a point null hypothesis is one of the most misused statistical procedures. …[Most of the time] there is a direction of interest in many experiments, and saddling an experimenter with a two-sided test would not be appropriate.” (ibid., p. 106)
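The disagreement being described can be reproduced numerically (an illustrative sketch with my own numbers, using the familiar spike-and-slab setup: prior mass 0.5 on the point null H0: mu = 0, and mu ~ N(0, tau²) under the alternative). A result with z ≈ 1.96 has a two-sided P-value near 0.05, yet the posterior probability of H0 can remain close to 0.5, the tension the Casella-Berger quote traces to the heavy prior on the null.

```python
from math import exp, sqrt, pi
from statistics import NormalDist

# Sketch: posterior probability of a point null H0: mu = 0 when prior mass
# prior_null sits on H0 and mu ~ N(0, tau^2) under H1. Normal data, known
# sigma, so both marginal likelihoods of xbar are normal densities.
def posterior_null(xbar, n, sigma=1.0, tau=1.0, prior_null=0.5):
    se = sigma / sqrt(n)
    def normal_pdf(x, sd):
        return exp(-0.5 * (x / sd) ** 2) / (sd * sqrt(2 * pi))
    m0 = normal_pdf(xbar, se)                    # marginal under H0
    m1 = normal_pdf(xbar, sqrt(se**2 + tau**2))  # marginal under H1
    return prior_null * m0 / (prior_null * m0 + (1 - prior_null) * m1)

n = 100
xbar = 1.96 / sqrt(n)                            # observed z = 1.96
p_two_sided = 2 * (1 - NormalDist().cdf(1.96))
print(round(p_two_sided, 3), round(posterior_null(xbar, n), 3))
```

With these choices the P-value is about 0.05 while P(H0 | x) is about 0.6, the two numbers really are measuring different things, as the next slide stresses.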
Mayo ASA -20-
•You need neither rule out an assessment from a different school nor assume your significance level assessments are debunked.
•They're measuring different things.
•The danger is when terms are confused.
Mayo ASA -21-
(5) Why do you use a method that presupposes the underlying statistical model?
Mayo ASA -22-
(5) Why do you use a method that presupposes the underlying statistical model?
RESPONSE: Au contraire. I use (simple) significance tests because I want to test my statistical assumptions and perhaps falsify them.
Here's George Box, a Bayesian eclecticist:
“Some check is needed on [the fact that] some pattern or other can be seen in almost any set of data or facts. This is the object of diagnostic checks [which] require frequentist theory [of] significance tests… .” (Box 1983, p. 57)
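A minimal example of Box's point, a simple significance test used purely as a diagnostic check of a model assumption, is the runs test for independence (my own illustrative sketch; not a method discussed on the slide itself). Data with a trend produce far fewer runs above/below the median than IID data would, and the test flags it.

```python
import random
from statistics import NormalDist, median

# Sketch: runs test as a diagnostic check of the IID assumption, using the
# normal approximation to the null distribution of the number of runs.
def runs_test_p(xs):
    med = median(xs)
    signs = [x > med for x in xs if x != med]
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n1 = sum(signs)
    n2 = len(signs) - n1
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / var ** 0.5
    return 2 * NormalDist().cdf(-abs(z))  # two-sided

rng = random.Random(3)
iid = [rng.gauss(0, 1) for _ in range(100)]
trend = [0.05 * t + rng.gauss(0, 1) for t in range(100)]
print(runs_test_p(iid), runs_test_p(trend))  # trend: tiny P flags the violation
```

Note there is no explicit alternative model here; a small P-value simply indicates the independence assumption is not a reasonable description of the process, which is all a diagnostic check needs.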
Mayo ASA -23-
Andrew Gelman develops “‘falsificationist Bayesianism’, … with an interpretation of probability that can be seen as frequentist in a wide sense and with an error statistical approach to testing assumptions… .” (Gelman and Hennig 2017, p. 25)
There's a Bayesian P-value research program that some turn to in checking accordance of a model (no explicit alternative).
Mayo ASA -24-
(6) Why use a measure that doesn't report effect sizes?
Mayo ASA -25-
(6) Why use a measure that doesn't report effect sizes?
RESPONSE:
•Here, supplements to statistical tests are needed.
•We interpret statistically significant results, or “reject H0”, in terms of indications of a discrepancy γ from H0.
•Could also use confidence distributions (confidence intervals at several levels).
Mayo ASA -26-
(7) Why do you use a method that doesn't provide posterior probabilities?
Mayo ASA -27-
(7) Why do you use a method that doesn't provide posterior probabilities?
Here you can answer a question with a question:
RESPONSE 1: Which notion of a posterior do you recommend? Note that posteriors aren't provided by comparative accounts: Bayes factors, likelihood ratios, or model selections.
•The most prevalent Bayesian accounts are default/non-subjective (with data dominant in some sense).
•There is no agreement on suitable priors.
•Even the ordering of parameters will yield different priors.
Mayo ASA -28-
•Default priors are not expressing uncertainty or degree of belief; “…in most cases, they are not even proper probability distributions in that they often do not integrate [to] one.” (Bernardo 1997, pp. 159-160)
•If priors are not probabilities, “what interpretation is justified for the posterior?” (Cox 2006, p. 77)
•Coherent updating goes by the board.
Mayo ASA -29-
RESPONSE 2 (for testers)
•Scientists don't seek hypotheses that are highly probable (in the formal sense).
•They want methods that very probably would have unearthed discrepancies and improvements–the probability is on the method.
•A Bayesian posterior requires a catchall factor: all hypotheses that could explain the data, and I just want to split off a piece (or variant of a theory) to test.
•A silver lining to a difference in aims (well-probed hypotheses vs highly probable ones): use different tools depending on context.
•Better than trying to reconcile very different goals.
Mayo ASA -30-
Upshot: I'm dealing with a problem where I need to scrutinize what is warranted to infer – normative. Methods must be
•able to block inferences that violate minimal severity,
•directly altered by biasing selection effects (e.g., cherry-picking),
•able to falsify claims statistically,
•able to test statistical model assumptions.
“A world beyond p < .05” suggests a world without error statistical testing; if you're using such a test, you'll need to respond to expected critical challenges.
Mayo ASA -31-
NOTE, SLIDE 15: Benjamini and Hochberg (1995) develop
what’s called the false discovery rate (FDR): the expected
proportion of the N hypotheses tested that are falsely rejected.
The same term is sometimes used in a different manner requiring
prior probabilities. Westfall and Young (1993), one of the earliest
treatments of P-value adjustments, develop resampling methods
to adjust for both multiple testing and multiple modeling.

NOTE 1, SLIDE 16: “Whenever the null hypothesis is sharply
defined but the prior distribution on the alternative hypothesis is
diffused over a wide range of values, as it is in … it boosts the
probability that any observed data will be higher under the null
hypothesis than under the alternative. This is known as the
Lindley-Jeffreys paradox: A frequentist analysis that yields strong
evidence in support of the experimental hypothesis can be
contradicted by a misguided Bayesian analysis that concludes that
the same data are more likely under the null.” (Bem et al. 2011,
p. 717)

NOTE 2, SLIDE 16: FEV and SEV can be developed from either
Fisherian or N-P perspectives. FEV: Frequentist Principle of
Evidence (Mayo and Cox 2006/2010); SEV: severity (Mayo and
Spanos 2006, 2011). For a one-sided test of a mean:
•FEV/SEV, an insignificant result: A moderate (non-small)
P-value indicates the absence of a discrepancy δ from H0, just to
the extent that it’s probable the test would have given a worse fit
with H0 (i.e., d(X) > d(x0)) were a discrepancy δ to exist.
•FEV/SEV, a significant result: It is evidence of discrepancy δ
from H0, if and only if there is a high probability the test would
have yielded a less significant result, were a discrepancy δ absent.

NOTE, SLIDE 19: Casella and R. Berger (1987b) argue, “We
would be surprised if most researchers would place even a 10%
prior probability of H0.
We hope that the casual reader of Berger and Delampady realizes
that the big discrepancies between P-values and P(H0|x) … are
due to a large extent to the large value of [the prior of .5 to H0]
that was used.” The most common uses of a point null, asserting
that the difference between means is 0, or that a regression
coefficient is 0, merely describe a potentially interesting feature
of the population, with no special prior believability. “J. Berger
and Delampady admit…, P-values are reasonable measures of
evidence when there is no a priori concentration of belief about
H0” (ibid., p. 345). Thus, “the very argument that Berger and
Delampady use to dismiss P-values can be turned around to argue
for P-values” (ibid., p. 346).

NOTE, SLIDE 28: Diagnostic Screening: In a very different
paradigm, the prior stems from considering your research
hypothesis as randomly selected from a population of null
hypotheses to be tested. Where relevant, the posterior might give
a long-run performance assessment (perhaps over all science, or
journals), but it is not an assessment of well-testedness.
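Two of the notes above lend themselves to a short numerical sketch (all numbers hypothetical, not from the slides). For the FEV/SEV assessment of a one-sided Normal test of H0: μ ≤ μ0 with known σ, the severity with which “μ > μ1” passes, given a significant observed mean x̄, is P(X̄ ≤ x̄; μ = μ1) = Φ((x̄ − μ1)/(σ/√n)). For diagnostic screening, the long-run proportion of true nulls among rejections, given prevalence π0, level α, and power 1 − β, is π0α / (π0α + (1 − π0)(1 − β)) — a performance rate over the urn of hypotheses, not a measure of how well-tested this hypothesis is.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard Normal CDF

def severity(xbar, mu1, sigma, n):
    """SEV for the claim mu > mu1 after observing mean xbar in a
    one-sided test of H0: mu <= mu0 with known sigma: the probability
    of a worse fit with the claim, P(Xbar <= xbar; mu = mu1)."""
    return Phi((xbar - mu1) / (sigma / n**0.5))

def prob_null_given_rejection(pi0, alpha, power):
    """Diagnostic-screening rate: long-run proportion of true nulls
    among rejections, when a fraction pi0 of tested nulls are true."""
    return pi0 * alpha / (pi0 * alpha + (1 - pi0) * power)

# Hypothetical test: H0: mu <= 0, sigma = 1, n = 100 (SE = 0.1), xbar = 0.2.
print(severity(0.2, 0.0, 1, 100))  # SEV(mu > 0):   Phi(2)  ~ 0.98, well warranted
print(severity(0.2, 0.1, 1, 100))  # SEV(mu > 0.1): Phi(1)  ~ 0.84
print(severity(0.2, 0.3, 1, 100))  # SEV(mu > 0.3): Phi(-1) ~ 0.16, unwarranted

# Hypothetical screening: 90% of tested nulls true, alpha = .05, power = .8:
print(prob_null_given_rejection(0.9, 0.05, 0.8))  # 0.045 / 0.125 = 0.36
```

The same significant result warrants small discrepancies severely and large ones not at all, while the screening “posterior” of 0.36 says nothing about which discrepancies this particular test probed.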
Mayo ASA -32-
REFERENCES:
Bayarri, M., Benjamin, D., Berger, J. and Sellke, T. (2016). “Rejection Odds and Rejection Ratios: A Proposal for Statistical Practice in Testing Hypotheses”, Journal of Mathematical Psychology 72, 90-103.
Bem, D., Utts, J. and Johnson, W. (2011). “Must Psychologists Change the Way They Analyze Their Data?”, Journal of Personality and Social Psychology 101(4), 716-719.
Benjamini, Y. and Hochberg, Y. (1995). “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing”, Journal of the Royal Statistical Society, Series B 57, 289-300.
Bernardo, J. (1997). “Non-informative Priors Do Not Exist: A Discussion”, Journal of Statistical Planning and Inference 65, 159-189.
Box, G. (1983). “An Apology for Ecumenism in Statistics”, in Box, G., Leonard, T. and Wu, D. (eds.), Scientific Inference, Data Analysis, and Robustness, New York: Academic Press, 51-84.
Casella, G. and Berger, R. (1987a). “Reconciling Bayesian and Frequentist Evidence in the One-sided Testing Problem”, Journal of the American Statistical Association 82(397), 106-111.
Casella, G. and Berger, R. (1987b). “Comment on Testing Precise Hypotheses by J. O. Berger and M. Delampady”, Statistical Science 2(3), 344-347.
Cox, D. R. (2006). Principles of Statistical Inference, Cambridge: Cambridge University Press.
Cox, D. R. and Hinkley, D. (1974). Theoretical Statistics, London: Chapman and Hall.
Cox, D. R. and Mayo, D. (2010). “Objectivity and Conditionality in Frequentist Inference”, in Mayo, D. and Spanos, A. (eds.), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University Press, 276-304.
Fisher, R. A. (1947). The Design of Experiments (4th ed.), Edinburgh: Oliver and Boyd.
Mayo ASA -33-
Gelman, A. and Hennig, C. (2017). “Beyond Subjective and Objective in Statistics” (with discussion), Journal of the Royal Statistical Society (in press).
Gelman, A. and Loken, E. (2014). “The Statistical Crisis in Science”, American Scientist 102, 460-465.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., et al. (1989). The Empire of Chance: How Probability Changed Science and Everyday Life, Cambridge: Cambridge University Press.
Goodman, S. (1999). “Toward Evidence-Based Medical Statistics. 2: The Bayes Factor”, Annals of Internal Medicine 130(12), 1005-1013.
Greenland, S., Senn, S., Rothman, K., Carlin, J., Poole, C., Goodman, S. and Altman, D. (2016). “Statistical Tests, P values, Confidence Intervals, and Power: A Guide to Misinterpretations”, European Journal of Epidemiology 31(4), 337-350.
Lehmann, E. and Romano, J. (2005). Testing Statistical Hypotheses (3rd ed.), New York: Springer.
Mayo, D. (1996). Error and the Growth of Experimental Knowledge, Chicago: University of Chicago Press.
Mayo, D. (2018, forthcoming). Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars, Cambridge: Cambridge University Press.
Mayo, D. and Cox, D. R. (2006). “Frequentist Statistics as a Theory of Inductive Inference”, in Rojo, J. (ed.), Optimality: The Second Erich L. Lehmann Symposium, Lecture Notes-Monograph Series 49, Institute of Mathematical Statistics (IMS), 77-97. Reprinted in Mayo, D. and Spanos, A. (eds.) (2010), Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, and the Objectivity and Rationality of Science, Cambridge: Cambridge University Press, 247-275.
Mayo, D. and Spanos, A. (2006). “Severe Testing as a Basic Concept in a Neyman-Pearson Philosophy of Induction”, British Journal for the Philosophy of Science 57(2), 323-357.
Mayo ASA -34-
Mayo, D. and Spanos, A. (2011). “Error Statistics”, in Bandyopadhyay, P. and Forster, M. (eds.), Philosophy of Statistics (Handbook of the Philosophy of Science 7), The Netherlands: Elsevier, 152-198.
Senn, S. (2002). “A Comment on Replication, P-values and Evidence, S. N. Goodman, Statistics in Medicine 1992; 11:875-879”, Statistics in Medicine 21(16), 2437-2444.
Simmons, J., Nelson, L. and Simonsohn, U. (2012). “A 21 Word Solution”, Dialogue: The Official Newsletter of the Society for Personality and Social Psychology 26(2), 4-7.
Wagenmakers, E.-J. (2007). “A Practical Solution to the Pervasive Problems of P values”, Psychonomic Bulletin & Review 14(5), 779-804.
Westfall, P. and Young, S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment, New York: Wiley.