SlideShare a Scribd company logo
Meeting 2 (May 28) Tour I
Ingenious and Severe Tests
Tour I Ingenious and Severe Tests p. 119
1
But first, a bit on the goals of our
seminar
This research seminar is intended as a lead-up to
the workshop: The Statistics Wars and Their
Casualties:
https://phil-stat-wars.com/2020/02/16/99/.
• to galvanize you to contribute to the ongoing
evidence-policy controversies now taking place.
2
3
Some Questions
• What is the “replication crisis”? Are we in one? (2010-20)
• Should replication be required?
• Should we change “perverse incentives”? (open science,
badges, preregistration)
• Do data dredging and P-hacking alter the import of data?
Always? Sometimes?)
• Should P-values be redefined (lowered)? Banned/
replaced with (confidence intervals, Bayes factors)
• Should we stop saying “significant”?
4
American Statistical Society
(ASA): Statement on P-values
“The statistical community has been deeply
concerned about issues of reproducibility and
replicability of scientific conclusions. …. much
confusion and even doubt about the validity of
science is arising. Such doubt can lead to
radical choices such as…to ban P-
values…(ASA 2016)
Is philosophy of science relevant?
5
I was a philosophical observer at
the ASA P-value “pow wow”
6
“Don’t throw out the error control
baby with the bad statistics
bathwater”
The American Statistician
7
8
The [ASA] is to be credited with opening up a
discussion into p-values; …We should oust recipe-like
uses of P-values that have long been lampooned, but
without understanding their valuable (if limited) roles,
there’s a danger of blithely substituting “alternative
measures of evidence” that throw out the error control
baby with the bad statistics bathwater”.
They only give an informal treatment based on a
popular variant on simple (Fisherian) tests (NHST)
A bit of history: Where are members
of our cast of characters in 1919?
Fisher
In 1919, Fisher accepts a job as a statistician at
Rothamsted Experimental Station.
• A more secure offer by Karl Pearson (KP)
required KP to approve everything Fisher taught
or published
• A subsistence farmer
9
Fisher & Family
10
Neyman
In 1919 Neyman is living a hardscrabble life in
Poland, sent to jail for a short time for selling
matches for food,
• Sent to KP in 1925 to have his work appraised.
11
Pearson
Pearson (Egon) gets his B.A. in 1919.
12
He describes the psychological crisis he’s going
through when Neyman arrives in London:
“I was torn between conflicting emotions: a. finding it
difficult to understand R.A.F., b. hating [Fisher] for his
attacks on my paternal ‘god,’ c. realizing that in some
things at least he was right” (Reid, C. 1997, p. 56).
N-P Tests: Putting Fisherian Tests on
a Logical Footing
After Neyman’s year at University College (1925/6),
Pearson is suddenly “smitten” with doubts due to
Fisher
For the Fisherian simple or “pure” significance test,
alternatives to the null “lurk in the undergrowth but are
not explicitly formulated probabilistically” (Mayo and
Cox 2006, p. 81).
13
So What’s in a Test? (p. 129-130):
We proceed by setting up a specific hypothesis to
test, H0 in Neyman’s and my terminology, the null
hypothesis in R. A. Fishers…in choosing the test, we
take into account alternatives to H0 which we believe
possible or at any rate consider it most important to
be on the look out for….:
Step 1. We must first specify the set of results
Step 2. We then divide this set by a system of
ordered boundaries …such that as we pass across
one boundary and proceed to the next, we come to a
class of results which makes us more and more
inclined on the information available, to reject the
hypothesis tested in favour of alternatives which differ
from it by increasing amounts. 14
Step 3. We then, if possible, associate with each contour
level the chance that, if H0 is true, a result will occur in
random sampling lying beyond that level….
In our first papers [in 1928] we suggested that the likelihood
ratio criterion, λ, was a very useful one… Thus Step 2
proceeded Step 3. In later papers [1933-1938] we started
with a fixed value for the chance, ε, of Step 3… However,
although the mathematical procedure may put Step 3
before 2, we cannot put this into operation before we
have decided, under Step 2, on the guiding principle to
be used in choosing the contour system. That is why I
have numbered the steps in this order. (Egon Pearson
1947, p. 143)
15
16
They were not intended to be used as
Accept-Reject Routines
(behavioristic performance)
‘[U]nlike Fisher, Neyman and Pearson (1933, p. 296) did
not recommend a standard level but suggested that ‘ how
the balance [between the two kinds of error] should be
struck must be left to the investigator’” (Lehmann 1993b,
p. 1244).
Neyman and Pearson stressed that the tests were to be
“used with discretion and understanding” depending on
the context (Neyman and Pearson 1928, p. 58).
17
Water Plant (SIST p. 142)
1-sided normal testing
H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 100)
let significance level α = .025
Reject H0 whenever M ≥ 150 + 2σ/√n: M ≥ 152
M is the sample mean, !𝑋, its value is M0.
1SE = σ/√n = 1
18
Rejection rules:
Reject iff M > 150 + 2SE (N-P)
In terms of the P-value:
Reject iff P-value ≤ .025 (Fisher)
(P-value a distance measure, but inverted)
Let M = 152, so I reject H0.
19
SOME P –VALUES
Let M = 152
Z = (152 – 150)/1 = 2
Z = (mean obs – H0)/1 = 2
The P-value is Pr(Z > 2) = .025
20
SOME P –VALUES
Let M = 151
Z = (151 – 150)/1 = 1
The P-value is Pr(Z > 1) = .16
21
SOME P –VALUES
Let M = 150.5
Z = (150.5 – 150)/1 = .5
The P-value is Pr(Z > .5) = .3
22
SOME P –VALUES
Let M = 150
Z = (150 – 150)/1 = 0
The P-value is Pr(Z > 0) = .5
(important benchmark)
23
Major Criticism: P-values Don’t
Measure an Effect Size!
“A p-value, or statistical significance, does not
measure the size of an effect or the importance of
a result”. (ASA Statement)
True, a P-value is a probability.
24
Reformulation
Severity function: SEV(Test T, data x, claim C)
• Tests are reformulated in terms of a discrepancy γ
from H0
• Instead of a binary cut-off (significant or not) the
particular outcome is used to infer discrepancies
that are or are not warranted
25
H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 100)
The usual test infers there’s an indication of some
positive discrepancy from 150 because
Pr(M < 152: H0) = .97
Not very informative
Are we warranted in inferring μ > 153 say?
26
• Note: Our inferences are not to point values,
but we agree to the need to block inferences to
discrepancies beyond those warranted with
severity.
27
Consider: How severely has μ > 153 passed the test?
SEV(μ > 153 ) (p. 143)
M = 152, as before, claim C: μ > 153
The data “accord with C” but there needs to be a
reasonable probability of a worse fit with C, if C is false
Pr(“a worse fit”; C is false)
Pr(M ≤ 152; μ ≤ 153)
Evaluate at μ = 153, as the prob is greater for μ < 153.
28
Consider: How severely has μ > 153 passed the
test?
To get Pr(M ≤ 152: μ = 153), standardize:
Z = √100 (152- 153)/1 = -1
Pr(Z < -1) = .16 Terrible evidence
29
Consider: How severely has μ > 150 passed the test?
To get Pr(M ≤ 152: μ = 150), standardize:
Z = √100 (152- 150)/1 = 2
Pr(Z < 2) = .97
Notice it’s 1 – P-value
30
Now consider SEV(μ > 150.5) (still with M = 152)
Pr (A worse fit with C; claim is false) = .97
Pr(M < 152; μ = 150.5)
Z = (152 – 150.5) /1 = 1.5
Pr (Z < 1.5)= .93 Fairly good indication μ > 150.5
31
32
O Table 3.1
O Table 3.1
FOR PRACTICE:
Now consider SEV(μ > 151) (still with M = 152)
Pr (A worse fit with C; claim is false) = __
Pr(M < 152; μ = 151)
Z = (152 – 151) /1 = 1
Pr (Z < 1)= .84
33
MORE PRACTICE:
Now consider SEV(μ > 152) (still with M = 152)
Pr (A worse fit with C; claim is false) = __
Pr(M < 152; μ = 152)
Z = 0
Pr (Z < 0)= .5–important benchmark
Terrible evidence that μ > 152
Table 3.2 has exs with M = 153. 34
Example for you (change sample size)
1-sided normal testing
H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 25)
let significance level α = .025
Reject H0 whenever M ≥ 150 + 2σ/√n:
Assess SEV for the claims in Table 3.1 (p. 144)
35
Criticism: a P-value with a large
sample size may indicate a trivial
discrepancy or effect size
• Fixing the P-value, increasing sample size n,
the cut-off gets smaller
• Get to a point where x is closer to the null than
various alternatives
• Many would lower the P-value requirement as n
increases-can always avoid inferring a
discrepancy beyond what’s warranted: 36
Severity tells us:
• an α-significant difference indicates less of a
discrepancy from the null if it results from larger (n1)
rather than a smaller (n2) sample size (n1 > n2 )
• What’s more indicative of a large effect (fire), a fire
alarm that goes off with burnt toast or one that
doesn’t go off unless the house is fully ablaze?
• [The larger sample size is like the one that goes off
with burnt toast] 37
Compare n = 100 with n = 10,000
H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 10,000)
Reject H0 whenever M ≥ 2SE: M ≥ 150.2
M is the sample mean (significance level = .025)
1SE = σ/√n = 10/√10,000 = .1
Let M = 150.2, so I reject H0.
(return to in Meeting 4)
38
Comparing n = 100 with n = 10,000
Reject H0 whenever M ≥ 2SE: M ≥ 150.2
SEV10,000(μ > 150.5) = 0.001
Z = (150.2 – 150.5) /.1 = -.3/.1 = -3
P(Z < -3) = .001
Corresponding 95% CI: [0, 150.4]
A .025 result is terrible indication μ > 150.5
When reached with n = 10,000
While SEV100(μ > 150.5) = 0.93 39
Fallacies of Non-rejection.
Let M = 151, the test does not reject H0.
We want to be alert to a fallacious interpretation of a
“negative” result: inferring there’s no positive
discrepancy from μ = 150.
Is there evidence of compliance? μ ≤ 150?
The data “accord with” H0, but what if the test had little
capacity to have alerted us to discrepancies from 150?
40
No evidence against H0 is not evidence for it
(Postcard).
We need to consider Pr(X > 151; 150), which is only
.16.
41
Computation for SEV(T, M = 151, C: μ ≤ 150)
Z = (151 – 150)/1 = 1
Pr(Z > 1) = .16
SEV(C: μ ≤ 150) = low (.16).
• So there’s poor indication of H0
42
Can they say M = 151 is a good indication that μ ≤
150.5?
No, SEV(T, M = 151, C: μ ≤ 150.5) = ~.3.
[Z = 151 – 150.5 = .5]
But M = 151 is a good indication that μ ≤ 152
[Z = 151 – 152 = -1; Pr (Z > -1) = .84 ]
SEV(μ ≤ 152) = .84
It’s an even better indication μ ≤ 153 (Table 3.3, p. 145)
[Z = 151 – 153 = -2; Pr (Z > -2) = .97 ] 43
Retire Statistical Significance?
“researchers have been warned that a statistically
non-significant result does not ‘prove’ the null
hypothesis”
“…. we should never conclude there is ‘no
difference’ or ‘no association’ just because a P
value is larger than a threshold such as 0.05.
(Amrhein et al., 2019)
Agreed
44
Neyman’s fault?
The term “ acceptance,” Neyman tells us, was
merely shorthand: “ The phrase ‘ do not reject H’
is longish and cumbersome . . . My own
preferred substitute for ‘ do not reject H’ is ‘ no
evidence against H is found’” (Neyman 1976, p.
749).
But he also set upper bounds (confidence
intervals, power analysis)
45
FEV: Frequentist Principle of Evidence; Mayo and
Cox (2006); SEV: Mayo 1991, Mayo and Spanos
(2006)
FEV/SEV A small P-value indicates discrepancy γ from H0, if
and only if, there is a high probability the test would have
resulted in a larger P-value were a discrepancy as large as γ
absent.
FEV/SEV A moderate P-value indicates the absence of a
discrepancy γ from H0, only if there is a high probability
the test would have given a worse fit with H0 (i.e., a
smaller P-value) were a discrepancy γ present.
46
Frequentist Evidential Principle:
FEV
FEV (i). x is evidence against H0 (i.e., evidence of
discrepancy from H0), if and only if the P-value Pr(d >
d0; H0) is very low (equivalently, Pr(d < d0; H0)= 1 - P
is very high).
47
Contraposing FEV(i) we get our minimal principle
FEV (ia) x are poor evidence against H0 (poor
evidence of discrepancy from H0), if there’s a high
probability the test would yield a more discordant
result, if H0 is correct.
Note the one-directional ‘if’ claim in FEV (1a)
(i) is not the only way x can be BENT.
48
P-value “moderate” (non-significance)
FEV(ii): A moderate p value is evidence of the absence
of a discrepancy γ from H0, only if there is a high
probability the test would have given a worse fit with H0
(i.e., smaller P- value) were a discrepancy γ to exist.
In the Neyman-Pearson theory of tests, the
sensitivity of a test is assessed by the notion of
power, defined as the probability of reaching a preset
level of significance …for various alternative
hypotheses. In the approach adopted here the
assessment is via the distribution of the random
variable P, again considered for various alternatives
(Cox 2006, p. 25) 49
Π(γ): “sensitivity function”
Computing Π(γ) views the P-value as a statistic.
Π(γ) = Pr(P < pobs; µ0 + γ).
The alternative µ1 = µ0 + γ.
Given that P-value inverts the distance, it is less
confusing to write Π(γ)
Π(γ) = Pr(d > d0; µ0 + γ).
Compare to the power of a test:
POW(γ) = Pr(d > cα; µ0 + γ) the N-P cut-off cα.
50
The following slides are extra
FEV(ii) in terms of Π(γ)
P-value is modest (not small): Since the data
accord with the null hypothesis, FEV directs us to
examine the probability of observing a result more
discordant from H0 if µ = µ0 +γ:
If Π(γ) = Pr(d > d0; µ0 + γ) is very high, the data
indicate that µ< µ0 +γ.
Here Π(γ) gives the severity with which the test has
probed the discrepancy γ.
51
FEV (ia) in terms of Π(γ)
If Π(γ) = Pr(d > do; µ0 +γ) = moderately high (greater
than .3, .4, .5), then there’s poor grounds for
inferring µ > µ0 + γ.
This is equivalent to saying the SEV(µ > µ0 +γ) is
poor.
52

More Related Content

What's hot

Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
Babasab Patil
 
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
jemille6
 
Chapter 6 Probability
Chapter 6  ProbabilityChapter 6  Probability
Bba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionBba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distribution
Stephen Ong
 
Testing a Claim About a Mean
Testing a Claim About a MeanTesting a Claim About a Mean
Testing a Claim About a Mean
Long Beach City College
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
Sanjay Basukala
 
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
Christian Robert
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributions
RajaKrishnan M
 
Chapter11
Chapter11Chapter11
Chapter11
Richard Ferreria
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
Eugene Yan Ziyou
 
4 2 continuous probability distributionn
4 2 continuous probability    distributionn4 2 continuous probability    distributionn
4 2 continuous probability distributionn
Lama K Banna
 
Chapter8
Chapter8Chapter8
Probability distribution
Probability distributionProbability distribution
Probability distribution
Rohit kumar
 
Continuous probability Business Statistics, Management
Continuous probability Business Statistics, ManagementContinuous probability Business Statistics, Management
Continuous probability Business Statistics, Management
Debjit Das
 
Binomial Probability Distributions
Binomial Probability DistributionsBinomial Probability Distributions
Binomial Probability Distributions
Long Beach City College
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
Sumit Sharma
 
Probability Distributions
Probability DistributionsProbability Distributions
Probability Distributions
Birinder Singh Gulati
 
Test hypothesis
Test hypothesisTest hypothesis
Test hypothesis
Homework Guru
 

What's hot (20)

Basic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QMBasic of Hypothesis Testing TEKU QM
Basic of Hypothesis Testing TEKU QM
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614)
 
Chapter 6 Probability
Chapter 6  ProbabilityChapter 6  Probability
Chapter 6 Probability
 
Bba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionBba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distribution
 
Testing a Claim About a Mean
Testing a Claim About a MeanTesting a Claim About a Mean
Testing a Claim About a Mean
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
 
Different types of distributions
Different types of distributionsDifferent types of distributions
Different types of distributions
 
Chapter11
Chapter11Chapter11
Chapter11
 
02a one sample_t-test
02a one sample_t-test02a one sample_t-test
02a one sample_t-test
 
Statistical inference: Probability and Distribution
Statistical inference: Probability and DistributionStatistical inference: Probability and Distribution
Statistical inference: Probability and Distribution
 
4 2 continuous probability distributionn
4 2 continuous probability    distributionn4 2 continuous probability    distributionn
4 2 continuous probability distributionn
 
Chapter8
Chapter8Chapter8
Chapter8
 
Probability distribution
Probability distributionProbability distribution
Probability distribution
 
Continuous probability Business Statistics, Management
Continuous probability Business Statistics, ManagementContinuous probability Business Statistics, Management
Continuous probability Business Statistics, Management
 
Binomial Probability Distributions
Binomial Probability DistributionsBinomial Probability Distributions
Binomial Probability Distributions
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Probability Distributions
Probability DistributionsProbability Distributions
Probability Distributions
 
Test hypothesis
Test hypothesisTest hypothesis
Test hypothesis
 

Similar to Ingenious and Severe Tests

Solution to the practice test ch 8 hypothesis testing ch 9 two populations
Solution to the practice test ch 8 hypothesis testing ch 9 two populationsSolution to the practice test ch 8 hypothesis testing ch 9 two populations
Solution to the practice test ch 8 hypothesis testing ch 9 two populations
Long Beach City College
 
CrashCourse_0622
CrashCourse_0622CrashCourse_0622
CrashCourse_0622Dexen Xi
 
Normal Distribution, Binomial Distribution, Poisson Distribution
Normal Distribution, Binomial Distribution, Poisson DistributionNormal Distribution, Binomial Distribution, Poisson Distribution
Normal Distribution, Binomial Distribution, Poisson Distribution
Q Dauh Q Alam
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
Alberto Labarga
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
MinilikDerseh1
 
Anova by Hazilah Mohd Amin
Anova by Hazilah Mohd AminAnova by Hazilah Mohd Amin
Anova by Hazilah Mohd Amin
HazilahMohd
 
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptxBIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
Payaamvohra1
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
simonithomas47935
 
Hmisiri nonparametrics book
Hmisiri nonparametrics bookHmisiri nonparametrics book
Hmisiri nonparametrics book
College of Medicine(University of Malawi)
 
The Signed Rank Test (1).pptx
The Signed Rank Test (1).pptxThe Signed Rank Test (1).pptx
The Signed Rank Test (1).pptx
SACHINS902817
 
hypothesisTestPPT.pptx
hypothesisTestPPT.pptxhypothesisTestPPT.pptx
hypothesisTestPPT.pptx
dangwalakash07
 
probability assignment help
probability assignment helpprobability assignment help
probability assignment help
Statistics Homework Helper
 
1615 probability-notation for joint probabilities
1615 probability-notation for joint probabilities1615 probability-notation for joint probabilities
1615 probability-notation for joint probabilities
Dr Fereidoun Dejahang
 
Statistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-testsStatistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-tests
Eugene Yan Ziyou
 
Probability and Statistics Assignment Help
Probability and Statistics Assignment HelpProbability and Statistics Assignment Help
Probability and Statistics Assignment Help
Statistics Assignment Help
 
Bayes Classification
Bayes ClassificationBayes Classification
Bayes Classification
sathish sak
 
Ppt1
Ppt1Ppt1
Ppt1
kalai75
 
Test of significance (t-test, proportion test, chi-square test)
Test of significance (t-test, proportion test, chi-square test)Test of significance (t-test, proportion test, chi-square test)
Test of significance (t-test, proportion test, chi-square test)
Ramnath Takiar
 
Practice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing SolutionPractice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing Solution
Long Beach City College
 

Similar to Ingenious and Severe Tests (20)

Solution to the practice test ch 8 hypothesis testing ch 9 two populations
Solution to the practice test ch 8 hypothesis testing ch 9 two populationsSolution to the practice test ch 8 hypothesis testing ch 9 two populations
Solution to the practice test ch 8 hypothesis testing ch 9 two populations
 
CrashCourse_0622
CrashCourse_0622CrashCourse_0622
CrashCourse_0622
 
Normal Distribution, Binomial Distribution, Poisson Distribution
Normal Distribution, Binomial Distribution, Poisson DistributionNormal Distribution, Binomial Distribution, Poisson Distribution
Normal Distribution, Binomial Distribution, Poisson Distribution
 
Bayesian statistics
Bayesian statisticsBayesian statistics
Bayesian statistics
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
 
Anova by Hazilah Mohd Amin
Anova by Hazilah Mohd AminAnova by Hazilah Mohd Amin
Anova by Hazilah Mohd Amin
 
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptxBIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
BIOSTATISTICS T TEST Z TEST F TEST HYPOTHESIS TYPES OF ERROR.pptx
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
 
Hmisiri nonparametrics book
Hmisiri nonparametrics bookHmisiri nonparametrics book
Hmisiri nonparametrics book
 
The Signed Rank Test (1).pptx
The Signed Rank Test (1).pptxThe Signed Rank Test (1).pptx
The Signed Rank Test (1).pptx
 
hypothesisTestPPT.pptx
hypothesisTestPPT.pptxhypothesisTestPPT.pptx
hypothesisTestPPT.pptx
 
probability assignment help
probability assignment helpprobability assignment help
probability assignment help
 
1615 probability-notation for joint probabilities
1615 probability-notation for joint probabilities1615 probability-notation for joint probabilities
1615 probability-notation for joint probabilities
 
Statistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-testsStatistical inference: Hypothesis Testing and t-tests
Statistical inference: Hypothesis Testing and t-tests
 
Probability and Statistics Assignment Help
Probability and Statistics Assignment HelpProbability and Statistics Assignment Help
Probability and Statistics Assignment Help
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Bayes Classification
Bayes ClassificationBayes Classification
Bayes Classification
 
Ppt1
Ppt1Ppt1
Ppt1
 
Test of significance (t-test, proportion test, chi-square test)
Test of significance (t-test, proportion test, chi-square test)Test of significance (t-test, proportion test, chi-square test)
Test of significance (t-test, proportion test, chi-square test)
 
Practice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing SolutionPractice Test 4A Hypothesis Testing Solution
Practice Test 4A Hypothesis Testing Solution
 

More from jemille6

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
jemille6
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
jemille6
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
jemille6
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
jemille6
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
jemille6
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
jemille6
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
jemille6
 
What's the question?
What's the question? What's the question?
What's the question?
jemille6
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
jemille6
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
jemille6
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
jemille6
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
jemille6
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
jemille6
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
jemille6
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
jemille6
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
jemille6
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
jemille6
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
jemille6
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
jemille6
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
jemille6
 

More from jemille6 (20)

“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”“The importance of philosophy of science for statistical science and vice versa”
“The importance of philosophy of science for statistical science and vice versa”
 
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and ProbabilismStatistical Inference as Severe Testing: Beyond Performance and Probabilism
Statistical Inference as Severe Testing: Beyond Performance and Probabilism
 
D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
Error Control and Severity
Error Control and SeverityError Control and Severity
Error Control and Severity
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 

Recently uploaded

Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
heathfieldcps1
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
EduSkills OECD
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 

Recently uploaded (20)

Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
 
Francesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptxFrancesca Gottschalk - How can education support child empowerment.pptx
Francesca Gottschalk - How can education support child empowerment.pptx
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 

Ingenious and Severe Tests

  • 1. Meeting 2 (May 28) Tour I Ingenious and Severe Tests Tour I Ingenious and Severe Tests p. 119 1
  • 2. But first, a bit on the goals of our seminar This research seminar is intended as a lead-up to the workshop: The Statistics Wars and Their Casualties: https://phil-stat-wars.com/2020/02/16/99/. • to galvanize you to contribute to the ongoing evidence-policy controversies now taking place. 2
  • 3. 3
  • 4. Some Questions • What is the “replication crisis”? Are we in one? (2010-20) • Should replication be required? • Should we change “perverse incentives”? (open science, badges, preregistration) • Do data dredging and P-hacking alter the import of data? Always? Sometimes?) • Should P-values be redefined (lowered)? Banned/ replaced with (confidence intervals, Bayes factors) • Should we stop saying “significant”? 4
  • 5. American Statistical Society (ASA): Statement on P-values “The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions. …. much confusion and even doubt about the validity of science is arising. Such doubt can lead to radical choices such as…to ban P- values…(ASA 2016) Is philosophy of science relevant? 5
  • 6. I was a philosophical observer at the ASA P-value “pow wow” 6
  • 7. “Don’t throw out the error control baby with the bad statistics bathwater” The American Statistician 7
  • 8. 8 The [ASA] is to be credited with opening up a discussion into p-values; …We should oust recipe-like uses of P-values that have long been lampooned, but without understanding their valuable (if limited) roles, there’s a danger of blithely substituting “alternative measures of evidence” that throw out the error control baby with the bad statistics bathwater”. They only give an informal treatment based on a popular variant on simple (Fisherian) tests (NHST)
  • 9. A bit of history: Where are members of our cast of characters in 1919? Fisher In 1919, Fisher accepts a job as a statistician at Rothamsted Experimental Station. • A more secure offer by Karl Pearson (KP) required KP to approve everything Fisher taught or published • A subsistence farmer 9
  • 11. Neyman In 1919 Neyman is living a hardscrabble life in Poland, sent to jail for a short time for selling matches for food, • Sent to KP in 1925 to have his work appraised. 11
  • 12. Pearson Pearson (Egon) gets his B.A. in 1919. 12 He describes the psychological crisis he’s going through when Neyman arrives in London: “I was torn between conflicting emotions: a. finding it difficult to understand R.A.F., b. hating [Fisher] for his attacks on my paternal ‘god,’ c. realizing that in some things at least he was right” (Reid, C. 1997, p. 56).
  • 13. N-P Tests: Putting Fisherian Tests on a Logical Footing After Neyman’s year at University College (1925/6), Pearson is suddenly “smitten” with doubts due to Fisher For the Fisherian simple or “pure” significance test, alternatives to the null “lurk in the undergrowth but are not explicitly formulated probabilistically” (Mayo and Cox 2006, p. 81). 13
  • 14. So What’s in a Test? (p. 129-130): We proceed by setting up a specific hypothesis to test, H0 in Neyman’s and my terminology, the null hypothesis in R. A. Fishers…in choosing the test, we take into account alternatives to H0 which we believe possible or at any rate consider it most important to be on the look out for….: Step 1. We must first specify the set of results Step 2. We then divide this set by a system of ordered boundaries …such that as we pass across one boundary and proceed to the next, we come to a class of results which makes us more and more inclined on the information available, to reject the hypothesis tested in favour of alternatives which differ from it by increasing amounts. 14
  • 15. Step 3. We then, if possible, associate with each contour level the chance that, if H0 is true, a result will occur in random sampling lying beyond that level…. In our first papers [in 1928] we suggested that the likelihood ratio criterion, λ, was a very useful one… Thus Step 2 proceeded Step 3. In later papers [1933-1938] we started with a fixed value for the chance, ε, of Step 3… However, although the mathematical procedure may put Step 3 before 2, we cannot put this into operation before we have decided, under Step 2, on the guiding principle to be used in choosing the contour system. That is why I have numbered the steps in this order. (Egon Pearson 1947, p. 143) 15
  • 16. 16
  • 17. They were not intended to be used as Accept-Reject Routines (behavioristic performance) ‘[U]nlike Fisher, Neyman and Pearson (1933, p. 296) did not recommend a standard level but suggested that ‘ how the balance [between the two kinds of error] should be struck must be left to the investigator’” (Lehmann 1993b, p. 1244). Neyman and Pearson stressed that the tests were to be “used with discretion and understanding” depending on the context (Neyman and Pearson 1928, p. 58). 17
  • 18. Water Plant (SIST p. 142) 1-sided normal testing H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 100) let significance level α = .025 Reject H0 whenever M ≥ 150 + 2σ/√n: M ≥ 152 M is the sample mean, !𝑋, its value is M0. 1SE = σ/√n = 1 18
  • 19. Rejection rules: Reject iff M > 150 + 2SE (N-P) In terms of the P-value: Reject iff P-value ≤ .025 (Fisher) (P-value a distance measure, but inverted) Let M = 152, so I reject H0. 19
  • 20. SOME P –VALUES Let M = 152 Z = (152 – 150)/1 = 2 Z = (mean obs – H0)/1 = 2 The P-value is Pr(Z > 2) = .025 20
  • 21. SOME P –VALUES Let M = 151 Z = (151 – 150)/1 = 1 The P-value is Pr(Z > 1) = .16 21
  • 22. SOME P –VALUES Let M = 150.5 Z = (150.5 – 150)/1 = .5 The P-value is Pr(Z > .5) = .3 22
  • 23. SOME P –VALUES Let M = 150 Z = (150 – 150)/1 = 0 The P-value is Pr(Z > 0) = .5 (important benchmark) 23
  • 24. Major Criticism: P-values Don’t Measure an Effect Size! “A p-value, or statistical significance, does not measure the size of an effect or the importance of a result”. (ASA Statement) True, a P-value is a probability. 24
  • 25. Reformulation Severity function: SEV(Test T, data x, claim C) • Tests are reformulated in terms of a discrepancy γ from H0 • Instead of a binary cut-off (significant or not) the particular outcome is used to infer discrepancies that are or are not warranted 25
  • 26. H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 100) The usual test infers there’s an indication of some positive discrepancy from 150 because Pr(M < 152: H0) = .97 Not very informative Are we warranted in inferring μ > 153 say? 26
  • 27. • Note: Our inferences are not to point values, but we agree to the need to block inferences to discrepancies beyond those warranted with severity. 27
  • 28. Consider: How severely has μ > 153 passed the test? SEV(μ > 153 ) (p. 143) M = 152, as before, claim C: μ > 153 The data “accord with C” but there needs to be a reasonable probability of a worse fit with C, if C is false Pr(“a worse fit”; C is false) Pr(M ≤ 152; μ ≤ 153) Evaluate at μ = 153, as the prob is greater for μ < 153. 28
  • 29. Consider: How severely has μ > 153 passed the test? To get Pr(M ≤ 152: μ = 153), standardize: Z = √100 (152- 153)/1 = -1 Pr(Z < -1) = .16 Terrible evidence 29
  • 30. Consider: How severely has μ > 150 passed the test? To get Pr(M ≤ 152: μ = 150), standardize: Z = √100 (152- 150)/1 = 2 Pr(Z < 2) = .97 Notice it’s 1 – P-value 30
  • 31. Now consider SEV(μ > 150.5) (still with M = 152) Pr (A worse fit with C; claim is false) = .97 Pr(M < 152; μ = 150.5) Z = (152 – 150.5) /1 = 1.5 Pr (Z < 1.5)= .93 Fairly good indication μ > 150.5 31
  • 32. 32 O Table 3.1 O Table 3.1
  • 33. FOR PRACTICE: Now consider SEV(μ > 151) (still with M = 152) Pr (A worse fit with C; claim is false) = __ Pr(M < 152; μ = 151) Z = (152 – 151) /1 = 1 Pr (Z < 1)= .84 33
  • 34. MORE PRACTICE: Now consider SEV(μ > 152) (still with M = 152) Pr (A worse fit with C; claim is false) = __ Pr(M < 152; μ = 152) Z = 0 Pr (Z < 0)= .5–important benchmark Terrible evidence that μ > 152 Table 3.2 has exs with M = 153. 34
  • 35. Example for you (change sample size) 1-sided normal testing H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 25) let significance level α = .025 Reject H0 whenever M ≥ 150 + 2σ/√n: Assess SEV for the claims in Table 3.1 (p. 144) 35
  • 36. Criticism: a P-value with a large sample size may indicate a trivial discrepancy or effect size • Fixing the P-value, increasing sample size n, the cut-off gets smaller • Get to a point where x is closer to the null than various alternatives • Many would lower the P-value requirement as n increases-can always avoid inferring a discrepancy beyond what’s warranted: 36
  • 37. Severity tells us: • an α-significant difference indicates less of a discrepancy from the null if it results from larger (n1) rather than a smaller (n2) sample size (n1 > n2 ) • What’s more indicative of a large effect (fire), a fire alarm that goes off with burnt toast or one that doesn’t go off unless the house is fully ablaze? • [The larger sample size is like the one that goes off with burnt toast] 37
  • 38. Compare n = 100 with n = 10,000 H0: μ ≤ 150 vs. H1: μ > 150 (Let σ = 10, n = 10,000) Reject H0 whenever M ≥ 2SE: M ≥ 150.2 M is the sample mean (significance level = .025) 1SE = σ/√n = 10/√10,000 = .1 Let M = 150.2, so I reject H0. (return to in Meeting 4) 38
  • 39. Comparing n = 100 with n = 10,000 Reject H0 whenever M ≥ 2SE: M ≥ 150.2 SEV10,000(μ > 150.5) = 0.001 Z = (150.2 – 150.5) /.1 = -.3/.1 = -3 P(Z < -3) = .001 Corresponding 95% CI: [0, 150.4] A .025 result is terrible indication μ > 150.5 When reached with n = 10,000 While SEV100(μ > 150.5) = 0.93 39
  • 40. Fallacies of Non-rejection. Let M = 151, the test does not reject H0. We want to be alert to a fallacious interpretation of a “negative” result: inferring there’s no positive discrepancy from μ = 150. Is there evidence of compliance? μ ≤ 150? The data “accord with” H0, but what if the test had little capacity to have alerted us to discrepancies from 150? 40
  • 41. No evidence against H0 is not evidence for it (Postcard). We need to consider Pr(X > 151; 150), which is only .16. 41
  • 42. Computation for SEV(T, M = 151, C: μ ≤ 150) Z = (151 – 150)/1 = 1 Pr(Z > 1) = .16 SEV(C: μ ≤ 150) = low (.16). • So there’s poor indication of H0 42
  • 43. Can they say M = 151 is a good indication that μ ≤ 150.5? No, SEV(T, M = 151, C: μ ≤ 150.5) = ~.3. [Z = 151 – 150.5 = .5] But M = 151 is a good indication that μ ≤ 152 [Z = 151 – 152 = -1; Pr (Z > -1) = .84 ] SEV(μ ≤ 152) = .84 It’s an even better indication μ ≤ 153 (Table 3.3, p. 145) [Z = 151 – 153 = -2; Pr (Z > -2) = .97 ] 43
  • 44. Retire Statistical Significance? “researchers have been warned that a statistically non-significant result does not ‘prove’ the null hypothesis” “…. we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05. (Amrhein et al., 2019) Agreed 44
  • 45. Neyman’s fault? The term “ acceptance,” Neyman tells us, was merely shorthand: “ The phrase ‘ do not reject H’ is longish and cumbersome . . . My own preferred substitute for ‘ do not reject H’ is ‘ no evidence against H is found’” (Neyman 1976, p. 749). But he also set upper bounds (confidence intervals, power analysis) 45
  • 46. FEV: Frequentist Principle of Evidence; Mayo and Cox (2006); SEV: Mayo 1991, Mayo and Spanos (2006) FEV/SEV A small P-value indicates discrepancy γ from H0, if and only if, there is a high probability the test would have resulted in a larger P-value were a discrepancy as large as γ absent. FEV/SEV A moderate P-value indicates the absence of a discrepancy γ from H0, only if there is a high probability the test would have given a worse fit with H0 (i.e., a smaller P-value) were a discrepancy γ present. 46
  • 47. Frequentist Evidential Principle: FEV FEV (i). x is evidence against H0 (i.e., evidence of discrepancy from H0), if and only if the P-value Pr(d > d0; H0) is very low (equivalently, Pr(d < d0; H0)= 1 - P is very high). 47
  • 48. Contraposing FEV(i) we get our minimal principle FEV (ia) x are poor evidence against H0 (poor evidence of discrepancy from H0), if there’s a high probability the test would yield a more discordant result, if H0 is correct. Note the one-directional ‘if’ claim in FEV (1a) (i) is not the only way x can be BENT. 48
  • 49. P-value “moderate” (non-significance) FEV(ii): A moderate p value is evidence of the absence of a discrepancy γ from H0, only if there is a high probability the test would have given a worse fit with H0 (i.e., smaller P- value) were a discrepancy γ to exist. In the Neyman-Pearson theory of tests, the sensitivity of a test is assessed by the notion of power, defined as the probability of reaching a preset level of significance …for various alternative hypotheses. In the approach adopted here the assessment is via the distribution of the random variable P, again considered for various alternatives (Cox 2006, p. 25) 49
  • 50. Π(γ): “sensitivity function” Computing Π(γ) views the P-value as a statistic. Π(γ) = Pr(P < pobs; µ0 + γ). The alternative µ1 = µ0 + γ. Given that P-value inverts the distance, it is less confusing to write Π(γ) Π(γ) = Pr(d > d0; µ0 + γ). Compare to the power of a test: POW(γ) = Pr(d > cα; µ0 + γ) the N-P cut-off cα. 50
  • 51. The following slides are extra FEV(ii) in terms of Π(γ) P-value is modest (not small): Since the data accord with the null hypothesis, FEV directs us to examine the probability of observing a result more discordant from H0 if µ = µ0 +γ: If Π(γ) = Pr(d > d0; µ0 + γ) is very high, the data indicate that µ< µ0 +γ. Here Π(γ) gives the severity with which the test has probed the discrepancy γ. 51
  • 52. FEV (ia) in terms of Π(γ) If Π(γ) = Pr(d > do; µ0 +γ) = moderately high (greater than .3, .4, .5), then there’s poor grounds for inferring µ > µ0 + γ. This is equivalent to saying the SEV(µ > µ0 +γ) is poor. 52