PHIL 6334 - Probability/Statistics Lecture Notes 5:
Post-data severity evaluation
Aris Spanos [Spring 2014]
1 Introduction
Fallacies of Acceptance and Rejection
How is one supposed to interpret 'accept' or 'reject' the null?
▶ Unfortunately, in fields like econometrics 'accept H₀' is routinely, but erroneously, interpreted as 'data x₀ provide evidence for H₀', and 'reject H₀' is routinely, but erroneously, interpreted as 'data x₀ provide evidence for some alternative H₁'.
The problem is that neither of these evidential claims can be
justified, since they are both vulnerable to two classic fallacies.
(a) The fallacy of acceptance: no evidence against H₀ is misinterpreted as evidence for H₀.
This fallacy can easily arise in cases where the test in question has low power to detect discrepancies of interest, e.g. a small sample size n.
(b) The fallacy of rejection: evidence against H₀ is misinterpreted as evidence for a particular H₁.
This fallacy can easily arise in cases where the power of a test is very high, e.g. the case of a very large sample size n. This renders N-P rejections, as well as tiny p-values, with large n highly susceptible to this fallacy.
In the statistics literature, as well as in the secondary literatures in several applied fields, there have been numerous attempts to circumvent these two fallacies, but none succeeded.
The first successful attempt was made by Mayo (1996) by introducing the notion of a post-data severity evaluation.
2 The post-data severity evaluation
2.1 The notion of post-data severity
The post-data severity assessment aims to supplement frequentist testing with a view to bridging the gap between the p-value and the accept/reject rules, on the one hand, and providing evidence for or against a hypothesis in the form of the discrepancy γ from the null warranted by data x₀, on the other.
▶ Its key difference from the Bayesian and likelihoodist approaches to testing is that it takes into account the generic capacity of the test in establishing γ.
▶ The intuition behind this notion is that a rejection of H₀ using a less (more) powerful test provides better (worse) evidence for a departure from H₀. Similarly, an acceptance of H₀ using a less (more) powerful test provides worse (better) evidence for no departure from H₀.
The severity evaluation is a post-data appraisal of the accept/reject and p-value results with a view to providing an evidential interpretation. It can be used to address not only the fallacies of acceptance and rejection but also several additional criticisms of N-P testing. The discussion that follows relies heavily on Mayo and Spanos (2006).
▶ A hypothesis H passes a severe test Tα with data x₀ if:
(S-1) x₀ accords with H, and
(S-2) with very high probability, test Tα would have produced a result that accords less well with H than x₀ does, if H were false.
Severity can be viewed as a feature of a test Tα as it relates to particular data x₀ and a specific claim H being considered. Hence, the severity function has three arguments, SEV(Tα, x₀, H), denoting the severity with which H passes Tα with x₀.
Example 1. Let us assume that the appropriate statistical model for data x₀ is the simple (one parameter) Normal model, where σ² is known (table 1).

Table 1 - Simple Normal (one parameter) Model
Statistical GM: Xₜ = μ + uₜ, t∈N = {1, 2, ...}
[1] Normality: Xₜ ~ N(., .), μ∈R
[2] Constant mean: E(Xₜ) = μ
[3] Constant variance: Var(Xₜ) = σ² [known]
[4] Independence: {Xₜ, t∈N} is an independent process
where [1]-[3] hold for all t∈N.
Let us consider the hypotheses of interest:

H₀: μ = μ₀ vs. H₁: μ > μ₀   (1)

in the context of the simple Normal model (table 1). The optimal (UMP) test for these hypotheses is:

Tα = {d(X) = √n(X̄ₙ − μ₀)/σ,  C₁(α) = {x : d(x) > cα}}   (2)

where X̄ₙ = (1/n)Σⁿₜ₌₁ Xₜ and cα is the threshold rejection value. Given that:

d(X) = √n(X̄ₙ − μ₀)/σ ~ N(0, 1) under μ = μ₀,   (3)
one can evaluate the type I error probability (significance level) α using:

P(d(X) > cα; H₀ true) = α,

where α is the type I error probability; 0 < α < 1. To evaluate the type II error probability one needs to know the sampling distribution of d(X) when H₀ is false. However, since 'H₀ is false' refers to H₁: μ > μ₀, this evaluation will involve all values of μ₁ greater than μ₀ (i.e. μ₁ > μ₀):

β(μ₁) = P(d(X) ≤ cα; H₀ false) = P(d(X) ≤ cα; μ = μ₁), for all μ₁ > μ₀.
The relevant sampling distribution takes the form:

d(X) = √n(X̄ₙ − μ₀)/σ ~ N(δ₁, 1) under μ = μ₁, where δ₁ = √n(μ₁ − μ₀)/σ, for all μ₁ > μ₀.   (4)
To use the Normal tables one needs to transform √n(X̄ₙ − μ₀)/σ into √n(X̄ₙ − μ₁)/σ using:

d(X) − δ₁ = √n(X̄ₙ − μ₀)/σ − √n(μ₁ − μ₀)/σ = √n(X̄ₙ − μ₁)/σ ~ N(0, 1) under μ = μ₁, for μ₁ > μ₀.   (5)
The power is defined by 1 − β(μ₁):

P(μ₁) = P(d(X) > cα; μ = μ₁)
      = P(√n(X̄ₙ − μ₁)/σ > cα − √n(μ₁ − μ₀)/σ; μ = μ₁)
      = P(Z > cα − δ₁; μ = μ₁), for all μ₁ ≥ μ₀,

where Z is a generic standard Normal r.v., i.e. Z ~ N(0, 1).
0=12 =2 =025 (=196) =100
=1−0 1=
√
(1−0)

: (1)=P(
√
(−1)

  − 1; 1)
=1 =5 (121)=P(  196 − 5)=072
=2 =1 (122)=P(  196 − 2)=169
=3 =15 (133)=P(  196 − 3)=323
=5 =25 (135)=P(  196 − 3)=705
=7 =35 (137)=P(  196 − 3)=938
2.2 Severity in the case of reject H₀
Consider the case where μ₀ = 12, σ = 2, n = 100, α = .025 (cα = 1.96) and x̄ₙ = 12.6.
Evaluating the test statistic yields:

d(x₀) = √100(12.6 − 12)/2 = 3.0,

which results in rejecting H₀: μ = 12. The p-value confirms the rejection since:

p(x₀) = P(d(X) > d(x₀); μ = 12) = .0013.
Evaluating the post-data severity in order to establish the discrepancy γ from the null warranted by test Tα and data x₀ (x̄ₙ = 12.6):
(S-1). The severity 'accordance' condition (S-1) implies that:
the rejection of H₀: μ = 12 with d(x₀) = 3.0 accords with H₁,
and the relevant inferential claim is:

μ > μ₁ = μ₀ + γ, for some γ ≥ 0.   (6)

(S-2). To establish the particular discrepancy γ warranted by data x₀, the post-data severity 'discordance' condition:
"(S-2): with very high probability, test Tα would have produced a result that accords less well with μ > μ₁ than x₀ does, if μ > μ₁ were false."
calls for evaluating the probability of the tail events:
"outcomes x that accord less well with μ > μ₁ than x₀ does",
i.e. [x: d(x) ≤ d(x₀)], giving rise to:

SEV(Tα; μ > μ₁) = P(d(X) ≤ d(x₀); 'μ > μ₁' is false)
                = P(d(X) ≤ d(x₀); 'μ ≤ μ₁' is true)
                = P(d(X) ≤ d(x₀); μ = μ₁)   (7)

(the last step evaluates the probability at μ = μ₁, the least favorable value under 'μ ≤ μ₁').
To evaluate this probability we need to use the same distribution under the alternative (4) as in the case of the power, but now instead of using cα as the threshold we will use d(x₀) and adjust it as in (5):

d(x₀) − δ₁ = √n(x̄ₙ − μ₁)/σ.   (8)
For instance, for a discrepancy γ = .1 the severity evaluation is:

SEV(Tα; μ > μ₁ = 12.1) = P(√n(X̄ₙ − 12.1)/σ ≤ √100(12.6 − 12.1)/2; μ = 12.1)
                       = P(Z ≤ 2.5; μ = 12.1) = .994,
where  v N(0 1). Similarly, for a discrepancy =5 the
severity evaluation is:
 (;   1=125) =P(
√
(−125)

≤
√
100(126−125)
2
; =1)
=P( ≤ 05; =1)=691
Table 2 reports several such severity evaluations for different discrepancies γ = .1, ..., 1.0.
0=12 =2 =100 and =126
Table 2: Reject 0: =12 vs. 1:   12
Relevant claim Severity
 1=[12+] P(x: (X)≤(x0); 1)
10   121 994
20   122 977
30   123 933
344   12344 900
40   124 841
50   125 691
60   126 500
70   127 309
80   128 159
90   129 067
10   130 023
 
The idea of using the post-data severity evaluation in the case of reject H₀ is to establish the largest warranted discrepancy γ from the null at a certain high threshold, say .90. In this case the warranted discrepancy is γ ≤ .344.
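The severity evaluations in Table 2 can be reproduced with a short function. A sketch, assuming the setup above (the helper name `sev_reject` is hypothetical):

```python
from scipy.stats import norm

def sev_reject(gamma, xbar=12.6, mu0=12.0, sigma=2.0, n=100):
    """SEV(mu > mu0 + gamma) = P(d(X) <= d(x0); mu = mu0 + gamma)."""
    mu1 = mu0 + gamma
    return norm.cdf((n ** 0.5) * (xbar - mu1) / sigma)

for gamma in (0.1, 0.344, 0.5, 1.0):
    print(f"mu > {12 + gamma}: SEV = {sev_reject(gamma):.3f}")
```

Evaluating `sev_reject` over a grid of γ values traces out the severity curve; γ = .344 is where it crosses the .90 threshold.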
▶ How does the post-data severity evaluation address the fallacy of rejection? By pointing out the warranted and unwarranted discrepancies from the null and specifying the relevant inferential claim.
2.3 Severity in the case of accept H₀
Consider the case where μ₀ = 12, σ = 2, n = 100, α = .025 (cα = 1.96) and x̄ₙ = 12.1.
Evaluating the test statistic yields:

d(x₀) = √100(12.1 − 12)/2 = 0.5,

which results in accepting H₀: μ = 12. The p-value confirms the acceptance since:

p(x₀) = P(d(X) > d(x₀); μ = 12) = .309.
Let us evaluate the post-data severity in order to establish the discrepancy from the null warranted by test Tα and data x₀ yielding x̄ₙ = 12.1.
(S-1). The severity 'accordance' condition (S-1) implies that:
the acceptance of H₀: μ = 12 with d(x₀) = 0.5 accords with H₀,
and the relevant inferential claim is:

μ ≤ μ₁ = μ₀ + γ, for some γ ≥ 0.   (9)

(S-2). To establish the particular discrepancy γ warranted by data x₀, the post-data severity 'discordance' condition:
"(S-2): with very high probability, test Tα would have produced a result that accords less well with μ ≤ μ₁ than x₀ does, if μ ≤ μ₁ were false."
calls for evaluating the probability of the tail events:
"outcomes x that accord less well with μ ≤ μ₁ than x₀ does",
i.e. [x: d(x) > d(x₀)], giving rise to:

SEV(Tα; μ ≤ μ₁) = P(d(X) > d(x₀); 'μ ≤ μ₁' is false)
                = P(d(X) > d(x₀); 'μ > μ₁' is true)
                = P(d(X) > d(x₀); μ = μ₁).   (10)
For a discrepancy γ = .1 the severity evaluation is:

SEV(Tα; μ ≤ μ₁ = 12.1) = P(√n(X̄ₙ − 12.1)/σ > √100(12.1 − 12.1)/2; μ = 12.1)
                       = P(Z > 0.0; μ = 12.1) = .500.
Similarly, for a discrepancy γ = .5 the severity evaluation is:

SEV(Tα; μ ≤ μ₁ = 12.5) = P(√n(X̄ₙ − 12.5)/σ > √100(12.1 − 12.5)/2; μ = 12.5)
                       = P(Z > −2.0; μ = 12.5) = .977.
Table 3 reports several such severity evaluations for different discrepancies γ = −.3, ..., .7.
0=12 =2 =100 and =121
Table 3: Accept 0: =12 vs. 1:   12
Relevant claim Severity
 ≤1=[12+] P(x: (X)(x0); 1)
−3  ≤ 117 023
−2  ≤ 118 067
−1  ≤ 110 159
0  ≤ 120 309
10  ≤ 121 500
20  ≤ 122 691
30  ≤ 123 841
356  ≤ 12356 900
40  ≤ 124 933
50  ≤ 125 977
60  ≤ 126 994
70  ≤ 127 999
 
The idea of using the post-data severity evaluation in the case of accept H₀ is to establish the smallest warranted discrepancy γ from the null at a certain high threshold, say .90. In this case the warranted discrepancy is γ ≥ .356.
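Table 3 can likewise be reproduced numerically; a sketch under the same assumptions (the helper name `sev_accept` is hypothetical):

```python
from scipy.stats import norm

def sev_accept(gamma, xbar=12.1, mu0=12.0, sigma=2.0, n=100):
    """SEV(mu <= mu0 + gamma) = P(d(X) > d(x0); mu = mu0 + gamma)."""
    mu1 = mu0 + gamma
    return 1 - norm.cdf((n ** 0.5) * (xbar - mu1) / sigma)

for gamma in (-0.3, 0.1, 0.356, 0.7):
    print(f"mu <= {12 + gamma}: SEV = {sev_accept(gamma):.3f}")
```

Note the mirror image of the reject case: here severity increases with γ, so the .90 threshold picks out the smallest warranted discrepancy.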
▶ How does the post-data severity evaluation address the fallacy of acceptance? By pointing out the warranted and unwarranted discrepancies from the null and specifying the relevant inferential claim.
2.4 The large n problem
The large n problem was initially raised by Lindley (1957) in the context of the simple Normal model (table 1), where the variance σ² > 0 is assumed known, by pointing out:
[a] the large n problem: frequentist testing is susceptible to the "fallacious" result that there is always a large enough sample size n for which any point null, say H₀: μ = μ₀, will be rejected by a frequentist α-significance level test.
Lindley claimed that this result is paradoxical because, when viewed from the Bayesian perspective, one can show:
[b] the Jeffreys-Lindley paradox: for certain choices of the prior, the posterior probability of H₀, given a frequentist α-significance level rejection, will approach 1 as n→∞.
Claims [a] and [b] contrast the behavior of a frequentist test (p-value) and the posterior probability of H₀ as n→∞, highlighting a potential for conflict between the frequentist and Bayesian accounts of evidence.
[c] Bayesian charge: a hypothesis that is well-supported by a Bayes factor can be (misleadingly) rejected by a frequentist test when n is large; see Berger and Sellke (1987), pp. 112-3.
A paradox? No! From the error statistical perspective:
(i) There is nothing fallacious about a small p-value, or a rejection of H₀, when n is large [it is a feature of a consistent frequentist test].
What is paradoxical is why the posterior probability of H₀ goes to 1 as n→∞, irrespective of the truth or falsity of H₀!
▶ Hence, the real problem does not lie with the p-value or the accept/reject rules as such, but with how such results are transformed into evidence for or against a particular hypothesis. The problem arises when such accept/reject results are detached from the test itself and are treated as providing the same evidence for a particular alternative H₁, regardless of the power of the test in question, which depends crucially on n.
The large n problem can be addressed using the post-data severity evaluation.
To illustrate this, consider the case where α = .025 (cα = 1.96), σ = 1, and the observed value of the test statistic in (2) is d(x₀) = 1.97. In this case data x₀ result in rejecting H₀: μ = 12 and the p-value is:

p(x₀) = P(d(X) > 1.97; μ = 12) = .024.
In the traditional accounts of frequentist testing this result would be interpreted in the same way, irrespective of whether the sample size was n = 25, n = 100, or n = 400. The post-data severity evaluation, however, takes that into account, because n affects the generic capacity (power) of the test. For instance, the severity of inferring μ > 12.1 associated with the same d(x₀) = 1.97 will be different for each sample size:

SEV(Tα; n = 25; μ > 12.1)  = P(Z ≤ 1.97 − √25(12.1 − 12)/1)  = .93
SEV(Tα; n = 100; μ > 12.1) = P(Z ≤ 1.97 − √100(12.1 − 12)/1) = .83
SEV(Tα; n = 400; μ > 12.1) = P(Z ≤ 1.97 − √400(12.1 − 12)/1) = .49
2.5 The problem with the p-value
Viewing the p-value from the severity vantage point, it can be defined as follows:
'the p-value is the probability of all possible outcomes x∈Rⁿ that accord less well with H₀ than x₀ does, if H₀ were true.'
Hence, a small p-value can be related to the claim μ > μ₀ passing a severe test, because the probability that test Tα would have produced a result that accords less well with μ > μ₀ than x₀ does (x: d(x) ≤ d(x₀)), if μ > μ₀ were false (H₀ true):

SEV(Tα; x₀; μ > μ₀) = P(d(X) ≤ d(x₀); μ ≤ μ₀)
                    = 1 − P(d(X) > d(x₀); μ = μ₀) = 1 − p(x₀),

is very high, i.e. p(x₀) is very low.
▶ Hence, the key problem with the p-value is that it establishes the existence of some discrepancy γ ≥ 0 from the null, but provides no information concerning its magnitude. The severity evaluation remedies that because it revolves around the discrepancy γ, being evaluated under the different values μ₁ associated with the inferential claim μ > μ₀ + γ. In this sense, the p-value can be related to a severity evaluation associated with the inferential claim μ > μ₀, where the implicit discrepancy is γ = 0, i.e. in the case of the p-value, SEV is implicitly evaluated under the null!
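The identity SEV(Tα; x₀; μ > μ₀) = 1 − p(x₀) can be verified numerically for the reject case of section 2.2 (a sketch):

```python
from scipy.stats import norm

d_x0 = 3.0                    # observed test statistic from section 2.2
p_value = 1 - norm.cdf(d_x0)  # p-value, ~.0013
sev_at_null = norm.cdf(d_x0)  # SEV(mu > mu0), i.e. severity at gamma = 0
assert abs(sev_at_null - (1 - p_value)) < 1e-12
print(round(p_value, 4), round(sev_at_null, 4))
```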
3 Conclusions
Neither Fisher's p-value nor the N-P accept/reject rules can provide an evidential interpretation of testing results, primarily because they are vulnerable to two serious fallacies.
(a) Fallacy of acceptance: no evidence against the null is
misinterpreted as evidence for it.
(b) Fallacy of rejection: evidence against the null is misin-
terpreted as evidence for a specific alternative.
These fallacies can be circumvented by supplementing the accept/reject rules (or the p-value) with a post-data evaluation of inference based on severe testing, with a view to determining the discrepancy γ from the null warranted by data x₀. This establishes the warranted inferential claim [and thus the unwarranted ones].
The severity assessment enables one to address the crucial fallacies of acceptance and rejection, as well as the potential arbitrariness and possible abuse of:
[c] switching between one-sided, two-sided or simple-vs-simple hypotheses,
[d] interchanging the null and alternative hypotheses,
[e] manipulating the level of significance in an attempt to get the desired testing result,
[f] the relevant p-value,
[g] observed confidence intervals vs. severity evaluations.
Doesn't the post-data severity evaluation simply replace the original threshold α with a severity threshold? Aren't both equally arbitrary?
No! Any choice that can be discussed in the particular context between different modelers is neither arbitrary nor subjective, but it is debatable! The severity curve provides all possible discrepancies from the null, and the modeler can decide which threshold is appropriate in each case.
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 

Recently uploaded

Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 

Recently uploaded (20)

Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 

A. Spanos Probability/Statistics Lecture Notes 5: Post-data severity evaluation

PHIL 6334 - Probability/Statistics Lecture Notes 5: Post-data severity evaluation
Aris Spanos [Spring 2014]

1 Introduction

Fallacies of Acceptance and Rejection

How is one supposed to interpret 'accept' or 'reject the null'?

Unfortunately, in fields like econometrics 'accept H0' is routinely, but erroneously, interpreted as 'data x0 provide evidence for H0', and 'reject H0' is routinely but erroneously interpreted as 'data x0 provide evidence for some alternative H1'.

The problem is that neither of these evidential claims can be justified, since they are both vulnerable to two classic fallacies.

(a) The fallacy of acceptance: no evidence against H0 is misinterpreted as evidence for H0. This fallacy can easily arise in cases where the test in question has low power to detect discrepancies of interest, e.g. a small sample size n.

(b) The fallacy of rejection: evidence against H0 is misinterpreted as evidence for a particular H1. This fallacy can easily arise in cases where the power of a test is very high, e.g. the case of a very large sample size n. This renders N-P rejections, as well as tiny p-values, with large n highly susceptible to this fallacy.

In the statistics literature, as well as in the secondary literatures in several applied fields, there have been numerous attempts to circumvent these two fallacies, but none succeeded.
The first successful attempt was made by Mayo (1996), who introduced the notion of a post-data severity evaluation.

2 The post-data severity evaluation

2.1 The notion of post-data severity

The post-data severity assessment aims to supplement frequentist testing by bridging the gap between the p-value and the accept/reject rules, on the one hand, and providing evidence for or against a hypothesis in the form of the discrepancy γ from the null warranted by data x0, on the other.

Its key difference from the Bayesian and likelihoodist approaches to testing is that it takes into account the generic capacity of the test in establishing γ.

The intuition behind this notion is that a rejection of H0 using a less (more) powerful test provides better (worse) evidence for a departure from H0. Similarly, an acceptance of H0 using a less (more) powerful test provides worse (better) evidence for no departure from H0.

The severity evaluation is a post-data appraisal of the accept/reject and p-value results with a view to providing an evidential interpretation. It can be used to address not only the fallacies of acceptance and rejection but also several additional criticisms of N-P testing. The discussion that follows relies heavily on Mayo and Spanos (2006).

A hypothesis H passes a severe test Tα with data x0 if:
(S-1) x0 accords with H, and
(S-2) with very high probability, test Tα would have produced a result that accords less well with H than x0 does, if H were false.

Severity can be viewed as a feature of a test Tα as it relates to a particular data x0 and a specific claim H being considered. Hence, the severity function has three arguments, SEV(Tα, x0, H), denoting the severity with which H passes Tα with x0.

Example 1. Assume that the appropriate statistical model for data x0 is the simple (one-parameter) Normal model, where σ² is known (table 1).

Table 1 - Simple Normal (one-parameter) Model
Statistical GM:        Xt = μ + ut,  t∈N={1, 2, ...}
[1] Normality:         Xt ~ N(·, ·)
[2] Constant mean:     E(Xt) = μ,  μ∈R
[3] Constant variance: Var(Xt) = σ²  [σ² known]
    ([1]-[3] hold for all t∈N)
[4] Independence:      {Xt, t∈N} is an independent process

Consider the hypotheses of interest:
  H0: μ = μ0  vs.  H1: μ > μ0,   (1)
in the context of the simple Normal model (table 1). The optimal (UMP) test for these hypotheses is:
  Tα := { d(X) = √n(X̄n − μ0)/σ,  C1(α) = {x: d(x) > cα} },   (2)
where X̄n = (1/n)Σⁿt=1 Xt and cα is the threshold rejection value. Given that:
  d(X) = √n(X̄n − μ0)/σ ~ N(0, 1) under μ = μ0,   (3)
one can evaluate the type I error probability (significance level) α using:
  P(d(X) > cα; H0 true) = α,  where 0 < α < 1.
To evaluate the type II error probability one needs to know the sampling distribution of d(X) when H0 is false. However, since 'H0 is false' refers to
  • 4. 1 :   0 this evaluation will involve all values of  greater than 0 (i.e. 10) : (1)=P((X) ≤ ; 0 false)=P((X) ≤ ; =1) ∀(10) The relevant sampling distribution takes the form: (X)= √ (−0)  =1 v N(1 1) where 1= √ (1−0)   for all 10 (4) To use the Normal tables one needs to transform √ (−0)  into √ (−1)  using: (X) z }| {√  ¡  − 0 ¢  − 1 z }| {√  (1 − 0)  = √ (−1)  =1 v N(0 1) for 10 (5) The power is defined by 1 − (1) : (1) =P((X)  ; =1)= =P( √ (−1)    − √ (−1)  ; =1)= =P(   − √ (−1)  ; =1) for all 1≥0 where  is a generic standard Normal r.v., i.e.  v N(0 1) 0=12 =2 =025 (=196) =100 =1−0 1= √ (1−0)  : (1)=P( √ (−1)    − 1; 1) =1 =5 (121)=P(  196 − 5)=072 =2 =1 (122)=P(  196 − 2)=169 =3 =15 (133)=P(  196 − 3)=323 =5 =25 (135)=P(  196 − 3)=705 =7 =35 (137)=P(  196 − 3)=938 4
2.2 Severity in the case of reject H0

Consider the case where μ0 = 12, σ = 2, n = 100, α = .025 (cα = 1.96) and x̄n = 12.6. Evaluating the test statistic yields:
  d(x0) = √100(12.6 − 12)/2 = 3.0,
which results in rejecting H0: μ = 12. The p-value confirms the rejection since:
  p(x0) = P(d(X) > d(x0); μ = 12) = .0013.
Let us evaluate the post-data severity in order to establish the discrepancy γ from the null warranted by test Tα and data x0 (x̄n = 12.6).

(S-1). The severity 'accordance' condition (S-1) implies that the rejection of H0: μ = 12 with d(x0) = 3.0 accords with H1, and the relevant inferential claim is:
  μ > μ1 = μ0 + γ,  for some γ ≥ 0.   (6)

(S-2). To establish the particular discrepancy γ warranted by data x0, the post-data severity 'discordance' condition:
"(S-2): with very high probability, test Tα would have produced a result that accords less well with μ > μ1 than x0 does, if μ > μ1 were false,"
calls for evaluating the probability of the tail events "outcomes x that accord less well with μ > μ1 than x0 does", i.e. [x: d(x) ≤ d(x0)], giving rise to:
  SEV(Tα; μ > μ1) = P(d(X) ≤ d(x0); μ > μ1 is false)
                  = P(d(X) ≤ d(x0); μ ≤ μ1 is true)
                  = P(d(X) ≤ d(x0); μ = μ1).   (7)
To evaluate this probability we need to use the same distribution under the alternative (4) as in the case of the power, but now instead of using cα as the threshold we will use d(x0) and adjust it as in (5):
  d(x0) − δ1 = √n(x̄n − μ1)/σ.   (8)
For instance, for a discrepancy γ = .1 the severity evaluation is:
  SEV(Tα; μ > μ1 = 12.1) = P(√n(X̄n − 12.1)/σ ≤ √100(12.6 − 12.1)/2; μ = 12.1)
                         = P(Z ≤ 2.5) = .994,
where Z ~ N(0, 1). Similarly, for a discrepancy γ = .5 the severity evaluation is:
  SEV(Tα; μ > μ1 = 12.5) = P(√n(X̄n − 12.5)/σ ≤ √100(12.6 − 12.5)/2; μ = 12.5)
                         = P(Z ≤ 0.5) = .691.
Table 2 reports several such severity evaluations for different discrepancies γ = .1, ..., 1.0.
  • 7. 0=12 =2 =100 and =126 Table 2: Reject 0: =12 vs. 1:   12 Relevant claim Severity  1=[12+] P(x: (X)≤(x0); 1) 10   121 994 20   122 977 30   123 933 344   12344 900 40   124 841 50   125 691 60   126 500 70   127 309 80   128 159 90   129 067 10   130 023   7
The idea of using the post-data severity evaluation in the case of reject H0 is to establish the largest warranted discrepancy γ from the null at a certain high threshold, say .90. In this case the warranted discrepancy is γ ≤ .344.

How does the post-data severity evaluation address the fallacy of rejection? By pointing out the warranted and unwarranted discrepancies from the null and specifying the relevant inferential claim.

2.3 Severity in the case of accept H0

Consider the case where μ0 = 12, σ = 2, n = 100, α = .025 (cα = 1.96) and x̄n = 12.1. Evaluating the test statistic yields:
  d(x0) = √100(12.1 − 12)/2 = .5,
which results in accepting H0: μ = 12. The p-value confirms the acceptance since:
  p(x0) = P(d(X) > d(x0); μ = 12) = .309.
Let us evaluate the post-data severity in order to establish the discrepancy from the null warranted by test Tα and data x0 yielding x̄n = 12.1.

(S-1). The severity 'accordance' condition (S-1) implies that the acceptance of H0: μ = 12 with d(x0) = .5 accords with H0, and the relevant inferential claim H is:
  μ ≤ μ1 = μ0 + γ,  for some γ ≥ 0.   (9)

(S-2). To establish the particular discrepancy γ warranted by data x0, the post-data severity 'discordance' condition:
"(S-2): with very high probability, test Tα would have produced a result that accords less well with μ ≤ μ1 than x0 does, if μ ≤ μ1 were false,"
calls for evaluating the probability of the tail events "outcomes x that accord less well with μ ≤ μ1 than x0 does", i.e. [x: d(x) > d(x0)], giving rise to:
  SEV(Tα; μ ≤ μ1) = P(d(X) > d(x0); μ ≤ μ1 is false)
                  = P(d(X) > d(x0); μ > μ1 is true)
                  = P(d(X) > d(x0); μ = μ1).   (10)
For a discrepancy γ = .1 the severity evaluation is:
  SEV(Tα; μ ≤ μ1 = 12.1) = P(√n(X̄n − 12.1)/σ > √100(12.1 − 12.1)/2; μ = 12.1)
                         = P(Z > 0.0) = .500.
Similarly, for a discrepancy γ = .5 the severity evaluation is:
  SEV(Tα; μ ≤ μ1 = 12.5) = P(√n(X̄n − 12.5)/σ > √100(12.1 − 12.5)/2; μ = 12.5)
                         = P(Z > −2.0) = .977.
Table 3 reports several such severity evaluations for different discrepancies γ = −.3, ..., .7.
  • 10. 0=12 =2 =100 and =121 Table 3: Accept 0: =12 vs. 1:   12 Relevant claim Severity  ≤1=[12+] P(x: (X)(x0); 1) −3  ≤ 117 023 −2  ≤ 118 067 −1  ≤ 110 159 0  ≤ 120 309 10  ≤ 121 500 20  ≤ 122 691 30  ≤ 123 841 356  ≤ 12356 900 40  ≤ 124 933 50  ≤ 125 977 60  ≤ 126 994 70  ≤ 127 999   10
The idea of using the post-data severity evaluation in the case of accept H0 is to establish the smallest warranted discrepancy γ from the null at a certain high threshold, say .90. In this case the warranted discrepancy is γ ≥ .356.

How does the post-data severity evaluation address the fallacy of acceptance? By pointing out the warranted and unwarranted discrepancies from the null and specifying the relevant inferential claim.

2.4 The large n problem

The large n problem was initially raised by Lindley (1957) in the context of the simple Normal model (table 1), where the variance σ² > 0 is assumed known, by pointing out:
[a] the large n problem: frequentist testing is susceptible to the "fallacious" result that there is always a large enough sample size n for which any point null, say H0: μ = μ0, will be rejected by a frequentist α-significance level test.
Lindley claimed that this result is paradoxical because, when viewed from the Bayesian perspective, one can show:
[b] the Jeffreys-Lindley paradox: for certain choices of the prior, the posterior probability of H0, given a frequentist α-significance level rejection, will approach 1 as n→∞.
Claims [a] and [b] contrast the behavior of a frequentist test (p-value) and the posterior probability of H0 as n→∞, highlighting a potential for conflict between the frequentist and Bayesian accounts of evidence.
[c] Bayesian charge: a hypothesis that is well-supported by the Bayes factor can be (misleadingly) rejected by a frequentist test when n is large; see Berger and Sellke (1987), pp. 112-3.

A paradox? No! From the error statistical perspective:
(i) There is nothing fallacious about a small p-value, or a rejection of H0, when n is large [it is a feature of a consistent frequentist test]. What is paradoxical is why the posterior probability of H0 goes to 1 as n→∞, irrespective of the truth or falsity of H0!

Hence, the real problem does not lie with the p-value or the accept/reject rules as such, but with how such results are transformed into evidence for or against a particular H. The problem arises when such accept/reject results are detached from the test itself and are treated as providing the same evidence for a particular alternative H1, regardless of the power of the test in question, which depends crucially on n.

The large n problem can be addressed using the post-data severity evaluation. To illustrate that, consider the case where α = .025 (cα = 1.96), σ = 1, and the observed value of the test statistic in (2) is d(x0) = 1.97. In this case data x0 result in rejecting H0: μ = 12 and the
p-value is:
  p(x0) = P(d(X) > 1.97; μ = 12) = .024.
In the traditional accounts of frequentist testing this result would be interpreted in the same way, irrespective of whether the sample size was n = 25, n = 100, or n = 400. The post-data severity evaluation, however, takes that into account, because n affects the generic capacity (power) of the test. For instance, the severity of inferring μ > 12.1, associated with the same d(x0) = 1.97, will be different for each sample size:
  SEV(Tα; n = 25;  μ > 12.1) = P(Z ≤ 1.97 − √25(12.1 − 12)/1)  = .93,
  SEV(Tα; n = 100; μ > 12.1) = P(Z ≤ 1.97 − √100(12.1 − 12)/1) = .83,
  SEV(Tα; n = 400; μ > 12.1) = P(Z ≤ 1.97 − √400(12.1 − 12)/1) = .49.

2.5 The problem with the p-value

Viewing the p-value from the severity vantage point, it can be defined as follows: the p-value is the probability of all possible outcomes x∈Rⁿ that accord less well with H0 than x0 does, if H0 were true. Hence, a small p-value can be related to H1: μ > μ0 passing a severe test, because the probability that test Tα would have produced a result that accords less well with H1 than x0 does (x: d(x) ≤ d(x0)), if H1 were false (H0 true):
  SEV(Tα; x0; μ > μ0) = P(d(X) ≤ d(x0); μ ≤ μ0) = 1 − P(d(X) > d(x0); μ = μ0) = 1 − p(x0),
is very high, i.e. p(x0) is very low.

Hence, the key problem with the p-value is that it establishes the existence of some discrepancy γ ≥ 0 from the null, but provides no information concerning its magnitude γ. The
severity evaluation remedies that because it revolves around the discrepancy γ, being evaluated under the different values of μ1 associated with the inferential claim μ ≥ μ0 + γ. In this sense, the p-value can be related to a severity evaluation associated with the inferential claim μ ≥ μ0, where the implicit discrepancy is γ = 0; i.e. in the case of the p-value, SEV is implicitly evaluated under the null!

3 Conclusions

Neither Fisher's p-value nor the N-P accept/reject rules can provide such an evidential interpretation, primarily because they are vulnerable to two serious fallacies.
(a) Fallacy of acceptance: no evidence against the null is misinterpreted as evidence for it.
(b) Fallacy of rejection: evidence against the null is misinterpreted as evidence for a specific alternative.

These fallacies can be circumvented by supplementing the accept/reject rules (or the p-value) with a post-data evaluation of inference based on severe testing, with a view to determining the discrepancy γ from the null warranted by data x0. This establishes the warranted inferential claim [and thus the unwarranted ones]. The severity assessment enables one to address the crucial fallacies of acceptance and rejection, as well as the potential arbitrariness and possible abuse of:
[c] switching between one-sided, two-sided or simple-vs-simple hypotheses,
[d] interchanging the null and alternative hypotheses,
[e] manipulating the level of significance in an attempt to get the desired testing result,
[f] the relevant p-value.
[g] observed confidence intervals vs. severity evaluations.

Doesn't the post-data severity evaluation simply replace the original threshold cα with a severity threshold? Aren't both equally arbitrary? No! Any choice that can be discussed in the particular context between different modelers is neither arbitrary nor subjective, but it is debatable! The severity curve provides all possible discrepancies from the null, and the modeler can decide which threshold is appropriate in each case.
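As a numerical companion to the large-n illustration of section 2.4, the following stdlib-only Python sketch (the function names are mine, not from the notes) shows how the severity of the same claim μ > 12.1, for the same observed d(x0) = 1.97, falls as n grows:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity_large_n(n, gamma=0.1, d_x0=1.97, sigma=1.0):
    """SEV(mu > mu0 + gamma) when the same observed d(x0) arises at
    sample size n: P(Z <= d(x0) - sqrt(n)*gamma/sigma)."""
    return std_normal_cdf(d_x0 - sqrt(n) * gamma / sigma)

# Identical d(x0) and p-value, but very different evidential import:
for n in (25, 100, 400):
    print(f"n = {n}: SEV(mu > 12.1) = {severity_large_n(n):.2f}")
```

Sweeping γ instead of n in the same function traces out the severity curve referred to above, from which a modeler can read off the discrepancies warranted at any chosen threshold.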