The paper studies the relationship between p-values and Bayesian measures of evidence for testing point null hypotheses. It finds that p-values can be highly misleading about the evidence provided by data against the null hypothesis. Through examples, it shows that a small p-value does not necessarily mean high posterior probability for rejecting the null or a large Bayes factor against the null. The paper derives lower bounds on these Bayesian measures of evidence for several classes of distributions to demonstrate the conflicts between p-values and Bayesian analysis.
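As a rough numerical companion to this theme (not from the paper itself), the sketch below uses the well-known -e·p·log(p) calibration from later work in the same line to translate a p-value into a lower bound on the Bayes factor in favour of the null, assuming equal prior odds; all numbers are illustrative.

```python
import numpy as np

def bayes_factor_lower_bound(p):
    """Lower bound on the Bayes factor in favour of H0 implied by a p-value,
    using the -e * p * ln(p) calibration (valid for p < 1/e)."""
    return -np.e * p * np.log(p)

for p in [0.05, 0.01, 0.001]:
    b = bayes_factor_lower_bound(p)
    # With equal prior odds, the implied lower bound on P(H0 | data)
    post = b / (1.0 + b)
    print(f"p = {p:.3f}: BF01 >= {b:.3f}, P(H0|x) >= {post:.3f}")
```

For p = 0.05 this gives a posterior probability of the null of at least about 0.29, far from the "1 in 20" reading a p-value invites.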
The EM algorithm is used to find maximum likelihood estimates for problems with latent variables. It works by alternating between an E-step (computing expected values of the latent variables) and an M-step (maximizing the likelihood with respect to the parameters). For mixture of Gaussians, the E-step computes the posterior probabilities that each data point belongs to each component. The M-step then updates the mixture weights, means, and covariances by taking weighted averages/sums of the data using these posteriors.
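A minimal NumPy/SciPy sketch of these two steps for a one-dimensional Gaussian mixture; the data, number of components, and initialisation below are illustrative assumptions, not taken from the document.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K=2, n_iter=100, seed=0):
    """One-dimensional Gaussian mixture fitted by EM."""
    rng = np.random.default_rng(seed)
    w = np.full(K, 1.0 / K)                       # mixture weights
    mu = rng.choice(x, K, replace=False)          # initial means
    sigma = np.full(K, x.std())                   # initial standard deviations
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([w[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(K)])
        resp = dens / dens.sum(axis=0)
        # M-step: weighted updates of weights, means and variances
        Nk = resp.sum(axis=1)
        w = Nk / len(x)
        mu = (resp * x).sum(axis=1) / Nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / Nk)
    return w, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_gmm_1d(x))
```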
A Geometric Note on a Type of Multiple Testing (07-24-2015), Junfeng Liu
This document summarizes new perspectives on false discovery rate (FDR) control procedures for multiple testing. It examines FDR control using linear and quadratic rejection cut-off routes applied to ordered p-values. Key findings include: 1) the FDR is controlled at π0q regardless of where the Ha p-value profile crosses the no-rejection boundary, 2) specificity approaches limits as the Ha mean increases, 3) quadratic cuts control FDR better when Ha means are close to zero. Numerical simulations explore the impact of factors like population size, variation levels, and mean profiles on discovery rates and FDR.
This document discusses treating the truth predicate (Tr) as a logical connective in truth theories such as the Friedman-Sheard theory (FS). It analyzes FS from the perspective of proof-theoretic semantics, where Tr's introduction and elimination rules are like those of a connective. However, FS violates the "harmony" requirement for connectives, as it is not a conservative extension of PA and proves the consistency of PA. The document then discusses interpreting paradoxical sentences like McGee's using coinduction and how guarded corecursion relates to the failure of Tr's formal commutability in FS.
The document discusses statistical models and exponential families. It states that for most of the course, data is assumed to be a random sample from a distribution F. Repetition of observations via the law of large numbers and central limit theorem increases information about F. Exponential families are a class of parametric distributions with convenient analytic properties, where the density can be written as a function of natural parameters in an exponential form. Examples of exponential families include the binomial and normal distributions.
This document discusses Bayesian hypothesis testing and some of the challenges associated with it. It makes three key points:
1) There is tension between using posterior probabilities from a loss function approach versus Bayes factors, which eliminate prior dependence but have no direct connection to the posterior.
2) Bayesian hypothesis testing relies on choosing prior probabilities for hypotheses and prior distributions for parameters, which can strongly impact results and are often arbitrary.
3) Common Bayesian testing procedures like using Bayes factors can produce paradoxical results in some cases, like Lindley's paradox where the Bayes factor favors the null hypothesis as sample size increases despite evidence against it.
"reflections on the probability space induced by moment conditions with impli...Christian Robert
This document discusses using moment conditions to perform Bayesian inference when the likelihood function is intractable or unknown. It outlines some approaches that have been proposed, including approximating the likelihood using empirical likelihood or pseudo-likelihoods. However, these approaches do not guarantee the same consistency as a true likelihood. Alternative approximative Bayesian methods are also discussed, such as Approximate Bayesian Computation, Integrated Nested Laplace Approximation, and variational Bayes. The empirical likelihood method constructs a likelihood from generalized moment conditions, but its use in Bayesian inference requires further analysis of consistency in each application.
Presentation of Birnbaum's Likelihood Principle foundational paper at the Reading Statistical Classics seminar, Jan. 20, 2013, Université Paris-Dauphine
Fixed Point Results In Fuzzy Menger Space With Common Property (E.A.), IJERA Editor
This paper presents some common fixed point theorems for weakly compatible mappings via an implicit relation in Fuzzy Menger spaces satisfying the common property (E.A)
The document outlines the theory of the most efficient tests of statistical hypotheses as proposed by Neyman Pearson. It discusses the two types of errors in hypothesis testing, the likelihood principle, and provides solutions for finding the most efficient critical regions for simple and composite hypotheses. Examples are given to illustrate the theory and key points include that the most efficient critical region maximizes the probability of rejecting the null for a given probability of type 1 error.
Discussion of Persi Diaconis' lecture at ISBA 2016, Christian Robert
This document discusses Monte Carlo methods for numerical integration and estimating normalizing constants. It summarizes several approaches: estimating normalizing constants using samples; reverse logistic regression for estimating constants in mixtures; Xiao-Li's maximum likelihood formulation for Monte Carlo integration; and Persi's probabilistic numerics which provide uncertainties for numerical calculations. The document advocates first approximating the distribution of an integrand before estimating its expectation to incorporate non-parametric information and account for multiple estimators.
Statistics (1): estimation, Chapter 3: likelihood function and likelihood esti..., Christian Robert
The document discusses likelihood functions and inference. It begins by defining the likelihood function as the function that gives the probability of observing a sample given a parameter value. The likelihood varies with the parameter, while the density function varies with the data. Maximum likelihood estimation chooses parameters that maximize the likelihood function. The score function is the gradient of the log-likelihood and has an expected value of zero at the true parameter value. The Fisher information matrix measures the curvature of the likelihood surface and provides information about the precision of parameter estimates. It relates to the concentration of likelihood functions around the true parameter value as sample size increases.
This document summarizes a presentation on testing hypotheses as mixture estimation and the challenges of Bayesian testing. The key points are:
1) Bayesian hypothesis testing faces challenges including the dependence on prior distributions, difficulties interpreting Bayes factors, and the inability to use improper priors in most situations.
2) Testing via mixtures is proposed as a paradigm shift that frames hypothesis testing as a model selection problem involving mixture models rather than distinct hypotheses.
3) Traditional Bayesian testing using Bayes factors and posterior probabilities depends strongly on prior distributions and choices that are difficult to justify, while not providing measures of uncertainty around decisions. Alternative approaches are needed to address these issues.
A factorization theorem for generalized exponential polynomials with infinite..., Pim Piepers
The document presents a factorization theorem for a class of generalized exponential polynomials called polynomial-exponent exponential polynomials (pexponential polynomials). The theorem states that if a pexponential polynomial F(x) has infinitely many integer zeros belonging to a finite union of arithmetic progressions, then F(x) can be factorized into a product of factors corresponding to the zeros in each progression multiplied by a pexponential polynomial with only finitely many integer zeros. The proof relies on two lemmas showing that certain polynomial sums in the components of F(x) vanish for integers in the progressions.
Optimization Approach to Nash Equilibria with Applications to Interchangeability, Yosuke YASUDA
This document presents an optimization approach to characterizing Nash equilibria in games. It shows that the set of Nash equilibria is identical to the set of solutions that minimize an objective function defined over strategy profiles. This allows the equilibrium problem to be framed as an optimization problem. The approach provides a unified way to derive existing results on interchangeability of equilibria in zero-sum and supermodular games, by relating the properties of the objective function to the structure of the optimal solution set.
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap, Christian Robert
The document discusses the bootstrap method and its applications in statistical inference. It introduces the bootstrap as a technique for estimating properties of estimators like variance and distribution when the true sampling distribution is unknown. This is done by treating the observed sample as if it were the population and resampling with replacement to create new simulated samples. The bootstrap then approximates characteristics of the sampling distribution, allowing inferences like confidence intervals to be constructed.
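A short sketch of the resampling idea, estimating the standard error and a percentile confidence interval for the median; the exponential sample and the choice of statistic are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=100)   # observed sample (true distribution treated as unknown)

B = 5000
boot_medians = np.empty(B)
for b in range(B):
    resample = rng.choice(x, size=len(x), replace=True)  # resample with replacement
    boot_medians[b] = np.median(resample)

se = boot_medians.std(ddof=1)                       # bootstrap estimate of the standard error
lo, hi = np.percentile(boot_medians, [2.5, 97.5])   # percentile confidence interval
print(f"median = {np.median(x):.3f}, bootstrap SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```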
Lecture slides on Decision Theory. The contents in large part come from the following excellent textbook:
Rubinstein, A. (2012). Lecture Notes in Microeconomic Theory: The Economic Agent, 2nd ed.
http://www.amazon.co.jp/dp/B0073X0J7Q/
On the vexing dilemma of hypothesis testing and the predicted demise of the B..., Christian Robert
The document discusses hypothesis testing from both frequentist and Bayesian perspectives. It introduces the concept of statistical tests as functions that output accept or reject decisions for hypotheses. P-values are presented as a way to quantify uncertainty in these decisions. Bayes' original 1763 paper on Bayesian statistics is summarized, introducing the concept of the posterior distribution. Bayesian hypothesis testing is then discussed, including the optimal Bayes test and the use of Bayes factors to compare hypotheses without requiring prior probabilities on the hypotheses.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
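A minimal rejection-ABC sketch under assumed ingredients (a normal toy model, the sample mean as summary statistic, a uniform prior, and an arbitrary tolerance); none of these choices come from the document itself.

```python
import numpy as np

rng = np.random.default_rng(1)
y_obs = rng.normal(loc=2.0, scale=1.0, size=50)      # pretend the likelihood were unavailable
s_obs = y_obs.mean()                                  # summary statistic of the observed data

def abc_rejection(n_sim=100_000, eps=0.05):
    theta = rng.uniform(-5, 5, n_sim)                 # draw parameters from the prior
    y_sim = rng.normal(theta[:, None], 1.0, (n_sim, 50))
    s_sim = y_sim.mean(axis=1)                        # same summary for the simulated data
    keep = np.abs(s_sim - s_obs) < eps                # accept simulations within tolerance
    return theta[keep]

post = abc_rejection()
print(f"accepted {post.size} draws, posterior mean ~ {post.mean():.3f}")
```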
11. [29-35] A unique common fixed point theorem under psi varphi contractive co..., Alexander Decker
This document presents a unique common fixed point theorem for two self maps satisfying a generalized contraction condition in partial metric spaces using rational expressions. It begins by introducing basic definitions and lemmas related to partial metric spaces. It then presents the main theorem, which states that if two self maps T and f satisfy certain contractive and completeness conditions, including being weakly compatible, then they have a unique common fixed point. The proof considers two cases - when the sequences constructed from the maps are eventually equal, and when they are not eventually equal but form a Cauchy sequence. It is shown in both cases that the maps must have a unique common fixed point.
This document proposes representing hypothesis testing problems as estimating mixture models. Specifically, two competing models are embedded within an encompassing mixture model with a weight parameter between 0 and 1. Inference is then drawn on the mixture representation, treating each observation as coming from the mixture model. This avoids difficulties with traditional Bayesian testing approaches like computing marginal likelihoods. It also allows for a more intuitive interpretation of the weight parameter compared to posterior model probabilities. The weight parameter can be estimated using standard mixture estimation algorithms like Gibbs sampling or Metropolis-Hastings. Several illustrations of the approach are provided, including comparisons of Poisson and geometric distributions.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, mixtures to bridge sampling, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women is presented to illustrate regular importance sampling. Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
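Regular importance sampling for a marginal likelihood can be sketched in a few lines; the conjugate normal toy model and Gaussian proposal below are illustrative assumptions, not the probit example from the slides.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, 30)          # data; model: y_i ~ N(theta, 1), prior theta ~ N(0, 1)

def log_joint(theta):
    # log likelihood plus log prior, vectorised over a batch of theta values
    return norm.logpdf(y[:, None], theta, 1.0).sum(axis=0) + norm.logpdf(theta, 0.0, 1.0)

# Importance proposal: a Gaussian roughly matched to the posterior
q_mean, q_sd = y.mean(), 1.0 / np.sqrt(len(y) + 1.0)
theta = rng.normal(q_mean, q_sd, 10_000)
log_w = log_joint(theta) - norm.logpdf(theta, q_mean, q_sd)

# Stable average of the importance weights estimates the marginal likelihood p(y)
log_evidence = np.logaddexp.reduce(log_w) - np.log(len(log_w))
print(f"log marginal likelihood estimate: {log_evidence:.3f}")
```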
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
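For one-dimensional samples of equal size the Wasserstein distance reduces to matching sorted values, which the following small sketch illustrates on assumed toy samples.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equal-size 1-D samples:
    on the real line the optimal coupling simply matches sorted values."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(3)
a = rng.normal(0.0, 1.0, 1000)
b = rng.normal(0.5, 1.0, 1000)
print(wasserstein_1d(a, b))        # close to the mean shift of 0.5
```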
Reading the Lindley-Smith 1973 paper on linear Bayes estimators, Christian Robert
The document outlines a seminar on Bayes estimates for the linear model. It introduces the linear model and Bayesian methods. It then discusses exchangeability, providing an example of an exchangeable distribution. It also discusses the general Bayesian linear model, including the posterior distribution of the parameters using a three stage model.
Asymptotics for discrete random measures, Julyan Arbel
This document provides an introduction to asymptotics for discrete random measures, specifically the Dirichlet process and two-parameter Poisson-Dirichlet process. It discusses several key aspects:
1) It outlines the stick-breaking construction of the two-parameter Poisson-Dirichlet process and defines related notation. 2) It introduces the truncation error Rn and discusses how its asymptotic behavior differs between the Dirichlet and two-parameter Poisson-Dirichlet cases. 3) It briefly describes some applications of these processes in mixture modeling and summarizes different sampling approaches like blocked Gibbs and slice sampling that rely on truncation of the infinite-dimensional distributions.
Species sampling models in Bayesian Nonparametrics, Julyan Arbel
This document discusses species sampling models and discovery probabilities. It introduces the problem of estimating the probability of observing a new species given a sample. Good and Turing proposed an estimator for this during World War II. Bayesian nonparametric models provide an alternative approach by placing a prior on unknown species proportions. The document outlines BNP estimators for discovery probabilities and how credible intervals can be derived. It applies these methods to genomic datasets of expressed sequence tags to estimate discovery probabilities for observing new genes.
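A tiny sketch of the Good-Turing estimate of the discovery probability, namely the proportion of species seen exactly once in the sample; the toy species labels are made up for illustration.

```python
from collections import Counter

def good_turing_new_species(sample):
    """Good-Turing estimate of the probability that the next draw is a new species:
    (number of species seen exactly once) / (sample size)."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

sample = ["a", "a", "b", "c", "c", "c", "d", "e"]   # toy "species" labels
print(good_turing_new_species(sample))              # 3 singletons / 8 draws = 0.375
```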
A Gentle Introduction to Bayesian Nonparametrics, Julyan Arbel
The document provides an introduction to Bayesian nonparametrics and the Dirichlet process. It explains that Bayesian nonparametrics aims to fit models that can adapt their complexity based on the data, without strictly imposing a fixed structure. The Dirichlet process is described as a prior distribution on the space of all probability distributions, allowing the model to utilize an infinite number of parameters. Nonparametric mixture models using the Dirichlet process provide a flexible approach to density estimation and clustering.
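A possible sketch of the stick-breaking construction of a (truncated) Dirichlet process draw, with an assumed concentration parameter and base measure; the truncation level is arbitrary.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, n_atoms=1000, seed=0):
    """Truncated stick-breaking draw from a Dirichlet process DP(alpha, G0)."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, n_atoms)                  # stick-breaking proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    weights = betas * remaining                            # atom weights (sum to ~1)
    atoms = base_sampler(rng, n_atoms)                     # atom locations from G0
    return weights, atoms

w, atoms = stick_breaking_dp(alpha=2.0, base_sampler=lambda rng, n: rng.normal(0, 1, n))
print(w[:5], w.sum())
```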
Bayesian Nonparametrics, Applications to biology, ecology, and marketing, Julyan Arbel
This document discusses applications of Bayesian nonparametric methods to various domains including toxicology, ecology, marketing, human fertility, and more. It provides examples of using rounded Gaussian mixtures and Dirichlet process mixtures to model count data from developmental toxicity studies and animal abundance data. Applications to modeling multivariate mobile phone usage data and basal body temperature curves are also described. The document emphasizes that Bayesian nonparametric approaches allow inclusion of prior information and flexible modeling of complex data structures.
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
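The following is a deliberately naive sketch of the algorithm for a scalar toy problem (uniform prior, Gaussian likelihood, rejection sampling under the likelihood constraint); real implementations replace the rejection step with constrained sampling, and none of the numbers below come from the document.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_like(theta):
    # toy scalar model: likelihood N(theta; 2, 0.5)
    return -0.5 * ((theta - 2.0) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))

prior_lo, prior_hi = -10.0, 10.0            # uniform prior on [-10, 10]
N, n_iter = 200, 1400                       # live points; prior mass shrinks to ~1e-3

live = rng.uniform(prior_lo, prior_hi, N)
live_logL = log_like(live)

logZ, X_prev = -np.inf, 1.0
for i in range(1, n_iter + 1):
    worst = int(np.argmin(live_logL))       # lowest-likelihood live point
    X_i = np.exp(-i / N)                    # deterministic prior-mass shrinkage
    logZ = np.logaddexp(logZ, live_logL[worst] + np.log(X_prev - X_i))
    X_prev = X_i
    # replace the worst point by a prior draw above the likelihood constraint;
    # naive rejection like this is only viable for a toy example
    while True:
        cand = rng.uniform(prior_lo, prior_hi)
        if log_like(cand) > live_logL[worst]:
            live[worst], live_logL[worst] = cand, log_like(cand)
            break

# contribution of the remaining live points
logZ = np.logaddexp(logZ, np.log(np.mean(np.exp(live_logL))) + np.log(X_prev))
print(f"log evidence estimate: {logZ:.3f}  (analytic value ~ {np.log(1 / 20):.3f})")
```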
Presentation by Bassoum Abou on Stein's 1981 AoS paper, Christian Robert
The document discusses estimation of the mean of a multivariate normal distribution. It covers basic formulas, Bayes estimates, choosing a scalar factor, applications including symmetric moving averages, and the case of unknown variance. The main results are theorems on the risk and unbiased risk estimates for different Bayes estimates of the mean.
Dependent processes in Bayesian Nonparametrics, Julyan Arbel
This document summarizes dependent processes in Bayesian nonparametrics. It motivates the need for dependent random probability measures to accommodate temporal dependence structures beyond the exchangeability assumption. It describes modeling collections of random probability measures indexed by time as either discrete-time or continuous-time processes. The diffusive Dirichlet process is introduced as a dependent Dirichlet process with Dirichlet marginal distributions at each time point and continuous sample paths. Simulation and estimation methods are discussed for this model.
This document provides a list of 33 papers related to Bayesian statistics for students to choose from for a presentation. It includes brief descriptions of several theoretical and general audience journals. The papers cover a range of topics in Bayesian statistics published between 1763 and 2013. Students will be evaluated on their understanding and presentation of the chosen paper.
This document discusses sampling-based approaches for calculating marginal densities from conditional distributions. It introduces substitution algorithms, substitution sampling, Gibbs sampling, and importance sampling. Substitution algorithms iteratively estimate marginal densities by substituting conditional distributions. Substitution sampling generates samples by iteratively drawing from conditional distributions. Gibbs sampling repeatedly draws values from conditional distributions to estimate joint and marginal distributions.
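A compact Gibbs-sampling sketch for a bivariate normal target, where both full conditionals are available in closed form; the correlation value is an arbitrary choice for illustration.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples=10_000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho,
    drawing each coordinate from its full conditional in turn."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    out = np.empty((n_samples, 2))
    cond_sd = np.sqrt(1.0 - rho ** 2)
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_sd)   # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.normal(rho * x, cond_sd)   # y | x ~ N(rho * x, 1 - rho^2)
        out[i] = x, y
    return out

draws = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(draws[1000:].T))         # empirical correlation close to 0.8
```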
The document describes several bootstrap methods for estimating parameters from sample data when the underlying distribution is unknown. It outlines the bootstrap procedure, which involves resampling the original data with replacement to create bootstrap samples and estimating the parameter from each resample. Three methods for calculating the bootstrap distribution are described: direct theoretical calculation, simulation-based resampling, and Bayesian approaches. The document also provides an example of using the bootstrap to estimate the median from a sample.
Reading the Lasso 1996 paper by Robert Tibshirani, Christian Robert
The document outlines a presentation on regression analysis using the LASSO (Least Absolute Shrinkage and Selection Operator) method. It includes an introduction to the topic, definitions of key terms like OLS (ordinary least squares) estimates, and descriptions of standard techniques like subset selection and ridge regression. The bulk of the presentation covers LASSO specifically - its definition, motivation, behavior in certain cases, examples of its use, and algorithms for finding LASSO solutions. It concludes with a discussion of simulations. The presenter's goal is to explain the LASSO method for regression shrinkage and variable selection.
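A bare-bones coordinate-descent sketch for the lasso objective, soft-thresholding each coefficient in turn; the simulated design and penalty value are assumptions for illustration, and this is not the algorithm as presented in Tibshirani's paper.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2) * ||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual excluding feature j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_beta = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_beta + rng.normal(size=100)
print(np.round(lasso_cd(X, y, lam=20.0), 2))              # most coefficients shrunk to exactly zero
```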
The document describes the k-means clustering algorithm. It introduces clustering and its aim to divide data points into k clusters to minimize within-cluster sums of squares. The algorithm involves initializing cluster centers, then iteratively performing optimal transfers of points between clusters and quick transfers until convergence is reached. Optimal transfers minimize an objective function to determine the best cluster for a point, while quick transfers perform simpler transfers without minimizing the objective.
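The summary above describes the transfer-based (Hartigan-Wong style) procedure; the sketch below instead shows the simpler Lloyd iteration for the same within-cluster sum-of-squares objective, on assumed toy data.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Lloyd's iteration: assign points to the nearest center, then recompute centers.
    (The Hartigan-Wong algorithm refines this with point-by-point transfers.)"""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                          # assignment step
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers                              # update step
    return labels, centers

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])  # well separated, so no empty clusters
labels, centers = kmeans_lloyd(X, 2)
print(centers)
```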
This document discusses hypothesis testing and p-values. It begins by defining a hypothesis as a proposition or prediction about the outcome of an experiment. Hypotheses are formulated and tested in science to evaluate their credibility. There are two main types of hypotheses: the null hypothesis, which corresponds to a default or general position, and the alternative hypothesis, which asserts a rival relationship. Hypothesis testing uses sample data to evaluate whether observed differences could be due to chance (the null hypothesis) or reflect real effects (the alternative hypothesis). Key concepts discussed include type 1 and type 2 errors, significance levels, one-sided and two-sided tests, and the relationship between p-values, confidence intervals, and the strength of evidence against the null hypothesis.
This document discusses p-values and their significance in statistical hypothesis testing. It defines a p-value as the probability of obtaining a result equal to or more extreme than what was observed assuming the null hypothesis is true. Lower p-values indicate stronger evidence against the null hypothesis. The document outlines the steps in hypothesis testing which include stating hypotheses, determining acceptable type I and type II error rates, selecting a statistical test to calculate a test statistic, determining the p-value, making inferences, and forming conclusions. It emphasizes that statistical significance does not necessarily imply real-world significance.
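A worked toy example of these steps, using a one-sample z-test with an assumed known population standard deviation (all numbers invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# H0: mu = 100 vs H1: mu != 100, population sd assumed known (sigma = 15)
x = np.array([108, 112, 95, 104, 118, 101, 99, 110, 107, 103])
sigma, mu0 = 15.0, 100.0

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))   # test statistic
p_value = 2 * norm.sf(abs(z))                      # two-sided p-value
print(f"z = {z:.2f}, p = {p_value:.4f}")           # reject H0 at level alpha if p < alpha
```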
Excursion 4 Tour II: Rejection Fallacies: Who's Exaggerating What? (jemille6)
This document discusses criticisms of p-values and proposes reforms based on Bayesian statistics. It summarizes debates between Fisher and Bayesians regarding p-values exaggerating evidence against the null hypothesis when using certain priors. When a lump prior of 0.5 is given to the null and the remaining 0.5 spread over the alternative, as the sample size increases, a statistically significant result can correspond to a posterior probability for the null that exceeds the prior of 0.5. Reforms are proposed based on likelihood ratios and Bayes factors to define statistical significance in a way more consistent with Bayesian evidence standards.
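A small numerical illustration of this effect (not reproduced from the document): keep the observation "just significant" at z = 1.96 while n grows, put a lump prior of 0.5 on the point null, and watch the posterior probability of the null climb. The normal model and unit prior scale under the alternative are assumptions.

```python
import numpy as np
from scipy.stats import norm

def posterior_prob_null(n, z=1.96, tau=1.0, sigma=1.0, prior_null=0.5):
    """P(H0 | x) for H0: theta = 0 vs H1: theta ~ N(0, tau^2),
    with the data mean fixed at the 'just significant' value z * sigma / sqrt(n)."""
    xbar = z * sigma / np.sqrt(n)
    m0 = norm.pdf(xbar, 0.0, sigma / np.sqrt(n))                 # marginal density under H0
    m1 = norm.pdf(xbar, 0.0, np.sqrt(tau**2 + sigma**2 / n))     # marginal density under H1
    bf01 = m0 / m1
    return prior_null * bf01 / (prior_null * bf01 + (1 - prior_null))

for n in [10, 100, 1000, 10_000, 100_000]:
    print(n, round(posterior_prob_null(n), 3))   # p stays at 0.05 while P(H0 | x) climbs toward 1
```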
This document provides lecture notes on hypothesis testing. It begins with an introduction to hypothesis testing and how it differs from estimation in its hypothetical reasoning approach. It then discusses Fisher's significance testing approach, including defining a test statistic, its sampling distribution under the null hypothesis, and calculating a p-value. It provides examples of applying this approach. Finally, it discusses some weaknesses of Fisher's approach identified by Neyman and Pearson and how their approach improved upon it by introducing the concept of alternative hypotheses and pre-data error probabilities.
Mayo Slides: Part I Meeting #2 (Phil 6334/Econ 6614), jemille6
Slides for Meeting #2 of Phil 6334/Econ 6614: Current Debates on Statistical Inference and Modeling (D. Mayo and A. Spanos).
Part I: Bernoulli trials: Plane Jane Version
Likelihoodist vs. significance tester with Bernoulli trials, jemille6
This document provides background on the difference between the likelihoodist and significance testing approaches to analyzing Bernoulli trials. It explains that likelihoodists compare the likelihood of different parameter values, like comparing the likelihood of θ = 0.2 versus θ = 0.8 given observed data. Significance tests compare the likelihood of the null hypothesis versus an alternative, without considering specific alternatives, and reject the null if the data is deemed unlikely under the null. The document uses an example to illustrate how the two approaches can reach different conclusions about the evidence provided by data.
The document discusses Bayes' rule and entropy in data mining. It provides step-by-step derivations of Bayes' rule from definitions of conditional probability and the chain rule. It then gives examples of calculating entropy for variables with different probability distributions, noting that maximum entropy occurs with a uniform distribution where all outcomes are equally likely, while minimum entropy occurs when the probability of one outcome is 1.
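Two tiny worked examples in the same spirit (the diagnostic-test numbers are invented for illustration):

```python
import numpy as np

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B), with P(B) from the law of total probability
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")   # ~0.161

# Entropy H(X) = -sum p log2 p: maximal for a uniform distribution, zero for a certain one
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))          # 1.0 bit (uniform, maximum for two outcomes)
print(entropy([0.9, 0.1]))          # ~0.47 bits
print(entropy([1.0, 0.0]))          # 0.0 bits (no uncertainty)
```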
The document summarizes key concepts in hypothesis testing including:
- The null and alternative hypotheses are formulated, with the null hypothesis stating the parameter equals a specific value and the alternative allowing other values.
- There are two types of errors - type I rejects the null when true, type II accepts when false. Tests aim to minimize both.
- The power of a test is the probability it correctly rejects the null when an alternative is true.
- One-tailed tests have critical regions in one tail, two-tailed in both. P-values are used to determine if results are significant.
- Steps of hypothesis testing are outlined along with examples of tests for single and two means/proportions.
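As a worked illustration of the steps listed above, here is a two-sample test for proportions with invented counts:

```python
import numpy as np
from scipy.stats import norm

# H0: p1 = p2 vs H1: p1 != p2 (two-sample z-test for proportions)
x1, n1 = 45, 200     # successes / trials in group 1
x2, n2 = 30, 220     # successes / trials in group 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                              # pooled proportion under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * norm.sf(abs(z))
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```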
Equational axioms for probability calculus and modelling of Likelihood ratio ..., Advanced-Concepts-Team
Based on the theory of meadows, an equational axiomatisation is given for probability functions on finite event spaces, and completeness of the axioms is stated with some pointers to how that is shown. A simplified model of courtroom subjective probabilistic reasoning is then provided in terms of a protocol with two proponents: the trier of fact (TOF, the judge) and the moderator of evidence (MOE, the scientific witness). The idea is outlined of performing a step of Bayesian reasoning by applying a transformation of the subjective probability function of TOF on the basis of different pieces of information obtained from MOE, and the central role of the so-called Adams transformation is highlighted. A simple protocol is considered in which MOE transfers to TOF first a likelihood ratio for a hypothesis H and a potential piece of evidence E, and thereupon the additional assertion that E holds true. As an alternative, a second protocol is considered in which MOE transfers two successive likelihoods (whose quotient is the mentioned ratio) followed by the factuality of E. It is outlined how the Adams transformation describes information processing on the TOF side in both protocols and that the resulting probability distribution is the same in both cases. Finally, it is indicated how the Adams transformation also allows the required update of subjective probability on the MOE side, so that both sides in the protocol may be assumed to comply with the demands of subjective probability.
Categorical data analysis full lecture note PPT.pptx, MinilikDerseh1
This document provides an overview of categorical data analysis techniques. It discusses categorical and quantitative variables, different types of categorical variables, and common distributions for categorical data like binomial and multinomial. Methods for categorical data like chi-square tests, logistic regression, and Poisson regression are presented. Examples are provided to illustrate hypothesis testing, confidence intervals, and likelihood ratio tests for categorical proportions.
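A small worked sketch of a chi-square test of independence with an invented contingency table, computed by hand with SciPy used only for the tail probability:

```python
import numpy as np
from scipy.stats import chi2

# 2x3 contingency table: rows = treatment groups, columns = response categories
observed = np.array([[30, 45, 25],
                     [20, 50, 30]])

row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot @ col_tot / observed.sum()           # expected counts under independence

stat = ((observed - expected) ** 2 / expected).sum()    # Pearson chi-square statistic
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p_value:.3f}")
```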
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to challenges are discussed, like using Bayes factors which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
This lecture introduces Bayesian hypothesis testing. It discusses an example comparing HIV infection rates between a treatment and placebo group. A Bayesian analysis is presented that calculates posterior probabilities for the null and alternative hypotheses using prior probabilities and Bayes factors. The lecture outlines general notation for Bayesian testing and discusses issues like choosing prior distributions and testing precise versus imprecise hypotheses. It also discusses interpreting Bayes factors and relates posterior probabilities to p-values in some cases.
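A compact illustration of a Bayes factor for a precise null, using a binomial model with a uniform prior under the alternative; the counts are invented, and this is not the HIV example from the lecture.

```python
from math import comb
from scipy.special import betaln
import numpy as np

def bf01_binomial(k, n, p0=0.5):
    """Bayes factor for H0: p = p0 versus H1: p ~ Uniform(0, 1),
    for k successes in n Bernoulli trials."""
    log_m0 = np.log(comb(n, k)) + k * np.log(p0) + (n - k) * np.log(1 - p0)
    log_m1 = np.log(comb(n, k)) + betaln(k + 1, n - k + 1)   # marginal under the uniform prior
    return np.exp(log_m0 - log_m1)

bf = bf01_binomial(k=60, n=100)
post_h0 = bf / (1 + bf)                     # posterior probability of H0 with prior odds 1
print(f"BF01 = {bf:.3f}, P(H0 | data) = {post_h0:.3f}")
# Note: the two-sided p-value here is borderline significant, yet the Bayes factor is close to 1.
```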
This document provides an overview of statistical inference concepts including:
1. Best unbiased estimators, which have minimum mean squared error for a given parameter. The best unbiased estimator, if it exists, must be a function of a sufficient statistic.
2. Sufficiency and the Rao-Blackwell theorem, which states that conditioning an estimator on a sufficient statistic produces a uniformly better estimator.
3. The Cramér-Rao lower bound, which provides a lower bound on the variance of unbiased estimators. Examples are given to illustrate key concepts like when the bound may not hold.
4. Examples are worked through to find minimum variance unbiased estimators, maximum likelihood estimators, and confidence intervals for various distributions
The document discusses random variables and probability distributions. It defines a random variable as a function that assigns a numerical value to each outcome in a sample space. Random variables can be discrete or continuous. The probability distribution of a random variable describes its possible values and the probabilities associated with each value. It then discusses the binomial distribution in detail as an example of a theoretical probability distribution. The binomial distribution applies when there are a fixed number of independent yes/no trials, each with the same constant probability of success.
Fuzzy relations, fuzzy graphs, and the extension principle are three important concepts in fuzzy logic. Fuzzy relations generalize classical relations to allow partial membership and describe relationships between objects to varying degrees. Fuzzy graphs describe functional mappings between input and output linguistic variables. The extension principle provides a procedure to extend functions defined on crisp domains to fuzzy domains by mapping fuzzy sets through functions. These concepts form the foundation of fuzzy rules and fuzzy arithmetic.
1) The document discusses the limiting behavior of the "probability of claiming superiority" (PST) in Bayesian clinical trials as the sample size increases.
2) The main result is that under certain conditions, the PST (also called average power) converges to the prior probability that the alternative hypothesis is true.
3) The two key assumptions for this limiting result are: 1) the posterior distribution is "π-consistent", and 2) the prior probability of the boundary of the alternative hypothesis set is zero.
This document discusses frequentist approaches in physics, including Fisher's p-values and Neyman-Pearson statistics. It outlines the key differences between the two approaches, such as Fisher dealing with one hypothesis while Neyman-Pearson requires two. Some criticisms of each approach are presented, such as p-values not representing the probability of hypotheses. An attempt to merge the two approaches using likelihoods ratios and "objective probabilities" is described, but this essentially results in a Bayesian method. The document raises open issues around how frequently statistics are applied in physics.
This document discusses some uncertainties and difficulties that can arise when applying Bayesian concepts such as Bayes factors and model selection. Specifically, it notes that Bayes factors can depend on the choice of prior distributions and may not correspond to the "full model" when using approximate Bayesian computation. It also discusses issues with assigning probability to point null hypotheses and notes that improper priors cannot be used when computing Bayes factors, as the normalization would be undefined. The document aims to bring to light some of the subtleties and open problems within the Bayesian framework.
The document discusses fuzzy measures and belief theory. It begins by defining fuzzy sets and fuzzy measures, which assign a degree of membership between 0 and 1 to subsets of a universal set. Belief and plausibility measures are then introduced as generalizations of probability measures that satisfy additional axioms. Combining evidence from multiple sources is discussed, along with deriving a basic assignment from a belief measure and combining basic assignments using Dempster's rule of combination. An example combines the assessments of two experts examining a painting to derive joint belief. Marginal basic assignments are also briefly mentioned.
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
The document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations that are close to the observed data based on a distance measure and tolerance level. This provides samples from an approximation of the posterior distribution. The document provides examples that motivate ABC and outlines the basic ABC algorithm. It also discusses extensions and improvements to the standard ABC method.
A discussion of Chib, Shin, and Simoni (2017-8), Bayesian moment models, by Christian Robert
This document discusses Bayesian estimation of conditional moment models. It presents several approaches for completing conditional moment models for Bayesian processing, including using non-parametric parts, empirical likelihood Bayesian tools, or maximum entropy alternatives. It also discusses simplistic ABC alternatives and innovative aspects of introducing tolerance parameters for misspecification and cancelling conditional aspects. Unconditional and conditional model comparison using empirical likelihoods and Bayes factors is proposed.
Reading Testing a point-null hypothesis, by Jiahuan Li, Feb. 25, 2013
1. Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence
JAMES O. BERGER and THOMAS SELLKE
25.02.2013
2.-6. Content
1 INTRODUCTION
2 POSTERIOR PROBABILITIES AND ODDS
3 LOWER BOUNDS ON POSTERIOR PROBABILITIES
  Introduction
  Lower Bounds for G_A = {All Distributions}
  Lower Bounds for G_S = {Symmetric Distributions}
  Lower Bounds for G_US = {Unimodal, Symmetric Distributions}
  Lower Bounds for G_NOR = {Normal Distributions}
4 MORE GENERAL HYPOTHESES AND CONDITIONAL CALCULATIONS
  General Formulation
  More General Hypotheses
5 CONCLUSIONS
7. 1. Introduction
The paper studies the problem of testing a point null hypothesis. Of interest is the relationship between the P value and conditional and Bayesian measures of evidence against the null hypothesis.
* The overall conclusion is that P values can be highly misleading measures of the evidence provided by the data against the null hypothesis.
8. 1. Introduction
Consider the simple situation of observing a random quantity X having density f(x | θ), θ ∈ Θ ⊂ R¹, where it is desired to test the null hypothesis H0 : θ = θ0 versus the alternative hypothesis H1 : θ ≠ θ0. For a test statistic T(X) and observed value x, the P value is
$$ p = \Pr_{\theta=\theta_0}\bigl(T(X) \ge T(x)\bigr). $$
9. 1. Introduction
Example
Suppose that X = (X1, ..., Xn), where the Xi are i.i.d. N(θ, σ²). Then the usual test statistic is
$$ T(X) = \sqrt{n}\,|\bar{X} - \theta_0|/\sigma, $$
where $\bar{X}$ is the sample mean, and
$$ p = 2(1 - \Phi(t)), $$
where Φ is the standard normal cdf and
$$ t = T(x) = \sqrt{n}\,|\bar{x} - \theta_0|/\sigma. $$
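As a quick check of the formulas above, here is a minimal sketch (not part of the slides) computing t and the two-sided P value for simulated normal data; the sample size, mean shift, and seed are arbitrary illustration choices.

```python
import numpy as np
from scipy.stats import norm

def t_and_p(x, theta0, sigma):
    """Test statistic t = sqrt(n)|x_bar - theta0|/sigma and p = 2(1 - Phi(t))."""
    n = len(x)
    t = np.sqrt(n) * abs(np.mean(x) - theta0) / sigma
    return t, 2.0 * (1.0 - norm.cdf(t))

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)   # data drawn away from theta0 = 0
print(t_and_p(x, theta0=0.0, sigma=1.0))
```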
10. 1. Introduction
We presume that the classical approach is the report of p, rather than the report of a Neyman-Pearson error probability. This is because most statisticians prefer the use of P values, feeling it important to indicate how strong the evidence against H0 is. The alternative measures of evidence we consider are based on knowledge of x.
There are several well-known criticisms of testing a point null hypothesis.
One is the issue of 'statistical' versus 'practical' significance: one can get a very small p even when |θ − θ0| is so small as to make θ equivalent to θ0 for practical purposes.
Another well-known criticism is 'Jeffreys's paradox'.
11. 1. Introduction
Example
Consider a Bayesian who chooses a prior distribution on θ that gives probability 0.5 to each of H0 and H1 and spreads the mass on H1 out according to an N(θ0, σ²) density. It will be seen in Section 2 that the posterior probability of H0, Pr(H0 | x), is given by
$$ \Pr(H_0 \mid x) = \Bigl(1 + (1+n)^{-1/2} \exp\bigl\{t^2 / [2(1 + 1/n)]\bigr\}\Bigr)^{-1}. $$
12. 1. Introduction
Table 1: Pr(H0 | x) for the Jeffreys-type prior
                                   n
  p      t       1     5     10    20    50    100   1,000
 .10   1.645    .42   .44   .47   .56   .65   .72    .89
 .05   1.960    .35   .33   .37   .42   .52   .60    .82
 .01   2.576    .21   .13   .14   .16   .22   .27    .53
 .001  3.291   .086  .026  .024  .026  .034  .045   .124
The conflict between p and Pr(H0 | x) is apparent.
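A minimal sketch (mine, not the authors') that reproduces Table 1 from the closed-form expression above, assuming π0 = 1/2 and g = N(θ0, σ²):

```python
import numpy as np

def pr_h0_jeffreys(t, n):
    """Pr(H0 | x) = [1 + (1+n)^(-1/2) exp{t^2 / (2(1 + 1/n))}]^(-1), with pi0 = 1/2."""
    return 1.0 / (1.0 + (1.0 + n) ** -0.5 * np.exp(t ** 2 / (2.0 * (1.0 + 1.0 / n))))

for t in (1.645, 1.960, 2.576, 3.291):
    print(t, [round(pr_h0_jeffreys(t, n), 3) for n in (1, 5, 10, 20, 50, 100, 1000)])
```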
13. 1. Introduction
Example
Again consider a Bayesian who gives each hypothesis prior probability 0.5, but now suppose that he decides to spread out the mass on H1 in the symmetric fashion that is as favorable to H1 as possible. The corresponding values of Pr(H0 | x) are determined in Section 3 and are given in Table 2 for certain values of t.
Table 2: Pr(H0 | x) for a prior biased toward H1
  P value (p)    t       Pr(H0 | x)
     .10       1.645       .340
     .05       1.960       .227
     .01       2.576       .068
     .001      3.291       .0088
14. 1. Introduction
Example (A Likelihood Analysis)
It is common to perceive the comparative evidence provided by x for two possible parameter values, θ1 and θ2, as being measured by the likelihood ratio
$$ l_x(\theta_1 : \theta_2) = f(x \mid \theta_1)/f(x \mid \theta_2). $$
A lower bound on the comparative evidence would be
$$ l_x = \inf_{\theta} l_x(\theta_0 : \theta) = \frac{f(x \mid \theta_0)}{\sup_\theta f(x \mid \theta)} = \exp\{-t^2/2\}, $$
the last equality holding in the normal example above.
15. 1. Introduction
Values of l_x for various t are given in Table 3 (computed from l_x = exp{−t²/2}).
Table 3: Bounds on the comparative likelihood
  P value (p)    t       Likelihood ratio lower bound (l_x)
     .10       1.645       .258
     .05       1.960       .146
     .01       2.576       .036
     .001      3.291       .0044
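The table entries follow directly from l_x = exp(−t²/2); a one-line sketch (not from the slides) to verify them:

```python
import numpy as np

for p, t in [(.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)]:
    print(p, t, round(np.exp(-t ** 2 / 2.0), 4))   # likelihood ratio lower bound l_x
```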
16. 2. Posterior probabilities and odds
Let 0 < π0 < 1 denote the prior probability of H0, and let π1 = 1 − π0 denote the prior probability of H1. Suppose that the mass on H1 is spread out according to the density g(θ).
A realistic hypothesis is H0 : |θ − θ0| ≤ b, with prior probability π0 assigned to {θ : |θ − θ0| ≤ b}.
(To a Bayesian, a point null test is typically reasonable only when the prior distribution is of this form.)
17. 2. Posterior probabilities and odds
Noting that the marginal density of X is
$$ m(x) = f(x \mid \theta_0)\,\pi_0 + (1 - \pi_0)\, m_g(x), \qquad \text{where } m_g(x) = \int f(x \mid \theta)\, g(\theta)\, d\theta, $$
the posterior probability of H0 is given by
$$ \Pr(H_0 \mid x) = f(x \mid \theta_0)\,\pi_0 / m(x) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{m_g(x)}{f(x \mid \theta_0)}\right]^{-1}. $$
Also of interest is the posterior odds ratio of H0 to H1, which is
$$ \frac{\Pr(H_0 \mid x)}{1 - \Pr(H_0 \mid x)} = \frac{\pi_0}{1 - \pi_0} \times \frac{f(x \mid \theta_0)}{m_g(x)}. $$
18. 2. Posterior probabilities and odds
The posterior odds ratio of H0 to H1 is
$$ \underbrace{\frac{\Pr(H_0 \mid x)}{1 - \Pr(H_0 \mid x)}}_{\text{posterior odds}} = \underbrace{\frac{\pi_0}{1 - \pi_0}}_{\text{prior odds}} \times \underbrace{\frac{f(x \mid \theta_0)}{m_g(x)}}_{\text{Bayes factor } B_g(x)}. $$
Interest in the Bayes factor centers around the fact that it does not involve the prior probabilities of the hypotheses, and hence it is sometimes interpreted as the actual odds of the hypotheses implied by the data alone.
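A small illustrative helper (not from the paper) showing the conversion "posterior odds = prior odds × Bayes factor" and back to a posterior probability; the numerical values are arbitrary.

```python
def pr_h0_from_bayes_factor(pi0, bg):
    """Posterior probability of H0 from prior probability pi0 and Bayes factor Bg(x)."""
    post_odds = (pi0 / (1.0 - pi0)) * bg   # posterior odds of H0 versus H1
    return post_odds / (1.0 + post_odds)

print(pr_h0_from_bayes_factor(pi0=0.5, bg=1.0 / 3.0))   # 0.25
```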
19. 2. Posterior probabilities and odds
Example (Jeffreys-Lindley paradox)
Suppose that π0 is arbitrary and g is again N(θ0, σ²). Since a sufficient statistic for θ is X̄ ∼ N(θ, σ²/n), we have that m_g(x̄) is an N(θ0, σ²(1 + n⁻¹)) density. Thus
$$ B_g(x) = \frac{f(\bar{x} \mid \theta_0)}{m_g(\bar{x})}
 = \frac{[2\pi\sigma^2/n]^{-1/2} \exp\{-\tfrac{n}{2}(\bar{x}-\theta_0)^2/\sigma^2\}}
        {[2\pi\sigma^2(1+n^{-1})]^{-1/2} \exp\{-\tfrac{1}{2}(\bar{x}-\theta_0)^2/[\sigma^2(1+n^{-1})]\}}
 = (1+n)^{1/2} \exp\{-\tfrac{1}{2}\, t^2/(1+n^{-1})\}, $$
and
$$ \Pr(H_0 \mid x) = [1 + (1-\pi_0)/(\pi_0 B_g)]^{-1}
 = \left[1 + \frac{1-\pi_0}{\pi_0}\,(1+n)^{-1/2} \exp\{\tfrac{1}{2}\, t^2/(1+n^{-1})\}\right]^{-1}. $$
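A sketch (assuming π0 = 1/2) that illustrates the paradox numerically: holding t fixed at 1.96 (p ≈ .05) while n grows, Pr(H0 | x) tends to 1.

```python
import numpy as np

def pr_h0(t, n, pi0=0.5):
    bg = (1.0 + n) ** 0.5 * np.exp(-t ** 2 / (2.0 * (1.0 + 1.0 / n)))   # Bayes factor
    return 1.0 / (1.0 + (1.0 - pi0) / pi0 / bg)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(pr_h0(1.96, n), 3))
```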
20. 3. Lower bounds on posterior probabilities
3.1 Introduction
This section examines lower bounds on Pr(H0 | x) when g(θ), the distribution of θ given that H1 is true, is allowed to vary within some class of distributions G:
G_A = {all distributions}
G_S = {all distributions symmetric about θ0}
G_US = {all unimodal distributions symmetric about θ0}
G_NOR = {all N(θ0, τ²) distributions, 0 ≤ τ² < ∞}
Even though these G's are supposed to consist only of distributions on {θ : θ ≠ θ0}, it will be convenient to allow them to include distributions with mass at θ0, so the lower bounds we compute are always attained.
21. 3. Lower bounds on posterior probabilities
3.1 Introduction
Letting
$$ \Pr(H_0 \mid x, G) = \inf_{g \in G} \Pr(H_0 \mid x) \qquad \text{and} \qquad B(x, G) = \inf_{g \in G} B_g(x), $$
we see immediately from the formulas above that
$$ B(x, G) = f(x \mid \theta_0) \big/ \sup_{g \in G} m_g(x) $$
and
$$ \Pr(H_0 \mid x, G) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{1}{B(x, G)}\right]^{-1}. $$
22. 3. Lower bounds on posterior probabilities
3.2 Lower bounds for G_A = {All distributions}
Theorem
Suppose that a maximum likelihood estimate of θ, θ̂(x), exists for the observed x. Then
$$ B(x, G_A) = f(x \mid \theta_0) / f(x \mid \hat{\theta}(x)) $$
and
$$ \Pr(H_0 \mid x, G_A) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{f(x \mid \hat{\theta}(x))}{f(x \mid \theta_0)}\right]^{-1}. $$
23. 3. Lower bounds on posterior probabilities
3.2 Lower bounds for G_A = {All distributions}
Example
In this situation we have
$$ B(x, G_A) = e^{-t^2/2} $$
and
$$ \Pr(H_0 \mid x, G_A) = \left[1 + \frac{1 - \pi_0}{\pi_0}\, e^{t^2/2}\right]^{-1}. $$
24. 3. Lower bounds on posterior probabilities
3.2 Lower bounds for G_A = {All distributions}
For several choices of t, Table 4 gives the corresponding two-sided P values, p, and the values of Pr(H0 | x, G_A), with π0 = 0.5.
Table 4: Comparison of P values and Pr(H0 | x, G_A) when π0 = 0.5
  P value (p)    t       Pr(H0 | x, G_A)    Pr(H0 | x, G_A)/(pt)
     .10       1.645        .205                 1.25
     .05       1.960        .128                 1.30
     .01       2.576        .035                 1.36
     .001      3.291        .0044                1.35
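A sketch (mine) reproducing Table 4, assuming π0 = 1/2: B(x, G_A) = exp(−t²/2), so Pr(H0 | x, G_A) = [1 + exp(t²/2)]⁻¹, with the last column Pr/(pt).

```python
import numpy as np

for p, t in [(.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)]:
    pr = 1.0 / (1.0 + np.exp(t ** 2 / 2.0))   # lower bound on Pr(H0 | x) over G_A
    print(p, t, round(pr, 4), round(pr / (p * t), 2))
```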
25. 3. Lower bounds on posterior probabilities
3.2 Lower bounds for G_A = {All distributions}
Theorem
For t > 1.68 and π0 = 0.5 in Example 1,
$$ \Pr(H_0 \mid x, G_A)/(p\,t) > \sqrt{\pi/2} \approx 1.253. $$
Furthermore,
$$ \lim_{t \to \infty} \Pr(H_0 \mid x, G_A)/(p\,t) = \sqrt{\pi/2}. $$
26. 3. Lower bounds on posterior probabilities
3.3 Lower bounds for G_S = {Symmetric distributions}
There is a large gap between Pr(H0 | x, G_A) and Pr(H0 | x) for the Jeffreys-type single-prior analysis. This reinforces the suspicion that using G_A unduly biases the conclusion against H0 and suggests use of more reasonable classes of priors.
Theorem
$$ \sup_{g \in G_{2PS}} m_g(x) = \sup_{g \in G_S} m_g(x), $$
where G_2PS denotes the symmetric two-point distributions (mass 1/2 at each of θ0 − r and θ0 + r), so
$$ B(x, G_{2PS}) = B(x, G_S) \qquad \text{and} \qquad \Pr(H_0 \mid x, G_{2PS}) = \Pr(H_0 \mid x, G_S). $$
27. 3. Lower bounds on posterior probabilities
3.3 Lower bounds for G_S = {Symmetric distributions}
Example
If t ≤ 1, a calculus argument shows that the symmetric two-point distribution that strictly maximizes m_g(x) is the degenerate "two-point" distribution putting all mass at θ0. Thus B(x, G_S) = 1 and Pr(H0 | x, G_S) = π0 for t ≤ 1.
If t ≥ 1, then m_g(x) is maximized by a nondegenerate element of G_2PS. For moderately large t, the maximum value of m_g(x) for g ∈ G_2PS is very well approximated by taking g to be the two-point distribution putting equal mass at θ̂(x) and at 2θ0 − θ̂(x), so
$$ B(x, G_S) \approx \frac{\varphi(t)}{0.5\,\varphi(0) + 0.5\,\varphi(2t)} \approx 2 \exp\{-0.5\, t^2\}. $$
28. 3. Lower bounds on posterior probabilities
3.3 Lower bounds for G_S = {Symmetric distributions}
Example
For t ≥ 1.645, the first approximation is accurate to within 1 in the fourth significant digit and the second approximation to within 2 in the third significant digit.
Table 5 gives the value of Pr(H0 | x, G_S) for several choices of t.
Table 5: Comparison of P values and Pr(H0 | x, G_S) when π0 = 0.5
  P value (p)    t       Pr(H0 | x, G_S)    Pr(H0 | x, G_S)/(pt)
     .10       1.645        .340                 2.07
     .05       1.960        .227                 2.31
     .01       2.576        .068                 2.62
     .001      3.291        .0088                2.68
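A sketch reproducing Table 5 via the two-point approximation quoted above, assuming π0 = 1/2:

```python
from scipy.stats import norm

for p, t in [(.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)]:
    b = norm.pdf(t) / (0.5 * norm.pdf(0.0) + 0.5 * norm.pdf(2.0 * t))   # B(x, G_S)
    pr = 1.0 / (1.0 + 1.0 / b)                                          # pi0 = 1/2
    print(p, t, round(pr, 4), round(pr / (p * t), 2))
```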
29. 3. Lower bounds on posterior probabilities
3.4 Lower bounds for G_US = {Unimodal, Symmetric distributions}
Minimizing Pr(H0 | x) over all symmetric priors still involves considerable bias against H0. A further 'objective' restriction, which would seem reasonable to many, is to require the prior to be unimodal, or non-increasing in |θ − θ0|.
Theorem
$$ \sup_{g \in G_{US}} m_g(x) = \sup_{g \in U_S} m_g(x), $$
with U_S = {all uniform distributions symmetric about θ0}, so B(x, G_US) = B(x, U_S) and Pr(H0 | x, G_US) = Pr(H0 | x, U_S).
30. 3. Lower bounds on posterior probabilities
3.4 Lower bounds for G_US = {Unimodal, Symmetric distributions}
Theorem
If t ≤ 1 in Example 1, then B(x, G_US) = 1 and Pr(H0 | x, G_US) = π0. If t > 1, then
$$ B(x, G_{US}) = \frac{2\varphi(t)}{\varphi(K+t) + \varphi(K-t)} $$
and
$$ \Pr(H_0 \mid x, G_{US}) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{\varphi(K+t) + \varphi(K-t)}{2\varphi(t)}\right]^{-1}, $$
where K > 0 (the half-width of the maximizing symmetric uniform prior, i.e., the value maximizing [Φ(K+t) + Φ(K−t) − 1]/(2K)).
Figures 1 and 2 give values of K and B for various values of t in this problem.
31. 3. Lower bounds on posterior probabilities
3.4 Lower bounds for G_US = {Unimodal, Symmetric distributions}
[Figures 1 and 2: values of K and B(x, G_US) for various values of t.]
32. 3. Lower bounds on posterior probabilities
3.4 Lower bounds for G_US = {Unimodal, Symmetric distributions}
Table 6 gives Pr(H0 | x, G_US) for some specific important values of t and π0 = 0.5.
Table 6: Comparison of P values and Pr(H0 | x, G_US) when π0 = 0.5
  P value (p)    t       Pr(H0 | x, G_US)    Pr(H0 | x, G_US)/(pt)
     .10       1.645        .390                 1.44
     .05       1.960        .290                 1.51
     .01       2.576        .109                 1.64
     .001      3.291        .018                 1.66
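A sketch reproducing the Pr(H0 | x, G_US) column of Table 6, assuming π0 = 1/2: for each t > 1, K is found numerically as the maximizer of [Φ(K+t) + Φ(K−t) − 1]/(2K), and then B(x, G_US) = 2φ(t)/[φ(K+t) + φ(K−t)].

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def pr_h0_gus(t, pi0=0.5):
    # K maximizes the average density of a symmetric uniform prior at x-bar
    neg = lambda k: -(norm.cdf(k + t) + norm.cdf(k - t) - 1.0) / (2.0 * k)
    k = minimize_scalar(neg, bounds=(1e-6, 20.0), method="bounded").x
    b = 2.0 * norm.pdf(t) / (norm.pdf(k + t) + norm.pdf(k - t))   # B(x, G_US)
    return 1.0 / (1.0 + (1.0 - pi0) / pi0 / b)

for p, t in [(.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)]:
    print(p, t, round(pr_h0_gus(t), 3))
```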
33. 3. Lower bounds on posterior probabilities
3.5 Lower bounds for G_NOR = {Normal distributions}
We have seen that minimizing Pr(H0 | x) over g ∈ G_US is the same as minimizing over g ∈ U_S. Although using U_S is much more reasonable than using G_A, there is still some residual bias against H0 involved in using U_S.
Theorem
If t ≤ 1 in Example 1, then B(x, G_NOR) = 1 and Pr(H0 | x, G_NOR) = π0. If t > 1, then
$$ B(x, G_{NOR}) = \sqrt{e}\; t\, e^{-t^2/2} $$
and
$$ \Pr(H_0 \mid x, G_{NOR}) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{\exp\{t^2/2\}}{\sqrt{e}\; t}\right]^{-1}. $$
34. 3. Lower bounds on posterior probabilities
3.5 Lower bounds for G_NOR = {Normal distributions}
Table 7 gives Pr(H0 | x, G_NOR) for several values of t.
Table 7: Comparison of P values and Pr(H0 | x, G_NOR) when π0 = 0.5
  P value (p)    t       Pr(H0 | x, G_NOR)    Pr(H0 | x, G_NOR)/(pt)
     .10       1.645        .412                 1.52
     .05       1.960        .321                 1.67
     .01       2.576        .133                 2.01
     .001      3.291        .0235                2.18
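A sketch reproducing the Pr(H0 | x, G_NOR) column of Table 7 from B(x, G_NOR) = √e · t · exp(−t²/2), assuming π0 = 1/2:

```python
import numpy as np

for p, t in [(.10, 1.645), (.05, 1.960), (.01, 2.576), (.001, 3.291)]:
    b = np.sqrt(np.e) * t * np.exp(-t ** 2 / 2.0)   # valid for t > 1
    print(p, t, round(1.0 / (1.0 + 1.0 / b), 4))
```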
35. 4. More general hypotheses and conditional calculations
4.1 General formulation
Consider the Bayesian calculation of Pr(H0 | A), where H0 is of the form H0 : θ ∈ Θ0 and A is the set in which x is known to reside. Letting π0 and π1 denote the prior probabilities of H0 and H1, and g0 and g1 the densities on Θ0 and Θ1, we have
$$ \Pr(H_0 \mid A) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{m_{g_1}(A)}{m_{g_0}(A)}\right]^{-1}, \qquad \text{where } m_{g_i}(A) = \int_{\Theta_i} \Pr_\theta(A)\, g_i(\theta)\, d\theta. $$
36. 4. More general hypotheses and conditional calculations
4.1 General formulation
For the general formulation, one can determine lower bounds on Pr(H0 | A) by choosing classes G0 and G1 of densities g0 and g1, respectively, calculating
$$ B(A, G_0, G_1) = \inf_{g_0 \in G_0} m_{g_0}(A) \Big/ \sup_{g_1 \in G_1} m_{g_1}(A), $$
and defining
$$ \Pr(H_0 \mid A, G_0, G_1) = \left[1 + \frac{1 - \pi_0}{\pi_0} \times \frac{1}{B(A, G_0, G_1)}\right]^{-1}. $$
37. 4. More general hypotheses and conditional calculations
4.2 More general hypotheses
Assume in this section that A = {x}. The lower bounds can be applied to a variety of generalizations of point null hypotheses. If Θ0 is a small set about θ0, the general lower bounds turn out to be essentially equivalent to the point null lower bounds.
In Example 1, suppose that the hypotheses were H0 : θ ∈ (θ0 − b, θ0 + b) and H1 : θ ∉ (θ0 − b, θ0 + b). If |t − √n b/σ| ≥ 1 and G0 = G1 = G_S, then B(x, G0, G1) and Pr(H0 | x, G0, G1) are exactly the same as B and Pr for testing the point null.
38. 5. Conclusion
A figure compares B(x, G_US) with the P value calculated at (t − 1)+ instead of t; this comparative likelihood is close to the P value that would be obtained if we replaced t by (t − 1)+. The implication is that the usual rule of thumb:
t = 1 means only mild evidence against H0, t = 2 means significant evidence against H0, t = 3 means highly significant evidence against H0, and t = 4 means overwhelming evidence against H0,
should at least be replaced by the rule of thumb:
t = 1 means no evidence against H0, t = 2 means only mild evidence against H0, t = 3 means significant evidence against H0, and t = 4 means highly significant evidence against H0.
40. References
Edwards, W., Lindman, H., and Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review 70, 193-242.
Ghosh, J., Purkayastha, S., and Samanta, T. Role of P-values and other Measures of Evidence in Bayesian Analysis. Handbook of Statistics, Vol. 25.
Berger, J. O. Statistical Decision Theory and Bayesian Analysis. Springer Series in Statistics.