1.
Introduction to Hypothesis Testing:
Suppose you have to buy cornflakes from a
salesman. The issue is not the price of
cornflakes but the amount of cornflakes in
each box. The salesman appears and claims
that the cornflakes he is selling are packaged
at 10 oz/box. You have exactly 4
alternative possible views of his claim.
2.
He is honest and there is 10 oz/box;
μ = 10 oz
He is conservative and there is more than 10 oz/box;
μ > 10 oz
He is trying to cheat you and there is less than 10 oz/box;
μ < 10 oz
He is new on the job and does not really know the
amount per box; his claim could be high or low, i.e.
μ ≠ 10 oz.
3.
If you think he is honest you would just go
ahead and order your cornflakes from him.
You may, however, hold one of the other views:
he is i) CONSERVATIVE or
ii) a LIAR or
iii) CLUELESS.
The position you hold regarding the salesman
can be any one of these but not more than
one. You can’t assume he is a liar and
conservative, i.e. μ < 10 oz and μ > 10 oz, at
the same time.
4.
Proper use of scientific method will allow you to
test one of these alternative positions through a
sampling process. Remember you can choose only
one to test.
How would you decide?
5.
CASE I: Testing that the salesman is conservative
Suppose the salesman is remarkably shy and seems
to lack self-confidence. You feel from his general
conduct that he is being conservative in his claim
of 10 oz/box. The situation can be summarized
with a pair of hypotheses (actually a pair of
predictions).
A) The first is the salesman’s claim, the prediction we
will directly test. It is usually called Ho, or the
null hypothesis. In this case
Ho: μ = 10 oz.
6.
B) The second is called the alternative or
research hypothesis, which is your belief or
position. The alternative hypothesis in this
case is Ha: μ > 10 oz. Writing the null
hypothesis as Ho: μ ≤ 10 oz, the predictions take
the following form:
Ho: μ ≤ 10 oz (null hypothesis)
Ha: μ > 10 oz (alternative hypothesis)
We have now generated two mutually exclusive
and all-inclusive possibilities. Therefore,
either Ho or Ha will be true, but not both.
7.
Hypotheses
A. Salesman’s claim (Ho)
B. Customer’s belief or position (Ha)
8.
In order to test the salesman’s claim (Ho)
against your views (Ha), you decide to do a
small experiment. You select 25 boxes of
cornflakes from a consignment and carefully
empty each box, weigh and record its
contents. This experimental sampling is done
after you have formulated the two
hypotheses. If the first hypothesis were true
you would expect the sample mean of the 25
boxes to be close to or less than 10 oz.
9.
If the second hypothesis were true you would
expect the sample mean to be significantly
greater than 10 oz. We have to think about
what significantly greater means in this
context. In statistics significantly less or
more or different means that the result of
the experiment would be a rare result if the
null hypothesis were true. In other words,
the result is far enough from the prediction
in the null hypothesis that we feel that we
must reject the truthfulness of the
hypothesis.
10.
This idea leads to the problem of what counts as a rare
result, or a result rare enough to make us sufficiently
suspicious of the null hypothesis. For now we
will say that a result is rare if it could occur by chance
less than 1 in 20 times if the null hypothesis
were true. When that happens we will reject the null
hypothesis and consequently accept the
alternative one. Let’s now look at how this
decision-making criterion works in CASE I.
11.
Ho : μ ≤ 10 oz
Ha : μ > 10 oz
n = 25, and assume σ = 1.0 oz and is widely
known.
12.
Suppose the mean of your 25 box sample is 10.36
oz. Is that significantly different from (>) 10 oz,
so that we should reject the claim of 10 oz
stated in Ho? Clearly it is greater than 10 oz, but
is this mean rare enough under the claim of μ ≤
10 oz for us to reject the claim?
To answer this question we will use the standard
normal transformation to find the probability of X̄
≥ 10.36 oz when the mean of the sampling
distribution of X̄ is 10 oz. If this probability is
less than 0.05 (1 in 20), we consider the result
to be too rare for acceptance of Ho.
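The probability described above can be sketched in a few lines of Python. This is only an illustration: the function name `upper_tail_p` is made up for this sketch, and it assumes σ = 1.0 oz is known, as in CASE I.

```python
import math

def upper_tail_p(sample_mean, mu0, sigma, n):
    """Return (z, P(X-bar >= sample_mean)) when the true mean is mu0.

    Hypothetical helper for this sketch; sigma is assumed known.
    """
    # Standard normal transformation of the sample mean.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Upper-tail area of the standard normal via the complementary error function.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

z, p = upper_tail_p(10.36, 10.0, 1.0, 25)
print(f"z = {z:.2f}, P(X-bar >= 10.36) = {p:.4f}")
print("rare under Ho (p < 0.05)" if p < 0.05 else "not rare under Ho")
```

The 0.05 cutoff is the 1 in 20 criterion from the text; any other level of significance could be substituted in the last line.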
13.
CASE II: Testing that the salesman is a cheat
Suppose our salesman is a fast and smooth
talker with fancy clothes and a new sports car.
Your view might be that cornflakes salesmen
only gain this type of affluence through
unethical practices. You think this guy is a
cheat. Your null hypothesis is Ho: μ ≥ 10 oz
and your alternative hypothesis is Ha: μ < 10
oz. Notice that the two hypotheses are again
mutually exclusive and all-inclusive and that
the equal sign is always in the null
hypothesis.
14.
It is the null hypothesis (the salesman’s claim)
that will be tested.
Ho : μ ≥ 10 oz
Ha : μ < 10 oz.
Suppose you again sample 25 boxes to
determine the average weight. The question
you want to answer and the predictions (Ho,
Ha) stemming from that question are again
formulated before the sampling is done.
15.
σ = 1.0 oz, n = 25,
and again we find X̄ =
10.36 oz. How does this result fit our
predictions? If Ho is false, we expect the
mean X̄ to be significantly less than 10 oz.
16.
CASE III: Testing that the salesman is clueless
The last case is somewhat different from the
first in that we really don’t know whether to
expect the mean of the sample to be higher
or lower than the salesman’s claim. The
salesman is new on the job and does not
know his product very well. The claim of 10
oz per box is what he has been told, but you
don’t have a sense that he is either overly
conservative (CASE I) or dishonest (CASE II).
Your alternative hypothesis here is less
focused.
17.
It is simply that the mean is different from 10
oz. The predictions become
Ho: μ = 10 oz
Ha: μ ≠ 10 oz.
Under Ho we expect X̄ to be
close to 10 oz, while under Ha we expect X̄
to be different from 10 oz in either
direction, i.e. significantly
smaller or significantly larger
than 10 oz.
18.
1. State the problem: should I buy cornflakes
from the salesman?
2. Formulate the null and alternative hypotheses
Ho : μ = 10 oz
Ha : μ ≠ 10 oz
3. Choose the level of significance. This means to
choose the probability of rejecting a true null
hypothesis. We chose 1 in 20 in our cornflakes
example, that is, 5% or 0.05. When Z was so
extreme as to occur less than 1 in 20 times if
Ho were true, we rejected Ho.
19.
4. Determine the appropriate test statistic. Here
we mean the index whose sampling
distribution is known, so that objective
criteria can be used to decide between Ho
and Ha. In the cornflakes example we used a
Z transformation because under the Central
Limit Theorem X̄ was assumed to be
normally or approximately normally
distributed and the value of σ
was known. Z is calculated as
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
20.
5. Calculate the appropriate test statistic. Only
after the first four steps are completed can
one do the sampling and generate the so-called test statistic.
Here
$Z = \frac{10.36 - 10.00}{1/\sqrt{25}} = \frac{0.36}{0.20} = 1.8$
21.
6. Determine the critical values for the
sampling distribution and appropriate level
of significance. For the two-tailed test and
level of significance of 1 in 20 we have
critical values of ± 1.960 (Table C.3). These
values or more extreme ones only occur 1 in
20 times if Ho is true. The critical values
serve as cutoff points in the sampling
distribution for regions to reject Ho.
22.
7. Compare the test statistic to the critical
values. In a two-tailed test, the CVs = ±
1.960 and the test statistic is 1.8, so
-1.960 < 1.8 < 1.960.
8. Based on the comparison in step 7, accept
or reject Ho. Since Z falls between the
critical values, it is not extreme enough to
reject Ho.
9. State your conclusion and answer the
question posed in step 1. SO WE ACCEPT HO.
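Steps 4 to 8 can be condensed into a short Python sketch. The function name `two_tailed_z_test` is hypothetical, σ = 1.0 oz is assumed known as in the cornflakes example, and the critical value 1.960 is simply the tabled value quoted in step 6.

```python
import math

def two_tailed_z_test(sample_mean, mu0, sigma, n, critical=1.960):
    """Return (z, reject) for a two-tailed Z test with known sigma.

    Hypothetical helper; `critical` defaults to the tabled value for alpha = 0.05.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Reject Ho only if z falls outside the interval (-critical, +critical).
    reject = abs(z) > critical
    return z, reject

z, reject = two_tailed_z_test(10.36, 10.0, 1.0, 25)
print(f"z = {z:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```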
23.
Because the predictions in Ho and Ha are written so
that they are mutually exclusive and all-inclusive,
we have a situation where one is true and the other
is automatically false.
When Ho is true, then Ha is false.
If we accept Ho, we have done the right thing.
If we reject Ho, we have made an error.
This type of mistake is called a
Type I error.
24.
When Ho is false, then Ha is true.
If we accept Ho, we have made an error.
If we reject Ho, we have done the right
thing.
The second type of mistake is
called a Type II error.
25.
t-test (Hypothesis involving the mean)
Example 1. A forest ecologist, studying
regeneration of rain forest communities in
gaps caused by large trees falling during
storms, read that stinging (bow) tree,
Dendrocnide excelsa, seedlings will grow
1.5 m/yr in direct sunlight in each gap. In
the gaps in her study plot she identified 9
specimens of this species and measured them
in 2009 and again 1 yr later. Listed below are
the changes in height for the nine specimens.
26.
Do her data support the published contention
that seedlings of this species will average 1.5
m of growth per yr in direct sunlight?
1.9 2.5 1.6 2.0 1.5 2.7 1.9 1.0 2.0
Solution
Hypotheses: Ho: μ = 1.5 m/yr
Ha: μ ≠ 1.5 m/yr
27.
If the sample mean for the 9 specimens is close to 1.5
m/yr we will accept Ho. If the sample mean is
significantly larger or smaller than 1.5 m/yr we
will accept Ha (reject Ho). By significantly
different we mean that such sample means are so rare that
they would occur by chance less than 5% of the
time if Ho is true, i.e. α = 0.05. The test statistic will
be
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
28.
Here, n = 9, X̄ = 1.90 m, s² = 0.260 m², s = 0.51 m, and
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{1.90 - 1.50}{0.51/\sqrt{9}} = \frac{0.40}{0.51/3} = 2.35$
Clearly the t-value of 2.35 is not zero, but is it far
enough away from zero that we can
comfortably reject Ho? With a predetermined
α level of 0.05 we must get a t-value so far
from zero that it would occur <5% of
the time if Ho is true.
29.
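From Tab C.4 we have the following sampling
distribution for t with v = n-1 = 8 and α = 0.05
for a two-tailed test.
[Figure: sampling distribution of t with v = 8. Reject regions of area 0.025 lie below -2.306 and above +2.306, with the accept region between them centered on 0. The observed t = 2.35 is marked.]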
30.
If Ho is true and we sample hundreds or
thousands of times with samples of 9 specimens
and each time we calculate the t-value for
the sample, these t-values would form a
distribution with the shape indicated above.
2.5% of the samples would generate t-values
below -2.306 and 2.5% of the samples would
generate t-values above 2.306. So values as
extreme as ± 2.306 are rare if Ho is true.
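The repeated-sampling idea can be tried out directly with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with mean 1.5 m/yr (Ho true), the population standard deviation of 0.5 is an arbitrary choice made only for the illustration, and the seed and number of trials are likewise arbitrary.

```python
import math
import random
import statistics

def t_value(sample, mu0):
    """One-sample t statistic for a list of observations."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

rng = random.Random(0)        # fixed seed so the run is repeatable
mu0, sigma, n = 1.5, 0.5, 9   # sigma = 0.5 is an assumed value, not from the example
trials = 20000

extreme = 0
for _ in range(trials):
    # Draw one sample of 9 from a population in which Ho is true.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    if abs(t_value(sample, mu0)) > 2.306:
        extreme += 1

fraction = extreme / trials
print(f"fraction of t-values beyond ±2.306: {fraction:.3f}")
```

The printed fraction should land near 0.05, which is the sense in which such t-values are rare when Ho is true.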
31.
The test statistic in this sample is 2.35 and
since 2.35>2.306, the result would be
considered rare for a true null hypothesis.
We reject Ho based on this comparison and
conclude that average growth of stinging
trees in direct sun light is different from the
published value and is, in fact, greater than
1.5 m/yr.
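The whole calculation for Example 1 can be checked with a short Python sketch. The data are the nine height changes listed in the example; the critical value 2.306 is copied from the table lookup above rather than computed, and the variable names are only illustrative.

```python
import math
import statistics

growth = [1.9, 2.5, 1.6, 2.0, 1.5, 2.7, 1.9, 1.0, 2.0]  # m/yr, from the example
mu0 = 1.5        # published growth rate under Ho
t_crit = 2.306   # two-tailed, alpha = 0.05, 8 df (value quoted from the table)

n = len(growth)
mean = statistics.mean(growth)
s = statistics.stdev(growth)   # sample SD, n - 1 in the denominator
t = (mean - mu0) / (s / math.sqrt(n))
reject = abs(t) > t_crit

print(f"n = {n}, mean = {mean:.2f}, s = {s:.2f}, t = {t:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```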
32.
Watching an infomercial on TV you hear the
claim that without changing your eating
habits, a particular herbal extract when
taken daily will allow you to lose 5 lb in 5
days. You decide to test this claim by
enlisting 12 of your classmates into an
experiment. You weigh each subject, ask
them to use the herbal extract for 5 days and
then weigh them again. From the results
recorded below, test the infomercial’s claim
of 5 lb lost in 5 days.
34.
Solution: Because the data are paired we are
not directly interested in the values
presented above, but are interested in the
differences or changes within the pairs of
measurements. Think of the data as in two groups:
Group 1    Group 2
X11        X21
X12        X22
X13        X23
…          …
X1n        X2n
For the paired data here we wish to
investigate the differences, or di’s,
where
X11-X21 = d1, X12-X22 = d2, …, X1n-X2n = dn
35.
Expressing the data set in terms of these
differences di’s, we have the following table.
Note the importance of the sign of these differences.
Subject   di     Subject   di
1          8     7          2
2          8     8         -1
3          2     9         -3
4         -1     10         6
5          8     11         5
6          3     12         9
36.
The infomercial claim of a 5 lb loss in 5 days
could be written
Ho: μB - μA = 5 lb, but writing it in terms of the mean
difference μd is somewhat more appealing. The test is left-tailed:
Ho: μd ≥ 5 lb
Ha: μd < 5 lb
Choose α = 0.05. Since the two columns of data
collapse into one column of interest, we
treat these data now as a one-sample
experiment.
37.
There is no preliminary F test and our only
assumption is that the di’s are approximately
normally distributed. The test statistic for
the paired sample t test is
$t = \frac{\bar{X}_d - \mu_d}{s_d/\sqrt{n}}$
with v = n-1, where n is the number of pairs of
data points.
38.
Here X̄d = 3.8 lb, sd = 4.1 lb, n = 12. We
expect this statistic to be close to 0 if Ho
is true, i.e. the herbal extract allows you to
lose 5 lb in 5 days. We expect this
statistic to be significantly different from
0 if the claim is false.
$t = \frac{3.8 - 5}{4.1/\sqrt{12}} = -1.01$
39.
With v = n-1 = 12-1 = 11, the critical value for
this left-tailed test from Tab C.4 is t0.05(11) = -1.796. Since -1.796 < -1.01, the test statistic
does not deviate enough from expectation
under a true Ho for you to reject Ho. The
data gathered from your classmates support
the claim of an average loss of 5 lb in 5 days
with the herbal extract. Because you accept
Ho here, you may be making a Type II error
(accepting a false Ho), but we have no way
of quantifying the probability of this type of
error.
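A minimal Python sketch of the paired calculation follows, using the twelve differences from the table of di’s and the tabled critical value quoted above. Variable names are illustrative only. Working from the unrounded mean and standard deviation gives a t-value of roughly -0.98 rather than the -1.01 obtained from the rounded 3.8 and 4.1; the conclusion is the same.

```python
import math
import statistics

# Differences d_i for subjects 1..12, read from the table of differences.
d = [8, 8, 2, -1, 8, 3, 2, -1, -3, 6, 5, 9]
mu_d0 = 5.0       # claimed mean loss in lb
t_crit = -1.796   # left-tailed, alpha = 0.05, 11 df (value quoted from the table)

n = len(d)
mean_d = statistics.mean(d)
s_d = statistics.stdev(d)   # sample SD of the differences
t = (mean_d - mu_d0) / (s_d / math.sqrt(n))
reject = t < t_crit         # left-tailed: reject only for very negative t

print(f"mean d = {mean_d:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```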
40.
An experiment was conducted to compare the
performance of two varieties of wheat, A
and B. Seven farms were randomly chosen
for the experiment and the yields in metric tons
per hectare for each variety on each farm
were as follows:
Farm   Yield of var. A   Yield of var. B
1      4.6               4.1
2      4.8               4.0
3      3.2               3.5
4      4.7               4.1
5      4.3               4.5
6      3.7               3.3
7      4.1               3.8
41.
a) Why do you think both varieties were tested on
each farm rather than testing variety A on
seven farms and variety B on seven
different farms?
b) Carry out a hypothesis test to decide
whether the mean yields are the same for
the two varieties.
42.
Solution: The experiment was designed to test both
varieties on each farm because different
farms may have significantly different yields
due to differences in
i) soil characteristics
ii) micro climate
iii) cultivation practices
“Pairing” the data points accounts for most of
the “between farm” variability and should
make any difference in yield due solely to
wheat variety.
43.
Farm   Difference (A-B)
1       0.5
2       0.8
3      -0.3
4       0.6
5      -0.2
6       0.4
7       0.3
The hypotheses are
Ho : μA – μB = 0, or μd = 0
Ha : μd ≠ 0
Let α = 0.05. Then X̄d = 0.30
ton/hectare, n = 7,
and sd = 0.41
ton/hectare.
$t = \frac{0.30 - 0}{0.41/\sqrt{7}} = 1.94$
44.
With v = 7-1 = 6, the critical values from Tab C.4
are t0.025(6) = -2.447 and t0.975(6) = 2.447. Since
-2.447 < 1.94 < 2.447, the test statistic does not
deviate enough from 0, the expected t value
if Ho is true, to reject Ho. From the data
given we cannot say that the yields of
varieties A and B are significantly different.
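The wheat comparison can be reproduced starting from the two yield columns. This is a sketch only; the variable names are illustrative, and the critical value 2.447 is the tabled value quoted above.

```python
import math
import statistics

# Yields in metric tons per hectare for farms 1..7, from the table in the example.
yield_a = [4.6, 4.8, 3.2, 4.7, 4.3, 3.7, 4.1]
yield_b = [4.1, 4.0, 3.5, 4.1, 4.5, 3.3, 3.8]
t_crit = 2.447   # two-tailed, alpha = 0.05, 6 df (value quoted from the table)

# Pairing: one difference per farm removes most between-farm variability.
d = [a - b for a, b in zip(yield_a, yield_b)]

n = len(d)
mean_d = statistics.mean(d)
s_d = statistics.stdev(d)
t = (mean_d - 0) / (s_d / math.sqrt(n))
reject = abs(t) > t_crit

print(f"mean d = {mean_d:.2f}, s_d = {s_d:.2f}, t = {t:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```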
45.
Example: A geneticist interested in human
populations has been studying growth
patterns in US males since 1900. A
monograph written in 1902 states that the
mean height of adult US males is 67.0 inch
with a standard deviation of 3.5 inch.
Wishing to see if these values have changed
over the 20th century, the geneticist
measured a random sample of adult US males
and found that X̄ = 69.4 inch and s = 4.0
inch. Are these values significantly different
from the values published in 1902?
46.
Solution: There are two questions here – one
about the mean and the second about the
standard deviation or variance. Two
questions require two sets of hypotheses and
two test statistics. For the question about
means, the hypotheses are
Ho : μ = 67.0 inch
Ha : μ ≠ 67.0 inch
47.
Here n = 28 and α = 0.01. This is a two-tailed test
with the question and hypotheses (Ho and
Ha) formulated before the data were
collected or analyzed.
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{69.4 - 67.0}{4.0/\sqrt{28}} = \frac{2.4}{0.76} = 3.16$
Using an α level of 0.01 for v = n-1 = 27, we find
the critical values to be ± 2.771 (Tab C.4).
48.
Since 3.16 > 2.771, we reject Ho and say that
the modern mean is significantly different from
that reported in 1902 and, in fact, is higher
than the reported value (because the t-value
falls in the right-hand tail). P(Type I error) <
0.01.
For the question about variance, the
hypotheses are
Ho: σ² = 12.25 inch²
Ha: σ² ≠ 12.25 inch²
2
49.
Here n = 28. Then
$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(28-1)(16)}{12.25} = 35.3$
The question about variability is answered with
a Chi-square statistic. The χ² value is
expected to be close to 27 (n-1) if Ho is true
and significantly different from 27 if Ha is
true.
50.
From Table C.5 using an alpha level of 0.01 for
v = 27, we find the critical values for χ²
to be
11.8 and 49.6. Since 11.8 < 35.3 < 49.6 we do
not reject Ho here. There is no statistical
support for Ha. The p value here for P(χ² ≥ 35.3)
is between 0.500 (31.5) and 0.250 (36.7),
indicating the calculated value is not a rare
event under the null hypothesis.
51.
We would conclude that the mean height of
adult US males is higher now than reported
in 1902, but the variability in heights is not
significantly different today than in 1902.
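Both tests in this example work from summary statistics only, so they fit in a few lines of Python. This is a sketch: the variable names are illustrative, and the critical values (±2.771 for t, and 11.8 and 49.6 for χ²) are the tabled values quoted in the solution, not computed here. Because the standard error is not rounded to 0.76, the t-value prints slightly above the hand-calculated 3.16.

```python
import math

n, xbar, s = 28, 69.4, 4.0     # modern sample: size, mean, SD (inch)
mu0, sigma0 = 67.0, 3.5        # values published in 1902 (inch)

# Question 1: has the mean changed? One-sample t statistic.
t = (xbar - mu0) / (s / math.sqrt(n))
reject_mean = abs(t) > 2.771   # two-tailed, alpha = 0.01, 27 df (tabled value)

# Question 2: has the variance changed? Chi-square statistic.
chi2 = (n - 1) * s**2 / sigma0**2
reject_var = not (11.8 < chi2 < 49.6)   # two-tailed, alpha = 0.01, 27 df (tabled values)

print(f"t = {t:.2f}, reject Ho for the mean: {reject_mean}")
print(f"chi-square = {chi2:.1f}, reject Ho for the variance: {reject_var}")
```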
52.
Assumptions for the χ² test for goodness of fit are
that
1. An independent random sample of size n is
drawn from the population.
2. The population can be divided into a set of k
mutually exclusive categories.
3. The expected frequencies for each category
must be specified. Let Ei denote the expected
frequency for the i-th category. The sample
size must be sufficiently large so that each Ei is
at least 5 (categories may be combined to
achieve this).
53.
The hypothesis test takes only one form
Ho : The observed frequency distribution is
the same as the hypothesized frequency
distribution
Ha : The observed and hypothesized
frequency distributions are different
Generally speaking, this is an example of a
statistical test where one wishes to confirm
the null hypothesis.
54.
Test statistic
$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
Let Oi denote the observed frequency of the i-th
category. The test statistic is based on the
difference between the observed and expected
frequencies, Oi - Ei.
The intuition for the test is that if the observed
and expected frequencies are nearly equal for
each category, then each Oi – Ei
will be small and, hence, χ²
will be small. Small values of Chi-square
should lead to acceptance of Ho while large
values lead to rejection. The test is always right-tailed.
Ho is rejected only when the test statistic
exceeds a specified value.
55.
The statistic has an approximate Chi-square
distribution when Ho is true; the
approximation improves as sample size
increases. The values of the Chi-square
distribution are tabulated in Table C.5.
56.
The progeny of self-fertilized four-o’clocks
were expected to flower red, pink and white
in the ratio of 1:2:1. There were 240 progeny
produced with 55 red plants, 132 pink plants,
and 53 white plants. Are these data
reasonably consistent with the Mendelian
1:2:1 ratio?
57.
Solution: The hypotheses are
Ho: The data are consistent with
a Mendelian model (1:2:1)
Ha: The data are inconsistent
with a Mendelian model (1:2:1)
The THREE colours are the THREE categories. In
order to calculate frequencies, no parameters
need to be estimated. The Mendelian ratios are
given; 25% red, 50% pink and 25% white. Using
the fact that there are 240 observations, the
number of expected red four-o’clock is 0.25 ×
240 = 60 ie Ei = 60. Similar calculations for pink
and white yield the following table:
58.
Category   Oi     Ei     (Oi - Ei)²/Ei
Red        55     60     0.42
Pink       132    120    1.20
White      53     60     0.82
Total      240    240    2.44
59.
$\chi^2 = \sum_{i=1}^{3} \frac{(O_i - E_i)^2}{E_i} = 0.42 + 1.20 + 0.82 = 2.44$
60.
v = df = no. of categories-1 = 3-1 = 2. Let α =
0.05.
Because the test is right-tailed, the critical
value occurs when $P(\chi^2 < \chi^2_{1-\alpha}) = 1 - \alpha$. Thus in
Table C.5 for df = 2 and p = 1-α = 0.95, the
critical value is found to be 5.99. Since
2.44 < 5.99, Ho is accepted. This supports the
Mendelian 1:2:1 ratio.
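The goodness-of-fit calculation for the four-o’clocks can be sketched in Python as follows. The dictionaries and variable names are illustrative, and the critical value 5.99 is the tabled value quoted above. Summing the unrounded terms gives about 2.43 rather than 2.44, which is the sum of the rounded table entries.

```python
observed = {"red": 55, "pink": 132, "white": 53}   # counts from the example
ratio = {"red": 1, "pink": 2, "white": 1}          # Mendelian 1:2:1 model
chi2_crit = 5.99   # right-tailed, alpha = 0.05, 2 df (value quoted from the table)

total = sum(observed.values())
parts = sum(ratio.values())

# Expected frequency for each category under Ho.
expected = {colour: total * r / parts for colour, r in ratio.items()}

# Chi-square statistic: sum of (O - E)^2 / E over the categories.
chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
reject = chi2 > chi2_crit

for c in observed:
    print(f"{c:6s} O = {observed[c]:3d}  E = {expected[c]:5.1f}")
print(f"chi-square = {chi2:.2f}")
print("reject Ho" if reject else "do not reject Ho")
```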