This document introduces hypothesis testing. It covers null and alternative hypotheses, sampling distributions, test statistics, p-values, and significance levels, and works through several examples. One example tests whether the average time office workers spend on spam exceeds 25 minutes per day: it computes the test statistic, finds the p-value, and compares the p-value to the significance level. Because the p-value (0.0322) is greater than the 1% significance level, the null hypothesis is not rejected, so the data do not support the claim that the average exceeds 25 minutes.
Introduction to Hypothesis Testing and Sampling Distributions
INTRODUCTION TO HYPOTHESIS TESTING
Chapters 9 and 11
Dorit Nevo, 2013
INFERENTIAL STATISTICS
—
people, physical entities, and built devices behave.
repeatedly tested in various settings to either
strengthen or refute them.
—
the drawing of conclusions about a population of interest
based on findings from samples obtained from that
population.
—
—
—
HYPOTHESES TESTING
Statistics is very much about expectations. We
aim to test specific expectations that we have
about the population’s parameters using sample
statistics. We call these expectations hypotheses.
—
test.
given the hypothesized distribution of the
population
We believe that    Our sample mean is    Possible?
X ~ N(10, 2)       12                    Probably yes
X ~ N(10, 2)       20                    Probably not
HYPOTHESES
We differentiate between the research hypothesis
and the null hypothesis.
—
people carry $50 or more in their wallet. This claim
is the null hypothesis. The research hypothesis
contains the other side of this claim, that is – that
people carry less than $50. We can also write it as:
—
H1 (sometimes denoted HA) is used to denote the
research hypothesis, commonly referred to as the
alternative hypothesis.
HYPOTHESES
200 or fewer friends on Facebook. However, you
believe that Facebook users, in fact, have more than
200 friends. You can set your hypotheses as:
—
—
grade average is different than that of other sections
of the course. The average for all other sections is
‘75’. To help the professor learn if her section’s grade
average is different than that of other sections, we
need to set up the following hypotheses:
—
— grade average ≠ 75
THREE FORMS OF HYPOTHESES
Lower-Tail Test    H0: Average amount of money ≥ $50      H1: Average amount of money < $50
Two-Tail Test      H0: Section average = 75               H1: Section average ≠ 75
Upper-Tail Test    H0: Average number of friends ≤ 200    H1: Average number of friends > 200
[Figure: standard normal curve with the Z axis running from -4 to 4]
The ‘equal’ sign (=, ≥, ≤) always goes in the null hypothesis
A FINAL NOTE ON HYPOTHESES
mutually exclusive.
—
possible option.
—
hypotheses. The following hypotheses are, therefore,
incorrect because the ‘= 20’ option is in both the null and
alternative hypotheses:
Not Exhaustive                             Exhaustive
H0: Average weeks of job seeking = 20      H0: Average weeks of job seeking ≤ 20
H1: Average weeks of job seeking > 20      H1: Average weeks of job seeking > 20
WRITE THE HYPOTHESES
various platforms. According to one source, people
collectively spend 300 million minutes per day
playing Angry Birds and it supposedly costs the US
economy billions of dollars in lost work time. Suppose
you believe this number (300 million minutes) is too
high. You conduct some more research to find out
that there are roughly 30 million daily active users of
Angry Birds, which means that on average each
player spends (300 million minutes/30 million users)
10 minutes per day playing Angry Birds, according to
the original claim. You plan to use a sample of
students at your school to test this claim and need to
set up your hypotheses first.
HYPOTHESES
H0: Average time playing (µ) ≥ 10
H1: Average time playing (µ) < 10
MORE PRACTICE
for the three types of tests
(lower-tail, upper-tail, and two-
tail), write the hypotheses for
the following tests:
— -tail test for the average
number of sleep hours
— -tail test for the
average hours spent grooming
— -tail test for the average
hours spent on educational
activities
— -tail test for the average
hours spent eating and
drinking
— -tail test for the average
hours spent on leisure and
sports
— -tail test for the
average hours spent working
Activity                  Hours
Sleeping                  8.4
Leisure and Sports        3.6
Educational Activities    3.4
Working                   3.0
Other                     2.2
Traveling                 1.5
Eating and Drinking       1.1
Grooming                  0.8
Total                     24.0
EXAMPLE
n a
used textbook store and are interested in determining
the amount of money that customers are likely to
spend when shopping at the store.
owner, the average customer making purchases at the
store spent $100 with a standard deviation of $35.
the last year. We hypothesize that:
—
—
average spending (the
parameter of interest) for the population (all
customers who have made purchases at the store).
WE TAKE A SAMPLE
the mean spending in our sample is ‘$93.27’.
—
hypothesized mean, but we can attribute it to
sampling error.
—
one sample?
DECISION RULE
is:
—
size n=20 out of a population with a mean of $100 and
a standard deviation of $35 and obtain a sample
mean of ‘$93.27’?
—
reject the null hypothesis.
—
extremely low, then we should conclude that the null
hypothesis is likely false and we should reject it,
believing that the true average spending is, in fact,
lower than $100.
sample mean of ‘$93.27’ given a population mean
of $100 and standard deviation of $35?
To answer this question we need to know:
THE SAMPLING DISTRIBUTION OF THE MEAN
SAMPLING DISTRIBUTIONS
A sampling distribution is the probability distribution of any sample statistic.
The sampling distribution of the mean
describes the probabilities attached to all values
of the mean of samples that are repeatedly taken
from the same population.
example to better understand this concept.
SAMPLING DISTRIBUTION
We took a sample of twenty
customers (i.e., n=20), and measured how much
each of the customers spent. We then computed
the mean spending for that sample to be ‘$93.27’.
each time taking a different sample of twenty
customers and computing the mean spending for
each sample.
Sample (n=20)    Sample Mean (X-bar)
1                $93.27
2                $104.34
3                $100.49
…                …
199              $96.65
200              $111.65
Average over 200 samples: $99.04
Standard Deviation: $11.14
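The repeated-sampling idea in the table above can be imitated with a small simulation. This is only a sketch: it assumes the spending population is normal with mean $100 and standard deviation $35, and the function name and fixed seed are illustrative choices, not part of the original slides. The simulated figures will not match the table exactly, because the samples differ.

```python
import random
import statistics

# Assumed population parameters, taken from the bookstore example.
POP_MEAN, POP_SD, SAMPLE_SIZE = 100.0, 35.0, 20

def simulate_sample_means(num_samples, seed=0):
    """Draw num_samples samples of SAMPLE_SIZE and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [
        statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
        for _ in range(num_samples)
    ]

sample_means = simulate_sample_means(200)
print(f"Average over 200 samples: {statistics.fmean(sample_means):.2f}")
print(f"Standard deviation:       {statistics.stdev(sample_means):.2f}")
```

The average of the sample means should land near the population mean, and their spread should be noticeably smaller than the population's $35.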
GRAPHICALLY
[Figure: frequency histograms of the sample means, one for 200 samples of size 20 and one for 1,000 samples of size 20]
CENTRAL LIMIT THEOREM (PART I)
The sampling distribution of the mean of a random
sample of any size (n) drawn from a normally
distributed population also follows a normal
distribution with a mean of µ and a standard
deviation of $\sigma/\sqrt{n}$.
The quantity $\sigma/\sqrt{n}$ is known as the
standard error of the mean.
— In our example: $\bar{X} \sim N(100,\ 35/\sqrt{20})$
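As a rough check of the formula, the standard error for the bookstore example can be computed directly, together with the probability of a sample mean of $93.27 or less. The sketch below uses Python's standard-library `NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100.0, 35.0, 20           # hypothesized mean, known SD, sample size
standard_error = sigma / sqrt(n)         # SD of the sampling distribution of the mean
sampling_dist = NormalDist(mu, standard_error)

# Probability of a sample mean of 93.27 or lower if the population mean is 100.
p_at_most = sampling_dist.cdf(93.27)

print(f"Standard error: {standard_error:.3f}")
print(f"P(X-bar <= 93.27): {p_at_most:.4f}")
```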
BUT WHAT ABOUT NON-NORMAL
POPULATIONS?
Consider a die-rolling example:
— When rolling a six-sided die, we can obtain any
one of the values from ‘1’ to ‘6’, each with a
probability of ‘1/6’. Defining a random variable as
‘the outcome of a single die roll’, this variable follows
a discrete uniform distribution.
[Figure: bar chart of the probability of each outcome (1 to 6) of a single die roll]
dice. And, instead of looking at the outcome on
the individual die, we will define our random
variable as ‘the average of the values obtained
from two rolled dice’.
— For example, if one die rolled on ‘3’ and the other on
‘5’, then we will record the outcome of this roll as ‘4’.
—
obtain along with their probabilities
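The possible averages of two dice and their probabilities can be listed by enumerating all 36 equally likely rolls. The following is a minimal sketch, assuming two fair six-sided dice; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die

# Count how many of the 36 equally likely rolls give each average.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))
distribution = {avg: Fraction(c, 36) for avg, c in sorted(counts.items())}

for avg, prob in distribution.items():
    print(f"average {float(avg):>3}: probability {prob}")
```

Unlike the flat distribution of a single die, the averages pile up in the middle, which hints at the bell shape that emerges as more dice are averaged.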
ROLLED (10 IN THE GRAPH BELOW)…
[Figure: probability distribution of the average outcome, with averages from 1 to 6 on the horizontal axis]
CENTRAL LIMIT THEOREM (PART II)
Even if the population is not normally
distributed, the mean will follow an
approximately normal distribution for large
enough samples (n≥30), with a mean of µ and a
standard deviation of $\sigma/\sqrt{n}$.
PUTTING IT ALL TOGETHER
The mean of a sample taken from a normally
distributed population (e.g., the spending example)
will follow a normal distribution, with the mean equal
to the population mean and a standard deviation
equal to the population standard deviation divided by
the square root of the sample size (hence the
distribution is narrower).
The mean of a sufficiently large sample taken from a
non-normally distributed population (e.g., the dice
example) will follow an approximately normal
distribution, with the mean equal to the population
mean and a standard deviation equal to the
population standard deviation divided by the square
root of the sample size.
Using statistical notation: $\bar{X} \sim N(\mu,\ \sigma/\sqrt{n})$
EXAMPLE 9.1(A)
the amount of soda in each “32-ounce” bottle is a
normally distributed random variable, with a
mean of 32.2 ounces and a standard deviation of
0.3 ounce.
—
that the bottle will contain more than 32 ounces?
—
— table:
$P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{0.3}\right) = P(Z > -0.67) = 1 - 0.2514 = 0.7486$
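The same probability can be checked without a table. This sketch uses the standard-library `NormalDist`; the variable names are illustrative. The result differs slightly from the table answer because the table requires rounding Z to -0.67.

```python
from statistics import NormalDist

# Amount of soda in a single bottle: normal with mean 32.2 oz and SD 0.3 oz.
bottle = NormalDist(mu=32.2, sigma=0.3)

# P(X > 32) is the complement of the cumulative probability at 32.
p_single_bottle = 1 - bottle.cdf(32)
print(f"P(X > 32) = {p_single_bottle:.4f}")
```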
EXAMPLE 9.1(B)
what is the probability that the mean amount
of the four bottles will be greater than 32
ounces?
—
asked about the mean of X
$P(\bar{X} > 32) = ?$
INTERPRET THE QUESTION
—
—
—
$\bar{X} \sim N(32.2,\ 0.3/\sqrt{4})$
$\bar{X} \sim N(\mu,\ \sigma/\sqrt{n})$
SOLUTION
mean of the four bottles contains more than 32oz.
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_X/\sqrt{n}} = \frac{32 - 32.2}{0.3/2} = -1.33$
$P(\bar{X} > 32) = P(Z > -1.33) = 0.9082$
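For comparison, here is a sketch of the same calculation in code, using the sampling distribution of the mean of four bottles. Names are illustrative, and the unrounded Z gives a value very slightly above the table answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.2, 0.3, 4

# Sampling distribution of the mean of n bottles: same mean, SD divided by sqrt(n).
mean_of_four = NormalDist(mu, sigma / sqrt(n))

p_mean_above_32 = 1 - mean_of_four.cdf(32)
print(f"P(X-bar > 32) = {p_mean_above_32:.4f}")
```

The probability is higher for the mean of four bottles than for a single bottle because the sampling distribution is narrower.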
BACK TO HYPOTHESIS TESTING
EXAMPLE
lbs) during a student's first year at college. A
researcher set out to challenge this belief
claiming that the weight gain is in fact much
lower. Based on a sample of 100 students the
researcher found an average weight gain of 13lbs.
Assuming the standard deviation of weight gain
is known to be 10lbs, what can the researcher
conclude?
$H_0: \mu \ge 15$ lbs [or $\mu = 15$]
$H_A: \mu < 15$ lbs
TEST STATISTIC
sample mean into a Z-score. This Z-score is
known as the test statistic
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$Z = \frac{13 - 15}{10/\sqrt{100}} = -2$
P-VALUE
extreme as the one we have found is called the p-
value of the test.
whether or not to reject the
null hypothesis will be based on this probability.
$P(\bar{X} < 13) \rightarrow P\left(Z < \frac{13 - 15}{10/\sqrt{100}}\right) = P(Z < -2) = 0.0228$
THE SIGNIFICANCE OF THE TEST
We compare the p-value to a desired significance
level known as α (alpha).
When a finding is ‘statistically significant’, it means that the
finding is unlikely to have occurred by chance.
Our level of significance is the maximum
chance probability we are willing to tolerate.
In the hypothesis testing context:
— If we find a p-value smaller than α we reject the
null hypothesis.
IN OUR EXAMPLE
P-value = 0.0228 < 0.05 = α
We reject the null hypothesis and conclude that the average weight gain is likely
lower than 15lbs.
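The weight-gain test can be summarized as a short function. This is a sketch for a lower-tail Z test with a known population standard deviation; the function name `lower_tail_z_test` is illustrative and not from the slides.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(sample_mean, mu0, sigma, n, alpha):
    """Return (z, p_value, reject) for H0: mu >= mu0 versus HA: mu < mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    p_value = NormalDist().cdf(z)                 # area below z under the standard normal
    return z, p_value, p_value < alpha

z, p_value, reject = lower_tail_z_test(sample_mean=13, mu0=15, sigma=10, n=100, alpha=0.05)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```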
ANOTHER EXAMPLE (11.34)
nuisance. An office manager believes that the
average amount of time spent by office workers
reading and deleting spam exceeds 25 minutes
per day. To test this belief, he takes a random
sample of 18 workers and measures the amount
of time each spends reading and deleting spam.
The results are listed below. If the population of
times is normal with a standard deviation of 12
minutes, can the manager infer at the 1%
significance level that he is correct?
35 48 29 44 17 21 32 28 34
23 13 9 11 30 42 37 43 48
INTERPRETATION
time spent by
office workers reading and deleting spam exceeds 25 minutes
per day”
—
—
— n=18
deleting
spam”
—
and deleting
spam
—
“If the population of times is normal with a standard
deviation of 12
minutes”
—
is correct”
—
STEP 1: COMPUTE THE TEST STATISTIC
We compute the Z-score corresponding to
our sample mean
Based on the central limit theorem, we know that:
$\bar{X} \sim N(25,\ 12/\sqrt{18})$
$Z = \frac{30.22 - 25}{12/\sqrt{18}} = 1.845$
STEP 2: FIND THE P-VALUE
—
sample mean at least as high as the one we have
found:
— P(Z ≥ 1.85) = 1 – 0.9678 = 0.0322
We round Z to 1.85 to be able to look it up in the table
STEP 3: DECISION RULE AND CONCLUSION
We reject the null hypothesis if the p-value is less than α.
— The p-value = 0.0322 and the question states that α is 0.01
— Since p-value > α we do not reject the null hypothesis.
There is not enough evidence to conclude that the average
time spent on spam exceeds 25 minutes.
— At α = 0.05 we would have p-value < α and we would reverse our conclusion and
reject the null hypothesis
— We can use the p-value to indicate the highest level of
significance
of our test.
— Our finding is significant at the
5%
level but not at the 1% level.
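The three steps of the spam example can be reproduced from the raw data. The sketch below assumes the 18 listed times are the full sample and uses the stated population standard deviation of 12 minutes; the variable names are illustrative. The p-value comes out marginally different from the table-based 0.0322 because Z is not rounded.

```python
from math import sqrt
from statistics import NormalDist, fmean

times = [35, 48, 29, 44, 17, 21, 32, 28, 34,
         23, 13, 9, 11, 30, 42, 37, 43, 48]   # minutes per day, from the example
mu0, sigma = 25, 12                            # hypothesized mean, known population SD

x_bar = fmean(times)
z = (x_bar - mu0) / (sigma / sqrt(len(times)))   # Step 1: test statistic
p_value = 1 - NormalDist().cdf(z)                # Step 2: upper-tail p-value

# Step 3: compare the p-value with two candidate significance levels.
for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f}, {decision}")
```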
IN RESEARCH PAPERS
A TWO-TAIL TEST EXAMPLE
section’s grade average is different than that of
other sections of the course. The average for all
other sections is 75 with a standard deviation of
4. The professor collected test scores from 25
students and found the sample average was 72.
She wishes to conduct a hypothesis test at the 5%
level of significance (that is, α=0.05).
STEP 1: HYPOTHESES
the null hypothesis if the sample mean is
significantly smaller or significantly larger than
the hypothesized mean.
$H_0: \mu = 75$
$H_A: \mu \ne 75$
STEP 2: COMPUTE THE TEST STATISTIC
— Assuming the null hypothesis to be true, we believe
that X is normally distributed with a mean of 75 and
a standard deviation of 4.
— The sampling distribution of X-bar
(the average grade) is:
$\bar{X} \sim N(75,\ 4/\sqrt{25})$
Convert our sample mean of 72 to a Z score using the
hypothesized distribution:
$Z = \frac{72 - 75}{4/\sqrt{25}} = -3.75$
STEP 3: P-VALUE
as extreme as the one we have found?
— P(Z ≤ -3.75) = 0.00009
In a two-tail test the p-value needs to be
doubled before we compare it to α
— p-value = 2*P(Z ≥ |test statistic|) = 2*0.00009 =
0.00018
STEP 4: DECISION RULE AND CONCLUSION
Since the p-value of ‘0.00018’ is less than α (0.05) we
reject the null hypothesis and
conclude that the average grade is different than 75.
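A sketch of the two-tail p-value calculation follows. It doubles the tail area beyond the absolute value of the test statistic; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma, n, alpha = 72, 75, 4, 25, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
one_tail = NormalDist().cdf(-abs(z))   # area in one tail beyond |z|
p_value = 2 * one_tail                 # two-tail test: double the tail area

print(f"Z = {z:.2f}, two-tail p-value = {p_value:.5f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```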
KEY TERMS
by taking repeated samples of size n from a given
population, computing the mean for each sample, and
charting the distribution of these means.
—
sampling distribution of the mean in most cases (see
chapter 9.1 for a review)
It is a Z-score calculated based on the sample statistic
and the hypothesized distribution
The p-value is the probability of finding a test statistic as
extreme (either as high or as low) as the one we have
found
ANOTHER APPROACH (SIMILAR BUT NOT
EXACTLY THE SAME)
of his fastball pitch is greater than that of his
rival, who averages 90 mph. He collects data on
the speed of 100 fastballs and finds his average
speed is 90.8 mph. Assuming that we know the
standard deviation of a fastball pitch to be 3.85
mph, what can we conclude about the pitcher’s
claim?
STEP 1: HYPOTHESES
has a higher speed than that of his rival.
Therefore, the null hypothesis is that it is not
higher (i.e. it is the same speed or lower) and the
alternative is that it is higher:
—
—
speed.
STEP 2: COMPUTE THE TEST STATISTIC
— Assuming the null hypothesis to be true, we believe
that X is normally distributed with a mean of 90mph
and a standard deviation of 3.85mph.
— The sampling distribution of X-bar
(the average pitching speed) is:
$\bar{X} \sim N(90,\ 3.85/\sqrt{100})$
Convert our sample mean of 90.8 to a Z score using the
hypothesized distribution:
$Z = \frac{90.8 - 90}{3.85/\sqrt{100}} = 2.078$
STEP 3: FIND THE CRITICAL VALUE
the desired significance level (α).
—
at the 1% level of significance: α=0.01.
—
P(Z ≥ |critical value|) = α.
— distribution table to
find the critical value Z*, such that: P(Z ≥ Z*) = 0.01.
relationship here since this is an upper-tail test
—
distribution.
GRAPHICALLY
region.
—
reject the null hypothesis
STEP 4: COMPARE THE TWO VALUES
rejection region and therefore we do not reject
the null hypothesis.
—
— esis
STEP 5: CONCLUSION
concluding that there is no evidence to support
the claim that the pitcher’s fastballs are faster
than his rival’s
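The critical-value approach for the pitcher example can be sketched as follows. Instead of a table lookup, the critical value is obtained with `NormalDist().inv_cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.01                                     # significance level from the example
z_statistic = (90.8 - 90) / (3.85 / sqrt(100))
z_critical = NormalDist().inv_cdf(1 - alpha)     # upper-tail cutoff: P(Z >= Z*) = alpha

reject_null = z_statistic > z_critical           # upper-tail rejection region
print(f"Z statistic = {z_statistic:.3f}, critical value = {z_critical:.3f}")
print("Reject H0" if reject_null else "Do not reject H0")
```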
A LOWER TAIL TEST EXAMPLE
average starting salary for clerical employees in
the state is less than $30,000 per year. To test
this claim, she has collected a random sample of
100 starting salaries of clerks from across the
state and found that the sample mean is $29,570.
—
—
deviation is known to be $2,500 and the significance
level for the test is 0.05
STEP 1: HYPOTHESES
reject the null hypothesis if the test statistic is
significantly smaller than the hypothesized
parameter.
$H_0: \mu \ge \$30{,}000$
$H_A: \mu < \$30{,}000$
[Figure: ONE-TAIL TEST (LOWER TAIL). Standard normal curve, Z axis from -4 to 4, with the lower tail rejection region marked]
STEP 2: COMPUTE THE TEST STATISTIC
— Assuming the null hypothesis to be true, we believe
that X is normally distributed with a mean of $30,000
and a standard deviation of $2,500.
— The sampling distribution of X-bar
(the average starting salary) is:
$\bar{X} \sim N(30000,\ 2500/\sqrt{100})$
Convert our
sample mean of $29,570 to a Z score using the
hypothesized distribution:
$Z = \frac{29570 - 30000}{2500/\sqrt{100}} = -1.72$
STEP 3: CRITICAL VALUE
level for the test as 0.05
P(Z ≤ -Z*)=0.05
(lower rejection region only)
-Z* = -1.645
STEP 4: COMPARE THE TWO VALUES
$Z_{statistic} = -1.72 < -1.645 = Z_{critical}$
STEP 5: CONCLUSION
critical Z value and therefore falls in the rejection
region. At α=0.05 we can reject the null hypothesis and
conclude that the average salary is less than $30,000.
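A sketch of the same lower-tail comparison in code is shown below; the critical value comes from `NormalDist().inv_cdf` rather than a table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z_statistic = (29570 - 30000) / (2500 / sqrt(100))
z_critical = NormalDist().inv_cdf(alpha)   # lower-tail cutoff: P(Z <= -Z*) = alpha

reject_null = z_statistic < z_critical     # lower-tail rejection region
print(f"Z statistic = {z_statistic:.2f}, critical value = {z_critical:.3f}")
print("Reject H0" if reject_null else "Do not reject H0")
```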
A TWO-TAIL TEST EXAMPLE
section’s grade average is different than that of
other sections of the course. The average for all
other sections is 75 with a standard deviation of
4. The professor collected test scores from 25
students and found the sample average was 72.
She wishes to conduct a hypothesis test at the 5%
level of significance (that is, α=0.05).
STEP 1: HYPOTHESES
the null hypothesis if the test statistic is
significantly smaller or significantly larger than
the hypothesized parameter.
$H_0: \mu = 75$
$H_A: \mu \ne 75$
STEP 2: COMPUTE THE TEST STATISTIC
— Assuming the null hypothesis to be true, we believe
that X is normally distributed with a mean of 75 and
a standard deviation of 4.
— The sampling distribution of X-bar
(the average grade) is:
$\bar{X} \sim N(75,\ 4/\sqrt{25})$
Convert our sample mean of 72 to a Z score using the
hypothesized distribution:
$Z = \frac{72 - 75}{4/\sqrt{25}} = -3.75$
STEP 3: CRITICAL VALUE
rejection regions
—
the tails:
— α = 0.05
— P(Z ≤ -Z*)=0.025 (this is the lower tail)
— -Z* = -1.96
STEP 4: COMPARE THE TWO VALUES
$Z_{statistic} = -3.75 < -1.96 = Z_{critical}$
STEP 5: CONCLUSION
critical Z value at the lower tail, and therefore falls in
the rejection region.
We reject the null hypothesis and conclude that the average grade is different than 75.
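For a two-tail test, α is split between the two rejection regions. The sketch below wraps the comparison in a small function; `two_tail_decision` is an illustrative name, not something defined in the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_decision(sample_mean, mu0, sigma, n, alpha):
    """Return (z, z_critical, reject) for H0: mu = mu0 versus HA: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 in each tail
    return z, z_critical, abs(z) > z_critical

z, z_critical, reject = two_tail_decision(72, 75, 4, 25, alpha=0.05)
print(f"Z = {z:.2f}, critical values = ±{z_critical:.2f}, reject H0: {reject}")
```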
TYPE I AND TYPE II ERRORS
Still Chapter 11
STARTING WITH AN EXAMPLE
customer receives no more than 8 spam text messages
each day, with a standard deviation of 0.8. To test
this claim (at the 5% significance level), you collect
data from 60 wireless customers and record the
number of spam messages received during a single
day. Your sample average is 8.2.
Step 1: Hypotheses
—
—
—
value is 1.645
CONT’D
$Z = \frac{8.2 - 8}{0.8/\sqrt{60}} = 1.94$
— We compare the test statistic to our critical value of 1.645 and
conclude that the null hypothesis should be rejected,
because 1.94>1.645. In other words, we conclude that
the average customer receives more than 8 spam text
messages per day
TYPE I AND TYPE II ERRORS
of errors:
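— We conclude that the average number of spam messages is
more than 8 when in fact it is eight. This is a Type I error
— We conclude that the average number of spam messages is not
more than eight when in fact it is. This is a Type II error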
ERROR SUMMARY
                                 In actuality
Hypothesis testing indicates     H0 is true                H0 is false
that you should                  (“do not reject”)         (“reject”)
Not reject H0                    Correct decision          Type II error (prob. β)
Reject H0                        Type I error (prob. α)    Correct decision
TYPE I ERROR IN OUR EXAMPLE
computed the probability of finding the sample mean
that we observed.
—
60 with a mean of 8.2 spam messages is: P(Z ≥ 1.94)=0.026,
which is quite low.
—
should reject the null hypothesis: we believe that it is false.
ue.
—
type I error.
—
significance), α is also the probability of committing a type
I error.
THE TRADE-OFF
type I error, we need to use smaller values of α.
—
committing a type II error (not rejecting a false null
hypothesis).
—
to α
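The trade-off can be illustrated numerically with the spam-message example. The sketch below is hypothetical: it assumes a true mean of 8.2 messages (a value chosen purely for illustration, since the true mean is never known in practice) and computes the Type II error probability β for several values of α. The function name is also illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 8.0, 0.8, 60
true_mean = 8.2                      # hypothetical true mean, assumed for illustration only
standard_error = sigma / sqrt(n)

def type_ii_probability(alpha):
    """P(not rejecting H0: mu <= mu0) when the mean is really true_mean."""
    # Upper-tail test: reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # beta is the chance the sample mean falls at or below the cutoff.
    return NormalDist(true_mean, standard_error).cdf(cutoff)

betas = {alpha: type_ii_probability(alpha) for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

Under these assumptions, lowering α raises β, which is the trade-off described above.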
SUMMARY
—
– Quiz
12 (other single population tests)
For the project you should identify three research questions that
can be addressed through different hypothesis testing
procedures. Your submission should include the following
components.
1. Introduction: Briefly describe (in words) each of the three
research questions and your motivation for studying them (not
more than two pages). For each test you should state your null
hypothesis and alternative hypothesis.
2. Hypothesis tests: for each of the three tests clearly discuss:
(1) the data used to conduct the test, (2) any assumption that
you need to make in order to conduct the test including support
for making these assumptions, (3) the test and its results, (4) the
statistical significance of your findings.
3. Summary: summarize your findings from the analysis. What
do you conclude about the hypotheses you raised? Identify key
limitations of your analysis (including any limitations of the
data set), contributions that you believe your analysis makes to
the readers, and any interesting directions for future analysis
(not more than three pages)
A sample project is shown below:
Introduction: The Doll Computer Company makes its own
computers and delivers them directly to customers who order
them via the Internet.
To achieve its objective of speed, Doll makes each of its five
most popular computers and transports them to warehouses from
which it generally takes 1 day to deliver a computer to the
customer.
This strategy requires high levels of inventory that add
considerably to the cost.
To lower these costs the operations manager wants to use an
inventory model. He notes demand during lead time is normally
distributed and he needs to know the mean to compute the
optimum inventory level.
He observes 25 lead time periods and records the demand during
each period.
The manager would like a 95% confidence interval estimate of
the mean demand during lead time. Assume that the manager
knows that the standard deviation is 75 computers.
Hypothesis Tests:
The demand observed in each of the 25 lead time periods is shown below:
235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334
Interpret the question
· X represents demand
a. X~N (µ,75)
· σx=75 (standard deviation)
· n=25
· We would like to create a 95% confidence interval for the
mean demand based on our sample of 25 lead time periods
· What do we need to know?
$\bar{x} \pm Z_{\alpha/2} \cdot \sigma/\sqrt{n}$
· Compute the sample mean (X-bar) = 370.16
· Finding Zα/2
a. Looking for Zα/2 such that:
b. P(-Zα/2<Z< Zα/2)=0.95
c. Easiest:
Look for the lower tail probability in the normal table
P(Z<-Z α/2)=0.025
d. Therefore:
Zα/2=1.96
The computation is not completed here, but the format should follow the
steps shown above.
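The remaining computation could be sketched as follows, assuming the usual interval for a mean with known standard deviation (sample mean plus or minus Zα/2 times σ/√n). The variable names are illustrative, and Zα/2 is obtained from `NormalDist().inv_cdf` rather than a table.

```python
from math import sqrt
from statistics import NormalDist, fmean

demand = [235, 374, 309, 499, 253,
          421, 361, 514, 462, 369,
          394, 439, 348, 344, 330,
          261, 374, 302, 466, 535,
          386, 316, 296, 332, 334]   # demand in each of the 25 lead time periods
sigma, confidence = 75, 0.95         # known population SD, confidence level

x_bar = fmean(demand)
z_half_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
margin = z_half_alpha * sigma / sqrt(len(demand))

lower, upper = x_bar - margin, x_bar + margin
print(f"Sample mean: {x_bar:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```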