2. This lecture note covers
o Hypothesis testing for a population mean
o Steps of hypothesis testing:
▪ Null Hypothesis and Alternative Hypothesis
▪ Test statistic
▪ P-value
▪ Conclusion
o Relationship between Confidence Interval and Hypothesis Testing
o Read Chapter 6.1, 6.2
3. Statistical Inference
❑There are two common types of statistical inference:
▪ A confidence interval is used when your goal is to estimate a
population parameter.
▪ A test of significance is used to assess the evidence in the data
about some claim.
❑A test of significance is a formal procedure for comparing
observed data with a claim (also called a hypothesis) whose truth
we want to assess.
▪ The claim is a statement about a parameter, like the population
proportion p or the population mean µ.
❑ We express the results of a significance test in terms of a
probability, called the P-value, that measures how well the data and
the claim agree.
4. The Reasoning of Tests of Significance
❑ Assume that you have been told that the average grade in a certain
course is 60/100 (claimed value).
▪ You take a group of students taking that course and collect the
grades of all of them.
▪ You calculate the statistic, the sample mean, and obtain x̄ = 90/100.
This looks like a high grade!
▪ We see that x̄ > 60. We would like to know just how certain we can
be that μ > 60.
▪ A confidence interval is not quite what we need. For example,
suppose the 95% CI for μ is [58, 90]. It does not directly tell us
how confident we can be that μ > 60 or how strong the evidence
against the claim is.
5. Tests of Significance
▪ Our aim will be to infer µ, the value of the mean for the
population.
▪ We are going to start with a very unrealistic situation:
assuming we know 𝜎, the standard deviation of the
distribution for the population.
6. Steps in Significance Tests
1. State the null and alternative hypothesis.
2. Calculate a test statistic to measure the compatibility between
the null hypothesis and the data.
‐ Test statistic = (estimate from data − claimed value under H0) / (standard deviation of the estimate)
3. Calculate the P-value: the probability, under the null hypothesis,
of a test statistic at least as extreme as the one you observed.
4. State a conclusion regarding evidence against the null
hypothesis.
7. Step 1: Null and alternative hypotheses
▪ The null hypothesis is the claim which is initially
favored or believed to be true. It is often the default or
uninteresting situation of “no effect” or “no difference”.
▪ We then need to determine whether there is strong
enough evidence against it.
▪ The test of significance is designed to assess the
strength of the evidence against the null hypothesis.
8. Back to our motivating example
Claimed value = 60/100; we actually obtain x̄ = 90/100.
1) Assuming that μ=60, is it just a rare case?
2) How rare is it? Is there some evidence that maybe the
average grade is greater than 60?
▪ The statement being tested is that the mean of the population
(the value of the parameter µ) is 60: the Null Hypothesis, H0.
‐ The test of significance is designed to assess the strength
of evidence against the null hypothesis.
▪ The alternative statement is that the mean of the population
(the value of the parameter µ) is > 60: the Alternative
Hypothesis, Ha.
‐ The test of significance is designed to assess the strength
of evidence to support the alternative hypothesis.
9. Practice on null and alternative hypotheses formulation
Specifications for a water pipe call for a mean breaking strength μ
of more than 2000 lb per linear foot. Engineers will perform a
hypothesis test to decide whether to use a certain kind of pipe.
They will select a random sample of 1 ft sections of pipe, measure
their breaking strengths, and perform a hypothesis test. The pipe
will not be used unless the engineers can conclude that μ > 2000.
▪ How to set up the null hypothesis and the alternative hypothesis?
10. Hypotheses Possibilities
H0: μ = 60 vs. Ha: μ < 60
Suspect the average grade is lower. One-sided Ha.
H0: μ = 60 vs. Ha: μ > 60
Suspect the average grade is higher. One-sided Ha.
H0: μ = 60 vs. Ha: μ ≠ 60
Suspect the average grade is different. Two-sided Ha.
Note:
You must decide on the setting, based on general knowledge,
before you see the data or other measurements.
11. The Basic Idea
Every time we perform a hypothesis test, this is the basic
procedure that we will follow:
1. We'll make an initial assumption about the population
parameter.
2. We'll collect evidence or else use somebody else's
evidence (in either case, our evidence will come in the
form of data).
3. Based on the available evidence (data), we'll decide
whether to "reject" or "not reject" our initial assumption.
12. Step 2: Test Statistic: Z Test for μ
▪ We want to test whether we have evidence that the
mean of the population has a certain value μ0.
H0: μ = μ0
▪ From the data (sample size n) we measure the sample
mean x̄.
Z = Test Statistic = (x̄ − μ0) / (σ/√n)
We know that under H0 the mean value for the population is µ0.
Based on the CLT, x̄ then comes approximately from a normal
distribution N(µ0, σ/√n), that is, with mean µ0 and standard
deviation σ/√n.
13. Step 3: P-value
❑In performing a hypothesis test, we
essentially put the null hypothesis
on trial. We begin by assuming that
H0 is true, just as we begin a trial by
assuming a defendant to be
innocent.
❑The hypothesis test involves
measuring the strength of the
disagreement between the sample
and H0 to produce a number
between 0 and 1, called a P value.
❑The P-value is the probability,
computed assuming that H0 is true,
that the test statistic would take a
value as extreme as or more
extreme than the one actually
observed.
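The definition above can be sketched in code. This is a minimal illustration, assuming the test statistic is a z statistic that follows N(0, 1) under H0; the function name `p_value_from_z` and the labels "greater", "less", and "two-sided" are hypothetical names chosen for this sketch, not part of the lecture.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative="two-sided"):
    """P-value of an observed z statistic, computed assuming H0 is true.

    'alternative' names the form of Ha (labels are assumptions of this sketch):
      "greater"   -> Ha: mu > mu0, extreme means large z
      "less"      -> Ha: mu < mu0, extreme means small z
      "two-sided" -> Ha: mu != mu0, extreme means large |z|
    """
    std_normal = NormalDist()  # standard normal N(0, 1)
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
```

What counts as "as extreme or more extreme" depends on the direction of Ha, which is why the same z value can give different P-values for one-sided and two-sided tests.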
14. More about P-value…
When the P-value is small, there are two possible explanations:
1. The null hypothesis is true, and our observed effect is
extremely rare!
OR more likely…
2. The null hypothesis is false, and our data are telling us this
through the small P-value!
15. Significance Level
▪ We need a cut-off point (decisive value) that we can compare our
P-value to and draw a conclusion or make a decision. In other
words, how much evidence do we need to reject H0 ?
▪ This cut-off point is the significance level. It is announced in
advance and serves as a standard on how much evidence against
H0 we need to reject H0. Usually denoted α.
▪ Typical values of α: 0.05, 0.01.
▪ If not stated otherwise, assume α=0.05.
16. Step 4: The conclusion/decision
▪ If the P-value is smaller than a fixed significance level α, then
we reject the null hypothesis (in favor of the alternative).
▪ Otherwise we don’t have enough evidence to reject the null.
‐ If we don’t reject the null, do we accept it?
▪ Note: Always report a P-value with your conclusion,
and write the conclusion in terms of the problem.
18. Statistical Significance
The final step in performing a significance test is to draw a
conclusion: reject H0 or fail to reject H0.
▪ If our sample result is too unlikely to have happened by
chance assuming H0 is true, then we will reject H0.
▪ Otherwise, we will fail to reject H0.
• Note: A fail-to-reject H0 decision in a significance test
does not mean that H0 is true. For that reason, you
should never “accept H0” or use language implying
that you believe H0 is true.
19. Why “fail to reject” H0 vs. “accept” H0?
❑ H0 hypothesis: There are NO raccoons in the backyard.
• Observation 1: I go out at a random time and do not see raccoons.
• Conclusion: H0 “seems” to be correct for now.
• Observation 2 at a later time: I see raccoons in the yard...
• Conclusion: H0 is incorrect!
Why not “accept the null hypothesis”?
We can NOT prove that H0 is true; we can only disprove it.
▪ Failing to reject H0 based on Observation 1 may be
due to a bad sample or a small sample size.
▪ Only rejection is significant; that is, if we reject H0, we have
a significant conclusion that μ = μ0 is untrue.
21. Tests for a Population Mean
Example 1: [Two-sided test]
• A scale is to be calibrated by weighing a 1000 g test weight 60 times.
The 60 scale readings have mean 1000.6 g and standard deviation 2 g.
• Find the P-value for testing H0: μ = 1000 versus H1: μ ≠ 1000.
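One way to work Example 1 numerically is sketched below. It assumes that, with n = 60 readings, the sample standard deviation of 2 g can stand in for σ in the z statistic; that substitution is an assumption of this sketch, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 1000.0   # claimed mean under H0 (g)
xbar = 1000.6  # observed sample mean (g)
s = 2.0        # sample standard deviation (g), used in place of sigma (assumption)
n = 60         # number of scale readings

# Step 2: test statistic
z = (xbar - mu0) / (s / sqrt(n))

# Step 3: two-sided P-value, since H1 is mu != 1000
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

The z statistic comes out near 2.32 and the P-value near 0.02, so at α = 0.05 the readings give evidence that the scale is not calibrated to 1000 g.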
23. Example 2 [One-sided Test]
▪ The article “Wear in Boundary Lubrication” (S. Hsu, R. Munro, and M.
Shen, Journal of Engineering Tribology, 2002:427–441) discusses
several experiments involving various lubricants. In one experiment, 45
steel balls lubricated with purified paraffin were subjected to a 40 kg
load at 600 rpm for 60 minutes. The average wear, measured by the
reduction in diameter, was 673.2 μm, and the standard deviation was
14.9 μm. Assume that the specification for a lubricant is that the mean
wear be less than 675 μm.
▪ Find the P-value for testing H0: μ ≥ 675 versus H1: μ < 675.
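A possible numerical treatment of Example 2 follows. It is a sketch under two assumptions: the sample standard deviation (14.9 μm, n = 45) is used in place of σ, and the test statistic is computed at the boundary value μ = 675 of the null hypothesis. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 675.0   # boundary value of H0: mu >= 675 (micrometers)
xbar = 673.2  # observed average wear (micrometers)
s = 14.9      # sample standard deviation, used in place of sigma (assumption)
n = 45        # number of steel balls

# Step 2: test statistic, evaluated at the boundary mu = 675
z = (xbar - mu0) / (s / sqrt(n))

# Step 3: one-sided P-value; H1 is mu < 675, so small z values are extreme
p_value = NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided P-value = {p_value:.4f}")
```

Here z is about -0.81 and the P-value is about 0.21, so the data do not give convincing evidence that the mean wear is below 675 μm.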
25. One-sided vs. two-sided
▪ If, based on previous data or experience, we expect “increase”,
“more”, “better”, etc. (“decrease”, “less”, “worse”, etc.), then
we can use a one-sided test.
▪ Otherwise, by default, we use two-sided. Key words:
“different”, “departures”, “changed”…
26. The Relationship between Hypothesis Tests and Confidence Interval
❑In a hypothesis test for a population mean μ, we specify a
particular value of μ (the null hypothesis) and determine
whether that value is plausible.
❑In contrast, a confidence interval for a population mean μ
can be thought of as the collection of all values for μ that
meet a certain criterion of plausibility, specified by the
confidence level 100(1 − α)%.
A level α two-sided significance test rejects H0: µ = µ0 exactly when
µ0 falls outside a level 1 − α confidence interval for µ.
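The equivalence can be seen by rearranging the rejection rule of the two-sided z test. The sketch below assumes σ is known and writes z* for the critical value with upper-tail area α/2 under N(0, 1) (the symbol z* is notation chosen here), so that the two-sided P-value is below α exactly when |z| > z*.

```latex
\[
\text{reject } H_0
\iff \left|\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}\right| > z^*
\iff \left|\bar{x}-\mu_0\right| > z^*\,\frac{\sigma}{\sqrt{n}}
\iff \mu_0 \notin \left[\bar{x} - z^*\,\frac{\sigma}{\sqrt{n}},\;
                        \bar{x} + z^*\,\frac{\sigma}{\sqrt{n}}\right]
\]
```

The interval on the right is the level 1 − α confidence interval for µ, so the test rejects exactly when the claimed value lies outside it.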
27. Conclusions after using a Confidence Interval to do a Hypothesis Test
Does the claimed value from the null hypothesis fall inside the CI?
▪ Yes: Fail to reject H0.
▪ No: Reject H0.
28. Relationship between C.I. and H.T. – recall example 1
• A scale is to be calibrated by weighing a 1000 g test weight 60
times. The 60 scale readings have mean 1000.6 g and standard
deviation 2 g.
• Find the 90% C.I. for the mean weight of the scale readings.
C = 90% → z* = 1.645
margin of error = 1.645 × 2/√60 = 0.425
C.I. = (1000.6 − 0.425, 1000.6 + 0.425) = (1000.175, 1001.025).
• At α = 0.1, since μ0 = 1000 is outside the above C.I., we
reject H0. We have significant evidence that the population
mean is different from 1000 g.
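The interval and the resulting decision can be checked with a short script. This is a sketch that again assumes the sample standard deviation of 2 g is used in place of σ; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 1000.6, 2.0, 60  # sample mean (g), sample SD (g), sample size
mu0 = 1000.0                  # claimed value under H0 (g)
confidence = 0.90             # confidence level C, so alpha = 0.10

# Critical value z* leaving (1 - C)/2 in each tail of N(0, 1)
z_star = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

margin = z_star * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

# Two-sided test at alpha = 1 - C: reject H0 exactly when mu0 is outside the CI
reject_h0 = not (lower <= mu0 <= upper)

print(f"z* = {z_star:.3f}, margin of error = {margin:.3f}")
print(f"90% CI = ({lower:.3f}, {upper:.3f}), reject H0: {reject_h0}")
```

Because 1000 lies below the lower limit of the interval, the two-sided test at α = 0.10 rejects H0.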
29. Choosing the level of significance
• α = 0.05 is the accepted standard, but…
• if the conclusion that Ha is true has “costly” implications,
a smaller α may be appropriate
• a decision is not always needed: describing the evidence by the
P-value may be enough
• there is no sharp border between statistically significant and
insignificant
30. Statistical vs. practical significance
• A statistically significant effect may be small:
Example (“Executive” blood pressure):
• µ0 = 128
• σ = 15
• n = 1000 obs.
• sample mean = 127
‐ Z = (127-128)/ (15/sqrt(1000)) = -2.11
‐ P-value for two-sided Ha = 2*0.0174=0.0348
‐ Significant??
▪ Statistical significance is not necessarily practical significance.
▪ Outliers may produce or destroy statistical significance.
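To see how sample size drives statistical significance in the blood pressure example, the sketch below recomputes the two-sided P-value for the same 1-unit difference at several sample sizes. Only n = 1000 comes from the example; n = 100 and n = 10000 are hypothetical values added for comparison, and the helper name `two_sided_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, xbar = 128.0, 15.0, 127.0  # values from the example


def two_sided_p(n):
    """Return (z, two-sided P-value) for the fixed difference xbar - mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# n = 1000 is from the example; 100 and 10000 are hypothetical
for n in (100, 1000, 10000):
    z, p = two_sided_p(n)
    print(f"n = {n:>5}: z = {z:6.2f}, P-value = {p:.4f}")
```

The same 1-unit difference is not significant at α = 0.05 with 100 observations, significant with 1000, and overwhelmingly significant with 10000, even though its practical importance has not changed.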