2. HYPOTHESIS TESTING: THE BASICS
• Why hypothesis testing?
• To investigate whether a statement about the population average (based on a sample
derived from that population) is valid.
• Assume that someone believes that the average annual income for data analysts (μ) is
BDT600K based on past evidence or an educated opinion.
• Assume that we derive a sample from the population
• If the sample average (x̄) is BDT550K, does this mean that the population mean is
different from BDT600K and therefore the original statement is wrong?
• Could it be that the statement is correct and that, out of the many possible samples, we
simply drew one whose average is quite different from the population mean?
• Or can we use it to tell us whether the population average cannot be equal to the
proposed average value?
3. HYPOTHESIS TESTING: THE REASONING
• The reasoning is based on the central limit theorem.
• We are testing a suggestion about the population mean.
• If it were true, it would sit at the centre of the (approximately normal) distribution of all
the sample averages, and the sample averages would be distributed around it.
We assume that this is the case.
• We therefore calculate the z value using the hypothesised population mean.
• By stating our level of significance (e.g. α = 5%, i.e. 2.5% in each tail) and its corresponding critical
values (1.96, -1.96) we define a threshold for ‘rejecting’ or ‘not rejecting’ the null
hypothesis.
• That means that if we calculate a z value greater than 1.96 or smaller than -1.96 (the critical
values), we reject the null hypothesis.
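As a quick check of the numbers above, the critical values can be recovered from the significance level. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `two_tailed_critical_z` is a hypothetical helper, not something from the slides.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> float:
    """Return the positive critical z value for a two-tailed test at level alpha."""
    # Split alpha equally between the two tails, then invert the standard normal CDF.
    return NormalDist().inv_cdf(1 - alpha / 2)

z_crit = two_tailed_critical_z(0.05)
print(round(z_crit, 2))  # 1.96

# Probability of a z value beyond +/- 1.96 when the null hypothesis is true.
tail_prob = 2 * (1 - NormalDist().cdf(1.96))
print(round(tail_prob, 3))  # 0.05
```

The 5% is the total area in both tails, which is why each critical value cuts off 2.5%.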
4. HYPOTHESIS TESTING: THE REASONING
• Why? Because, assuming that the suggested population average is correct and sits at the
centre of our distribution, the probability of selecting a sample whose average is more than 1.96
standard errors above it is less than 2.5%, and likewise for more than 1.96 standard errors below it.
• Therefore we cannot believe that we were so unlucky as to select a sample that
gives an average value that is less than 2.5% likely to be observed, ASSUMING
that the population average that has been suggested is the real one.
• So instead we believe that we actually picked up a sample that has more than 2.5 %
chance of being selected, and the only way that this can be true is if the population mean
has a different value than the one being suggested.
• A different population mean (that had a value that is closer to the one derived from our
sample) would have shown that the probability of observing our sample average would
have been more than 2.5 %.
5. HYPOTHESIS TESTING: THE STEPS
1. Set up the Null Hypothesis (H0)
2. Set up the Alternative Hypothesis (Ha)
3. Specify the Significance Level (α) and identify its corresponding critical values
4. Look at your sample data
5. Calculate the statistics for the test (test statistic)
6. Technical conclusion (Reject or not reject)
7. Draw your conclusion in non-technical terms referring to the original hypothesis
6. NULL & ALTERNATIVE HYPOTHESIS
• When we conduct hypothesis testing, we identify two mutually exclusive
hypotheses:
• Null Hypothesis (H0)
• Alternative Hypothesis (Ha)
• In hypothesis testing, we always assume that H0 is true until proven
otherwise
• We can develop hypotheses using a single population parameter or more
than one population parameter
7. REJECTION REGION
Two-tail test: When our null hypothesis is of the form (e.g. μ = BDT600K) and the alternative hypothesis of
the form (e.g. μ ≠ BDT600K) then we have a two-tail test.
One-tail test: When our null hypothesis is of the form (e.g. μ ≥ BDT600K or μ ≤ BDT600K ) and the alternative
hypothesis of the form (e.g. μ < BDT600K or μ > BDT600K respectively) then we have a one-tail test.
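[Figure: rejection regions. Two-tail test: reject H0 in both tails (2.5% each, beyond the - z/t value and the + z/t value), accept H0 in between. Lower-tail test: reject H0 in the 5% tail below the - z/t value, accept H0 otherwise. Upper-tail test: reject H0 in the 5% tail above the + z/t value, accept H0 otherwise.]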
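The rejection regions can also be written as a small decision rule. The sketch below assumes a z test (standard normal critical values); the function name `reject_null` and the `tail` labels are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def reject_null(z: float, alpha: float = 0.05, tail: str = "two") -> bool:
    """Return True when z falls in the rejection region.

    tail: "two"   for Ha: mu != mu0 (alpha split across both tails)
          "lower" for Ha: mu <  mu0 (all of alpha in the left tail)
          "upper" for Ha: mu >  mu0 (all of alpha in the right tail)
    """
    std_normal = NormalDist()
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if tail == "lower":
        return z < std_normal.inv_cdf(alpha)
    if tail == "upper":
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'lower' or 'upper'")

print(reject_null(1.80, tail="two"))    # False
print(reject_null(1.80, tail="upper"))  # True
print(reject_null(1.80, tail="lower"))  # False
```

The same test statistic can lead to different decisions because a one-tail test places the whole 5% in a single tail, so its critical value (about 1.645) is closer to the centre than 1.96.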
8. TYPE 1 & 2 ERRORS
Decision                  Null Hypothesis True    Null Hypothesis False
Accept Null Hypothesis    ✓                       Type 2 Error (β)
Reject Null Hypothesis    Type 1 Error (α)        ✓

Decision    Innocent                     Guilty
Free        ✓                            The guilty go free
Punish      The innocent are punished    ✓
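To see what α means in practice, one can simulate many samples from a population where the null hypothesis really is true and count how often a two-tailed z test rejects it anyway. This is only a sketch: the population values (mean 25, standard deviation 9), the sample size, the number of trials and the seed are all assumptions chosen for illustration, and the population standard deviation is treated as known.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type1_error_rate(mu0=25.0, sigma=9.0, n=30, alpha=0.05, trials=2000, seed=1):
    """Estimate how often a true null hypothesis is rejected (Type 1 error)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(rate)
```

The printed proportion should land close to the chosen α of 0.05, since α is exactly the probability of rejecting a true null hypothesis.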
9. PARAMETRIC HYPOTHESIS TESTS
• Hypothesis tests that we want to carry out could be based on sample statistics
(sample average, proportion) and could be used to provide estimates about
population parameters (population average, proportion).
• They are therefore called parametric tests.
• However they cannot be applied on all types of data.
• They can be applied to data on interval or ratio scales but not on ordinal or
nominal scales. Why?
• Because statistical measures such as the mean or standard deviation can be
calculated for the former group of scales but not for the latter.
10. ONE-SAMPLE HYPOTHESIS TESTS
Three types of one sample tests:
1. H0: parameter ≤ constant
Ha: parameter > constant
2. H0: parameter ≥ constant
Ha: parameter < constant
3. H0: parameter = constant
Ha: parameter ≠ constant
It is not correct to formulate a null hypothesis using >, <, or ≠.
11. EXAMPLE - 1
The time it takes to answer a call on a switchboard is recorded for a random
sample of 81 calls. The resulting sample mean is found to be 28 seconds with
a standard deviation of 9 seconds. Previously it has been estimated that the
average time to answer a call was 25 seconds. Test this claim at the 5%
significance level.
12. EXAMPLE 1 - SOLVED
1. Set up the null hypothesis (H0): The average time it takes for a phone call to be
answered is 25 seconds. H0: μ = 25
2. Set up the alternative hypothesis (Ha): The average time that it takes for a phone
call to be answered is not 25 seconds. Ha: μ ≠ 25
3. Specify the significance level (α) and identify the theoretical value: 5%
significance level with a critical value of z(α/2) = 1.96
4. Look at your sample data: The sample data suggests that the average time it takes
to answer a call is more than 25 seconds
5. Calculate the statistics for the test: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{28 - 25}{9 / \sqrt{81}} = 3$
6. Technical conclusion: The calculated value (3) is greater than the critical value
(1.96) and therefore we reject the null hypothesis
7. Conclusion in non-technical terms: There is sufficient evidence to conclude that
the mean time to answer a call is not 25 seconds
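The arithmetic in this solution can be reproduced in a few lines. The sketch below follows the slide in using the sample standard deviation (9) in place of σ, which is reasonable for a large sample; the p-value at the end is an extra not shown on the slide, included only as a cross-check.

```python
from math import sqrt
from statistics import NormalDist

# Figures from Example 1
n, x_bar, s, mu0, alpha = 81, 28.0, 9.0, 25.0, 0.05

z = (x_bar - mu0) / (s / sqrt(n))             # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value (not on the slide)

print(z)                  # 3.0
print(round(z_crit, 2))   # 1.96
print(abs(z) > z_crit)    # True, so H0 is rejected
print(round(p_value, 4))  # 0.0027
```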
13. EXAMPLE - 2
The time it takes to answer a call on the switchboard is recorded for a
random sample of 25 calls. The resulting sample mean is found to be 28
seconds with a standard deviation of 9 seconds. Previously it has been
estimated that the average time to answer a call was 25 seconds. Test at the 5%
significance level the hypothesis that the average time for all phone calls to
be answered is still μ = 25.
14. EXAMPLE 2 - SOLVED
1. Set up the null hypothesis (H0): The average time it takes for a phone call to be
answered is 25 seconds. H0: μ = 25
2. Set up the alternative hypothesis (Ha): The average time that it takes for a phone
call to be answered is not 25 seconds. Ha: μ ≠ 25
3. Specify the significance level (α) and identify the theoretical value: 5%
significance level with a critical value of t(0.025,24) = 2.064
4. Look at your sample data: The sample data suggests that the average time it takes
to answer a call is more than 25 seconds
5. Calculate the statistics for the test: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{28 - 25}{9 / \sqrt{25}} \approx 1.667$
6. Technical conclusion: The absolute calculated value is lower than the critical value (|1.667| <
|2.064|) and therefore we cannot reject the null hypothesis.
7. Conclusion in non-technical terms: We cannot reject the claim that the average
time it takes to answer a phone call is 25 seconds
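The same calculation for the small-sample case is sketched below. Python's standard library has no t distribution, so the critical value 2.064 is simply taken from step 3 (a t-table lookup) rather than computed.

```python
from math import sqrt

# Figures from Example 2
n, x_bar, s, mu0 = 25, 28.0, 9.0, 25.0
t_crit = 2.064  # t(0.025, 24), as quoted in step 3

t = (x_bar - mu0) / (s / sqrt(n))  # test statistic with n - 1 = 24 degrees of freedom

print(round(t, 3))      # 1.667
print(abs(t) > t_crit)  # False, so H0 cannot be rejected
```

Compared with Example 1, the sample mean and standard deviation are unchanged; only the smaller sample size (25 instead of 81) widens the standard error enough to reverse the decision.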
15. EXERCISE - CADSOFT
• CadSoft, a producer of computer-aided design software for the aerospace
industry, receives numerous calls for technical support. In the past, the
average response time has been at least 25 minutes. The company has
upgraded its information systems and believes that this will help reduce
response time to less than 25 minutes. The company collected a sample
of 44 response times in the Excel worksheet “CadSoft Response Times”.
Test whether the data support the company's belief.
16. EXERCISE - VACATION SURVEY
• The Excel worksheet “Vacation Survey” shows a portion of data collected
in a survey of 34 respondents by a travel agency. Suppose that the travel
agency wanted to target individuals who were approximately 35 years
old. Test whether the average age of respondents is equal to 35.
17. ONE-SAMPLE HYPOTHESIS TESTS
Many important business measures, such as market share or the fraction
of deliveries received on time, are expressed as proportions. We may
conduct a test of hypothesis about a population proportion in a similar
fashion as we did for means. The test statistic for a one-sample test for
proportions is:
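The formula itself did not survive on this slide. The sketch below assumes the usual large-sample statistic, z = (p̂ − π0) / sqrt(π0(1 − π0)/n), where p̂ is the sample proportion and π0 the hypothesised proportion; the function name and the numbers in the usage lines (160 successes out of 200 against a benchmark of 0.75) are hypothetical and are not taken from any worksheet.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_statistic(successes: int, n: int, pi0: float) -> float:
    """z statistic for a one-sample test of a proportion (assumed standard form)."""
    p_hat = successes / n                    # sample proportion
    std_error = sqrt(pi0 * (1 - pi0) / n)    # standard error under H0
    return (p_hat - pi0) / std_error

# Hypothetical data: H0: pi <= 0.75 against Ha: pi > 0.75 at the 5% level
z = proportion_z_statistic(successes=160, n=200, pi0=0.75)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # upper-tail critical value

print(round(z, 2))       # 1.63
print(round(z_crit, 3))  # 1.645
print(z > z_crit)        # False
```

Note that the standard error is built from the hypothesised proportion π0, not from p̂, because the test assumes H0 is true.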
18. EXERCISE - CADSOFT
• CadSoft also sampled 44 customers and asked them to rate the overall quality
of the company’s software product using a scale of 0-very poor, 1-poor, 2-good,
3-very good, 4-excellent. The 44 ratings are recorded in the Excel worksheet
“CadSoft Product Satisfaction”. The firm tracks
customer satisfaction of quality by measuring the proportion of responses in the
top two categories. In the past, this proportion has averaged about 75%. Is
there sufficient evidence to conclude that this satisfaction measure has
significantly exceeded 75% using a significance level of 0.05?
19. TWO-SAMPLE HYPOTHESIS TESTS
We can also have two-sample tests, in which we use two samples to compare the means of two populations:
H0: μ1 – μ2 = 0 (can also be ≤ or ≥)
Ha: μ1 – μ2 ≠ 0 (can also be < or >)
The samples may be matched (a.k.a. paired) or may be independent.
20. PAIRED TWO-SAMPLE T-TEST FOR MEANS
A matched-pairs t-test is used when the samples are paired or related in some
fashion.
For example:
• If you wished to compare prices of groceries across grocery stores, you would need
to compare the same products at two (or more) different stores.
• If you wished to measure the effectiveness of a new diet, you would weigh the
dieters at the start and at the finish of the program.
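A paired test reduces to a one-sample t-test on the differences within each pair. The sketch below illustrates this with the diet example; the weights are made-up numbers and `paired_t_statistic` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for H0: mean paired difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical weights (kg) of six dieters at the start and finish of a program
start = [82.0, 90.5, 77.0, 101.0, 68.5, 95.0]
finish = [80.0, 88.0, 77.5, 97.0, 67.0, 92.5]

t, df = paired_t_statistic(start, finish)
print(round(t, 2), df)  # -3.3 5
```

The resulting t value would then be compared with the critical t value for `df` degrees of freedom at the chosen significance level, exactly as in the one-sample case.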
21. EXERCISE - PILE FOUNDATION
• The Excel worksheet “Pile Foundation” contains the estimates used in a bid
and actual auger-cast pile lengths that engineers ultimately had to use for a
foundation engineering project. The contractor’s past experience suggested
that the bid information was generally accurate, so the average of the paired
differences between the actual pile lengths and estimated lengths should be
close to zero. After this project was completed, the contractor found that the
average difference between the actual lengths and the estimated lengths
was 6.38. Could the contractor conclude that the bid information was poor?
22. INDEPENDENT SAMPLES: TWO-SAMPLE T-TEST
• The prior example assumed we had matched or dependent data (i.e., the data
from one population corresponds to data from another population)
• Sometimes we have data taken independently from two populations
• Independence means that the observations from one population have no influence on
the observations from the other population
• We use different hypothesis tests depending on whether the variances of the two
populations can be treated as equal
• We can test for equality of variances between the two samples using the F-test.
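The two statistics involved can be sketched as follows. The slides do not say which form of the t statistic is used in each case, so this assumes the common textbook choices: a pooled-variance statistic when the variances are treated as equal and an unpooled (Welch-style) statistic otherwise. The sample values and function names are hypothetical, and the critical F and t values would still need to come from tables or a statistics package.

```python
from math import sqrt
from statistics import fmean, variance

def f_ratio(a, b):
    """Larger sample variance over the smaller, with (numerator df, denominator df)."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1

def two_sample_t(a, b, equal_var: bool) -> float:
    """t statistic for H0: mu_a - mu_b = 0 with independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    diff = fmean(a) - fmean(b)
    if equal_var:
        # Pool the two sample variances, weighting by degrees of freedom.
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        return diff / sqrt(pooled * (1 / na + 1 / nb))
    # Unequal variances: keep each sample's own variance.
    return diff / sqrt(va / na + vb / nb)

# Hypothetical lead times (days) for two suppliers
supplier_a = [8, 9, 7, 10, 6]
supplier_b = [5, 7, 3, 8, 2]

f_value, df_num, df_den = f_ratio(supplier_a, supplier_b)
t_pooled = two_sample_t(supplier_a, supplier_b, equal_var=True)

print(round(f_value, 2), df_num, df_den)  # 2.6 4 4
print(round(t_pooled, 3))                 # 2.236
```

With equal sample sizes, as here, the pooled and unpooled statistics coincide; they differ when the sample sizes are unequal, and the degrees of freedom used for the critical value differ between the two approaches.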
23. EXERCISE - COMPARING SUPPLIER PERFORMANCE
• The last two columns in the “Purchase Orders” worksheet provide the order
date and arrival date of all orders placed with each supplier. The time
between placement of an order and its arrival is commonly called the lead
time (computed by subtracting the dates from each other). Purchasing
managers have noted that they order many of the same types of items from
Alum Sheeting and Durrable Products and are considering dropping Alum
Sheeting from their supplier base if its lead time is significantly longer than that
of Durrable Products. Should they?