2. Checklist
• The parameters of a distribution
• The idea of estimating parameters of a distribution
based on a sample
• Estimators and the estimation process
• Point vs interval estimates
• Sampling and non-sampling error
• Bias
• Sampling distribution of the mean and proportion
• Standard error
• The Central Limit Theorem
• The t distribution and working with t tables
• Confidence intervals
3. Introduction
• Inference
– “a conclusion reached on the
basis of evidence and reasoning.”
• Inferential statistics:
– Allows us to make decisions
about some characteristics of a
population based on sample
information.
– I.e. we draw conclusions about a
population based on a sample.
4. Introduction
• We have discussed the characteristics and properties of
the probability distributions of random variables
• These characteristics were the parameters:
– n and p for the Binomial Distribution
– λ for the Poisson Distribution
– μ and σ for the Normal Distribution
• In the real world we often don’t know the values of
these parameters and will have to estimate them.
• Three key words:
– Estimation (the process)
– Estimate (the result)
– Estimator (the facilitator)
5. Three approaches to estimating
unknown population parameters.
1. Census
2. Guess
3. The preferred method:
– draw a random sample of appropriate size from
the population,
– use the sample data,
– choose a formula (called a sample statistic) to
estimate the unknown population parameter.
6. Definition of Estimation
Estimation is the process by which we
estimate the value of an unknown population
parameter by making use of the data from a
random sample that was drawn from that
population.
7. THE ESTIMATION PROCESS
1. Identify the Unknown Population Parameter
2. Decide on the Size of the Random Sample: n
3. Select the Random Sample of Size n
4. Choose an Appropriate Sample Statistic [Estimator]
5. Substitute the Sample Data into the Sample Statistic
6. Calculate the estimate and interpret
8. Two Types of Estimates
• Suppose we seek to estimate the mean age of
Level I students on the Campus.
• We may draw a random sample of 100 Level I
students from the Campus, record their ages,
substitute the 100 values into the formula for the
mean of a sample (also called the sample statistic
or estimator), and read off the estimate.
• The resulting estimate can be
– a single value e.g. 20 or
– an interval of values (18 - 22).
9. Two Types of Estimates
• Point estimate
• Interval estimate
of a population parameter
10. Es&mators
• How do we use the data from our random sample
to arrive at an estimate?
• We substitute the sample data into a formula
better known as a sample statistic.
• These sample statistics are called estimators.
• A point estimator for an unknown population
parameter is a sample statistic into which the
data from the random sample is substituted, so
as to yield a point estimate of that parameter.
11. Commonly Used Point Estimators
Population Parameter      Sample Statistic
Mean µ                    Sample Mean x̄, Sample Median, Sample Mode
Standard Deviation σ      Sample Standard Deviation s
Proportion p              Sample Proportion p̂
12. Example
• The mean and standard deviation of the teaching experience
of faculty members in a department at a University are
unknown. A random sample of 5 faculty members was
selected; their teaching experience in years was as follows:
7, 8, 14, 7, 20
1. Identify suitable point estimators for the mean teaching
experience of the entire faculty
2. Identify suitable point estimators for the standard
deviation of teaching experience of the entire faculty
3. Find a point estimate of the mean teaching experience of
the entire faculty
4. Find a point estimate of the std deviation of the teaching
experience of the entire faculty.
13. Solution
1. We can use any of three point estimators to estimate the
population mean: sample mean, sample mode or sample
median.
– On the basis of the three estimators declared in 1. above, we
can compute three point estimates.
• Sample Mean = 1/5 (7 + 8 + 14 + 7 + 20) = 11.2
• Sample Mode = 7
• Sample Median = 8
2. We can use the sample standard deviation as the point
estimator for the population standard deviation.
• The point estimate of the population standard deviation is
the value of s.
s = √[ 1/4 (4.2² + 3.2² + 2.8² + 4.2² + 8.8²) ] = √32.7 = 5.718
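The point estimates in this solution can be checked with a short sketch. It uses Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Teaching experience (years) of the 5 sampled faculty members, from the example.
years = [7, 8, 14, 7, 20]

mean_estimate = statistics.mean(years)      # point estimate of the population mean
median_estimate = statistics.median(years)  # alternative point estimate of the mean
mode_estimate = statistics.mode(years)      # alternative point estimate of the mean
s = statistics.stdev(years)                 # sample standard deviation (divisor n - 1)

print(mean_estimate, median_estimate, mode_estimate, round(s, 3))
```

Note that `statistics.stdev` divides by n - 1, which matches the 1/4 factor in the hand calculation for n = 5.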
14. Some Issues
• Since we must estimate population parameters from
samples, it is inevitable that we will make errors.
– Different sample sizes can give rise to different point
estimates when the same estimator is used
– Different estimators can give rise to different point
estimates when the same sample is used
– Different estimators and different sample sizes can give
rise to different point estimates
– Some estimates will agree with the true value of the
population parameter; others will not.
15. Error in Estimation
• The difference between the point estimate and the true value of the
population parameter is known as the total error in the estimate.
• This total error between the point estimate and the true value of the
population parameter can be the result of both sampling error and non-
sampling error.
• The sampling errors occur because of chance.
• Other errors may also arise as a result of human errors, and not chance;
these tend to impair the results obtained. Such errors are called non-
sampling errors.
TOTAL ERROR IN THE ESTIMATE = SAMPLING ERROR + NON-SAMPLING ERROR
16. Sources of Non-Sampling Error
• There are many potential sources of non-sampling error:
– Inability to obtain all the required information
from all elements of the sample
– Difficulties in defining terms
– Differences in interpretation of questions
– Errors in the data collection such as in recording or
coding
– Errors made in the data tabulation activity.
17. Example
• Consider a statistics class of five students. Their exam scores
were: 70, 78, 80, 80 & 95.
• Find the population mean.
• Suppose that a random sample of three students was drawn
i.e. 70, 80 & 95
• Use the sample data and the sample mean to estimate the
population mean.
• What is the difference due to chance?
• Now suppose that we mistakenly recorded 82 instead of 80.
• What would be the new estimate of the population mean?
• What is the new difference between the population mean and
the point estimate?
18. Example Continued
• It is this difference of 1.73 that we call the total error in the
estimate. It is subdivided into two components:
– The sampling error of 1.07
– The non-sampling error of 0.66
• As this error grows, the sample statistic will become less
useful as an estimator of the population parameter.
• We must therefore be able to determine the impact of the
error on the inferences that we will be making by
subjecting the estimators to specific tests. These are
discussed in the next chapter.
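The decomposition of the total error can be reproduced with a few lines of arithmetic. This is a sketch with illustrative variable names; it uses the five exam scores, the sample 70, 80, 95 and the mis-recorded value 82 from the example.

```python
population = [70, 78, 80, 80, 95]   # all five exam scores
sample = [70, 80, 95]               # the random sample as it should have been recorded
recorded = [70, 82, 95]             # the same sample with 80 mis-recorded as 82

mu = sum(population) / len(population)            # population mean
correct_estimate = sum(sample) / len(sample)      # estimate without the recording mistake
recorded_estimate = sum(recorded) / len(recorded) # estimate actually obtained

sampling_error = correct_estimate - mu                    # due to chance
non_sampling_error = recorded_estimate - correct_estimate # due to the recording mistake
total_error = recorded_estimate - mu                      # sum of the two components

print(round(sampling_error, 2), round(non_sampling_error, 2), round(total_error, 2))
```

The unrounded non-sampling error is 2/3, which rounds to 0.67; the figure 0.66 quoted above comes from subtracting the already rounded values 1.73 and 1.07.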
19. What is bias ?
• Bias is a tendency to lean in a certain direction, either in
favour of or against a particular thing. To be
truly biased means to lack a neutral viewpoint on a particular
topic.
• Statistical bias is a feature of a statistical technique or of its
results whereby the expected value of the results differs from
the true underlying quantitative parameter being estimated.
20. Unbiased Point Estimator
• A point estimator θ̂ is said to be an unbiased
estimator of a population parameter θ if
E(θ̂) = θ
• If E(θ̂) ≠ θ then the point estimator is said to
be biased.
• The extent of the bias will be equal to
E(θ̂) – θ
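To make the definition of bias concrete, the sketch below computes E(θ̂) exactly for two variance estimators by listing every possible sample. The setup is an assumption made for illustration and does not come from the slides: samples of size 3 are drawn with replacement from the five exam scores used elsewhere in this lecture, so all 125 ordered samples are equally likely.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [70, 78, 80, 80, 95]
n = 3
sigma2 = pvariance(population)   # true population variance (theta)

# Assumption: sampling WITH replacement, so each of the 5**3 ordered samples
# has the same probability and a plain average gives the expected value.
samples = list(product(population, repeat=n))

expected_divisor_n_minus_1 = mean(variance(s) for s in samples)   # E of s^2 with divisor n - 1
expected_divisor_n = mean(pvariance(s) for s in samples)          # E of the divisor-n version

bias_divisor_n_minus_1 = expected_divisor_n_minus_1 - sigma2  # essentially zero: unbiased
bias_divisor_n = expected_divisor_n - sigma2                  # negative: biased downwards

print(round(sigma2, 2), round(bias_divisor_n_minus_1, 6), round(bias_divisor_n, 2))
```

Under these assumptions the divisor n - 1 estimator has E(θ̂) = θ, while the divisor n estimator underestimates θ by the factor (n - 1)/n.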
22. SAMPLING DISTRIBUTION OF THE MEAN
• Return to our population of test scores for the class comprising five
students A, B, C, D and E.
• A = 70, B = 78, C = 80, D = 80, E = 95
• Population Mean = 80.6 Population Std Deviation = 8.09
• We will now perform the following activities.
1. Consider all possible samples of three scores from this population;
there are 10 such samples.
2. Compute the sample mean for each of the 10 samples.
3. Construct the Frequency Distribution of Sample Means.
4. Construct the Relative Frequency Distribution of Sample Means.
5. Rename Relative Frequency as Probability to create the Probability
Distribution of the Sample Means
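The five activities above can be sketched in code. This is a minimal illustration with made-up variable names, using the scores A to E given on this slide; sample means are rounded to two decimals so that equal means group together.

```python
from collections import Counter
from itertools import combinations

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# Step 1: all possible samples of three students (order does not matter).
samples = list(combinations(scores, 3))

# Step 2: the sample mean of each sample, rounded to 2 decimals.
sample_means = [round(sum(scores[name] for name in s) / 3, 2) for s in samples]

# Step 3: frequency distribution of the sample means.
frequency = Counter(sample_means)

# Steps 4 and 5: relative frequencies, read as probabilities.
probability = {m: f / len(samples) for m, f in sorted(frequency.items())}

for m, p in probability.items():
    print(f"{m:6.2f}  {frequency[m]}  {p:.1f}")
```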
23. 1 & 2. Generating the 10 Random
Samples of Size 3
24. 3. The Frequency Distribution of
Sample Means
Sample mean    Frequency
76.00          2
76.67          1
79.33          1
81.00          1
81.67          2
84.33          2
85.00          1
Σf             10
25. 5. The Probability Distribution of
Sample Means (or The Sampling
Distribution of the Mean)
Sample mean    Probability
76.00          0.2
76.67          0.1
79.33          0.1
81.00          0.1
81.67          0.2
84.33          0.2
85.00          0.1
Σ              1.00
26. Sampling Distributions in this Course
• In general, the probability distribution of a
Sample Statistic is called its sampling distribution.
• We will focus on two sampling distributions:
– Sampling Distribution of the Mean
– Sampling Distribution of the Proportion
• In the Sampling Distribution of the Mean, the
random variable is the sample mean.
• In the Sampling Distribution of the Proportion,
the random variable is the sample proportion p̂
27. The Mean of the Sampling Distribution of the Mean
• The mean of the sampling distribution of the mean is
equal to the population mean μ.
Class Activity
• Compute the mean of the Sampling Distribution of the
Mean Score based on the ten random samples of size 3.
• Show that it is indeed equal to the population mean.
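One way to carry out this activity is to treat the sampling distribution as a discrete probability distribution and compute its expected value. The sketch below uses the sample means and probabilities tabulated earlier in these slides; the names are illustrative.

```python
# Sampling distribution of the mean: sample mean -> probability
sampling_distribution = {
    76.00: 0.2,
    76.67: 0.1,
    79.33: 0.1,
    81.00: 0.1,
    81.67: 0.2,
    84.33: 0.2,
    85.00: 0.1,
}

# Expected value of a discrete random variable: sum of value * probability.
mean_of_sample_means = sum(x * p for x, p in sampling_distribution.items())

population = [70, 78, 80, 80, 95]
population_mean = sum(population) / len(population)

print(round(mean_of_sample_means, 2), population_mean)
```

Both values come to 80.6, up to the rounding of the tabulated sample means.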
28. The Standard Deviation of the
Sampling Distribution of the Mean
• The Standard Deviation of the Sampling Distribution of
Mean is given by σx where
• σx = σ/√n
• σx is also called the standard error.
• The spread of the Sampling Distribution of the Mean is
smaller than the spread of the corresponding
population distribution.
• The standard deviation of the Sampling Distribution of
Mean decreases as the sample size increases.
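The effect of the sample size on the standard error can be seen with a small sketch. The function name and the value σ = 14.8 are illustrative assumptions, not part of this slide.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 14.8  # illustrative population standard deviation
for n in (4, 25, 100, 400):
    # Quadrupling n halves the standard error.
    print(n, round(standard_error(sigma, n), 2))
```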
29. What kind of distribution will the Sampling Distribution of the Mean
have?
• If the population from which the samples are
drawn is normally distributed with mean μ and
standard deviation σ, then the Sampling
Distribution of the Mean will also be normally
distributed with mean μ and standard
deviation σx (irrespective of the sample size).
• Does the above result hold true if the
population is not normally distributed?
31. What kind of Probability Distribution does the Sampling Distribution
of the Mean possess when the population is not Normal ?
The Central Limit Theorem assures us that:
• If the sample size is large, the Sampling Distribution of
the Mean will be approximately normally distributed
with mean μ and standard deviation σx irrespective of
the distribution of the population.
• ‘Large’ is taken to mean n≥30
• What happens when the sample size is small i.e. n < 30?
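The Central Limit Theorem can be illustrated by simulation. The sketch below is a hedged example under assumptions not made in the slides: the population is exponential with mean 1 and standard deviation 1 (strongly right-skewed), the sample size is n = 30, and 5000 samples are drawn with a fixed seed.

```python
import math
import random
import statistics

random.seed(1)          # fixed seed so the run is repeatable
n = 30                  # sample size ("large" by the n >= 30 rule)
replications = 5000     # number of samples drawn

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu = 1
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)
predicted_se = 1.0 / math.sqrt(n)

# Share of sample means within 1.96 standard errors of mu; a normal
# sampling distribution would give about 0.95.
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in sample_means) / replications

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_se, 3), round(within, 3))
```

Even though the population is far from normal, the simulated sample means centre on μ with spread close to σ/√n, and roughly 95% of them fall within 1.96 standard errors of μ.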
32. What kind of Probability Distribution does the Sampling Distribution of
the Mean possess when the population is not Normal and sample size
is small i.e. n < 30?
• We must look to the Student t Distribution
• The Student t Distribution is a specific type of bell-shaped
distribution with a lower height and a wider spread than the
Standard Normal Distribution.
• The Student t Distribution has only one parameter i.e. the number
of degrees of freedom, abbreviated df
• The number of degrees of freedom is the number of observations
that can be freely chosen.
• The mean of the Student t Distribution is 0
• The standard deviation of the Student t Distribution is √(df/(df – 2)), for df > 2
• As the degrees of freedom increase, the Student t Distribution
approaches the Standard Normal Distribution.
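For df > 2 the variance of the Student t distribution is df/(df – 2), so its standard deviation is the square root of that ratio. The sketch below, with an illustrative function name, shows how this standard deviation shrinks towards 1, the standard deviation of the Standard Normal Distribution, as df grows.

```python
import math

def t_standard_deviation(df):
    """Standard deviation of a Student t distribution; defined only for df > 2."""
    if df <= 2:
        raise ValueError("the standard deviation is only defined for df > 2")
    return math.sqrt(df / (df - 2))

for df in (3, 5, 10, 30, 100, 1000):
    print(df, round(t_standard_deviation(df), 4))
```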
33. What kind of Probability Distribution does the Sampling Distribution
of the Mean possess when the population is not Normal and sample
size is small i.e. n < 30?
• If the population from which the samples are
drawn is either of unknown distribution or not
normally distributed with mean μ and standard
deviation σ, then the Sampling Distribution of the
Mean is specified by the Student t Distribution
with n - 1 degrees of freedom. (Strictly, this result
assumes that the population is at least approximately
normal and that σ is estimated by the sample
standard deviation s.)
• The random variable of the Student t Distribution is given
by t where:
t = (x̄ – μ) / s_x̄, with s_x̄ = s/√n
34. The Sampling Distribution of Proportion
• The probability distribution of the sample
proportion is called the Sampling Distribution of
the Proportion.
• The random variable of the Sampling Distribution
of the Proportion is p̂
• The mean of the Sampling Distribution of the
Proportion is the population proportion p.
• The standard deviation of the Sampling
Distribution of the Proportion is given by √(pq/n), where q = 1 – p.
35. What is the shape of the Sampling
Distribution of the Proportion?
The Central Limit Theorem assures us that:
• If the sample size is sufficiently large, the
Sampling Distribution of the Proportion will be
approximately normally distributed with mean
p and standard deviation √(pq/n).
• Sufficiently Large means np > 5 and nq > 5.
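The standard deviation √(pq/n) and the "sufficiently large" rule can be written as two small helper functions. This is a sketch; the function names and the example values of p and n are hypothetical.

```python
import math

def proportion_standard_error(p, n):
    """Standard deviation of the sampling distribution of the proportion: sqrt(p*q/n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

def is_sufficiently_large(p, n):
    """Rule used in these slides: np > 5 and nq > 5."""
    return n * p > 5 and n * (1 - p) > 5

# Hypothetical cases: (p, n)
for p, n in [(0.3, 50), (0.02, 100)]:
    print(p, n, round(proportion_standard_error(p, n), 4), is_sufficiently_large(p, n))
```

In the second case np = 2, so the normal approximation would not be relied upon.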
36. Interval Estimates: Confidence
Intervals
• We were speaking all along about Unbiased Point
Estimators.
• Instead of assigning a single value to an unknown
population parameter, we can construct an interval
of values around the point estimate and make a
probabilistic statement that the interval contains the
value of the corresponding population parameter.
• Such activity is called interval estimation and interval
estimators are called Confidence Intervals.
• These estimators, when applied to the data from a
random sample, define an interval that is likely to
contain the true value of the population parameter
being estimated.
39. Interval Estimates
• An interval that is constructed based on the confidence level is called a
confidence interval.
• A 90% Confidence Interval means a 10% significance level i.e. α = 10%
• A 95% Confidence Interval means a 5% significance level i.e. α = 5%
• Confidence Interval Estimates in this course are as follows:
– For the population mean based on large samples
– For the population mean based on small samples
– For the population mean based on large samples with σ unknown
– For the population mean based on small samples with σ unknown
– For the population proportion
40. A 100 (1 - α)% Confidence Interval
Estimate for the Population Mean μ
• Let X ~ N(μ , σ) where σ is known. A single sample of size n
was drawn and the sample mean X̄ is computed.
• On the basis of this sample mean we seek to find a
100(1 - α)% Confidence Interval Estimate for μ.
• A 100(1 – α)% interval estimate for the population mean μ
is given by:
X̄ – Zα/2 σx ≤ μ ≤ X̄ + Zα/2 σx
or
(X̄ – Zα/2 σx , X̄ + Zα/2 σx)
where σx = σ/√n and Zα/2 is the standard score that cuts off a tail
area of α/2 in the Standard Normal Curve.
41. A 100(1 – α)% Interval
Estimate for the
Population Mean μ
(X̄ – Zα/2 σx , X̄ + Zα/2 σx)
where Zα/2 is the standard score that cuts off a tail
area of α/2 in the Standard Normal Curve.
44. Example
• Find a 100(1 – α)% Interval Estimate for the
Population mean μ using the following:
§ α = 5%
§ Sample mean = 52
§ σx = 4
CI = X̄ – Zα/2 σx to X̄ + Zα/2 σx
Steps:
1. Find the sample mean X̄ (here 52).
2. Get Zα/2 from tables, using half of alpha (α/2 = 0.025 gives Z = 1.96).
3. Calculate σx (here given as 4).
95% Confidence Interval =
X̄ – Zα/2 σx to X̄ + Zα/2 σx
52 – (1.96 x 4) to 52 + (1.96 x 4)
52 – 7.84 to 52 + 7.84
44.16 to 59.84
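The same calculation can be sketched in code. Instead of reading Z from a table, this example obtains it from `statistics.NormalDist` in the Python standard library; the function name `z_interval` is illustrative.

```python
from statistics import NormalDist

def z_interval(sample_mean, standard_error, confidence):
    """Confidence interval for the mean when the standard error is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # cuts off a right-tail area of alpha/2
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = z_interval(sample_mean=52, standard_error=4, confidence=0.95)
print(round(low, 2), round(high, 2))
```

The computed Z is about 1.96, so the interval agrees with the hand calculation to two decimal places.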
51. • Confidence level 99% or .99
• The sample size is large (n ≥ 30)
§ Therefore, we use the normal distribution
§ z = 2.58
§ Thus, we can state with 99% confidence that the current mean
annual cost to major U.S. banks of all individual checking
accounts is between $495.79 and $504.21
52. A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ where σ is unknown
Let X ~ N(μ , σ) where σ is unknown. A single sample of
size n was drawn and the sample mean X̄ was
computed. On the basis of this single sample mean, find
a 100(1 - α)% Confidence Interval Estimate for μ.
• Here we substitute s for the unknown σ.
• However, it matters whether n is large i.e. (n ≥ 30) or
small i.e. (n < 30)
– If n ≥ 30 the CLT allows us to use the Normal Distribution
N(μ , s/√n) as the Sampling Distribution
– If n < 30 we use the Student-t
Distribution with n – 1 df as the Sampling Distribution.
53. A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ where σ is unknown and n ≥ 30
• A 100(1 – α)% interval estimate for the
population mean μ when n ≥ 30 and σ is
unknown is given by
X̄ – Zα/2 s/√n ≤ μ ≤ X̄ + Zα/2 s/√n
or
(X̄ – Zα/2 s/√n , X̄ + Zα/2 s/√n)
• where Zα/2 comes from the Std Normal
Distribution and s is the sample standard
deviation.
54. A 100 (1 - α)% Confidence Interval Estimate for the
Population Mean μ where σ is unknown and n < 30
• A 100(1 – α)% interval estimate for the
population mean μ when n < 30 and σ is
unknown is given by
X̄ – tα/2 s/√n ≤ μ ≤ X̄ + tα/2 s/√n
or
(X̄ – tα/2 s/√n , X̄ + tα/2 s/√n)
• where tα/2 comes from the Student-t Distribution
with (n – 1) degrees of freedom and s is the
sample standard deviation.
60. Find the values of t for:
• 12 df and 0.025 area in the right tail.
• 20 df and 0.01 area in the right tail.
• 20 df and 0.05 area in the right tail.
• 15 df and 0.005 area in the left tail.
• 22 df and 0.001 area in the left tail.
65. Class Exercise 1
• The standard deviation for a population is 14.8.
• A sample of 100 observations selected from this
population gave a mean of 143.72.
a. Construct a 99% confidence interval for μ.
b. Construct a 95% confidence interval for μ.
c. Construct a 90% confidence interval for μ.
d. Does the width of the confidence intervals
constructed in parts a. to c. decrease as the
confidence level decreases? Explain.
66. Answer to Class Exercise 1
• 99% CI is (139.92, 147.52)
• 95% CI is (140.82, 146.62)
• 90% CI is (141.28, 146.16)
• Notice that the width of the Confidence
Interval decreases as the Confidence level
decreases.
• It makes sense, right? Why?
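These intervals can be recomputed with a short sketch. It takes Z from `statistics.NormalDist` rather than from printed tables, so the 99% and 90% limits may differ from the answers above in the second decimal place, because table values of Z are rounded. Variable names are illustrative.

```python
import math
from statistics import NormalDist

sigma, n, x_bar = 14.8, 100, 143.72       # values from Class Exercise 1
standard_error = sigma / math.sqrt(n)     # 14.8 / 10

intervals = {}
for confidence in (0.99, 0.95, 0.90):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    intervals[confidence] = (x_bar - margin, x_bar + margin)

# Widths in the order 99%, 95%, 90%: each is narrower than the one before.
widths = [high - low for low, high in intervals.values()]

for confidence, (low, high) in intervals.items():
    print(f"{confidence:.0%}: {low:.2f} to {high:.2f}")
```

A lower confidence level uses a smaller Z, so the margin of error, and with it the width of the interval, shrinks.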
67. Another Class Exercise
• A sample of 10 observations taken from a
normally distributed population produced the
following data:
44 52 31 48 46 39 47 36 41 57
a. What is the point estimate of μ?
b. Construct a 95% confidence interval for μ.
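A possible way to work this exercise is sketched below. Since σ is unknown and n = 10 is small, the Student-t interval with n – 1 = 9 degrees of freedom is used. The critical value 2.262 is the standard t-table entry for 9 df and 0.025 in the right tail; it is hard-coded here because the Python standard library has no t distribution.

```python
import math
import statistics

data = [44, 52, 31, 48, 46, 39, 47, 36, 41, 57]
n = len(data)

x_bar = statistics.mean(data)    # point estimate of mu
s = statistics.stdev(data)       # sample standard deviation (divisor n - 1)

t_critical = 2.262               # t table: 9 df, 0.025 area in the right tail
margin = t_critical * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(x_bar, 1), round(s, 2), round(lower, 2), round(upper, 2))
```

With these numbers the point estimate is 44.1 and the 95% interval runs from roughly 38.62 to 49.58.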
68. A 100 (1 - α)% Confidence Interval Estimate for
the Population Proportion p.
• A 100(1 – α)% interval estimate for the population
proportion p is given by
p̂ – Zα/2 √(p̂q̂/n) ≤ p ≤ p̂ + Zα/2 √(p̂q̂/n)
or
(p̂ – Zα/2 √(p̂q̂/n) , p̂ + Zα/2 √(p̂q̂/n))
• where Zα/2 comes from the Std Normal Distribution and
q̂ = 1 – p̂. Because p is unknown, the sample values p̂ and q̂
replace p and q in the standard deviation √(pq/n).
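A minimal sketch of the interval for a proportion follows. The counts (40 successes in a sample of 100) are hypothetical, the function name is illustrative, and the standard error is estimated by substituting the sample proportion for the unknown p.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)   # estimated standard error times Z
    return p_hat - margin, p_hat + margin

# Hypothetical data: 40 successes out of 100 observations.
low, high = proportion_interval(successes=40, n=100, confidence=0.95)
print(round(low, 3), round(high, 3))
```

Here n·p̂ = 40 and n·q̂ = 60 both exceed 5, so the normal approximation is reasonable for this hypothetical sample.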
69. IMPORTANT !!!
• Some versions of the on-line text say that
when the population standard deviation is not
known, the t distribution should be used for
hypothesis testing.
• In this course (and in practice) we use the Z
tables for hypothesis testing once the sample
size is large (n ≥ 30).
70. End of Lecture
• We have reviewed the Confidence Intervals
that form an integral part of the 5 stages of a
statistical analysis.
• Next we move on to another level of
investigation with respect to sample data.
• This involves Hypothesis testing.