2. Objectives
At the end of this session the student will be able to:
Describe Statistical inference and estimation
Differentiate between point and interval estimation
Compute appropriate confidence intervals for population
means and proportions and interpret the findings
Describe methods of sample size calculation
October 6, 2023 2
3. Outline
Definition of Statistical inference and estimation
Point estimation
Interval estimation
Sample size calculation
4. Descriptive statistics help investigators describe and
summarize data.
Probability and sampling distribution concepts are needed to
evaluate data using statistical methods.
Without probability and sampling distribution theory:
we could not make statements about a population without
studying everyone in it
studying everyone in a population is an undesirable and often
impossible task
5. Statistical inference
Statistical inference
is the procedure by which we reach a conclusion about a
population on the basis of the information contained in a
sample that has been drawn from that population.
The two primary methods for making inference are
estimation and hypothesis testing.
7. Statistical Estimation
Estimation: is the process of determining a likely value for a
variable in the population based on information collected
from the sample.
The use of sample statistics to estimate population
parameters.
Researchers are usually interested in estimates of several
quantities, such as totals, averages and proportions
E.g. Estimates for the proportion of smokers among all people
aged 15 to 24 in the population
9. 1. Point Estimation
A single numerical value is used to estimate the
corresponding population parameter.
Point estimate is always within the interval estimate
x̄ is an estimator of the population mean μ.
s is an estimator of the population standard deviation σ
p is an estimator of the population proportion π.
r is an estimator of the population correlation coefficient ρ
10. From a single sample we can calculate a sample statistic to estimate
a single parameter (a point estimate).
Point estimate for population mean µ is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Point estimate for population proportion is given by $p = \frac{x}{n}$
Where x is the total number of successes (events) and n is the sample size
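The two point estimates can be sketched in a few lines of Python. The function names and the pulse-rate values are hypothetical, chosen only for illustration; the proportion uses 60 events in a sample of 150.

```python
def sample_mean(values):
    """Point estimate of the population mean: sum of the observations / n."""
    return sum(values) / len(values)


def sample_proportion(successes, n):
    """Point estimate of the population proportion: x / n."""
    return successes / n


# Hypothetical observations, for illustration only
pulse_rates = [68, 72, 75, 64, 70]
print(sample_mean(pulse_rates))      # 69.8
print(sample_proportion(60, 150))    # 0.4
```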
11. Properties of a Good Estimator
a. Unbiasedness
A sample statistic whose mean is equal to the population
parameter it estimates is unbiased.
For a symmetric distribution, the sample mean and median are
both unbiased estimators of the population mean μ.
b. Minimum variance
An estimator which has the minimum standard error is a good
estimator.
For a normal distribution the sample mean has a smaller
standard error than the median.
If the distribution is heavily skewed, the median may have the
smaller standard error.
12. c. Consistency
As sample size increases, variation of the estimator from
the true population value decreases
13. 2. Interval estimation
Interval estimation: is a statement that a population parameter
has a value lying between two specified limits.
An interval estimate provides more information about a
population characteristic than a point estimate
The value of the sample statistic will vary from sample to
sample; therefore, simply obtaining a single-value estimate
of the parameter is not generally acceptable.
14. We need to take into account the sample to sample variation of
the statistic.
A confidence interval defines an interval within which the
true population parameter is likely to fall (interval estimate)
16. Interval estimate (Confidence interval) - consists of two
numbers, a lower limit and an upper limit which serve as
the bounding values within which the parameter is expected
to lie with a certain degree of confidence
19. Confidence interval therefore takes into account the sample
to sample variation of the statistic and gives the measure of
precision.
The general formula used to calculate a confidence interval
is Estimate ± K × Standard Error, where K is called the reliability
coefficient.
Confidence intervals express the inherent uncertainty in any
medical study by giving upper and lower bounds for the
true underlying population parameter.
Most commonly the 95% confidence intervals are
calculated, however 90% and 99% confidence intervals are
sometimes used.
21. The 95% confidence interval is interpreted as follows: under the
conditions assumed for the underlying distribution, if the sampling
were repeated many times, about 95% of the intervals constructed this
way would contain the true parameter. In this sense you are
95% confident that the interval contains the true parameter.
A 90% CI is narrower than a 95% CI since we are only 90% certain
that the interval includes the population parameter.
A 99% CI is wider than a 95% CI; the extra width means that
we can be more certain that the interval will contain the
population parameter.
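The effect of the confidence level on interval width can be illustrated with a short Python sketch. It assumes the normal (Z) reliability coefficient and a hypothetical standard error of 1.0; `reliability_coefficient` is an illustrative name, not something from the slides.

```python
from statistics import NormalDist


def reliability_coefficient(confidence):
    """Two-sided standard normal multiplier K for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


standard_error = 1.0  # hypothetical value, for illustration only
for level in (0.90, 0.95, 0.99):
    k = reliability_coefficient(level)
    width = 2 * k * standard_error  # full width of Estimate ± K × SE
    print(f"{level:.0%} CI: K = {k:.3f}, width = {width:.3f}")
```

With the same standard error, the multiplier grows from about 1.645 to 1.960 to 2.576, so the interval widens as the confidence level rises.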
23. But to obtain a higher confidence from the same sample, we must
be willing to accept a larger margin of error (a wider interval).
For a given confidence level (i.e. 90%, 95%, 99%) the width of
the confidence interval depends on the Standard Error of the
estimate which in turn depends on the:
Sample size: The larger the sample size, the narrower the
confidence interval and the more precise our estimate.
Lack of precision means that in repeated sampling the values of the
sample statistic are spread out or scattered,
so the result of sampling is not repeatable.
24. You can make the precision as high as you want by taking a
large enough sample.
The margin of error decreases as √n increases.
Standard deviation: The more the variation among the
individual values, the wider the confidence interval and the
less precise the estimate.
As sample size increases, the standard error (SD/√n) decreases; the SD itself does not shrink with sample size.
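A small Python sketch shows how the margin of error shrinks with √n. The standard deviation of 20 and the sample sizes are assumed values for illustration, and K = 1.96 corresponds to 95% confidence.

```python
import math


def margin_of_error(sd, n, k=1.96):
    """Margin of error K × SD / sqrt(n); k = 1.96 is the 95% Z multiplier."""
    return k * sd / math.sqrt(n)


# Quadrupling the sample size halves the margin of error.
for n in (25, 100, 400):
    print(n, round(margin_of_error(sd=20, n=n), 2))
```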
26. Confidence Intervals for
• A single population mean
• A single population proportion
• Difference between two population means
• Difference between two population proportions
27. 1) C.I. for a population mean (normally distributed)
a) Known variance (large sample size)
• A 100(1−α)% C.I. for μ is $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
• α is to be chosen by the researcher, most common values of α are
0.05, 0.01, 0.001 and 0.1.
28. Example
A physical therapist wished to estimate, with 99% confidence,
the mean maximal strength of a particular muscle in a certain
group of individuals.
He assumes that strength scores are approximately normally
distributed with a variance of 144.
A sample of 15 subjects who participated in the experiment
yielded a mean of 84.3.
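29. Solution: σ = √144 = 12, n = 15 and $Z_{\alpha/2} = Z_{0.005} = 2.58$
84.3 ± 2.58 × (12/√15) = 84.3 ± 8.0 = (76.3, 92.3)
⇒ We are 99% confident that the population mean is between
76.3 and 92.3.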
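The known-variance interval can be checked with a minimal Python sketch. The function name `z_interval` is illustrative, and the Z multiplier is computed from the standard normal distribution rather than read from a table, so it is 2.576 rather than a rounded table value.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Muscle strength example: variance 144, n = 15, mean 84.3, 99% confidence
low, high = z_interval(84.3, math.sqrt(144), 15, confidence=0.99)
print(round(low, 1), round(high, 1))  # 76.3 92.3
```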
30. E.g. 2. A random sample of 100 cancer patients treated with a
new drug has a mean survival time of 46.9 months.
If the SD of the population is 43.3 months, find a 95%
confidence interval for the population mean.
Solution: 46.9 ± (1.96) × (43.3/√100) = 46.9 ± 8.5 = (38.4 to 55.4)
months
Hence, there is 95% certainty that the limits (38.4, 55.4) contain
the mean survival times in the population from which the sample
arose.
31. b) Unknown variance (small sample size, n < 30)
A 100(1−α)% C.I. for μ is $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$
The t distribution density curve is bell shaped and symmetrical
about zero.
There are different curves for different df (i.e. sample sizes), and for very
large df the curve is very close to Z.
32. The Z-test is applied when:
The distribution is normal and
the population standard deviation σ is known, or
when the sample size n is large (n ≥ 30)
with unknown σ (taking S as the estimator of σ).
33. But, what happens when n< 30 and σ is unknown?
We will use a t-distribution which depends on the number of
degrees of freedom (df).
The t-distribution is a theoretical probability distribution (i.e.
its total area is 100 percent) and is defined by a mathematical
function.
The distribution is symmetrical, bell-shaped and similar to the
normal but more spread out.
34. As the df decrease, the t-distribution becomes increasingly
spread out compared with the normal.
The sample standard deviation is used as an estimate of σ (the
standard deviation of the population which is unknown) and
appears to be a logical substitute.
For large sample sizes (n ≥ 30), the t and Z curves are so close
together that it does not much matter which you use.
35. Degrees of Freedom
It is defined as the number of values which are free to vary
after imposing a certain restriction on your data.
Example: If 3 scores have a mean of 10, how many of the scores
can be freely chosen?
Solution: The first and the second scores could be chosen freely
(e.g., 8 and 12, 9 and 5, 7 and 15, etc.)
But the third score is then fixed (10, 16 and 8 respectively), because the three scores must sum to 30.
Hence, there are two degrees of freedom (df = n − 1 = 3 − 1).
36. Table of t-distributions
The table of t-distribution shows values of t for
selected areas under the t curve.
Different values of df appear in the first column.
The table is adapted for efficient use for either one or
two-tailed tests.
37. E.g. 1) If df = 8, 5% of t scores are above what value?
Solution:
Look along the row labeled “one tail” to the value 0.05;
The intersection of the 0.05 column and the row with 8 in the df
column gives the value of t = 1.86.
E.g. 2) Find t if n = 13 and 95% of t scores are between −t and +t.
Solution: df = 13 − 1 = 12. If 95% of t scores are between −t and +t,
then 5% are in the two tails.
38. Look at the table along the row labeled “two tail” to
the value 0.05;
The intersection of this 0.05 column and the row
with 12 in the df column gives t = 2.179.
E.g 3. If df =5, what is the probability that a t score is
above 2.02 or below -2.02?
Solution:
Two tails are implied. Look along the “df =5” row to
find the entry 2.02.
The probability is 0.10
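The three table look-ups can be reproduced numerically. The sketch below is not how a printed t table is produced; it assumes a simple approach (Simpson's rule for the area under the t density, then bisection for the critical value) and uses only the Python standard library. The function names are illustrative, and the sketch is intended for small to moderate df and upper-tail areas below 0.5.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] combined with symmetry about zero."""
    if x == 0:
        return 0.5
    h = abs(x) / steps
    total = t_pdf(0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(upper_tail_area, df):
    """Value t with P(T > t) = upper_tail_area, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail_area:
            lo = mid  # tail still too large, so the critical value is further out
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 8), 2))        # E.g. 1: about 1.86
print(round(t_critical(0.025, 12), 3))      # E.g. 2: about 2.179
print(round(2 * (1 - t_cdf(2.02, 5)), 2))   # E.g. 3: about 0.1
```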
41. E.g. the mean pulse rate and standard deviation of a random sample
of 9 first year male medical students were 68.7 and 8.67 beats
per minute respectively. (Assume normal distribution).
Find a 95% C.I. for the population mean.
Solution: $t_{\alpha/2,\,n-1} = t_{0.025,\,8} = 2.31$ (from the t table with α = 0.05 and n − 1 = 8 df)
and S/√n = 8.67/√9 = 2.89
Therefore, 95% C.I. for μ = 68.7 ± (2.31 × 2.89) = 68.7 ± 6.7
= (62.0 to 75.4) beats per minute.
We are 95% sure that the population mean (μ) lies between 62.0
and 75.4 beats per minute.
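The same arithmetic can be written as a short Python sketch. The function name `t_interval` is illustrative, and the critical value is passed in as read from the t table (2.31 for 8 df) rather than computed.

```python
import math


def t_interval(sample_mean, sample_sd, n, t_crit):
    """Confidence interval for a mean with unknown sigma.

    t_crit is the tabulated t value for the chosen confidence level and n - 1 df.
    """
    standard_error = sample_sd / math.sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin


# Pulse rate example: n = 9, mean 68.7, SD 8.67, t(0.025, 8) = 2.31
low, high = t_interval(68.7, 8.67, 9, t_crit=2.31)
print(round(low, 1), round(high, 1))  # 62.0 75.4
```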
57. 3) C.I. for a population proportion (large sample size)
A 100(1−α)% C.I. for π is $p \pm Z_{\alpha/2}\sqrt{p(1-p)/n}$
58. p = 123/300 = 0.41 is a point estimate of π.
α = 0.05 ⇒ $Z_{0.025} = 1.96$
Example 2: An epidemiologist is worried about the ever increasing
trend of malaria in a certain locality and wants to estimate the
proportion of persons infected in the peak malaria transmission
period.
• If he takes a random sample of 150 persons in that locality
during the peak transmission period and finds that 60 of them are
positive for malaria,
59. Find: a) 95% b) 90% c) 99% confidence intervals
for the proportion of all people in that locality who are
infected during the peak malaria transmission period.
Solution:
• Sample proportion p = 60/150 = 0.4
• Standard error = √(p(1 − p)/n) = √(0.4 × 0.6/150) = 0.04
a) A 95% C.I. for the population proportion (the
proportion of all people in that locality who are
infected) = 0.4 ± 1.96 (0.04) = 0.4 ± 0.078 =
(0.322, 0.478).
b) A 90% C.I. = 0.4 ± 1.645 (0.04) = 0.4 ± 0.066 = (0.334, 0.466).
c) A 99% C.I. = 0.4 ± 2.58 (0.04) = 0.4 ± 0.103 = (0.297, 0.503).
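The three intervals for the malaria example can be reproduced with a short Python sketch of the large-sample (normal approximation) interval. The function name `proportion_interval` is illustrative, and the Z multipliers are computed rather than taken from a table, so the last digit may differ slightly from hand calculation.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Malaria example: 60 positives out of 150 sampled
for level in (0.90, 0.95, 0.99):
    low, high = proportion_interval(60, 150, level)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```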
63. Sample size determination
How many subjects should be sampled from the larger
population to have a representative sample?
It depends on:
the objective of the study;
the design of the study;
how different or dispersed the population is;
the accuracy of the measurements to be made;
the degree of precision required for generalization;
the degree of confidence with which to conclude;
the availability of resources.
64. Sample size determination
• Given the confidence interval (mean or proportion) $\pm\, z_{\alpha/2} \times \text{s.e.}$
• Hence the absolute precision, denoted by d, is given as $d = z_{\alpha/2} \times \text{s.e.}$
• Where s.e. is the standard error of the estimator of the
parameter of interest.
66. Sample size for single population proportion
If the study is to be conducted on a single population, then we
need to know the following:
1. What is the probability of the event occurring?
2. How much error is tolerable? Or how much precision do
we need?
3. How confident do we need to be that the true population
value falls within the confidence interval?
68. $n = Z_{\alpha/2}^{2}\, p(1-p)/d^{2}$
Where:
n is the minimum sample size
p is the estimate of the prevalence rate for the
population
d is the margin of sampling error tolerated
$Z_{\alpha/2}$ is the standard normal value at the (1−α)100%
confidence level; α is mostly 5%
70. Example
1. A hospital administrator wishes to know what proportion of
discharged patients is unhappy with the care received
during hospitalization. If a 95% confidence interval is desired
to estimate the proportion within 5%, how large a sample
should be drawn?
With no prior estimate of p, take p = 0.5, which gives the largest sample size:
n = Z²p(1−p)/d² = (1.96)² (0.5 × 0.5)/(0.05)² = 384.2
≈ 385 patients
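The sample size calculation can be sketched in Python as follows. The function name `sample_size_proportion` is illustrative; the result is rounded up because a fraction of a subject cannot be sampled.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """Minimum n for estimating a proportion p to within ±d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Hospital example: p = 0.5 (no prior estimate), d = 0.05, 95% confidence
print(sample_size_proportion(p=0.5, d=0.05))  # 385
```

Taking p = 0.5 maximizes p(1 − p), so it is the conservative choice when no prior estimate of the proportion is available.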
Editor's Notes
Population parameter: A numerical characteristic of the underlying (unknown) distribution of the variable of interest in a population.
Sample statistic: An estimate of a population parameter obtained from a sample.
Usually, we only have a sample and don’t know the entire population.
Eg: Point estimate of 0.94 for population proportion
It is not reasonable to assume that the population proportion is exactly 0.94
The probability of getting a sample statistic value that is exactly equal to the corresponding population parameter is usually quite small.
It may be reasonable to assume that 0.94 is close to the population proportion
We use a point estimate to obtain an interval estimate
Example
• Data on the systolic blood pressure of 199 patients
give a mean value of 125.8 mmHg. Let us assume
that the standard deviation for this patient
population is known to be 20 mmHg. Construct a 95
percent confidence interval for the population mean.
Solution
α = 0.05 ⇒ $Z_{\alpha/2} = 1.96$
$125.8 \pm 1.96 \times 20/\sqrt{199}$ = 125.8 ± 2.8
• The 95% CI is (123.0, 128.6) mmHg
• We are 95% sure that the average systolic
blood pressure for similar patients is between
123.0 and 128.6 mmHg.
Example
• In a study of preeclampsia, Kaminski and Rechberger
found the mean systolic blood pressure of 10 healthy,
nonpregnant women to be 119 with a standard
deviation of 2.1.
A. What is the estimated standard error of the mean?
B. Construct the 99% confidence interval for the mean
of the population from which the 10 subjects may
be presumed to be a random sample.
C. What is the precision of the estimate?
D. What assumptions are necessary for the validity of
the confidence interval you constructed?
Answers
A. Estimated standard error = 2.1/√10 = 0.664 ≈ 0.66
B. The 99% CI is 119 ± 3.250 × 0.664 = (116.8, 121.2)
C. Precision = 3.250 × 0.664 = 2.16
D. The population is normally distributed, and
the 10 subjects represent a random sample
from this population
• Heterogeneity: need larger sample to study more
diverse population
• Desired precision: need larger sample to get smaller
error
• Sampling design: smaller if stratified, larger if cluster
• Nature of analysis: complex multivariate statistics
need larger samples
• Accuracy: depends upon sample size, not on the ratio of
sample to population