9
Estimation and Confidence Interval
ESTIMATION
[Q: Write short notes on: Estimation (BSMMU, Radiology,
January 2012)]
One area of concern in inferential statistics is the estimation
of the population parameter from the sample statistic. It is
important to realize the order here. The sample statistic is
calculated from the sample data and the population
parameter is inferred (or estimated) from this sample statistic.
Let me say that again: Statistics are calculated, parameters are
estimated.
Definition: Estimation refers to the procedures used to
estimate the value of some property of a population from
observations of a sample drawn from that population.
A related area of inferential statistics is sample size
determination, that is, how large a sample should be
taken to make an accurate estimation.
A good estimator must satisfy three conditions:
1. Unbiased: The expected value of the estimator must
be equal to the parameter being estimated
2. Consistent: The value of the estimator approaches the
value of the parameter as the sample size increases
3. Relatively Efficient: The estimator has the smallest
variance of all unbiased estimators which could be used
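The first two conditions can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with a mean of 90 and a standard deviation of 36 (hypothetical values chosen for the demonstration), the estimator is the sample mean, and the function name `simulate_sample_means` is made up for this example.

```python
import random
import statistics

def simulate_sample_means(n, repeats=2000, mu=90.0, sigma=36.0, seed=0):
    """Draw `repeats` samples of size n from an assumed normal population
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repeats)
    ]

for n in (9, 36, 144):
    means = simulate_sample_means(n)
    # Average of the sample means (should sit near mu: unbiasedness) and
    # their spread (should shrink as n grows: consistency).
    print(n, round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

The average of the simulated sample means should stay close to the assumed population mean at every sample size, while their spread should shrink as the sample size increases.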
Types
When a parameter is being estimated, the estimate can be
either a single number or it can be a range of scores. When
the estimate is a single number, the estimate is called a "point
estimate"; when the estimate is a range of scores, the
estimate is called an interval estimate. Confidence intervals
are used for interval estimates.
So, there are two types of estimates: point estimates and
interval estimates.
1. Point estimate: A point estimate of a population
parameter is a single value of a statistic.
o For example, the sample mean x̄ is a point estimate of
the population mean μ. Similarly, the sample
proportion p is a point estimate of the population
proportion P.
o As an example of a point estimate: assume you
wanted to estimate the mean time it takes 12-year-
olds to run 100 yards. The mean running time of a
random sample of 12-year-olds would be an estimate
of the mean running time for all 12-year-olds. Thus,
the sample mean, M, would be a point estimate of
the population mean, μ.
o Often point estimates are used as parts of other
statistical calculations. For example, a point estimate
of the standard deviation is used in the calculation of
a confidence interval for μ. Point estimates of
parameters are often used in the formulas for
significance testing.
o Point estimates are not usually as informative as
confidence intervals. Their importance lies in the fact
that many statistical formulas are based on them.
2. Interval estimate: An interval estimate is defined by two
numbers, between which a population parameter is said
to lie.
o For example, a < μ < b is an interval estimate of the
population mean μ. It indicates that the population
mean is greater than a but less than b.
CONFIDENCE INTERVALS
[Q:
Write short note on: Confidence interval. (BSMMU,
MD, July, 2009)
Write short note on: 95% confidence interval.
(BSMMU, MD Radiology, July, 2010)]
Statisticians use a confidence interval to express the precision
and uncertainty associated with a particular sampling
method. Confidence intervals are preferred to point estimates,
because confidence intervals indicate (a) the precision of the
estimate and (b) the uncertainty of the estimate.
Definition
A confidence interval is a range of values, calculated from
sample data, produced by a method that has a specified
probability (the confidence level) of containing the
parameter being estimated.
Elements
A confidence interval is based on three elements:
(a) A statistic (the mean, the correlation, etc.);
(b) The confidence level (e.g., 95% or 99%); and
(c) The standard error (SE) of the statistic, from which the
margin of error is calculated
For example, suppose the local newspaper conducts an
election survey and reports that the independent candidate
will receive 30% of the vote. The newspaper states that the
survey had a 5% margin of error and a confidence level of
95%. These findings result in the following confidence
interval: We are 95% confident that the independent
candidate will receive between 25% and 35% of the vote.
Calculating a confidence interval
[Q: Estimate 95% confidence interval (CI) from the
following data set of hemoglobin levels (g/dl): 10.9, 13.8,
18.0, 15.1, 13.5, 14.2 and 13.4. (BSMMU, Radiology,
January, 2012)]
To construct a confidence interval, a lower bound and an
upper bound are calculated.
Assume that the weights of 10-year-old children are normally
distributed with a mean of 90 and a standard deviation of 36.
What is the sampling distribution of the mean for a sample
size of 9?
Here, the standard error of the mean (SE) is
$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{N}} = \frac{36}{\sqrt{9}} = \frac{36}{3} = 12$$
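The sampling distribution of the mean is therefore normal with
a mean of 90 and a standard error of 12. Its middle 95% is
Mean ± 1.96 × SE
90 - (1.96)(12) = 66.48
90 + (1.96)(12) = 113.52
That is, 95% of sample means fall between 66.48 (lower
limit) and 113.52 (upper limit). A 95% confidence interval
uses the same ± 1.96 × SE width, centered on the observed
sample mean.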
Figure: The sampling distribution of the mean for N=9. The
middle 95% of the distribution is shaded.
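The mean ± 1.96 × SE calculation can be sketched in a few lines of Python. This is a minimal sketch, not a definitive method: the function names `interval_from_summary` and `interval_from_data` are hypothetical, and the multiplier 1.96 is the normal value used in these notes. For a sample as small as the seven hemoglobin values in the question above, a t critical value (2.447 for 6 degrees of freedom) would give a wider interval.

```python
import math
import statistics

Z_95 = 1.96  # normal multiplier for 95%, as used in these notes

def interval_from_summary(mean, sd, n, z=Z_95):
    """Return (lower, upper) for mean ± z × SD/√n."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def interval_from_data(values, z=Z_95):
    """Same interval, with the mean and sample SD computed from raw data."""
    return interval_from_summary(
        statistics.fmean(values), statistics.stdev(values), len(values), z
    )

# Worked example from the text: mean 90, SD 36, N = 9
lower, upper = interval_from_summary(90, 36, 9)
print(round(lower, 2), round(upper, 2))  # 66.48 113.52

# Hemoglobin levels (g/dl) from the exam question
hb = [10.9, 13.8, 18.0, 15.1, 13.5, 14.2, 13.4]
hb_lower, hb_upper = interval_from_data(hb)
print(round(hb_lower, 2), round(hb_upper, 2))
```

`statistics.stdev` uses the n - 1 denominator, which is the usual choice when the standard deviation is estimated from the sample itself.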
In this set of notes, confidence intervals are first theoretically
defined in terms of the mean and standard deviation of a
large number of samples of equal size from a known
population.
[Q: A hospital administrator plans to draw a sample of
hospital records of the past three years in order to
estimate the mean age of adult male admitted patients.
He wants to be 95% confident and he will be satisfied to
let d = 1.1 years, and from a previous study he found the
population standard deviation to be 8.5 years. How many
hospital records should he draw? (BSMMU, MD Radiology,
January, 2009)]
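One common way to approach this question is to choose the smallest n for which the margin of error 1.96 × SD/√n does not exceed d, which gives n = (1.96 × SD / d)². The notes do not state this formula, so it is an assumption here, and the function name `sample_size_for_mean` is hypothetical. A minimal sketch:

```python
import math

def sample_size_for_mean(sigma, d, z=1.96):
    """Smallest n such that z × sigma / √n <= d, rounded up to a whole record."""
    return math.ceil((z * sigma / d) ** 2)

# Values from the question: population SD = 8.5 years, d = 1.1 years, 95% confidence
n_records = sample_size_for_mean(sigma=8.5, d=1.1)
print(n_records)  # 230
```

The unrounded value is about 229.4, and it is rounded up because rounding down would leave the margin of error slightly larger than d.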
Standard error (SE)
[Q: Write short notes on: Standard error of proportion (SE)
(BSMMU, MD Radiology, July, 2010)]
The standard deviation of a sampling distribution is known as
the standard error. It is given by the following formulas.
In the one-sample situation:
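If it is a mean, $\mathrm{SE} = \dfrac{\mathrm{SD}}{\sqrt{n}}$
If it is a proportion, $\mathrm{SE} = \sqrt{\dfrac{pq}{n}}$
For the two-sample situation:
If it is a mean, $\mathrm{SE} = \sqrt{\dfrac{\mathrm{SD}_1^2}{n_1} + \dfrac{\mathrm{SD}_2^2}{n_2}}$
If it is a proportion, $\mathrm{SE} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$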
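The two-sample formulas, SE = √(SD1²/n1 + SD2²/n2) for means and SE = √(p1q1/n1 + p2q2/n2) for proportions, can be sketched as follows. The function names and all input values below are hypothetical and chosen only to show the arithmetic; they do not come from any study in these notes.

```python
import math

def se_two_means(sd1, n1, sd2, n2):
    """SE of the difference between two independent sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def se_two_proportions(p1, n1, p2, n2):
    """SE of the difference between two independent sample proportions.
    p1 and p2 are percentages, so q = 100 - p."""
    return math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)

# Hypothetical inputs, for illustration only
se_means = se_two_means(sd1=8, n1=64, sd2=6, n2=36)
se_props = se_two_proportions(p1=20, n1=100, p2=30, n2=100)
print(round(se_means, 2), round(se_props, 2))  # 1.41 6.08
```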
In a sample of 100, the proportion of HTN is found to be
18%. What will be the proportion of HTN in the population
from which the sample was taken?
Here,
n = 100
p = 18%
q = (100 - 18)% = 82%
At the 95% confidence level, CI = p ± 1.96 × SE
Here $\mathrm{SE} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{18 \times 82}{100}} = 3.84$
So, CI
= 18 ± 1.96 × 3.84
= 18 ± 7.53
= 10.47% to 25.53%
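The HTN example can be checked with a short script. This is a minimal sketch under the same assumptions as the worked example (p and q in percent, normal multiplier 1.96); the function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p_percent, n, z=1.96):
    """Return (SE, lower, upper) for a sample proportion given in percent."""
    q_percent = 100 - p_percent
    se = math.sqrt(p_percent * q_percent / n)
    return se, p_percent - z * se, p_percent + z * se

# HTN example: p = 18%, n = 100
se, lower, upper = proportion_ci(18, 100)
print(round(se, 2), round(lower, 2), round(upper, 2))  # 3.84 10.47 25.53
```

Differences of about 0.01 from a hand calculation can arise when intermediate values are rounded or truncated before the final step.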
[Q:
A cross-sectional study observed the prevalence of
tonsillitis was 10% in a sample of 500 school children.
Determine standard error of proportion (SEP)?
(BSMMU, MD Radiology, July, 2009)
A cross-sectional study among 1500 school children
reported that 4% had Chronic otitis media. What is
the standard error of proportion (SEP)? (BSMMU, MD
Radiology, July, 2010)]
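Both questions can be approached with the one-sample formula SE = √(pq/n). The sketch below works with proportions as fractions rather than percentages, which is a presentation choice made here, and the function name `se_proportion` is hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion p (as a fraction) for sample size n."""
    return math.sqrt(p * (1 - p) / n)

sep_tonsillitis = se_proportion(0.10, 500)  # 10% prevalence among 500 children
sep_otitis = se_proportion(0.04, 1500)      # 4% prevalence among 1500 children

# Multiply by 100 to report the standard errors in percentage points
print(round(sep_tonsillitis * 100, 2), round(sep_otitis * 100, 2))  # 1.34 0.51
```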