estimation.pptx

Statistical Estimation
October 6, 2023 1

Objectives
At the end of this session the student will be able to:
 Describe Statistical inference and estimation
 Differentiate between point and interval estimation
 Compute appropriate confidence intervals for population
means and proportions and interpret the findings
 Describe methods of sample size calculation

Outline
 Definition of Statistical inference and estimation
 Point estimation
 Interval estimation
 Sample size calculation

 Descriptive statistics help investigators describe and
summarize data.
 Probability and sampling distribution concepts are needed to
evaluate data using statistical methods.
 Without probability and sampling distribution theory:
 we could not make statements about populations without
studying everyone in the population
 studying everyone in the population is an undesirable and often
impossible task

Statistical inference
 Statistical inference is the procedure by which we reach a conclusion about a
population on the basis of the information contained in a
sample that has been drawn from that population.
 The two primary methods for making inference are
estimation and hypothesis testing.

Statistical Estimation
Estimation is the process of determining a likely value for a
population parameter based on information collected
from the sample.
 It is the use of sample statistics to estimate population
parameters.
 Researchers are usually interested in estimates
of many quantities, such as totals, averages and proportions.
E.g. an estimate of the proportion of smokers among all people
aged 15 to 24 in the population

Types of Estimation
1. Point Estimation
2. Interval Estimation

1. Point Estimation
 A single numerical value is used to estimate the
corresponding population parameter.
 Point estimate is always within the interval estimate
 x̄ (the sample mean) is an estimator of the population mean μ.
 s is an estimator of the population standard deviation σ
 p is an estimator of the population proportion π.
 r is an estimator of the population correlation coefficient ρ

From a single sample we can calculate a sample statistic to estimate
a single parameter (a point estimate).
Point estimate for population mean µ is
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Point estimate for population proportion is given by
$p = \frac{x}{n}$
where x is the total number of successes (events) and n is the sample size.
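The two point estimates can be computed directly from sample data. The sketch below is illustrative only: the function names and the list of pulse rates are hypothetical, while the counts 60 and 150 are borrowed from the malaria example later in these slides.

```python
def point_estimate_mean(values):
    """Sample mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


def point_estimate_proportion(successes, n):
    """Sample proportion: the number of successes x divided by n."""
    return successes / n


# Hypothetical pulse rates (beats per minute), made up for illustration.
pulse = [62, 70, 68, 75, 66]
x_bar = point_estimate_mean(pulse)

# 60 positives out of 150 sampled persons (malaria example in these slides).
p_hat = point_estimate_proportion(60, 150)

print(x_bar, p_hat)
```

Each call returns a single number, which is exactly what makes it a point estimate: it carries no information about sampling variability.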


Properties of a Good Estimator
a. Unbiasedness
 A sample statistic whose mean is equal to the population
parameter it estimates is unbiased.
 For a symmetrical distribution, the sample mean and median are both unbiased estimators of
the population mean μ.
b. Minimum variance
 An estimator which has a minimum standard error is a good
estimator.
 For a symmetrical distribution such as the normal, the mean has the smaller
standard error.
 If the distribution is skewed, the median often has the smaller
standard error.

c. Consistency
 As sample size increases, variation of the estimator from
the true population value decreases
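Consistency can be seen in a small simulation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10, draws many samples of each size, and reports how widely the sample means scatter. The function name, seed and number of repeats are arbitrary choices.

```python
import random
import statistics


def spread_of_sample_means(n, repeats=500, seed=1):
    """Standard deviation of `repeats` sample means, each based on n draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [
        statistics.fmean(rng.gauss(50, 10) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)


# The scatter of the sample mean around the true mean shrinks as n grows.
for n in (10, 100, 1000):
    print(n, round(spread_of_sample_means(n), 2))
```

The printed spread should fall by a factor of roughly √10 each time n is multiplied by 10, which matches the standard error σ/√n.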

2. Interval estimation
Interval estimation is a statement that a population parameter
has a value lying between two specified limits.
 An interval estimate provides more information about a
population characteristic than a point estimate.
 The value of the sample statistic will vary from sample to
sample, so a single-value estimate of the parameter on its own
is generally not sufficient.

 We need to take into account the sample to sample variation of
the statistic.
 A confidence interval defines an interval within which the
true population parameter is likely to fall (interval estimate)

Interval estimate (Confidence interval) - consists of two
numbers, a lower limit and an upper limit which serve as
the bounding values within which the parameter is expected
to lie with a certain degree of confidence

 Confidence interval therefore takes into account the sample
to sample variation of the statistic and gives the measure of
precision.
 The general formula used to calculate a confidence interval
is Estimate ± k × Standard Error, where k is called the reliability
coefficient.
 Confidence intervals express the inherent uncertainty in any
medical study by expressing upper and lower bounds for
anticipated true underlying population parameter.
 Most commonly the 95% confidence intervals are
calculated, however 90% and 99% confidence intervals are
sometimes used.

A (1-α)100% confidence interval for an unknown population mean
and population proportion is given as follows:
$\left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$ for estimating a mean (if σ is unknown, it can be estimated by s)
$\left[p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}},\ p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right]$ for estimating a proportion
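Both intervals share the form Estimate ± k × Standard Error, with k = z(α/2). The sketch below is one possible way to obtain k from the standard normal distribution using Python's standard library. The function names are hypothetical and not part of the slides.

```python
from statistics import NormalDist


def reliability_coefficient(confidence):
    """z_{alpha/2}: the standard normal value leaving alpha/2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(estimate, standard_error, confidence=0.95):
    """Estimate +/- z_{alpha/2} * standard error, returned as (lower, upper)."""
    margin = reliability_coefficient(confidence) * standard_error
    return estimate - margin, estimate + margin


# Reliability coefficients for the three commonly used confidence levels.
for level in (0.90, 0.95, 0.99):
    print(level, round(reliability_coefficient(level), 3))
```

For a mean the standard error passed in would be σ/√n, and for a proportion √(p(1-p)/n). The coefficient grows with the confidence level, so the interval widens.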

 The 95% confidence interval is interpreted in such a way that,
under the conditions assumed for underlying distribution, you are
95% confident that the interval contains the true parameter.
 The 90% CI is narrower than the 95% CI, since we are only 90% certain
that the interval includes the population parameter.
 The 99% CI is wider than the 95% CI; the extra width means that
we can be more certain that the interval will contain the
population parameter.

 But to obtain a higher confidence from the same sample, we must
be willing to accept a larger margin of error (a wider interval).
 For a given confidence level (i.e. 90%, 95%, 99%) the width of
the confidence interval depends on the Standard Error of the
estimate which in turn depends on the:
Sample size: The larger the sample size, the narrower the
confidence interval and the more precise our estimate.
 Lack of precision means that in repeated sampling the values of the
sample statistic are spread out or scattered.
 In that case the result of sampling is not repeatable.

 You can make the precision as high as you want by taking a
large enough sample.
 The margin of error decreases as √n increases.
Standard deviation: The more the variation among the
individual values, the wider the confidence interval and the
less precise the estimate.
 As sample size increases, the standard error (SE) decreases; the standard deviation of the individual values does not.
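The effect of sample size on precision can be checked numerically. The sketch below uses the z-based margin of error for a mean, z × σ/√n, with z = 1.96 and the population SD of 43.3 taken from the survival-time example in these slides. The sample sizes are arbitrary illustrations.

```python
import math


def margin_of_error(sigma, n, z=1.96):
    """Half-width of a z-based interval for a mean: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)


sigma = 43.3  # population SD from the survival-time example
for n in (25, 100, 400):
    print(n, round(margin_of_error(sigma, n), 2))
```

Because the margin depends on 1/√n, quadrupling the sample size only halves the margin of error.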

Confidence Intervals for
• A single population mean
• A single population proportion
• Difference between two population means
• Difference between two population proportions

1) C.I. for a population mean (normally distributed)
a) Known variance (large sample size)
• A 100(1‐α)% C.I. for μ is $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
• α is chosen by the researcher; the most common values of α are
0.05, 0.01, 0.001 and 0.1.

Example
 A physical therapist wished to estimate, with 99% confidence,
the mean maximal strength of a particular muscle in a certain
group of individuals.
 He assumes that strength scores are approximately normally
distributed with a variance of 144.
 A sample of 15 subjects who participated in the experiment
yielded a mean of 84.3.

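Solution:
σ = √144 = 12, z(0.005) = 2.58, and the standard error is 12/√15 = 3.10.
99% C.I. for μ = 84.3 ± (2.58 × 3.10) = 84.3 ± 8.0 = (76.3 to 92.3)
⇒ We are 99% confident that the population mean is between
76.3 and 92.3.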

 E.g. 2. A random sample of 100 cancer patients treated with a
new drug has a mean survival time of 46.9 months.
 If the SD of the population is 43.3 months, find a 95%
confidence interval for the population mean.
Solution: 46.9 ± 1.96 × (43.3/√100) = 46.9 ± 8.5 = (38.4 to 55.4)
months
 Hence, we are 95% confident that the limits (38.4, 55.4) contain
the mean survival time in the population from which the sample
arose.
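The two known-variance examples above can be re-computed with a few lines of code. This is a minimal sketch: the function name is hypothetical, and it uses exact normal quantiles rather than the rounded table values 2.58 and 1.96, so intermediate digits may differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def z_interval_for_mean(x_bar, sigma, n, confidence):
    """x_bar +/- z_{alpha/2} * sigma / sqrt(n), for a known population SD."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Muscle strength: variance 144 (so sigma = 12), n = 15, mean 84.3, 99% level.
strength = z_interval_for_mean(84.3, math.sqrt(144), 15, 0.99)

# Survival time: sigma = 43.3, n = 100, mean 46.9, 95% level.
survival = z_interval_for_mean(46.9, 43.3, 100, 0.95)

print([round(v, 1) for v in strength])
print([round(v, 1) for v in survival])
```

Rounded to one decimal place, both results agree with the intervals quoted in the examples.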

b) Unknown variance (small sample size, n < 30)
A 100(1‐α)% C.I. for μ is $\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$
 The t distribution density curve is bell shaped and symmetrical
about zero.
 There are different curves for different df (i.e. sample sizes), and for very
large df the curve is very close to Z.

The Z-test is applied when:
 The distribution is normal and the population standard deviation σ is known, or
 The sample size n is large (n ≥ 30) and σ is unknown
(taking S as an estimator of σ).

But what happens when n < 30 and σ is unknown?
 We will use a t-distribution which depends on the number of
degrees of freedom (df).
 The t-distribution is a theoretical probability distribution (i.e.
its total area is 100 percent) and is defined by a mathematical
function.
 The distribution is symmetrical, bell-shaped and similar to the
normal but more spread out.

 As the df decrease, the t-distribution becomes increasingly
spread out compared with the normal.
 The sample standard deviation is used as an estimate of σ (the
standard deviation of the population which is unknown) and
appears to be a logical substitute.
 For large sample sizes (n ≥ 30), the t and Z curves are so close
together that it matters little which you use.

Degrees of Freedom
 It is defined as the number of values which are free to vary
after imposing a certain restriction on your data.
Example: If 3 scores have a mean of 10, how many of the scores
can be freely chosen?
Solution: The first and the second scores could be chosen freely
(e.g., 8 and 12, 9 and 5, 7 and 15, etc.)
But the third score is then fixed (10, 16 and 8 respectively), because the three scores must sum to 30.
 Hence, there are two degrees of freedom
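The constraint in this example can be written out explicitly: once the mean and n - 1 of the scores are fixed, the last score has no freedom left. A minimal sketch, with a hypothetical function name:

```python
def last_score(free_scores, mean, n):
    """The one score that is forced once n - 1 scores and the mean are fixed."""
    return mean * n - sum(free_scores)


# The freely chosen pairs from the example, each with mean 10 and n = 3.
for pair in [(8, 12), (9, 5), (7, 15)]:
    print(pair, "->", last_score(pair, mean=10, n=3))
```

Whatever the first two scores are, the function returns exactly one admissible third score, which is why only two values are free to vary.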

Table of t-distributions
The table of t-distribution shows values of t for
selected areas under the t curve.
Different values of df appear in the first column.
The table is adapted for efficient use for either one or
two-tailed tests.

E.g. 1) If df = 8, 5% of t scores are above what value?
Solution:
 Look along the row labeled “one tail” to the value 0.05;
 The intersection of the 0.05 column and the row with 8 in the df
column gives the value of t = 1.86.
E.g. 2) Find t if n = 13 and 95% of t scores are between –t and +t.
Solution: df = 13 - 1 = 12. If 95% of t scores are between –t and +t,
then 5% are in the two tails.

 Look at the table along the row labeled “two tail” to
the value 0.05;
 The intersection of this 0.05 column and the row
with 12 in the df column gives t = 2.179.
E.g. 3) If df = 5, what is the probability that a t score is
above 2.02 or below -2.02?
Solution:
Two tails are implied. Look along the “df = 5” row to
find the entry 2.02.
 The probability is 0.10
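The table lookups above can be cross-checked numerically. The sketch below does not use a statistics package: it integrates the t density with Simpson's rule and inverts it by bisection. The function names, step count and search range (0 to 50) are assumptions chosen for illustration, and the approach is only meant for moderate df, since `math.gamma` overflows for very large arguments.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value t with P(T <= t) = p, by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_quantile(0.95, 8), 2))       # E.g. 1: one tail of 0.05, df = 8
print(round(t_quantile(0.975, 12), 3))     # E.g. 2: two tails of 0.05, df = 12
print(round(2 * (1 - t_cdf(2.02, 5)), 2))  # E.g. 3: area beyond +/-2.02, df = 5
```

A "one tail" area of 0.05 corresponds to the 0.95 quantile, while a "two tail" area of 0.05 puts 0.025 in each tail and therefore corresponds to the 0.975 quantile.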

E.g. the mean pulse rate and standard deviation of a random sample
of 9 first year male medical students were 68.7 and 8.67 beats
per minute respectively. (Assume normal distribution).
 Find a 95% C.I. for the population mean.
Solution: With α = 0.05 and n - 1 = 8 df, the tabulated value is t(α/2, n-1) = t(0.025, 8) = 2.31,
and the standard error is S/√9 = 8.67/3 = 2.89.
Therefore, 95% C.I. for μ = 68.7 ± (2.31 × 2.89) = 68.7 ± 6.7
= (62.0 to 75.4) beats per minute.
We are 95% confident that the population mean (μ) lies between 62.0
and 75.4 beats per minute.
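The pulse-rate calculation follows the pattern x̄ ± t × s/√n. A minimal sketch is shown below, with a hypothetical function name. The critical value 2.31 is passed in as read from the t table for 8 df, rather than computed.

```python
import math


def t_interval_for_mean(x_bar, s, n, t_crit):
    """x_bar +/- t_crit * s / sqrt(n); t_crit comes from a t table with n - 1 df."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Pulse-rate example: mean 68.7, s = 8.67, n = 9, t(0.025, 8) = 2.31.
low, high = t_interval_for_mean(68.7, 8.67, 9, t_crit=2.31)
print(round(low, 1), round(high, 1))
```

Using the z value 1.96 in place of 2.31 would give a narrower interval, understating the uncertainty that comes from estimating σ with a sample of only 9.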

3) C.I. for a population proportion (large sample size)

 p = 123/300 = 0.41 is a point estimate of π.
 α = 0.05 ⇒ Z0.025 = 1.96
Example 2: An epidemiologist is worried about the ever-increasing
trend of malaria in a certain locality and wants to estimate the
proportion of persons infected in the peak malaria transmission
period.
• He takes a random sample of 150 persons in that locality
during the peak transmission period and finds that 60 of them are
positive for malaria.
Find: a) 95% b) 90% c) 99% confidence intervals
for the proportion of all people in that
locality who are infected during the peak malaria transmission period.
Solution:
• Sample proportion = 60/150 = 0.4
• Standard error = √(0.4 × 0.6/150) = 0.04
a) A 95% C.I. for the population proportion (the
proportion of all people in that
locality who are infected) = 0.4 ± 1.96 (0.04) = 0.4 ± 0.078 =
(0.322, 0.478).
b) A 90% C.I. = 0.4 ± 1.64 (0.04) = 0.4 ± 0.066 = (0.334, 0.466).
c) A 99% C.I. = 0.4 ± 2.57 (0.04) = 0.4 ± 0.103 = (0.297, 0.503).
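The malaria example can be re-computed for all three confidence levels at once. This is a sketch with a hypothetical function name. It uses exact normal quantiles instead of the rounded table values, so the limits may differ from a hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """p +/- z_{alpha/2} * sqrt(p(1-p)/n), the large-sample interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * se, p + z * se


# 60 of 150 sampled persons were positive for malaria.
for level in (0.95, 0.90, 0.99):
    low, high = proportion_interval(60, 150, level)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

The point estimate 0.4 sits at the centre of every interval, and only the width changes with the confidence level.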

Sample size determination
How many subjects should be sampled from the larger
population to have a representative sample?
It depends on:
 objective of the study;
 design of the study;
 how different or dispersed the population is;
 accuracy of the measurements to be made;
 degree of precision required for generalization;
 degree of confidence with which to conclude;
 availability of resources

Sample size determination
• Given the confidence interval (mean or proportion) $\pm\, z_{\alpha/2} \cdot \mathrm{s.e}$
• Hence the absolute precision, denoted by d, is given as $d = z_{\alpha/2} \cdot \mathrm{s.e}$
• Where s.e is the standard error of the estimator of the
parameter of interest.
Estimating a single population mean
• With s.e = σ/√n, solving $d = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n gives $n = \frac{z_{\alpha/2}^2\,\sigma^2}{d^2}$

Sample size for single population proportion
 If the study is to be conducted on a single population, then we
need to answer the following:
1. What is the probability of the event occurring?
2. How much error is tolerable, or how much precision do
we need?
3. How confident do we need to be that the true population
value falls within the confidence interval?

Single population proportion
• Let p denote the proportion of successes, then $n = \frac{z_{\alpha/2}^2\, p(1-p)}{d^2}$

Where:
 n is the minimum sample size
 p is an estimate of the prevalence rate for the
population
 d is the margin of sampling error tolerated
 Zα/2 is the standard normal value at the (1-α)100%
confidence level; α is mostly 5%

Point to be considered

Example
1. A hospital administrator wishes to know what proportion of
discharged patients are unhappy with the care received
during hospitalization. If a 95% confidence interval is desired
to estimate the proportion within 5%, how large a sample
should be drawn?
n = Z²p(1-p)/d² = (1.96)² (0.5 × 0.5)/(0.05)² = 384.2
≈ 385 patients (rounding up, and taking p = 0.5 because no prior estimate of the proportion is given)
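The sample size calculation in this example can be sketched in code as follows. The function name is hypothetical. The result is rounded up because a fractional subject cannot be sampled and rounding down would miss the required precision.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, d, confidence=0.95):
    """Minimum n so that z_{alpha/2} * sqrt(p(1-p)/n) does not exceed d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Hospital example: no prior estimate, so p = 0.5; precision d = 0.05.
print(sample_size_for_proportion(p=0.5, d=0.05))
```

Setting p = 0.5 maximises p(1-p), so it gives the largest (most conservative) sample size for a given d and confidence level.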
