Lecture 3_Inferential statistics-Estimation_071853.pptx

School of Public Health
Statistical Estimation
By: Hagazi G.

Learning objectives
• At the end of this session, students will be able to:
– Define statistical estimation
– Explain the two ways of estimation
– Understand and compute two-sided and one-sided CIs
– Compute CIs for means (single population mean and difference between two population means)
– Compute CIs for proportions (single population proportion and difference between two population proportions)

Estimation
• The procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population is known as statistical inference.
• There are two ways of making statistical inferences:
– Estimation and
– Hypothesis testing

Estimation, Estimator & Estimate
• Estimation is the process of estimating population parameters from sample statistics (i.e., by computing a statistic from sample data).
• The statistic itself is called an estimator and can be of two types: point or interval.
• The value or values that the estimator assumes are called estimates.

Statistical Estimation
• There are two ways to estimate population values from sample values:
– Point estimation
• using a sample statistic to estimate a population parameter based on a single value
• e.g., if a random sample of Tigray births gave x̄ = 3.5 kg, and we use it to estimate µ, the mean birth weight of all Tigray births in the sampled population, we are making a point estimate.
• Point estimation ignores sampling error!
– Interval estimation
• using a sample statistic to estimate a population parameter by making allowance for sampling variation (error)

Point Vs. Interval Estimators
• An estimator that represents a "single best guess" is called a point estimator.
• When the estimate is of the form of a "range of plausible values", it is called an interval estimator.
• Thus,
– A point estimate is of the form: [value],
– Whereas an interval estimate is of the form: [lower limit, upper limit]

Estimation…

1. Point Estimate
• A single numerical value used to estimate the corresponding population parameter.
Sample Statistics are Estimators of Population Parameters
Sample statistic                      Population parameter
Sample mean, x̄                        µ
Sample variance, S²                   σ²
Sample proportion, p                  P or π
Sample odds ratio, OR̂                OR
Sample relative risk, RR̂             RR
Sample correlation coefficient, r     ρ

2. Interval estimation
• Provides an estimate of the population parameter by defining an interval or range within which the population parameter is expected to lie with a given level of confidence.
• A confidence interval is a particular type of interval estimator.

Confidence Intervals (CIs)
• Give a plausible range of values that is likely to include the "true" (population) value with a given confidence level.
• An interval estimate provides more information about a population characteristic than a point estimate does.
• Such interval estimates are called confidence intervals.

CIs…
• CIs also give information about the precision of an estimate:
• How much uncertainty is associated with a point estimate of a population parameter?
• When sampling variability is high, the CI will be wide to reflect the uncertainty of the estimate.
• Wider CIs indicate less certainty (lower precision).

CIs…
• A CI in general:
– Takes into consideration the variation in sample statistics from sample to sample
– Is based on observations from a single sample
– Gives information about closeness to the unknown population parameter
– Is stated in terms of a level of confidence
• We are never 100% sure

General Formula:
Point estimate ± (how confident we want to be) × (standard error)
• Point estimate: the value of the statistic in the sample (e.g., mean, proportion, difference of means/proportions, etc.)
• How confident we want to be: a critical value from a Z table or a t table, depending on the sampling distribution of the statistic.
• Standard error: the standard error of the statistic.

CIs…
Lower limit = Point Estimate − (Critical Value) × (Standard Error)
Upper limit = Point Estimate + (Critical Value) × (Standard Error)
• A wide interval suggests imprecision of estimation.
• A narrow CI reflects a large sample size, low variability, or both.
• Note: how confident we want to be is expressed by the confidence level 1 − α (the confidence coefficient); it determines the critical value (reliability coefficient) used in the formula.
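The limits above can be sketched in Python using only the standard library. This is a minimal illustration that assumes σ is known, so the critical value comes from the standard normal distribution; the helper `z_interval` and the inputs (point estimate 50, σ = 10 or 5, n = 25 or 100) are hypothetical values chosen only to show how the width reacts to sample size and variability.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(point_estimate: float, sigma: float, n: int, confidence: float = 0.95):
    """Hypothetical helper: point estimate ± critical value × standard error (sigma known)."""
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # Z_(1 - alpha/2), about 1.96 for 95%
    standard_error = sigma / sqrt(n)
    margin = critical * standard_error
    return point_estimate - margin, point_estimate + margin


# Same point estimate; only the sample size or the variability changes.
for sigma, n in [(10, 25), (10, 100), (5, 25)]:
    lower, upper = z_interval(50, sigma, n)
    print(f"sigma={sigma:>2}, n={n:>3}: [{lower:.2f}, {upper:.2f}], width {upper - lower:.2f}")
```

Quadrupling n (25 → 100) halves the width, and so does halving σ, which is the "large sample size or low variability" point made above.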

Confidence Level
• Confidence level
– The confidence with which the interval will contain the unknown population parameter
• A percentage (less than 100%)
– Example: 95%
• Also written (1 − α) = 0.95
• Can be two-sided or one-sided

Definition: 95% CI (two-sided CI)
1. Probabilistic interpretation:

2. Practical interpretation:
• When sampling is from a normally distributed population with known standard deviation, we are 100(1 − α)% [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
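The repeated-sampling (probabilistic) reading of a CI can be checked by simulation: if we draw many samples from a normal population with known σ and build x̄ ± Z_{1−α/2}σ/√n each time, roughly 100(1 − α)% of those intervals should contain µ. The sketch below assumes hypothetical population values (µ = 100, σ = 15, n = 20) and a fixed random seed; the function name `coverage` is also illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=100.0, sigma=15.0, n=20, confidence=0.95, reps=10_000, seed=1):
    """Share of simulated z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                       # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)                    # same width for every sample
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))  # one new sample mean
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(coverage())  # close to 0.95; the exact figure depends on the random draws
```

Any single interval either contains µ or it does not; the 95% describes how often the procedure succeeds across repeated samples.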

Practical interp. 95% CI…
• The 95% confidence interval for the mean, from x̄ − 1.96σ/√n to x̄ + 1.96σ/√n, gives a range of values that we are 95% confident contains the true population mean µ: 95% of intervals constructed this way from repeated samples would contain µ.
[Figure: the interval x̄ ± 1.96σ/√n drawn around the sample mean x̄.]
• σ/√n is the standard error of the sample mean (S.E.); it quantifies the precision of the sample mean.

One-sided CI

Confidence Level to Z-Value Guide
Confidence level    α          Z_{α/2} (2-tail)    Z_α (1-tail)
80%                 20%        1.28                0.84
90%                 10%        1.645               1.28
95%                 5%         1.96                1.645
99%                 1%         2.575               2.325
c                   1.0 − c    Z(c/2)              Z(c − 0.5)
(Here Z(a) denotes the z value with area a between 0 and z in the standard normal table.)

Using statistical tables
The 100(1 − α)% confidence interval (C.I.) for µ:
We want to find two values L and U between which µ lies with high probability, i.e.,
$$P(L \le \mu \le U) = 1 - \alpha$$
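The critical values in the guide can be regenerated with the standard library's `statistics.NormalDist`, which gives more decimals than a printed table (for 99%, about 2.576 two-tailed and 2.326 one-tailed, slightly different from the 2.575 and 2.325 shown above). The helper name `critical_values` is hypothetical. The one-tailed value is the one to use for a one-sided bound; for example, a 95% lower confidence bound for µ is x̄ − 1.645σ/√n.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def critical_values(confidence: float) -> tuple[float, float]:
    """Return (two-tailed Z_(alpha/2), one-tailed Z_alpha) for a confidence level."""
    alpha = 1 - confidence
    two_tailed = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    one_tailed = STD_NORMAL.inv_cdf(1 - alpha)      # all of alpha in one tail
    return two_tailed, one_tailed


for level in (0.80, 0.90, 0.95, 0.99):
    two, one = critical_values(level)
    print(f"{level:.0%}: two-tailed {two:.3f}, one-tailed {one:.3f}")
```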

Z-table

T-table

CI for a Population Mean
• Suppose researchers wish to estimate the mean of some normally distributed population.
• They draw a random sample of size n from the population and compute x̄, which they use as a point estimate of µ.
• Because random sampling involves chance, x̄ can't be expected to be equal to µ.
• The value of x̄ may be greater than or less than µ.
• It would be much more meaningful to estimate µ by an interval.

Recall

We have the following cases:
A) When the population is normal
1) When σ is known and the sample size is large or small, the C.I. has the form:
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
2) When σ is unknown and the sample size is small, the C.I. has the form:
$$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1-\alpha$$

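CI...
B) When the population is not normal and n is large (n > 30)
1) When σ is known, the C.I. has the form:
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
2) When σ is unknown, the C.I. has the form:
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right) = 1-\alpha$$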

Example 1
• Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately x̄ = 22.
• Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate the CI for µ with α = 0.05.

Solution
1 − α = 0.95 → α = 0.05 → α/2 = 0.025
variance = σ² = 45 → σ = √45, n = 10, x̄ = 22
The 95% confidence interval for µ is given by:
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$
Z_{1−α/2} = Z_{0.975} = 1.96 (refer to the Z table)
Z_{0.975}(σ/√n) = 1.96(√45/√10) ≈ 4.16
22 ± 4.16 → [22 − 4.16; 22 + 4.16] → [17.84; 26.16]
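Example 1 can be reproduced in a few lines of Python. This sketch uses `statistics.NormalDist` for Z_{0.975} instead of a printed Z table, so it carries a few more decimals than 1.96; that does not change the rounded limits.

```python
from math import sqrt
from statistics import NormalDist

xbar, variance, n = 22, 45, 10      # Example 1: sample mean, known population variance, n
sigma = sqrt(variance)              # sigma = sqrt(45)
z = NormalDist().inv_cdf(0.975)     # Z_0.975, about 1.96
margin = z * sigma / sqrt(n)        # about 4.16
lower, upper = xbar - margin, xbar + margin
print(f"95% CI for mu: [{lower:.2f}; {upper:.2f}]")  # [17.84; 26.16]
```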

Example 2
• The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean.
Note that the population is not normal; however, n = 35 (n > 30), so n is large, and σ is unknown (s = 0.511).
1 − α = 0.90 → α = 0.1 → 1 − α/2 = 0.95
$$P\left(\bar{x} - Z_{1-\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + Z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right) = 1-\alpha$$
Z_{1−α/2} = Z_{0.95} = 1.645 (refer to the Z table)
Z_{0.95}(s/√n) = 1.645(0.511/√35) ≈ 0.1421
0.718 ± 1.645(0.511/√35) → [0.576; 0.860]
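The large-sample case of Example 2 differs from Example 1 only in replacing σ with s; a sketch with the example's numbers, again taking Z_{0.95} from `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 0.718, 0.511, 35       # Example 2: non-normal population, but n > 30
z = NormalDist().inv_cdf(0.95)      # 90% CI -> Z_0.95, about 1.645
margin = z * s / sqrt(n)            # s stands in for the unknown sigma; about 0.1421
lower, upper = xbar - margin, xbar + margin
print(f"90% CI for mu: [{lower:.3f}; {upper:.3f}]")
```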

Example 3
• Suppose a researcher studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9.
• We assume that the sample was taken from an approximately normally distributed population.
• Calculate the 95% confidence interval for the mean strength.

Solution
1 − α = 0.95 → α = 0.05 → α/2 = 0.025
Standard deviation = s = 130.9, n = 19, x̄ = 250.8
The 95% confidence interval for µ is given by:
$$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1-\alpha$$
t_{1−α/2, n−1} = t_{0.975, 18} = 2.1009 (refer to the t table)
t_{0.975, 18}(s/√n) = 2.1009(130.9/√19) ≈ 63.1
250.8 ± 63.1 → [187.7; 313.9]
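Python's standard library has no t-distribution quantile, so this sketch takes the critical value from the t table as the slide does (if SciPy is installed, `scipy.stats.t.ppf(0.975, 18)` returns the same value, about 2.1009). For comparison it also shows the narrower interval that 1.96 would give, which understates the extra uncertainty from estimating σ with s in a small sample.

```python
from math import sqrt

xbar, s, n = 250.8, 130.9, 19       # Example 3
t_crit = 2.1009                     # t_(0.975, 18), read from a t table as on the slide
z_crit = 1.96                       # Z_0.975, shown only for comparison

se = s / sqrt(n)                    # estimated standard error, about 30.03
t_lower, t_upper = xbar - t_crit * se, xbar + t_crit * se
z_lower, z_upper = xbar - z_crit * se, xbar + z_crit * se
print(f"t-based 95% CI: [{t_lower:.1f}; {t_upper:.1f}]")
print(f"Z-based interval (too narrow here): [{z_lower:.1f}; {z_upper:.1f}]")
```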

Confidence Interval for the Difference between Two Population Means:
• If we draw two samples from two independent populations and want a confidence interval for the difference between the two population means, we have the following cases:
• The interpretation of the CI for the difference between population means rests on the same assumptions as the CI for a single mean.
a) When the populations are normal
1) When the variances are known and the sample sizes are large or small, the C.I. has the form:
$$(\bar{x}_1-\bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \le \mu_1-\mu_2 \le (\bar{x}_1-\bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$$

Cont'd
2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form:
$$(\bar{x}_1-\bar{x}_2) - t_{1-\alpha/2,\,n_1+n_2-2}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} \le \mu_1-\mu_2 \le (\bar{x}_1-\bar{x}_2) + t_{1-\alpha/2,\,n_1+n_2-2}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$
where
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$$

Assumptions…
b) When the populations are non-normal
1) When the variances are unknown and the sample sizes are large, the C.I. has the form:
$$(\bar{x}_1-\bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}} \le \mu_1-\mu_2 \le (\bar{x}_1-\bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}$$

Example 1
A research team was interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of individuals with intellectual disability, a sample of 12 individuals with Down's syndrome yielded a mean of x̄1 = 4.5 mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex was found to have a mean value of x̄2 = 3.4. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I. for μ1 − μ2.
Solution:
1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → Z_{1−α/2} = Z_{0.975} = 1.96
$$(\bar{x}_1-\bar{x}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = (4.5-3.4) \pm 1.96\sqrt{\frac{1}{12}+\frac{1.5}{15}}$$
1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94). We are 95% confident that the true difference between the means lies within the interval 0.26 to 1.94.
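A sketch of the known-variance calculation for the Down's syndrome example. With n1 = 12 and n2 = 15, the standard error is √(1/12 + 1.5/15) = √0.1833 ≈ 0.4282, which gives 1.1 ± 0.84, i.e. about (0.26, 1.94).

```python
from math import sqrt
from statistics import NormalDist

x1, var1, n1 = 4.5, 1.0, 12       # Down's syndrome group: mean, known variance, size
x2, var2, n2 = 3.4, 1.5, 15       # comparison group
z = NormalDist().inv_cdf(0.975)   # Z_0.975, about 1.96

se = sqrt(var1 / n1 + var2 / n2)  # sqrt(1/12 + 1.5/15), about 0.4282
diff = x1 - x2                    # point estimate of mu1 - mu2
lower, upper = diff - z * se, diff + z * se
print(f"SE = {se:.4f}, 95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```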

Example 2
The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse among people with severe mental disorders. A retrospective chart review was carried out on 50 patients; the researchers were interested in the number of inpatient treatment days for the disorder during the year following the end of the program. Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I. for the difference between the means of the populations represented by the two samples.

Solution
1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995
n1 + n2 − 2 = 18 + 10 − 2 = 26, so t_{1−α/2, n1+n2−2} = t_{0.995, 26} = 2.7787; then the 99% C.I. for μ1 − μ2 is
$$(\bar{x}_1-\bar{x}_2) \pm t_{1-\alpha/2,\,n_1+n_2-2}\,S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$
where
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2} = \frac{17(9.3)^2 + 9(11.5)^2}{18+10-2} = 102.33$$
then
(4.7 − 8.8) ± 2.7787 √102.33 √(1/18 + 1/10)
−4.1 ± 11.086 = (−15.186, 6.986)
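The pooled-variance interval can be sketched as below. The function name `pooled_t_interval` is hypothetical, and the t critical value is passed in from the t table (t_{0.995, 26} = 2.7787) because the standard library has no t quantile; with SciPy available, `scipy.stats.t.ppf(0.995, 26)` gives the same value.

```python
from math import sqrt


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal (unknown) population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    margin = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp2, (diff - margin, diff + margin)


# Schizophrenia group vs bipolar group, 99% confidence
sp2, (lower, upper) = pooled_t_interval(4.7, 9.3, 18, 8.8, 11.5, 10, t_crit=2.7787)
print(f"Sp^2 = {sp2:.2f}, 99% CI: ({lower:.3f}, {upper:.3f})")
```

Because the interval includes 0, these data are compatible with no difference in mean treatment days between the two diagnostic groups.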

Remark
Independent samples
1. Samples come from two distinct populations/groups
2. Have different data sources
3. The data of the samples are unrelated (independent)
4. Use the difference between the two sample means, (x̄1 − x̄2)
• We can use the independent t-test statistic
Examples: Two different diets: does one increase longevity relative to the other? Patients assigned randomly to receive a vaccine or placebo: is the rate of the disease the same in both groups, or did the vaccine prevent disease?

Related/dependent samples
1. Samples come from related (the same) populations
2. Have the same/related data source
3. The data are either paired/matched or repeated measures (before/after)
4. Use the difference between each pair of observations, Di = X1i − X2i
• We can use the paired t-test statistic
Example: RBS level of study subjects before and after breakfast.
7 January 2026

Remark…
• You can construct a 100(1 − α)% confidence interval for a paired experiment using
$$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$
where d̄ and s_d are the mean and standard deviation of the n paired differences, and t_{α/2, n−1} is the t value with area α/2 in the upper tail.
• Once you have designed the experiment by pairing, you MUST analyze it as a paired experiment. If the experiment is not designed as a paired experiment in advance, do not use this procedure.
• The interpretation of the CI for the mean difference of paired measurements depends on these assumptions:
a. Your pairs of subjects are randomly selected from the population of pairs or are at least representative of that population.
b. In the overall population of pairs, the differences are distributed in a Gaussian (normal) manner.
c. The two measurements are before/after measurements on one subject or are measurements on two subjects matched before the data were collected.
d. All subjects come from the same population, and each subject (if before/after) or each pair of matched subjects has been selected independently of the others.

Example
The data here are on the sugar concentration of juice in half heads of red clover kept at different vapor pressures for 8 hours. Construct the 99% confidence interval for the difference in mean sugar concentration.
VAPOR PRESSURE
4.4 mmHg (X1)    9.9 mmHg (X2)
62.5             51.7
65.2             54.2
71.3             57.0
69.9             56.4
74.5             61.5
67.8             57.2
70.3             58.1
67.0             56.2
68.5             58.4
62.4             55.5

Solution
xi       yi       di       di²
62.50    51.70    10.80    116.64
65.20    54.20    11.00    121.00
71.30    57.00    14.30    204.49
69.90    56.40    13.50    182.25
74.50    61.50    13.00    169.00
67.80    57.20    10.60    112.36
70.30    58.10    12.20    148.84
67.00    56.20    10.80    116.64
68.50    58.40    10.10    102.01
62.40    55.50     6.90     47.61
Sum               Σdi = 113.2    Σdi² = 1320.84
The 99% confidence interval for µ1 − µ2 (or µd) is given by:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$$
where
d̄ = Σdi/n = 113.2/10 = 11.32
s_d = √[(Σdi² − (Σdi)²/n)/(n − 1)] = √[(1320.84 − 1281.424)/9] ≈ 2.093, so s_d/√n ≈ 0.662
t_critical = t_{α/2, n−1} = t_{0.01/2, 10−1} = t_{0.005, 9} = 3.250
Hence
11.32 ± 3.250(0.662) = 11.32 ± 2.15 → (11.32 − 2.15, 11.32 + 2.15) = (9.17, 13.47)
Therefore, the 99% confidence interval for µ1 − µ2 is (9.17, 13.47).
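The paired calculation can be checked directly from the raw table. This sketch works on the differences di only, as the paired procedure requires, and passes in the tabled t_{0.005, 9} = 3.250 (with SciPy, `scipy.stats.t.ppf(0.995, 9)` gives about 3.2498).

```python
from math import sqrt
from statistics import fmean, stdev

x1 = [62.5, 65.2, 71.3, 69.9, 74.5, 67.8, 70.3, 67.0, 68.5, 62.4]  # 4.4 mmHg
x2 = [51.7, 54.2, 57.0, 56.4, 61.5, 57.2, 58.1, 56.2, 58.4, 55.5]  # 9.9 mmHg

d = [a - b for a, b in zip(x1, x2)]  # one difference per pair
n = len(d)
d_bar = fmean(d)                     # mean difference, 11.32
s_d = stdev(d)                       # sample SD of the differences (n - 1 denominator)
t_crit = 3.250                       # t_(0.005, 9) from the t table
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, 99% CI: ({lower:.2f}, {upper:.2f})")
```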
Confidence Interval for a Population proportion (P)
• A sample is drawn from the population of interest, and the sample proportion is computed as
p̂ = a/n = (number of elements in the sample with some characteristic) / (total number of elements in the sample).
This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the following formula:
$$\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$$

Example
In order to better counsel the parents of premature babies, researchers investigated the survival of premature infants. They retrospectively studied all premature babies born at 22 to 25 weeks gestation at the AUH during a 3-year period. The investigators separately tabulated deaths for infants by their gestational age. Of 29 infants born at 22 weeks gestation, none survived 6 months. Of 39 infants born at 25 weeks gestation, 31 survived for at least 6 months. Construct a 95% CI for P for both groups.

Solution
1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → 1 − α/2 = 0.975
Z_{1−α/2} = Z_{0.975} = 1.96, n = 39
For the infants born at 25 weeks gestation, p̂ = 31/39 = 0.79487, and the 95% C.I. for P is
$$\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}} = \frac{31}{39} \pm 1.96\sqrt{\frac{(31/39)(1-31/39)}{39}}$$
95% CI for P = (0.67, 0.92)
This means that if the true proportion of surviving infants were any less than 67%, there would be less than a 2.5% chance of observing such a large proportion just by chance. It also means that if the true proportion were any greater than 92%, the chance of observing such a small proportion just by chance would be less than 2.5%.
Exercise: Do the same for the infants born at 22 weeks gestation.
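A sketch of the single-proportion formula above (often called the Wald interval). The helper name `wald_ci` is hypothetical; Z is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes: int, n: int, confidence: float = 0.95):
    """p_hat ± Z_(1 - alpha/2) * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = wald_ci(31, 39)       # infants born at 25 weeks
print(f"95% CI for P: ({lower:.2f}, {upper:.2f})")
```

For the 22-week exercise, p̂ = 0/29 = 0, and `wald_ci(0, 29)` returns `(0.0, 0.0)`: the standard error is zero, so this normal-approximation formula gives a zero-width interval whenever p̂ is 0 or 1.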

CI for difference between two population Proportions
• Two samples are drawn from two independent populations of interest, and the sample proportion for the characteristic of interest is computed for each sample. An unbiased point estimator for the difference between two population proportions is p̂1 − p̂2.
• A 100(1 − α)% confidence interval for P1 − P2 is given by:
$$(\hat{P}_1-\hat{P}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1}+\frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$$

Assumptions:
• The subjects are randomly selected from the population or are at least representative of that population.
• Each subject was selected independently of the rest.
• The only difference between the groups is exposure to the risk factor or exposure to the treatment.
Example
A researcher investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet at an internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to an internet café in the two sampled populations.

Solution:
1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995
Z_{1−α/2} = Z_{0.995} = 2.58, nF = 68, nM = 255
p̂F = aF/nF = 31/68 = 0.4559, p̂M = aM/nM = 53/255 = 0.2078
The 99% C.I. is
$$(\hat{P}_F-\hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_F(1-\hat{P}_F)}{n_F}+\frac{\hat{P}_M(1-\hat{P}_M)}{n_M}} = (0.4559-0.2078) \pm 2.58\sqrt{\frac{0.4559(1-0.4559)}{68}+\frac{0.2078(1-0.2078)}{255}}$$
0.2481 ± 2.58(0.0655) = (0.0791, 0.4171)
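A sketch of the two-proportion interval for the internet-café example, using z = 2.58 as on the slide. The function name `two_proportion_ci` is hypothetical. Because it keeps the proportions unrounded (31/68 and 53/255), its fourth decimals can differ slightly from the hand calculation above, which rounds p̂F and p̂M first.

```python
from math import sqrt


def two_proportion_ci(a1, n1, a2, n2, z):
    """CI for P1 - P2 from counts a1/n1 and a2/n2 (normal approximation)."""
    p1, p2 = a1 / n1, a2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)


# Females: 31 of 68; males: 53 of 255; z = 2.58 for 99% confidence
diff, se, (lower, upper) = two_proportion_ci(31, 68, 53, 255, z=2.58)
print(f"difference = {diff:.4f}, SE = {se:.4f}, 99% CI: ({lower:.4f}, {upper:.4f})")
```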
