3. Statistical inference_anesthesia.pptx

► Estimation
► Hypothesis testing
Statistical inference
7/5/2023
1
By Asaye

Objectives
After completing this session, learners will be able to perform:
 Parameter estimations
Point estimate
Confidence interval
 Hypothesis testing
Z-test
T-test
 Testing associations
Chi-square test

Sampling distribution
 A sampling distribution is a distribution of all possible values of a statistic
computed from samples of the same size randomly selected from the same
population.
 It is the probability distribution of a sample statistic.
 It is formed when samples of size n are repeatedly taken from the population.
 Some sample statistics will be higher than the population parameter and some will be
lower.

Sampling distribution….
 We treat sample statistics as random variables.
For example:
 The age of an individual is a random variable.
 Similarly, the mean age of a sample is a random variable.
 No conclusion about the value of a population parameter should be based on one
individual value.
 It should be based on a sample statistic computed from an adequate
sample size.

Sampling distribution….
Construction of sampling distributions
1. From a population of size N, randomly draw all possible samples
of size n.
2. Compute the statistic of interest for each sample.
3. Create a frequency distribution of the statistic.

A. Sampling distribution of sample mean

Example: sampling distribution of sample mean
The population values {18, 20, 22, 24} are put in a box. Two observations
are randomly selected, with replacement.
Find the mean, variance, and standard deviation of the population.
Solution:
Mean: $\mu = \frac{\sum X_i}{N} = \frac{84}{4} = 21$
Variance: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{20}{4} = 5$
Standard deviation: $\sigma = \sqrt{5} = 2.236$

Now consider all possible samples of size “n=2”
16 Sample Means
16 possible samples (with replacement)

List all the possible samples of size n = 2 and calculate the mean of
each sample.
Solution:
Samples   $\bar{X}$    Samples   $\bar{X}$
18,18     18     22,18     20
18,20     19     22,20     21
18,22     20     22,22     22
18,24     21     22,24     23
20,18     19     24,18     21
20,20     20     24,20     22
20,22     21     24,22     23
20,24     22     24,24     24
These means form the sampling distribution of sample means.

Construct the frequency distribution of the sample means;

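The mean, variance, and standard deviation of the 16 sample means are:
Mean: $\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{16} = \frac{18+19+20+\cdots+24}{16} = 21$
Variance: $\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{16} = 2.5$, $\sigma_{\bar{x}} = \sqrt{2.5} = 1.581$
These results satisfy the properties of the sampling distribution of sample
means:
$\mu_{\bar{x}} = \mu = 21$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{5}}{\sqrt{2}} = 1.581$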
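The worked example can be checked by brute force. The sketch below is illustrative only (the variable names are assumptions, not part of the slides): it enumerates all 16 ordered samples of size 2 drawn with replacement and compares the mean and variance of the sample means with μ and σ²/n.

```python
from itertools import product
from statistics import mean, pvariance

population = [18, 20, 22, 24]
n = 2  # sample size

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)               # population mean
sigma2 = pvariance(population)      # population variance (divides by N)
mu_xbar = mean(sample_means)        # mean of the 16 sample means
var_xbar = pvariance(sample_means)  # variance of the 16 sample means

print(len(samples), mu, sigma2, mu_xbar, var_xbar)
```

The variance of the sample means equals `sigma2 / n`, which is the property stated on the slide.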

Sample means by 1st observation (rows) and 2nd observation (columns):
1st Obs   18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24
[Figure: distribution of the 16 sample means, P($\bar{X}$) plotted against $\bar{X}$ = 18, 19, ..., 24]

Comparing the population with its sampling distribution
Population, N = 4: P(x) = 1/4 for each of x = 18, 20, 22, 24; μ = 21, σ = 2.236
Sample means distribution, n = 2: $\mu_{\bar{x}}$ = 21, $\sigma_{\bar{x}}$ = 1.58
[Figure: P(x) against x for the population, beside P($\bar{x}$) against $\bar{x}$ for the sample means]

Properties of sampling distribution of mean
A. Sampling from normally distributed populations
a. If a population is normal with mean μ and standard
deviation σ, the sampling distribution of $\bar{x}$ is also normally
distributed with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
The standard deviation of any sample statistic is called its
standard error, so $\sigma_{\bar{x}}$ is the standard error of the mean.

Cont…
b. The mean $\mu_{\bar{x}}$ of the distribution of sample means is equal to the mean
of the population from which the samples were drawn.
c. The variance of the distribution of sample means is equal to the
variance of the population divided by the sample size.

Sampling from non-normally distributed populations
We can apply the Central Limit Theorem:
 Even if the population is not normal, the means of samples drawn from a
population with mean μ and standard deviation σ will be approximately
normally distributed when the sample size is large (as a rule of thumb, n ≥ 30).
 The sampling distribution of sample means has $\mu_{\bar{x}} = \mu$ and
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Sampling distribution of Proportion
o Suppose we choose a random sample of size n; the sampling distribution
of the sample proportion p possesses the following properties.
o The sample proportion p is an estimate of the population proportion P.
o The standard deviation of p is equal to $\sqrt{p(1-p)/n}$ (called the standard
error of the proportion).
o Provided n is large enough, the shape of the sampling distribution of p is
approximately normal.

Types of estimation
There are two methods of estimation:
1. Point estimation
2. Interval estimation

 Point estimation involves the calculation of a single value to
estimate the population parameter.
 Interval estimation specifies a range of values assumed to
include population parameter.

1. Point Estimation
 A parameter is a numerical descriptive measure of a population
(e.g. μ).
 A statistic is a numerical descriptive measure of a sample (e.g.
$\bar{X}$). It estimates the population parameter.
 A point estimate of a population parameter is a single value
of a sample statistic.
 To each sample statistic there corresponds a population parameter.

Sample statistic & corresponding population parameter
Sample statistic                  Population parameter
Sample mean ($\bar{X}$)           μ (population mean)
Sample variance (S²)              σ² (population variance)
Sample standard deviation (SD)    σ (population standard deviation)
Sample proportion (p)             P or π (population proportion)

Point estimation…..
If a random sample of 100 drug-related patients has a mean survival
time of 46.9 months, what is the point estimate of the population
mean?
Answer: 46.9 months

2. Interval Estimation
 A point estimate does not give any indication of how far away the
parameter lies.
 An interval estimate instead gives a range that has a high probability of
containing the parameter value.
 An interval estimate is a statement that a population parameter has a
value lying between two specified limits.
 Such interval estimates are called Confidence Intervals (CI).

Confidence Interval (CI)
 A confidence interval defines an interval within which the true
population parameter is likely to fall (interval estimate).
 A confidence interval therefore takes into account the sample-to-sample
variation of the statistic and gives a measure of precision.
 Confidence intervals express the inherent uncertainty in any medical
study by giving upper and lower bounds for the anticipated true
underlying population parameter.

Confidence Interval (CI)…
 Most commonly, 95% confidence intervals are calculated; however, 90%
and 99% confidence intervals are sometimes used.
 The probability that the interval contains the true population parameter is
(1−α)100%.
 If we were to select 100 random samples from the population and calculate
a 95% confidence interval for each, approximately 95 of them would include the
true population mean μ (and 5 would not).

Confidence Interval (CI)…
Interval Estimate components
Estimator ± Margin of error
Estimator ± (Reliability coefficient) x (Standard error)
 Precision of the estimate, or margin of error (d) = reliability coefficient
× standard error.
Where:
 Reliability Coefficient (RC) is the 100(1 − α/2)th percentile of the
given probability distribution.
 Standard Error (SE) is the standard deviation of the sampling
distribution of the sample statistic.

Reliability Coefficient
The standardized “z” value corresponding to the given level of confidence.
Z = 1.645 if your confidence level is 90%
A wide interval suggests imprecision of estimation.
A narrow CI reflects a large sample size, low variability, or a lower
confidence level.
e.g. if you had a confidence level of 99%, the confidence coefficient would be
0.99.

Confidence Level
Confidence level is the probability that the interval estimate will contain the
parameter, assuming that a large number of samples are selected and that the
estimation process on the same parameter is repeated.
 Denoted by 100(1 − α)%.
 A relative frequency interpretation:
 In the long run, 100(1 − α)% of all confidence intervals that can be
constructed will contain the unknown parameter.
 A specific interval will either contain or not contain the unknown parameter.

Normal or t-distribution
Is n ≥ 30?
  Yes: use the normal distribution (Z). If σ is unknown, use s instead.
  No: is the population normally, or approximately normally, distributed?
    No: cannot use the normal or t-distribution.
    Yes: is the variance σ² known?
      Yes: use the normal distribution (Z).
      No: use the t-distribution with n − 1 degrees of freedom.
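The decision flowchart for choosing between Z and t can be sketched as a small function. This is only an illustration: the function name `choose_distribution`, its arguments, and the returned labels are assumptions introduced here, not part of the slides.

```python
def choose_distribution(n, sigma_known, approx_normal):
    """Return 'z', 't', or 'neither' following the flowchart's branches."""
    if n >= 30:
        # Large sample: use Z; substitute s for sigma when sigma is unknown.
        return "z"
    if not approx_normal:
        # Small sample from a clearly non-normal population.
        return "neither"
    # Small sample from an (approximately) normal population.
    return "z" if sigma_known else "t"


print(choose_distribution(n=81, sigma_known=True, approx_normal=True))
print(choose_distribution(n=10, sigma_known=False, approx_normal=True))
```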

Confidence Interval for single population mean
1. When the variance is known (sample size large or small), the C.I. has
the form:
$\bar{X} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
When σ is unknown but n ≥ 30, use $\bar{X} \pm Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$.
2. When the variance is unknown and the sample size is small, the C.I. has the
form:
$\bar{X} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$, with d.f. = n − 1

Example
E.g. The mean reading speed of a random sample of 81 adults from a normally
distributed population is 325 words per minute. Find a 90% C.I.
for the mean reading speed of all adults (μ) if it is known that the
standard deviation for all adults is 45 words per minute.
Given n = 81, σ = 45, $\bar{x}$ = 325, so σ/√n = 45/9 = 5
Zα/2 = 1.645
A 90% C.I. for μ is 325 ± (1.645 × 5) = 325 ± 8.2 = (316.8, 333.2)
Therefore, a 90% CI for μ is 316.8 to 333.2 words per minute.
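The reading-speed interval can be reproduced with Python's standard library. This is a minimal sketch, assuming σ is known so that Z applies; the function name `z_confidence_interval` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Two-sided CI for a mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)             # reliability coefficient x SE
    return xbar - margin, xbar + margin


lower, upper = z_confidence_interval(xbar=325, sigma=45, n=81, confidence=0.90)
print(round(lower, 1), round(upper, 1))
```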

CI for the difference of means & independent samples
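1. When the variances are known:
$CI = (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
2. When the variances are unknown and the sample size is less than 30,
use the t-distribution instead of the z-distribution:
$CI = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$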

Example
If a random sample of 50 non-smokers has a mean lifetime of 76 years
with a standard deviation of 8 years, and a random sample of 65
smokers has a mean lifetime of 68 years with a standard deviation of 9 years,
find a 95% C.I. for the difference in mean lifetime between non-smokers and
smokers.
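One way to work this example is sketched below. It assumes that, because both samples have n ≥ 30, the Z-based interval with the sample standard deviations in place of σ1 and σ2 is acceptable; the function name `diff_means_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci(x1, s1, n1, x2, s2, n2, confidence):
    """Large-sample CI for mu1 - mu2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # standard error of the difference
    diff = x1 - x2
    return diff - z * se, diff + z * se


# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
lower, upper = diff_means_ci(76, 8, 50, 68, 9, 65, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Under these assumptions the interval runs from roughly 4.9 to 11.1 years and excludes zero.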

Confidence Interval for a Single Population proportion (P):
 A sample is drawn from the population of interest, and the
sample proportion p is computed as
$p = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}} = \frac{x}{n}$
 This sample proportion is used as the point estimator of the population
proportion.
 The 100(1−α)% confidence interval is $\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$

Single proportion cont….
2. In Addis Ababa, a survey of 350 students showed that 28% carried
their lunch to school. Find the 95% CI for the true population
proportion of students who carried their lunch to school.
3. Suppose that 22 out of 100 people in Debre Tabor were obese.
Find the 95% confidence interval for the true population proportion.
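Both exercises follow the same single-proportion formula, so they can be sketched with one helper. The function name `proportion_ci` is an assumption made for illustration, and the normal approximation is assumed to be adequate for these sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat - z * se, p_hat + z * se


students = proportion_ci(0.28, 350)   # 28% of 350 students
obese = proportion_ci(22 / 100, 100)  # 22 obese out of 100 people
print([round(v, 3) for v in students])
print([round(v, 3) for v in obese])
```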

CI for the difference between two Population proportions
 Two samples are drawn from two independent populations of interest,
 then the sample proportion is computed for each sample for the
characteristic of interest.
 An unbiased point estimator for the difference between two population
proportions P1 − P2 is the difference between the sample proportions, p1 − p2.

CI for the difference between two Population proportions
 A 100(1−α)% confidence interval for P1 − P2 is given by
$(\hat{P}_1 - \hat{P}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$

Example
A researcher investigated gender differences in sexual abuse in a
sample of 323 adults (68 females and 255 males). In the sample, 31 of
the females and 53 of the males reported sexual abuse. We wish to
construct a 99% C.I. for the difference between the proportions of
sexual abuse in the two sampled populations.

Example cont…..
1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995
Z(1−α/2) = Z0.995 = 2.58, nF = 68, nM = 255,
$\hat{p}_F = \frac{a_F}{n_F} = \frac{31}{68} = 0.4559, \quad \hat{p}_M = \frac{a_M}{n_M} = \frac{53}{255} = 0.2078$
$(\hat{P}_F - \hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_F(1-\hat{P}_F)}{n_F} + \frac{\hat{P}_M(1-\hat{P}_M)}{n_M}}$
$= (0.4559 - 0.2078) \pm 2.58\sqrt{\frac{0.4559(1-0.4559)}{68} + \frac{0.2078(1-0.2078)}{255}}$

Example cont…..
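0.2481 ± 2.58(0.0655) = 0.2481 ± 0.1690 = (0.0791, 0.4171)
 Interpretation: we are 99% confident that the difference between the female and
male population proportions reporting sexual abuse lies between 0.079 and 0.417.
Because the interval excludes zero, the two proportions differ.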
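The same interval can be recomputed from the raw counts. In this sketch the reliability coefficient is passed in as z = 2.58 to match the rounded table value used on the slide; the function name `diff_proportions_ci` is an illustrative assumption.

```python
from math import sqrt


def diff_proportions_ci(a1, n1, a2, n2, z):
    """CI for P1 - P2 from counts a1 of n1 and a2 of n2, given z."""
    p1, p2 = a1 / n1, a2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Females: 31 of 68; males: 53 of 255; 99% confidence with z = 2.58.
lower, upper = diff_proportions_ci(31, 68, 53, 255, z=2.58)
print(round(lower, 4), round(upper, 4))
```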

C. Paired Samples
 Tests Means of two Related Populations
∆ Paired or matched samples
∆ Repeated measures (before/after)
∆ Use difference between paired values:
d = x1-x2
 Eliminates variation among subjects
 Assumptions:
 Both populations are normally distributed,
 Or, if not normal, use large samples.

Examples
Paired data arise when each individual (more specifically, each unit of
measurement) in a sample is measured twice,
e.g. blood pressure prior to and following treatment.
 Notice in each of these examples that the two occasions of
measurement are linked by virtue of the two measurements being
made on the same individual.

The confidence interval for the mean difference is $\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$,
where $t_{\alpha/2}$ has n − 1 df.

Example
Ten hypertensive patients are screened at a neighborhood health clinic
and are given methyl dopa, a strong antihypertensive medication for
their condition. They are asked to come back 1 week later and have
their blood pressures measured again. Suppose the initial and follow-up
SBPs (mm Hg) of the patients are given below.

We have the following data and summary statistics

Summary
 Students sometimes have difficulty deciding whether to use 𝑍𝛼/2 or 𝑡𝛼/2
values when ﬁnding conﬁdence intervals.

Hypothesis testing
 A statistical hypothesis is a statement about the population under study
or about the distribution of a quantity under consideration.
 Researchers are interested in answering many types of questions. For example,
A physician might want to know whether a new medication will lower a
person’s blood pressure.
 These types of questions can be addressed through statistical hypothesis testing,
which is a decision-making process for evaluating claims about a population.

Hypothesis testing
 A hypothesis is a testable statement that describes the nature of the
proposed relationship between two or more variables of interest.
 In hypothesis testing, the researcher must define the population
under study, state the particular hypotheses that will be
investigated, give the significance level, select a sample from the
population, collect the data, perform the calculations required for
the statistical test, and reach a conclusion.

Type of Hypotheses
 The null hypothesis (represented by H0) is the default statement about the value of the
population parameter.
 The null hypothesis postulates that ‘there is no difference between factor and
outcome’ or ‘there is no intervention effect.’
 The alternative hypothesis (represented by HA) is the hypothesis that the researcher
wants to test or claim; it states the ‘opposing’ view that ‘there is a difference
between factor and outcome’ or ‘there is an intervention effect.’
 Level of significance (α): the probability of rejecting the null hypothesis when it is
true, i.e. the percentage of sample means that fall outside the prescribed limits when H0 holds.

Methods of hypothesis testing
 Hypotheses are statements about parameters, which may or may not be
true.
 The three methods used to test hypotheses are:-
The traditional method
The P-value method
The conﬁdence interval method.

Steps in hypothesis testing
1. Identify the null hypothesis H0 and the alternate hypothesis HA.
2. Choose 𝛼. The value should be small, usually less than 10%. It is
important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data.
4. Compare the observed value of the statistic to the critical value obtained
for the chosen 𝛼.
5. Make a decision
6. Conclusion

Test Statistics
 A test statistic is a value we can compare with the known distribution
of what we expect when the null hypothesis is true.
 The general formula of the test statistic is:
 Test statistic = (observed value − hypothesized value) / standard error

Critical value
 The critical value separates the critical region from the non-critical
region for a given level of significance.

Decision making
Reject or fail to reject the null hypothesis.
There are 2 types of errors (Type I and Type II).
Type I error is the more serious error; its probability is the level of significance (α).
Power is the probability of rejecting a false null hypothesis and is given by 1 − β.

Types of errors
Type I error: we reject the null hypothesis when
it is true (H0 is wrongly rejected).
E.g. H0: there is no difference between two drugs on average.
A Type I error occurs if we conclude that the two drugs produce different
effects when actually there is no difference. Prob(Type I error) = α
Type II error: we fail to reject the null hypothesis
when it is false.
E.g. H0: there is no difference between two drugs on average.
A Type II error occurs if we conclude that the two drugs produce the same
effects when actually there is a difference. Prob(Type II error) = β

Hypothesis testing about a Population mean (μ)
Two-tailed test (the value of the sample statistic may fall into either tail of
the distribution)
The large-sample (n ≥ 30) test of hypothesis about a population mean
μ is as follows:
1. H0: μ = μ0 vs HA: μ ≠ μ0
2. $Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Hypothesis testing about a Population mean (μ)
Ztab = Zα/2
Decision rule:
Reject H0 if the Z value falls in the rejection region.
Do not reject H0 if the Z value falls in the non-rejection region.
if |zcal| ≥ Ztab, reject H0
if |zcal| < Ztab, do not reject H0
 If n < 30 and the variance is unknown:
$t_{cal} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with n − 1 d.f.
The decision rule is the same as for Zcal.

One tailed tests
2. H0: μ = μ0 vs HA: μ < μ0
Ztab = Zα
Decision: if zcal ≥ −Ztab, do not reject H0
if zcal < −Ztab, reject H0
3. H0: μ = μ0 vs HA: μ > μ0
Decision: if zcal ≥ Ztab, reject H0
if zcal < Ztab, do not reject H0

The P- Value
 The p-value measures how compatible the observed difference is with chance variation
alone when the null hypothesis is true.
 A large p-value implies that the observed value could plausibly occur
just by chance when the null hypothesis is true.
 A small p-value suggests that chance alone is an unlikely explanation and that
there may be sufficient evidence for rejecting the null hypothesis.
 The p-value is defined as the probability of observing the computed
significance test value or a larger one, if the H0 hypothesis is true. For
example, P[Z ≥ Zcal | H0 true].

The P- Value…
 A p-value is the probability of getting the observed difference, or one
more extreme, in the sample purely by chance from a population
where the true difference is zero.
 If the p-value is greater than 0.05 then, by convention, we conclude
that the observed difference could have occurred by chance and there
is no statistically significant evidence (at the 5% level of
significance) for a difference between the groups in the population.

P-value and confidence interval
 Confidence intervals are preferable because they give information about the
size of any difference in the population, and they also indicate the amount
of uncertainty remaining about the size of the difference.
 When the null hypothesis is rejected in a hypothesis-testing situation, the
conﬁdence interval for the mean using the same level of signiﬁcance will
not contain the hypothesized mean.

P-value and confidence interval…..
 But for what values of the p-value should we reject the null hypothesis?
 By convention, a p-value of 0.05 or smaller is considered
sufficient evidence for rejecting the null hypothesis.
 By using a p-value threshold of 0.05, we are allowing a 5% chance of wrongly
rejecting the null hypothesis when it is in fact true.
 When the p-value is less than or equal to 0.05, we often say that the result is
statistically significant.

Example1
A simple random sample of 10 people from a certain population
has a mean age of 27. Can we conclude that the mean age of the
population is not 30? The variance is known to be 20. Let α = 0.05.

Example….
Solution
1. State the hypotheses: H0: µ = 30 vs HA: µ ≠ 30
2. Determine the level of significance: α = 0.05
3. Calculate the test statistic: $Z_{cal} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$
4. Determine the critical value: the Z critical value at α/2 = 0.025 is 1.96.
5. Make a decision: we reject the null hypothesis since |Zcal| = 2.12 ≥ Ztab = 1.96.
That is, Zcal = −2.12 is in the rejection region.
6. Conclusion: the mean age of the population is different from 30 at the 5%
level of significance. We conclude that µ is not 30, since the p-value = 0.034.
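The test statistic and two-sided p-value for this example can be verified with a short sketch. The function name `one_sample_z_test` is an assumption made for illustration, and the known population variance of 20 is taken from the example.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma2, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 with known variance."""
    z = (xbar - mu0) / sqrt(sigma2 / n)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


z_cal, p_value = one_sample_z_test(xbar=27, mu0=30, sigma2=20, n=10)
print(round(z_cal, 2), round(p_value, 3))
```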

Example 2
Suppose that we have a hypothesized population mean of 3.1 and a sample of n = 20 people with
$\bar{x}$ = 4.5.
1. H0: μ = 3.1 vs HA: μ ≠ 3.1
2. α = 0.05 (95% confidence level)
3. Our test statistic is:

Example 2…
4. The observed value of the test statistic falls within the
non-rejection region, i.e. tcal = 1.14 < ttab = 2.09, so we do not reject H0.
5. We fail to reject H0 and conclude that there is not enough evidence to
reject the null hypothesis.

Hypothesis testing for single proportions
Example: In a study of childhood abuse in psychiatric patients,
Brown found that 166 in a sample of 947 patients reported histories of
physical or sexual abuse. Test the hypothesis that the true population
proportion is 30%.
Solution
 To test the hypothesis, we follow the steps below.

Example:…
Step 1: State the hypothesis
H0: P = P0 = 0.3 vs HA: P ≠ 0.3
Step 2: Fix the level of significance (α = 0.05)
Step 3: Determine the critical value: Ztab = Zα/2 = 1.96
Step 4: Compute the test statistic, with $\hat{P}$ = 166/947 = 0.175:
$Z_{cal} = \frac{\hat{P} - P_0}{\sqrt{\frac{P_0 q_0}{n}}} = \frac{0.175 - 0.3}{\sqrt{\frac{0.3(0.7)}{947}}} = \frac{-0.125}{0.0149} = -8.39$

Example:…
Step 5: Make a decision: reject H0 since |Zcal| = 8.39 ≥ Ztab = 1.96.
Step 6: Conclusion: there is statistically significant
evidence that the true population proportion is different from 30%.
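A sketch of the single-proportion test from the raw counts follows. The function name `one_proportion_z_test` is illustrative. With unrounded inputs the statistic comes out near −8.37 rather than the slide's −8.39, which uses rounded intermediate values; the decision is unchanged.

```python
from math import sqrt


def one_proportion_z_test(x, n, p0):
    """Z statistic for H0: P = p0, using p0 in the standard error."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se


z_cal = one_proportion_z_test(x=166, n=947, p0=0.3)
reject_h0 = abs(z_cal) >= 1.96  # two-tailed test at alpha = 0.05
print(round(z_cal, 2), reject_h0)
```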

Hypothesis testing for two sample means
Ho: µ1-µ2 =0
VS
HA: µ1-µ2 ≠0, HA: µ1-µ2 <0, HA: µ1-µ2>0

Example
Researchers wish to know if the data they have collected provide sufficient
evidence to indicate a difference in mean serum uric acid levels between normal
individuals and individuals with Down’s syndrome. The data consist of serum
uric acid readings on 12 individuals with Down’s syndrome and 15 normal
individuals. The means are 4.5 mg/100ml and 3.4 mg/100ml, with sample standard
deviations of 2.9 and 3.5 mg/100ml, respectively, and known population variances
(σ1² = 1, σ2² = 1.5, respectively). Is there a difference between the means of both
groups at the 5% level of significance?
Hypothesis test: H0: µ1 − µ2 = 0 vs HA: µ1 − µ2 ≠ 0 (or HA: µ1 ≠ µ2)

Cont…
With α = 0.05, the critical values of Z are −1.96 and +1.96. We reject H0 if Z
< −1.96 or Z > +1.96.
Reject H0 because Zcal = 2.57 > 1.96.
 At the 5% level of significance, there is statistically significant evidence that the
population means are not equal.
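The value 2.57 used in the decision can be reproduced as follows. This sketch assumes the test uses the stated population variances (1 and 1.5) rather than the sample standard deviations; the function name `two_sample_z` is illustrative.

```python
from math import sqrt


def two_sample_z(x1, var1, n1, x2, var2, n2, d0=0.0):
    """Z statistic for H0: mu1 - mu2 = d0 with known population variances."""
    se = sqrt(var1 / n1 + var2 / n2)
    return ((x1 - x2) - d0) / se


# Down's syndrome group: n = 12, mean 4.5, variance 1.
# Normal group: n = 15, mean 3.4, variance 1.5.
z_cal = two_sample_z(4.5, 1.0, 12, 3.4, 1.5, 15)
print(round(z_cal, 2))
```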

Hypothesis testing for two proportions
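 Suppose that n1 and n2 are large enough so that
n1·p1 ≥ 5, n1·(1 − p1) ≥ 5, n2·p2 ≥ 5, and n2·(1 − p2) ≥ 5
 To test the hypothesis
H0: P1 − P2 = 0
vs
HA: P1 − P2 ≠ 0
Test statistic:
$Z_{cal} = \frac{(\hat{P}_1 - \hat{P}_2) - D_0}{\sigma_{\hat{P}_1-\hat{P}_2}}$, where $\sigma_{\hat{P}_1-\hat{P}_2} = \sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$
and D0 is the hypothesized value of P1 − P2 (here D0 = 0).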

Example
Two hundred patients suffering from a certain disease were randomly divided into two
equal groups. Of the first group, 78 recovered within three days. Out of the other 100,
who were treated by a new method, 90 recovered within three days. The physician
wishes to know whether the data provide sufficient evidence at 90% level of
confidence to indicate that the new treatment is more effective than the standard
treatment.
Solution;
Given: n1= n2= 100;
p1=78/100= 0.78 p2=90/100=0.90
1. State the hypothesis: Ho: P1=P2 vs H1: P1< P2
2. Determine level of significance.

Example…
3. Test statistic:
$Z_{cal} = \frac{(0.78 - 0.90) - 0}{\sqrt{\frac{0.78(0.22)}{100} + \frac{0.90(0.10)}{100}}} = \frac{-0.12}{0.0511} = -2.35$
4. Critical value: this is a one-tailed (left-tailed) test, so with α = 0.05 the
critical value is −Zα = −Z0.05 = −1.645
5. Decision: since Zcal < −Zα, i.e. −2.35 < −1.645, we reject H0
(the same decision holds at α = 0.10, where −Z0.10 = −1.282)
6. Conclusion: the data suggest that the new treatment is more
effective than the standard treatment at the 5% level of significance.
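The recovery example can be recomputed with a short sketch. It follows the unpooled standard error used in the slides (a pooled estimate under H0 is a common alternative and is not shown here); the function name `two_proportion_z` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, d0=0.0):
    """Z statistic for H0: P1 - P2 = d0 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - d0) / se


# Standard treatment: 78 of 100 recovered; new treatment: 90 of 100.
z_cal = two_proportion_z(78, 100, 90, 100)
p_left = NormalDist().cdf(z_cal)  # one-sided p-value for HA: P1 < P2
print(round(z_cal, 2), round(p_left, 4))
```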

Chi-square test
 Chi-square test is used to determine a significant difference between the
observed and expected frequencies in categorical attributes.
 In recent years, the use of specialized statistical methods for categorical
data has increased dramatically, particularly for applications in the
biomedical and social sciences.
 Categorical scales occur frequently in the health sciences, for measuring
responses.

Cont…
For example:
Patient survives an operation (yes, no),
Severity of an injury (none, mild, moderate, severe), and
Stages of a disease (initial, advanced).
 Studies often collect data on categorical variables that can be summarized
as a series of counts and commonly arranged in a tabular format known as a
contingency table.

Cont…
 As with the t distribution, there is a different chi-square
distribution for each possible value of degrees of freedom.
Chi-square distributions with a small number of degrees of freedom
are highly skewed; however, this skewness is attenuated as the
number of degrees of freedom increases.

Cont…
The chi-squared distribution is concentrated over non-negative values. It
has mean equal to its degrees of freedom (d.f), and its standard deviation
equals √(2df ). As d.f increases, the distribution concentrates around larger
values and is more spread out.
The distribution is skewed to the right, but it becomes more bell-shaped
(normal) as d.f increases.

Cont…
 For contingency table, d.f is equal to (r-1)x(c-1)

Test of association
 The chi-squared (χ²) test statistic is widely used in the analysis of
contingency tables.
 It compares the actual observed frequency in each group with the expected
frequency.
 The chi-squared test (Pearson’s χ2) allows us to test for association
between categorical (nominal) variables.
 The null hypothesis for this test is there is no association between the
variables. Consequently a significant p-value implies association.

Cont…
Test statistic: χ²-test with d.f. = (r − 1) × (c − 1)
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Oij = observed frequency, Eij = expected frequency of the cell at the
juncture of the i-th row and j-th column
$E_{ij} = \frac{i\text{th row total} \times j\text{th column total}}{\text{grand total}} = \frac{R_i \times C_j}{n}$

Procedures of Hypothesis Testing
1. State the hypothesis
2. Fix level of significance
3. Find the critical value (𝜒2 (d.f, α))
4. Compute the test statistics
5. Decision rules; reject null hypothesis if test statistics > table value.
6. Make conclusion

Test of associations for 2x2 tables
If we call the frequencies in the four cells of a 2x2 table a, b, c and d,
then the table is given by

Exposure status   Disease status              Row total
                  Diseased     Non-diseased
Exposed           a            b              a+b
Non-exposed       c            d              c+d
Column total      a+c          b+d            a+b+c+d

Cont…
If the contingency table is 2x2,
the d.f. is (r − 1) × (c − 1) = 1, and
$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \frac{n(ad - bc)^2}{(a+c)(b+d)(a+b)(c+d)}$

Assumptions of the χ²-test
The chi-squared test assumes that
 The data are categorical.
 The data are frequency data.
 The numbers in each cell are ‘not too small’: no expected frequency
should be less than 1, and
 No more than 20% of the expected frequencies should be less than 5.
 If this does not hold, row or column categories can sometimes be
combined (re-categorized) to make the expected frequencies larger, or an alternative test can be used.
Example:
 Consider a hypothetical example on smoking and symptoms of asthma. The
study involved 150 individuals and the result is given in the following
table:

Symptoms of asthma   Ever smoking        Total
                     Yes       No
Yes                  20        30        50
No                   22        78        100
Total                42        108       150

 Is there an association between smoking cigarettes and symptoms of asthma at the
0.05 level of significance?

df \ area 0.995 0.99 0.975 0.95 0.9 0.25 0.1 0.05 0.025 0.01 0.005
1 0.000 0.000 0.001 0.004 0.016 1.323 2.706 3.841 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 2.773 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 4.108 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 5.385 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 6.626 9.236 11.071 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 7.841 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 9.037 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 3.490 10.219 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 11.389 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 12.549 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 13.701 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 14.845 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 15.984 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 17.117 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 18.245 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 19.369 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 20.489 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 21.605 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 22.718 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 23.828 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 24.935 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 26.039 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 27.141 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 28.241 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 29.339 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 30.435 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 31.528 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 32.620 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 33.711 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 34.800 40.256 43.773 46.979 50.892 53.672
Table C. Right tail areas for the Chi-square

Solution
Hypothesis:
 H0: there is no association between smoking and symptoms of asthma.
 HA: there is an association between smoking and symptoms of asthma.
The critical value is given by χ²(0.05, 1) = 3.841
Test statistic: χ² = 5.36

Cont…
The p-value corresponding to χ² = 5.36 with 1 degree of freedom is approximately
0.02.
Decision: since 5.36 > 3.841, we reject the null hypothesis in favour of
the alternative hypothesis.
Conclusion: there is an association between smoking and symptoms of
asthma.
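The statistic 5.36 can be reproduced from the observed table. The sketch below is illustrative (the function name `chi_square_statistic` is an assumption): it computes expected counts as row total × column total / n, sums (O − E)²/E, and cross-checks the 2x2 shortcut formula. The p-value line relies on the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 degree of freedom.

```python
from math import erfc, sqrt


def chi_square_statistic(observed):
    """Pearson chi-square statistic and d.f. for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: asthma symptoms (yes, no); columns: ever smoking (yes, no).
observed = [[20, 30], [22, 78]]
chi2, df = chi_square_statistic(observed)

# 2x2 shortcut: n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))
a, b, c, d = 20, 30, 22, 78
shortcut = (a + b + c + d) * (a * d - b * c) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

p_value = erfc(sqrt(chi2 / 2))  # valid for df = 1 only
print(round(chi2, 2), df, round(p_value, 3), chi2 > 3.841)
```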

Exercise
Consider the data on the assessment of the effectiveness of antidepressants. The
data are given below (expected frequencies in parentheses):

Treatment      Depression status       Total
               Yes        No
Desipramine    14 (8)     10 (16)      24
Lithium        6 (8)      18 (16)      24
Placebo        4 (8)      20 (16)      24
Total          24         48           72

Is there an association between treatment and depression at the 0.01 level of
significance?

Measure of Association
 The chi-square test only tells us whether there is an association between the two
categorical variables; it does not tell us about the direction
and strength of the association.
 Statistical relationship between exposure and disease.
 An association is said to exist between two variables when a change in one
variable parallels or coincides with a change in another variable.
 Requires comparing two groups:
 Exposed Vs Unexposed
 Cases Vs non cases/controls

Cont…
 Variables can be related or unrelated to one another.
 If they are related, the relationship can be:
 Positive or negative
 Strong or weak (one variable can have a large or small effect
on the other)
 Statistically significant or not significant
 A statistically significant association is one that is unlikely to be
due to chance.

Cont…
Commonly, the strength of the association is measured by the
 Relative risk (RR)
 Odds Ratio (OR)

Relative Risk (RR)
 Risk: the probability of an event occurring over a period of time.
 Risk ratio (relative risk): the ratio of the risk of disease in the
exposed group to the risk in the unexposed group.
 Risk measures the probability of disease incidence within a group.
 Relative risk is used to compare the risk in two different groups of
people.
$\text{Risk} = \dfrac{\text{number of cases of disease}}{\text{number of people at risk}}$

Cont…
 It estimates the magnitude (size) of an association between exposure
and outcome.
 It indicates the chance of developing the disease in the group exposed
to a risk factor relative to the group that is not exposed.

Cont…
 Table 1: a 2 by 2 table indicating findings of a cohort study

Cont…
 From the above table the RR is calculated as:

Example1
Table 2: Data from a cohort study of oral contraceptive (OC) use and
bacteriuria among women aged 15-49 years.

Current OC use    Bacteriuria          Total
                  Yes       No
Yes               27        455        482
No                77        1831       1908
Total             104       2286       2390

Cont…
Calculate the RR.
$RR = \dfrac{I_e}{I_o} = \dfrac{\text{incidence among exposed}}{\text{incidence among non-exposed}} = \dfrac{a/(a+b)}{c/(c+d)}$
$RR = \dfrac{(27/482) \times 1000}{(77/1908) \times 1000} = \dfrac{56.0}{40.4} \approx 1.4$
 Interpretation: women who used oral contraceptives had about 1.4 times the
risk of developing bacteriuria compared with non-users.
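
The calculation for Table 2 can be reproduced with a minimal Python sketch. The function name `relative_risk` and the cell labels a, b, c, d (exposed with disease, exposed without, unexposed with disease, unexposed without) are assumptions chosen to match the a/(a+b) and c/(c+d) notation of the formula.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 cohort table.

    a: exposed with disease      b: exposed without disease
    c: unexposed with disease    d: unexposed without disease
    """
    risk_exposed = a / (a + b)      # incidence among exposed (Ie)
    risk_unexposed = c / (c + d)    # incidence among non-exposed (Io)
    return risk_exposed / risk_unexposed

# Table 2: current OC use (rows) by bacteriuria (columns).
rr = relative_risk(a=27, b=455, c=77, d=1831)
print(f"RR = {rr:.2f}")
```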

Interpretation
 The value of RR ranges from 0 to infinity.
 RR is never negative.
 RR=1
 Risk in exposed = risk in non-exposed
 No association
 RR>1
 Risk in exposed > risk in non-exposed
 Implies that exposed individuals are RR times as likely to develop the outcome
as non-exposed individuals.
 Positive association: the factor is associated with the disease
 The larger the RR, the stronger the association

Cont…
 RR<1
 Risk in exposed < risk in non-exposed
 Indicates the risk of acquiring the disease is less among subjects with the
risk factor than among subjects without the risk factor.
 Negative association, factor is “protective”

Interpretation cont’d…
[Figure: RR scale running from 0 to ∞. Values below 1 are preventive, 1 means no association, and values above 1 indicate risk.]

Guideline for strength of association
 1.0 = No association
 1.1-1.3 = Weak
 1.4-1.7 = Mild
 1.8-3.0 = Moderate
 3.0-8.0 = Strong
Q. What if RR is less than 1?

Cont…
 For inverse associations (RR less than 1.0), take the reciprocal and
look it up in the table above; e.g., the reciprocal of 0.5 is 2.0, which
corresponds to a “moderate” association.
 The further the RR is from 1, the stronger the association between
exposure and disease.
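
The guideline and the reciprocal rule can be combined in a small lookup, sketched below in Python. The function name is illustrative. The guideline leaves small gaps between bands (for example between 1.3 and 1.4) and lists 3.0 in two bands, so the cut-offs used here are an assumption: each band is taken to start at its printed lower limit, and 3.0 is treated as moderate.

```python
def strength_of_association(rr):
    """Classify an RR using the guideline bands (cut-offs are an assumption)."""
    if rr <= 0:
        raise ValueError("RR must be positive to be classified")
    # For inverse associations (RR < 1) use the reciprocal, as the slide suggests.
    ratio = rr if rr >= 1 else 1 / rr
    if ratio == 1.0:
        return "no association"
    if ratio < 1.4:
        return "weak"
    if ratio < 1.8:
        return "mild"
    if ratio <= 3.0:
        return "moderate"
    if ratio <= 8.0:
        return "strong"
    return "beyond the range of the guideline"

for value in (1.0, 1.2, 1.4, 0.5, 5.0):
    print(value, strength_of_association(value))
```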

Odds Ratio (OR)
The odds of disease express the chance that an individual experiences the
disease relative to not experiencing it, and can differ by exposure.
Odds: the ratio of the probability that an event occurs to the probability
that it does not occur.
$\text{Odds} = \dfrac{p}{1 - p}$
where p = the probability of the event
1 - p = the probability that the event does not occur
 In a case-control study, the odds ratio indicates the likelihood of having
been exposed among cases relative to controls.
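
To make the definition of odds concrete, here is a minimal Python sketch that converts a probability into odds. The function name `odds` and the example probabilities are illustrative only.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 for the odds to be finite")
    return p / (1 - p)

# A probability of 0.5 corresponds to even odds; smaller probabilities give odds below 1.
for p in (0.5, 0.2, 0.05):
    print(f"p = {p:.2f} -> odds = {odds(p):.3f}")
```

For rare events the odds and the probability are close to each other, which is why the odds ratio approximates the relative risk when the disease is uncommon.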

Cont…
Consider the following 2x2 table:

Treatment    Outcome status        Total
             X-        X+
Y-           a         b           a+b
Y+           c         d           c+d
Total        a+c       b+d         a+b+c+d

Cont…
Odds ratio: the ratio of two odds, here the odds of exposure among cases
compared with the odds of exposure among controls.
$OR = \dfrac{\text{odds of exposure among cases}}{\text{odds of exposure among controls}} = \dfrac{a/c}{b/d} = \dfrac{a \times d}{b \times c}$
Odds: the ratio of the probability that an event occurs to the probability
that it does not occur.
 The exposure odds ratio and the disease odds ratio are numerically
identical, so either one can be calculated.

Example
Table 3: Data from a case-control study of current oral contraceptive
(OC) use and myocardial infarction (MI) in pre-menopausal female nurses.

Current OC use    Myocardial infarction    Total
                  Yes       No
Yes               23        304            327
No                133       2816           2949
Total             156       3120           3276

Cont…
Calculate the OR.
$OR = \dfrac{a/c}{b/d} = \dfrac{23 \times 2816}{304 \times 133} = \dfrac{64768}{40432} \approx 1.6$
Interpretation: the odds of MI among current OC users are about 1.6 times
the odds among non-users.
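
The cross-product calculation for Table 3 can be checked with a short Python sketch. The function name `odds_ratio` is illustrative; a, b, c, d are the four cells of the table read row by row.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table with cells a, b (first row) and c, d (second row)."""
    # (a/c) / (b/d) simplifies to the cross-product ratio (a*d) / (b*c).
    return (a * d) / (b * c)

# Table 3: current OC use (rows) by myocardial infarction (columns).
or_mi = odds_ratio(a=23, b=304, c=133, d=2816)
print(f"OR = {or_mi:.2f}")
```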

Interpretation cont’d…
 OR ranges from 0 to positive infinity.
 OR = 1: exposure is not related to disease.
 OR > 1: exposure is positively related to disease.
 OR < 1: exposure is negatively related to disease.
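[Figure: OR scale running from 0 to ∞. Values below 1.0 indicate a negative association, 1.0 means no association, and values above 1.0 indicate a positive association.]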

Interpretation
 When OR > 1, the odds of having the disease in question are OR times as
great among those exposed to the suspected risk factor as among those with
no such exposure.
 The standard error of the log odds ratio is given by
$SE(\ln OR) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$
 The 95% confidence interval for the log odds ratio is given by
$\left( \ln OR - Z_{\alpha/2} \cdot SE(\ln OR),\; \ln OR + Z_{\alpha/2} \cdot SE(\ln OR) \right)$

Cont…
 To interpret the 95% confidence interval on the odds ratio scale, we
transform the limits back by exponentiating them.
 That is, the 95% confidence interval for the odds ratio is given by:
$\left( e^{\ln OR - Z_{\alpha/2} \cdot SE(\ln OR)},\; e^{\ln OR + Z_{\alpha/2} \cdot SE(\ln OR)} \right)$
 OR is the point estimate from the sample.
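
The steps for the confidence interval (compute the OR, take its logarithm, add and subtract Z times the standard error, then exponentiate) can be sketched in standard-library Python as follows. The function name `odds_ratio_ci` is illustrative, z = 1.96 is the usual normal quantile for a 95% interval, and the data are those of Table 3 (OC use and MI) so that the exercise is left for the reader.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Point estimate and confidence interval for the odds ratio of a 2x2 table."""
    or_hat = (a * d) / (b * c)
    # Standard error of ln(OR): square root of the sum of reciprocal cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_hat)
    # Build the interval on the log scale, then transform back by exponentiating.
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return or_hat, lower, upper

or_hat, lower, upper = odds_ratio_ci(a=23, b=304, c=133, d=2816)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 indicates an association that is statistically significant at the corresponding level.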

Exercise
 Example: the table below gives information on birth weight and mortality
among white infants in region X within one year.
 Find the 95% confidence interval for the odds ratio of infant mortality
(5% level of significance).

Birth weight    Mortality             Total
                Dead      Alive
Low BW          618       4597        5215
High BW         422       67093       67515
Total           1040      71690       72730

