2. 2
Outline
Scenarios 1 – 5: Data analysis &
Interpretation in various study designs
Scenarios 6 – 9 : Sample size calculation for
qualitative and quantitative data
Outputs from PEPI software
3. 3
Scenario I (Data not real, only for exercise)
In a study to find out the prevalence of
hypertension, 1,000 adults were selected by
Simple Random Sampling method from a
population of 1,00,000. All the 1000 Adults were
contacted once and their blood pressure was
measured as per the standard guidelines and 55
were found to be having hypertension.
4. 4
1. What is the type of study design?
Cross-sectional study
Study carried out at one point in time
5. 5
2. What is the prevalence of
hypertension in the sample selected?
$$\text{Prevalence} = \frac{55}{1000} \times 100 = 5.5\%$$
6. 6
3. How precise is the estimate?
Conventionally we calculate the 95% CI
95% CI = p ± 1.96 SEp, where p = proportion (in %) and
SEp is the standard error of the proportion:
$$SE_p = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 100 - p$$
• 95% CI = 4.1 % to 6.9 %
Since the 95% CI is not very wide, the result is
precise
7. 7
4. How do you infer the prevalence of
hypertension for this population?
The prevalence of hypertension in the
population is likely to lie between 4.1% and
6.9% (with 95% confidence)
8. 8
Confidence interval (CI) for prevalence
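95% CI = p ± 1.96 SEp
$$\text{where } SE_p = \sqrt{\frac{pq}{n}}, \quad q = 100 - p$$
$$SE_p = \sqrt{\frac{5.5 \times 94.5}{1000}} = \sqrt{\frac{519.75}{1000}} = 0.72$$
$$95\%\ \text{CI} = 5.5 \pm (1.96 \times 0.72) = 5.5 \pm 1.4 = 4.1\%\ \text{to}\ 6.9\%$$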
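The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `proportion_ci` and its parameters are illustrative assumptions, not part of the exercise or of PEPI. It applies the same normal-approximation formula with p and q expressed in per cent.

```python
import math

def proportion_ci(cases, n, z=1.96):
    """Return (p, SE, lower, upper), all in per cent, using p ± z * sqrt(pq/n)."""
    p = 100.0 * cases / n          # prevalence in per cent
    q = 100.0 - p
    se = math.sqrt(p * q / n)      # standard error of the proportion
    return p, se, p - z * se, p + z * se

# Scenario I: 55 hypertensive adults out of 1000 sampled
p, se, lower, upper = proportion_ci(55, 1000)
print(f"Prevalence {p:.1f}%, SE {se:.2f}, 95% CI {lower:.1f}% to {upper:.1f}%")
```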
9. 9
5. Supposing the mean systolic blood
pressure of the sample selected was 120 mm
Hg and the standard deviation of the sample
was 25 mm Hg, how will you infer the mean
systolic pressure of the population?
10. 10
95% CI of the mean
95% CI of mean = Mean ± 1.96 SEM (standard
error of mean)
$$SEM = \frac{SD}{\sqrt{n}} = \frac{25}{\sqrt{1000}} = \frac{25}{31.6} = 0.79$$
$$95\%\ \text{CI of Mean} = 120 \pm 1.96\,(0.79) = 120 \pm 1.6 = 118.4\ \text{to}\ 121.6$$
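A minimal Python sketch of the same calculation follows. The helper name `mean_ci` is an assumption made for illustration. Without intermediate rounding the margin is about 1.55 rather than 1.6, so the limits come out near 118.45 and 121.55, which agrees with the slide up to rounding.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Return (SEM, lower, upper) for mean ± z * SD / sqrt(n)."""
    sem = sd / math.sqrt(n)        # standard error of the mean
    return sem, mean - z * sem, mean + z * sem

# Scenario I: mean systolic BP 120 mm Hg, SD 25 mm Hg, n = 1000
sem, lower, upper = mean_ci(120, 25, 1000)
print(f"SEM {sem:.2f}, 95% CI {lower:.2f} to {upper:.2f} mm Hg")
```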
11. 11
Inference
The mean systolic pressure of the
population is likely to lie between 118.4 and
121.6 mmHg (with 95% confidence)
12. 12
Scenario-II (Data not real, only for exercise)
• A study was conducted to find out the
association between usage of the diuretic X
and the occurrence of squamous cell
carcinoma (SCC) of skin.
13. 13
Scenario II (cont.)
• 1129 patients with SCC of skin and 4516 individuals
without SCC were selected from a similar source
population and the usage of the diuretic X was
ascertained. Of the 1129 patients suffering from SCC,
154 were using the diuretic X and among those
without SCC 372 were using the diuretic X.
14. 14
1. What is the type of study design?
CASE-CONTROL STUDY
Selection of patients with SCC of skin (cases)
Selection of subjects without SCC of skin
(controls)
Ascertainment of exposure (i.e. Usage of
diuretic X) among cases and controls
15. 15
2.Is there an association between
usage of diuretic X and SCC of skin?
Cases Controls
Exposed (a) (b)
Not Exposed (c) (d)
16. 16
Is there an association between usage of
diuretic X and SCC of skin?
Cases Controls
Exposed 154 (a) 372 (b)
Not Exposed (c) (d)
1129 4516
17. 17
Is there an association between usage of
diuretic X and SCC of skin?
Cases Controls
Exposed 154 (a) 372 (b)
Not Exposed 975 (c) 4144 (d)
1129 4516
18. 18
Is there an association between usage of
diuretic X and SCC of skin?
$$\text{Odds ratio} = \frac{\text{Odds of exposure among cases}}{\text{Odds of exposure among controls}}$$
$$= \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{154 \times 4144}{372 \times 975} = \frac{638176}{362700} = 1.76$$
Yes there is an association between usage of
diuretic X and SCC of skin
19. 19
3. Interpret the strength and direction
of association
• As Odds ratio is > 1 - Positive association
• Those using diuretic X have 1.76 times
the odds of developing SCC as compared
to those not using diuretic X
20. 20
4. Is the measure of association precise?
95% CI of Odds ratio
= Anti log of: logn OR ± 1.96 SE of logn OR
= 1.4 to 2.2
Since the 95% CI is not very wide, the result
is precise
21. 21
5. Interpret the 95% CI
The 95 % CI does not include 1, hence the
result (odds ratio 1.76) is statistically
significant
The odds of developing SCC of skin in those
using diuretic X compared to those not using
diuretic X is likely to be between 1.4 to 2.2
(95 times out of 100 times)
22. 22
Calculation of 95% CI for OR
= Anti log of: logn OR ± 1.96 SE of logn OR
= Anti log of: logn (1.76) ± 1.96 SE of logn OR
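$$\text{where SE of } \ln OR = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{154} + \frac{1}{372} + \frac{1}{975} + \frac{1}{4144}}$$
$$= \sqrt{0.0065 + 0.0027 + 0.001 + 0.0002} = \sqrt{0.01045} = 0.102$$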
23. 23
Calculation of 95% CI for OR
logn (1.76) = 0.57, SE of logn OR = 0.102
95% CI = Antilog of: 0.57 ± 1.96 x 0.102
= Antilog of: 0.57 ± 0.2
= Antilog of: 0.37 to 0.77
= 1.4 to 2.2
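The odds ratio and its log-based confidence interval can be sketched in Python as below. The function name `odds_ratio_ci` is an illustrative assumption; the cell labels a, b, c, d follow the 2x2 table used in this scenario. With unrounded intermediate values the interval is about 1.44 to 2.15, which matches the slide's 1.4 to 2.2 up to rounding.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, SE of ln OR, lower, upper) for a 2x2 case-control table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)              # natural log
    lower = math.exp(log_or - z * se_log_or)   # antilog of the lower limit
    upper = math.exp(log_or + z * se_log_or)   # antilog of the upper limit
    return odds_ratio, se_log_or, lower, upper

# Scenario II: diuretic X and SCC of skin
odds_ratio, se_log_or, lower, upper = odds_ratio_ci(154, 372, 975, 4144)
print(f"OR {odds_ratio:.2f}, SE(ln OR) {se_log_or:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```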
25. 25
7. Interpret p value and OR
• Since the p-value is < 0.001, the odds ratio of 1.76 is
statistically significant.
• If there were truly no association, the probability of
observing an odds ratio at least this far from 1 by chance
alone would be < 0.001
26. 26
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{\left((154 \times 4144) - (372 \times 975)\right)^2 \times 5645}{(154+372)(975+4144)(154+975)(372+4144)}$$
$$= \frac{(638176 - 362700)^2 \times 5645}{526 \times 5119 \times 1129 \times 4516}$$
$$= \frac{75887026576 \times 5645}{13728362835016} = \frac{428382265021520}{13728362835016} = 31.2$$
27. 27
Chi-square & p-value
df = (r – 1) x (c – 1) = (2 – 1) x (2 – 1) = 1
The table value at the 0.1% level of significance is
10.83. The calculated $\chi^2_1$ = 31.2 (p < 0.001)
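As a cross-check, the chi-square statistic for a 2x2 table can be computed with the short Python sketch below. The function names are assumptions made for illustration. Instead of a printed table, the p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper-tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square statistic for a 2x2 table (1 degree of freedom)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_df1(chi2):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Scenario II: a = 154, b = 372, c = 975, d = 4144
chi2 = chi_square_2x2(154, 372, 975, 4144)
print(f"chi-square = {chi2:.1f}, p < 0.001: {p_value_df1(chi2) < 0.001}")
```

The same function applied to the Scenario III cohort table (50, 150, 75, 675) gives the 31.09 reported for that scenario.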
29. 29
Scenario III (Data not real, only for exercise)
A study was initiated to find out the
association between hypertension and Acute
myocardial infarction (AMI). 950 adults
without AMI were enrolled. Among them 200
were found to have hypertension and 750
were found to have normal blood pressure
30. 30
Scenario III (cont.)
They were followed for 20 years and the
occurrence of AMI during the period was
recorded. 50 of the 200 individuals with
hypertension developed AMI whereas 75 of
the individuals with normal blood pressure
developed AMI.
31. 31
1. What is the type of study design?
Cohort study
Starts with exposure (Hypertension)
Ends with ascertainment of disease (AMI)
32. 32
2. What is the incidence of AMI among
individuals with hypertension?
50
= ------ x 100
200
= 25%
33. 33
3. What is the incidence of AMI among
those without hypertension?
75
= ------ x 100
750
= 10%
34. 34
4. Is there an association between
Acute myocardial infarction and
hypertension?
AMI No AMI Total
Hypertensive (a) (b) 200
Normotensive (c) (d) 750
950
35. 35
Is there an association between Acute
myocardial infarction and hypertension?
AMI No AMI Total
Hypertensive 50 (a) 150 (b) 200
Normotensive (c) (d) 750
950
36. 36
Is there an association between Acute
myocardial infarction and hypertension?
AMI No AMI Total
Hypertensive 50 (a) 150 (b) 200
Normotensive 75 (c) 675 (d) 750
125 825 950
37. 37
Is there an association between AMI and
hypertension?
$$\text{Relative Risk} = \frac{I_{\text{Exposed}}}{I_{\text{Unexposed}}} = \frac{50/200}{75/750} = \frac{0.25}{0.1} = 2.5$$
Yes. There is an association between AMI and
hypertension
38. 38
5. Interpret the measure of association
(Relative Risk)
As RR > 1, there is a positive association
between hypertension and occurrence of AMI.
Those with hypertension have 2.5 times
the risk of developing AMI as compared to
those with normal blood pressure
39. 39
6. Find out the 95% CI for RR
95% CI for RR
= Anti log of :logn RR ± 1.96 SE of logn RR
= 1.8 to 3.4
40. 40
7. Is the measure of association precise?
(Interpret 95% CI of RR)
Since the 95% CI (1.8 to 3.4) is not very wide it is
precise
The CI does not include 1 hence the result is
statistically significant.
The risk of developing AMI among those with
hypertension is likely to be between 1.8 and 3.4
times more when compared to those with
normal blood pressure (95 times out of 100
times)
41. 41
Calculation of 95 % CI for Relative Risk (RR)
= Anti log of :logn RR ± 1.96 SE of logn RR
$$\text{where SE of } \ln RR = \sqrt{\frac{1 - I_E}{a} + \frac{1 - I_{NE}}{c}}$$
$$= \sqrt{\frac{1 - 0.25}{50} + \frac{1 - 0.1}{75}} = \sqrt{\frac{0.75}{50} + \frac{0.9}{75}}$$
$$= \sqrt{0.015 + 0.012} = \sqrt{0.027} = 0.164$$
42. 42
Calculation of 95 % CI for Relative Risk
logn RR = logn (2.5) = 0.92
= Antilog of: 0.92 ± 1.96 (0.164)
= Antilog of: 0.92 ± 0.32
= Antilog of: 0.6 to 1.24
= 1.8 to 3.4
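The relative risk calculation can be sketched in Python in the same way. The function name `relative_risk_ci` is an illustrative assumption; a, b, c, d follow the cohort 2x2 table (a and c are the AMI counts among hypertensive and normotensive individuals). Unrounded arithmetic gives roughly 1.81 to 3.45.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Return (RR, SE of ln RR, lower, upper) for a cohort 2x2 table.

    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    rr = incidence_exposed / incidence_unexposed
    se_log_rr = math.sqrt((1 - incidence_exposed) / a + (1 - incidence_unexposed) / c)
    log_rr = math.log(rr)
    return rr, se_log_rr, math.exp(log_rr - z * se_log_rr), math.exp(log_rr + z * se_log_rr)

# Scenario III: hypertension and AMI
rr, se_log_rr, lower, upper = relative_risk_ci(50, 150, 75, 675)
print(f"RR {rr:.1f}, SE(ln RR) {se_log_rr:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```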
44. 44
9. Interpret p-value of RR
Since the p-value is < 0.001, the Relative Risk of 2.5
is statistically significant.
If there were truly no association, the probability of
observing a Relative Risk at least this far from 1 by
chance alone would be < 0.001
45. 45
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{\left((50 \times 675) - (150 \times 75)\right)^2 \times 950}{(50+150)(75+675)(50+75)(150+675)}$$
$$= \frac{(33750 - 11250)^2 \times 950}{200 \times 750 \times 125 \times 825}$$
$$= \frac{506250000 \times 950}{15468750000} = \frac{480937500000}{15468750000} = 31.09$$
46. 46
Chi-square
df = (r – 1) x (c – 1) = (2 – 1) x (2 – 1) = 1
The table value at the 0.1% level of significance is
10.83
The calculated $\chi^2_1$ = 31.09 (p < 0.001)
If there were truly no association, the probability of
observing a Relative Risk at least this far from 1 by
chance alone would be less than 0.001
48. 48
Scenario IV (Data not real, only for exercise)
A study was conducted to find out the effect
of iron-fortified salt on iron deficiency
anemia in children aged 5-15 years. 303 children
were randomly assigned to receive either
iron-fortified salt (n=152) or salt not fortified
with iron (n=151). The mean increase in Hb% at 5
months was 0.3 g/L (±0.10) in the group receiving
unfortified salt and 1.5 g/L (±0.25) in the group
receiving iron-fortified salt.
49. 49
1. What is the type of study design?
Randomized Controlled trial
50. 50
2. What is the result of the study?
The mean increase in Hb in the iron fortified group
is higher than that in the unfortified group
                              Iron-fortified salt   Unfortified salt
N                             152                   151
Mean increase in Hb% at 5m    1.5 g/L               0.3 g/L
S.D.                          0.25                  0.10
51. 51
3. Is the result statistically significant?
There are two groups of individuals
The observations are independent
The variable to be measured is quantitative
If the variable is normally distributed the
'Pooled t-test' or 'unpaired t-test' is used
52. 52
Results of the pooled t test
t = 54.5
df = 301
P < 0.001
Yes. The result is statistically significant
53. 53
4. What is your inference?
There is a statistically significant difference in
the mean change in Hb levels in those
receiving iron fortified salt in comparison to
those receiving unfortified salt
54. 54
Calculation of Test statistic
Null hypothesis (H0) :There is no difference in
mean change in Hb in those receiving iron
fortified salt and unfortified salt.
Alternate hypothesis (HA):There is difference in
mean change in Hb in those receiving iron
fortified salt and unfortified salt.
55. 55
Calculation of test statistic
$$t = \frac{\text{Observed difference}}{\text{Standard Error (SE)}}, \quad \text{where } SE = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}}$$
Where SDp is the Pooled Standard Deviation
$$SD_p^2 = \frac{(n_1 - 1) \times SD_1^2 + (n_2 - 1) \times SD_2^2}{n_1 + n_2 - 2}$$
$$= \frac{(152-1) \times 0.25^2 + (151-1) \times 0.1^2}{152 + 151 - 2} = \frac{(151 \times 0.0625) + (150 \times 0.010)}{301}$$
$$= \frac{9.44 + 1.5}{301} = \frac{10.94}{301} = 0.036$$
56. 56
Calculation of test statistic
$$\text{Standard Error} = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}} = \sqrt{\frac{0.036}{152} + \frac{0.036}{151}}$$
$$= \sqrt{0.00024 + 0.00024} = \sqrt{0.00048} = 0.022$$
t = (1.5 - 0.3)/0.022 = 1.2/0.022 = 54.5
Find out the degrees of freedom
= (n1+n2-2)=(151+152-2)=301
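The pooled t statistic can be reproduced from the summary statistics with the sketch below. The function name `pooled_t_test` is an assumption for illustration. Because it does not round the pooled variance or the standard error, it returns t of about 54.8 rather than the hand-rounded 54.5; the conclusion is unchanged.

```python
import math

def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled variance, SE, t, df) for an unpaired pooled-variance t-test."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)   # SE of the difference in means
    t = (mean1 - mean2) / se
    return pooled_var, se, t, n1 + n2 - 2

# Scenario IV: iron-fortified salt (group 1) vs unfortified salt (group 2)
pooled_var, se, t, df = pooled_t_test(1.5, 0.25, 152, 0.3, 0.10, 151)
print(f"Pooled variance {pooled_var:.3f}, SE {se:.3f}, t {t:.1f}, df {df}")
```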
58. 58
Calculation of test statistic
• Look at the table of t values at the desired level
of significance and for the given degrees of
freedom for a two-tailed test
• Compare it with the test statistic
• Here the t∞ at the 5% significance level is 1.96 and
the t∞ at the 0.1% significance level is 3.29
• As the calculated value of t is more than the
table value, we reject the Null hypothesis
59. 59
Scenario V (Data not real, only for exercise)
A study was carried out to assess the
performance of a commercial line probe
assay (LPA) for rapid detection of MDR-TB.
Smear-positive sputum specimens were
collected from 92 previously treated TB
patients and subjected to LPA.
60. 60
Scenario V (cont.)
Results were compared with MGIT-DST
(Gold standard) done on all 92 patients at the
same time. 13 patients were positive for
MDR-TB using MGIT-DST out of which 12
were also positive by the line probe assay
(LPA). 76 samples tested negative for MDR-
TB by both the tests
61. 61
1. What is the study design?
Study done at a cross section of time
(Cross sectional) for evaluating diagnostic test
62. 62
2. What is the validity of the test
Sensitivity
Specificity
65. 65
Sensitivity & Specificity
$$\text{Sensitivity} = \frac{a}{a + c} \times 100 = \frac{12}{13} \times 100 = 92.3\%$$
$$\text{Specificity} = \frac{d}{b + d} \times 100 = \frac{76}{79} \times 100 = 96.2\%$$
66. 66
3. Comment on the validity of the test
Sensitivity: If the test is done on MDR-TB
patients it will correctly identify 92.3% as having
MDR-TB
Specificity: If the test is done in individuals not
having MDR-TB it will correctly identify 96.2% as
not having MDR-TB
67. 67
4. What are the predictive values of the
test?
Positive Predictive Value
$$= \frac{a}{a + b} \times 100 = \frac{12}{15} \times 100 = 80\%$$
Negative Predictive Value
$$= \frac{d}{c + d} \times 100 = \frac{76}{77} \times 100 = 98.7\%$$
68. 68
5. Comment on the predictive values of
the test
PPV: If the test is positive for a patient, there is
80% probability that the patient has MDR-TB
NPV: If the test is negative for a patient, there is
98.7% probability that the patient does not have
MDR-TB
69. 69
6. What are the likelihood ratios of the
test
LH Ratio Positive
$$= \frac{a/(a+c)}{b/(b+d)} = \frac{12/13}{3/79} = 24.3$$
LH Ratio Negative
$$= \frac{c/(a+c)}{d/(b+d)} = \frac{1/13}{76/79} = 0.08$$
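All of the validity measures above come from the same four cells, which a short Python sketch makes explicit. The function name `diagnostic_metrics` and the dictionary keys are illustrative assumptions. The cell counts are derived from the scenario: 13 MDR-TB patients by MGIT-DST, of whom 12 were LPA positive (a = 12, c = 1), and 79 without MDR-TB, of whom 76 were LPA negative (d = 76, b = 3).

```python
def diagnostic_metrics(a, b, c, d):
    """Validity measures for a diagnostic test against a gold standard.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),                           # positive predictive value
        "npv": d / (c + d),                           # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

# Scenario V: line probe assay vs MGIT-DST
metrics = diagnostic_metrics(a=12, b=3, c=1, d=76)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```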
70. 70
7. Comment on the LH Ratios of the
test
LR +ve: A positive test is 24.3 times more likely
to be made when the patient has MDR-TB
compared to when the patient is not having
MDR-TB
71. 71
Comment on the LH Ratios of the test
LR -ve: A negative test is 0.08 times as likely to
be made when the patient has MDR-TB
as when the patient does not have
MDR-TB
74. 74
8. If the test is positive what is your
inference?
Since the positive predictive value of the test is
80%, when a positive result is obtained, the
probability that the patient has MDR-TB is 80% (
based on which a clinical decision has to be
made)
75. 75
9. If the test is negative what is your
inference?
Since the negative predictive value of the test is
98.7%, when a negative result is obtained, the
probability that the patient does not have MDR-TB
is 98.7% (based on which a clinical decision
has to be made).
76. 76
Scenario – VI (Data not real, only for exercise)
An epidemiologist wants to calculate sample
size for a study to find out the prevalence of
adolescent obesity in an urban slum
population, by simple random sampling
method.
77. 77
1. What are the Required Information /
Assumptions needed to calculate
the sample size?
Prevalence (Best assumption from other studies)
Level of significance (α)
Level of precision (d or l) (expected)
78. 78
2. What is the formula used for calculating
the sample size for this study?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, pq}{d^2}$$
p – Prevalence or Proportion (Best assumption)
q – (1-p)
d – Level of precision (Expected)
Z(1-α/2) – Standard Normal distribution value for ‘α’
79. 79
3. Why do you use this Formula?
Qualitative Data
Prevalence / Proportion has been provided
80. 80
Required Information
Earlier studies found a prevalence of about
40 per cent as adolescent obesity in urban
slum population.
The epidemiologist wants to have a precision
of 5 per cent and a level of significance of
0.05.
81. 81
4. What are the data and what are the
assumptions given for calculating the sample
size?
Prevalence (p) : 40%
1-p : 60%
Level of significance (α) : 0.05
Standard normal table value: 1.96
Level of precision (d) : 5%
82. 82
5. What is the Sample Size calculated?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, pq}{d^2} = \frac{1.96^2 \times 40 \times 60}{5^2} = \frac{3.84 \times 2400}{25} = 368.64 \approx 369$$
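The sample size for estimating a single proportion can be sketched in Python as follows. The function name `sample_size_proportion` is an assumption for illustration, and p and d are given in per cent as on the slide. Using 1.96² = 3.8416 without rounding gives 368.79 rather than 368.64; both round up to 369.

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * q / d^2, with p, q and d in per cent."""
    q = 100 - p
    return z ** 2 * p * q / d ** 2

# Scenario VI: assumed prevalence 40%, precision 5%, alpha = 0.05
n = sample_size_proportion(40, 5)
print(f"n = {n:.2f}, rounded up to {math.ceil(n)}")
```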
83. 83
6. If the epidemiologist used cluster
sampling method instead of simple
random sampling what will be the sample
size calculated?
Twice (usually) of the SRS
369 x 2 = 738
84. 84
7. How do you account for refusal and non
availability of the selected individuals?
Over sample / Additional samples
5% - 20%
Depends on the attrition/refusal/non
participation/non availability
86. 86
Scenario – VII (Data not real, only for exercise)
A neurologist wants to calculate sample size for
a study to find out the mean level of plasma
phenytoin in patients with seizure disorder
selected from tertiary care hospitals in a city by
simple random sampling method.
87. 87
1. What are the required Information and
assumptions for calculating the sample
size?
Mean (x)
Standard Deviation (σ)
Level of Significance (α)
Level of Precision (d or l) (Required)
88. 88
2. What is the Formula?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2}$$
σ – Standard Deviation (SD)
d – Level of precision
Z(1-α/2) – Standard Normal distribution value for ‘α’
89. 89
3. Why do you use this Formula?
Quantitative data
Mean and Standard Deviation
90. 90
Required Information
Earlier studies found a mean level of 15mcg/l of
plasma phenytoin and standard deviation of
5mcg/l among patients who have seizure
disorder.
The neurologist wants to have a precision of 1.0
mcg/l at 0.05 level of significance.
91. 91
4. What are the data and what are the
assumptions given for calculating sample
size?
Mean : 15
Standard Deviation (σ) : 5
Level of precision (d) : 1
Level of Significance (α) : 0.05
Standard normal table value : 1.96
92. 92
5. What is the Sample Size?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2} = \frac{1.96^2 \times 5^2}{1^2} = 96.04 \approx 97$$
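A minimal Python sketch of the sample size for estimating a mean is shown below. The function name `sample_size_mean` is an illustrative assumption. The result is rounded up because a fractional subject cannot be recruited.

```python
import math

def sample_size_mean(sd, d, z=1.96):
    """n = z^2 * sd^2 / d^2 for estimating a mean with precision d."""
    return z ** 2 * sd ** 2 / d ** 2

# Scenario VII: SD 5 mcg/l, precision 1 mcg/l, alpha = 0.05
n = sample_size_mean(5, 1)
print(f"n = {n:.2f}, rounded up to {math.ceil(n)}")
```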
93. 93
6. How will you account for refusal and
non availability of the selected individuals?
Over sampling / Additional samples
5% to 20%
Depends on the attrition/refusal/non
participation/non availability
95. 95
Scenario – VIII (Data not real, only for exercise)
A physiotherapist wants to calculate sample size for a
clinical trial on patients with knee osteoarthritis to
find out what will be the percentage of patients who
will have pain relief when subjected to
transcutaneous electrical stimulation (TENS)
compared to the percentage of patients who will have
pain relief on routine therapy.
96. 96
1. What are the information / assumptions
needed to calculate the sample size?
Percentage of patients with pain relief in treatment group
Percentage of patients with pain relief in control group
Level of Significance (α)
Power of the test (1-β)
97. 97
2. What is the Formula?
$$n = \frac{\left[ Z_{(1-\alpha/2)} \sqrt{2PQ} + Z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2}$$
P = (p1 + p2)/2 ; Q = (1 – P)
p1 – Proportion in group 1
p2 – Proportion in group 2
q1 – (1-p1)
q2 – (1-p2)
Z(1-α/2) – Standard Normal distribution value for ‘α’
Z(1-β) – Standard Normal distribution value for ‘β’
98. 98
3. Why do you use this Formula?
Qualitative Data
Two proportions are given
99. 99
Required Information
Earlier studies show that 65 per cent of
patients subjected to TENS and 25 per cent of
patients on routine therapy had pain relief.
The physiotherapist wants to have 90 per
cent power and 5 per cent level of
significance for the study.
100. 100
4. What are the data and what are the
assumptions given for calculating the sample
size?
Proportion 1 : 25%
Proportion 2 : 65%
Level of Significance (α) : 0.05
Standard normal table value: 1.96
Power of the test (1-β) : 90%
Standard normal table value : 1.28
101. 101
5. What is the Sample Size Calculated?
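$$n = \frac{\left[ Z_{(1-\alpha/2)} \sqrt{2PQ} + Z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2}$$
$$= \frac{\left[ 1.96 \sqrt{0.495} + 1.28 \sqrt{(0.25 \times 0.75) + (0.65 \times 0.35)} \right]^2}{(0.25 - 0.65)^2} = 30.3 \approx 31$$
31 subjects in each group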
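The two-proportion calculation has several intermediate terms, so a Python sketch may help in checking it. The function name `sample_size_two_proportions` is an assumption for illustration; proportions are entered as fractions, and the z values 1.96 and 1.28 are the ones quoted in the scenario.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.28):
    """Per-group n for comparing two proportions (p1, p2 given as fractions)."""
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + p2) / 2                      # P, the average proportion
    q_bar = 1 - p_bar                          # Q = 1 - P
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    return numerator / (p1 - p2) ** 2

# Scenario VIII: 25% relief on routine therapy vs 65% on TENS, 90% power
n = sample_size_two_proportions(0.25, 0.65)
print(f"n = {n:.2f} per group, rounded up to {math.ceil(n)}")
```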
102. 102
6. How will you account for refusal and
non availability of the selected
individuals?
Over Sampling / Additional Samples
5% to 20%
Depends on the attrition/refusal/non
participation/non availability
104. 104
Scenario – IX (Data not real, only for exercise)
A physician wants to calculate sample size for a
clinical trial to find out the mean reduction in
systolic blood pressure among hypertensive patients
subjected to an experimental drug, compared to
hypertensive patients on routine therapy.
105. 105
1. What are the required Information/
assumptions needed to calculate
the sample size?
Mean SBP in intervention group (x1)
Mean SBP in control group (x2)
SD for intervention group (σ1)
SD for control group (σ2)
Level of Significance (α)
Power of the test (1-β)
106. 106
2. What is the Formula?
$$n = \frac{2 \, s_p^2 \left[ Z_{(1-\alpha/2)} + Z_{(1-\beta)} \right]^2}{\mu_d^2}$$
Sp – Pooled Standard Deviation (SD)
μd – Mean Difference
Z(1-α/2) – Standard Normal distribution value for ‘α’
Z(1-β) – Standard Normal distribution value for ‘β’
107. 107
3. Why do you use this Formula?
Quantitative Data
Means and Standard Deviations
108. 108
Required Information
Earlier studies have shown a reduction in mean
systolic blood pressure from 180 to 120 mmHg.,
for the experimental drug and from 180 to 140
mmHg., for the routine therapy. The standard
deviation is 40 mmHg., for both the groups.
The physician wants to have 5 per cent of
statistical significance and 90 per cent power for
his study.
109. 109
4. What are the data and what are the
assumptions given for calculating the
sample size?
Mean Difference : 20
Standard Deviation 1 : 40
Standard Deviation 2 : 40
Level of Significance (α) : 0.05
Standard normal table value for α : 1.96
Power of the Test (1-β) : 90%
Standard normal table value for (1-β): 1.28
110. 110
5. What is the Sample Size?
$$n = \frac{2 \, s_p^2 \left[ Z_{(1-\alpha/2)} + Z_{(1-\beta)} \right]^2}{\mu_d^2} = \frac{2 \times 40^2 \times [1.96 + 1.28]^2}{20^2} = 83.98 \approx 84$$
84 subjects in each group
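The same check for two means can be sketched in Python. The function name `sample_size_two_means` is an illustrative assumption. The mean difference of 20 mmHg is the difference between the two reductions given in the scenario (60 mmHg on the experimental drug, 40 mmHg on routine therapy), and the pooled SD is 40 mmHg because both groups have the same SD.

```python
import math

def sample_size_two_means(pooled_sd, mean_diff, z_alpha=1.96, z_beta=1.28):
    """Per-group n = 2 * sp^2 * (z_alpha + z_beta)^2 / mean_diff^2."""
    return 2 * pooled_sd ** 2 * (z_alpha + z_beta) ** 2 / mean_diff ** 2

# Scenario IX: pooled SD 40 mmHg, mean difference 20 mmHg, 90% power
n = sample_size_two_means(40, 20)
print(f"n = {n:.2f} per group, rounded up to {math.ceil(n)}")
```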
111. 111
6. How will you account for refusal and
non availability of the selected
individuals?
Over Sampling / Additional Samples
Depends on the attrition/refusal/non
participation/non availability