Answers to Data Analysis and interpretation modified 2020 (2410) (1).ppt

1
DATA ANALYSIS & INTERPRETATION
HANDS-ON

2
Outline
• Scenarios I – V: Data analysis and interpretation in various study designs
• Scenarios VI – IX: Sample size calculation for qualitative and quantitative data
• Outputs from PEPI software

3
Scenario I (Data not real, only for exercise)
• In a study to find out the prevalence of hypertension, 1,000 adults were selected by simple random sampling from a population of 1,00,000 (100,000). All 1,000 adults were contacted once, their blood pressure was measured as per standard guidelines, and 55 were found to have hypertension.

4
1. What is the type of study design?
• Cross-sectional study
• Study carried out at one point in time

5
2. What is the prevalence of
hypertension in the sample selected?
$$\text{Prevalence} = \frac{55}{1000} \times 100 = 5.5\%$$

6
3. How precise is the estimate?
• Conventionally, precision is expressed as the 95% confidence interval (CI):
$$95\%\ \text{CI} = p \pm 1.96\,SE_p, \qquad SE_p = \sqrt{\frac{pq}{n}}, \quad q = 100 - p$$
where p is the proportion (in per cent) and SE_p is the standard error of the proportion.
• 95% CI = 4.1% to 6.9%
• Since the 95% CI is narrow (5.5% ± 1.4 percentage points), the estimate is precise.

7
4. How do you infer the prevalence of
hypertension for this population?
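The prevalence of hypertension in the population is likely to lie between 4.1% and 6.9% (95% confidence: if the sampling were repeated many times, about 95 of every 100 such intervals would contain the true prevalence).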

8
Confidence interval (CI) for prevalence
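$$95\%\ \text{CI} = p \pm 1.96\,SE_p, \qquad SE_p = \sqrt{\frac{pq}{n}}, \quad q = 100 - p$$
$$SE_p = \sqrt{\frac{5.5 \times 94.5}{1000}} = \sqrt{\frac{519.75}{1000}} = 0.72$$
$$95\%\ \text{CI} = 5.5 \pm (1.96 \times 0.72) = 5.5 \pm 1.4 = 4.1\% \text{ to } 6.9\%$$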
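As a cross-check, the prevalence and its 95% CI can be reproduced with a short Python sketch. It assumes the normal-approximation (Wald) interval used on the slide, with p and q expressed in per cent; the function name `prevalence_ci` is illustrative, not part of any package.

```python
import math


def prevalence_ci(cases: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Prevalence (%) with a normal-approximation (Wald) confidence interval."""
    p = 100 * cases / n          # prevalence in per cent
    q = 100 - p                  # q = 100 - p, as on the slide
    se = math.sqrt(p * q / n)    # standard error of the proportion, in per cent units
    return p, p - z * se, p + z * se


p, lower, upper = prevalence_ci(55, 1000)
print(f"Prevalence = {p:.1f}%, 95% CI = {lower:.1f}% to {upper:.1f}%")
```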

9
5. Suppose the mean systolic blood pressure of the sample was 120 mmHg and the standard deviation of the sample was 25 mmHg. How would you infer the mean systolic blood pressure of the population?

10
95% CI of the mean
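$$95\%\ \text{CI of mean} = \text{Mean} \pm 1.96\,SE_m \quad (SE_m = \text{standard error of mean})$$
$$SE_m = \frac{SD}{\sqrt{n}} = \frac{25}{\sqrt{1000}} = \frac{25}{31.6} = 0.79$$
$$95\%\ \text{CI of mean} = 120 \pm 1.96\,(0.79) = 120 \pm 1.55 = 118.45 \text{ to } 121.55 \text{ mmHg}$$

11
Inference
• The mean systolic blood pressure of the population is likely to lie between 118.45 and 121.55 mmHg (95% confidence).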
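The same arithmetic for the mean, as a minimal Python sketch (the function name `mean_ci` is hypothetical). It uses z = 1.96 as the slide does, which is reasonable for n = 1000; for small samples a t value would normally replace 1.96. Unrounded, the limits are about 118.45 and 121.55 mmHg.

```python
import math


def mean_ci(mean: float, sd: float, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Standard error of the mean and a normal-approximation CI for the mean."""
    sem = sd / math.sqrt(n)      # SEm = SD / sqrt(n)
    return sem, mean - z * sem, mean + z * sem


sem, lower, upper = mean_ci(120, 25, 1000)
print(f"SEm = {sem:.2f}, 95% CI = {lower:.2f} to {upper:.2f} mmHg")
```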

12
Scenario II (Data not real, only for exercise)
• A study was conducted to find out the association between the use of diuretic X and the occurrence of squamous cell carcinoma (SCC) of the skin.

13
Scenario II (cont.)
• 1129 patients with SCC of the skin and 4516 individuals without SCC were selected from a similar source population, and their use of diuretic X was ascertained. Of the 1129 patients with SCC, 154 were using diuretic X; among the 4516 individuals without SCC, 372 were using diuretic X.

14
1. What is the study design?
• Case-control study
• Selection of patients with SCC of the skin (cases)
• Selection of subjects without SCC of the skin (controls)
• Ascertainment of exposure (i.e. use of diuretic X) among cases and controls

15
2. Is there an association between use of diuretic X and SCC of the skin?
Cases Controls
Exposed (a) (b)
Not Exposed (c) (d)

16
Is there an association between usage of
diuretic X and SCC of skin?
Cases Controls
Exposed 154 (a) 372 (b)
Not Exposed (c) (d)
Total 1129 4516

17
Cases Controls
Exposed 154 (a) 372 (b)
Not Exposed 975 (c) 4144 (d)
Total 1129 4516

18
$$\text{Odds ratio} = \frac{\text{Odds of exposure among cases}}{\text{Odds of exposure among controls}} = \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{154 \times 4144}{372 \times 975} = \frac{638176}{362700} = 1.76$$
• Yes, there is an association between use of diuretic X and SCC of the skin (OR ≠ 1).

19
3. Interpret the strength and direction
of association
• As the odds ratio is > 1, the association is positive.
• Users of diuretic X have 1.76 times the odds of SCC of the skin compared with non-users; if SCC of the skin is rare in the source population, this approximates a 1.76-fold greater risk.

20
4. Is the measure of association precise?
• 95% CI of odds ratio = antilog of [ln OR ± 1.96 × SE(ln OR)]
• = 1.4 to 2.2
• Since the 95% CI is fairly narrow, the estimate is precise.

21
5. Interpret the 95% CI
The 95% CI does not include 1; hence the result (odds ratio 1.76) is statistically significant at the 5% level.
The odds ratio for SCC of the skin in those using diuretic X compared with those not using it is likely to lie between 1.4 and 2.2 (95% confidence).

22
Calculation of 95% CI for OR
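$$95\%\ \text{CI} = \text{antilog}\left[\ln OR \pm 1.96\,SE(\ln OR)\right] = \text{antilog}\left[\ln(1.76) \pm 1.96\,SE(\ln OR)\right]$$
$$\text{where } SE(\ln OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{154} + \frac{1}{372} + \frac{1}{975} + \frac{1}{4144}}$$
$$= \sqrt{0.0065 + 0.0027 + 0.0010 + 0.0002} = \sqrt{0.0104} = 0.102$$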

23
Calculation of 95% CI for OR
ln(1.76) = 0.57, SE(ln OR) = 0.102
$$95\%\ \text{CI} = \text{antilog}\left[0.57 \pm 1.96 \times 0.102\right] = \text{antilog}\left[0.57 \pm 0.20\right]$$
$$= \text{antilog}\,[0.37 \text{ to } 0.77] = 1.4 \text{ to } 2.2$$
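A minimal Python sketch of the odds ratio and its log-based (Woolf) 95% CI, using the cell labels a, b, c, d of the 2 × 2 table. The function name `odds_ratio_ci` is hypothetical. Without rounding ln OR and the SE at intermediate steps, the limits come out at about 1.44 and 2.15, which round to 1.4 to 2.2.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio ad/bc with a log-based (Woolf) confidence interval.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)  # antilog of the lower limit
    upper = math.exp(math.log(odds_ratio) + z * se_log)  # antilog of the upper limit
    return odds_ratio, se_log, lower, upper


odds_ratio, se_log, lower, upper = odds_ratio_ci(154, 372, 975, 4144)
print(f"OR = {odds_ratio:.2f}, SE(ln OR) = {se_log:.3f}, 95% CI = {lower:.2f} to {upper:.2f}")
```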

24
6. Find out the p-value
χ²₁ = 31.2 (p < 0.001)

25
7. Interpret p value and OR
• Since the p-value is < 0.001, the odds ratio of 1.76 is statistically significant.
• If there were truly no association between diuretic X and SCC of the skin (OR = 1), the probability of observing an association at least this strong by chance alone would be < 0.001.

26
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{\left[(154 \times 4144) - (372 \times 975)\right]^2 \times 5645}{(154+372)(975+4144)(154+975)(372+4144)}$$
$$= \frac{(638176 - 362700)^2 \times 5645}{526 \times 5119 \times 1129 \times 4516}$$
$$= \frac{75887026576 \times 5645}{13728362835016} = \frac{428382265021520}{13728362835016} = 31.2$$
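The chi-square statistic and its p-value can be sketched in Python without a statistics package. The p-value step relies on the fact that a chi-square variable with 1 df is the square of a standard normal variable, so P(χ²₁ > x) = erfc(√(x/2)). Both function names are illustrative, and, like the slide, the statistic has no continuity correction.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value for chi-square with 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


chi2 = chi_square_2x2(154, 372, 975, 4144)
print(f"chi2 = {chi2:.1f}, p = {p_value_df1(chi2):.1e}")
print(f"p at the table value 10.83: {p_value_df1(10.83):.4f}")
```

The same function gives about 31.09 for the Scenario III cohort table (50, 150, 75, 675), and `p_value_df1(10.83)` returns about 0.001, which is why 10.83 is the 0.1% table value.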

27
Chi-square & p-value
• df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
• The table value at the 0.1% level of significance is 10.83. As χ²₁ = 31.2 exceeds it, p < 0.001.

29
Scenario III (Data not real, only for exercise)
• A study was initiated to find out the association between hypertension and acute myocardial infarction (AMI). 950 adults without AMI were enrolled; of these, 200 had hypertension and 750 had normal blood pressure.

30
Scenario III (cont.)
• They were followed for 20 years and the occurrence of AMI during the period was recorded. 50 of the 200 individuals with hypertension developed AMI, whereas 75 of the 750 individuals with normal blood pressure developed AMI.

31
1. What is the study design?
• Cohort study
• Starts with exposure (hypertension)
• Ends with ascertainment of disease (AMI)

32
2. What is the incidence of AMI among
individuals with hypertension?
$$\text{Incidence} = \frac{50}{200} \times 100 = 25\%$$

33
3. What is the incidence of AMI among
those without hypertension?
$$\text{Incidence} = \frac{75}{750} \times 100 = 10\%$$

34
4. Is there an association between
Acute myocardial infarction and
hypertension?
AMI No AMI Total
Hypertensive (a) (b) 200
Normotensive (c) (d) 750
Total 950

35
Is there an association between Acute
myocardial infarction and hypertension?
AMI No AMI Total
Hypertensive 50 (a) 150 (b) 200
Normotensive (c) (d) 750
Total 950

36
Is there an association between Acute
myocardial infarction and hypertension?
AMI No AMI Total
Hypertensive 50 (a) 150 (b) 200
Normotensive 75 (c) 675 (d) 750
Total 125 825 950

37
Is there an association between AMI and
hypertension?
$$\text{Relative risk} = \frac{I_{\text{exposed}}}{I_{\text{unexposed}}} = \frac{50/200}{75/750} = \frac{0.25}{0.1} = 2.5$$
• Yes. There is an association between AMI and hypertension (RR ≠ 1).

38
5. Interpret the measure of association
(Relative Risk)
• As RR > 1, there is a positive association between hypertension and the occurrence of AMI.
• Those with hypertension have 2.5 times the risk of developing AMI compared with those with normal blood pressure.

39
6. Find out the 95% CI for RR
95% CI for RR = antilog of [ln RR ± 1.96 × SE(ln RR)]
= 1.8 to 3.4

40
7. Is the measure of association precise?
(Interpret 95% CI of RR)
• Since the 95% CI (1.8 to 3.4) is fairly narrow, the estimate is precise.
• The CI does not include 1; hence the result is statistically significant.
• The risk of developing AMI among those with hypertension is likely to be between 1.8 and 3.4 times that among those with normal blood pressure (95% confidence).

41
Calculation of 95 % CI for Relative Risk (RR)
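$$95\%\ \text{CI} = \text{antilog}\left[\ln RR \pm 1.96\,SE(\ln RR)\right]$$
$$SE(\ln RR) = \sqrt{\frac{1 - I_E}{a} + \frac{1 - I_{NE}}{c}} = \sqrt{\frac{1 - 0.25}{50} + \frac{1 - 0.1}{75}} = \sqrt{\frac{0.75}{50} + \frac{0.9}{75}}$$
$$= \sqrt{0.015 + 0.012} = \sqrt{0.027} = 0.164$$
where I_E and I_NE are the incidences (as proportions) among the exposed and the unexposed.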

42
Calculation of 95 % CI for Relative Risk
ln RR = ln(2.5) = 0.916
$$95\%\ \text{CI} = \text{antilog}\left[0.916 \pm 1.96\,(0.164)\right] = \text{antilog}\left[0.916 \pm 0.321\right]$$
$$= \text{antilog}\,[0.595 \text{ to } 1.237] = 1.8 \text{ to } 3.4$$
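A minimal Python sketch of the relative risk with the log-based CI, using the same SE formula as the slide. The function name `relative_risk_ci` is hypothetical. Without intermediate rounding the limits are about 1.81 and 3.45 (the upper limit is 3.4499), i.e. 1.8 to 3.4 to one decimal place.

```python
import math


def relative_risk_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """RR = I_E / I_NE with a log-based CI, SE(ln RR) = sqrt((1 - I_E)/a + (1 - I_NE)/c).

    a, b = exposed with / without disease; c, d = unexposed with / without disease.
    """
    i_e = a / (a + b)            # incidence among the exposed
    i_ne = c / (c + d)           # incidence among the unexposed
    rr = i_e / i_ne
    se_log = math.sqrt((1 - i_e) / a + (1 - i_ne) / c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, se_log, lower, upper


rr, se_log, lower, upper = relative_risk_ci(50, 150, 75, 675)
print(f"RR = {rr:.2f}, SE(ln RR) = {se_log:.3f}, 95% CI = {lower:.2f} to {upper:.2f}")
```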

43
8. Find out p-value
• χ²₁ = 31.09 (p < 0.001)

44
9. Interpret p-value of RR
• Since the p-value is < 0.001, the relative risk of 2.5 is statistically significant.
• If hypertension had no effect on AMI (RR = 1), the probability of observing a relative risk at least as far from 1 as 2.5 by chance alone would be < 0.001.

45
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{\left[(50 \times 675) - (150 \times 75)\right]^2 \times 950}{(50+150)(75+675)(50+75)(150+675)}$$
$$= \frac{(33750 - 11250)^2 \times 950}{200 \times 750 \times 125 \times 825}$$
$$= \frac{506250000 \times 950}{15468750000} = \frac{480937500000}{15468750000} = 31.09$$

46
Chi-square
• df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
• The table value at the 0.1% level of significance is 10.83.
• χ²₁ = 31.09 exceeds it, so p < 0.001.
• If there were no true association (RR = 1), the probability of observing a relative risk as extreme as 2.5 by chance would be less than 0.001.

48
Scenario IV (Data not real, only for exercise)
• A study was conducted to find out the effect of iron-fortified salt on iron-deficiency anemia in children aged 5–15 years. 303 children were randomly assigned to receive either iron-fortified salt (n = 152) or salt not fortified with iron (n = 151). The mean (± SD) increase in Hb at 5 months was 0.3 ± 0.10 g/L in the group receiving unfortified salt and 1.5 ± 0.25 g/L in the group receiving iron-fortified salt.

49
1. What is the study design?
• Randomized controlled trial

50
2. What is the result of the study?
The mean increase in Hb in the iron-fortified group is higher than that in the unfortified group.
                                  Iron-fortified salt   Unfortified salt
N                                 152                   151
Mean increase in Hb at 5 months   1.5 g/L               0.3 g/L
SD                                0.25                  0.10

51
3. Is the result statistically significant?
• There are two groups of individuals.
• The observations are independent.
• The variable measured is quantitative.
• If the data are normally distributed, the pooled t-test (unpaired t-test) is used.

52
Results of the pooled t test
• t = 54.5
• df = 301
• p < 0.001
Yes. The result is statistically significant.

53
4. What is your inference?
There is a statistically significant difference in the mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt: the mean increase is 1.2 g/L greater with iron-fortified salt.

54
Calculation of Test statistic
• Null hypothesis (H0): there is no difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt.
• Alternative hypothesis (HA): there is a difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt.

55
Calculation of test statistic
$$t = \frac{\text{Observed difference}}{\text{Standard error (SE)}}, \qquad SE = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}}$$
where SD_p is the pooled standard deviation:
$$SD_p^2 = \frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}$$
$$= \frac{(152 - 1) \times 0.25^2 + (151 - 1) \times 0.1^2}{152 + 151 - 2} = \frac{(151 \times 0.0625) + (150 \times 0.010)}{301}$$
$$= \frac{9.44 + 1.5}{301} = \frac{10.94}{301} = 0.036$$

56
$$SE = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}} = \sqrt{\frac{0.036}{152} + \frac{0.036}{151}} = \sqrt{0.00024 + 0.00024} = \sqrt{0.00048} = 0.022$$
$$t = \frac{1.5 - 0.3}{0.022} = \frac{1.2}{0.022} = 54.5$$
Degrees of freedom = (n1 + n2 − 2) = (152 + 151 − 2) = 301
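A minimal Python sketch of the pooled t statistic (the function name `pooled_t` is hypothetical), with group 1 = iron-fortified salt and group 2 = unfortified salt. Without rounding SD_p² and SE at intermediate steps, t is about 54.8 rather than 54.5; the conclusion is unchanged. An exact p-value would need a t-distribution routine (for example from SciPy), which this sketch leaves out and replaces with the large-df critical value quoted below.

```python
import math

T_CRIT_0_1_PERCENT = 3.29  # two-tailed critical t for large df (t-infinity) at the 0.1% level


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> tuple[float, float, float, int]:
    """Pooled (unpaired) t statistic for two independent groups."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df   # pooled variance SDp^2
    se = math.sqrt(sp2 / n1 + sp2 / n2)                      # SE of the difference in means
    t = (mean1 - mean2) / se
    return sp2, se, t, df


sp2, se, t, df = pooled_t(1.5, 0.25, 152, 0.3, 0.10, 151)
print(f"SDp^2 = {sp2:.4f}, SE = {se:.4f}, t = {t:.1f}, df = {df}")
print("Reject H0 at the 0.1% level" if abs(t) > T_CRIT_0_1_PERCENT else "Do not reject H0")
```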

58
• Look up the table of t values at the desired level of significance and for the given degrees of freedom (two-tailed test).
• Compare the table value with the test statistic.
• For large df (t∞), the critical value is 1.96 at the 5% significance level and 3.29 at the 0.1% significance level.
• As the calculated t (54.5) exceeds the table value, we reject the null hypothesis (p < 0.001).

59
Scenario V (Data not real, only for exercise)
• A study was carried out to assess the performance of a commercial line probe assay (LPA) for rapid detection of MDR-TB. Smear-positive sputum specimens were collected from 92 previously treated TB patients and tested by LPA.

60
Scenario V (cont.)
• Results were compared with MGIT-DST (gold standard), done on all 92 patients at the same time. 13 patients were positive for MDR-TB by MGIT-DST, of whom 12 were also positive by LPA. 76 samples tested negative for MDR-TB by both tests.

61
1. What is the study design?
• Cross-sectional study (done at one point in time) for evaluating a diagnostic test

62
2. What is the validity of the test?
• Sensitivity
• Specificity

63
            MGIT-DST positive   MGIT-DST negative   Total
LPA +ve     12 (a)              (b)
LPA -ve     (c)                 76 (d)
Total       13                                      92
2 × 2 table

64
            MGIT-DST positive   MGIT-DST negative   Total
LPA +ve     12 (a)              3 (b)               15
LPA -ve     1 (c)               76 (d)              77
Total       13                  79                  92
2 × 2 table

65
Sensitivity & Specificity
$$\text{Sensitivity} = \frac{a}{a + c} \times 100 = \frac{12}{13} \times 100 = 92.3\%$$
$$\text{Specificity} = \frac{d}{b + d} \times 100 = \frac{76}{79} \times 100 = 96.2\%$$

66
3. Comment on the validity of the test
• Sensitivity: if the test is done on patients with MDR-TB, it will correctly identify 92.3% of them as having MDR-TB.
• Specificity: if the test is done on individuals without MDR-TB, it will correctly identify 96.2% of them as not having MDR-TB.

67
4. What are the predictive values of the test?
$$\text{Positive predictive value} = \frac{a}{a + b} \times 100 = \frac{12}{15} \times 100 = 80\%$$
$$\text{Negative predictive value} = \frac{d}{c + d} \times 100 = \frac{76}{77} \times 100 = 98.7\%$$

68
5. Comment on the predictive values of
the test
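• PPV: if the test is positive, there is an 80% probability that the patient has MDR-TB.
• NPV: if the test is negative, there is a 98.7% probability that the patient does not have MDR-TB.
• Both values apply at the MDR-TB prevalence seen in this sample (13/92, about 14%); predictive values change with prevalence.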

69
6. What are the likelihood ratios of the test?
$$\text{LH ratio positive} = \frac{a/(a+c)}{b/(b+d)} = \frac{12/13}{3/79} = 24.3$$
$$\text{LH ratio negative} = \frac{c/(a+c)}{d/(b+d)} = \frac{1/13}{76/79} = 0.08$$
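All six validity measures can be computed from the four cells at once. This is a minimal Python sketch; the function name `diagnostic_metrics` and the dictionary keys are illustrative only.

```python
def diagnostic_metrics(a: int, b: int, c: int, d: int) -> dict[str, float]:
    """Validity measures for a 2x2 diagnostic table.

    a = test +ve / disease +ve, b = test +ve / disease -ve,
    c = test -ve / disease +ve, d = test -ve / disease -ve.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),   # (a/(a+c)) / (b/(b+d))
        "lr_negative": (1 - sensitivity) / specificity,   # (c/(a+c)) / (d/(b+d))
    }


metrics = diagnostic_metrics(12, 3, 1, 76)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```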

70
7. Comment on the LH ratios of the test
• LR +ve: a positive test result is 24.3 times as likely in a patient with MDR-TB as in a patient without MDR-TB.

71
Comment on the LH ratios of the test
• LR −ve: a negative test result is 0.08 times as likely (i.e. about 12.5 times less likely) in a patient with MDR-TB as in a patient without MDR-TB.

72
Summary of findings
Sensitivity 92.3 %
Specificity 96.2 %
PPV 80 %
NPV 98.7 %
LHR Pos 24.3
LHR Neg 0.08

74
8. If the test is positive, what is your inference?
• Since the positive predictive value of the test is 80%, when a positive result is obtained the probability that the patient has MDR-TB is 80%, and the clinical decision has to be based on this.

75
9. If the test is negative, what is your inference?
• Since the negative predictive value of the test is 98.7%, when a negative result is obtained the probability that the patient does not have MDR-TB is 98.7%, and the clinical decision has to be based on this.

76
Scenario – VI (Data not real, only for exercise)
• An epidemiologist wants to calculate the sample size for a study to find out the prevalence of adolescent obesity in an urban slum population, using simple random sampling.

77
1. What information and assumptions are needed to calculate the sample size?
• Prevalence (best assumption from other studies)
• Level of significance (α)
• Expected level of precision (d or l)

78
2. What is the formula used for calculating the sample size for this study?
$$n = \frac{Z_{1-\alpha/2}^2 \, p \, q}{d^2}$$
p – prevalence or proportion (best assumption)
q – (1 − p)
d – level of precision (expected)
Z(1−α/2) – standard normal distribution value for the chosen α (1.96 for α = 0.05, two-sided)

79
3. Why do you use this Formula?
• Qualitative data
• Prevalence / proportion has been provided

80
Required Information
• Earlier studies found a prevalence of adolescent obesity of about 40 per cent in urban slum populations.
• The epidemiologist wants a precision of 5 per cent (absolute, i.e. ±5 percentage points) and a level of significance of 0.05.

81
4. What are the data and what are the
assumptions given for calculating the sample
size?
• Prevalence (p): 40%
• q = 1 − p: 60%
• Level of significance (α): 0.05
• Standard normal table value for α = 0.05: 1.96
• Level of precision (d): 5%

82
5. What is the Sample Size calculated?
$$n = \frac{Z_{1-\alpha/2}^2 \, p \, q}{d^2} = \frac{1.96^2 \times 40 \times 60}{5^2} = \frac{3.84 \times 2400}{25} = 368.64$$
n ≈ 369
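A minimal Python sketch of the single-proportion formula, with p, q and d in per cent and the result rounded up to a whole subject. The function name is hypothetical, and the design effect of 2 is an assumption taken from the "usually twice" rule of thumb for cluster sampling. With 1.96² = 3.8416 instead of 3.84 the raw value is 368.79 rather than 368.64, which still rounds up to 369.

```python
import math


def n_single_proportion(p: float, d: float, z: float = 1.96) -> int:
    """n = z^2 * p * q / d^2 with p, q and d in per cent, rounded up to a whole subject."""
    q = 100 - p
    return math.ceil(z ** 2 * p * q / d ** 2)


n_srs = n_single_proportion(p=40, d=5)
design_effect = 2   # assumption: typical design effect for cluster sampling
print(f"SRS: {n_srs}, cluster sampling: {n_srs * design_effect}")
```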

83
6. If the epidemiologist used cluster sampling instead of simple random sampling, what would the sample size be?
• Usually twice the SRS sample size (a design effect of about 2)
• 369 × 2 = 738

84
7. How do you account for refusal and non-availability of the selected individuals?
• Oversample (take additional samples)
• Usually 5% to 20%, depending on the expected attrition, refusal, non-participation or non-availability

86
Scenario – VII (Data not real, only for exercise)
• A neurologist wants to calculate the sample size for a study to find out the mean plasma phenytoin level in patients with seizure disorder selected from tertiary care hospitals in a city by simple random sampling.

87
1. What information and assumptions are required for calculating the sample size?
• Mean (x̄)
• Standard deviation (σ)
• Level of significance (α)
• Required level of precision (d or l)

88
2. What is the Formula?
$$n = \frac{Z_{1-\alpha/2}^2 \, \sigma^2}{d^2}$$
σ – standard deviation (SD)
d – level of precision
Z(1−α/2) – standard normal distribution value for the chosen α (1.96 for α = 0.05, two-sided)

89
3. Why do you use this formula?
• Quantitative data
• Mean and standard deviation are provided

90
Required Information
• Earlier studies found a mean plasma phenytoin level of 15 mcg/l, with a standard deviation of 5 mcg/l, among patients with seizure disorder.
• The neurologist wants a precision of 1.0 mcg/l at the 0.05 level of significance.

91
4. What are the data and what are the assumptions given for calculating the sample size?
• Mean: 15 mcg/l
• Standard deviation (σ): 5 mcg/l
• Level of precision (d): 1 mcg/l
• Level of significance (α): 0.05
• Standard normal table value: 1.96

92
5. What is the Sample Size?
$$n = \frac{Z_{1-\alpha/2}^2 \, \sigma^2}{d^2} = \frac{1.96^2 \times 5^2}{1^2} = 96.04$$
n ≈ 97 (a fractional sample size is always rounded up)
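The single-mean formula as a minimal Python sketch (hypothetical function name). Rounding up with `math.ceil` turns 96.04 into 97, matching the slide.

```python
import math


def n_single_mean(sd: float, d: float, z: float = 1.96) -> int:
    """n = z^2 * sigma^2 / d^2, rounded up to a whole subject."""
    return math.ceil(z ** 2 * sd ** 2 / d ** 2)


print(n_single_mean(sd=5, d=1))
```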

93
6. How will you account for refusal and
non availability of the selected individuals?
• Oversampling / additional samples
• 5% to 20%

95
Scenario – VIII (Data not real, only for exercise)
• A physiotherapist wants to calculate the sample size for a clinical trial in patients with knee osteoarthritis, to compare the percentage of patients who get pain relief with transcutaneous electrical nerve stimulation (TENS) with the percentage who get pain relief on routine therapy.

96
1. What information and assumptions are needed to calculate the sample size?
• Percentage of patients with pain relief in the treatment group
• Percentage of patients with pain relief in the control group
• Level of significance (α)
• Power of the test (1 − β)

97
2. What is the formula?
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2PQ} + Z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{(p_1 - p_2)^2}$$
P = (p1 + p2)/2; Q = (1 − P)
p1 – proportion in group 1
p2 – proportion in group 2
q1 – (1 − p1)
q2 – (1 − p2)
Z(1−α/2) – standard normal distribution value for α
Z(1−β) – standard normal distribution value for β

98
3. Why do you use this formula?
• Qualitative data
• Two proportions are given

99
Required Information
• Earlier studies show that 65 per cent of patients treated with TENS and 25 per cent of patients on routine therapy had pain relief.
• The physiotherapist wants 90 per cent power and a 5 per cent level of significance for the study.

100
4. What are the data and what are the assumptions given for calculating the sample size?
• Proportion 1 (routine therapy): 25%
• Proportion 2 (TENS): 65%
• Level of significance (α): 0.05; standard normal table value: 1.96
• Power of the test (1 − β): 90%; standard normal table value: 1.28

101
5. What is the sample size calculated?
$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2PQ} + Z_{1-\beta}\sqrt{p_1 q_1 + p_2 q_2}\right]^2}{(p_1 - p_2)^2}$$
$$n = \frac{\left[1.96\sqrt{0.495} + 1.28\sqrt{(0.25 \times 0.75) + (0.65 \times 0.35)}\right]^2}{(0.65 - 0.25)^2} = 30.3 \approx 31$$
31 subjects in each group
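A minimal Python sketch of the two-proportion formula, with proportions as fractions and P as the average of the two proportions. The function name and its default z values (1.96 for α = 0.05, 1.28 for 90% power) are illustrative choices that mirror the slide.

```python
import math


def n_two_proportions(p1: float, p2: float,
                      z_alpha: float = 1.96, z_beta: float = 1.28) -> tuple[float, int]:
    """Per-group n for comparing two proportions; returns (raw n, n rounded up)."""
    p_bar = (p1 + p2) / 2        # P = (p1 + p2) / 2
    q_bar = 1 - p_bar            # Q = 1 - P
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    n = numerator / (p1 - p2) ** 2
    return n, math.ceil(n)


raw, per_group = n_two_proportions(0.25, 0.65)
print(f"n = {raw:.2f} -> {per_group} subjects in each group")
```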

102
6. How will you account for refusal and non-availability of the selected individuals?
• Oversampling / additional samples
• 5% to 20%

104
Scenario – IX (Data not real, only for exercise)
• A physician wants to calculate the sample size for a clinical trial to compare the mean reduction in systolic blood pressure (SBP) among hypertensive patients given an experimental drug with that among hypertensive patients on routine therapy.

105
1. What information and assumptions are needed to calculate the sample size?
• Mean reduction in SBP in the intervention group (x̄1)
• Mean reduction in SBP in the control group (x̄2)
• SD for the intervention group (σ1)
• SD for the control group (σ2)
• Level of significance (α)
• Power of the test (1 − β)

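106
2. What is the formula?
$$n = \frac{2\, s_p^2 \left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{\mu_d^2}$$
Sp – pooled standard deviation (SD)
μd – mean difference
Z(1−α/2) – standard normal distribution value for α
Z(1−β) – standard normal distribution value for β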

107
3. Why do you use this formula?
• Quantitative data
• Means and standard deviations are given

108
Required Information
• Earlier studies have shown a reduction in mean systolic blood pressure from 180 to 120 mmHg with the experimental drug and from 180 to 140 mmHg with routine therapy. The standard deviation is 40 mmHg in both groups.
• The physician wants a 5 per cent level of significance and 90 per cent power for the study.

109
4. What are the data and what are the assumptions given for calculating the sample size?
• Mean difference (μd): 20 mmHg (mean reduction of 60 vs 40 mmHg)
• Standard deviation 1: 40 mmHg
• Standard deviation 2: 40 mmHg
• Level of significance (α): 0.05; standard normal table value: 1.96
• Power of the test (1 − β): 90%; standard normal table value: 1.28

110
5. What is the sample size?
$$n = \frac{2\, s_p^2 \left(Z_{1-\alpha/2} + Z_{1-\beta}\right)^2}{\mu_d^2} = \frac{2 \times 40^2 \times (1.96 + 1.28)^2}{20^2} = 83.98 \approx 84$$
84 subjects in each group
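A minimal Python sketch of the two-means formula, assuming the common SD of 40 mmHg serves as the pooled SD. The function name and default z values (1.96 and 1.28, as on the slide) are illustrative.

```python
import math


def n_two_means(sd_pooled: float, mean_diff: float,
                z_alpha: float = 1.96, z_beta: float = 1.28) -> tuple[float, int]:
    """Per-group n = 2 * sp^2 * (z_alpha + z_beta)^2 / mu_d^2; returns (raw n, n rounded up)."""
    n = 2 * sd_pooled ** 2 * (z_alpha + z_beta) ** 2 / mean_diff ** 2
    return n, math.ceil(n)


raw, per_group = n_two_means(sd_pooled=40, mean_diff=20)
print(f"n = {raw:.2f} -> {per_group} subjects in each group")
```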

111
6. How will you account for refusal and non-availability of the selected individuals?
• Oversampling / additional samples

113
Summary
• Data analysis (prevalence, OR, RR, mean, SD, Sn, Sp, PPV, NPV, 95% CI)
• Application of appropriate statistical tests (t-test, χ² test)
• Interpretation of the results (OR > 1, RR > 1, 95% CI)
• Sample size assumptions (α, 1 − β, precision (d), one-tailed, two-tailed)
• Basic data required for sample size calculation
• Sample size calculation in different scenarios

More from Vanithadurai
