1
DATA ANALYSIS & INTERPRETATION
HANDS-ON
2
Outline
• Scenarios 1 – 5: Data analysis and interpretation in various study designs
• Scenarios 6 – 9: Sample size calculation for qualitative and quantitative data
• Outputs from PEPI software
3
Scenario I (Data not real, only for exercise)
• In a study to find out the prevalence of hypertension, 1,000 adults were selected by simple random sampling from a population of 100,000. All 1,000 adults were contacted once, their blood pressure was measured as per the standard guidelines, and 55 were found to have hypertension.
4
1. What is the type of study design?
• Cross-sectional study
• Study carried out at one point in time
5
2. What is the prevalence of hypertension in the sample selected?
$$\text{Prevalence} = \frac{55}{1000} \times 100 = 5.5\%$$
6
3. How precise is the estimate?
• Conventionally we calculate the 95% confidence interval (CI)
• 95% CI = p ± 1.96 × SEp, where p = proportion (in per cent) and SEp is the standard error of the proportion:
$$SE_p = \sqrt{\frac{pq}{n}}, \quad q = 100 - p$$
• 95% CI = 4.1% to 6.9%
• Since the 95% CI is not very wide, the estimate is precise
7
4. How do you infer the prevalence of hypertension for this population?
We can be 95% confident that the prevalence of hypertension in the population lies between 4.1% and 6.9%
8
Confidence interval (CI) for prevalence
• 95% CI = p ± 1.96 × SEp
• where $SE_p = \sqrt{pq/n}$ and q = 100 − p
$$SE_p = \sqrt{\frac{5.5 \times 94.5}{1000}} = \sqrt{\frac{519.75}{1000}} = 0.72$$
$$95\%\ \text{CI} = 5.5 \pm (1.96 \times 0.72) = 5.5 \pm 1.4 = 4.1\%\ \text{to}\ 6.9\%$$
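The calculation above can be checked with a short Python sketch. It assumes the normal approximation used on this slide, with p and q expressed in per cent; the function name `proportion_ci` is illustrative and not part of the original exercise.

```python
import math

def proportion_ci(events, n, z=1.96):
    """Return (prevalence %, lower %, upper %) using the normal approximation."""
    p = 100.0 * events / n        # prevalence in per cent
    q = 100.0 - p
    se = math.sqrt(p * q / n)     # standard error of the proportion, in per cent
    return p, p - z * se, p + z * se

p, lower, upper = proportion_ci(55, 1000)
print(f"Prevalence {p:.1f}% (95% CI {lower:.1f}% to {upper:.1f}%)")
```

For very small samples or proportions close to 0% or 100% the normal approximation becomes unreliable, so this sketch is only meant for situations like Scenario I.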
9
5. Supposing the mean systolic blood
pressure of the sample selected was 120 mm
Hg and the standard deviation of the sample
was 25 mm Hg, how will you infer the mean
systolic pressure of the population?
10
95% CI of the mean
95% CI of mean = Mean ± 1.96 × SEM (standard error of the mean)
$$SEM = \frac{SD}{\sqrt{n}} = \frac{25}{\sqrt{1000}} = \frac{25}{31.6} = 0.79$$
$$95\%\ \text{CI of mean} = 120 \pm 1.96 \times 0.79 = 120 \pm 1.6 = 118.4\ \text{to}\ 121.6\ \text{mmHg}$$
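As a cross-check, the same interval can be sketched in Python. The function name `mean_ci` is an assumption made for illustration. Without intermediate rounding the margin is about 1.55 mmHg, so the limits come out near 118.45 and 121.55; the slide rounds the margin up to 1.6 and reports 118.4 to 121.6.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Return (SEM, lower, upper) for a normal-approximation CI of a mean."""
    sem = sd / math.sqrt(n)       # standard error of the mean
    return sem, mean - z * sem, mean + z * sem

sem, lower, upper = mean_ci(120, 25, 1000)
print(f"SEM = {sem:.2f}, 95% CI = {lower:.2f} to {upper:.2f} mmHg")
```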
11
Inference
• We can be 95% confident that the mean systolic pressure of the population lies between 118.4 and 121.6 mmHg
12
Scenario-II (Data not real, only for exercise)
• A study was conducted to find out the
association between usage of the diuretic X
and the occurrence of squamous cell
carcinoma (SCC) of skin.
13
Scenario II (cont.)
• 1129 patients with SCC of skin and 4516 individuals
without SCC were selected from a similar source
population and the usage of the diuretic X was
ascertained. Of the 1129 patients suffering from SCC,
154 were using the diuretic X and among those
without SCC 372 were using the diuretic X.
14
1. What is the type of study design?
• Case-control study
• Selection of patients with SCC of skin (cases)
• Selection of subjects without SCC of skin (controls)
• Ascertainment of exposure (i.e. usage of diuretic X) among cases and controls
15
2. Is there an association between usage of diuretic X and SCC of skin?
              Cases       Controls
Exposed       (a)         (b)
Not Exposed   (c)         (d)
16
Is there an association between usage of diuretic X and SCC of skin?
              Cases       Controls
Exposed       154 (a)     372 (b)
Not Exposed   (c)         (d)
Total         1129        4516
17
Is there an association between usage of diuretic X and SCC of skin?
              Cases       Controls
Exposed       154 (a)     372 (b)
Not Exposed   975 (c)     4144 (d)
Total         1129        4516
18
Is there an association between usage of diuretic X and SCC of skin?
$$\text{Odds ratio} = \frac{\text{Odds of exposure among cases}}{\text{Odds of exposure among controls}} = \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{154 \times 4144}{372 \times 975} = \frac{638176}{362700} = 1.76$$
• Yes, there is an association between usage of diuretic X and SCC of skin
19
3. Interpret the strength and direction of association
• As the odds ratio is > 1, the association is positive
• Those using diuretic X have 1.76 times the odds of developing SCC compared with those not using diuretic X
20
4. Is the measure of association precise?
• 95% CI of odds ratio = antilog of [ln(OR) ± 1.96 × SE of ln(OR)]
• = 1.4 to 2.2
• Since the 95% CI is not very wide, the result is precise
21
5. Interpret the 95% CI
• The 95% CI does not include 1, hence the result (odds ratio 1.76) is statistically significant
• We can be 95% confident that the odds ratio for SCC of skin in those using diuretic X, compared with those not using diuretic X, lies between 1.4 and 2.2
22
Calculation of 95% CI for OR
= antilog of [ln(OR) ± 1.96 × SE of ln(OR)]
= antilog of [ln(1.76) ± 1.96 × SE of ln(OR)]
$$\text{where SE of } \ln(OR) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{154} + \frac{1}{372} + \frac{1}{975} + \frac{1}{4144}}$$
$$= \sqrt{0.0065 + 0.0027 + 0.0010 + 0.0002} \approx \sqrt{0.01045} = 0.102$$
23
Calculation of 95% CI for OR
ln(1.76) = 0.57, SE of ln(OR) = 0.102
95% CI = antilog of (0.57 ± 1.96 × 0.102)
= antilog of (0.57 ± 0.20)
= antilog of (0.37 to 0.77)
= 1.4 to 2.2
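The odds ratio and its log-based confidence interval can be reproduced with the sketch below. It assumes the standard 2 x 2 layout of this scenario (a, b = exposed cases and controls; c, d = unexposed cases and controls); the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a CI computed on the natural-log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(OR)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)               # antilog of lower limit
    upper = math.exp(log_or + z * se_log)               # antilog of upper limit
    return odds_ratio, lower, upper

odds_ratio, lower, upper = odds_ratio_ci(154, 372, 975, 4144)
print(f"OR = {odds_ratio:.2f}, 95% CI = {lower:.1f} to {upper:.1f}")
```

Working on the log scale keeps the interval asymmetric around the odds ratio, which is why the limits are not simply 1.76 plus or minus a fixed amount.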
24
6. Find out the p-value
$\chi^2_1 = 31.2$ (p < 0.001)
25
7. Interpret p-value and OR
• Since the p-value is < 0.001, the odds ratio of 1.76 is statistically significant.
• If there were truly no association, the probability of observing an odds ratio this far from 1 by chance alone would be < 0.001.
26
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{(154 \times 4144 - 372 \times 975)^2 \times 5645}{(154+372)(975+4144)(154+975)(372+4144)}$$
$$= \frac{(638176 - 362700)^2 \times 5645}{526 \times 5119 \times 1129 \times 4516}$$
$$= \frac{75887026576 \times 5645}{13728362835016} = \frac{428382265021520}{13728362835016} = 31.2$$
27
Chi-square & p-value
• df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
• The table value at the 0.1% level of significance is 10.83. Since $\chi^2_1 = 31.2$ exceeds it, p < 0.001
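A small Python sketch of the shortcut formula for a 2 x 2 table follows. The p-value line relies on the fact that, for 1 degree of freedom only, the chi-square tail probability equals erfc(sqrt(chi2 / 2)); the function name `chi_square_2x2` is an illustrative assumption, and no continuity correction is applied, matching the slide.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square (1 df) and its p-value for a 2 x 2 table."""
    n = a + b + c + d
    numerator = (a * d - b * c) ** 2 * n
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2))   # valid for 1 degree of freedom only
    return chi2, p_value

chi2, p_value = chi_square_2x2(154, 372, 975, 4144)
print(f"chi-square = {chi2:.1f}, p < 0.001: {p_value < 0.001}")
```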
28
29
Scenario III (Data not real, only for exercise)
• A study was initiated to find out the association between hypertension and acute myocardial infarction (AMI). 950 adults without AMI were enrolled. Among them, 200 were found to have hypertension and 750 were found to have normal blood pressure.
30
Scenario III (cont.)
• They were followed for 20 years and the occurrence of AMI during the period was recorded. 50 of the 200 individuals with hypertension developed AMI, whereas 75 of the 750 individuals with normal blood pressure developed AMI.
31
1. What is the type of study design?
• Cohort study
• Starts with exposure (hypertension)
• Ends with ascertainment of disease (AMI)
32
2. What is the incidence of AMI among individuals with hypertension?
$$\frac{50}{200} \times 100 = 25\%$$
33
3. What is the incidence of AMI among those without hypertension?
$$\frac{75}{750} \times 100 = 10\%$$
34
4. Is there an association between acute myocardial infarction and hypertension?
               AMI        No AMI      Total
Hypertensive   (a)        (b)         200
Normotensive   (c)        (d)         750
Total                                 950
35
Is there an association between acute myocardial infarction and hypertension?
               AMI        No AMI      Total
Hypertensive   50 (a)     150 (b)     200
Normotensive   (c)        (d)         750
Total                                 950
36
Is there an association between acute myocardial infarction and hypertension?
               AMI        No AMI      Total
Hypertensive   50 (a)     150 (b)     200
Normotensive   75 (c)     675 (d)     750
Total          125        825         950
37
Is there an association between AMI and hypertension?
$$\text{Relative Risk} = \frac{I_{\text{Exposed}}}{I_{\text{Unexposed}}} = \frac{50/200}{75/750} = \frac{0.25}{0.1} = 2.5$$
• Yes, there is an association between AMI and hypertension
38
5. Interpret the measure of association (Relative Risk)
• As RR > 1, there is a positive association between hypertension and occurrence of AMI.
• Those with hypertension have 2.5 times the risk of developing AMI compared with those with normal blood pressure
39
6. Find out the 95% CI for RR
95% CI for RR
= antilog of [ln(RR) ± 1.96 × SE of ln(RR)]
= 1.8 to 3.4
40
7. Is the measure of association precise? (Interpret 95% CI of RR)
• Since the 95% CI (1.8 to 3.4) is not very wide, the estimate is precise
• The CI does not include 1, hence the result is statistically significant
• We can be 95% confident that the risk of developing AMI among those with hypertension is between 1.8 and 3.4 times the risk among those with normal blood pressure
41
Calculation of 95% CI for Relative Risk (RR)
= antilog of [ln(RR) ± 1.96 × SE of ln(RR)]
$$\text{where SE of } \ln(RR) = \sqrt{\frac{1 - I_E}{a} + \frac{1 - I_{NE}}{c}}$$
$$= \sqrt{\frac{1 - 0.25}{50} + \frac{1 - 0.1}{75}} = \sqrt{\frac{0.75}{50} + \frac{0.9}{75}}$$
$$= \sqrt{0.015 + 0.012} = \sqrt{0.027} = 0.164$$
42
Calculation of 95% CI for Relative Risk
ln(RR) = ln(2.5) = 0.92
95% CI = antilog of (0.92 ± 1.96 × 0.164)
= antilog of (0.92 ± 0.32)
= antilog of (0.60 to 1.24)
= 1.8 to 3.4
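The relative risk and its interval can be sketched in Python as follows. The layout assumed is the cohort table of this scenario (a, b = exposed with and without disease; c, d = unexposed with and without disease), and `relative_risk_ci` is an illustrative name only.

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk with a CI computed on the natural-log scale."""
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    rr = incidence_exposed / incidence_unexposed
    # SE of ln(RR) as given on the slide: sqrt((1 - Ie)/a + (1 - Ine)/c)
    se_log = math.sqrt((1 - incidence_exposed) / a + (1 - incidence_unexposed) / c)
    log_rr = math.log(rr)
    return rr, math.exp(log_rr - z * se_log), math.exp(log_rr + z * se_log)

rr, lower, upper = relative_risk_ci(50, 150, 75, 675)
print(f"RR = {rr:.1f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

Without intermediate rounding the upper limit is close to 3.45, which the slide reports as 3.4.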
43
8. Find out p-value
• $\chi^2_1 = 31.09$ (p < 0.001)
44
9. Interpret p-value of RR
• Since the p-value is < 0.001, the relative risk of 2.5 is statistically significant.
• If there were truly no association, the probability of observing a relative risk this far from 1 by chance alone would be < 0.001.
45
Calculation of Chi-square & p-value
$$\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$$
$$= \frac{(50 \times 675 - 150 \times 75)^2 \times 950}{(50+150)(75+675)(50+75)(150+675)}$$
$$= \frac{(33750 - 11250)^2 \times 950}{200 \times 750 \times 125 \times 825}$$
$$= \frac{506250000 \times 950}{15468750000} = \frac{480937500000}{15468750000} = 31.09$$
46
Chi-square
• df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1
• The table value at the 0.1% level of significance is 10.83
• $\chi^2_1 = 31.09$ exceeds the table value, so p < 0.001
• If there were truly no association, the probability of observing a relative risk as far from 1 as 2.5 by chance alone would be less than 0.001
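For comparison, the same statistic can be obtained from observed and expected counts, which is the general form of the chi-square test and reduces to the shortcut formula for a 2 x 2 table. The sketch below is an illustration only; the function name `chi_square_expected` and the list-of-rows input format are assumptions, not part of the original exercise.

```python
def chi_square_expected(table):
    """Pearson chi-square and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # count expected under no association
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_expected([[50, 150], [75, 675]])
critical_value = 10.83   # table value for 1 df at the 0.1% level, as quoted on the slide
print(f"chi-square = {chi2:.2f}, df = {df}, exceeds table value: {chi2 > critical_value}")
```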
47
48
Scenario IV (Data not real, only for exercise)
• A study was conducted to find out the effect of iron-fortified salt on iron deficiency anemia in children aged 5 to 15 years. 303 children were randomly assigned to receive either iron-fortified salt (n=152) or unfortified salt (n=151). The mean increase in Hb at 5 months was 0.3 g/L (±0.10) in the group receiving unfortified salt and 1.5 g/L (±0.25) in the group receiving iron-fortified salt.
49
1. What is the type of study design?
• Randomized controlled trial
50
2. What is the result of the study?
The mean increase in Hb in the iron-fortified group is higher than that in the unfortified group
                             Iron-fortified salt   Unfortified salt
N                            152                   151
Mean increase in Hb at 5 m   1.5 g/L               0.3 g/L
S.D.                         0.25                  0.10
51
3. Is the result statistically significant?
• There are two groups of individuals
• The observations are independent
• The variable measured is quantitative
• If the data are normally distributed, the 'pooled t-test' or 'unpaired t-test' is used
52
Results of the pooled t-test
• t = 54.5
• df = 301
• p < 0.001
Yes, the result is statistically significant
53
4. What is your inference?
There is a statistically significant difference in the mean change in Hb levels between those receiving iron-fortified salt and those receiving unfortified salt
54
Calculation of test statistic
• Null hypothesis (H0): There is no difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt.
• Alternate hypothesis (HA): There is a difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt.
55
Calculation of test statistic
$$t = \frac{\text{Observed difference}}{\text{Standard Error (SE)}}, \quad \text{where } SE = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}}$$
where SDp is the pooled standard deviation:
$$SD_p^2 = \frac{(n_1 - 1) \times SD_1^2 + (n_2 - 1) \times SD_2^2}{n_1 + n_2 - 2}$$
$$= \frac{(152-1) \times 0.25^2 + (151-1) \times 0.1^2}{152 + 151 - 2} = \frac{(151 \times 0.0625) + (150 \times 0.010)}{301}$$
$$= \frac{9.44 + 1.5}{301} = \frac{10.94}{301} = 0.036$$
56
Calculation of test statistic
$$\text{Standard Error} = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}} = \sqrt{\frac{0.036}{152} + \frac{0.036}{151}}$$
$$= \sqrt{0.00024 + 0.00024} = \sqrt{0.00048} = 0.022$$
t = (1.5 − 0.3) / 0.022 = 1.2 / 0.022 = 54.5
Find out the degrees of freedom
= (n1 + n2 − 2) = (151 + 152 − 2) = 301
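The pooled t-test steps can be sketched in Python from the summary statistics alone. The function name `pooled_t` is hypothetical. Because the sketch does not round the standard error to 0.022, it gives t of roughly 54.8 rather than the slide's 54.5; the conclusion is unchanged.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired (pooled-variance) t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)   # SE of the difference in means
    t = (mean1 - mean2) / se
    return t, df, se

t, df, se = pooled_t(1.5, 0.25, 152, 0.3, 0.10, 151)
print(f"t = {t:.1f}, df = {df}, SE = {se:.3f}")
```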
57
58
Calculation of test statistic
• Look at the table of t values at the desired level of significance and for the given degrees of freedom for a two-tailed test
• Compare it with the test statistic
• Here, t∞ (the value for very large degrees of freedom) is 1.96 at the 5% significance level and 3.29 at the 0.1% significance level
• As the calculated value of t (54.5) is more than the table value, we reject the null hypothesis
59
Scenario V (Data not real, only for exercise)
• A study was carried out to assess the performance of a commercial line probe assay (LPA) for rapid detection of MDR-TB. Smear-positive sputum specimens were collected from 92 previously treated TB patients and subjected to LPA.
60
Scenario V (cont.)
• Results were compared with MGIT-DST (gold standard) done on all 92 patients at the same time. 13 patients were positive for MDR-TB using MGIT-DST, of which 12 were also positive by the line probe assay (LPA). 76 samples tested negative for MDR-TB by both tests.
61
1. What is the study design?
• Study done at a cross section of time (cross-sectional) for evaluating a diagnostic test
62
2. What is the validity of the test?
• Sensitivity
• Specificity
63
2 x 2 table
            MGIT-DST positive   MGIT-DST negative   Total
LPA +ve     12 (a)              (b)
LPA -ve     (c)                 76 (d)
Total       13                                      92
64
2 x 2 table
            MGIT-DST positive   MGIT-DST negative   Total
LPA +ve     12 (a)              3 (b)               15
LPA -ve     1 (c)               76 (d)              77
Total       13                  79                  92
65
Sensitivity & Specificity
$$\text{Sensitivity} = \frac{a}{a + c} \times 100 = \frac{12}{13} \times 100 = 92.3\%$$
$$\text{Specificity} = \frac{d}{b + d} \times 100 = \frac{76}{79} \times 100 = 96.2\%$$
66
3. Comment on the validity of the test
• Sensitivity: If the test is done on MDR-TB patients, it will correctly identify 92.3% as having MDR-TB
• Specificity: If the test is done on individuals not having MDR-TB, it will correctly identify 96.2% as not having MDR-TB
67
4. What are the predictive values of the test?
$$\text{Positive Predictive Value} = \frac{a}{a + b} \times 100 = \frac{12}{15} \times 100 = 80\%$$
$$\text{Negative Predictive Value} = \frac{d}{c + d} \times 100 = \frac{76}{77} \times 100 = 98.7\%$$
68
5. Comment on the predictive values of the test
• PPV: If the test is positive for a patient, there is an 80% probability that the patient has MDR-TB
• NPV: If the test is negative for a patient, there is a 98.7% probability that the patient does not have MDR-TB
69
6. What are the likelihood ratios of the test?
$$\text{LH Ratio Positive} = \frac{a/(a+c)}{b/(b+d)} = \frac{12/13}{3/79} = 24.3$$
$$\text{LH Ratio Negative} = \frac{c/(a+c)}{d/(b+d)} = \frac{1/13}{76/79} = 0.08$$
70
7. Comment on the LH Ratios of the test
• LR +ve: A positive test result is 24.3 times as likely in a patient who has MDR-TB as in a patient who does not have MDR-TB
71
Comment on the LH Ratios of the test
• LR -ve: A negative test result is 0.08 times as likely in a patient who has MDR-TB as in a patient who does not have MDR-TB
72
Summary of findings
Sensitivity 92.3 %
Specificity 96.2 %
PPV 80 %
NPV 98.7 %
LHR Pos 24.3
LHR Neg 0.08
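The summary figures above can be reproduced from the four cells of the 2 x 2 table with a short Python sketch. The cell labels follow the slides (a = true positives, b = false positives, c = false negatives, d = true negatives); the function name `diagnostic_metrics` and the dictionary keys are illustrative choices.

```python
def diagnostic_metrics(a, b, c, d):
    """Validity, predictive values and likelihood ratios from a 2 x 2 table."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity_pct": 100 * sensitivity,
        "specificity_pct": 100 * specificity,
        "ppv_pct": 100 * a / (a + b),
        "npv_pct": 100 * d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }

for name, value in diagnostic_metrics(12, 3, 1, 76).items():
    print(f"{name}: {value:.2f}")
```

Note that the predictive values depend on how common MDR-TB is in the group tested (13 of 92 here), whereas sensitivity, specificity and the likelihood ratios do not.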
73
74
8. If the test is positive what is your inference?
• Since the positive predictive value of the test is 80%, when a positive result is obtained the probability that the patient has MDR-TB is 80% (based on which a clinical decision has to be made)
75
9. If the test is negative what is your inference?
• Since the negative predictive value of the test is 98.7%, when a negative result is obtained the probability that the patient does not have MDR-TB is 98.7% (based on which a clinical decision has to be made)
76
Scenario – VI (Data not real, only for exercise)
• An epidemiologist wants to calculate the sample size for a study to find out the prevalence of adolescent obesity in an urban slum population, by simple random sampling.
77
1. What are the required information / assumptions needed to calculate the sample size?
• Prevalence (best assumption from other studies)
• Level of significance (α)
• Level of precision (d or l) (expected)
78
2. What is the formula used for calculating the sample size for this study?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, p\,q}{d^2}$$
p – Prevalence or proportion (best assumption)
q – (1 − p), or (100 − p) when p is expressed in per cent
d – Level of precision (expected)
Z(1-α/2) – Standard normal distribution value for ‘α’
79
3. Why do you use this formula?
• Qualitative data
• Prevalence / proportion has been provided
80
Required Information
• Earlier studies found a prevalence of adolescent obesity of about 40 per cent in the urban slum population.
• The epidemiologist wants to have a precision of 5 per cent and a level of significance of 0.05.
81
4. What are the data and what are the assumptions given for calculating the sample size?
• Prevalence (p): 40%
• 1 − p: 60%
• Level of significance (α): 0.05
• Standard normal table value: 1.96
• Level of precision (d): 5%
82
5. What is the Sample Size calculated?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, p\,q}{d^2} = \frac{1.96^2 \times 40 \times 60}{5^2} = \frac{3.84 \times 2400}{25} = 368.64 \approx 369$$
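A minimal Python sketch of this calculation is given below, assuming p and d are both entered in per cent and the result is always rounded up to the next whole subject. The function name `sample_size_proportion` is illustrative only.

```python
import math

def sample_size_proportion(p_percent, d_percent, z=1.96):
    """Sample size to estimate a single proportion under simple random sampling."""
    q_percent = 100.0 - p_percent
    n = (z ** 2) * p_percent * q_percent / d_percent ** 2
    return math.ceil(n)           # round up so the required precision is still met

print(sample_size_proportion(40, 5))
```

Trying a few values of `d_percent` shows how quickly the required sample grows as the desired precision is tightened, since n is inversely proportional to the square of d.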
83
6. If the epidemiologist used cluster sampling instead of simple random sampling, what will be the sample size calculated?
• Usually twice the SRS sample size (a design effect of 2)
• 369 x 2 = 738
84
7. How do you account for refusal and non-availability of the selected individuals?
• Over-sample / take additional samples
• 5% to 20%
• Depends on the expected attrition / refusal / non-participation / non-availability
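The two adjustments discussed for this scenario can be sketched as a single helper. This is an illustration under stated assumptions: the design effect simply multiplies the SRS sample size, and non-response is handled by adding a fixed percentage of extra subjects, as the slide suggests. The function name `adjust_sample_size` is hypothetical.

```python
import math

def adjust_sample_size(n_srs, design_effect=1.0, non_response=0.0):
    """Inflate an SRS sample size for clustering and for expected non-response."""
    n = n_srs * design_effect            # cluster sampling: multiply by the design effect
    n = n * (1.0 + non_response)         # add a fixed fraction of extra subjects
    return math.ceil(n)

print(adjust_sample_size(369, design_effect=2))
print(adjust_sample_size(369, non_response=0.10))
```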
85
Software Output
86
Scenario – VII (Data not real, only for exercise)
• A neurologist wants to calculate the sample size for a study to find out the mean level of plasma phenytoin in patients with seizure disorder selected from tertiary care hospitals in a city by simple random sampling.
87
1. What are the required information and assumptions for calculating the sample size?
• Mean (x)
• Standard deviation (σ)
• Level of significance (α)
• Level of precision (d or l) (required)
88
2. What is the Formula?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2}$$
σ – Standard deviation (SD)
d – Level of precision
Z(1-α/2) – Standard normal distribution value for ‘α’
89
3. Why do you use this formula?
• Quantitative data
• Mean and standard deviation
90
Required Information
• Earlier studies found a mean plasma phenytoin level of 15 mcg/l and a standard deviation of 5 mcg/l among patients who have seizure disorder.
• The neurologist wants to have a precision of 1.0 mcg/l at the 0.05 level of significance.
91
4. What are the data and what are the assumptions given for calculating sample size?
• Mean: 15
• Standard deviation (σ): 5
• Level of precision (d): 1
• Level of significance (α): 0.05
• Standard normal table value: 1.96
92
5. What is the Sample Size?
$$n = \frac{Z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2} = \frac{1.96^2 \times 5^2}{1^2} = 96.04 \approx 97$$
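The same arithmetic in Python, as a sketch: the standard deviation and the precision must be in the same units (mcg/l here), and the result is rounded up. The function name `sample_size_mean` is an assumption for illustration.

```python
import math

def sample_size_mean(sd, d, z=1.96):
    """Sample size to estimate a single mean under simple random sampling."""
    n = (z ** 2) * sd ** 2 / d ** 2
    return math.ceil(n)           # round up to the next whole subject

print(sample_size_mean(sd=5, d=1))
```

Note that the assumed mean (15 mcg/l) does not enter the formula; only the standard deviation and the desired precision do.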
93
6. How will you account for refusal and non-availability of the selected individuals?
• Over-sampling / additional samples
• 5% to 20%
• Depends on the expected attrition / refusal / non-participation / non-availability
94
Software Output
95
Scenario – VIII (Data not real, only for exercise)
• A physiotherapist wants to calculate the sample size for a clinical trial on patients with knee osteoarthritis, to compare the percentage of patients who have pain relief with transcutaneous electrical nerve stimulation (TENS) against the percentage who have pain relief on routine therapy.
96
1. What are the information / assumptions needed to calculate the sample size?
• Percentage of patients with pain relief in the treatment group
• Percentage of patients with pain relief in the control group
• Level of significance (α)
• Power of the test (1-β)
97
2. What is the Formula?
$$n = \frac{\left[ Z_{(1-\alpha/2)} \sqrt{2PQ} + Z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2}$$
P = (p1 + p2)/2 ; Q = (1 – P)
p1 – Proportion in group 1
p2 – Proportion in group 2
q1 – (1-p1)
q2 – (1-p2)
Z(1-α/2) – Standard normal distribution value for ‘α’
Z(1-β) – Standard normal distribution value for ‘β’
98
3. Why do you use this formula?
• Qualitative data
• Two proportions are given
99
Required Information
• Earlier studies show that 65 per cent of patients subjected to TENS and 25 per cent of patients on routine therapy had pain relief.
• The physiotherapist wants to have 90 per cent power and a 5 per cent level of significance for the study.
100
4. What are the data and what are the assumptions given for calculating the sample size?
• Proportion 1: 25%
• Proportion 2: 65%
• Level of significance (α): 0.05
• Standard normal table value for α: 1.96
• Power of the test (1-β): 90%
• Standard normal table value for (1-β): 1.28
101
5. What is the Sample Size Calculated?
$$n = \frac{\left[ Z_{(1-\alpha/2)} \sqrt{2PQ} + Z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2}$$
$$= \frac{\left[ 1.96 \sqrt{0.495} + 1.28 \sqrt{0.25 \times 0.75 + 0.65 \times 0.35} \right]^2}{(0.25 - 0.65)^2} = 30.3 \approx 31$$
31 subjects in each group
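The two-proportion calculation is easy to get wrong by hand, so a Python sketch may help. It assumes proportions entered as fractions (0.25, 0.65), the table values 1.96 and 1.28 used on the slides, and rounding up to whole subjects; the function name `sample_size_two_proportions` is illustrative.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.28):
    """Subjects needed per group to compare two independent proportions."""
    q1, q2 = 1 - p1, 1 - p2
    p_bar = (p1 + p2) / 2                     # P on the slide
    q_bar = 1 - p_bar                         # Q on the slide
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    n = numerator / (p1 - p2) ** 2
    return math.ceil(n)

print(sample_size_two_proportions(0.25, 0.65))
```

The result is the number required in each group, so the trial as a whole needs twice this number before any allowance for non-response.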
102
6. How will you account for refusal and non-availability of the selected individuals?
• Over-sampling / additional samples
• 5% to 20%
• Depends on the expected attrition / refusal / non-participation / non-availability
103
Software Output
104
Scenario – IX (Data not real, only for exercise)
• A physician wants to calculate the sample size for a clinical trial to compare the mean reduction in systolic blood pressure among hypertensive patients given an experimental drug with that among hypertensive patients on routine therapy.
105
1. What are the required information / assumptions needed to calculate the sample size?
• Mean reduction in SBP in the intervention group (x1)
• Mean reduction in SBP in the control group (x2)
• SD for intervention group (σ1)
• SD for control group (σ2)
• Level of significance (α)
• Power of the test (1-β)
106
2. What is the Formula?
$$n = \frac{2 \, s_p^2 \left[ Z_{(1-\alpha/2)} + Z_{(1-\beta)} \right]^2}{\mu_d^2}$$
Sp – Pooled standard deviation (SD)
μd – Mean difference
Z(1-α/2) – Standard normal distribution value for ‘α’
Z(1-β) – Standard normal distribution value for ‘β’
107
3. Why do you use this formula?
• Quantitative data
• Means and standard deviations
108
Required Information
• Earlier studies have shown a reduction in mean systolic blood pressure from 180 to 120 mmHg for the experimental drug and from 180 to 140 mmHg for the routine therapy. The standard deviation is 40 mmHg for both groups.
• The physician wants to have a 5 per cent level of significance and 90 per cent power for his study.
109
4. What are the data and what are the assumptions given for calculating the sample size?
• Mean difference: 20 (reduction of 60 mmHg versus 40 mmHg)
• Standard deviation 1: 40
• Standard deviation 2: 40
• Level of significance (α): 0.05
• Standard normal table value for α: 1.96
• Power of the test (1-β): 90%
• Standard normal table value for (1-β): 1.28
110
5. What is the Sample Size?
$$n = \frac{2 \, s_p^2 \left[ Z_{(1-\alpha/2)} + Z_{(1-\beta)} \right]^2}{\mu_d^2} = \frac{2 \times 40^2 \times [1.96 + 1.28]^2}{20^2} = 83.98 \approx 84$$
84 subjects in each group
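A Python sketch of this last formula follows, assuming a common (pooled) standard deviation for both groups and the same table values as above. The function name `sample_size_two_means` is hypothetical.

```python
import math

def sample_size_two_means(pooled_sd, mean_difference, z_alpha=1.96, z_beta=1.28):
    """Subjects needed per group to compare two independent means."""
    n = 2 * pooled_sd ** 2 * (z_alpha + z_beta) ** 2 / mean_difference ** 2
    return math.ceil(n)           # round up to the next whole subject

print(sample_size_two_means(pooled_sd=40, mean_difference=20))
```

Halving the detectable difference roughly quadruples the sample size, which is worth keeping in mind when choosing the mean difference to plan for.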
111
6. How will you account for refusal and non-availability of the selected individuals?
• Over-sampling / additional samples
• Depends on the expected attrition / refusal / non-participation / non-availability
112
Software Output
113
Summary
• Data analysis (prevalence, OR, RR, mean, SD, Sn, Sp, PPV, NPV, 95% CI)
• Application of appropriate statistical tests (t test, χ² test)
• Interpretation of the results (OR > 1, RR > 1, 95% CI)
• Sample size assumptions (α, 1−β, precision (d), one-tailed, two-tailed)
• Basic data required for sample size calculation
• Sample size calculation in different scenarios
114
Thank You

More Related Content

Similar to Answers to Data Analysis and interpretation modified 2020 (2410) (1).ppt

An infinite population has a standard deviation of 10.  A random s.docx
An infinite population has a standard deviation of 10.  A random s.docxAn infinite population has a standard deviation of 10.  A random s.docx
An infinite population has a standard deviation of 10.  A random s.docx
greg1eden90113
 
Chapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docxChapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docx
spoonerneddy
 
Presentation1group b
Presentation1group bPresentation1group b
Presentation1group b
AbhishekDas15
 

Similar to Answers to Data Analysis and interpretation modified 2020 (2410) (1).ppt (20)

An infinite population has a standard deviation of 10.  A random s.docx
An infinite population has a standard deviation of 10.  A random s.docxAn infinite population has a standard deviation of 10.  A random s.docx
An infinite population has a standard deviation of 10.  A random s.docx
 
2_5332511410507220042.ppt
2_5332511410507220042.ppt2_5332511410507220042.ppt
2_5332511410507220042.ppt
 
Estimating a Population Mean
Estimating a Population MeanEstimating a Population Mean
Estimating a Population Mean
 
Chapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed HusseinChapter3 biostatistics by Dr Ahmed Hussein
Chapter3 biostatistics by Dr Ahmed Hussein
 
Statistical Journal club
Statistical Journal club Statistical Journal club
Statistical Journal club
 
Chapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docxChapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docx
 
Role of estimation in statistics
Role of estimation in statisticsRole of estimation in statistics
Role of estimation in statistics
 
Statistics
StatisticsStatistics
Statistics
 
Estimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or VarianceEstimating a Population Standard Deviation or Variance
Estimating a Population Standard Deviation or Variance
 
8. normal distribution qt pgdm 1st semester
8. normal distribution qt pgdm 1st  semester8. normal distribution qt pgdm 1st  semester
8. normal distribution qt pgdm 1st semester
 
Presentation1group b
Presentation1group bPresentation1group b
Presentation1group b
 
MD Paediatricts (Part 2) - Epidemiology and Statistics
MD Paediatricts (Part 2) - Epidemiology and StatisticsMD Paediatricts (Part 2) - Epidemiology and Statistics
MD Paediatricts (Part 2) - Epidemiology and Statistics
 
5-Propability-2-87.pdf
5-Propability-2-87.pdf5-Propability-2-87.pdf
5-Propability-2-87.pdf
 
Normal Distribution
Normal DistributionNormal Distribution
Normal Distribution
 
Confidence interval & probability statements
Confidence interval & probability statements Confidence interval & probability statements
Confidence interval & probability statements
 
MEAN.pptx
MEAN.pptxMEAN.pptx
MEAN.pptx
 
BIIntro.ppt
BIIntro.pptBIIntro.ppt
BIIntro.ppt
 
Statistics chm 235
Statistics chm 235Statistics chm 235
Statistics chm 235
 
Estimating a Population Mean
Estimating a Population MeanEstimating a Population Mean
Estimating a Population Mean
 
Frequentist Operating Characteristics of Bayesian Posterior Designs
Frequentist Operating Characteristics of Bayesian Posterior DesignsFrequentist Operating Characteristics of Bayesian Posterior Designs
Frequentist Operating Characteristics of Bayesian Posterior Designs
 

More from Vanithadurai (7)

darwin evolution ppt.pptx
darwin evolution ppt.pptxdarwin evolution ppt.pptx
darwin evolution ppt.pptx
 
PPT for Excreta disposal.ppt
PPT for  Excreta disposal.pptPPT for  Excreta disposal.ppt
PPT for Excreta disposal.ppt
 
alcohol prevention
alcohol prevention  alcohol prevention
alcohol prevention
 
Hospital waste management.
Hospital waste management.Hospital waste management.
Hospital waste management.
 
DYNAMICS OF DISEASE & DISEASE TRANSMISSION.ppt
DYNAMICS OF DISEASE & DISEASE TRANSMISSION.pptDYNAMICS OF DISEASE & DISEASE TRANSMISSION.ppt
DYNAMICS OF DISEASE & DISEASE TRANSMISSION.ppt
 
Family ppt.pptx
Family ppt.pptxFamily ppt.pptx
Family ppt.pptx
 
dr vs Growth and Development -final.ppt
dr vs Growth and Development -final.pptdr vs Growth and Development -final.ppt
dr vs Growth and Development -final.ppt
 

Recently uploaded

Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose AcademicsCytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
MedicoseAcademics
 
Unit 4 Pharmaceutical Organic Chemisty 3 Quinoline
Unit 4 Pharmaceutical Organic Chemisty 3 QuinolineUnit 4 Pharmaceutical Organic Chemisty 3 Quinoline
Unit 4 Pharmaceutical Organic Chemisty 3 Quinoline
AarishRathnam1
 
Failure to thrive in neonates and infants + pediatric case.pptx
Failure to thrive in neonates and infants  + pediatric case.pptxFailure to thrive in neonates and infants  + pediatric case.pptx
Failure to thrive in neonates and infants + pediatric case.pptx
claviclebrown44
 

Recently uploaded (20)

How to buy 5cladba precursor raw 5cl-adb-a raw material
How to buy 5cladba precursor raw 5cl-adb-a raw materialHow to buy 5cladba precursor raw 5cl-adb-a raw material
How to buy 5cladba precursor raw 5cl-adb-a raw material
 
Anti viral drug pharmacology classification
Anti viral drug pharmacology classificationAnti viral drug pharmacology classification
Anti viral drug pharmacology classification
 
Stereochemistry & Asymmetric Synthesis.pptx
Stereochemistry & Asymmetric Synthesis.pptxStereochemistry & Asymmetric Synthesis.pptx
Stereochemistry & Asymmetric Synthesis.pptx
 
ROSE CASE SPINAL SBRT BY DR KANHU CHARAN PATRO
ROSE  CASE SPINAL SBRT BY DR KANHU CHARAN PATROROSE  CASE SPINAL SBRT BY DR KANHU CHARAN PATRO
ROSE CASE SPINAL SBRT BY DR KANHU CHARAN PATRO
 
Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose AcademicsCytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
Cytoskeleton and Cell Inclusions - Dr Muhammad Ali Rabbani - Medicose Academics
 
ESC HF 2024 Spotlights Day-2.pptx heart failure
ESC HF 2024 Spotlights Day-2.pptx heart failureESC HF 2024 Spotlights Day-2.pptx heart failure
ESC HF 2024 Spotlights Day-2.pptx heart failure
 
Unit 4 Pharmaceutical Organic Chemisty 3 Quinoline
Unit 4 Pharmaceutical Organic Chemisty 3 QuinolineUnit 4 Pharmaceutical Organic Chemisty 3 Quinoline
Unit 4 Pharmaceutical Organic Chemisty 3 Quinoline
 
Signs It’s Time for Physiotherapy Sessions Prioritizing Wellness
Signs It’s Time for Physiotherapy Sessions Prioritizing WellnessSigns It’s Time for Physiotherapy Sessions Prioritizing Wellness
Signs It’s Time for Physiotherapy Sessions Prioritizing Wellness
 
Top 10 Most Beautiful Russian Pornstars List 2024
Top 10 Most Beautiful Russian Pornstars List 2024Top 10 Most Beautiful Russian Pornstars List 2024
Top 10 Most Beautiful Russian Pornstars List 2024
 
Answers to Data Analysis and interpretation modified 2020 (2410) (1).ppt

  • 1. 1 DATA ANALYSIS & INTERPRETATION HANDS-ON
  • 2. 2 Outline  Scenarios 1 – 5: Data analysis & Interpretation in various study designs  Scenarios 6 – 9 : Sample size calculation for qualitative and quantitative data  Outputs from PEPI software
  • 3. 3 Scenario I (Data not real, only for exercise)  In a study to find out the prevalence of hypertension, 1,000 adults were selected by Simple Random Sampling method from a population of 1,00,000. All the 1000 Adults were contacted once and their blood pressure was measured as per the standard guidelines and 55 were found to be having hypertension.
  • 4. 4 1. What is the type of study design?  Cross-sectional study  Study carried out at one point in time
  • 5. 5 2. What is the prevalence of hypertension in the sample selected? Prevalence = (55 / 1000) × 100 = 5.5%
  • 6. 6 3. How precise is the estimate? • Conventionally we calculate the 95% CI • 95% CI = p ± 1.96 × SEp, where p is the proportion (in per cent) and SEp is the standard error of the proportion: $SE_p = \sqrt{pq/n}$, where q = 100 − p • 95% CI = 4.1% to 6.9% • Since the 95% CI is not very wide, the estimate is precise
  • 7. 7 4. How do you infer the prevalence of hypertension for this population? The prevalence of hypertension in the population is likely to lie between 4.1% and 6.9% (with 95% confidence)
  • 8. 8 Confidence interval (CI) for prevalence • 95% CI = p ± 1.96 × SEp • where $SE_p = \sqrt{pq/n}$ and q = 100 − p • $SE_p = \sqrt{(5.5 \times 94.5)/1000} = \sqrt{519.75/1000} = 0.72$ • 95% CI = 5.5 ± (1.96 × 0.72) = 5.5 ± 1.4 = 4.1% to 6.9%
  • 9. 9 5. Supposing the mean systolic blood pressure of the sample selected was 120 mm Hg and the standard deviation of the sample was 25 mm Hg, how will you infer the mean systolic pressure of the population?
  • 10. 10 95% CI of the mean • 95% CI of mean = Mean ± 1.96 × SEM (standard error of the mean) • $SEM = SD/\sqrt{n} = 25/\sqrt{1000} = 25/31.6 = 0.79$ • 95% CI of mean = 120 ± 1.96 × 0.79 = 120 ± 1.6 = 118.4 to 121.6
  • 11. 11 Inference • The mean systolic pressure of the population is likely to lie between 118.4 and 121.6 mmHg (with 95% confidence)
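The Scenario I arithmetic can be checked with a few lines of Python. This is only a sketch: the function names `ci_proportion_pct` and `ci_mean` are made up for illustration, and it uses the same normal-approximation formulas as the slides (SE = sqrt(pq/n) for a percentage, SEM = SD/sqrt(n) for a mean).

```python
import math

Z_95 = 1.96  # standard normal value for a two-sided 95% CI


def ci_proportion_pct(cases, n, z=Z_95):
    """Return (p, lower, upper) in per cent, using SE = sqrt(p*q/n)."""
    p = 100.0 * cases / n
    q = 100.0 - p
    se = math.sqrt(p * q / n)
    return p, p - z * se, p + z * se


def ci_mean(mean, sd, n, z=Z_95):
    """Return (lower, upper) using SEM = SD / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


# Scenario I: 55 hypertensives among 1000 adults
p, lo, hi = ci_proportion_pct(55, 1000)
print(f"Prevalence {p:.1f}% (95% CI {lo:.1f}% to {hi:.1f}%)")

# Scenario I, question 5: mean SBP 120 mmHg, SD 25 mmHg, n = 1000
m_lo, m_hi = ci_mean(120, 25, 1000)
print(f"Mean SBP 95% CI {m_lo:.2f} to {m_hi:.2f} mmHg")
```

Without intermediate rounding the interval for the mean is about 118.45 to 121.55 mmHg; the slides round the margin to 1.6, which gives the slightly wider 118.4 to 121.6.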
  • 12. 12 Scenario-II (Data not real, only for exercise) • A study was conducted to find out the association between usage of the diuretic X and the occurrence of squamous cell carcinoma (SCC) of skin.
  • 13. 13 Scenario II (cont.) • 1129 patients with SCC of skin and 4516 individuals without SCC were selected from a similar source population and the usage of the diuretic X was ascertained. Of the 1129 patients suffering from SCC, 154 were using the diuretic X and among those without SCC 372 were using the diuretic X.
  • 14. 14 1. What is the type of study design?  CASE-CONTROL STUDY  Selection of patients with SCC of skin (cases)  Selection of subjects without SCC of skin (controls)  Ascertainment of exposure (i.e. Usage of diuretic X) among cases and controls
  • 15. 15 2. Is there an association between usage of diuretic X and SCC of skin?
                      Cases      Controls
        Exposed       (a)        (b)
        Not exposed   (c)        (d)
  • 16. 16 Is there an association between usage of diuretic X and SCC of skin?
                      Cases      Controls
        Exposed       154 (a)    372 (b)
        Not exposed   (c)        (d)
        Total         1129       4516
  • 17. 17 Is there an association between usage of diuretic X and SCC of skin?
                      Cases      Controls
        Exposed       154 (a)    372 (b)
        Not exposed   975 (c)    4144 (d)
        Total         1129       4516
  • 18. 18 Is there an association between usage of diuretic X and SCC of skin? Odds ratio = (odds of exposure among cases) / (odds of exposure among controls) $= \frac{a/c}{b/d} = \frac{ad}{bc} = \frac{154 \times 4144}{372 \times 975} = \frac{638176}{362700} = 1.76$ • Yes, there is an association between usage of diuretic X and SCC of skin
  • 19. 19 3. Interpret the strength and direction of association • As the odds ratio is > 1, the association is positive • Those using diuretic X have 1.76 times the odds of developing SCC compared with those not using diuretic X
  • 20. 20 4. Is the measure of association precise? • 95% CI of odds ratio • = antilog of (ln OR ± 1.96 × SE of ln OR) • = 1.4 to 2.2 • Since the 95% CI is not very wide, the result is precise
  • 21. 21 5. Interpret the 95% CI • The 95% CI does not include 1, hence the result (odds ratio 1.76) is statistically significant • The odds ratio for SCC of skin in those using diuretic X compared with those not using it is likely to lie between 1.4 and 2.2 (95 times out of 100)
  • 22. 22 Calculation of 95% CI for OR = antilog of (ln OR ± 1.96 × SE of ln OR) = antilog of (ln 1.76 ± 1.96 × SE of ln OR), where $SE_{\ln OR} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{154} + \frac{1}{372} + \frac{1}{975} + \frac{1}{4144}} = \sqrt{0.00649 + 0.00269 + 0.00103 + 0.00024} = \sqrt{0.01045} = 0.102$
  • 23. 23 Calculation of 95% CI for OR • ln(1.76) = 0.57, SE of ln OR = 0.102 • 95% CI = antilog of (0.57 ± 1.96 × 0.102) = antilog of (0.57 ± 0.2) = antilog of 0.37 to 0.77 = 1.4 to 2.2
  • 24. 24 6. Find out the p-value • χ² with 1 degree of freedom = 31.2 (p < 0.001)
  • 25. 25 7. Interpret p value and OR • Since the p-value is < 0.001, the odds ratio of 1.76 is statistically significant. • If there were truly no association, the probability of observing an odds ratio at least this far from 1 by chance would be < 0.001
  • 26. 26 Calculation of Chi-square & p-value
        $\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$
        $= \frac{((154 \times 4144) - (372 \times 975))^2 \times 5645}{(154+372)(975+4144)(154+975)(372+4144)}$
        $= \frac{(638176 - 362700)^2 \times 5645}{526 \times 5119 \times 1129 \times 4516}$
        $= \frac{75887026576 \times 5645}{13728362835016} = \frac{428382265021520}{13728362835016} = 31.2$
  • 27. 27 Chi-square & p-value • df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1 • The table value at the 0.1% level of significance is 10.83 • χ² with 1 degree of freedom = 31.2 (p < 0.001)
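The Scenario II results (odds ratio, its 95% CI, and the uncorrected chi-square) can be reproduced with the short sketch below. The function names are illustrative. The p-value line relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so its upper tail equals erfc(sqrt(χ²/2)); this is used here in place of the table lookup on the slides.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with a CI computed on the log scale."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table (1 degree of freedom)."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


# Scenario II: a = exposed cases, b = exposed controls,
# c = unexposed cases, d = unexposed controls
or_, lo, hi = odds_ratio_ci(154, 372, 975, 4144)
chi2 = chi_square_2x2(154, 372, 975, 4144)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df

print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print(f"chi-square = {chi2:.1f}, p = {p_value:.1e}")
```

Carried to two decimals the interval is roughly 1.44 to 2.15; the slides report 1.4 to 2.2 because they round the intermediate values.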
  • 29. 29 Scenario III (Data not real, only for exercise)  A study was initiated to find out the association between hypertension and Acute myocardial infarction (AMI). 950 adults without AMI were enrolled. Among them 200 were found to have hypertension and 750 were found to have normal blood pressure
  • 30. 30 Scenario III (cont.)  They were followed for 20 years and the occurrence of AMI during the period was recorded. 50 of the 200 individuals with hypertension developed AMI whereas 75 of the individuals with normal blood pressure developed AMI.
  • 31. 31 1. What is the type of study design?  Cohort study  Starts with exposure (Hypertension)  Ends with ascertainment of disease (AMI)
  • 32. 32 2. What is the incidence of AMI among individuals with hypertension? = (50 / 200) × 100 = 25%
  • 33. 33 3. What is the incidence of AMI among those without hypertension? = (75 / 750) × 100 = 10%
  • 34. 34 4. Is there an association between Acute myocardial infarction and hypertension?
                        AMI       No AMI     Total
        Hypertensive    (a)       (b)        200
        Normotensive    (c)       (d)        750
        Total                                950
  • 35. 35 Is there an association between Acute myocardial infarction and hypertension?
                        AMI       No AMI     Total
        Hypertensive    50 (a)    150 (b)    200
        Normotensive    (c)       (d)        750
        Total                                950
  • 36. 36 Is there an association between Acute myocardial infarction and hypertension?
                        AMI       No AMI     Total
        Hypertensive    50 (a)    150 (b)    200
        Normotensive    75 (c)    675 (d)    750
        Total           125       825        950
  • 37. 37 Is there an association between AMI and hypertension? Relative Risk = (incidence in exposed) / (incidence in unexposed) $= \frac{50/200}{75/750} = \frac{0.25}{0.1} = 2.5$ • Yes. There is an association between AMI and hypertension
  • 38. 38 5. Interpret the measure of association (Relative Risk) • As RR > 1, there is a positive association between hypertension and occurrence of AMI. • Those with hypertension have 2.5 times the risk of developing AMI compared with those with normal blood pressure
  • 39. 39 6. Find out the 95% CI for RR • 95% CI for RR = antilog of (ln RR ± 1.96 × SE of ln RR) = 1.8 to 3.4
  • 40. 40 7. Is the measure of association precise? (Interpret 95% CI of RR)  Since the 95% CI (1.8 to 3.4) is not very wide it is precise  The CI does not include 1 hence the result is statistically significant.  The risk of developing AMI among those with hypertension is likely to be between 1.8 and 3.4 times more when compared to those with normal blood pressure (95 times out of 100 times)
  • 41. 41 Calculation of 95% CI for Relative Risk (RR) = antilog of (ln RR ± 1.96 × SE of ln RR), where $I_E$ and $I_{NE}$ are the incidence in the exposed and the non-exposed and $SE_{\ln RR} = \sqrt{\frac{1 - I_E}{a} + \frac{1 - I_{NE}}{c}} = \sqrt{\frac{1 - 0.25}{50} + \frac{1 - 0.1}{75}} = \sqrt{\frac{0.75}{50} + \frac{0.9}{75}} = \sqrt{0.015 + 0.012} = \sqrt{0.027} = 0.164$
  • 42. 42 Calculation of 95% CI for Relative Risk • ln RR = ln(2.5) = 0.92 • 95% CI = antilog of (0.92 ± 1.96 × 0.164) = antilog of (0.92 ± 0.32) = antilog of 0.6 to 1.24 = 1.8 to 3.4
  • 43. 43 8. Find out p-value • χ² with 1 degree of freedom = 31.09 (p < 0.001)
  • 44. 44 9. Interpret p-value of RR • Since the p value is < 0.001, the Relative Risk of 2.5 is statistically significant. • If there were truly no association, the probability of observing a Relative Risk at least this far from 1 by chance would be < 0.001
  • 45. 45 Calculation of Chi-square & p-value
        $\chi^2_1 = \frac{(ad - bc)^2 \times N}{(a+b)(c+d)(a+c)(b+d)}$
        $= \frac{((50 \times 675) - (150 \times 75))^2 \times 950}{(50+150)(75+675)(50+75)(150+675)}$
        $= \frac{(33750 - 11250)^2 \times 950}{200 \times 750 \times 125 \times 825}$
        $= \frac{506250000 \times 950}{15468750000} = \frac{480937500000}{15468750000} = 31.09$
  • 46. 46 Chi-square • df = (r − 1) × (c − 1) = (2 − 1) × (2 − 1) = 1 • The table value at the 0.1% level of significance is 10.83 • χ² with 1 degree of freedom = 31.09 (p < 0.001) • If there were no association, the probability of a Relative Risk as extreme as 2.5 occurring by chance is less than 0.001
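The cohort calculation in Scenario III follows the same pattern. The sketch below (with an illustrative function name) computes the two incidences, the relative risk, and the log-scale 95% CI using the standard error formula shown on the slides.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk for a cohort 2 x 2 table with a log-scale CI.

    a, b = exposed with / without disease
    c, d = unexposed with / without disease
    """
    i_exposed = a / (a + b)
    i_unexposed = c / (c + d)
    rr = i_exposed / i_unexposed
    se_log = math.sqrt((1 - i_exposed) / a + (1 - i_unexposed) / c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return i_exposed, i_unexposed, rr, lower, upper


# Scenario III: hypertensive 50 AMI / 150 no AMI; normotensive 75 / 675
i_e, i_ne, rr, lo, hi = relative_risk_ci(50, 150, 75, 675)
print(f"Incidence: exposed {i_e:.0%}, unexposed {i_ne:.0%}")
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```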
  • 48. 48 Scenario IV (Data not real, only for exercise) • A study was conducted to find out the effect of iron-fortified salt on iron deficiency anemia in 5-15 year old children. 303 children were randomly assigned to receive either iron-fortified salt (n=152) or salt not fortified with iron (n=151). The mean increase in Hb% at 5 months was 0.3 g/L (SD ±0.10) in the group receiving unfortified salt and 1.5 g/L (SD ±0.25) in the group receiving iron-fortified salt.
  • 49. 49 1. What is the type of study design?  Randomized Controlled trial
  • 50. 50 2. What is the result of the study? The mean increase in Hb in the iron-fortified group is higher than that in the unfortified group
                                        Iron-fortified salt   Unfortified salt
        N                               152                   151
        Mean increase in Hb% at 5 m     1.5 g/L               0.3 g/L
        S.D.                            0.25                  0.10
  • 51. 51 3. Is the result statistically significant? • There are two groups of individuals • The observations are independent • The variable to be measured is quantitative • If the variable is normally distributed, the 'pooled t-test' or 'unpaired t-test' is used
  • 52. 52 Results of the pooled t test  t = 54.5  df = 301  P < 0.001 Yes. The result is statistically significant
  • 53. 53 4. What is your inference? There is a statistically significant difference in the mean change in Hb levels in those receiving iron-fortified salt in comparison to those receiving unfortified salt
  • 54. 54 Calculation of Test statistic • Null hypothesis (H0): There is no difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt. • Alternate hypothesis (HA): There is a difference in mean change in Hb between those receiving iron-fortified salt and those receiving unfortified salt.
  • 55. 55 Calculation of test statistic
        $t = \frac{\text{Observed difference}}{\text{Standard Error (SE)}}$, where $SE = \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}}$ and $SD_p$ is the pooled standard deviation
        $SD_p^2 = \frac{(n_1 - 1) \times SD_1^2 + (n_2 - 1) \times SD_2^2}{n_1 + n_2 - 2}$
        $= \frac{(152 - 1) \times 0.25^2 + (151 - 1) \times 0.1^2}{152 + 151 - 2} = \frac{(151 \times 0.0625) + (150 \times 0.010)}{301}$
        $= \frac{9.44 + 1.5}{301} = \frac{10.94}{301} = 0.036$
  • 56. 56 Calculation of test statistic • Standard Error $= \sqrt{\frac{SD_p^2}{n_1} + \frac{SD_p^2}{n_2}} = \sqrt{\frac{0.036}{152} + \frac{0.036}{151}} = \sqrt{0.00024 + 0.00024} = \sqrt{0.00048} = 0.022$ • t = (1.5 − 0.3) / 0.022 = 1.2 / 0.022 = 54.5 • Degrees of freedom = (n1 + n2 − 2) = (151 + 152 − 2) = 301
  • 58. 58 Calculation of test statistic • Look at the table of t values at the desired level of significance and for the given degrees of freedom for a two-tailed test • Compare it with the test statistic • Here the t value for infinite degrees of freedom (t∞) at the 5% significance level is 1.96 and at the 0.1% significance level is 3.29 • As the calculated value of t is more than the table value, we reject the null hypothesis
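The pooled t-test for Scenario IV can be recomputed from the summary statistics alone. The following is a minimal sketch with an illustrative function name; it uses the pooled-variance formula from the slides and compares the result with the table values 1.96 and 3.29 quoted there.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired (pooled-variance) t statistic from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = (mean1 - mean2) / se
    df = n1 + n2 - 2
    return t, df, pooled_var, se


# Scenario IV: iron-fortified salt vs unfortified salt
t, df, pooled_var, se = pooled_t(1.5, 0.25, 152, 0.3, 0.10, 151)
print(f"pooled variance = {pooled_var:.3f}, SE = {se:.4f}")
print(f"t = {t:.1f} with {df} degrees of freedom")

# Table values quoted on the slides for a two-tailed test (large df)
for level, critical in (("5%", 1.96), ("0.1%", 3.29)):
    print(f"Reject H0 at the {level} level: {abs(t) > critical}")
```

Without intermediate rounding t comes out near 54.8; the slides obtain 54.5 because the standard error is rounded to 0.022 first. The conclusion is unchanged.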
  • 59. 59 Scenario V (Data not real, only for exercise)  A study was carried out to assess the performance of a commercial line probe assay (LPA) for rapid detection of MDR-TB. Smear-positive sputum specimens were collected from 92 previously treated TB patients and subjected to LPA.
  • 60. 60 Scenario V (cont.)  Results were compared with MGIT-DST (Gold standard) done on all 92 patients at the same time. 13 patients were positive for MDR-TB using MGIT-DST out of which 12 were also positive by the line probe assay (LPA). 76 samples tested negative for MDR- TB by both the tests
  • 61. 61 1. What is the study design?  Study done at a cross section of time (Cross sectional) for evaluating diagnostic test
  • 62. 62 2. What is the validity of the test  Sensitivity  Specificity
  • 63. 63 2 x 2 table
                    MGIT-DST positive   MGIT-DST negative   Total
        LPA +ve     12 (a)              (b)
        LPA -ve     (c)                 76 (d)
        Total       13                                      92
  • 64. 64 2 x 2 table
                    MGIT-DST positive   MGIT-DST negative   Total
        LPA +ve     12 (a)              3 (b)               15
        LPA -ve     1 (c)               76 (d)              77
        Total       13                  79                  92
  • 65. 65 Sensitivity & Specificity • Sensitivity = a / (a + c) × 100 = (12 / 13) × 100 = 92.3% • Specificity = d / (b + d) × 100 = (76 / 79) × 100 = 96.2%
  • 66. 66 3. Comment on the validity of the test  Sensitivity: If the test is done on MDR-TB patients it will correctly identify 92.3% as having MDR-TB  Specificity: If the test is done in individuals not having MDR-TB it will correctly identify 96.2% as not having MDR-TB
  • 67. 67 4. What are the predictive values of the test? • Positive Predictive Value = a / (a + b) × 100 = (12 / 15) × 100 = 80% • Negative Predictive Value = d / (c + d) × 100 = (76 / 77) × 100 = 98.7%
  • 68. 68 5. Comment on the predictive values of the test  PPV: If the test is positive for a patient, there is 80% probability that the patient has MDR-TB  NPV: If the test is negative for a patient, there is 98.7% probability that the patient does not have MDR-TB
  • 69. 69 6. What are the likelihood ratios of the test? • LH Ratio Positive $= \frac{a/(a+c)}{b/(b+d)} = \frac{12/13}{3/79} = 24.3$ • LH Ratio Negative $= \frac{c/(a+c)}{d/(b+d)} = \frac{1/13}{76/79} = 0.08$
  • 70. 70 7. Comment on the LH Ratios of the test • LR +ve: A positive test is 24.3 times as likely to be obtained when the patient has MDR-TB as when the patient does not have MDR-TB
  • 71. 71 Comment on the LH Ratios of the test • LR -ve: A negative test is 0.08 times as likely to be obtained when the patient has MDR-TB as when the patient does not have MDR-TB
  • 72. 72 Summary of findings
        Sensitivity   92.3%
        Specificity   96.2%
        PPV           80%
        NPV           98.7%
        LHR Pos       24.3
        LHR Neg       0.08
  • 74. 74 8. If the test is positive what is your inference?  Since the positive predictive value of the test is 80%, when a positive result is obtained, the probability that the patient has MDR-TB is 80% ( based on which a clinical decision has to be made)
  • 75. 75 9. If the test is negative what is your inference? • Since the negative predictive value of the test is 98.7%, when a negative result is obtained, the probability that the patient does not have MDR-TB is 98.7% (based on which a clinical decision has to be made).
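All six validity measures in Scenario V come from the same four cell counts. As a rough sketch (the function name and dictionary keys are chosen here for illustration), they can be computed together:

```python
def diagnostic_metrics(a, b, c, d):
    """Validity measures from a 2 x 2 table against a gold standard.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Scenario V: LPA compared with MGIT-DST
metrics = diagnostic_metrics(a=12, b=3, c=1, d=76)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {100 * metrics[name]:.1f}%")
print(f"LR+: {metrics['lr_positive']:.1f}")
print(f"LR-: {metrics['lr_negative']:.2f}")
```

Note that the predictive values depend on the prevalence of MDR-TB in the sample tested (13 of 92 here), whereas sensitivity, specificity and the likelihood ratios do not.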
  • 76. 76 Scenario – VI (Data not real, only for exercise)  An epidemiologist wants to calculate sample size for a study to find out the prevalence of adolescent obesity in an urban slum population, by simple random sampling method.
  • 77. 77 1. What are the Required Information / Assumptions needed to calculate the sample size?  Prevalence (Best assumption from other studies)  Level of significance (α)  Level of precision (d or l) (expected)
  • 78. 78 2. What is the formula used for calculating the sample size for this study? $n = \frac{z_{(1-\alpha/2)}^2 \, pq}{d^2}$ • p – Prevalence or proportion (best assumption) • q – (1 − p), or 100 − p when p is expressed in per cent • d – Level of precision (expected) • Z(1−α/2) – Standard normal distribution value for 'α'
  • 79. 79 3. Why do you use this Formula?  Qualitative Data  Prevalence / Proportion has been provided
  • 80. 80 Required Information  Earlier studies found a prevalence of about 40 per cent as adolescent obesity in urban slum population.  The epidemiologist wants to have a precision of 5 per cent and a level of significance of 0.05.
  • 81. 81 4. What are the data and what are the assumptions given for calculating the sample size?  Prevalence (p) : 40%  1-p : 60%  Level of significance (α) : 0.05  Standard normal table value: 1.96  Level of precision (d) : 5%
  • 82. 82 5. What is the Sample Size calculated? $n = \frac{z_{(1-\alpha/2)}^2 \, pq}{d^2} = \frac{1.96^2 \times 40 \times 60}{5^2} = \frac{3.84 \times 2400}{25} = 368.64 \approx 369$
  • 83. 83 6. If the epidemiologist used cluster sampling method instead of simple random sampling what will be the sample size calculated? • Usually twice the SRS sample size (a design effect of about 2) • 369 x 2 = 738
  • 84. 84 7. How do you account for refusal and non availability of the selected individuals? • Over sample / Additional samples • 5% - 20% • Depends on the attrition/refusal/non participation/non availability
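The Scenario VI calculation can be sketched as follows. The function name is illustrative, the design effect of 2 is the rule of thumb from the slides, and the 10% allowance for non-response is only an example value picked from the 5% to 20% range mentioned above.

```python
import math


def n_single_proportion(p_pct, d_pct, z=1.96):
    """n = z^2 * p * q / d^2, with p, q and d expressed in per cent."""
    q_pct = 100 - p_pct
    return z ** 2 * p_pct * q_pct / d_pct ** 2


n_raw = n_single_proportion(p_pct=40, d_pct=5)
n_srs = math.ceil(n_raw)  # always round a sample size up

design_effect = 2  # rule of thumb from the slides for cluster sampling
n_cluster = n_srs * design_effect

allowance = 0.10  # example value only; the slides suggest 5% to 20%
n_with_allowance = math.ceil(n_srs * (1 + allowance))

print(f"SRS: {n_raw:.2f} -> {n_srs}")
print(f"Cluster sampling: {n_cluster}")
print(f"SRS with {allowance:.0%} allowance: {n_with_allowance}")
```

Using 1.96² = 3.8416 rather than the rounded 3.84 gives 368.79 instead of 368.64; both round up to 369.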
  • 86. 86 Scenario – VII (Data not real, only for exercise)  A neurologist wants to calculate sample size for a study to find out the mean level of plasma phenytoin in patients with seizure disorder selected from tertiary care hospitals in a city by simple random sampling method.
  • 87. 87 1. What are the required Information and assumptions for calculating the sample size?  Mean (x)  Standard Deviation (σ)  Level of Significance (α)  Level of Precision (d or l) (Required)
  • 88. 88 2. What is the Formula? $n = \frac{z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2}$ • σ – Standard Deviation (SD) • d – Level of precision • Z(1−α/2) – Standard normal distribution value for 'α'
  • 89. 89 3. Why do you use this Formula?  Quantitative data  Mean and Standard Deviation
  • 90. 90 Required Information  Earlier studies found a mean level of 15mcg/l of plasma phenytoin and standard deviation of 5mcg/l among patients who have seizure disorder.  The neurologist wants to have a precision of 1.0 mcg/l at 0.05 level of significance.
  • 91. 91 4. What are the data and what are the assumptions given for calculating sample size?  Mean : 15  Standard Deviation (σ) : 5  Level of precision (d) : 1  Level of Significance (α) : 0.05  Standard normal table value : 1.96
  • 92. 92 5. What is the Sample Size? $n = \frac{z_{(1-\alpha/2)}^2 \, \sigma^2}{d^2} = \frac{1.96^2 \times 5^2}{1^2} = 96.04 \approx 97$
  • 93. 93 6. How will you account for refusal and non availability of the selected individuals?  Over sampling / Additional samples  5% to 20%  Depends on the attrition/refusal/non participation/non availability
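For Scenario VII the same idea applies to a mean. A minimal sketch, with an illustrative function name:

```python
import math


def n_single_mean(sd, d, z=1.96):
    """n = z^2 * sigma^2 / d^2 for estimating a mean to within +/- d."""
    return z ** 2 * sd ** 2 / d ** 2


# Scenario VII: SD 5 mcg/l, precision 1 mcg/l, alpha = 0.05
n_raw = n_single_mean(sd=5, d=1)
n_required = math.ceil(n_raw)  # round up, never down
print(f"n = {n_raw:.2f} -> {n_required}")
```

The assumed mean (15 mcg/l) does not enter the formula; only the standard deviation and the desired precision do.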
  • 95. 95 Scenario – VIII (Data not real, only for exercise) • A physiotherapist wants to calculate sample size for a clinical trial on patients with knee osteoarthritis to find out what will be the percentage of patients who will have pain relief when subjected to transcutaneous electrical stimulation (TENS) compared to the percentage of patients who will have pain relief on routine therapy.
  • 96. 96 1. What are the information / assumptions needed to calculate the sample size? • Per cent of patients with pain relief in treatment group • Per cent of patients with pain relief in control group • Level of Significance (α) • Power of the test (1-β)
  • 97. 97 2. What is the Formula? $n = \frac{\left[ z_{(1-\alpha/2)} \sqrt{2PQ} + z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2}$ • P = (p1 + p2)/2 ; Q = (1 – P) • p1 – Proportion in group 1 • p2 – Proportion in group 2 • q1 – (1 − p1) • q2 – (1 − p2) • Z(1−α/2) – Standard normal distribution value for 'α' • Z(1−β) – Standard normal distribution value for 'β'
  • 98. 98 3. Why do you use this Formula?  Qualitative Data  Two proportions are given
  • 99. 99 Required Information  Earlier studies show that 65 per cent of patients subjected to TENS and 25 per cent of patients on routine therapy had pain relief.  The physiotherapist wants to have 90 per cent power and 5 per cent level of significance for the study.
  • 100. 100 4. What are the data and what are the assumptions given for calculating the sample size?  Proportion 1 : 25%  Proportion 2 : 65%  Level of Significance (α) : 0.05  Standard normal table value: 1.96  Power of the test (1-β) : 90%  Standard normal table value : 1.28
  • 101. 101 5. What is the Sample Size Calculated? $n = \frac{\left[ z_{(1-\alpha/2)} \sqrt{2PQ} + z_{(1-\beta)} \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{(p_1 - p_2)^2} = \frac{\left[ 1.96 \sqrt{0.495} + 1.28 \sqrt{0.25 \times 0.75 + 0.65 \times 0.35} \right]^2}{(0.25 - 0.65)^2} = 30.3 \approx 31$ • 31 subjects in each group
  • 102. 102 6. How will you account for refusal and non availability of the selected individuals?  Over Sampling / Additional Samples  5% to 20%  Depends on the attrition/refusal/non participation/non availability
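The two-proportion sample size for the TENS trial can be checked with the sketch below. The function name is illustrative; the z values 1.96 and 1.28 are the ones listed on the slides for α = 0.05 (two-sided) and 90% power.

```python
import math


def n_two_proportions(p1, p2, z_alpha=1.96, z_beta=1.28):
    """Per-group sample size for comparing two proportions (as fractions)."""
    p_bar = (p1 + p2) / 2
    q_bar = 1 - p_bar
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * q_bar)
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


# TENS trial: 25% relief on routine therapy, 65% on TENS
n_raw = n_two_proportions(0.25, 0.65)
n_per_group = math.ceil(n_raw)
print(f"n = {n_raw:.1f} -> {n_per_group} subjects in each group")
```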
  • 104. 104 Scenario – IX (Data not real, only for exercise)  A physician wants to calculate sample size for a clinical trial to find out the mean reduction in systolic blood pressure among hypertensive patients subjected to an experimental drug, compared to hypertensive patients on routine therapy.
  • 105. 105 1. What are the required Information/ assumptions needed to calculate the sample size?  Mean SBP in intervention group (x1)  Mean SBP in control group (x2)  SD for intervention group (σ1)  SD for control group (σ2)  Level of Significance (α)  Power of the test (1-β)
  • 106. 106 2. What is the Formula? $n = \frac{2 s_p^2 \left( z_{(1-\alpha/2)} + z_{(1-\beta)} \right)^2}{\mu_d^2}$ • Sp – Pooled Standard Deviation (SD) • μd – Mean Difference • Z(1−α/2) – Standard Normal distribution value for 'α' • Z(1−β) – Standard Normal distribution value for 'β'
  • 107. 107 3. Why do you use this Formula?  Quantitative Data  Means and Standard Deviations
  • 108. 108 Required Information • Earlier studies have shown a reduction in mean systolic blood pressure from 180 to 120 mmHg for the experimental drug and from 180 to 140 mmHg for the routine therapy. The standard deviation is 40 mmHg for both the groups. • The physician wants to have a 5 per cent level of significance and 90 per cent power for his study.
  • 109. 109 4. What are the data and what are the assumptions given for calculating the sample size?  Mean Difference : 20  Standard Deviation 1 : 40  Standard Deviation 2 : 40  Level of Significance (α) : 0.05  Standard normal table value for α : 1.96  Power of the Test (1-β) : 90%  Standard normal table value for (1-β): 1.28
  • 110. 110 5. What is the Sample Size? $n = \frac{2 s_p^2 \left( z_{(1-\alpha/2)} + z_{(1-\beta)} \right)^2}{\mu_d^2} = \frac{2 \times 40^2 \times (1.96 + 1.28)^2}{20^2} = 83.98 \approx 84$ • 84 subjects in each group
  • 111. 111 6. How will you account for refusal and non availability of the selected individuals?  Over Sampling / Additional Samples  Depends on the attrition/refusal/non participation/non availability
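The two-mean sample size for Scenario IX can be verified in the same way. This is a sketch with an illustrative function name; since both groups have an SD of 40 mmHg, the pooled SD is also 40.

```python
import math


def n_two_means(pooled_sd, mean_difference, z_alpha=1.96, z_beta=1.28):
    """Per-group n = 2 * sp^2 * (z_alpha + z_beta)^2 / difference^2."""
    return 2 * pooled_sd ** 2 * (z_alpha + z_beta) ** 2 / mean_difference ** 2


# Scenario IX: reductions of 60 vs 40 mmHg (difference 20), SD 40 mmHg
n_raw = n_two_means(pooled_sd=40, mean_difference=20)
n_per_group = math.ceil(n_raw)
print(f"n = {n_raw:.2f} -> {n_per_group} subjects in each group")
```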
  • 113. 113 Summary • Data analysis (prevalence, OR, RR, mean, SD, Sn, Sp, PPV, NPV, 95% CI) • Application of appropriate statistical tests (t test, χ² test) • Interpretation of the results (OR > 1, RR > 1, 95% CI) • Sample size assumptions (α, 1−β, precision (d), one-tailed, two-tailed) • Basic data required for sample size calculation • Sample size calculation in different scenarios