Sample size in health sciences - Basics and selected examples
1. Sample size estimation: Basics & selected examples
Dr. S. A. Rizwan, M.D.
Public Health Specialist
SBCM, Joint Program – Riyadh
Ministry of Health, Kingdom of Saudi Arabia
2. Learning objectives
Demystifying statistics! SBCM, Joint Program – Riyadh
• Importance of sample size estimation
• Basic concepts in sample size calculation
• How does sample size relate to study results
• Sample size calculation in specific situations
3. Books and software
• Books
• Sample size determination in health studies - a practical manual
(Lwanga & Lemeshow)
• Sample Size Calculations in Clinical Research (Shein-Chung Chow,
Hansheng Wang, Jun Shao)
• Software
• Epitools, online calculators, StatCalc in Epi Info, G*Power
• PASS, nMaster, StatsDirect, Stata
• Many others
8. Prerequisites for this class
• Understanding of the following basic concepts
• Types of study designs
• Measures of association
• Mean/SD
• Proportion
• Standard error
• Hypothesis testing and types
• Confidence intervals
9. Some related terms
• Significance level
• Power
• Effect size
• Variability
• Precision
Confidence level   Zα (2 sided)   Zα (1 sided)
95%                1.96           1.645
99%                2.576          2.326

Power   Zβ
90%     1.282
85%     1.036
80%     0.842
75%     0.674
70%     0.524
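The Z values in the table above can be recomputed rather than looked up. The following is a minimal sketch using only the Python standard library; the function names `z_alpha` and `z_beta` are illustrative, not part of the original slides.

```python
from statistics import NormalDist


def z_alpha(confidence, two_sided=True):
    """Standard normal deviate for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # A two-sided test splits alpha across both tails.
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)


def z_beta(power):
    """Standard normal deviate for a power such as 0.80."""
    return NormalDist().inv_cdf(power)


if __name__ == "__main__":
    for conf in (0.95, 0.99):
        print(f"{conf:.0%}: 2 sided {z_alpha(conf):.3f}, "
              f"1 sided {z_alpha(conf, two_sided=False):.3f}")
    for power in (0.90, 0.85, 0.80, 0.75, 0.70):
        print(f"power {power:.0%}: {z_beta(power):.3f}")
```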
11. Two aspects of a good sample
• The sample size
• If adequate, then good internal
validity
• The sampling method
• If representative, then good
external validity
12. Why calculate sample size?
• Stating the assumptions and
parameters before start of the
study increases the validity of
statistical conclusions made after
the study
• Post-hoc analysis and results are
considered merely exploratory
13. Thought exercise
• I am applying for a job and in the
resume I have stated that my typing
speed is very fast.
• My friend is applying for the same
job and in his resume he stated that
his typing speed was 60 words/min.
Which candidate are you more likely to assess in a valid manner?
14. Why calculate sample size? (contd.)
• Funds and time constraints
• Really not necessary to study the entire
population (ethical problem!)
• Small samples unable to detect clinically
relevant differences
• If a study with small sample finds non-
significant results – what does it mean?
15. Thought exercise
• Study 1: A study of an antihypertensive drug
in 10,000 people showed a statistically
significant fall in BP of 1 mmHg over 3 months
• Study 2: It was found that there was 30%
reduction in mortality due to propranolol among
MI patients. But that was not significant. 66
cases and 64 controls were studied
Comment on each of the above scenarios.
21. Further considerations for sample size
• Study design
• Cluster design
• Cross over
• Matched/paired
• Type of hypothesis (inequality,
equivalence, non-inferiority & superiority)
• Fixed follow up duration
• Ratio of controls to cases
• Hypothesis testing or CI estimation?
22. How to approach a SS problem?
1. Convert the research question into a statistical problem statement
2. Determine formula or software command & determine inputs needed
3. Select the sources for the inputs
4. Substitute the values in the formula or enter in the software
5. Factor in non-response/drop-out rate
23. How to approach a SS problem?
• First: Convert the research question into a
statistical problem statement
• For eg.,
• To estimate the mean birth weight of
neonates born to mothers with anaemia
in the eastern sector of Riyadh
• Estimation of a single mean with stated
precision
24. How to approach a SS problem?
• Second: Find out the formula or the software
command appropriate for this problem
• For eg.,
• Estimation of a single mean with stated
precision
$N = \dfrac{Z_{\alpha}^{2} \, S^{2}}{L^{2}}$
25. How to approach a SS problem?
• Second: and determine the ingredients you
require to input in the formula
• Exp. proportion, incidence
• Exp. SD
• Exp. RR or OR
• Power, precision
• Confidence level
• Others (DE, ICC, COV, cluster size)
• For eg.,
• Estimate of SD, alpha & precision
26. How to approach a SS problem?
• Third: Selecting the sources for the inputs
• Match the location as close as possible
• Match the study population as close as possible
• Match the study setting as close as possible
• Match the statistic as close as possible
• Or conduct a pilot study
• For eg.,
• Other sector in Riyadh -> some other city in KSA ->
Middle east -> any developing country -> anywhere
27. How to approach a SS problem?
• Third (contd.): What sources to use?
• From where
• Published Literature
• Pilot study
• Experts in the field
• Educated guess (gut feeling)
This raises the question: if we already know these inputs, then
why conduct the study in the first place?
29. How to approach a SS problem?
• Fourth: Substitute the values in the
formula or enter in the software
$N = \dfrac{Z_{\alpha}^{2} \, S^{2}}{L^{2}}$
$N = \dfrac{1.96^{2} \times 600^{2}}{100^{2}}$
N = 138.3
N = 140 (rounded up)
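The substitution above can be checked with a short script. This is a minimal sketch, assuming the inputs shown on the slide (S = 600, L = 100, 95% confidence); the function name `n_single_mean` is illustrative.

```python
import math
from statistics import NormalDist


def n_single_mean(sd, precision, confidence=0.95):
    """Sample size to estimate a single mean within +/- precision."""
    # Two-sided Z value for the chosen confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return (z ** 2 * sd ** 2) / precision ** 2


raw = n_single_mean(sd=600, precision=100)
# Always round up: a fraction of a participant still needs a whole participant.
print(f"raw N = {raw:.1f}, minimum whole N = {math.ceil(raw)}")
```

The slide rounds further up to 140, a convenient figure that keeps a small safety margin.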
30. How to approach a SS problem?
• Fifth: Factor in non-response/drop-out rate
• $\text{Final sample size} = \dfrac{\text{Sample size}}{\text{Expected response rate}}$
• For eg.,
• For a non-response rate of 20%
• Final sample size = 140 / 0.80 = 175
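The same adjustment is reused in every scenario, so it is convenient as a small helper. This is a sketch; the function name `inflate_for_nonresponse` is illustrative.

```python
import math


def inflate_for_nonresponse(n, nonresponse_rate):
    """Divide by the expected response rate and round up to a whole person."""
    if not 0.0 <= nonresponse_rate < 1.0:
        raise ValueError("nonresponse_rate must be in [0, 1)")
    adjusted = n / (1.0 - nonresponse_rate)
    # round() removes floating-point noise so an exact result such as 175
    # is not pushed up to 176 by the ceiling.
    return math.ceil(round(adjusted, 9))


print(inflate_for_nonresponse(140, 0.20))
```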
32. Sample size in specific situations
Authors, original research questions and simplified problem statements:
1. Dr. Nariman
• Original research question: What is the proportion of patients who quit smoking in a tobacco cessation program?
• Simplified problem statement: Estimation of a single proportion for a special group
2. Dr. Ghadeer
• Original research question: What is the incidence of DM in obese hypertensives and what is the incidence of DM in non-obese hypertensive during a five year follow-up period?
• Simplified problem statement: Comparison of incidence rates in two groups in a cohort study
3. Dr. Rahma
• Original research question: What is the proportion of LBW neonates born to sickle cell mothers and what is the proportion of LBW neonates born to normal mothers in a cohort of mothers?
• Simplified problem statement: Comparison of two proportions in a cohort study
4. Dr. Abrar
• Original research question: What is the proportion of ILI absent students in the handwashing schools and what is the proportion of ILI absent students in the control schools? Here schools are the units of randomisation
• Simplified problem statement: Comparison of two proportions in a 2 group cluster RCT
34. Scenario 1 – Step 1
• What is the proportion of patients who quit smoking
in a tobacco cessation program?
• Specifically, what is the proportion of patients with
DM and HTN who quit smoking in a tobacco cessation
program?
• It is a cross-sectional study based on secondary data
analysis
• Estimating a single proportion
35. Scenario 1 – Step 2
• SS formula for estimating a single proportion with stated precision
• Inputs required are expected proportion of quitting, precision & confidence level
36. Scenario 1 – Step 3
• A thorough literature review and preliminary data analysis showed a wide variation in
the expected proportion – from 10% to 50%
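Because the expected proportion is so uncertain, it helps to tabulate the sample size over the whole range. The sketch below assumes the usual single-proportion formula N = Z² p(1 − p) / d² with absolute precision d. The precision values are hypothetical choices for illustration, not figures from the slides, and `n_single_proportion` is an illustrative name.

```python
import math
from statistics import NormalDist


def n_single_proportion(p, precision, confidence=0.95):
    """Sample size to estimate one proportion within +/- precision (absolute)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z ** 2 * p * (1.0 - p) / precision ** 2


# Expected proportions span the 10% to 50% range found in the literature review;
# the absolute precision values below are assumptions made for this example.
for p in (0.10, 0.30, 0.50):
    for d in (0.05, 0.03, 0.01):
        n = math.ceil(n_single_proportion(p, d))
        print(f"p = {p:.2f}, precision = {d:.2f}: N = {n}")
```

For a fixed absolute precision the required N is largest at p = 0.50, which is why 50% is often used as the conservative choice when the proportion is unknown.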
38. Scenario 1 – Step 5
• The concept of dropouts or loss to follow-up is
not applicable in this case because it is secondary
data analysis
• So the sample size should be >400 but need
not be >3500
• Final decision will depend on feasibility
40. Scenario 2 – Step 1
• Hypothesis: the risk of developing (incidence) DM will
be higher in obese hypertensive patients as compared
to non-obese hypertensive patients during a 5 year
follow-up period
• It is a cohort study with two groups
• Exposed is obese hypertensive
• Non-exposed is non-obese hypertensive
• Outcome is incidence of DM
• Estimating a difference between two incidence rates
in a cohort study
41. Scenario 2 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two incidence rates in a cohort study
(Relative Risk – hypothesis test)
2. Comparing two incidence rates in a cohort study
(Relative Risk – stated precision)
3. Comparing two incidence rates in a cohort study
with small proportion and fixed study duration (Risk
difference – hypothesis test)
4. Comparing two proportions (Risk difference –
hypothesis test)
42. Scenario 2 – Step 2
• Method 1: SS formula for estimating RR with stated precision
• Inputs required are expected proportion of disease among exposed & unexposed, RR, Precision,
confidence level
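The formula image from this slide is not reproduced here. As a rough sketch, one commonly cited form for estimating a relative risk to within a relative precision ε (it appears, as far as I recall, in the Lwanga & Lemeshow manual) is N = Z² [(1 − p1)/p1 + (1 − p2)/p2] / [ln(1 − ε)]² per group. The input values below are hypothetical, and the result is not claimed to match the figures obtained in the original software output.

```python
import math
from statistics import NormalDist


def n_rr_relative_precision(p_exposed, p_unexposed, epsilon, confidence=0.95):
    """Per-group N to estimate RR to within a relative precision epsilon.

    Assumes the log-based formula described in the lead-in; this is an
    illustration, not necessarily the method used on the slides.
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # Approximate variance of ln(RR) for one subject per group.
    variance_term = (1.0 - p_exposed) / p_exposed + (1.0 - p_unexposed) / p_unexposed
    return z ** 2 * variance_term / math.log(1.0 - epsilon) ** 2


# Hypothetical inputs: 12% risk among exposed, 3% among unexposed,
# RR to be estimated to within 25% of its true value.
raw = n_rr_relative_precision(0.12, 0.03, epsilon=0.25)
print(f"N per group = {math.ceil(raw)}")
```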
43. Scenario 2 – Step 2
• Method 2: SS formula for hypothesis testing of RR
• Inputs required are expected proportion of disease among exposed & unexposed, RR, power,
confidence level
44. Scenario 2 – Step 2
• SS formula for difference in two proportions (aka risk difference) can also be used for
this scenario
[Formula panels: risk difference between 2 proportions; risk difference between 2 incidence rates with fixed study duration]
46. Scenario 2 – Step 3
• A casual literature review showed that the risk of DM was about 5 times as high among obese HTN as among non-
obese HTN; the incidence among non-obese was 5.4 and among obese was 24.2 per 1000 person years (crude ratio 24.2 / 5.4 ≈ 4.5)
47. Scenario 2 – Step 4
• Method 1 & 2: Substituting the values for a number of scenarios in the software
48. Scenario 2 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 716 / 0.90 ≈ 796 per group (rounded up)
50. Scenario 3 – Step 1
• Hypothesis: the proportion of LBW neonates will be
higher in the sickle cell mothers as compared to the
non-sickle cell mother
• It is a cohort with two groups
• Exposed is mothers with sickle cell disease
• Non-exposed is normal mothers
• Outcome is proportion of LBW
• Estimating a difference between two proportions in
a cohort study
51. Scenario 3 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two incidence rates in a cohort study
(Relative Risk – hypothesis test)
2. Comparing two incidence rates in a cohort study
(Relative Risk – stated precision)
3. Comparing two incidence rates in a cohort study
with small proportion and fixed study duration (Risk
difference – hypothesis test)
4. Comparing two proportions (Risk difference –
hypothesis test)
52. Scenario 3 – Step 2
• Method 1: SS formula for estimating risk difference (hypothesis test)
• Inputs required are expected proportion of disease among exposed & unexposed, power, confidence level
Difference between 2 proportions
53. Scenario 3 – Step 3
• A literature review showed that proportion of LBW among SCD mothers was 16.5% and
in the normal mothers it was 8.3%, with an RR of ~2
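With these two proportions, the hypothesis-test sample size can be sketched as follows. The code assumes the standard pooled-variance formula N = [Zα √(2 p̄ q̄) + Zβ √(p1 q1 + p2 q2)]² / (p1 − p2)² per group, a two-sided test, and illustrative power levels. The software used for the slides may apply a different variant or a continuity correction, so the output is not expected to match the slide's figure exactly.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, power=0.80, confidence=0.95):
    """Per-group N to detect a difference between two proportions (two-sided)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    # Variance under the null (pooled) and under the alternative (unpooled).
    null_sd = math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    alt_sd = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return (z_a * null_sd + z_b * alt_sd) ** 2 / (p1 - p2) ** 2


# LBW proportions from the literature review: 16.5% (SCD) vs 8.3% (normal).
# The power levels are assumptions chosen for illustration.
for power in (0.80, 0.90):
    n = math.ceil(n_two_proportions(0.165, 0.083, power=power))
    print(f"power {power:.0%}: N per group = {n}")
```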
55. Scenario 3 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 331 / 0.90 = 368 per group
57. Scenario 4 – Step 1
• Hypothesis: the proportion of students being absent due
to ILI will be higher in the control schools as compared to
the schools implementing the handwashing program
during a follow up period of 6 weeks
• It is a cluster RCT with two groups
• Exposed is handwashing program
• Non-exposed is no handwashing program
• Outcome is proportion of ILI absenteeism
• School is the unit of randomisation
• Estimating a difference between two proportions in a
cluster RCT
58. Scenario 4 – Step 1
• This problem can be visualised in a number of ways:
1. Comparing two proportions (Risk difference –
hypothesis test using ICC)
2. Comparing two proportions (Risk difference –
hypothesis test using Design Effect)
3. Comparing two proportions (Risk difference –
hypothesis test using Coefficient of variation)
59. Scenario 4 – Step 2
• Method 1: SS formula for comparison of proportions using design effect
• Inputs required are proportion of outcome in the exp. group & control group, size of cluster, DE, power,
confidence level
60. Scenario 4 – Step 2
• Method 2: SS formula for comparison of proportions using intra cluster correlation coefficient
• Inputs required are proportion of outcome in the exp. group & control group, size of cluster, ICC,
power, confidence level
61. Scenario 4 – Step 3
• A literature review showed that incidence of ILI absenteeism was 0.043 in the exp.
group and 0.070 in the control group
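For a cluster RCT, a common approach is to compute the sample size as if individuals were randomised and then multiply by the design effect, DE = 1 + (m − 1) × ICC, where m is the cluster size. The sketch below uses the two proportions from the literature review and a cluster size of 40. The ICC of 0.01 and the 80% power are hypothetical values chosen only for illustration, so the result will differ from the slide's figures.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, power=0.80, confidence=0.95):
    """Per-group N for individually randomised comparison of two proportions."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_b = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2.0
    null_sd = math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    alt_sd = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return (z_a * null_sd + z_b * alt_sd) ** 2 / (p1 - p2) ** 2


def design_effect(cluster_size, icc):
    """Variance inflation caused by sampling whole clusters."""
    return 1.0 + (cluster_size - 1) * icc


CLUSTER_SIZE = 40   # students per school
ASSUMED_ICC = 0.01  # hypothetical; take the real value from literature or a pilot

n_individual = n_two_proportions(0.043, 0.070)
n_clustered = math.ceil(n_individual * design_effect(CLUSTER_SIZE, ASSUMED_ICC))
clusters_per_group = math.ceil(n_clustered / CLUSTER_SIZE)
print(f"individual N = {math.ceil(n_individual)}, "
      f"cluster-adjusted N = {n_clustered}, "
      f"clusters per group = {clusters_per_group}")
```

Rounding the number of clusters up, rather than to the nearest whole number, keeps the achieved sample at or above the target.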
63. Scenario 4
• Method 3: SS formula for comparison of incidence rates (person time)
• Inputs required are incidence rates (PT) in the exp. group & control group, coeff. of variation, power,
confidence level
64. Scenario 4 – Step 5
• Considering a loss to follow-up of 10%
• Final sample size = 1625 / 0.90 ≈ 1806 per group (rounded up)
• No. of clusters required = 1806 / 40 ≈ 45.2, rounded up to 46 per group
65. Review
• Why is sample size calculation important?
• What are the five steps to calculate the SS?
• What are some of the common inputs required
for sample size formulae?
• How will you select an appropriate source for the
inputs of SS formula?
• How will you relate the SS of your study to its
results?
66. Take home messages
• A priori sample size calculation is very
crucial for making valid conclusions
• Follow the stepwise approach
• Sample size estimation does not need to be
very accurate, only adequate
• In case of non-significant findings in a study,
calculate power for deeper understanding
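The last point can be sketched as a power calculation for a two-group comparison of proportions. The code assumes a two-sided normal-approximation test with equal group sizes. The mortality figures in the example (20% vs 14%, a 30% relative reduction, with about 65 patients per group) are hypothetical numbers chosen to resemble a small trial, not data from any study.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, confidence=0.95):
    """Approximate power of a two-sided test comparing two proportions."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_bar = (p1 + p2) / 2.0
    null_sd = math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    alt_sd = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    # Rearranged sample size formula, solved for Z beta.
    z_b = (abs(p1 - p2) * math.sqrt(n_per_group) - z_a * null_sd) / alt_sd
    return nd.cdf(z_b)


# Hypothetical small trial: 20% vs 14% mortality, 65 patients per group.
power = power_two_proportions(0.20, 0.14, n_per_group=65)
print(f"power = {power:.0%}")
```

A power this low means a non-significant result says little about whether the effect exists; the study was simply too small to detect it.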