Sample size by formula
1. SAMPLE SIZE (2)
Dr Htin Zaw Soe
MBBS, DFT, MMedSc (P & TM), PhD, DipMedEd
Associate Professor, Department of Biostatistics
University of Public Health, Yangon
2. Sample size
It is not necessarily true that the bigger the sample size, the better the study becomes.
To get a better study, it is necessary to increase the accuracy of data collection and to have a representative sample.
A desired sample size is determined by the expected variation in the data (i.e. the more varied the data, the larger the sample size needed to get the same level of accuracy).
For exploratory studies, start with a small sample size (e.g. n = 30).
For cross-sectional and analytical studies, the sample size is calculated.
The eventual sample size is usually a compromise between what is desirable and what is feasible.
3. Feasible 'n' - determined by time/manpower/transport/money
Rules: many variables → smaller n
: few variables → larger n
: more varied the data → larger n
: at least 5 to 10 study units per cell in cross-tabulations
Sample size determination
- By formula
- By table of minimum sample size
Sample size calculation formulae
- Divided into two categories
(A) For studies trying to measure a variable with a certain precision
(B) For studies seeking to demonstrate a significant difference between two groups
4. (A) For studies trying to measure a variable with a certain precision
Abbreviations used are:
n = sample size
s = standard deviation
e = required size of standard error
(the margin of error is ± 2 times the standard error (e) if a precision of 95% is required)
r = rate
p = percentage
d = multiplier of the standard error for the chosen confidence level
[For 90% confidence level, d = 1.645]
[For 95% confidence level, d = 1.96, rounded to 2 in the worked examples]
[For 99% confidence level, d = 2.58]
e = width of interval / (2d)
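The relation e = width of interval / (2d) can be sketched in a few lines of Python. This is only an illustration: the function name `standard_error_from_interval` is made up here, and the default d = 2 is the rounded 95% multiplier.

```python
def standard_error_from_interval(lower, upper, d=2.0):
    """Return e = (upper - lower) / (2 * d).

    lower, upper: limits of the desired confidence interval
    d: multiplier for the confidence level (2 is the rounded 95% value)
    """
    if upper <= lower:
        raise ValueError("upper limit must be greater than lower limit")
    if d <= 0:
        raise ValueError("d must be positive")
    return (upper - lower) / (2 * d)


# Desired 95% interval of 2950 to 3050 gram: width 100, d = 2
print(standard_error_from_interval(2950, 3050))  # 25.0
```

The standard error comes out in the same unit as the interval limits (gram, per 10,000, percent).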
5. (1) Single mean
$n = s^2 / e^2$
(2) Single rate
$n = r / e^2$
(3) Single proportion
$n = p(1 - p) / e^2$
(4) Difference between two means (n in each group)
$n = (s_1^2 + s_2^2) / e^2$
(5) Difference between two rates (n in each group)
$n = (r_1 + r_2) / e^2$
(6) Difference between two proportions (n in each group)
$n = [p_1(1 - p_1) + p_2(1 - p_2)] / e^2$
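The six precision formulae can be sketched as small Python functions. The function names are illustrative, not from the lecture. Rounding up to a whole study unit is an assumption added here, and proportions are entered as fractions (0.30) rather than percentages.

```python
import math


def _round_up(x):
    """Round up to a whole study unit, ignoring floating-point noise."""
    return math.ceil(round(x, 9))


def n_single_mean(s, e):
    return _round_up(s**2 / e**2)


def n_single_rate(r, e):
    return _round_up(r / e**2)


def n_single_proportion(p, e):
    return _round_up(p * (1 - p) / e**2)


def n_diff_means(s1, s2, e):
    # n in each group
    return _round_up((s1**2 + s2**2) / e**2)


def n_diff_rates(r1, r2, e):
    # n in each group
    return _round_up((r1 + r2) / e**2)


def n_diff_proportions(p1, p2, e):
    # n in each group
    return _round_up((p1 * (1 - p1) + p2 * (1 - p2)) / e**2)


# Figures taken from the worked examples in this lecture
print(n_single_mean(500, 25))                                # 400
print(n_diff_means(500, 500, 50))                            # 200
print(n_diff_rates(100 / 10_000, 50 / 10_000, 10 / 10_000))  # 15000
print(n_diff_proportions(0.30, 0.15, 0.05))                  # 135
```

Rates and the standard error must be in the same unit, which is why the rate example converts "per 10,000" to plain fractions before dividing.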
6. Single mean
In a study the mean weight of newborn babies will be determined. The
mean weight is expected to be 3000 gram. Weights are approximately
normally distributed and 95% of the birth weights are probably
between 2000 and 4000 gram; therefore the standard deviation would
be 500 gram. The desired 95% confidence interval is 2950 to 3050
gram, so the standard error would be 25 gram. The required sample
size would be:
$n = s^2 / e^2 = 500^2 / 25^2 = 250000 / 625 = 400$ newborn babies
(Note: e = width of interval / 2d = 100 / (2 × 2) = 25)
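The worked example rounds d to 2. As a rough check of how much that rounding matters, the sketch below repeats the calculation with the two-sided 95% multiplier from Python's standard `statistics.NormalDist` (about 1.96). The helper name `multiplier` and the rounding up to a whole baby are assumptions added here.

```python
import math
from statistics import NormalDist


def multiplier(confidence):
    """Two-sided standard normal multiplier, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


s = 500                    # expected standard deviation in gram
lower, upper = 2950, 3050  # desired 95% confidence interval in gram

d_exact = multiplier(0.95)
e_exact = (upper - lower) / (2 * d_exact)
n_exact = math.ceil(s**2 / e_exact**2)

print(round(d_exact, 2))  # 1.96
print(n_exact)            # 385
```

With d = 1.96 the requirement is 385 instead of 400, so rounding d to 2 errs slightly on the safe side.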
9. Difference between two means (sample size in each group)
The difference of the mean birth weights in district A and B will be
determined. In district A the mean is expected to be 3000 gram
with a standard deviation of 500 gram. In district B the mean is
expected to be 3200 gram with a standard deviation of 500 gram.
The difference in mean birth weight between districts A and B is
therefore expected to be 200 gram. The desired 95% confidence
interval of this difference is 100 to 300 gram, giving a standard error
of the difference of 50 gram. The required sample size would be:
$n = (s_1^2 + s_2^2) / e^2 = (500^2 + 500^2) / 50^2 = 200$ newborns in each district
(Note: e = width of interval / 2d = 200 / (2 × 2) = 50)
10. Difference between two rates (sample size in each group)
The difference in maternal mortality rates between urban and rural
areas will be determined. In the rural areas the maternal mortality rate
is expected to be 100 per 10,000 and in the urban areas 50 per 10,000
live births. The difference is therefore 50 per 10,000 live births. The
desired 95% confidence interval of this difference is 30 to 70 per 10,000
live births, giving a standard error of the difference of 10/10,000. The
required sample size would be:
$n = (r_1 + r_2) / e^2 = (100/10{,}000 + 50/10{,}000) / (10/10{,}000)^2 = 15{,}000$ live births in each area
(Note: e = width of interval / 2d = 40 / (2 × 2) = 10 per 10,000)
11. Difference between two proportions (sample size in each group)
The difference in the proportion of nurses leaving the service is
determined between two regions. In one region 30% of the nurses are
estimated to leave the service within three years of graduation, in the
other region 15%, giving a difference of 15%. The desired 95%
confidence interval for this difference is 5% to 25%, giving a standard
error of 5%. The sample size in each group would be:
$n = [p_1(100 - p_1) + p_2(100 - p_2)] / e^2 = (30 \times 70 + 15 \times 85) / 5^2 = 135$ nurses in each region
(Note: e = width of interval / 2d = 20 / (2 × 2) = 5)
12. (B) For studies seeking to demonstrate a significant difference between
two groups
Abbreviations used are:
n = sample size
s = standard deviation
e = required size of standard error
m = mean
r = rate
p = percentage
u = one-sided percentage point of the normal distribution,
corresponding to 100% - the power. The power is the probability of
finding a significant result (e.g. if the power is 75%, u = 0.67)
v = percentage point of the normal distribution, corresponding to the
(two-sided) significance level (e.g. if the significance level is 5% (as
usual), v = 1.96)
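Instead of reading u and v from a z table, they can be computed from the standard normal distribution. A minimal sketch using Python's standard `statistics.NormalDist` follows; the function names are illustrative, and power and significance level are entered as fractions.

```python
from statistics import NormalDist


def u_from_power(power):
    """One-sided point that leaves (1 - power) in the upper tail."""
    return NormalDist().inv_cdf(power)


def v_from_significance(alpha):
    """Two-sided point for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(u_from_power(0.75), 2))         # 0.67
print(round(v_from_significance(0.05), 2))  # 1.96
```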
13. (1) Comparison of two means (n in each group)
$n = (u + v)^2 (s_1^2 + s_2^2) / (m_1 - m_2)^2$
(2) Comparison of two rates (n in each group)
$n = (u + v)^2 (r_1 + r_2) / (r_1 - r_2)^2$
(3) Comparison of two proportions (n in each group)
$n = (u + v)^2 \{p_1(1 - p_1) + p_2(1 - p_2)\} / (p_1 - p_2)^2$
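The three comparison formulae can be sketched in Python as below. The function names, the default power of 80% and rounding up to a whole study unit are assumptions added for illustration. The input figures are borrowed from the earlier precision examples and reused here only as hypothetical inputs.

```python
import math
from statistics import NormalDist


def _u_plus_v_squared(power, alpha):
    u = NormalDist().inv_cdf(power)          # one-sided point for the power
    v = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance point
    return (u + v) ** 2


def n_compare_means(m1, m2, s1, s2, power=0.80, alpha=0.05):
    k = _u_plus_v_squared(power, alpha)
    return math.ceil(k * (s1**2 + s2**2) / (m1 - m2) ** 2)


def n_compare_rates(r1, r2, power=0.80, alpha=0.05):
    k = _u_plus_v_squared(power, alpha)
    return math.ceil(k * (r1 + r2) / (r1 - r2) ** 2)


def n_compare_proportions(p1, p2, power=0.80, alpha=0.05):
    k = _u_plus_v_squared(power, alpha)
    return math.ceil(k * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


# Hypothetical inputs, n in each group
print(n_compare_means(3000, 3200, 500, 500))  # 99
print(n_compare_rates(0.010, 0.005))          # 4710
print(n_compare_proportions(0.30, 0.15))      # 118
```

A higher power or a smaller expected difference both push n up, since the difference enters the denominator squared.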
14. Other formulae (Ref No. 2)
(1) For cross-sectional study
(1.1) For measuring one variable: single proportion
$n = p q (z_\alpha / d)^2$
(the same as $n = p(1 - p) / e^2$, with $e = d / z_\alpha$)
n = sample size
p = the approximate value of the proportion or percentage of
interest to be determined (if it is not known, use 0.5 for p as a
conservative estimate)
q = 1 - p
$z_\alpha$ = percentage point of the normal distribution, corresponding to
the two-sided significance level (can be found from the Standard
Normal Table or z table)
d = precision - how close to the proportion of interest the estimate
is desired to be
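A minimal Python sketch of the single-proportion formula is given below. The function name and rounding up are assumptions added here, and p and d are entered as fractions. It also illustrates why p = 0.5 is the conservative choice: p(1 - p) is largest there.

```python
import math
from statistics import NormalDist


def n_cross_sectional(p, d, alpha=0.05):
    """n = p * q * (z_alpha / d)^2 with q = 1 - p."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(p * (1 - p) * (z_alpha / d) ** 2)


# Unknown proportion, precision of 5 percentage points, 5% significance
print(n_cross_sectional(0.5, 0.05))  # 385

# Any other assumed p gives a smaller n at the same precision
print(n_cross_sectional(0.2, 0.05))  # 246
```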
15. (1.2) For difference between two proportions
$n = z_\alpha^2 (p_1 q_1 + p_2 q_2) / d^2$
(the same as $n = [p_1(1 - p_1) + p_2(1 - p_2)] / e^2$)
$p_1$ = the proportion or percentage of interest to be determined for
group 1
$p_2$ = the proportion or percentage of interest to be determined for
group 2
$q_1 = 1 - p_1$
$q_2 = 1 - p_2$
d = precision
$z_\alpha$ = percentage point of the normal distribution, corresponding to the
two-sided significance level
n = sample size in each group
16. (2) For analytical studies
(2.1) For significant difference between two groups: comparison of
two proportions
$n = (z_\alpha + z_\beta)^2 (p_1 q_1 + p_2 q_2) / (p_1 - p_2)^2$
(the same as $n = (u + v)^2 \{p_1(1 - p_1) + p_2(1 - p_2)\} / (p_1 - p_2)^2$)
$p_1$ = the prevalence, proportion or percentage of interest of group 1
$p_2$ = the prevalence, proportion or percentage of interest of group 2
$q_1 = 1 - p_1$
$q_2 = 1 - p_2$
$z_\alpha$ = percentage point of the normal distribution, corresponding to
the two-sided significance level
$z_\beta$ = one-sided percentage point of the normal distribution,
corresponding to 100% - the power (can be found from the Standard
Normal Table or z table)
17. (2.2) For case control study
$n = 2 (z_\alpha + z_\beta)^2 p q / (p_0 - p_1)^2$
$p_1 = p_0 \times OR / [1 + p_0 (OR - 1)]$
= the estimated proportion of individuals among the cases who
were exposed
$p_0$ = proportion of individuals among the controls whom we expect
to have been exposed
OR = odds ratio that is to be tested as being statistically significant; it is
specified by the investigator
$p = (p_0 + p_1) / 2$
$q = 1 - p$
$z_\alpha$ = percentage point of the normal distribution, corresponding to the
two-sided significance level
$z_\beta$ = one-sided percentage point of the normal distribution,
corresponding to 100% - the power (can be found from the Standard
Normal Table or z table)
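The case control calculation has two steps: derive p1 from p0 and the odds ratio, then apply the sample size formula. A sketch in Python follows. The function names, the default power of 80%, rounding up, and the example inputs (p0 = 0.20, OR = 2) are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def exposed_proportion_in_cases(p0, odds_ratio):
    """p1 = p0 * OR / [1 + p0 * (OR - 1)]."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


def n_case_control(p0, odds_ratio, power=0.80, alpha=0.05):
    """n = 2 * (z_alpha + z_beta)^2 * p * q / (p0 - p1)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1 = exposed_proportion_in_cases(p0, odds_ratio)
    p = (p0 + p1) / 2
    q = 1 - p
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p * q / (p0 - p1) ** 2)


# Hypothetical: 20% of controls exposed, odds ratio of 2 to be detected
print(round(exposed_proportion_in_cases(0.20, 2), 3))  # 0.333
print(n_case_control(0.20, 2))                         # 173
```

The formula carries no "in each group" label on the slide. The sketch assumes n is the number of cases, with an equal number of controls.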
18. (2.3) For cohort study
$n = \frac{1}{1 - f} \left[ 2 (z_\alpha + z_\beta)^2 p q / (p_0 - p_1)^2 \right]$
f = proportion of study subjects who are expected to leave the study
(drop-out)
$p_0$ = proportion of participants in the unexposed group who are
expected to exhibit the outcome of interest
$p_1$ = proportion of participants in the exposed group who are expected
to exhibit the outcome of interest
$p = (p_0 + p_1) / 2$
$q = 1 - p$
$z_\alpha$ = percentage point of the normal distribution, corresponding to the
two-sided significance level
$z_\beta$ = one-sided percentage point of the normal distribution,
corresponding to 100% - the power (can be found from the Standard
Normal Table or z table)
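The drop-out adjustment inflates the unadjusted n by 1 / (1 - f). A sketch in Python, with a hypothetical function name and hypothetical inputs (10% of the unexposed and 20% of the exposed develop the outcome, 10% drop-out, power 80%):

```python
import math
from statistics import NormalDist


def n_with_dropout(p0, p1, f, power=0.80, alpha=0.05):
    """n = 1/(1 - f) * [2 * (z_alpha + z_beta)^2 * p * q / (p0 - p1)^2]."""
    if not 0 <= f < 1:
        raise ValueError("f must be a fraction in [0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p = (p0 + p1) / 2
    q = 1 - p
    unadjusted = 2 * (z_alpha + z_beta) ** 2 * p * q / (p0 - p1) ** 2
    return math.ceil(unadjusted / (1 - f))


print(n_with_dropout(0.10, 0.20, f=0.0))   # 201 without drop-out
print(n_with_dropout(0.10, 0.20, f=0.10))  # 223 allowing 10% drop-out
```

The randomized clinical trial slide uses the same expression, with p0 and p1 read as the control and treatment groups.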
19. (3) For randomized clinical trial
$n = \frac{1}{1 - f} \left[ 2 (z_\alpha + z_\beta)^2 p q / (p_0 - p_1)^2 \right]$
f = proportion of study subjects who are expected to leave the study
(drop-out)
$p_0$ = proportion of participants in the control treatment group who are
expected to exhibit the outcome of interest
$p_1$ = proportion of participants in the treatment group who are
expected to exhibit the outcome of interest
$p = (p_0 + p_1) / 2$
$q = 1 - p$
$z_\alpha$ = percentage point of the normal distribution, corresponding to the
two-sided significance level
$z_\beta$ = one-sided percentage point of the normal distribution,
corresponding to 100% - the power (can be found from the Standard
Normal Table or z table)
20. Sample size determination by table of minimum sample size
[See the manual by Lwanga SK and S Lemeshow (1991)]
21. References:
(1) C. Varkevisser, I. Pathmanathan, & A. Brownlee (2000).
Health Systems Research Training Series: Volume 2 - Designing and
conducting health systems research projects; Part I - Proposal
Development and Fieldwork.
(2) Department of Medical Research (Lower Myanmar) (2010). Lecture
Guide on Research Methodology. 7th edition. Union of Myanmar.
Department of Medical Research (Lower Myanmar), Ministry of
Health: 187.
(3) Lwanga SK and S Lemeshow (1991). Sample size determination in
health studies: A practical manual. WHO. Geneva. pp 80.