Research Method:
Sample size determination
Mengistu Meskele (PhD, Associate Professor)
Why do we need to calculate the sample size? Reasons
To improve precision of the estimates
To consider random sampling error
To assess feasibility
To plan proper implementation
To respond efficiently to the research question
Sample size calculation
Necessary input information
Population size
Hypothesized % frequency of the outcome factor in the population (p)
Confidence limits
Power (1-β)
Ratio of controls to cases
Ratio of unexposed to exposed
Proportion of controls with exposure
Proportion of cases with exposure
Least extreme OR/RR to be detected
Percent of unexposed with outcome
Percent of exposed with outcome
• Design effects: complex sampling strategies
• Potential sample reduction during study implementation:
  • Non-response
  • Losses to follow-up
  • Non-compliance
In order to calculate the required sample size, you need to know the following facts
A) A reasonable estimate of the key proportion to be studied. If you cannot guess the proportion, take it as 50%.
B) The degree of accuracy required, that is, the allowed deviation from the true proportion in the population as a whole. It can be within 1% or 5%, etc.
C) The confidence level required, usually specified as 95%.
D) The size of the population that the sample is to represent. If it is more than 10,000, the precise magnitude is not likely to be very important; but if the population is less than 10,000, then a smaller sample size may be required.
In order to calculate the required sample size, you need to know the following facts…
E) When two population groups are compared (two means or two proportions): the expected difference between the two sub-groups and the power, that is, the probability of detecting that difference as statistically significant.
Estimating a proportion
• Estimate how big the proportion might be (p)
• Choose the margin of error you will allow in the estimate of the proportion (say ± w)
• Choose the level of confidence that the proportion in the whole population is indeed between (p - w) and (p + w). We can never be 100% sure. Do you want to be 95% sure?
• The minimum sample size required, for a very large population (N > 10,000), is:
• n = Z² p(1 - p) / w², where Z is the standard normal value for the chosen confidence level (1.96 for 95%)
Estimating a proportion…
• Example 1 (Prevalence of diarrhoea)
• a) p = 0.26, w = 0.03, Z = 1.96 (i.e., for a 95% C.I.)
• n = (1.96)² (0.26 × 0.74) / (0.03)² = 821.25 ≈ 822
• Thus, the study should include at least 822 subjects.
• b) If the above sample is to be taken from a relatively small population (say N = 3000), the required minimum sample is obtained from the above estimate by applying a finite population correction.
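Part (a) of Example 1 can be reproduced with a few lines of Python. This is only a minimal sketch of the formula n = Z² p(1 - p) / w²; the function name `sample_size_proportion` is an illustrative choice, not something taken from the slides.

```python
import math


def sample_size_proportion(p: float, w: float, z: float = 1.96) -> float:
    """Unrounded sample size for estimating a proportion in a very large population.

    p: expected proportion, w: margin of error, z: standard normal value
    for the chosen confidence level (1.96 for 95%).
    """
    return z ** 2 * p * (1 - p) / w ** 2


# Example 1(a): prevalence of diarrhoea, p = 0.26, w = 0.03, 95% confidence
n_raw = sample_size_proportion(p=0.26, w=0.03)
n_required = math.ceil(n_raw)  # always round up to a whole subject

print(round(n_raw, 2), n_required)  # 821.25 822
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the chosen w.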
Finite population correction for proportions
• With finite populations, correction for proportions is necessary
• If the population is small then the sample size can be reduced slightly.
• This is because a given sample size provides proportionately more information
for a small population than for a large population.
• The initial sample size (n0) can thus be adjusted using the correction formula: n = n0 / (1 + n0/N), where N is the population size
Finite population correction for proportions…
• n = 821.25 / (1 + (821.25/3000)) = 644.7 ≈ 645 subjects
N.B. If you don’t have any information about p, take it as 50%; this gives the maximum value of pq = p(1 - p), which is 1/4 (i.e., 0.25), and therefore the largest sample size.
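The finite population adjustment for the diarrhoea example (N = 3000) can be checked numerically. The sketch below assumes the usual correction n = n0 / (1 + n0/N), which matches the arithmetic shown on the slide; the function name is hypothetical.

```python
import math


def finite_population_correction(n0: float, population: int) -> float:
    """Adjust an initial sample size n0 for a finite population of the given size."""
    return n0 / (1 + n0 / population)


# Initial estimate from Example 1(a): p = 0.26, w = 0.03, Z = 1.96
n0 = 1.96 ** 2 * 0.26 * 0.74 / 0.03 ** 2
n_adjusted = finite_population_correction(n0, population=3000)

print(round(n_adjusted, 1), math.ceil(n_adjusted))  # 644.7 645

# p(1 - p) is largest at p = 0.5, which is why 50% is the safe default
pq_values = {p / 10: round((p / 10) * (1 - p / 10), 2) for p in range(1, 10)}
print(max(pq_values, key=pq_values.get), max(pq_values.values()))  # 0.5 0.25
```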
Comparison of two proportions
• n (in each group) = (p1q1 + p2q2) × f(α,β) / (p1 - p2)², where q1 = 1 - p1 and q2 = 1 - p2
• α = type I error (level of significance)
• β = type II error (1 - β = power of the study)
• power = the probability of getting a significant result when a true difference exists
• f(α,β) = 10.5 when the power = 90% and the level of significance = 5%
• N.B. The sample size can also be calculated using the StatCalc calculator of the Epi Info program
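The value f(α,β) = 10.5 can be derived rather than looked up. The sketch below assumes the common definition f(α,β) = (z for a two-sided α + z for the power)², which reproduces the 10.5 quoted above; the proportions 0.30 and 0.20 in the usage line are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def f_alpha_beta(alpha: float, power: float) -> float:
    """(z_{1 - alpha/2} + z_{power})^2 for a two-sided test (assumed definition)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.90) -> float:
    """Unrounded sample size per group for comparing two proportions."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return variance_sum * f_alpha_beta(alpha, power) / (p1 - p2) ** 2


print(round(f_alpha_beta(0.05, 0.90), 1))  # 10.5

# Hypothetical proportions, not taken from the slides
n_each = math.ceil(n_per_group(p1=0.30, p2=0.20))
print(n_each)
```

Dedicated tools such as StatCalc may apply continuity corrections or other refinements, so their results can differ somewhat from this simple formula.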
Poor symptomatic tuberculosis screening practices in a
quarter of health centres in Ethiopia
Sample size calculation
• p (proportion of health centres which provide tuberculosis screening) = 50%
• 10% margin of error
• 95% confidence level
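Plugging these three assumptions into the single-proportion formula gives the minimum number of health centres. This is a sketch using only the inputs listed on the slide; the published study may have applied further adjustments that are not shown here.

```python
import math

z = 1.96   # 95% confidence level
p = 0.50   # assumed proportion of health centres providing TB screening
w = 0.10   # margin of error

n_raw = z ** 2 * p * (1 - p) / w ** 2
n_health_centres = math.ceil(n_raw)

print(round(n_raw, 2), n_health_centres)  # 96.04 97

# Halving the margin of error roughly quadruples the requirement
n_at_5_percent = math.ceil(z ** 2 * p * (1 - p) / 0.05 ** 2)
print(n_at_5_percent)  # 385
```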
Appropriate health-seeking behavior and associated factors among
people who had cough for at least two weeks in Ethiopia
Sample size calculation
• 5% level of significance
• 4.2% prevalence of cough
• Total adult population (aged ≥15 years) of 43,128
• 2% margin of error
• A design effect of 2
• Non-response rate of 10%
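The listed assumptions can be chained into one calculation. The sketch below makes several assumptions that the slide does not state: the finite population correction is applied before the design effect, and non-response is handled by adding 10% to the sample (dividing by 0.90 instead is another common convention and gives a slightly larger number). The function name and step order are illustrative.

```python
import math


def survey_sample_size(p, w, population, design_effect, nonresponse, z=1.96):
    """Single-proportion sample size with FPC, design effect and non-response.

    Assumed order of steps: base formula -> finite population correction
    -> multiply by design effect -> inflate by the non-response rate.
    """
    n0 = z ** 2 * p * (1 - p) / w ** 2
    n_fpc = n0 / (1 + n0 / population)
    n_design = n_fpc * design_effect
    return n0, math.ceil(n_design * (1 + nonresponse))


n0, n_final = survey_sample_size(
    p=0.042,            # prevalence of cough
    w=0.02,             # margin of error
    population=43_128,  # total adult population
    design_effect=2,
    nonresponse=0.10,
)
print(round(n0, 1), n_final)
```

Because the population is large relative to n0, the finite population correction changes the result very little here; the design effect and the non-response allowance dominate.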
Prevalence and factors affecting work-related
injury in Ethiopia
• P1 (prevalence of work-related injury among small-scale industry workers) = 50%
• P2 (prevalence of work-related injury among medium-scale industry workers) = 40%
• 5% level of significance
• Power of 80%
• Allocation ratio of small- to medium-scale industrial workers (n1 : n2) of 1:2
• Non-response rate of 5%
• OR = 1.5
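These inputs can be combined in a rough two-group calculation. The sketch below extends the simple two-proportion formula to unequal allocation by dividing the second group's variance term by the ratio k = n2/n1; this is an assumption made for illustration, and tools such as Epi Info use related but not identical formulas, so the published figure may differ. Adding 5% for non-response (rather than dividing by 0.95) is also an assumed convention.

```python
import math
from statistics import NormalDist

p1, p2 = 0.50, 0.40     # injury prevalence: small-scale vs medium-scale workers
alpha, power = 0.05, 0.80
k = 2                   # allocation ratio n2 / n1 (1:2)
nonresponse = 0.05

# The stated OR follows from the two prevalences
odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_beta = NormalDist().inv_cdf(power)
f = (z_alpha + z_beta) ** 2

# Variance of (p1 - p2) is p1*q1/n1 + p2*q2/(k*n1); solve for n1
n1 = math.ceil(f * (p1 * (1 - p1) + p2 * (1 - p2) / k) / (p1 - p2) ** 2)
n2 = k * n1
total_with_nonresponse = math.ceil((n1 + n2) * (1 + nonresponse))

print(round(odds_ratio, 2), n1, n2, total_with_nonresponse)
```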
EXAMPLE
Aim: To estimate the effect of second-hand smoke on the risk of active tuberculosis
Design: Cohort study
Statistical analysis: Estimation: RR of TB (second-hand smoke exposure vs no exposure)
Assumptions / Decisions
• Significance level: 0.05
• Power: 0.80
• Expected baseline TB incidence among unexposed: 6 / 1 000 000 person-years
• Minimally relevant effect to detect: RR = 2
Sample size calculation: Varying scenarios
Sample size
Effect of change in significance level, power, aims, design and population characteristics (assumptions)
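How the requirement moves when the assumptions change can be explored with a small loop. The sketch below reuses the cohort example above (baseline incidence 6 per 1 000 000 person-years, RR = 2) and, as a simplifying assumption, treats the incidence per person-year as a proportion in the two-proportion formula; a proper person-time calculation would use a rate-based method. The scenario grid is illustrative.

```python
from statistics import NormalDist


def person_years_per_group(p0: float, rr: float, alpha: float, power: float) -> float:
    """Approximate person-years per group, treating incidence as a proportion."""
    p1 = p0 * rr
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    f = (z_alpha + z_beta) ** 2
    return (p0 * (1 - p0) + p1 * (1 - p1)) * f / (p1 - p0) ** 2


baseline = 6 / 1_000_000  # TB incidence among the unexposed
scenarios = {}
for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        scenarios[(alpha, power)] = person_years_per_group(baseline, 2, alpha, power)

for (alpha, power), n in scenarios.items():
    print(f"alpha={alpha}, power={power}: about {n:,.0f} person-years per group")

# A larger minimally relevant effect needs far fewer person-years
n_rr3 = person_years_per_group(baseline, 3, 0.05, 0.80)
print(f"RR=3: about {n_rr3:,.0f} person-years per group")
```

Stricter significance levels and higher power both increase the requirement, while a larger effect to detect reduces it sharply.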
Non-Probability Sampling
• The probability of inclusion is not known, so the sampling error cannot be estimated either
• The degree to which the sample represents the population of interest is not known
Uses
• When identifying the sampling frame is not possible or practical
• Hard-to-identify populations
• When generalization is not the main purpose
• Clinical trials
Types of Non-Probability Sampling
Purposeful/purposive sampling
Convenience sampling
Sample Size
One that adequately answers the research question:
• No hard and fast rules (can be one)
• Selection continues to the point of redundancy (until new categories, themes or explanations stop emerging from the data)
• Depends on available time and resources
Selection of Subjects
Knowledgeable: topic, culture, context, language….
Interactive: open minded, free to speak, clear
Responsive: immediate reaction, critical, sensitive to circumstances
Good memory: good recall of events
Purposive Sampling
Making strategic choices about selecting the study participants and context: whom to choose, where, and how
• There is no one best sampling strategy; it depends on the context and the nature of the research objective
• Focus on the most productive sample
• Consider representing the range of variation
• Be flexible: ongoing analysis can indicate subsequent selection; watch for “missing” voices
• Theoretical representativeness
Less rigorous sampling method
• Convenience sampling: selecting readily available respondents; the weakest sampling scheme due to low credibility
Example: a study on contraceptive use started in clinics
• A group of women who met the researchers outside in a village were spontaneously included.
Some suggestions of sample size in qualitative studies
• The smallest number of participants should be 15
• The number should lie under 50
• 6-8 participants per focus group discussion (FGD) and at least 2 FGDs per population group
• IMPORTANT
  • Attainment of saturation
  • Justification of the choice of number
Task
• Estimate the sample size for the proposed study by considering potential assumptions for the research question/objectives
