Sample size determination

SAMPLE SIZE
DETERMINATION
AUGUSTINE GATIMU NJUGUNA
Augustine Gatimu Njuguna-PhD (Epidemiology) Candidate-JKUAT, FUoN
(Epidemiology) Candidate - UoN, FUoN (Health Informatics) – UoN .MSc.
Medical Statistics – UoN BScN – KeMU.

Session Outline
• What is sample size?
• Basic information needed for sample size calculation.
• Why determine sample size?
• How large a sample do we need?
• What are the methods of determining it?
• What are the factors that affect it?
• Types of measurement in research.
• How do we determine sample size?
• Conclusion

What is a Sample?
• This is the sub-population to be studied in order to draw an inference
about a reference population (the population to which the findings of
the study are to be generalized).
• In a census, the sample size is equal to the population size.
• However, in research, because of time and budget constraints, a
representative sample is normally used.
• The larger the sample, the more precise the findings from a study will be.

What is a Sample? (cont'd)
• The availability of resources sets the upper limit of the sample size.
• The required accuracy sets the lower limit of the sample size.
• Thus, an optimum sample size is an essential component of any
research.

Basic Information Needed for
Sample Size Calculation
The approach to sample size calculation can be arrived at by thinking through the
following set of questions:
• What type of study is this?
  ◦ Single sample (prevalence survey)
  ◦ Comparison of two groups (cross-sectional, case-control, cohort study)
• What is the main (primary) outcome?
  ◦ Mean of a measurement (e.g. mean blood pressure)
  ◦ Proportion
  ◦ Ordered scale (e.g. pain scores)
• What is the expected variability between the subjects?
• How large a difference would be considered clinically important and reasonable?

What is sample size determination?
• Sample size determination is the mathematical estimation of the
number of subjects/units to be included in a study.
• When a representative sample is taken from a population, the findings
can be generalized to the population.
• Optimum sample size determination is required for the following
reasons:
  ◦ To allow appropriate analysis
  ◦ To provide the desired level of accuracy
  ◦ To ensure the validity of the significance test

How large a sample do we need?
If the sample is too small:
1. Even a well-conducted study may fail to answer its research
question.
2. It may fail to detect important effects or associations.
3. It may estimate these effects or associations imprecisely.

If the sample size is too large:
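1. The study will be difficult and costly.
2. It will take more time than may be available.
3. Accuracy may be lost, for example because data quality is harder to
control in a very large study.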
Hence, optimum sample size must be determined before
commencement of a Study.

Types of Measurement in Research
• Random error
• Systematic error (bias)
• Precision (reliability)
• Accuracy (Validity)
• Effect size
• Design effect
• Type I (α) error
• Type II (β) error
• Power (1-β)
• Null hypothesis
• Alternative hypothesis

Definition of terms
• Random error: Errors that occur by chance. Sources are sample
variability, subject to subject differences & measurement errors. These
can be reduced by averaging, increasing sample size, repeating the
experiment.
• Systematic error: Deviations not due to chance alone. Several factors,
e.g. patient selection criteria may contribute. It can be reduced by good
study design and conduct of the experiment.
• Precision: The degree to which a variable has the same value when
measured several times. It is a function of random error.
• Accuracy: The degree to which a variable actually represents the true
value. It is a function of systematic error.

• Power: This is the probability that the test will correctly identify a
significant difference, effect or association in the sample should one
exist in the population. Power increases with sample size: the larger
the sample, the greater the power of the study to detect a significant
difference, effect or association.
• Effect size: A measure of the strength of the relationship between
two variables in a population. The bigger the effect in the
population, the easier it is to detect.

• Design effect: Geographic clustering is generally used to make the
study easier & cheaper to perform. The effect on the sample size
depends on the number of clusters & the variance between & within
the clusters.
In practice, this is determined from previous studies and is expressed as a
constant called the ‘design effect’, often between 1.0 & 2.0. The sample size for
a simple random sample is multiplied by the design effect to obtain the sample
size for the cluster sample.
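
The multiplication described above can be sketched in Python. This is a minimal illustration only: the function name and the example values (a simple-random-sample size of 384 and a design effect of 1.5) are assumptions chosen for the sketch, not figures from a real survey.

```python
from math import ceil

def cluster_sample_size(n_srs, design_effect):
    """Inflate a simple-random-sample size by the design effect.

    n_srs: sample size computed for a simple random sample.
    design_effect: constant taken from previous studies (assumed known).
    """
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    # Round up, since a fraction of a subject cannot be recruited.
    return ceil(n_srs * design_effect)

# Hypothetical values: 384 subjects under simple random sampling, design effect 1.5.
print(cluster_sample_size(384, 1.5))
```

A design effect of 1.0 leaves the sample size unchanged; a design effect of 2.0 doubles it.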

• Null hypothesis: It states that there is no difference among groups or
no association between the predictor & the outcome variable. This
hypothesis needs to be tested.
• Alternative hypothesis: It contradicts the null hypothesis. Because the
alternative hypothesis cannot be tested directly, it is accepted by
exclusion if the test of significance rejects the null hypothesis. There
are two types: one-tailed (one-sided) and two-tailed (two-sided).

• A type I error occurs if you reject the null hypothesis when it is true.
• A type II error occurs if you do not reject the null hypothesis when it
is false.
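
The link between sample size, type II error and power defined above can be illustrated with a rough Python sketch. It assumes a two-sided z-test comparing two equal-sized groups on a proportion, uses a normal approximation, and ignores the negligible opposite tail. The function name and the example inputs (p = 0.4, a difference of 0.10) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, p, d, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference d
    between two proportions near p, with n_per_group in each group."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # critical value for the type I error rate
    se = sqrt(2 * p * (1 - p) / n_per_group)      # standard error of the difference
    return std_normal.cdf(abs(d) / se - z_crit)   # power = 1 - type II error rate

# Power rises as the sample size per group rises (hypothetical inputs).
for n in (100, 200, 400, 800):
    print(n, round(approx_power(n, p=0.4, d=0.10), 2))
```

Under these assumptions, the type II error rate β is simply one minus the value returned.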

At what stage can sample size be addressed?
• It can be addressed at two stages:
1. Calculation of the optimum sample size is required during the
planning stage, while designing the study; this requires information on
some parameters.
2. At the stage of interpretation of the result.

Approaches for estimating sample size
• Approaches for estimating sample size depend primarily on:
1. The study design &
2. The main outcome measure of the study
There are distinct approaches for calculating sample size for different
study designs & different outcome measures.

Procedure for calculating sample size
• There are 3 procedures that can be used for calculating sample size:
1. Use of formulae
2. Ready-made tables
3. Computer software

Sample Size Formula
• The formula requires that we (i) specify the level of confidence we
wish to have, (ii) estimate the variance in the population, and (iii)
specify the level of accuracy we want.
• Once we have specified these, the formula tells us the sample size we
need to use, n.

Use of formulae for sample size calculation &
power analysis
• There are many formulae for calculating sample size & power in
different situations for different study designs.
• The appropriate sample size for a population-based study is
determined largely by 3 factors:
1. The estimated prevalence of the variable of interest.
2. The desired level of confidence.
3. The acceptable margin of error.

To calculate the minimum sample size required for accuracy in estimating
proportions, the following decisions must be taken:
• Decide on a reasonable estimate of the key proportion (p) to be measured in the
study.
• Decide on the degree of accuracy (d) that is desired in the study, typically 1% to 5%
(0.01 to 0.05).
• Decide on the confidence level you want to use, usually 95%, for which z = 1.96.
• Determine the size (N) of the population that the sample is supposed to
represent.
• Decide on the minimum difference you expect to detect as statistically significant.

For population > 10,000
• $n = \dfrac{z^2 p q}{d^2}$
n = desired sample size (when the population > 10,000)
z = standard normal deviate, usually set at 1.96 (or approximately 2), which corresponds to
the 95% confidence level
p = proportion in the target population estimated to have a particular
characteristic. If there is no reasonable estimate, use 50% (i.e. 0.5)
q = 1 - p (proportion in the target population not having the particular
characteristic)
d = degree of accuracy required (margin of error), usually set at 0.05 (occasionally at 0.02)

Example 1
• If the proportion of a target population with a certain characteristic is 0.50,
the z statistic is 1.96 & we desire accuracy at the 0.05 level, then the sample size
is:
$n = \dfrac{(1.96)^2 (0.5)(0.5)}{(0.05)^2}$
$n = 384.16 \approx 384$
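
Example 1 can be checked with a few lines of Python. This is a minimal sketch of the formula above; the function and parameter names are illustrative choices, not part of any standard package.

```python
def sample_size_proportion(p, d, z=1.96):
    """Sample size for estimating a proportion in a large population.

    p: expected proportion, d: margin of error, z: standard normal deviate.
    """
    q = 1 - p
    return (z ** 2) * p * q / (d ** 2)

n = sample_size_proportion(p=0.5, d=0.05)
print(n)         # about 384.16
print(round(n))  # 384, as quoted in Example 1
```

Some sources round up to the next whole subject instead, which gives 385.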

If the study population is < 10,000
$n_f = \dfrac{n}{1 + n/N}$
• $n_f$ = desired sample size when the study population is < 10,000
• n = desired sample size when the study population is > 10,000
• N = estimate of the population size
For example, if n were found to be 400 and the population size were
estimated at 1,000,
then $n_f$ is calculated as follows:
$n_f = 400 / (1 + 400/1000)$
$n_f = 400 / 1.4$
$n_f \approx 286$
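
The same correction can be sketched in Python, using the example values from this slide. The function name is an illustrative assumption.

```python
def finite_population_size(n, population):
    """Adjust a large-population sample size n for a small population."""
    return n / (1 + n / population)

nf = finite_population_size(n=400, population=1000)
print(nf)         # about 285.7
print(round(nf))  # 286
```

The adjusted size approaches n as the population grows, which is why the correction is usually skipped for large populations.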

Sample size formula for comparison of groups
• If we wish to test a difference (d) between two sub-samples regarding a
proportion & can assume an equal number of cases ($n_1 = n_2 = n'$) in the two
sub-samples, the formula for $n'$ is
$n' = \dfrac{2 z^2 p q}{d^2}$
• E.g. suppose we want to compare an experimental group against a
control group with regard to women using contraception. If we
expect p to be 0.40 & wish to conclude that an observed difference of
0.10 or more is significant at the
0.05 level, the sample size will be:
$n' = \dfrac{2 (1.96)^2 (0.4)(0.6)}{(0.10)^2}$
$n' = 184.4 \approx 184$
Thus, 184 experimental subjects & another 184 control subjects are
required.
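
The contraception example can be reproduced with a short Python sketch of the formula for n'. The function and parameter names are illustrative assumptions.

```python
def two_group_sample_size(p, d, z=1.96):
    """Per-group sample size for detecting a difference d between two
    equal-sized groups on a proportion expected to be about p."""
    q = 1 - p
    return 2 * (z ** 2) * p * q / (d ** 2)

n_per_group = two_group_sample_size(p=0.4, d=0.10)
print(n_per_group)         # about 184.4
print(round(n_per_group))  # 184 per group
```

Note that this formula fixes only the significance level; it contains no term for the desired power.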

Use of ready made table for sample size calculation
• How large a sample of patients should be followed up if an investigator wishes to
estimate the incidence rate of a disease to within 10% of its true value with 95%
confidence?
• The table shows that for e = 0.10 & a confidence level of 95%, a sample size of 385
would be needed.
• The table can also be used to find the sample size for other values of
the relative precision & confidence level, e.g. if the level of confidence is reduced to
90%, then the sample size would be 271.
• Such tables, which give ready-made sample sizes, are available for different designs &
situations.
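
The two table values quoted above are consistent with the formula n = (z/e)², rounded up, for estimating an incidence rate with relative precision e. That this is how the table was built is an assumption; the sketch below only shows that the formula reproduces the quoted figures. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def incidence_rate_sample_size(relative_precision, confidence):
    """Sample size to estimate an incidence rate to within a given
    relative precision, assuming n = (z / e)^2 rounded up."""
    # Two-sided standard normal deviate for the chosen confidence level.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil((z / relative_precision) ** 2)

print(incidence_rate_sample_size(0.10, 0.95))  # 385
print(incidence_rate_sample_size(0.10, 0.90))  # 271
```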

Use of computer software for sample size
calculation & power analysis
• The following software can be used for calculating sample size & power:
  ◦ Epi Info
  ◦ nQuery
  ◦ Power and Precision
  ◦ Sample
  ◦ Stata
  ◦ SPSS

Epi-info for sample size determination
• In STATCALC:
1. Select SAMPLE SIZE & POWER.
2. Select POPULATION SURVEY.
3. Enter the size of the population (e.g. 15,000).
4. Enter the expected frequency (an estimate of the true prevalence,
e.g. 80% ± your minimum standard).
5. Enter the worst acceptable result (e.g. 75%), i.e. the margin of error is
5%.
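
As a cross-check on these inputs, the formulae from the earlier slides can be applied by hand. The Python sketch below is not Epi Info's own code, and Epi Info's output may differ slightly; it simply combines the large-population formula with the small-population correction, assuming a 95% confidence level. The function name is illustrative.

```python
from math import ceil

def population_survey_sample_size(population, expected, worst, z=1.96):
    """Hand calculation for a population survey (not Epi Info itself).

    expected: expected frequency; worst: worst acceptable result.
    """
    d = abs(expected - worst)                            # margin of error
    n = (z ** 2) * expected * (1 - expected) / (d ** 2)  # large-population size
    nf = n / (1 + n / population)                        # adjust for population size
    return ceil(nf)

# Inputs from the steps above: N = 15,000, expected 80%, worst acceptable 75%.
print(population_survey_sample_size(15000, 0.80, 0.75))  # 242
```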

CONCLUSIONS
1. Sample size determination is one of the most essential components of
every research study.
2. The larger the sample size, the higher the degree of accuracy, but
this is limited by the availability of resources.
3. It can be determined using formulae, ready-made tables and computer
software.
Steps:
1. Formulate a research question.
2. Select an appropriate study design, primary outcome measure and
level of statistical significance.
3. Use the appropriate formula to calculate the sample size.
