1. SAMPLE SIZE CALCULATIONS
Presented By:
Dr. Nivedita Yadav
Dr. Parul Singhal
Dr. Kanishka Tyagi
Dr. Akanksha Sirohi
Dr. Aarushi
Dr. Aanchal Singh
Guided By:
Dr. Kaynat Nasser
2. NEED FOR SAMPLE SIZE
CALCULATION
• Sample-size determination is often an important step in
planning an epidemiological study.
• An adequate sample size helps ensure that the study will
yield reliable information.
• Conducting a study with an inadequate sample size is not
only futile, it is also unethical.
• Different study designs need different methods of sample
size calculation; one formula cannot be used for all
designs.
• Determining sample size is a very important issue
because samples that are too large may waste time,
resources and money, while samples that are too small
may lead to inaccurate results.
3. • Sampling frame: It is a complete
enumeration of the sampling units in the study
population, which may be a list, directory,
map or aerial configuration.
• Sampling unit: It may be an individual, a
household or a school.
Non-representativeness of the study population results in lowered accuracy.
Small sample size leads to low precision.
6. KNOWLEDGE OF THE POPULATION PARAMETERS
By pilot surveys
By use of results of previous surveys
By intelligent guess
7. BASIS FOR DETERMINING THE SIZE OF
SAMPLE
Specification of a precision level.
Specification of level of confidence.
Power: The likelihood of rejecting the null hypothesis
when the null hypothesis is false.
8. MARGIN OF ERROR/SAMPLING ERROR
The margin of error is a statistic expressing the amount of random sampling error in a survey's
results.
The larger the margin of error, the less confidence one can have that the results are close to the true population value.
The difference between the sample statistic and the related population parameter is called the
sampling error.
[Figure: relationship between margin of error and sample size]
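The link between margin of error and sample size can be illustrated with a short sketch. It assumes the usual normal-approximation margin of error for a proportion, z·sqrt(p(1 – p)/n), with p = 0.5 and z = 1.96 chosen purely for illustration; the function name is hypothetical and not taken from the slides.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    p is the assumed proportion, n the sample size, and z the
    reliability coefficient (1.96 for 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative values only: p = 0.5 gives the widest margin for a given n.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under these assumptions, quadrupling the sample size halves the margin of error, which is why small gains in precision become expensive at large n.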
10. SAMPLE SIZE
The choice of sample size depends on non-statistical
and statistical considerations.
Non-statistical considerations: availability of manpower and
sampling frames.
Statistical considerations: precision of the estimate
of prevalence and the expected prevalence of the
disease.
11. SAMPLE SIZE REQUIRED FOR ESTIMATING
POPULATION MEAN
• Suppose we want an interval that extends d units on either side of the estimator:
$$d = (\text{reliability coefficient}) \times (\text{standard error})$$
• If sampling is from a sufficiently large population, the equation is:
$$d = z \frac{\sigma}{\sqrt{n}}$$
• Solving for n gives:
$$n = \frac{z^2 \sigma^2}{d^2}$$
The required sample size therefore depends on:
• the half-width of the confidence interval (d)
• the level of confidence (z)
• the population variance (σ²)
12. SAMPLE SIZE FOR POPULATION MEAN
A farm has 1000 young pigs with an initial weight of about 50 kg.
The pigs are put on a new diet for 3 weeks, and the farm wants to know how
many pigs to sample in order to estimate the average weight
gain. The estimate should be within 2 kg of the true value with a 90% confidence
level.
The standard deviation σ (SD) of the weight gain is unknown.
For a 90% confidence level, z = 1.645.
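As a rough sketch of how the calculation would proceed once σ is available, the snippet below applies n = z²σ²/d² to the pig example. The value of σ used here (8 kg) is purely hypothetical, standing in for an estimate from a pilot survey or previous study; the function names are likewise illustrative.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided reliability coefficient, e.g. 0.90 -> about 1.645."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma: float, d: float, confidence: float) -> int:
    """n = z^2 * sigma^2 / d^2, rounded up to a whole sampling unit."""
    z = z_for_confidence(confidence)
    return math.ceil((z * sigma / d) ** 2)

# ASSUMPTION: sigma = 8 kg is a made-up placeholder, not a value from the slides.
ASSUMED_SIGMA_KG = 8.0
n = sample_size_mean(sigma=ASSUMED_SIGMA_KG, d=2.0, confidence=0.90)
print(n)  # 44 pigs under the assumed sigma
```

With a different estimate of σ the answer changes with σ², so the quality of the pilot estimate matters as much as the choice of d.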
13. SAMPLE SIZE REQUIRED FOR ESTIMATING
PROPORTIONS
• The approach is the same as for the population mean.
• Assuming random sampling and approximate normality
in the distribution of p, the formula for n, if
sampling is with replacement from a population
sufficiently large to warrant ignoring the finite
population correction, is:
$$n = \frac{z^2 p q}{d^2}$$
where q = 1 – p
14. WHAT SAMPLE SIZE FOR PROPORTION
• A researcher wants to estimate the true FMD immunization coverage in the cattle population
of a village.
• According to the literature, the immunization coverage should be around 80%.
• Precision (absolute): we would like the result to be within 4 percentage points of the true value.
• Confidence level: conventional = 95% = 1 – α; therefore α = 0.05 and z(1–α/2) = 1.96, the
value of the standard normal distribution corresponding to a significance level of 0.05
(1.96 for a 2-sided test at the 0.05 level).
• d = absolute precision = 0.04
• p = expected proportion in the population = 0.80
$$n = \frac{z^2 \, p \, (1-p)}{d^2} = \frac{(1.96)^2 (0.80)(0.20)}{(0.04)^2} = 384.16$$
So about 384 animals are needed (385 when rounded up to the next whole animal).
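The worked example above can be checked with a minimal sketch. The function name is illustrative, and rounding up to the next whole unit is a convention assumed here rather than something stated on the slide.

```python
import math

def sample_size_proportion(p: float, d: float, z: float = 1.96) -> int:
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole sampling unit."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)

# Values from the FMD immunization example: p = 0.80, d = 0.04, z = 1.96.
print(sample_size_proportion(p=0.80, d=0.04))  # 385 (384.16 before rounding up)
```

When no prior estimate of p is available, p = 0.5 is often used because it maximizes p(1 – p) and therefore gives the most conservative sample size.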
15. DESCRIPTIVE STUDIES
• In general, these studies can only identify patterns or trends in
disease occurrence over time or in different geographical
locations, but cannot ascertain the causal agent or degree of
exposure.
• To calculate the required sample size in a descriptive study, we
need to know the level of precision, level of confidence or risk
and degree of variability.
16. FINITE POPULATION CORRECTION FACTOR
When population sizes are less than 10 times the
estimated sample size, it is possible to use a
finite population correction factor.
The finite population correction factor measures
how much extra precision we achieve when the
sample size becomes close to the population
size.
N is the size of the population and n is the size of
the sample.
If the fpc is close to 1, there is almost no effect.
When the fpc is much smaller than 1, sampling a
large fraction of the population has a real effect
on precision.
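The slide text does not show the formula itself. A commonly used form is fpc = sqrt((N – n)/(N – 1)), which multiplies the standard error; the sketch below assumes that form, and the function name is illustrative.

```python
import math

def fpc(population_size: int, sample_size: int) -> float:
    """Finite population correction factor, assuming the common form
    sqrt((N - n) / (N - 1)) applied to the standard error."""
    N, n = population_size, sample_size
    return math.sqrt((N - n) / (N - 1))

# A small sample from either population: factor is close to 1 (little effect).
print(round(fpc(10_000, 50), 4), round(fpc(10_000_000, 50), 4))

# A sample that is a large fraction of the population: factor well below 1.
print(round(fpc(10_000, 4_000), 4))
```

Under this assumed form, a factor of about 0.77 means the standard error is roughly 23% smaller than it would be for an effectively infinite population.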
17. INDEPENDENT CASE-CONTROL STUDIES
α = alpha, β = 1 – power, ψ = odds ratio
m = number of control subjects per case subject
p0 = probability of exposure in controls; p0 can be estimated as the
population prevalence of exposure
nc = continuity-corrected sample size
Zp = standard normal deviate for probability p
24. SAMPLE SIZE CALCULATION FOR TESTING A
HYPOTHESIS (CLINICAL TRIALS OR CLINICAL
INTERVENTIONAL STUDIES)
25. RESOURCE EQUATION METHOD
The method depends on the size of the whole experiment and the
number of treatment groups, not the individual group
sizes.
E is the total number of animals minus the number of treatment groups.
If the value of E is less than 10, more animals should
be included; if it is more than 20, the sample size
should be decreased.
The resource equation method is useful when there is
no previous estimate of the standard deviation.
26. RESOURCE EQUATION METHOD EXAMPLE
For example, if a factorial experiment is planned with both sexes
and three dose levels then there will be six treatment groups. If it
is proposed that there should be eight animals in each treatment
group (as is common), there will be 48 animals in total and E = 48
– 6 = 42. This experiment is unnecessarily large.
Redesigning it with four animals per group, E = 24 – 6 = 18,
which is within the suggested limits of 10 – 20.
A power analysis should be used in preference to the resource
equation method wherever possible.
Unfortunately, power analysis is not so easy to use when there are
more than two groups because it is more difficult (but not
impossible) to specify the effect size of interest.
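The resource equation check in the example above can be sketched as follows, taking E as the total number of animals minus the number of treatment groups and using the 10 to 20 limits quoted on the slide. The function names and the wording of the verdicts are illustrative.

```python
def resource_equation_e(animals_per_group: int, n_groups: int) -> int:
    """E = total number of animals - number of treatment groups."""
    total_animals = animals_per_group * n_groups
    return total_animals - n_groups

def assess(e: int, low: int = 10, high: int = 20) -> str:
    """Classify E against the suggested limits of 10 to 20."""
    if e < low:
        return "too small"
    if e > high:
        return "too large"
    return "acceptable"

# Two sexes x three dose levels = six treatment groups.
for per_group in (8, 4):
    e = resource_equation_e(per_group, n_groups=6)
    print(per_group, e, assess(e))
```

This reproduces E = 42 for eight animals per group and E = 18 for four per group.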
27. WHAT FACTORS AFFECT THE POWER OF A
TEST?
To increase the power of your test, you may do any of the
following:
1. Increase the effect size (the difference between the null
and alternative values) to be detected
2. Increase the sample size(s)
3. Decrease the variability in the sample(s)
4. Increase the significance level (alpha) of the test
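The four factors can be illustrated numerically. The sketch below assumes a two-sided, one-sample z-test with known standard deviation, chosen only because its power has a simple closed form; all numbers are made up for illustration and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test with known sigma.

    effect is the difference between the null and alternative means.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection region.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)

# Hypothetical baseline, then one factor changed at a time.
base = z_test_power(effect=2.0, sigma=8.0, n=50)
print("baseline:          ", round(base, 3))
print("larger effect size:", round(z_test_power(3.0, 8.0, 50), 3))
print("larger sample size:", round(z_test_power(2.0, 8.0, 100), 3))
print("less variability:  ", round(z_test_power(2.0, 6.0, 50), 3))
print("larger alpha:      ", round(z_test_power(2.0, 8.0, 50, alpha=0.10), 3))
```

Each change raises the power relative to the baseline, though raising alpha does so at the cost of more false positives.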
Editor's Notes
For example, if the study is in a village (with a population of, say, 500) and the objective is to determine the prevalence of some unusual events or factors among the villagers, the selection unit should ideally be the individuals residing in the village. In this case, the list of the names of all inhabitants will be the reference sampling frame. But there are situations where the sampling frame cannot be worked out so easily. For a similar study covering a state, it is almost impossible to draw up a list of all inhabitants residing in the state. Simple random sampling would therefore not be appropriate here; one has to make use of a simpler approach.
1. Specification of a precision level: A decision is made on the tolerable limits of error, i.e. the researcher states that it does not matter if the sample estimate differs from the true population value by up to a certain amount. For example, suppose a paediatrician plans a study to estimate the proportion of malnourished children in a village, and suppose that the true proportion of malnourished children is 10%. He is satisfied if his estimate does not differ from the true value of 10% by more than 5% of that value, i.e. he is content with the result of his study if his estimate is within 9.5% to 10.5% (i.e. 10 ± 0.5%).
2. Specification of level of confidence: This is the degree of uncertainty, or the probability that a sample value lies outside the stated limits (i.e. 10 ± 0.5%). Suppose this measure is 5%: the investigator has to accept the unlikely situation, in 1 in 20 cases, that the sample result falls outside the desired limits; if it is 1%, the chance that the sample result falls outside the desired limits is 1 in 100. By convention, the most commonly used levels are 5% and 1% (i.e. 95% and 99% confidence), but nothing stops the investigator from tolerating 10%, 2.5%, etc.
When the sample size is 50, it does not matter much whether the population is 10 thousand or 10 million.
When the sample size is four thousand, then we have about 23% more precision with a population of ten thousand than we would for a population of ten
million.