Sampling design and procedures

Prabesh Ghimire, MPH

Census and Sample

Census
• Quantitative research method, in which all the members of the
population are enumerated.
• Implies complete enumeration of the study participants
• It is presumed that in such an inquiry, when all items are covered,
no element of chance is left and the highest accuracy is obtained.

Advantages of Census
• It provides basis for overall socio-economic planning of the
country.
• Provides complete information about the population
• More reliable and accurate information
• Covers wide range of the study

Demerits of Census
• Resource intensive (time, human resources, financial
resources)
• Non-sampling errors (e.g. coverage and measurement errors) tend to be
higher in a census because of the scale of the operation

Sampling
• Statistical procedure of drawing
a sample from a population
• Based on belief that drawn
sample will exhibit the relevant
characteristics of the whole
population

Applications of Sampling in Public Health
• Random sampling is a basic requirement for establishing
cause-effect relationships
• A good sampling design can provide more reliable estimates.
• Use of appropriate sampling methods helps generalize the
findings of health research to the entire population of interest.
• Sampling helps assure both the internal and external validity
of public health research.

Significance of Sampling
• Necessity: Sometimes it’s simply not possible to study the whole
population due to its size or inaccessibility.
• Practicality: It’s easier and more efficient to collect data from a
sample.
• Cost-effectiveness: There are fewer participant, laboratory,
equipment, and researcher costs involved.
• Manageability: Storing and running statistical analyses on smaller
datasets is easier and more reliable.

[Figure: nested populations, from target population to study population to sample]

Target/ Reference Population
• The target population is that population to which it is intended to
apply the results.
• Population to which the researchers are interested in
generalizing the study findings.
• Example:
• All mothers of Under-5 Children,
• All pregnant teens,
• All people living with HIV (PLHIV)

Study Population
• It is the accessible population from which researchers draw their
sample.
• This population is a subset of the target population.
• A defined population from which a sample has been selected.
• Example: Mothers of U-5 children of XYZ municipality

Sample
• Specific group that you will collect data from.
• The size of the sample is always less than the total size of the
population.

Sampling Frame
• A sampling frame is a list of all the items (sampling units) in the
population from which the sample is drawn
• It’s a complete list of everyone or everything that researchers
want to study.
• The difference between a population and a sampling frame is
that the population is general and the frame is specific.
• Frame is needed so that everyone in the population is identified
so that they will have an equal opportunity for selection in the
study.

Sampling Techniques

Simple Random Sampling
• Sampling technique where every item in
the population has an equal chance of
being selected in the sample.
• Selection of items depends entirely on
chance, and therefore this
sampling technique is also sometimes
known as a method of chances.
• The sample size in this sampling method
should ideally be more than a few
hundred so that simple random sampling
can be applied appropriately.

Techniques of simple random sampling
• Lottery
• Use of random number table
• Computer generated random number
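
The computer-generated option can be sketched in Python as follows. This is a minimal illustration, assuming the sampling frame is available as a list; the household IDs, the sample size of 50 and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: household IDs 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 50, seed=42)
```

Recording the seed alongside the frame lets the same sample be regenerated later for audit.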

Stratified Random Sampling
• For a stratified random sample, the population is divided into
groups or strata.
• To stratify means to classify or to separate people into groups
according to some characteristics, such as
• position, rank, income, education, sex, or ethnic background
• The population is divided to make the elements within a
group/stratum as homogeneous as possible.

Stratified Random Sampling
Two types
• Proportionate
• The sample size from each stratum is proportional to the size of the
stratum.
• Therefore larger strata are sampled more heavily, as they make up a larger
percentage of the target population.
• Disproportionate
• In disproportionate sampling, the sample size from each stratum is
independent of its size.
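
Proportionate allocation can be sketched as below. The strata names and sizes are hypothetical, and the largest-remainder rounding is one possible choice (an assumption, not part of the original slides) for making the stratum sample sizes add up to the total.

```python
def proportionate_allocation(stratum_sizes, n):
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Exact (possibly fractional) share of the sample for each stratum
    shares = {name: n * size / total for name, size in stratum_sizes.items()}
    # Round down first, then hand out leftover units by largest remainder
    allocation = {name: int(share) for name, share in shares.items()}
    leftover = n - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda s: shares[s] - allocation[s], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

# Hypothetical strata and population sizes
strata = {"urban": 6000, "rural": 3000, "mountain": 1000}
allocation = proportionate_allocation(strata, 400)
print(allocation)  # {'urban': 240, 'rural': 120, 'mountain': 40}
```

Within each stratum, the allocated number of units would then be drawn by simple random sampling.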

Merits
• Stratified random samples are generally more accurate in
representing the population than are simple random samples.
• Suitable for large and heterogeneous populations
Demerits
• Because participants are to be chosen randomly from each
stratum, a complete list of the population within each stratum
must be constructed.

Systematic Random Sampling
• In systematic sampling, only the first sample unit is selected at
random and the remaining units are automatically selected at
a fixed, equal interval according to a rule.
• Suppose N units of population are numbered from 1 to N in
some order.
• Then, the sampling interval K = N/n is determined, where n is the
desired sample size.
• The first item between 1 and K is selected at random, and every
Kth element after it is then selected automatically.
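
The procedure above can be sketched as follows. This is an illustration only: the frame of 1000 numbered units is hypothetical, and the interval K = N/n is rounded down to a whole number, which is an assumption for cases where N is not an exact multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units: a random start between 1 and K, then every Kth unit."""
    N = len(frame)
    k = N // n                     # sampling interval K = N/n, rounded down
    rng = random.Random(seed)
    start = rng.randint(1, k)      # random start, 1-based, between 1 and K
    # Convert 1-based positions start, start+K, start+2K, ... to list indices
    return [frame[start - 1 + i * k] for i in range(n)]

# Hypothetical frame: units numbered 1 to 1000, desired sample size 100 (K = 10)
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=1)
```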

Systematic Random Sampling
Merits
• This method is simple and easy.
• The selected samples are evenly spread in the population and
therefore minimize chances of clustered selection of subjects
• Sampling frame is not always required
Limitations
• The method may introduce bias when elements are not
arranged in random order.

Systematic Sampling Methods
Interval Sampling
• Select every Nth case at the health facility.
• For example every 5th, 7th, or 10th patient that meets the
inclusion criteria would be selected.
• Some foreknowledge of the volume of cases at the site is
required so that appropriate sampling interval can be selected.
Source: WHO interim global surveillance standards for influenza

Systematic Sampling Methods
Alternate Day Sampling
• Select all patients meeting the inclusion criteria presenting to a
facility on a certain day or days of the week,
• This can reduce the logistical challenges of surveillance by
confining laboratory specimen and data collection efforts to a
single day.
• In order to remove day-of-week bias, the day on which cases
are selected should be systematically alternated from week to
week.
Source: WHO interim global surveillance standards for influenza

Cluster Sampling
• Cluster sampling is a sampling plan used when mutually
homogeneous yet internally heterogeneous groupings are
evident in a statistical population.
• In this sampling plan, the total population is divided into these
groups (known as clusters) and a simple random sample of the
groups is selected.
• The elements in each cluster are then sampled.

Cluster Sampling
• If all elements in each sampled cluster are sampled, then this is
referred to as a "one-stage" cluster sampling plan.
• If a simple random subsample of elements is selected within
each of these groups, this is referred to as a "two-stage" cluster
sampling plan.
• A common motivation for cluster sampling is to reduce the
research costs given the desired accuracy

Cluster elements
• The population within a cluster should ideally be as
heterogeneous as possible, but there should be homogeneity
between clusters.
• Each cluster should be a small-scale representation of the total
population.

Cluster Random Sampling
Merits
• Can be cheaper than other sampling plans – e.g. fewer travel expenses,
administration costs.
• Feasibility: This sampling plan takes large populations into account. Since
these groups are so large, deploying any other sampling plan would be
very costly
• Does not require sampling frame
Limitations
• Complexity
• Design effect: sampling error is usually larger than with simple random
sampling of the same size
• Results may not be generalizable

Probability Proportionate to Size
• The probability of selecting a cluster is proportional to its size,
so that a large cluster has a greater probability of selection than
a small cluster.
• The advantage here is that when clusters are selected with
probability proportionate to size, the same number of interviews
should be carried out in each sampled cluster so that each unit
sampled has the same probability of selection.
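
One common way to select clusters with probability proportionate to size is systematic selection along the cumulative population list. The sketch below assumes that approach; the ward names, their sizes and the function name `pps_systematic` are hypothetical.

```python
import random
from itertools import accumulate

def pps_systematic(cluster_sizes, n_clusters, seed=None):
    """Select clusters with probability proportional to size (systematic PPS)."""
    names = list(cluster_sizes)
    cumulative = list(accumulate(cluster_sizes[name] for name in names))
    interval = cumulative[-1] / n_clusters   # total population / clusters needed
    rng = random.Random(seed)
    start = rng.random() * interval          # random start in [0, interval)
    selected, idx = [], 0
    for i in range(n_clusters):
        point = start + i * interval
        # Move to the cluster whose cumulative range contains this point
        while idx < len(names) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(names[idx])
    return selected

# Hypothetical cluster (ward) populations
wards = {"Ward 1": 1200, "Ward 2": 300, "Ward 3": 2500,
         "Ward 4": 800, "Ward 5": 1700, "Ward 6": 500}
chosen = pps_systematic(wards, 3, seed=7)
```

A cluster larger than the sampling interval (here Ward 3) is always selected and may appear more than once; in practice such clusters are often split or treated as certainty selections.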

Exercise for PPS
Hypothetical Data for sampling using PPS

Multi-Stage Sampling
• Multi-stage sampling (also known as multi-stage cluster
sampling) is a more complex form of cluster sampling which
contains more than two stages in sample selection.
• Large clusters of population are divided into smaller clusters in
several stages in order to make primary data collection more
manageable.

Example Multi-Stage Sampling
• Choose 3 provinces in Nepal using SRS (or other probability
sampling)
• Choose 3 districts in each province using SRS (or other
probability methods)
• Choose 3 municipalities from each district using SRS (or other
probability methods)
• Choose 100 households from each municipality using SRS or
Systematic random sampling.
• This will result in 2700 households to be included in the sample
group
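
The total of 2700 households follows from multiplying the number of units chosen at each stage, as this short check shows (the dictionary keys are just labels for the stages in the example).

```python
import math

# Units selected at each stage, taken from the example above
stages = {
    "provinces": 3,
    "districts per province": 3,
    "municipalities per district": 3,
    "households per municipality": 100,
}

total_households = math.prod(stages.values())
print(total_households)  # 2700
```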

Multi-Stage Sampling
Merits
• Lower cost and faster survey implementation
• Convenience of finding the survey sample, particularly in large
areas
• Sample frame required only for the selected clusters
Limitations
• May not always acquire a representative sample
• The presence of group-level information is required

Non-Probability Sampling

Convenience Sampling
• Sometimes known as grab or opportunity sampling or
accidental or haphazard sampling.
• A type of non-probability sampling which involves the sample
being drawn from that part of the population which is close to
hand. That is, readily available and convenient.
• The researcher using such a sample cannot scientifically make
generalizations about the total population from this sample
because it would not be representative enough.

Convenience Sampling
• For example, if the interviewer was to conduct a survey at a
health facility.
• The clients that he/she could interview would be limited to those
present there at that given time.
• This type of sampling is most useful for pilot testing.

Judgmental sampling or Purposive sampling
• Also called expert sampling
• The researcher chooses the sample based on who they think
would be appropriate for the study.
• This is used primarily when there is a limited number of people
that have expertise in the area being researched.
• Usually done for Key Informant Interviews
• Example: interviews to understand decision makers' perceptions of
current health policies might purposively select senior officials
of the MOHP.

Purposive sampling example
• If you want to know more about the opinions and experiences of
disabled adolescents in your community,
• You purposefully select a number of adolescents with different support
needs in order to gather a varied range of data on their disability
experiences.

Quota Sampling
• In quota sampling the selection of the sample is non-random.
• The population is first segmented into mutually exclusive sub-
groups, just as in stratified sampling.
• Then judgment is used to select participants or units from each
segment based on a specified proportion.
• It is this second step which makes the technique one of non-
probability sampling.
• The problem is that these samples may be biased because not
everyone gets a chance of selection.

[Figure: quota sampling example. Population of 1200 male students (60%) and
800 female students (40%); a sample of 300 is required, so 180 male and
120 female students (60% and 40%) are selected by convenience/judgement.]
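
The quota example can be sketched as below: quotas are set in proportion to the population, then filled in whatever order people happen to be encountered. The arrival list and function names are hypothetical, and plain rounding is assumed to be adequate because the shares here are whole numbers.

```python
def quota_sizes(population_counts, n):
    """Set each group's quota in proportion to its share of the population."""
    total = sum(population_counts.values())
    return {group: round(n * count / total) for group, count in population_counts.items()}

def fill_quotas(arrivals, quotas):
    """Accept people in arrival order (non-random) until every quota is full."""
    remaining = dict(quotas)
    chosen = []
    for person_id, group in arrivals:
        if remaining.get(group, 0) > 0:
            chosen.append(person_id)
            remaining[group] -= 1
        if not any(remaining.values()):
            break
    return chosen

quotas = quota_sizes({"male": 1200, "female": 800}, 300)
print(quotas)  # {'male': 180, 'female': 120}

# Hypothetical stream of students met on campus, alternating male and female
arrivals = [(i, "male" if i % 2 == 0 else "female") for i in range(1000)]
chosen = fill_quotas(arrivals, quotas)
```

The selection inside `fill_quotas` depends only on who turns up first, which is why the method is non-probability sampling even though the group proportions match the population.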

Snowball/ Chain Referral Sampling
• Chain-referral sampling
• In this technique, existing participants provide referrals to recruit
other participants required for a research study.
• It is used when
• potential participants have traits that are hard to find
• It is tough to choose the participants to assemble them as samples for
research
• Useful in sensitive investigations/studies

Snowball/ Chain Referral Sampling
• Two key steps
• Identify potential participants in the population. Often, only one or two
participants can be found initially.
• Ask those participants to recruit other people (and then ask those
people to recruit others in turn).
• Types
• Linear snowball sampling
• Exponential snowball sampling
• Non-discriminative: multiple referrals; each referred person is interviewed
• Discriminative: multiple referrals; only one among those referred is interviewed

Applications of Snowball Sampling
• Useful for investigating patients with rare disease
• Identifying drug abusers, criminals

Source: https://www.linkedin.com/pulse/understanding-population-sampling-approach-from-testing-anup-kale/?articleId=6658186365529886720

Snowball Sampling
• Merits
• Needs little planning and a smaller workforce
• The chain referral process allows the researcher to reach populations
that are difficult to sample
• Demerits
• Researcher has little control over the sampling method
• Representativeness of the sample is not guaranteed. Researcher has
no idea of the true distribution of the sample
• Sometimes recruitment may be affected if the participants fail to
recruit/identify other participants

Voluntary Response Sampling
• Similar to a convenience sample, a voluntary response sample
is mainly based on ease of access.
• Instead of the researcher choosing participants and directly
contacting them, people volunteer themselves (e.g. by
responding to a public online survey).
• Voluntary response samples are always at least somewhat
biased, as some people will inherently be more likely to
volunteer than others.

Other sampling methods
Consecutive Sampling
• Total enumerative sampling where every participant meeting
the inclusion criteria is selected until the required sample size is
achieved.
• Typically better than convenience sampling in controlling
sampling bias.
• Care needs to be taken with consecutive sampling

Selection of Sampling Design
(Choosing the best sampling method)

Sampling frame availability
• We need to check for availability of a sampling frame.
• If sampling frame is available
• Use Simple random or a stratified random sampling.
• If sampling frame is not available, we could still use other
random sampling methods
• for instance, systematic or cluster sampling
• Snowball sampling (non-random) may also be used where
sampling frame is not present.

Population Distribution
• Check if our target population is widely varied in its baseline
characteristics.
• For example, a population with large ethnic subgroups could
best be studied using a stratified sampling method.
• A homogeneous population may be studied using the simple random
method.
• If the population is geographically dispersed, use cluster
sampling

Generalizability
• To increase generalizability: select random sampling methods
• In Systematic Random sampling, generalizability may decrease
if baseline characteristics repeat across every nth participant
• In cluster design, if clusters are not representative, results may
not be generalizable

Research Objectives
• A refined research question and goal would help us define our
population of interest.
• If our calculated sample size is small then it would be easier to
get a random sample.
• If, however, the sample size is large, then we should check if
our budget and resources can handle a random sampling
method.

Determination of Sample Size

For Cross-Sectional Surveys
• Cross-sectional studies or cross-sectional surveys are done to
• estimate a population parameter, such as the prevalence of some disease in a
community, or
• find the average value of some quantitative variable in a population.
• Sample size formulas for categorical and quantitative variables
are different.

For Proportion (Qualitative Variable)
• Suppose a researcher wants to know the proportion of children who are
stunted in a population; then this formula should be used, as
proportion is a qualitative variable.
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2} \times p(1-p)}{d^{2}}$$
Where, $Z_{1-\alpha/2}$ is the standard normal variate (at 5% Type I error, it is 1.96)
p = expected proportion in population based on previous studies or
pilot studies
d = absolute error or precision (has to be decided by researcher)
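
As a rough illustration, the formula can be evaluated in Python. The values p = 0.5 and d = 0.05 are assumed for the example only, and the result is rounded up to the next whole participant.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, d, alpha=0.05):
    """Sample size for estimating a proportion p with absolute precision d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Assumed inputs: expected proportion 50%, precision of 5 percentage points
n = sample_size_proportion(p=0.5, d=0.05)
print(n)  # 385
```

Using p = 0.5 gives the largest sample size for a given d, which is why it is often chosen when no prior estimate is available.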

If the population is finite
• If the population is finite, we use
$$\text{Sample size (finite)} = \frac{n}{1 + \frac{n-1}{N}}$$
Where,
N = finite population size
n = sample size calculated using the infinite population size formula
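
A short sketch of the finite population adjustment follows. The inputs are assumptions for illustration: n = 269 is what the proportion formula gives for p = 0.226 with an assumed precision of d = 0.05, and N = 9024 is used as the population size.

```python
import math

def finite_population_sample_size(n, N):
    """Adjust an infinite-population sample size n for a finite population N."""
    return math.ceil(n / (1 + (n - 1) / N))

# Assumed inputs: n = 269 from the infinite-population formula, N = 9024
n_finite = finite_population_sample_size(269, 9024)
print(n_finite)  # 262
```

The adjustment is small here because the sample is a small fraction of the population; it matters more as n approaches N.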

Exercise on Sample Size Calculation
• Suppose you are planning to conduct a household survey to
estimate the prevalence of stunting among under-5 children in
Kageshwori Manohar Municipality. Previous study had shown
that the stunting prevalence in Bagmati province was 22.6%.
Calculate the desired sample size for your study:
i) If the number of U-5 children is unknown
ii) If the number of U-5 children is known (i.e. 9024)
iii) For two-stage cluster sampling.

Exercise on Sample Size Calculation
• Suppose you are planning to conduct a household survey to
estimate the prevalence of anemia among women of
reproductive age in Kathmandu district. In previous studies, the
anemia prevalence in WRA varied as 29.0%, 40.8% and 58%.
Calculate the appropriate sample size for your study.

For quantitative variable
• Suppose the same researcher is interested in knowing average systolic
blood pressure of children of the same city.
• The formula below should be used, as blood pressure is a
quantitative variable
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2} \times SD^{2}}{d^{2}}$$
Where, $Z_{1-\alpha/2}$ is the standard normal variate as mentioned above
SD = Standard deviation of variable. Value of standard deviation can be
taken from previously done study or through pilot study.
d = absolute error or precision (has to be decided by researcher)
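
The same calculation for a mean can be sketched as follows. The standard deviation of 12 mmHg and precision of 2 mmHg are hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist

def sample_size_mean(sd, d, alpha=0.05):
    """Sample size for estimating a mean with absolute precision d."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return math.ceil(z**2 * sd**2 / d**2)

# Hypothetical inputs: SD of systolic blood pressure 12 mmHg, precision 2 mmHg
n = sample_size_mean(sd=12, d=2)
print(n)  # 139
```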

For Case-Control Studies
Formula for sample size calculation for comparison between two groups
when endpoint is quantitative data
$$\text{Sample size} = \frac{r+1}{r} \times \frac{SD^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$
• Where,
• SD = Standard deviation of variable (from previously done study or
through pilot study)
• $Z_{\alpha/2}$ is the standard normal variate (1.96 at 5% Type I error)
• $Z_{\beta}$ is the standard normal variate for the power of the study
(0.842 for 80% power, 1.28 for 90% power)
• d is the effect size (difference between mean values)
• r is the ratio of controls to cases
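
A hedged sketch of this calculation is given below. It uses the sum of the two normal variates, (Z_{α/2} + Z_β)², as in the standard form of this formula; the SD of 10 and difference of 5 are hypothetical inputs.

```python
import math
from statistics import NormalDist

def sample_size_two_means(sd, d, r=1.0, alpha=0.05, power=0.80):
    """Sample size for comparing two group means with control:case ratio r."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = NormalDist().inv_cdf(power)            # about 0.842 for 80% power
    return math.ceil((r + 1) / r * sd**2 * (z_alpha + z_beta)**2 / d**2)

# Hypothetical inputs: SD = 10, difference in means d = 5, one control per case
n = sample_size_two_means(sd=10, d=5)
print(n)  # 63
```

With r = 1 the factor (r + 1)/r equals 2, so the same function also covers a two-arm comparison with equal group sizes.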

For Case-Control Studies
Formula for sample size calculation for comparison between two groups
when endpoint is qualitative data
$$\text{Sample size} = \frac{r+1}{r} \times \frac{(Z_{\alpha/2} + Z_{\beta})^{2}\, p(1-p)}{(p_1 - p_2)^{2}}$$
• Where,
• $p_1 - p_2$ = effect size, or the difference in proportion of events in two
groups
• $p_1$ = proportion in cases
• $p_2$ = proportion in controls
• p = pooled prevalence
• $Z_{\beta}$ = standard normal variate for power
• r = ratio of controls to cases
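
The calculation for a qualitative endpoint can be sketched in the same way. Two points are assumptions here: the pooled prevalence is taken as the weighted average (p1 + r·p2)/(1 + r), which reduces to (p1 + p2)/2 when r = 1, and the two normal variates are summed, as in the standard form of this formula. The proportions 0.4 and 0.2 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, r=1.0, alpha=0.05, power=0.80):
    """Sample size for comparing two proportions with control:case ratio r."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
    z_beta = NormalDist().inv_cdf(power)            # about 0.842 for 80% power
    p_bar = (p1 + r * p2) / (1 + r)                 # assumed pooled prevalence
    numerator = (r + 1) / r * p_bar * (1 - p_bar) * (z_alpha + z_beta)**2
    return math.ceil(numerator / (p1 - p2)**2)

# Hypothetical inputs: 40% exposed among cases, 20% among controls, r = 1
n = sample_size_two_proportions(p1=0.4, p2=0.2)
print(n)  # 83
```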

For Intervention Studies
Formula for sample size calculation for comparison between two
groups when endpoint is quantitative data
• When the variable is quantitative data like blood pressure, weight, height,
etc., then the following formula can be used for calculation of sample size
for comparison between two groups.
$$\text{Sample size} = \frac{2\,SD^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$
Where,
• SD = Standard deviation of variable (from previously done study or
through pilot study)
• $Z_{\alpha/2}$ is the standard normal variate
• $Z_{\beta}$ is the standard normal variate for the power of the study
(0.842 for 80% power)
• d = effect size (difference between mean values)

For Intervention Studies
Formula for sample size calculation for comparison between two
groups when endpoint is qualitative data
• When the endpoint of a clinical intervention study is qualitative, then
the following formula can be used for sample size calculation for
comparison between two groups.
$$\text{Sample size} = \frac{2\,(Z_{\alpha/2} + Z_{\beta})^{2}\, p(1-p)}{(p_1 - p_2)^{2}}$$
Where,
• $p_1 - p_2$ is the difference in proportion of events in two groups
• p = pooled prevalence

Practical tips
Use digital technology
• Epi Info StatCalc
• G*Power (www.gpower.hhu.de)
• n4Studies, for Android/iOS mobile
• OpenEpi (www.openepi.com)

References
• Banerjee, A., & Chaudhury, S. (2010). Statistics without tears: Populations and samples. Industrial Psychiatry Journal, 19(1), 60–65. https://doi.org/10.4103/0972-6748.77642
• Charan, J., & Biswas, T. (2013). How to calculate sample size for different study designs in medical research? Indian Journal of Psychological Medicine, 35(2), 121–126. https://doi.org/10.4103/0253-7176.116232
