This document discusses different sampling methods used in research. It begins by defining key sampling terms like population, sample, sampling unit, and sampling frame. It then describes the main types of probability sampling methods including simple random sampling, systematic random sampling, stratified random sampling, cluster sampling, and multi-stage sampling. The document also discusses non-probability sampling methods and notes that while quicker and cheaper, they do not allow for generalization. Overall, the document provides an overview of different sampling strategies, their advantages and disadvantages, and examples of how each might be implemented in research.
2. I) Introduction
♣ Sampling involves the selection of a number of study units from a defined population.
♣ The population is usually too large for us to consider collecting information from all its members.
♣ If the whole population is studied, there is no need for statistical inference.
♣ Usually, a representative subgroup of the population (sample) is included in the investigation.
♣ A representative sample has all the important characteristics of the population from which it is drawn.
3. • A sample is a collection of individuals selected from a larger population.
• Researchers are not interested in the sample itself, but in what can be learned from the sample and how this information can be applied to the entire population.
5. • Therefore, it is essential that a sample be correctly defined and organized.
• If the wrong questions are posed to the wrong people, reliable information will not be obtained, and applying the results to the entire population will lead to wrong conclusions.
6. Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of collecting information.
• Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
• Greater accuracy: Sampling may lead to more accurate data collection.
• Sampling error: Precise allowance can be made for sampling error.
• Greater speed: Data can be collected and summarized more quickly.
7. Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the population.
8. If we have to draw a sample, we will be confronted with the following questions:
a) What is the group of people (population) from which we want to draw a sample?
b) How many people do we need in our sample?
c) How will these people be selected?
Apart from persons, a population may consist of mosquitoes, villages, institutions, etc.
9. Definitions:
• Reference population (also called source population or target population): the population of interest, to which the investigators would like to generalize the results of the study, and from which a representative sample is to be drawn.
• Study or sample population: the population included in the sample.
10. ♣ Sampling unit - the unit of selection in the sampling process.
♣ Study unit - the unit on which information is collected.
- The sampling unit is not necessarily the same as the study unit.
- If the objective is to determine the availability of latrines, then the study unit would be the household; if the objective is to determine the prevalence of trachoma, then the study unit would be the individual.
♣ Sampling frame - the list of all the units in the reference population, from which a sample is to be picked.
♣ Sampling fraction - the ratio of the number of units in the sample to the number of units in the reference population (n/N). The sampling interval is its inverse (N/n).
11. • Reference population (or target population): the population of interest to whom the researchers would like to make generalizations.
• Sampling population: the subset of the target population from which a sample will be drawn.
• Study population: the actual group in which the study is conducted, i.e. the sample.
• Study unit: the unit on which information will be collected (persons, housing units, etc.).
13. Example: Researchers want to know which factors are associated with ART use among HIV/AIDS patients attending certain hospitals in a given Region.
Target population = all ART patients in the Region
Sampling population = all ART patients in, e.g., 3 hospitals in the Region
Sample = the patients selected from the sampling population
14. Sampling Methods
Two broad divisions:
A. Probability sampling methods
B. Non-probability sampling methods
15. A. Probability sampling
• Involves random selection of a sample, i.e. selection based on chance.
• Every sampling unit has a known and non-zero probability of selection into the sample.
• Generalization is possible (from sample to population).
16. • There are several different ways in which a probability sample can be selected.
• The method chosen depends on a number of factors, such as
– the available sampling frame,
– how spread out the population is,
– how costly it is to survey members of the population.
17. Most common probability sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling
6. Sampling with probability proportional to size
18. 1. Simple random sampling (SRS)
• The required number of individuals is selected at random from the sampling frame, a list or a database of all individuals in the population.
• Each member of the population has an equal chance of being included in the sample.
19. • To use the SRS method:
– Make a numbered list of all the units in the population.
– Each unit should be numbered from 1 to N (where N is the size of the population).
– Select the required number of units at random.
20. • The randomness of the sample is ensured by:
– Use of "lottery" methods
– Table of random numbers
– Computer programs
21. Random number table
• It is a table of random numbers constructed by a process such that:
1. In any position in the table, each of the numbers 0 through 9 has a probability of 1/10 of occurring.
2. The occurrence of any number in one part of the table is independent of the occurrence of any number in any other part of the table.
23. Example
• Suppose your school has 500 students and you need to conduct a short survey on the quality of the food served in the cafeteria.
• You decide that a sample of 10 students should be sufficient for your purposes.
• In order to get your sample, you assign a number from 1 to 500 to each student in your school.
24. • To select the sample, you use a table of randomly generated numbers.
• Pick a starting point in the table (a row and column number) and look at the random numbers that appear there.
• In this case, since the student numbers run to three digits, the random numbers need to contain three digits as well.
25. • Ignore all random numbers greater than 500 (and 000) because they do not correspond to any of the students in the school.
• Remember that the sample is drawn without replacement, so if a number recurs, skip over it and use the next random number.
• The first 10 different numbers between 001 and 500 make up your sample.
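The cafeteria example can be sketched in Python. This is a minimal illustration rather than part of the original slides: the function names `draw_srs` and `draw_from_digit_table` and the fixed seed are assumptions chosen only to make the run repeatable. The first function lets the computer draw the sample directly; the second mimics reading three-digit entries from a random number table, skipping unusable and repeated numbers.

```python
import random

def draw_srs(population_size, sample_size, seed=None):
    """Simple random sample without replacement from units numbered 1..N."""
    rng = random.Random(seed)
    frame = range(1, population_size + 1)   # numbered list of all units
    return sorted(rng.sample(frame, sample_size))

def draw_from_digit_table(population_size, sample_size, seed=None):
    """Mimic reading three-digit numbers (000..999) from a random number table."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 999)         # one three-digit table entry
        if number < 1 or number > population_size:
            continue                         # no student has this number
        if number in chosen:
            continue                         # without replacement: skip repeats
        chosen.append(number)
    return chosen

sample = draw_srs(500, 10, seed=1)
table_sample = draw_from_digit_table(500, 10, seed=1)
print(sample)
print(table_sample)
```

Both functions return 10 distinct student numbers between 1 and 500; the two lists will generally differ because they consume random numbers differently.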
26. • SRS has certain limitations:
– Requires a sampling frame.
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be selected.
27. 2. Systematic random sampling
• Sometimes called interval sampling
• Individuals are selected from the sampling frame systematically rather than randomly
• Individuals are taken at regular intervals down the list
• The starting point is chosen at random
28. • Useful when the reference population is arranged in some order:
– Order of registration of patients
– Numerical order of house numbers
– Students' registration books
• Individuals are taken at fixed intervals (every kth) based on the sampling fraction, e.g. if the sample includes 20% of the population, then every fifth individual is taken.
29. Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size).
2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size.
30. 3. Select a number between one and K at random. This number is called the random start and is the first number included in your sample.
4. Select every Kth unit after that first number.
Note: Systematic sampling should not be used when a cyclic repetition is inherent in the sampling frame.
31. Example
• To select a sample of 100 from a population of 400, you would need a sampling interval of 400 ÷ 100 = 4.
• Therefore, K = 4.
• You will need to select one unit out of every four units to end up with a total of 100 units in your sample.
• Select a number between 1 and 4 from a table of random numbers.
32. • If you choose 3, the third unit on your frame would be the first unit included in your sample.
• The sample would then consist of the following 100 units: 3 (the random start), 7, 11, 15, 19, ..., 395, 399 (up to N, which is 400 in this case).
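The four steps can be sketched as a short Python function. This is an illustrative sketch, not the slides' own procedure: the name `systematic_sample` and the seed are assumptions, and the code assumes N is an exact multiple of the sample size so that K is a whole number, as in the 400 and 100 example.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Systematic sample from units numbered 1..N with interval K = N // n."""
    k = population_size // sample_size          # sampling interval K
    rng = random.Random(seed)
    start = rng.randint(1, k)                   # random start between 1 and K
    # Take the random start and then every Kth unit after it, n units in all.
    return [start + i * k for i in range(sample_size)]

units = systematic_sample(400, 100, seed=3)
print(units[:5], "...", units[-2:])
```

Whatever the random start, the result has 100 units spaced exactly 4 apart and never runs past unit 400.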
33. • Using the above example, you can see that with a systematic sampling approach there are only four possible samples that can be selected, corresponding to the four possible random starts:
A. 1, 5, 9, 13, ..., 393, 397
B. 2, 6, 10, 14, ..., 394, 398
C. 3, 7, 11, 15, ..., 395, 399
D. 4, 8, 12, 16, ..., 396, 400
34. • Each member of the population belongs to only one of the four samples, and each sample has the same chance of being selected.
• The main difference is that with SRS any combination of 100 units has a chance of making up the sample, while with systematic sampling there are only four possible samples.
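The contrast between the two methods can be checked numerically. The sketch below, with variable names chosen for illustration, enumerates every systematic sample for N = 400 and n = 100 and compares their number with the count of distinct 100-unit subsets available to SRS, computed with `math.comb`.

```python
from math import comb

N, n = 400, 100
K = N // n

# Every possible systematic sample: one per random start 1..K.
systematic_samples = [list(range(start, N + 1, K)) for start in range(1, K + 1)]

# Pool the samples to confirm that each unit appears in exactly one of them.
all_units = sorted(u for s in systematic_samples for u in s)

print(len(systematic_samples))   # number of possible systematic samples
print(len(str(comb(N, n))))      # number of digits in the count of possible SRS samples
```

The count of possible simple random samples is astronomically large, which is why systematic sampling is described only as an approximation to SRS.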
35. Merits
• Systematic sampling is usually less time consuming and easier to perform than SRS.
• It provides a good approximation to SRS.
• Unlike SRS, systematic sampling can be conducted without a sampling frame, e.g. among patients attending a health center, where it is not possible to predict in advance who will be attending.
36. Demerits
• If there is any sort of cyclic pattern in the ordering of the subjects which coincides with the sampling interval, the sample will not be representative of the population.
Example:
• A list of married couples arranged with men's names alternating with women's names will, with an even sampling interval, result in a sample of all men or all women.
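The married-couples problem is easy to reproduce. In this sketch the frame of 50 couples and the helper name `take_every_kth` are hypothetical; the point is only that an even interval lines up with the man/woman cycle while an odd interval does not.

```python
# Hypothetical frame: 50 couples listed as man, woman, man, woman, ...
frame = []
for couple in range(1, 51):
    frame.append(("man", couple))
    frame.append(("woman", couple))

def take_every_kth(units, k, start):
    """Select every kth unit, where start is a 1-based position from 1..k."""
    return units[start - 1::k]

even_interval = take_every_kth(frame, k=4, start=1)   # interval matches the cycle
odd_interval = take_every_kth(frame, k=5, start=1)    # interval does not match

print({sex for sex, _ in even_interval})
print({sex for sex, _ in odd_interval})
```

With the even interval every selected unit is a man; with the odd interval both sexes appear.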
37. 3. Stratified random sampling
• It is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification.
• The population is divided into homogeneous, mutually exclusive groups called strata, and an independent sample is then selected from each stratum.
• A population can be stratified by any variable that is available for all units prior to sampling (e.g., age, sex, province of residence, income, etc.).
38. Why do we need to create strata?
• It can make the sampling strategy more efficient.
• A larger sample is required to get a more accurate estimate if a characteristic varies greatly from one unit to the other.
• For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.
• Because units within a stratum are similar to one another, a smaller sample within each stratum can give the same precision.
39. • Stratified sampling ensures an adequate sample size for sub-groups in the population of interest.
• When a population is stratified, each stratum becomes an independent population and you will need to decide the sample size for each stratum.
40. • Equal allocation:
– Allocate an equal sample size to each stratum
• Proportionate allocation:
– n_j is the sample size of the jth stratum
– N_j is the population size of the jth stratum
– n = n_1 + n_2 + ... + n_k is the total sample size
– N = N_1 + N_2 + ... + N_k is the total population size
– $n_j = \frac{n}{N} N_j$
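Proportionate allocation gives every stratum the same sampling fraction, n_j = (n / N) × N_j. The sketch below applies that rule to three hypothetical strata. The slides do not say how to handle fractional results, so the largest-remainder rounding used here is an assumption made only so that the stratum sample sizes add up to n.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate n across strata so that n_j is proportional to N_j."""
    total_population = sum(stratum_sizes.values())
    exact = {name: total_sample * size / total_population
             for name, size in stratum_sizes.items()}
    allocation = {name: int(value) for name, value in exact.items()}  # round down
    # Assumption: leftover units go to the strata with the largest fractional parts.
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation

strata = {"urban": 6000, "rural": 3000, "pastoralist": 1000}   # hypothetical N_j
allocation = proportionate_allocation(strata, total_sample=200)
print(allocation)
```

Here N = 10,000 and n = 200, so the common sampling fraction is 2% and the strata receive 120, 60 and 20 units respectively.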
42. Merit
• The representativeness of the sample is improved.
• That is, adequate representation of minority subgroups of interest can be ensured.
Demerit
• The sampling frame for the entire population has to be prepared separately for each stratum.
43. 4. Cluster sampling
• It is the selection of groups of study units (clusters) instead of the selection of study units individually.
• The sampling unit is a cluster, and the sampling frame is a list of these clusters.
• Procedure:
– The reference population (homogeneous) is divided into clusters. These clusters are often geographic units (e.g. districts, villages, etc.).
– A sample of such clusters is selected.
– All the units in the selected clusters are studied.
• It is preferable to select a large number of small clusters rather than a small number of large clusters.
44. Example
• In a school-based study, we assume students of the same school are homogeneous.
• We can randomly select sections and include all students of the selected sections only.
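The school example can be sketched as follows. The school with 8 sections of 30 students, the identifiers, and the function name `cluster_sample` are all hypothetical; the sketch only shows that the random draw is made over sections (clusters) and that every student in a chosen section enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # frame = list of clusters
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical school: 8 sections of 30 students each.
sections = {f"section_{s}": [f"s{s}_student_{i}" for i in range(1, 31)]
            for s in range(1, 9)}
chosen_sections, students = cluster_sample(sections, n_clusters=3, seed=7)
print(chosen_sections, len(students))
```

Note that only a list of sections is needed to draw the sample, not a list of all students in the school.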
45. Advantages
• Cost reduction
• It creates 'pockets' of sampled units instead of spreading the sample over the whole territory.
• Sometimes a list of all units in the population is not available, while a list of all clusters is either available or easy to create.
46. Disadvantages
• Sampling error is usually higher than for a simple random sample of the same size.
• It is based on the assumption that the characteristic to be studied is uniformly distributed throughout the reference population, which may not always be the case.
• The investigator does not have total control over the final sample size.
47. 5. Multi-stage sampling
• This method is appropriate when the reference population is large and widely scattered.
• Selection is done in stages until the final sampling units (e.g., households or persons) are reached.
• The primary sampling unit (PSU) is the sampling unit (usually large) in the first sampling stage.
• The secondary sampling unit (SSU) is the sampling unit in the second sampling stage, and so on.
49. Example - The PSUs could be kebeles and the SSUs could be households.
Merit - Cuts the cost of preparing the sampling frame.
Demerit - Sampling error is increased compared with a simple random sample.
Multistage sampling gives less precise estimates than simple random sampling for the same sample size, but the reduction in cost usually far outweighs this, and allows for a larger sample size.
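A two-stage version of the kebele and household example might look like the sketch below. The frame of 20 kebeles with 150 households each, the sample sizes, and the function name `two_stage_sample` are hypothetical. The sketch uses simple random sampling at both stages, which is one possible choice and not something the slides prescribe.

```python
import random

def two_stage_sample(psus, n_psus, n_ssus_per_psu, seed=None):
    """Stage 1: sample PSUs. Stage 2: sample SSUs within each selected PSU."""
    rng = random.Random(seed)
    selected_psus = rng.sample(sorted(psus), n_psus)
    sample = {}
    for psu in selected_psus:
        # A household list is only needed for the PSUs selected in stage 1.
        sample[psu] = rng.sample(psus[psu], n_ssus_per_psu)
    return sample

# Hypothetical frame: 20 kebeles with 150 households each.
kebeles = {f"kebele_{k}": [f"k{k}_household_{h}" for h in range(1, 151)]
           for k in range(1, 21)}
households = two_stage_sample(kebeles, n_psus=5, n_ssus_per_psu=12, seed=11)
print({kebele: len(hh) for kebele, hh in households.items()})
```

The saving in frame preparation is visible in the loop: household lists are consulted for 5 kebeles rather than all 20.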
50. B. Non-probability sampling
• Every item has an unknown chance of being selected.
♣ No random selection (unrepresentative of the given population).
♣ Inappropriate if the aim is to measure variables and generalize findings obtained from a sample to the population.
51. • Researchers are reluctant to use these methods because there is no way to measure the precision of the resulting sample.
• They can be useful when descriptive comments about the sample itself are desired.
• They are quick, inexpensive and convenient.
52. When to use non-probability sampling
• When a group that represents the target population already exists
• When it is overly difficult or impossible to obtain the list of names for sampling (e.g. homeless people)
• When research is exploratory in nature and all of the cases of interest may not be identified ahead of time
• For rare populations
• For very expensive samples, e.g. clinical samples
• When it is unfeasible or impractical to conduct probability sampling
53. The most common types of non-probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique
54. 1. Convenience or haphazard sampling
• Convenience sampling is sometimes referred to as haphazard or accidental sampling.
• It is not normally representative of the target population because sample units are only selected if they can be accessed easily and conveniently.
55. • The method is easy to use, but this advantage is greatly offset by the presence of bias.
• It can deliver accurate results when the population is homogeneous.
56. • For example, a scientist could use this method to determine whether a lake is polluted or not.
• Assuming that the lake water is well-mixed, any sample would yield similar information.
• A scientist could safely draw water anywhere on the lake without worrying about whether or not the sample is representative.
57. 2. Volunteer sampling
• This type of sampling occurs when people volunteer to be involved in the study.
– Example: psychological experiments or pharmaceutical trials (drug testing).
• In these instances, the sample is taken from a group of volunteers.
• Sometimes, the researcher offers payment to attract respondents.
• In exchange, the volunteers accept the possibility of a lengthy, demanding or sometimes unpleasant process.
58. • Sampling voluntary participants as opposed to the general population may introduce strong biases.
• Often in opinion polling, only the people who care strongly enough about the subject tend to respond.
• The silent majority does not typically respond, resulting in large selection bias.
59. 3. Judgment sampling
• This approach is used when a sample is taken based on certain judgments about the overall population.
• The underlying assumption is that the investigator will select units that are characteristic of the population.
• The critical issue here is objectivity: how much can judgment be relied upon to arrive at a typical sample?
60. • Judgment sampling is subject to the researcher's biases and is perhaps even more biased than haphazard sampling.
• Since any preconceptions the researcher may have are reflected in the sample, large biases can be introduced if these preconceptions are inaccurate.
61. • This method is often used in
– exploratory studies like pre-testing of questionnaires and focus groups.
– laboratory settings where the choice of experimental subjects (i.e., animal, human) reflects the investigator's pre-existing beliefs about the population.
• One advantage of judgment sampling is the reduced cost and time involved in acquiring the sample.
62. 4. Quota sampling
• This is one of the most common forms of non-probability sampling.
• Sampling is done until a specific number of units (quotas) for various sub-populations have been selected.
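Quota filling can be sketched as a simple loop. The stream of passers-by, the quotas of 5 men and 5 women, and the function name `quota_sample` are hypothetical. Note that nothing in the code is random: people are accepted in the order the interviewer happens to meet them.

```python
def quota_sample(arrivals, quotas):
    """Accept people in arrival order until every subgroup quota is full."""
    remaining = dict(quotas)
    selected = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            selected.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break                       # all quotas filled; stop recruiting
    return selected

# Hypothetical stream of passers-by, in the order the interviewer meets them.
arrivals = [(f"person_{i}", "female" if i % 3 == 0 else "male") for i in range(1, 61)]
selected = quota_sample(arrivals, {"male": 5, "female": 5})
print(selected)
```

Everyone who arrives after the quotas are full has no chance of selection, which is the property that separates quota sampling from stratified random sampling.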
63. • It does not meet the basic requirement of randomness.
• Some units may have no chance of selection or the chance of selection may be unknown.
• Therefore, the sample may be biased.
64. • Quota sampling is generally less expensive than random sampling.
• It is also easy to administer.
65. • Quota sampling is an effective sampling method when information is urgently required, and it can be conducted without sampling frames.
• In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method.
66. 5. Snowball sampling
• A technique for selecting a research sample where existing study subjects recruit future subjects from among their acquaintances.
• Thus the sample group appears to grow like a rolling snowball.
67. • Often used in hidden populations which are difficult for researchers to access;
– example populations would be drug users or commercial sex workers.
• Snowball samples are subject to numerous biases.
– For example, people who have many friends are more likely to be recruited into the sample.
68. Errors in sampling
• When we take a sample, our results will not be exactly equal to the correct results for the whole population. That is, our results will be subject to errors.
A) Sampling Error (Random Error)
A sample is a subset of a population. Because of this, results obtained from a sample cannot reflect the full range of variation found in the larger group (population).
This type of error, arising from the sampling process itself, is called sampling error, which is a form of random error.
Sampling error can be reduced by increasing the size of the sample.
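A small simulation illustrates how sampling error shrinks as the sample grows. The population of 10,000 values, the sample sizes of 25 and 400, and the function name `spread_of_sample_means` are assumptions made for illustration. The spread of the sample mean across repeated simple random samples is used here as a measure of sampling error.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repetitions, seed=None):
    """Standard deviation of the sample mean across repeated simple random samples."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, sample_size))
             for _ in range(repetitions)]
    return statistics.stdev(means)

# Hypothetical population of 10,000 values (e.g. ages in years).
rng = random.Random(0)
population = [rng.uniform(18, 60) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=25, repetitions=300, seed=1)
large = spread_of_sample_means(population, sample_size=400, repetitions=300, seed=1)
print(round(small, 2), round(large, 2))
```

The sample means based on 400 units cluster much more tightly around the population mean than those based on 25 units.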
69. B) Non-Sampling Error (Bias)
Systematic error in the design or conduct of a sampling procedure which results in distortion of the sample, so that it is no longer representative of the reference population.
We can eliminate or reduce the non-sampling error (bias) by careful design of the sampling procedure, not by increasing the sample size.
Example:
If you take male students only from a student dormitory in Ethiopia in order to determine the proportion of smokers, you would obtain an overestimate, since females are less likely to smoke.
Increasing the number of male students would not remove the bias.
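The dormitory example can be simulated to show that a larger sample does not correct a biased selection procedure. The smoking prevalences below (20% for males, 2% for females) and the population of 10,000 students are invented purely for illustration and are not data from any study.

```python
import random

rng = random.Random(42)

# Hypothetical student population: smoking prevalences are invented for illustration.
students = ([("male", rng.random() < 0.20) for _ in range(5000)] +
            [("female", rng.random() < 0.02) for _ in range(5000)])
true_proportion = sum(smokes for _, smokes in students) / len(students)

males_only = [smokes for sex, smokes in students if sex == "male"]

def biased_estimate(sample_size):
    """Proportion of smokers in a random sample drawn from male students only."""
    sample = rng.sample(males_only, sample_size)
    return sum(sample) / sample_size

for size in (50, 500, 5000):
    print(size, round(biased_estimate(size), 3), "true:", round(true_proportion, 3))
```

As the sample grows, the estimate settles on the male prevalence rather than on the prevalence in the whole student population, so the overestimate remains.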
70. There are several possible sources of bias in sampling (e.g., accessibility bias, volunteer bias, etc.).
The best known source of bias is non-response. It is the failure to obtain information on some of the subjects included in the sample to be studied.
Non-response results in significant bias when the following two conditions are both fulfilled:
1. When non-respondents constitute a significant proportion of the sample (about 15% or more)
2. When non-respondents differ significantly from respondents.
71. There are several ways to deal with non-response and reduce the possibility of bias.
a) Data collection tools (questionnaires) have to be pre-tested.
b) If non-response is due to absence of the subjects, repeated attempts should be made to contact study subjects who were absent at the time of the initial visit.
c) Include additional people in the sample, so that non-respondents who were absent during data collection can be replaced (make sure that their absence is not related to the topic being studied).
N.B.:
The number of non-responses should be documented according to type, so as to facilitate an assessment of the extent of bias introduced by non-response.