3. Introduction
Definition
Basic Concepts in Samples and Sampling
Determination of sample size
Reasons for Taking a Sample
Limitations of Sampling
Principles of Sampling
Classification
Description of individual type
Sampling error
References
4. Sampling is the process of selecting a small number of elements from a larger, defined target group of elements such that the information gathered from the small group allows judgments to be made about the larger group.
5. Population: the entire group under study as
defined by research objectives. Sometimes called
the “universe.”
Researchers define populations in specific terms
such as heads of households, individual person
types, families, types of retail outlets, etc.
Some Bases for Defining Population:
Geographic Area
Demographics
Lifestyle
Awareness
6. Sample: a subset of the population that should represent the entire group
Sample unit: the basic level of investigation; the constituents of the population, such as individuals or households
• A sample unit is drawn from the population and cannot be subdivided further for the purpose of sampling at that stage
• The research objective should define the sample unit
Census: an accounting of the complete population
7. Sampling frame: any sampling procedure requires a master list of the population that identifies each sampling unit by a number. Such a list or map is called a sampling frame, e.g.
a list of voters
a list of householders
a list of villages in a district
a list of farmers
8. Sampling error: any error that occurs in a survey because a sample is used instead of the whole population.
Sample frame error (SFE): the degree to which the sample frame fails to account for all of the defined units in the population (e.g. a telephone book listing does not contain unlisted numbers).
9. Kish posited four basic problems of sampling
frames:
Missing elements: Some members of the
population are not included in the frame.
Foreign elements: Non-members of the
population are included in the frame.
Duplicate entries: A member of the population is listed more than once in the frame and so may be surveyed more than once.
Groups or clusters: The frame lists clusters
instead of individuals.
10. Calculating sample frame error (SFE):
Subtract the number of items on the sampling list from the total number of items in the population.
Divide this difference by the total population.
Multiply the resulting decimal by 100 to convert it to a percentage (SFE must be expressed in %), i.e.
$$\mathrm{SFE} = \frac{(\text{total number of items in the population}) - (\text{number of items on the sampling list})}{\text{total population}} \times 100$$
If the SFE were 40%, this would mean that 40% of the population was not in the sampling frame.
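The calculation above can be sketched in a few lines of Python. The function name and the figures (10,000 households, 6,000 of them on the list) are hypothetical and chosen only to reproduce the 40% case.

```python
def sample_frame_error(population_size: int, frame_size: int) -> float:
    """Return the sample frame error (SFE) as a percentage of the population."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    missing = population_size - frame_size  # units absent from the sampling list
    return 100 * missing / population_size


# Hypothetical figures: 10,000 households, 6,000 of them on the sampling list.
sfe = sample_frame_error(10_000, 6_000)
print(f"SFE = {sfe:.1f}%")  # SFE = 40.0%
```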
12. Estimation of mean with specified precision
$n = z_{1-\alpha/2}^2 \, \sigma^2 / d^2$
d = specified precision on either side of the mean
Estimation of proportion with specified absolute precision
$n = z_{1-\alpha/2}^2 \, \pi(1-\pi) / d^2$
d = specified precision on either side of the proportion
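A minimal sketch of both formulas, assuming z is the two-sided standard normal quantile (about 1.96 for α = 0.05) and that the result is rounded up to a whole number of units. The function names and the example values for σ, π and d are illustrative only.

```python
import math
from statistics import NormalDist


def z_two_sided(alpha: float) -> float:
    """Standard normal quantile for a two-sided confidence level of 1 - alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def n_for_mean(sigma: float, d: float, alpha: float = 0.05) -> int:
    """Sample size to estimate a mean to within +/- d."""
    z = z_two_sided(alpha)
    return math.ceil(z**2 * sigma**2 / d**2)


def n_for_proportion(pi: float, d: float, alpha: float = 0.05) -> int:
    """Sample size to estimate a proportion to within +/- d (absolute)."""
    z = z_two_sided(alpha)
    return math.ceil(z**2 * pi * (1 - pi) / d**2)


# Illustrative inputs: sigma = 15 with d = 2, and pi = 0.5 with d = 0.05.
print(n_for_mean(sigma=15, d=2))          # 217
print(n_for_proportion(pi=0.5, d=0.05))   # 385
```

Using π = 0.5 gives the largest sample size for a given d, which is why it is a common choice when the proportion is unknown.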
13. For prospective study: estimation of relative risk
$n = (z_{1-\alpha/2} / \varepsilon)^2$
where ε = specified relative precision
Retrospective study: estimation of odds ratio (OR)
$$n = \frac{z_{1-\alpha/2}^2 \left[ \dfrac{1}{\pi_1(1-\pi_1)} + \dfrac{1}{\pi_2(1-\pi_2)} \right]}{[\ln(1-\varepsilon)]^2}$$
where ε = specified relative precision in terms of fractions of OR
14. Sample size for Quantitative Data
$n = 4\sigma^2 / E^2$
n = sample size
σ = standard deviation of population
E = desired allowable error
15. Sample size for Qualitative Data
$n = 4pq / E^2$
p = proportion with the positive character
q = 1 - p
E = allowable error
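The two shortcut formulas can be sketched as follows. The factor 4 corresponds to rounding z ≈ 1.96 up to 2 for 95% confidence. Function names and example values are hypothetical, and p and E are assumed to be on the same scale (both as proportions here).

```python
import math


def n_quantitative(sigma: float, error: float) -> int:
    """n = 4 * sigma^2 / E^2, rounded up to a whole unit."""
    n = 4 * sigma**2 / error**2
    return math.ceil(round(n, 9))  # round away floating-point noise first


def n_qualitative(p: float, error: float) -> int:
    """n = 4 * p * q / E^2 with q = 1 - p, rounded up to a whole unit."""
    q = 1 - p
    n = 4 * p * q / error**2
    return math.ceil(round(n, 9))


# Illustrative inputs: sigma = 10 with E = 2; p = 0.2 with E = 0.05.
print(n_quantitative(sigma=10, error=2))   # 100
print(n_qualitative(p=0.2, error=0.05))    # 256
```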
16. Reasons for taking a sample
1. Complete enumeration is practically impossible when the population is infinite.
2. When the results are required in a short time.
3. When the area of survey is wide.
4. When resources for the survey are limited, particularly money and trained persons.
5. When the item or unit is destroyed under investigation.
17. Limitations of sampling
Sampling must be done by qualified and experienced persons; otherwise, the information will be unreliable.
A sample may sometimes yield extreme values rather than a representative mix of values.
There is the possibility of sampling errors, whereas a census survey is free from sampling error.
18. Principle of statistical regularity: A moderately large number of units chosen at random from a large group is almost sure, on average, to possess the characteristics of the large group.
Principle of inertia of large numbers:
Other things being equal, as the sample size
increases, the results tend to be more
accurate and reliable
Principle of validity:
This states that the sampling methods provide
valid estimates about the population units
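The principle of inertia of large numbers can be illustrated with a small simulation. This is only a sketch: the population (10,000 values drawn from a normal distribution with mean 50 and SD 10), the sample sizes and the number of replicates are all arbitrary assumptions.

```python
import random
import statistics


def mean_abs_error(population, n, replicates, rng):
    """Average absolute gap between sample mean and population mean."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(replicates):
        sample = rng.sample(population, n)  # simple random sample of size n
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]

small_n_error = mean_abs_error(population, n=10, replicates=200, rng=rng)
large_n_error = mean_abs_error(population, n=1000, replicates=200, rng=rng)
print(f"n=10:   mean absolute error {small_n_error:.2f}")
print(f"n=1000: mean absolute error {large_n_error:.2f}")
```

The larger samples should sit much closer to the population mean on average, which is the behaviour the principle describes.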
19. Principle of optimization
This principle takes into account the desirability
of obtaining a sampling design which gives
optimum results.
This minimizes the risk or loss of the sampling
design.
The foremost purpose of sampling is to gather
maximum information about the population
under consideration at minimum cost, time and
human power
This is best achieved when the sample contains all
the properties of the population – representative
sample.
21. Probability sampling: members of the population have a known chance (probability) of being selected
Simple random
Systematic random
Stratified random
Random cluster
22. Simple random sampling
The probability of being selected is “known and equal” for all members of the population
1. Blind draw method (e.g. names “placed in a hat” and then drawn randomly)
2. Random numbers method (all items in the sampling frame are given numbers, and numbers are then drawn using a table or computer program), e.g.
Tippett’s series comprising 41,600 digits
Fisher & Yates’ series comprising 15,000 digits
Kendall & Smith’s series comprising 100,000 digits
A Million Random Digits by RAND Corporation
3. Simple random sample by Excel
4. Computer-aided telephone interviewing (CATI)
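As a sketch of the random numbers method done by computer, the following uses Python's standard `random` module in place of a printed table. The frame of 100 numbered households is hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(frame, n)


# Hypothetical sampling frame: households numbered 1 to 100.
frame = [f"household-{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, n=10, seed=1)
print(sample)
```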
25. Select the starting point by tossing a die to select the row and the column
First toss determines the block of rows
Second toss determines the row within the block
e.g. if we throw a 2 and a 3, begin in the 2nd block down and, in that block, select the 3rd row (the row beginning with 8)
Select a beginning column at random by again tossing the die twice, to select the block of columns and the column within the block
e.g. if we toss a 3 and a 1, use the 3rd block of columns and its 1st column (the column headed by 1)
The starting point in this example is therefore located where the row beginning with 8 and the column beginning with 1 intersect, at number 6
26. 927415 925612
926937 515107
867169 388342
512500 542747
168117 169280
014658 159944
832261 993050
032683 131188
062454 423050
806702 881309
837815 163631
926839 453853
670884 840940
772977 367506
622143 938278
767825 284716
28. Advantages of simple random sampling:
Known and equal chance of selection
Easy method when there is an electronic database
Disadvantages:
Complete accounting of population needed.
Cumbersome to provide unique designations to every population member.
Very inefficient when applied to a skewed population distribution (over- and under-sampling problems).
29. Systematic sampling
A way to select a probability-based sample from a directory or list.
This method is at times more efficient than simple random sampling.
It can be viewed as a type of cluster sampling: one of the SI possible “clusters” of list positions is selected.
Sampling interval (SI) = population size (N) / sample size (n)
30. How to draw sample:
1) Calculate SI
2) Select a number between 1 and SI randomly
3) Go to this number as the starting point and
the item on the list here is the first in the
sample,
4) Add SI to the position number of this item
and the new position will be the second
sampled item,
5) Continue this process until desired sample
size is reached.
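The five steps can be sketched as below. The sketch assumes SI is taken as the whole-number part of N/n, and the frame of 100 numbered items is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start in 1..SI, then take every SI-th item on the list."""
    size = len(frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the frame size")
    interval = size // n                  # step 1: sampling interval (SI)
    rng = random.Random(seed)
    start = rng.randint(1, interval)      # step 2: random number in 1..SI
    # steps 3-5: start position, then keep adding SI until n items are taken
    positions = [start + k * interval for k in range(n)]
    return [frame[p - 1] for p in positions]  # list positions are 1-based


frame = list(range(1, 101))  # hypothetical list of 100 numbered items
sample = systematic_sample(frame, n=10, seed=5)
print(sample)
```

Because only the starting point is random, any regular pattern in the list order carries straight through to the sample.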
32. Advantages:
Known and equal chance of any of the SI “clusters” being selected
Efficiency: there is no need to designate (assign a number to) every population member, just those early on the list (unless there is a very large sampling frame).
Less expensive and faster than SRS
Disadvantages:
Small loss in sampling precision
Potential “periodicity” problems
33. Cluster sampling
Done correctly, this is a form of random sampling
Population is divided into groups (clusters)
Some of the groups are randomly chosen
In pure cluster sampling, the whole cluster is sampled.
In simple multistage cluster sampling, there is random sampling within each randomly chosen cluster
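A minimal sketch of the two variants follows. The cluster names, household labels and the `per_cluster` parameter are hypothetical; leaving `per_cluster` unset gives pure cluster sampling, and setting it gives the simple multistage form.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose clusters, then take all units or a random subsample."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of groups
    result = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)               # pure: whole cluster
        else:
            k = min(per_cluster, len(members))
            result[name] = rng.sample(members, k)      # random units within cluster
    return result


# Hypothetical population: 5 villages with 8 households each.
clusters = {
    f"village-{v}": [f"v{v}-house-{h}" for h in range(1, 9)] for v in range(1, 6)
}
pure = cluster_sample(clusters, n_clusters=2, seed=3)
two_stage = cluster_sample(clusters, n_clusters=2, per_cluster=3, seed=3)
print(pure)
print(two_stage)
```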
35. For given sample size, a cluster sample has more
error than a simple random sample
Cost savings of clustering may permit larger
sample
Error is smaller if the clusters are similar to each
other
Cluster sampling has very high error if the
clusters are different from each other
Cluster sampling is NOT desirable if the clusters
are different
37. Cluster survey of vaccination coverage. It will help to:
1. Give a true picture of the vaccination status of the target population.
2. Cross-check results with the routine reporting system.
3. Identify areas with poor and good vaccine coverage.
4. Determine whether vaccines are given at the correct age.
38. 1. List all villages and wards/sectors of cities and towns included in the area.
2. Write the population of each village/ward/sector against its name.
3. Calculate and write the cumulative population, in serial order, against each village/ward.
4. Determine the sampling interval = total cumulative population / 30 clusters.
5. Select a random number ≤ the sampling interval; it must have the same number of digits as the sampling interval.
39. 6. Identify the community in which the first cluster is located: the first village in which the cumulative population equals or exceeds the random number.
7. Cluster 2 is located using random number + sampling interval
8. Cluster 3 is located using the cluster 2 number + sampling interval
...
9. Cluster 30 is located using the cluster 29 number + sampling interval
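Steps 1 to 9 can be sketched as follows. The sketch assumes the sampling interval is taken as a whole number, and the village names and populations are invented; a small example with 3 clusters is used so that the result is easy to check by hand.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def select_clusters(populations, n_clusters=30, seed=None):
    """Locate clusters along the cumulative population list."""
    names = list(populations)                                  # step 1: serial list
    cumulative = list(accumulate(populations[v] for v in names))  # steps 2-3
    interval = cumulative[-1] // n_clusters                    # step 4
    rng = random.Random(seed)
    start = rng.randint(1, interval)                           # step 5
    chosen = []
    for k in range(n_clusters):
        target = start + k * interval                          # steps 7-9
        # step 6: first village whose cumulative population >= target
        chosen.append(names[bisect_left(cumulative, target)])
    return interval, start, chosen


# Hypothetical area with six villages (name -> population).
villages = {"A": 1000, "B": 400, "C": 200, "D": 300, "E": 1100, "F": 600}
interval, start, chosen = select_clusters(villages, n_clusters=3, seed=11)
print(interval, start, chosen)
```

A large village can contain more than one cluster, since larger communities span a longer stretch of the cumulative list.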
40. Selection of first house in cluster
1. Go to center of village
2. Number the paths leading from center
3. Randomly select a direction. A good method is
to spin a bottle from the central location in the
village.
4. Count / estimate number of houses from center
of village to last boundary along that path
5. Select random number between 1 & total
number of houses.
6. This number represents first house from which
we start survey
41. Selection of next household
1. This is the one whose front door is nearest to the front door of the household just visited
2. Keep moving to the next household until 7 children have been covered
3. If there is more than one child of the right age group, record the particulars of all the children.
Excluded from the survey are
1. Households already visited
2. Households outside the survey area
3. Households that are locked
4. Military establishments, hostels, schools, temples
42. Household selection in densely populated areas and in multi-storied buildings
If it is not possible to count/estimate the number of houses along a particular road, the distance may be measured/estimated instead.
If the time taken to walk to the end of the road is 15 minutes, choose a number between 1 and 15; e.g. if the number is 7, walk for 7 minutes and go to the nearest building to start.
Multi-storied: select the floor, then the household, randomly
Double-storied: even digits indicate the ground floor and odd digits the first floor
43. Stratified random sampling
This method is used when the population is not homogeneous.
The population is separated into homogeneous
groups/segments/strata and a sample is taken
from each.
The results are then combined to get the picture
of the total population.
44. Sample stratum size determination
Proportional method (stratum share of
total sample is stratum share of total
population)
Disproportionate method (variances
among strata affect sample size for each
stratum)
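The proportional method can be sketched as below. The stratum names and sizes are hypothetical, and the largest-remainder step for handing out leftover units is one possible convention, not something the source prescribes.

```python
def proportional_allocation(stratum_sizes, n):
    """Give each stratum the same share of the sample as of the population."""
    total = sum(stratum_sizes.values())
    raw = {s: n * size / total for s, size in stratum_sizes.items()}
    allocation = {s: int(value) for s, value in raw.items()}  # whole units
    # Hand any leftover units to the strata with the largest fractional parts.
    leftover = n - sum(allocation.values())
    by_fraction = sorted(raw, key=lambda s: raw[s] - allocation[s], reverse=True)
    for s in by_fraction[:leftover]:
        allocation[s] += 1
    return allocation


# Hypothetical strata: 6,000 urban, 3,000 rural and 1,000 tribal households.
strata = {"urban": 6000, "rural": 3000, "tribal": 1000}
allocation = proportional_allocation(strata, n=200)
print(allocation)  # {'urban': 120, 'rural': 60, 'tribal': 20}
```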
45. Divide population into groups that differ in
important ways
Basis for grouping must be known before
sampling
Select random sample from within each group
For a given sample size, reduces error
compared to simple random sampling IF the
groups are different from each other
47. Probabilities of selection may be different for
different groups, as long as they are known
Oversampling small groups improves intergroup
comparisons
It allows researcher to allocate a larger
sample size to strata with more variance
and smaller sample size to strata with less
variance.
Thus, for the same sample size, more precision is
achieved which is normally accomplished by
disproportionate sampling.
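One common way to make this concrete is Neyman allocation, in which each stratum's share is proportional to its size times its standard deviation. The source does not name this method, so the sketch below is an assumption about how the disproportionate allocation might be carried out; stratum names, sizes and standard deviations are invented.

```python
def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Allocate n in proportion to (stratum size x stratum standard deviation)."""
    weights = {s: stratum_sizes[s] * stratum_sds[s] for s in stratum_sizes}
    total_weight = sum(weights.values())
    # Simple rounding; totals may differ from n by a unit or two in general.
    return {s: round(n * w / total_weight) for s, w in weights.items()}


# Two hypothetical strata of equal size but unequal variability.
sizes = {"A": 5000, "B": 5000}
sds = {"A": 10, "B": 30}
allocation = neyman_allocation(sizes, sds, n=200)
print(allocation)  # {'A': 50, 'B': 150}
```

Stratum B is three times as variable as stratum A, so it receives three times the sample even though the two strata are the same size.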
48. A stratified sampling approach is most effective when three conditions are met:
1. Variability within strata is minimized
2. Variability between strata is maximized
3. The variables upon which the population is stratified are strongly correlated with the desired dependent variable.
49. Advantages over other sampling methods
1. Focuses on important subpopulations and
ignores irrelevant ones.
2. Improves the accuracy/efficiency of
estimation.
3. Permits greater balancing of statistical
power of tests of differences between strata
by sampling equal numbers from strata
varying widely in size.
50. Disadvantages
1. Requires selection of relevant stratification
variables which can be difficult.
2. Is not useful when there are no
homogeneous subgroups.
3. Can be expensive to implement.
51. Post stratification
Stratification is sometimes introduced after
the sampling phase in a process called "post
stratification".
This approach is typically implemented due to
a lack of prior knowledge of an appropriate
stratifying variable or when the experimenter
lacks the necessary information to create a
stratifying variable during the sampling
phase.
53. Convenience sampling
Also called haphazard, accidental or chunk sampling
The sample is selected from units that are conveniently available, or from population groups in which it is easy to conduct surveys or investigations, irrespective of how well they represent the population, e.g. persons stopped in a shopping center and interviewed about a television program, students in a class, people on State Street, friends.
54. Biases are greatest with this sampling method, and the results are unsatisfactory for drawing conclusions.
If there is little variation in the population, the possible bias problems are less important.
Units are not chosen for any reason tied to the purposes of the research.
Generally used for pilot studies, to get just a basic idea about the study variable.
56. Purposive sampling
Sometimes called judgmental sampling.
The process whereby the researcher selects a sample based on experience or knowledge of the group to be sampled
Units are chosen because of certain characteristics
Usually used when working with very small samples
57. Quota sampling
The non-probability equivalent of stratified sampling
Like stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population
Then convenience or judgment sampling is used to select the required number of subjects from each stratum
This differs from stratified sampling, where the strata are filled by random sampling
59. Steps
1. Divide the population into specific groups
2. Calculate a quota for each group based on
relevant and available data
3. Give each interviewer an assignment, which
states the number of cases in each quota
from which they must collect data
4. Each interviewer decides whom to interview until the quota is complete
5. Combine the data collected by interviewers
to provide the full sample
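Step 4 can be sketched for a single interviewer as follows. People are taken in the order they happen to be encountered, which stands in for the interviewer's own choice; the names, groups and quotas are hypothetical.

```python
def fill_quotas(arrivals, quotas):
    """Accept people in order of encounter until every group quota is full."""
    remaining = dict(quotas)
    sample = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:      # this group's quota is still open
            sample.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):      # all quotas complete
            break
    return sample


# Hypothetical encounters and a quota of 2 per group.
arrivals = [
    ("p1", "male"), ("p2", "male"), ("p3", "female"),
    ("p4", "male"), ("p5", "female"), ("p6", "female"),
]
sample = fill_quotas(arrivals, {"male": 2, "female": 2})
print(sample)
```

Nothing in this selection is random, which is what separates quota sampling from stratified sampling.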
60. Snowball sampling
A special non-probability method used when the desired sample characteristic is rare
It may be extremely difficult or cost-prohibitive to locate respondents in these situations
Snowball sampling relies on referrals from initial subjects to generate additional subjects
While this technique can dramatically lower search costs, it introduces bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population
61. Steps
1. Make contact with one or two cases in the
population.
2. Ask these cases to identify further cases
3. Ask these new cases to identify further new
cases
4. Stop when either no new cases are given or
the sample is as large as is manageable.
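The referral process in these steps can be sketched as a walk over a referral network. The network, the seed case and the size limit below are hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Follow referrals from the seed cases until none are new or the cap is hit."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)                     # step 1: initial contacts
    while queue and len(sample) < max_size:  # step 4: stopping rules
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):  # steps 2-3: ask for further cases
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network: who names whom.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"]}
print(snowball_sample(referrals, seeds=["A"], max_size=10))  # ['A', 'B', 'C', 'D', 'E']
print(snowball_sample(referrals, seeds=["A"], max_size=3))   # ['A', 'B', 'C']
```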
64. Multistage sampling
A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
The first stage consists of constructing the
clusters that will be used to sample from.
In the second stage, a sample of primary units is
randomly selected from each cluster (rather than
using all units contained in all selected clusters).
In following stages, in each of those selected
clusters, additional samples of units are selected,
and so on.
65. All ultimate units (individuals, for instance)
selected at the last step of this procedure are
then surveyed.
It is not as effective as true random sampling,
but it probably solves more of the problems
inherent to random sampling.
Multistage sampling is frequently used when a complete list of all members of the population does not exist or would be inappropriate to use.
By avoiding the use of all sample units in all selected clusters, it avoids the large and unnecessary costs associated with traditional cluster sampling.
66. It is applied where an extensive area is
required to be studied within limited
resources.
To bring down the cost involvement, size of
the sample is reduced progressively in stages
by drawing a series of sub-samples till a
conveniently small yet representative sample is
obtained which can be studied within limited
resources.
67. Example: estimating the prevalence of ascariasis in preschool children in all districts of northern India.
In the 1st stage, a random sample of the concerned districts is obtained.
In the 2nd stage, a random sample of villages is drawn in the sampled districts.
In the 3rd stage, a random sample of houses is obtained in the sampled villages.
The prevalence found in these houses is taken to reflect the prevalence of ascariasis in all districts of northern India.
69. Suppose we want to conduct in-person interviews with neighborhood organizations.
There are 9 cities scattered around the country with the relevant types of organizations, and 16 organizations within each of the 9 cities (144 organizations in total). We need to interview 12 organizations. A simple random sample would likely require interviews in (and thus travel to) many of these 9 distant cities.
70. In multi-stage clustered sampling, first randomly select a certain
number of cities (here three), and then randomly select four
organizations within each of the three cities. This saves travel
time, and also makes it easier to assemble a sampling frame (a list
of the ultimate sampling elements).
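The example can be sketched directly: 9 cities with 16 organizations each, 3 cities drawn at random, then 4 organizations drawn within each. The city and organization labels are placeholders.

```python
import random


def two_stage_sample(cities, n_cities, n_per_city, seed=None):
    """Stage 1: random cities. Stage 2: random organizations within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(cities), n_cities)
    return {city: rng.sample(cities[city], n_per_city) for city in chosen}


# 9 cities x 16 organizations = 144 organizations in total.
cities = {
    f"city-{c}": [f"city-{c}/org-{o}" for o in range(1, 17)] for c in range(1, 10)
}
sample = two_stage_sample(cities, n_cities=3, n_per_city=4, seed=7)
for city, organizations in sample.items():
    print(city, organizations)
```

A list of organizations is needed only for the three chosen cities, not for all nine.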
71. Multiphase sampling
Used for studies to be carried out in several phases
A cross-sectional study on nutrition may be carried out in phases:
1. Phase 1: KAP (knowledge, attitude and practice) study in all families
2. Phase 2: dietary assessment in a subsample
3. Phase 3: anthropometric examination in a subsample of family members covered in the 2nd phase
4. Phase 4: biochemical estimations in a subsample of members covered in the 3rd phase
73. The number of units is reduced in every succeeding phase, which also reduces the magnitude of the complicated procedures reserved for the last phase.
Study procedures are applied in phases of increasing complexity, so that the most complex and most expensive procedures are reserved for the last phase, i.e. the smallest sample.
This sampling therefore makes studies less expensive, less time-consuming, less laborious and more purposeful.
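The shrinking, nested samples can be sketched as below. The starting list of 500 units and the phase sizes (200, 80, 20) are hypothetical, and the sketch treats every phase as a draw from one list of units for simplicity.

```python
import random


def multiphase_sample(units, phase_sizes, seed=None):
    """Each later phase is a random subsample of the phase before it."""
    rng = random.Random(seed)
    phases = [list(units)]                   # phase 1 covers every unit
    for size in phase_sizes:
        phases.append(rng.sample(phases[-1], size))
    return phases


units = list(range(1, 501))                  # hypothetical 500 units in phase 1
phases = multiphase_sample(units, phase_sizes=[200, 80, 20], seed=2)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {len(phase)} units")
```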
74. Lot quality assurance (LQA) sampling
Surveys are carried out in two or more sets or lots with the same objective
Results obtained are compared
The lot giving an unfavourable result is considered for interventions
E.g. to identify cities with low vaccination coverage in Bangladesh
75. Combination of LQA sampling and cluster sampling
Lots are determined with the help of indicators
Within each lot, the cluster sampling technique is applied
E.g. maternal and neonatal tetanus elimination surveys
76. Types of error:
1. Sampling error
2. Non-sampling error
Sampling error: the difference between the result inferred about the population from studying a sample and the result of a census of the whole population
Non-sampling error: errors that occur in acquiring, recording or tabulating statistical data and that cannot be ascribed to sampling error. They may arise in either a census or a sample
77. Sampling errors are of 2 types
Biased errors: these arise from any bias in selection, estimation, etc., e.g. when deliberate sampling has been used in place of simple random sampling
Unbiased errors: these occur due to chance differences between the members of the population included in the sample and those not included
78. Non-sampling errors
They may arise in either a census or a sample
They can be reduced to a great extent by using better organization and suitably trained personnel at the field and tabulation stages
Non-sampling error is likely to increase with an increase in sample size
79. Availability sampling: selecting on the basis of convenience.
Cluster sampling: dividing the population into clusters and taking a sample of the clusters.
Multi-stage sampling: sampling subunits within sampled units.
Quota sampling: selecting a fixed number of units in each of a number of categories.
Random sampling: every combination of a given size has an equal chance of being chosen.
Snowball sampling: asking individuals studied to provide references to others.
Stratified sampling: dividing the population into groups and sampling each group.
Systematic sampling: choosing every nth item, beginning at a random point.
80. References
1. K. Park, Textbook of Preventive & Social Medicine, 20th ed.
2. A. Indrayan, Basic Methods of Medical Research, 2nd ed.
3. Beth Dawson, Basic & Clinical Biostatistics, 4th ed.
4. G. M. Dharr, Foundations of Community Medicine
5. Sunder Lal, Textbook of Community Medicine, 2nd ed.
6. B. K. Mahajan, Methods in Biostatistics, 6th ed.
7. S. P. Gupta, Statistical Methods, 37th ed.
8. www.ablebits.com/images/random-generator-exce...
9. Random Generator for Microsoft® Excel®, www.ablebits.com
10. Training for Mid-Level Managers: Facilitator Guide for the EPI Coverage Survey, Expanded Programme on Immunization, World Health Organization (WHO/EPI), revised 1991