Theory and Practice
Basic Terminology in Sampling
Sampling Element: This is the unit about which information is sought by
the marketing researcher for further analysis and action.
The most common sampling element in marketing research is a human
respondent who could be a consumer, a potential consumer, a dealer or a
person exposed to an advertisement, etc.
But some other possible elements for a study could be companies,
families or households, retail stores and so on.
Population: This is not the entire population of a given geographical area,
but the pre-defined set of potential respondents (elements) in a
geographical area.
For example, a population may be defined as "all mothers who buy
branded baby food in a given area" or "all teenagers who watch MTV in
the country" or " all adult males who have heard about or use the
AQUAFRESH brand of toothpaste" or similar definitions in line with the
study being done.
Sampling Frame: This is a subset of the defined target population, from which we can
realistically select a sample for our research.
For example, we may use a telephone directory of Mumbai as a
sampling frame to represent the target population defined as "the
adult residents of Mumbai".
Obviously, there would be a number of elements (people) who fit
our population definition, but do not figure in the telephone
directory. Similarly, some who have moved out of Mumbai recently
would still be listed.
Thus, a sampling frame is usually a practical listing of the
population, or a definition of the elements or areas which can be
used for the sampling exercise.
Sampling Unit: If individual respondents form the sample elements, and if we directly
select some individuals in a single step, the sampling unit is also the
element. That is, both the unit and the element are the same.
But in most marketing research, there is a multi-stage selection.
For example, we may first select areas or blocks in a city or town. These
form the first stage Sampling Units.
Then, we may select specific streets within a block or area, and these are
called second stage sampling units.
Then we may select apartments or houses - the third stage sampling units.
At the last stage, we reach the individual sampling element - the
respondent we wanted to meet.
The Sample Size Calculation
It is not a formula alone that determines sample size in actual
marketing research. Sampling in practice is based on science, but is
also an art.
The basic assumptions made while computing sample sizes through
the use of formulae are sometimes not met in practice. At other
times, there are other factors which are influential in increasing or
decreasing sample sizes obtained through the use of formulae.
For now, remember that sample size is decided based on
• use of formulae,
• experience of similar studies,
• time and budget constraints,
• output or analysis requirements,
• number of segments of the target population,
• number of centres where the study is conducted, etc.
There are two formulas depending on variable type, used for computing
sample size for a study. The first is used when the critical variable studied
is an interval-scaled one.
Formula for Sample Size Calculation when Estimating Means
(for Continuous or Interval Scaled Variables)
The formula for computing ‘n’, the sample size required to do the study, is
$$ n = \left( \frac{Z s}{e} \right)^2 $$
Let us examine one by one what the quantities ‘Z’, ‘s’, and ‘e’ represent.
We will then apply the same to an example to see how it works in practice.
Z : The ‘Z’ value represents the Z score from the standard normal
distribution for the confidence level desired by the researcher. For
example, a 95 percent confidence level would indicate (from a
standard normal distribution for a 2-sided probability value of 0.95)
a ‘Z’ score of 1.96. Similarly, if the researcher desires a 90 percent
confidence level, the corresponding ‘Z’ score would be 1.645
(again, from the standard normal distribution, for a 2-sided
probability of 0.90).
Generally, 90 or 95 percent confidence is adequate for most
marketing research studies. A 100 percent confidence level is not
practical, as it means we have to take a census of the entire
population, instead of using a sample.
We will use z = 1.96, equivalent to a 95 percent confidence level,
in our example.
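As a quick check on the table values quoted above, the Z score for any two-sided confidence level can be computed rather than looked up. This is a minimal sketch using Python's standard library; the function name z_for_confidence is our own label, not something from the text.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided Z score for a confidence level strictly between 0 and 1."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # A two-sided interval leaves (1 - confidence) / 2 in each tail,
    # so we need the point with that much probability above it.
    upper_tail_point = 1.0 - (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(upper_tail_point)


print(round(z_for_confidence(0.95), 2))   # 1.96
print(round(z_for_confidence(0.90), 3))   # 1.645
```

The function rejects a confidence level of exactly 1, which matches the remark that 100 percent confidence is not attainable from a sample.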
s : The ‘s’ represents the standard deviation of the variable which
we are trying to measure from the study. This is an unknown
quantity, since we have not taken a sample yet, so its exact value
cannot be known at this stage.
However, we can use a rough estimate of the standard deviation for the
variable being measured. This estimate can be obtained in the following ways –
• If past studies have measured this variable, we can use the standard deviation of
the variable from one of the studies from the recent past. It serves as a good
estimate.
• A very small sample can be taken as a test or pilot sample, only for the purpose
of roughly estimating the sample standard deviation of the concerned variable.
• If the minimum and maximum values of the variable can be estimated, then the
range of the variable’s values is known. Range = Maximum value – Minimum
value. Assuming that the variable is roughly normally distributed, about 99.7 percent
of its values would lie within ± 3 standard deviations of the mean, so we could get an
approximate value of the standard deviation by dividing the range by 6.
The logic of this is that the Range is approximately equal to 6 standard deviations for such variables.
Therefore, Range, when divided by 6, should give a fairly good estimate of the
standard deviation.
e : The third value required for calculating the sample size for the
study is ‘e’, the tolerable error in estimating the variable in question. This can
be decided only by the researcher or the sponsor of the study. The lower the
tolerable error, the higher will be the sample size. The higher the tolerable error, the
smaller will be the sample size required.
Now, let us take an example of the use of the above formula, to see how it works.
Let us assume we are doing a customer satisfaction study for a washing machine.
We are measuring satisfaction on a scale of 1 to 10. 1 represents "Not at all
satisfied", and 10 represents "Completely Satisfied". The scale would look like this
on a questionnaire –
Customer Satisfaction Scale
1 2 3 4 5 6 7 8 9 10
We will assume that the questionnaire consists only of 7-8 questions, all of them
using this 10-point scale. Therefore, the variable we are trying to measure or
estimate through the survey, is Customer Satisfaction, which is being measured on
a 10-point scale.
We will apply the formula discussed for sample size calculation, and
check for its usefulness.
The formula for variables which are continuous, or interval-scaled, is
$$ n = \left( \frac{Z s}{e} \right)^2 $$
Z : Let us assume we want a 95 percent confidence level in our
estimate of customer satisfaction level from the study. Then, from the
standard normal distribution tables (for a 2-sided probability value of
0.95), the Z value is 1.96.
s : Let us assume that such a customer satisfaction study was not
conducted in the past by us. We have no idea of the standard deviation
of the variable “Customer Satisfaction”. We can then use the rough
approximation of Range divided by 6 to estimate the sample standard
deviation.
In this case, the lowest value of customer satisfaction is 1, and the
highest value is 10. Thus, the Range of values for this variable is 10 – 1 =
9. Therefore, the estimated sample standard deviation becomes 9/6 = 1.5.
e : The tolerable error is expressed in the same units as
the variable being measured or estimated by the study. Thus,
we have to decide how much error (on a scale of 1 to 10) we
can tolerate in the estimate of average customer satisfaction.
Let us say, we put the value at ± 0.5. That means we are
putting the value of ‘e’ as 0.5. This means, we would like our
estimate of customer satisfaction to be within 0.5 of the actual
value, with a confidence level of 95 percent (decided earlier
while setting the ‘Z’ value).
Now, we have all 3 values required for calculating
‘n’, the sample size. So let us calculate ‘n’.
$$ n = \left( \frac{Z s}{e} \right)^2 = \left( \frac{1.96 \times 1.5}{0.5} \right)^2 = (1.96 \times 3)^2 $$
= 34.57 or 35 (approximately)
Therefore, a sample size of 35 would give us an
estimate of customer satisfaction measured on a 1–10
point scale, with 95 percent confidence level, and
error level maintained within ± 0.5 of the actual
value.
If we were to tighten our tolerance level of error (e)
to ± 0.25 instead of ± 0.5, we would have to take a
larger sample.
‘n’ would then be equal to
$$ n = \left( \frac{1.96 \times 1.5}{0.25} \right)^2 = (1.96 \times 6)^2 = 138.3 $$
or 138 (approximately)
Similarly, for any change in the estimate of ‘s’ or the value of ‘Z’ we choose to
set, the value of ‘n’, the sample size, would change.
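The worked example above can be reproduced with a few lines of Python. This is only a sketch of the formula n = (Z s / e)^2 as stated in the text; the function name sample_size_for_mean is our own, and the choice to round up with math.ceil is an assumption (rounding up guarantees the tolerance is met, whereas the text rounds to the nearest whole number).

```python
import math


def sample_size_for_mean(z: float, s: float, e: float) -> float:
    """Unrounded sample size n = (Z * s / e) ** 2 for estimating a mean."""
    if e <= 0:
        raise ValueError("tolerable error e must be positive")
    return (z * s / e) ** 2


# Range / 6 estimate of the standard deviation for a 1-to-10 scale.
s_estimate = (10 - 1) / 6

n_half = sample_size_for_mean(z=1.96, s=s_estimate, e=0.5)
n_quarter = sample_size_for_mean(z=1.96, s=s_estimate, e=0.25)

print(round(n_half, 2), math.ceil(n_half))        # 34.57 35
print(round(n_quarter, 2), math.ceil(n_quarter))  # 138.3 139
```

Halving the tolerable error quadruples the required sample, because ‘e’ enters the formula squared.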
In general, sample size would increase if
• standard deviation ‘s’ is higher
• confidence level required is higher
• error tolerance ‘e’ is lower
The major things to remember in the above formula are that
1. ‘Z’ value is set based on the confidence level we desire.
2. ‘s’ value is estimated from past studies involving the same variable, or from
the approximate formula of Range ÷ 6, if we can estimate the
Range of values for the variable in question.
3. ‘e’ value is also set by us.
Formula for Sample Size Calculation when Estimating Proportions
In cases where the variable being estimated is a proportion or a percentage, a
variation of the formula mentioned earlier should be used.
Such variables are typically found in questions that have a dichotomous
scale, with only two choices for an answer. For example, regular users
versus non-users. If we are estimating the proportion of respondents who
are regular users of our brand of toothpaste, say, we might use following
formula to determine sample size.
Here, the formula is
$$ n = p q \left( \frac{Z}{e} \right)^2 $$
Let us look at the meaning of each of the terms on the right hand side of the
formula.
‘p’ is the frequency of occurrence of something expressed as a
proportion. For example, if the number of users you would expect to find
in a sample is 1 out of every 4 respondents, ‘p’ would be ¼ or 0.25. ‘q’ is
simply the frequency of non-occurrence of the same event, and is
calculated as (1-p). In other words, ‘p’ and ‘q’ always add up to 1. Here
again, it should be noted that we are actually trying to determine ‘p’ or
estimate ‘p’ by doing our survey. So, the estimate of ‘p’ that we use to
compute ‘n’ in the formula is either a very rough guess based on prior
studies, or on some other data. It is used only to calculate the sample size
‘n’. Only after doing the study will we have our true estimate of ‘p’, the
proportion of users in the population. It is similar to the problem
mentioned earlier (in the estimation of means for continuous variables)
when we used an estimate of ‘s’ before doing the actual study, only for the
purpose of computing sample size.
Z : ‘Z’ is the confidence level-related value of the standard normal
variable, as discussed in the earlier section. It is equal to 1.645 for 90
percent confidence level, and 1.96 for 95 percent confidence level (from
the standard normal distribution table).
e : ‘e’ is once again, the tolerable level of error in
estimating ‘p’ that the researcher has to decide. If we decide
that we can tolerate only a 3 percent error, ‘e’ has to be
expressed in terms of the same units as ‘p’. So, a 3 percent
tolerable error would translate into e = 0.03 because ‘p’ is a
proportion, with values ranging from 0 to 1 only. ‘q’ is also a
proportion, with the same range of values, and p + q is equal to 1.
Example of Use of Formula for Proportions
Let us plug in some numbers to see how the formula
works. Assuming we are trying to estimate the
proportion of the population who use our toothpaste
brand AQUA, let us assume that we want a
confidence level of 95 percent in our results (which
means Z = 1.96), and ‘e’ is 0.03, as discussed above.
‘p’, from previous studies or from prior knowledge,
is estimated as 0.25 for the purpose of sample size
calculation.
Then,
$$ n = p q \left( \frac{Z}{e} \right)^2 = (0.25)(0.75) \left( \frac{1.96}{0.03} \right)^2 = (0.25)(0.75)(4268.4) = 800.3 $$
Therefore, we need a sample size of about 800 respondents
to estimate the true value of ‘p’, with a 95 percent
confidence level, and with an error tolerance of
± 0.03 from the true value.
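The same calculation can be sketched in Python. The function name sample_size_for_proportion is our own; the formula is the one given in the text, n = p q (Z / e)^2, and the result is left unrounded so the reader can decide how to round.

```python
def sample_size_for_proportion(p: float, z: float, e: float) -> float:
    """Unrounded sample size n = p * q * (Z / e) ** 2 for estimating a proportion."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if e <= 0:
        raise ValueError("tolerable error e must be positive")
    q = 1.0 - p  # frequency of non-occurrence
    return p * q * (z / e) ** 2


n = sample_size_for_proportion(p=0.25, z=1.96, e=0.03)
print(round(n, 1))  # 800.3
```

Note that ‘e’ must be passed as a proportion (0.03), not as a percentage (3), because it is in the same units as ‘p’.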
Here, as in the earlier formula, the sample size is higher if
• the confidence level is higher
• the error tolerance is lower
But, the relationship between sample size and estimated ‘p’ is
somewhat different. The sample size increases as ‘p’ increases
from 0 to 0.5, but decreases thereafter, as ‘p’ increases from 0.5 to
1. Thus, other things being equal, sample size required is
maximum if ‘p’ is equal to 0.5. This is because the formula also
contains ‘q’ which is equal to (1-p). The product of ‘p’ and ‘q’ is
maximum when p = 0.5, q = 0.5 (0.5 x 0.5 = 0.25). At all other ‘p’
values, the product of ‘p’ and ‘q’ is less than 0.25. Therefore, the
sample size formula gives the highest value when p = 0.5.
This also gives us an easy way out of estimating the value of ‘p’, if
past information is not available. We can simply set the value of
‘p’ to 0.5, because that will give us the maximum sample size. This
could be an overestimated sample size, but it can never
underestimate sample size.
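To see why p = 0.5 is the safe default, the formula can be evaluated over a range of ‘p’ values. This is an illustrative sketch only; Z = 1.96 and e = 0.03 are carried over from the example above, and the grid of candidate values is our own choice.

```python
def sample_size_for_proportion(p: float, z: float, e: float) -> float:
    """Unrounded sample size n = p * (1 - p) * (Z / e) ** 2."""
    return p * (1.0 - p) * (z / e) ** 2


# Candidate values of p from 0.1 to 0.9 in steps of 0.1.
candidates = [i / 10 for i in range(1, 10)]
sizes = {p: sample_size_for_proportion(p, z=1.96, e=0.03) for p in candidates}

for p, n in sizes.items():
    print(f"p = {p:.1f}  n = {n:7.1f}")

worst_case_p = max(sizes, key=sizes.get)
print(worst_case_p, round(sizes[worst_case_p], 1))  # 0.5 1067.1
```

The sizes rise symmetrically towards p = 0.5 and fall away on either side, so using 0.5 when nothing is known about ‘p’ costs at most some extra interviews.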
Limitations of Formulae
Number of Centres
Most studies deal with multiple locations spread across the country. If the data is
to be analysed separately for each geographical segment, the overall sample size
obtained from the formula has to be split into these geographical centres or
segments. In such cases, we may intervene, and fix a minimum sample size for
each centre / city.
Different varieties and scales of variables are used in a questionnaire. Our
assumption in using the above formulae was that we have only one major type of
variable in the questionnaire – either a continuous variable or a proportion.
Actually, we have many different types of variables in any commonly used
questionnaire. This may require formulas to be used for each different scale / type
of variable. Then, we have to reconcile the different sample sizes arrived at for
each different variable type. Usually, the easy way out in such cases is to take the
maximum sample size which is calculated, for one important variable in the
questionnaire.
Cell Size in Analysis
There may be 5 income categories among our respondents, and 4 age
categories. This creates a table with 5x4, or 20 cells. Now, even though the
overall sample size was adequate for simple analysis, the sample size in some
of these 20 cells may not be adequate. There are various rules of thumb used
to overcome or prevent such problems. One says that each cell must have a
minimum of 10 entries for us to do any analysis using that cell. Such problems
can be overcome more easily if we know in advance what types of analysis we
are likely to do. In other words, blank formats of output tables can be specified
before doing the study.
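One way to apply the cell-size rule of thumb is to count respondents per cell and flag the thin ones before running any cross-tabulated analysis. The sketch below is hypothetical throughout: the category labels, the made-up respondent data and the helper name thin_cells are ours, and only the threshold of 10 comes from the text.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # rule of thumb quoted in the text


def thin_cells(respondents, row_levels, col_levels, min_size=MIN_CELL_SIZE):
    """Return {(row, col): count} for every cell with fewer than min_size entries."""
    counts = Counter(respondents)  # missing cells count as 0
    return {
        (row, col): counts[(row, col)]
        for row in row_levels
        for col in col_levels
        if counts[(row, col)] < min_size
    }


income_levels = ["I1", "I2", "I3", "I4", "I5"]  # hypothetical labels
age_levels = ["A1", "A2", "A3", "A4"]

# Made-up data: 12 respondents in every cell except one sparse cell.
respondents = []
for income in income_levels:
    for age in age_levels:
        cell_count = 3 if (income, age) == ("I5", "A4") else 12
        respondents.extend([(income, age)] * cell_count)

print(len(respondents))                                   # 231
print(thin_cells(respondents, income_levels, age_levels)) # {('I5', 'A4'): 3}
```

With 20 cells and a minimum of 10 per cell, the sample cannot be smaller than 200 even if respondents were spread perfectly evenly, which is rarely the case.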
Time and Budget Constraints
Many a time, a study has to be done quickly to aid decision-making, or to prevent
competitors from learning too much about possible marketing strategy changes.
There may also be budget constraints, because more money has been spent in
product development, or in promotions, etc. Sampling design has to keep in
mind both the time and budget constraints for the study, before finalising a
The Role of Experience in Determination of Sample Size
Given the many limitations in using formulae to determine the “right” sample
size, past experience of conducting marketing research studies is often used to
decide on a sample size.
We will now discuss some of the commonly used sampling techniques,
their merits and demerits
Sampling Techniques can be classified under two major types –
probability and non-probability.
Probability Sampling Techniques
These are techniques where each sampling unit (usually a household or
individual in a marketing research study) has a known probability of being
included in the sample. The probability of inclusion need not be equal for
every sampling unit. In some methods, it is equal, and in some others, it is
unequal. But it should be a known probability, for it to be classified as a
probability sampling method.
The other major distinguishing feature of probability sampling methods is
that they are unbiased. The scheme of selection of units from the target
population is pre-specified, and then the sample is selected according to
the scheme, not according to any biases or preferences of the researcher.
In practice, there are quite a few difficulties in using the probability
sampling methods. In such cases, the best feasible theoretical
method with minor modifications may be used. The major types of
probability sampling techniques are –
• Simple Random Sampling
• Stratified Random Sampling
• Multi-stage or Combination Sampling
Simple Random Sampling
This technique is conceptually the easiest to understand, but quite difficult to
implement in a realistic marketing research project. To illustrate what it is,
assume that we wish to estimate the average income level of 100 employees of
a company. We do not have access to their income levels, so we have to
interview them and find out their income level. We have a time constraint,
and we just need a quick estimate. Assume that we have decided we would be
happy with a sample of 5, randomly selected from the 100. How do we select
them?
If we wish to use simple random sampling we could make a list of all 100
employees. Then, a number could be allotted to each employee. We could
then write these 100 numbers on small pieces of paper, one number on each.
Shuffling these folded pieces of paper, we can draw 5 pieces out of the 100,
and use these employees as our sample.
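The paper-slip procedure is equivalent to drawing without replacement from a numbered list. A minimal sketch in Python, assuming the employees are simply numbered 1 to 100; the fixed seed is only there to make the draw repeatable and is not part of the method.

```python
import random

# Number the 100 employees, as in the paper-slip procedure.
employee_ids = list(range(1, 101))

# A seeded generator makes the example repeatable; in a real study
# the seed would not be fixed in advance like this.
rng = random.Random(42)

# Draw 5 distinct employees; every employee has the same chance of selection.
sample = rng.sample(employee_ids, k=5)
print(sorted(sample))
```

Because random.sample draws without replacement, no employee can be picked twice, just as a slip of paper cannot be drawn twice.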
This appears very easy to do when there is a relatively small number of
people to pick from. But when we deal with typical marketing research
problems, the numbers are quite large, and more importantly, the exact
numbers are not known. This creates a very practical difficulty for the
marketing researcher who wishes to use Simple Random Sampling.
Imagine trying to procure a list of all Indian consumers of toilet soap, for
a study into their brand preferences. It is an impossible task, and
therefore, Simple Random Sampling, strictly speaking, is infeasible.
But it is possible to use modifications of the basic technique, with
reasonable checks and balances to keep the method unbiased in practice.
Stratified Random Sampling
In this technique, the total target population is
divided into strata or segments on the basis of some
important variables. For example, a consumer
population may be divided into age brackets of below
25, 25-40 and above 40 years. Then, a sample is
taken from each of the strata defined earlier.
Practically, the overall sample size is first calculated,
using a formula of the type discussed earlier, or based
on judgement and experience. This overall sample is
then divided into sub-samples for each stratum or
segment. There are two ways of doing this – called
proportionate stratification, and disproportionate
stratification. We will illustrate, based on our
example of the 3 age-based strata.
Total Sample Size for Proportionate Stratified Sampling
First, to compute the overall sample size for a
proportionate stratified sample, we have to use the formula
$$ n = \left( \frac{Z}{e} \right)^2 \sum_i W_i S_i^2 $$
instead of the earlier formula discussed at the
beginning of this chapter. The pre-condition for
using this formula is that we need to know the
standard deviation (estimated) of the concerned
variable for each of the strata S1, S2, S3, etc. We also
have to assign a weight to each stratum, which is Wi
in the formula above. Wi is generally calculated as the
proportion of the number of people in stratum ‘i’ to the
number of people in all the strata. In other words,
$$ W_i = \frac{N_i}{N} $$
where Ni is the population of stratum ‘i’, and ‘N’ is the total population targeted
for the study.
For calculating the weights, therefore, we must have
at least an estimate of the distribution of our target
population among the strata. We also need Si , the
standard deviation of the variable being estimated,
for each stratum. These are not always easy to get.
However, we will illustrate, assuming we are trying
to gather data for a Customer Satisfaction Study for a
T.V. Channel. Let us assume we want to know the
overall Customer Satisfaction level among three age
groups – below 25, 25 to 40 and above 40, for an
entertainment channel such as Sony. We want to
determine the customer satisfaction on a 7 point
scale, 1 being low satisfaction level, and 7 being high.
Our formula for total sample size, we recall, is
$$ n = \left( \frac{Z}{e} \right)^2 \sum_i W_i S_i^2 $$
We will now assume that
Z = 1.96 (assuming 95 percent confidence level)
e = 0.05 (tolerable error on the 7 point scale)
We will assume that for the three age-based strata,
the weights and standard deviations are known or can
be calculated. A rough estimate of the standard
deviation ‘s’ (overall) is given by the formula (Range
÷ 6). Range is 7–1 = 6 because the maximum value
of the rating can be 7, and minimum can be 1.
Therefore s = Range ÷ 6 = 6 ÷ 6 = 1.
We will now assume that S1, S2, S3, the standard
deviations of customer satisfaction, are 1.2, 0.9 and
0.7 for the three age-based strata we have described.
Also, let us assume that 40 percent of the target
population of TV watchers is in the 40 plus age
group, 30 percent is in the 25-40 age group and 30
percent is in the below 25 age group. The weights
for the age groups W1, W2, W3 will then be (from the
lower age group to the higher), 0.3, 0.3 and 0.4. The
values are written again below –
S1 = 1.2   W1 = 0.3
S2 = 0.9   W2 = 0.3
S3 = 0.7   W3 = 0.4
Now, applying the formula, we get
$$ n = \left( \frac{Z}{e} \right)^2 \sum_i W_i S_i^2 = \left( \frac{1.96}{0.05} \right)^2 \left[ (0.3)(1.2)^2 + (0.3)(0.9)^2 + (0.4)(0.7)^2 \right] $$
= 1536 [0.871] = 1338 (approx.)
This is the total sample size required. (Note that if
we had used the formula for simple random sampling
discussed earlier, sample size n would have been
(using s=1 as estimated above) equal to 1536. So,
stratified sampling has led to a smaller sample size of
1338 for the same z and e values.)
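The comparison with simple random sampling can be checked numerically. This is a sketch under the assumptions of the example (Z = 1.96, e = 0.05, the three weights and standard deviations given above); the function name proportionate_total is our own. The code keeps (Z / e)^2 unrounded at 1536.64, whereas the text rounds it to 1536, so the last digits differ slightly.

```python
import math


def proportionate_total(z, e, weights, std_devs):
    """Total n = (Z / e) ** 2 * sum(W_i * S_i ** 2) for a proportionate stratified sample."""
    if len(weights) != len(std_devs):
        raise ValueError("need one weight and one standard deviation per stratum")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("stratum weights must add up to 1")
    return (z / e) ** 2 * sum(w * s ** 2 for w, s in zip(weights, std_devs))


weights = [0.3, 0.3, 0.4]    # below 25, 25-40, above 40
std_devs = [1.2, 0.9, 0.7]

n_stratified = proportionate_total(1.96, 0.05, weights, std_devs)
n_simple = (1.96 * 1.0 / 0.05) ** 2  # simple random sampling with s = 1

print(round(n_stratified, 1))  # 1338.4
print(round(n_simple, 1))      # 1536.6
```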
To split this total sample of 1338 into proportionately stratified sub-
samples, we simply use the same weights as determined earlier. Thus, the
sample size for stratum 1 (below 25 age group) would be
1338 x W1 = 1338 x 0.3 = 401
For stratum 2, it would be
1338 x W2 = 1338 x 0.3 = 401
For stratum 3 (above 40 age group), it would be
1338 x W3 = 1338 x 0.4 = 536 (approx.)
Thus, we would take samples of 401, 401 and 536 respectively from the three
strata. The total sample size is maintained at 1338.
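The proportionate split is a one-line calculation. In the sketch below (function name ours), each stratum's share is rounded to the nearest whole respondent; plain rounding gives 535 for the third stratum, so the text's figure of 536 should be read as a one-unit adjustment that keeps the total at 1338.

```python
def proportionate_allocation(total_n, weights):
    """Split total_n across strata in proportion to the stratum weights."""
    return [round(total_n * w) for w in weights]


allocation = proportionate_allocation(1338, [0.3, 0.3, 0.4])
print(allocation)              # [401, 401, 535]
print(1338 - sum(allocation))  # 1 respondent left over after rounding
```

Any leftover from rounding has to be assigned by hand (or by a rule of the researcher's choosing) to one of the strata.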
Disproportionate Stratified Sampling
One of the keys to effective sampling is to take a sample as large or as small as
required, neither too high nor too low. But in practice, we need to know the
variability of the population to be able to achieve an accurate sampling plan.
As we know intuitively, the higher the variability among the population (of the
variable we are measuring or estimating), the higher the sample size required from
it.
As an illustration (though exaggerated), if we know that all the population is of
exactly the same characteristics, then a sample size of 1 is enough to tell us the
characteristics of the entire population.
At the other extreme, if the population is extremely variable, each unit having its
own different characteristics, we would need a very large sample to accurately
represent the population. Most populations do not fall into extreme zones, and
generally strata or segments consist of units that are similar to each other.
When doing stratified sampling, we would probably go for disproportionate
stratified samples if the variability of the variable being estimated is different from
segment to segment. If the variability is the same, we could take a proportionate
stratified sample. We measure variability by the standard deviation of the
variable.
The formula for the total sample size calculation
(for disproportionate sampling) is
$$ n = \left( \frac{Z}{e} \right)^2 \left( \sum_i W_i S_i \right)^2 $$
This is slightly different from the formula used in
case of proportionate stratified sampling.
To illustrate, let us use the same example of three
age-based strata, and check how to use a
disproportionate sample in the same.
$$ n = \left( \frac{Z}{e} \right)^2 \left( \sum_i W_i S_i \right)^2 = \left( \frac{1.96}{0.05} \right)^2 \left[ (0.3)(1.2) + (0.3)(0.9) + (0.4)(0.7) \right]^2 $$
= (1536) (0.8281) = 1272 (approx.)
Thus, we see that compared to the proportionate
stratified sample, we have got a lower sample size,
for the same level of tolerable error (e) and Z (1.96,
95 percent confidence level). In general, we will note
that disproportionate stratified samples tend to be
more efficient (lower sample sizes are obtained), than
proportionate stratified samples, because we allocate
sample size according to the variability in the strata.
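The two stratified totals can be compared side by side. This sketch uses the same assumed inputs as the example (Z = 1.96, e = 0.05, weights 0.3, 0.3, 0.4 and standard deviations 1.2, 0.9, 0.7); the function names are ours, and (Z / e)^2 is kept unrounded, so the results differ from the text's figures in the last digit or so.

```python
def proportionate_total(z, e, weights, std_devs):
    """n = (Z / e) ** 2 * sum(W_i * S_i ** 2)."""
    return (z / e) ** 2 * sum(w * s ** 2 for w, s in zip(weights, std_devs))


def disproportionate_total(z, e, weights, std_devs):
    """n = (Z / e) ** 2 * (sum(W_i * S_i)) ** 2."""
    return (z / e) ** 2 * sum(w * s for w, s in zip(weights, std_devs)) ** 2


weights = [0.3, 0.3, 0.4]
std_devs = [1.2, 0.9, 0.7]

n_prop = proportionate_total(1.96, 0.05, weights, std_devs)
n_disp = disproportionate_total(1.96, 0.05, weights, std_devs)

print(round(n_prop, 1))  # 1338.4
print(round(n_disp, 1))  # 1272.5
```

The disproportionate total can never exceed the proportionate one, because the square of a weighted average of the Si is at most the weighted average of their squares; the two coincide only when every stratum has the same standard deviation.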
We have yet to allocate the sub-samples to the strata.
We will now do that. The criterion for doing so
would be to do it in proportion to the variation in a
given stratum, compared to the total variation in all
strata.
In other words,
$$ n_i = \frac{N_i S_i}{\sum_j N_j S_j} \, n $$
where, for our three strata,
ni = Sample size for stratum ‘i’
n = Total sample size = 1272 (calculated above)
Ni = Proportion of population belonging to stratum ‘i’
Si = Standard deviation of the variable (customer
satisfaction) in stratum ‘i’
We have assumed
N1 = 0.3   S1 = 1.2
N2 = 0.3   S2 = 0.9
N3 = 0.4   S3 = 0.7
n = 1272 from our calculation
Therefore, the sample size in stratum 1 (the below 25 age group) is
$$ n_1 = \frac{(0.3)(1.2)}{(0.3)(1.2) + (0.3)(0.9) + (0.4)(0.7)} \times 1272 = \frac{0.36}{0.91} \times 1272 = 503 $$
Similarly,
$$ n_2 = \frac{(0.3)(0.9)}{0.91} \times 1272 = \frac{0.27}{0.91} \times 1272 = 377 $$
$$ n_3 = \frac{(0.4)(0.7)}{0.91} \times 1272 = \frac{0.28}{0.91} \times 1272 = 391 $$
Thus, the sample is divided into the three age groups in proportion to the
variation in customer satisfaction, and not in proportion to the number of
respondents in each stratum.
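The allocation rule ni = n (Ni Si) / Σ (Nj Sj) can be sketched as follows. The function name is ours, and the inputs are the assumed values from the example; each stratum's share is rounded to the nearest whole respondent, so the rounded figures may add up to one less or one more than the total.

```python
def disproportionate_allocation(total_n, shares, std_devs):
    """Allocate total_n to strata in proportion to N_i * S_i."""
    products = [share * sd for share, sd in zip(shares, std_devs)]
    total_product = sum(products)
    return [total_n * product / total_product for product in products]


exact = disproportionate_allocation(1272, [0.3, 0.3, 0.4], [1.2, 0.9, 0.7])
rounded = [round(x) for x in exact]

print([round(x, 1) for x in exact])  # [503.2, 377.4, 391.4]
print(rounded)                       # [503, 377, 391]
```

The below 25 stratum receives the largest sub-sample because it has the largest standard deviation, even though it holds only 30 percent of the population.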
For example, the below 25 segment has the largest sample size of 503, even
though it has only 0.3 or 30 percent of the population. If we had gone for
proportionate stratified sampling, this segment would have got a sample size
of 0.3 x 1272 = 382 only. This would have been under-representative for
this segment.
We have discussed the pros and cons of proportionate and disproportionate
stratified sampling in these two sections. The reason for such an extensive
discussion is because many of the questions about sampling efficiency get
answered when we think about the need for stratification.
It is well established that, where feasible, stratified sampling is generally the
most efficient method of probabilistic sampling. That is, for a given sample
size, it produces less sampling error than either simple random sampling or
cluster sampling.
We now move on to a discussion of other probabilistic methods of sampling.
Cluster Sampling / Area Sampling
A major difference between previously discussed methods of sampling and cluster
sampling is that a group of objects / units for sampling is selected in cluster
sampling.
A cluster is a group of sampling units or elements, which can be identified, listed
and a sample of which can be chosen. Theoretically, a cluster could be on the
basis of any criterion. But in practice, clusters tend to be found either in terms of
geographical areas, or membership of some groups such as a church, a club, or a
When the clusters are selected on the basis of geographical area, it is also called
area sampling.
If cluster sampling is only a single stage procedure, then
1. A list of all available clusters should be prepared.
2. All clusters should be numbered.
3. A sample of clusters (number to be decided by researcher) should be
selected.
Practically, most of the time, 2 or more stages of sampling take place.
Out of the clusters selected in the first stage, a sample of units
(households) is generally taken, because the number of people in a cluster
is usually too large for sampling purposes.
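The two-stage procedure described above (list and number the clusters, sample some clusters, then sample households within each selected cluster) might be sketched like this. Everything in the example data is hypothetical: the block and house labels, the cluster sizes and the function name two_stage_cluster_sample are ours, and random selection at both stages is an assumption.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_units, rng):
    """Stage 1: pick clusters at random. Stage 2: pick units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {
        name: rng.sample(clusters[name], k=min(n_units, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: 10 blocks of 50 households each.
clusters = {
    f"Block-{b}": [f"Block-{b}/House-{h}" for h in range(1, 51)]
    for b in range(1, 11)
}

rng = random.Random(7)  # fixed seed only to make the example repeatable
sample = two_stage_cluster_sample(clusters, n_clusters=3, n_units=5, rng=rng)

for block, households in sample.items():
    print(block, households)
```

Only the three selected blocks need to be visited, which is where the cost advantage of cluster sampling comes from.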
One problem with cluster sampling is that the members of a cluster tend to
be similar – for example, people living in a block or neighbourhood come
from the same socio-economic background; have similar tastes, buying
In general, cluster sampling is statistically inferior to simple random
sampling and stratified random sampling. Its sample tends to be less
representative than the other two methods. In other words, it produces
more sampling error for the same sample size, when compared to the other
two.
But on the positive side, the cost of cluster sampling is also usually lower.
So, the researcher may be able to justify using this technique on the
grounds of low cost and convenience.
Systematic Sampling
Systematic sampling is very similar to Simple Random Sampling, and easier to
practice. Just as we do in a simple random sample, we start with a list of all
sampling units or respondents in the population. We first compute the sample
size required, based on a formula.
Once the sample size (n) is decided, we divide the total population (N) into n
parts of (N ÷ n) units each. From the first part of sampling units,
we pick one at random. Thereafter, we pick every (N ÷ n)th item from the
list.
To illustrate, say we have a population of 300 students, for some research. We
need a sample of 15 out of these. The sampling fraction is 15/300 which means
1 out of every 20 students will be selected, on an average.
We divide the list into 15 parts of 300/15 = 20 students each. Out of the first 20 students, we choose
any one at random. Let us say, we choose student number 7 (all students are
listed). Thereafter, we choose student numbers 7+20, 7+20+20, 7+20+20+20
and so on in a systematic sampling plan. Therefore, the selected students will be
numbers 7, 27, 47, 67, 87, 107, 127, 147, 167, 187, 207, 227, 247, 267 and 287.
All these 15 students will comprise our total sample for the study.
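The selection rule can be written as a short function. This is a sketch that assumes the population size divides evenly by the sample size, as it does in the example (300 ÷ 15 = 20); the function name and the optional start argument are our own additions.

```python
import random


def systematic_sample(population_size, sample_size, start=None, rng=None):
    """Pick one unit from the first interval, then every interval-th unit after it."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    interval = population_size // sample_size
    if start is None:
        # Random start within the first part of the list (1 .. interval).
        start = (rng or random).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must fall within the first interval")
    return [start + k * interval for k in range(sample_size)]


print(systematic_sample(300, 15, start=7))
# [7, 27, 47, 67, 87, 107, 127, 147, 167, 187, 207, 227, 247, 267, 287]
```

Once the random start is fixed, the rest of the sample is fully determined, which is why the method is easy to carry out in the field.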
In an ordered list according to the criterion of interest, systematic
sampling produces a more representative sample than simple random
sampling. For example, if all students were arranged in ascending
order of age, a systematic sample would produce a sample consisting
of all age groups.
However, a potential drawback also exists. If the list is drawn up such
that every 20th student were similar on the characteristic we are
estimating, either by chance or design, then systematic samples can go
very wrong. So a list should be examined to see that there is no
cyclicality which coincides with our sampling interval.
Multistage or Combination Sampling
As the name indicates, in this type of sampling, we do not choose the final sample
in one stage. We combine two or more stages, and sometimes 2 or more different
methods of probability sampling.
We have already talked about 2-stage Area Samples while discussing Cluster
Sampling. Usually, multi-stage methods have to be used when doing research on a
national scale.
We may divide the national-level target population for our survey into clusters or
some such units. For example, we may divide India into 5 metro clusters, 20 class
A towns, 200 class B towns, and take our first stage sample as 1 metro, 3 class A
towns, and 10 class B towns, based on our sampling plan.
In the second stage, we may choose a stratified sample based on household income
and age of respondent. In such a case, we are using a two stage sampling plan,
which is a combination of Cluster Sampling, and Stratified Random Sampling.
If we go on sampling by geographical area based clusters in all the stages, it could
be a 3 or 4 stage cluster sample.
Non-Probability Sampling Techniques
We have so far discussed probability sampling techniques. In reality, because of
various difficulties involved in obtaining reliable lists of the desired target
population, it is difficult to use a textbook probability sampling prescription.
Therefore, some compromises could be made, or approximately probability-type
sampling procedures may be used. Some of the non-probabilistic techniques may
also be used explicitly in cases where it is not feasible to use probability based
techniques.
The major difference is that in non-probability techniques, the extent of bias in
selecting a sample is not known. This makes it difficult to say anything about the
representativeness or accuracy of the sample. Nevertheless, if done
conscientiously, some of these are good approximations for the probability
techniques.
There are four major non-probability sampling techniques. These are –
Quota Sampling
The first method, quota sampling, is very similar to stratified random sampling.
The first step of deciding on the strata, or segments which the population is divided
into, is actually the same.
The second step, of calculating a total sample size, and allocating it to the various
strata, is also the same. The major difference is that random selection of
respondents is not strictly adhered to. More liberty is given to the field worker to
select enough respondents to complete the segment-wise quota.
In practice, unless the field workers are untrained or the field supervision is lax,
the results produced by a quota sample could be very similar to those produced
by a stratified random sample. But there is no guarantee that they would be similar.
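The two steps and the difference in selection can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the stratum shares, the proportional allocation rule and the stream of people met by the field worker are all assumptions.

```python
def allocate_quota(stratum_shares, total_sample):
    """Allocate the total sample to strata in proportion to population share.

    Rounding may leave the quotas slightly off the total for other inputs.
    """
    return {s: round(total_sample * share) for s, share in stratum_shares.items()}

def fill_quota(respondents_in_order, quota):
    """Accept people in the order they are met until every quota is full."""
    remaining = dict(quota)
    accepted = []
    for person in respondents_in_order:
        stratum = person["stratum"]
        if remaining.get(stratum, 0) > 0:
            accepted.append(person)
            remaining[stratum] -= 1
        if not any(remaining.values()):
            break
    return accepted

# Hypothetical population shares of three income strata.
shares = {"low": 0.5, "middle": 0.3, "high": 0.2}
quota = allocate_quota(shares, total_sample=20)

# Hypothetical stream of people the field worker happens to meet.
stream = [{"id": i, "stratum": ["low", "middle", "high"][i % 3]} for i in range(100)]
sample = fill_quota(stream, quota)
print(quota, len(sample))
```

The allocation step is the same as in stratified sampling. The selection step is not random: whoever is met first fills the quota, which is where an unknown bias can enter.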
In practice, many researchers use quota sampling, because it saves time, compared
with stratified random sampling. For example, if a household is locked, a quota
sample would permit the field worker to use a substitute household in the same
apartment block. But with a stratified random sample, he would be expected to
make a second or third attempt at different times of the day to contact the same
locked household. This would increase the time taken to complete the required
sample.
Judgement Sampling
This is not used often, as it is difficult to justify. The method relies only on the
judgement of the researcher as to who should be in the sample.
It obviously suffers from researcher bias. If a different researcher were to do
the same study, he would be likely to select an entirely different kind of sample.
Convenience Sampling
This is usually employed in the pre-testing of questionnaires. It involves picking
any available set of respondents convenient for the researcher to use.
For example, students could be used as a sample by a marketing researcher who
lives in a college town. They (the students) need not be representative of the
target population for the study, for the product being researched.
Other examples of convenience sampling include on-the-street interviews,
interviews at any other meetings, or interviews with employees of one office
block or factory. Another
common example of convenience sampling is the one by TV reporters who
Snowball Sampling
This technique is used when the population being sought is a small one, and
the chances of finding its members by traditional means are low. For example,
to find owners of Mercedes Benz cars in a city, we may go to one or two, and
ask them if they know anyone else who owns one. They in turn are asked for
more names of owners.
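The referral chain can be sketched as a simple traversal. The owner names and the referral data below are invented, and the breadth-first order is just one reasonable way to follow up names.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Interview the seeds first, then the people they name, and so on."""
    found = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(found) < max_size:
        person = queue.popleft()
        found.append(person)
        for named in referrals.get(person, []):
            if named not in seen:  # do not follow up the same person twice
                seen.add(named)
                queue.append(named)
    return found

# Hypothetical referral data: whom each owner names as another owner.
referrals = {
    "owner_1": ["owner_3", "owner_4"],
    "owner_2": ["owner_4", "owner_5"],
    "owner_3": ["owner_6"],
    "owner_5": ["owner_1", "owner_7"],
}
sample = snowball_sample(referrals, seeds=["owner_1", "owner_2"], max_size=6)
print(sample)
```

Starting from two known owners, the sample grows only through names supplied by earlier respondents, so owners who are not known to anyone in the chain can never be reached.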
Census Versus Sample
It would appear from our discussion of sampling that it is not possible to do a
census in marketing research. Strictly speaking, it is possible to do one if the
population size is small. For example, if 200 solar cooker owners exist in a
town, it may be possible to meet all of them, if their addresses were available,
or could be obtained.
In some cases, like a survey of distributors or dealers, or even industrial
buyers, it may make sense to do a census if it is feasible, particularly if
opinions or buying behaviour of respondents in a small population are likely
to be widely divergent.
But in most cases, if populations are reasonably large or very large, it makes
little sense to do a census. One major reason is that it may simply take too
long. Data may arrive too late for decision-making. Inaccuracies also are
likely to be a function of the volume of data collected. We will discuss these
in the next section under the subject “Sampling and Non-sampling Errors”.
Types of Errors in Marketing Research
Any research study has an error margin associated with it. No method is foolproof,
as we will see, including a census. This is because there are two major types of
errors associated with a research study. These are called –
•Sampling Error or Random Error
•Non-sampling or Human Error
Sampling Error
This is the error which occurs due to the selection of some units and non-selection of
other units into the sample. It is controllable if the selection of sample is done in a
random, unbiased way. In other words, if a probability sampling technique is used, it
is possible to control this error. In general, this error reduces as sample size
increases.
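How sampling error falls with sample size can be illustrated with the standard error of a sample proportion under simple random sampling, sqrt(p(1 - p)/n). The proportion of 0.5 and the sample sizes below are arbitrary choices for the illustration.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the standard error.
errors = {n: standard_error_of_proportion(0.5, n) for n in (100, 400, 1600)}
for n, se in errors.items():
    print(n, round(se, 4))
```

The error falls with the square root of the sample size, so each further reduction costs more and more additional respondents.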
Non-sampling Error
This is the effect of various errors in doing the study, by the interviewer, data entry
operator or the researcher himself. Handling a large quantity of data is not an easy
job, and errors may creep in at any stage of the research. The data entry person
may interchange the column of ‘yes’ and ‘no’ responses while entering or
compiling data, or the interviewer may cheat by not filling up the questionnaire in
the field, and instead, fudge the data. Or, the respondent may say one thing, but
another may be recorded by mistake. These errors are usually proportionate to the
sample size. That is, the larger the sample size, the larger the non-sampling error.
Also, it is difficult to estimate the size of non-sampling error. But we can use some
controls on the quality of manpower, and supervise effectively to minimize it.
Total Error
1. This is the sum of sampling error and non-sampling error.
2. Out of this, the sampling error can be estimated in the case of probability
samples, but not in the case of non-probability samples.
3. Non-sampling errors can be controlled through hiring better field workers,
qualified data entry persons, and good control procedures throughout the
study.
4. One important outcome of this discussion of errors is that the total error is
usually unknown. Moreover, we may have to live with higher non-sampling error
in our attempt to reduce sampling error by increasing the sample size of the
study, not to mention the higher cost of a larger sample.
5. Therefore, it is worthwhile to minimise total error by optimising the sample
size, rather than going blindly for the largest possible sample size.
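The trade-off in points 4 and 5 can be made concrete with a toy model. Everything in it is an assumption made for illustration: sampling error is taken as a / sqrt(n), non-sampling error as b * n (proportionate to sample size, as described earlier), and the constants a and b are invented. In a real study the total error is usually unknown, so such a curve cannot be computed exactly.

```python
import math

def total_error(n, a=0.5, b=0.00001):
    """Toy model: sampling error a/sqrt(n) plus non-sampling error b*n."""
    sampling = a / math.sqrt(n)   # falls as the sample grows
    non_sampling = b * n          # rises as the sample grows
    return sampling + non_sampling

# Search candidate sample sizes for the one with the lowest modelled total error.
candidates = range(50, 5001, 50)
best_n = min(candidates, key=total_error)

print(best_n, round(total_error(best_n), 4), round(total_error(5000), 4))
```

Under these assumed constants, the lowest total error lies at a moderate sample size, and the largest sample considered has a clearly higher total error.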