TOPIC: report on sampling techniques
What is Sampling:
A process used in statistical analysis in which a predetermined number of observations will
be taken from a larger population. The methodology used to sample from a larger population
will depend on the type of analysis being performed, but will include simple random
sampling, systematic sampling and observational sampling.
It is a technique of drawing samples, i.e., a technique of collecting data on only a part of
the population in order to reveal the characteristics of the entire population.
Purpose of Sampling
The purpose of sampling is to provide various types of statistical information of a qualitative
or quantitative nature about the whole by examining a few selected units. The sampling
method is the scientific procedure of selecting those sampling units which would provide the
required estimates with associated margins of uncertainty, arising from examining only a
part and not the whole.
Reasons for sampling instead of census / Need for sampling:
There are six reasons for sampling:
(1) Cost
(2) Time
(3) Large size of many populations
(4) Inaccessibility of the entire population
(5) Destructive nature of many observations
(6) Reliability or accuracy
(1) Cost:
The unit cost of collecting data is significantly lower in a census than in a sample survey.
For example, the unit cost may be taka 200 in a census and taka 1,000 in a sample survey.
However, because a census covers a far larger number of units, its total cost is
significantly higher than that of a sample survey.
The total cost of a census is found by multiplying the total population by the unit cost of
the census. Here total population = N.
The total cost of a sample survey is found by multiplying the sample size by the unit cost
of sampling. Here sample size = n.
For example, with N = 10,00,000 and n = 5,000:
Census: 10,00,000 x 200 = taka 20,00,00,000
Sample: 5,000 x 1,000 = taka 50,00,000
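The arithmetic above can be checked with a few lines of Python. This is only a sketch: the figures are the ones used in the example, while the function and variable names are illustrative and not part of the original report.

```python
# Figures from the example above; all names here are illustrative.
population_size = 1_000_000   # N, written 10,00,000 in the text
sample_size = 5_000           # n
unit_cost_census = 200        # taka per unit in a census
unit_cost_sample = 1_000      # taka per unit in a sample survey


def total_cost(units: int, unit_cost: int) -> int:
    """Total data-collection cost = number of units x cost per unit."""
    return units * unit_cost


census_cost = total_cost(population_size, unit_cost_census)
sample_cost = total_cost(sample_size, unit_cost_sample)

print(f"Census total cost: taka {census_cost:,}")
print(f"Sample total cost: taka {sample_cost:,}")
print(f"Census / sample cost ratio: {census_cost // sample_cost}")
```

Even though each sampled unit costs five times as much, the census is far more expensive in total because it covers 200 times as many units.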
(2) Time:
The unit time involved in a sample survey is greater than in a census, but because of the
larger size of the population the total time involved in a census is significantly higher
than in a sample survey.
(3) Large size of many populations:
In some cases the size of the population is extremely large, and not all of its members are
traceable, owing to travel, disease, death, mental abnormality, imprisonment, etc. In that
situation the only way to conduct the research is to collect data through a sample survey.
(4) Inaccessibility of the entire population:
In some cases the entire population may not be accessible. In that case sampling is necessary.
For example, after an aircraft crash the entire population of interest may be inaccessible.
(5) Destructive nature of many observations:
Because many observations are destructive in nature, the researcher is compelled to collect
information on only a part of the population. Examples:
Blood test for a patient.
Life hours of a tube light.
(6) Reliability or accuracy:
By using a scientific sampling technique one can minimize the sampling error, and because
qualified investigators are employed, the non-sampling error committed in a sample survey
is also minimal.
The amount of non-sampling error in a census is much higher than the total amount of
sampling and non-sampling error committed in a sample survey, as less qualified
investigators are involved in a census and supervision, monitoring and quality control are
harder to maintain.
The degree of error is related to reliability: if error decreases, reliability increases.
Sampling reduces the total of sampling and non-sampling error, so it enhances the
reliability of the information.
How to Determine Sample Size
In order to prove that a process has been improved, you must measure the process capability
before and after improvements are implemented. This allows you to quantify the process
improvement (e.g., defect reduction or productivity increase) and translate the effects into an
estimated financial result – something business leaders can understand and appreciate. If data
is not readily available for the process, how many members of the population should be
selected to ensure that the population is properly represented? If data has been collected, how
do you determine if you have enough data?
Determining sample size is a very important issue because samples that are too large may
waste time, resources and money, while samples that are too small may lead to inaccurate
results. In many cases, we can easily determine the minimum sample size needed to estimate
a process parameter, such as the population mean $\mu$.
When sample data is collected and the sample mean $\bar{x}$ is calculated, that sample mean
is typically different from the population mean $\mu$. This difference between the sample
and population means can be thought of as an error. The margin of error $E$ is the maximum
difference between the observed sample mean $\bar{x}$ and the true value of the population
mean $\mu$:
$$E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
where:
$z_{\alpha/2}$ is known as the critical value, the positive $z$ value that is at the
vertical boundary for the area of $\alpha/2$ in the right tail of the standard normal
distribution.
$\sigma$ is the population standard deviation.
$n$ is the sample size.
Rearranging this formula, we can solve for the sample size necessary to produce results
accurate to a specified confidence and margin of error:
$$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^2$$
This formula can be used when you know $\sigma$ and want to determine the sample size
necessary to establish, with a confidence of $1 - \alpha$, the mean value $\mu$ to within
$\pm E$. You can still use this formula if you don’t know your population standard deviation
and you have a small sample size. Although it’s unlikely that you know $\sigma$ when the
population mean is not known, you may be able to determine $\sigma$ from a similar process
or from a pilot test/simulation.
Let’s put all this statistical mumbo-jumbo to work with an example about starting an
Internet service provider (ISP).
Sample Size Calculation Example
We would like to start an ISP and need to estimate the average Internet usage of households
in one week for our business plan and model. How many households must we randomly
select to be 95 percent sure that the sample mean is within 1 minute of the population mean
$\mu$? Assume that a previous survey of household usage has shown $\sigma$ = 6.95 minutes.
We are solving for the sample size $n$.
A 95% degree of confidence corresponds to $\alpha$ = 0.05. Each of the two tails of the
standard normal curve has an area of $\alpha/2$ = 0.025. The region to the left of
$z_{\alpha/2}$ and to the right of $z$ = 0 is 0.5 - 0.025, or 0.475. In the table of the
standard normal ($z$) distribution, an area of 0.475 corresponds to a $z$ value of 1.96.
The critical value is therefore $z_{\alpha/2}$ = 1.96.
The margin of error $E$ = 1 and the standard deviation $\sigma$ = 6.95. Using the formula
for sample size, we can calculate $n$:
$$n = \left( \frac{1.96 \times 6.95}{1} \right)^2 = (13.622)^2 \approx 185.56$$
So we will need to sample at least 186 (rounded up) randomly selected households. With this
sample we will be 95 percent confident that the sample mean $\bar{x}$ will be within 1
minute of the true population mean of Internet usage.
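The same calculation can be sketched in Python. The function name and its parameters are illustrative assumptions; the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table, so it is slightly more precise than 1.96.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n such that z * sigma / sqrt(n) <= margin.

    sigma      -- population standard deviation (assumed known)
    margin     -- desired margin of error E, in the same units as sigma
    confidence -- confidence level 1 - alpha, e.g. 0.95
    """
    alpha = 1.0 - confidence
    # Critical value: the z value with an area of alpha/2 in the right tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # n = (z * sigma / E)^2, rounded up to the next whole unit.
    return math.ceil((z * sigma / margin) ** 2)


# ISP example: sigma = 6.95 minutes, E = 1 minute, 95% confidence.
print(required_sample_size(sigma=6.95, margin=1.0, confidence=0.95))
```

Because the margin of error enters the formula squared, halving it roughly quadruples the required sample size.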
STEPS IN SAMPLING PROCESS:
It is the procedure required right from defining a population to the actual selection of sample
elements. There are seven steps involved in this process.
Step 1: Define the population
It is the aggregate of all the elements defined prior to selection of the sample. It is
necessary to define the population in terms of (i) elements, (ii) sampling units,
(iii) extent and (iv) time. A few examples are given here.
If we were to conduct a survey on the consumption of tea in Gujarat, then these specifications
might be as follows:
(i) Element: Housewives
(ii) Sampling units: Households, then housewives
(iii) Extent: Gujarat State
(iv) Time: January 1-10, 1999
If we were to monitor the sales of a product recently introduced by us, the population might
be:
(i) Element: Our product
(ii) Sampling units: Retail outlets, supermarkets, then our product
(iii) Extent: Delhi and New Delhi
(iv) Time: January 7-14, 1999
It may be emphasized that all these four specifications must be contained in the designated
population. Omission of any of them would render the definition of population incomplete.
Step 2: Identify the sampling frame
The sampling frame could be a telephone directory, a list of blocks and localities of a
city, a map or any other list consisting of all the sampling units. It may be pointed out
that if the frame is incomplete or otherwise defective, sampling will not be able to
overcome these shortcomings.
The question is: how can one ensure that the frame is perfect and free from any defect?
Leslie Kish has observed that a perfect frame is one where “every element appears on the
list separately, once, only once, and nothing else appears on the list.” This type of
perfect frame would indicate one-to-one correspondence between frame units and sampling
units. But such perfect frames are rather rare. Accordingly, one has to use frames with one
deficiency or another, but one should ensure that the frame is not so deficient that it has
to be given up altogether.
This raises a pertinent question: what are the criteria for a suitable frame? In order to
examine the suitability or otherwise of a sampling frame, a number of questions need to be
asked. These are:
1. Does it adequately cover the population to be surveyed?
2. How complete is the frame? Is every unit that should be included represented?
3. Is it accurate? Is the information about each individual unit correct? Does the frame as a
whole contain units which no longer exist?
4. Is there any duplication? If so, then the probability of selection is disturbed as a unit can
enter the sample more than once.
5. Is the frame up-to-date? It could have met all the criteria when compiled but could well be
deficient when it came to be used. This could well be true of all frames involving the human
population, as change is taking place continuously.
6. How convenient is it to use? Is it readily accessible? Is it arranged in a way suitable for
sampling? Can it easily be re-arranged so as to enable us to introduce stratification and to
undertake multi-stage sampling?
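Some of these checks (coverage, duplication, units that no longer exist) can be partly automated when a trusted reference list is available. The sketch below assumes such a list exists, which in practice is often not the case; the function name `audit_frame` and the toy data are hypothetical.

```python
from collections import Counter


def audit_frame(frame: list[str], population: set[str]) -> dict[str, list[str]]:
    """Compare a sampling frame with a reference list of population units.

    `population` is an assumed, trusted list of the units that should be
    covered; it is used here only to illustrate the checks.
    """
    counts = Counter(frame)
    listed = set(counts)
    return {
        # Units listed more than once: their selection probability is inflated.
        "duplicates": sorted(unit for unit, c in counts.items() if c > 1),
        # Units in the population that the frame fails to cover.
        "missing": sorted(population - listed),
        # Units on the frame that are not (or no longer) in the population.
        "foreign": sorted(listed - population),
    }


frame = ["house_1", "house_2", "house_2", "house_9"]
population = {"house_1", "house_2", "house_3"}
print(audit_frame(frame, population))
```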
These are demanding criteria and it is most unlikely that any frame would meet them all.
Nevertheless, they are the factors to be borne in mind whenever we undertake random
sampling.
In marketing research most of the frames are from census reports, electoral registers, lists of
member units of trade and industry associations, lists of members of professional bodies, lists
of dwelling units maintained by local bodies, returns from an earlier survey and large scale
Step 3: Specify the sampling unit
The sampling unit is the basic unit containing the elements of the target population. The
sampling unit may be different from the element. For example, if one wanted a sample of
housewives, it might be possible to have access to such a sample directly. However, it is
easier to select households as the sampling unit and then interview housewives in each of the
households.
As mentioned in the preceding step, the sampling frame should be complete and accurate
otherwise the selection of the sampling unit might be defective. It is necessary to get a further
specification of the sampling unit both in personal interviews and in telephone interviews.
Thus, in personal interviews, a pertinent question is—of the several persons in a household,
who should be interviewed? If interviews were held during office timings when the heads of
families and other employed persons are away, interviewing would under-represent employed
persons and over-represent elderly persons, housewives and the unemployed. In view of these
considerations, it is necessary to have a random process of selection of the adult residents of
each household. One method that could be used for this purpose is to list all the eligible
persons living at a particular address and then select one of them.
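The listing method just described can be sketched as follows. This is a minimal illustration that gives every listed adult in a household an equal chance; the function name and the household data are hypothetical.

```python
import random


def select_respondent(eligible_adults: list[str], rng: random.Random) -> str:
    """Pick one eligible adult from a household listing, each with equal chance."""
    if not eligible_adults:
        raise ValueError("household has no eligible adults")
    return rng.choice(eligible_adults)


rng = random.Random(2024)  # fixed seed only so the sketch is reproducible
household = ["head of family", "spouse", "adult son", "grandmother"]
print(select_respondent(household, rng))
```

Leaving the choice to a random draw, rather than to whoever happens to answer the door, is what removes the over-representation of people who are at home during office hours.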
Step 4: Specify the sampling method
It indicates how the sample units are selected. One of the most important decisions in this
regard is to determine which of the two, a probability or a non-probability sample, is to be
chosen.
In case of a probability sample, the probability or chance of every unit in the population
being in the sample is known. Further, the selection of specific units in the sample depends
entirely on chance. No substitution of one unit for another is permissible. This means that no
human judgment is involved in the selection of a sample. In contrast, in a non-probability
sample, the probability of inclusion of any unit in the population in the sample is not known.
In addition, the selection of units within a sample involves human judgment rather than pure
chance.
In case of a probability sample, it is possible to measure the sampling error and thereby
determine the degree of precision in the estimates with the help of the theory of probability.
This theory also enables us to consider, from amongst the various possible sample designs,
the one that will give the maximum information per rupee. This is not possible when a
non-probability sample is used.
Probability sampling enables us to choose representative sample designs. It also enables us to
estimate the extent to which the results based on such a sample are likely to be different from
what we would have obtained had we covered the population in our study. Conversely, the
use of probability sampling enables us to determine the sample size for a given degree of
precision, indicating that our sample results do not differ by more than a specified amount
from those yielded by a study covering entire population.
Although non-probability sampling does not yield these benefits, on account of its
convenience and economy, it is often preferred to probability sampling. If the researcher is
convinced that the risks involved in the use of a non-probability sample are more than offset
by its being relatively cheap and convenient, his choice should be in favor of non-probability
sampling.
There are various types of sample designs that can be covered under the two broad groups,
random or probability samples and non-random or non-probability samples.
Step 5: Determine the sample size
At this step, one has to decide how many elements of the target population are to be
included in the sample.
Step 6: Specify the sampling plan
This means that one should indicate how decisions made so far are to be implemented. For
example, if a survey of households is to be conducted, a sampling plan should define a
household, contain instructions to the interviewer as to how he should take a systematic
sample of households, advise him on what he should do when no one is available on his visit
to the household, and so on. These are some pertinent issues in a sampling survey to which a
sampling plan should provide answers.
Step 7: Select the sample
This is the final step in the sampling process. A good deal of office and fieldwork is involved
in the actual selection of the sampling elements. Most of the problems in this stage are faced
by the interviewer while contacting the sample-respondents.
Sampling is done in a wide variety of research settings. Listed below are a few of the benefits
of sampling:
1. Reduced cost: It is obviously less costly to obtain data for a selected subset of a
population, rather than the entire population. Furthermore, data collected through a
carefully selected sample are highly accurate measures of the larger population.
Public opinion researchers can usually draw accurate inferences for the entire
population of the United States from interviews of only 1,000 people.
2. Speed: Observations are easier to collect and summarize with a sample than with a
complete count. This consideration may be vital if the speed of the analysis is
important, such as through exit polls in elections.
3. Greater scope: Sometimes highly trained personnel or specialized equipment limited
in availability must be used to obtain the data. A complete census (enumeration) is not
practical or possible. Thus, surveys that rely on sampling have greater flexibility
regarding the type of information that can be obtained.
It is important to keep in mind that the primary point of sampling is to create a small group
from a population that is as similar to the larger population as possible. In essence, we want
to have a little group that is like the big group. With that in mind, one of the features we look
for in a sample is the degree of representativeness - how well does the sample represent the
larger population from which it was drawn? How closely do the features of the sample
resemble those of the larger population?
There are, of course, good and bad samples, and different sampling methods have different
strengths and weaknesses. Before turning to specific methods, a few specialized terms used in
sampling should be defined.
Samples are always drawn from a population, but we have not defined the term "population."
By "population" we denote the aggregate from which the sample is drawn. The population to
be sampled (the sampled population) should coincide with the population about which
information is wanted (the target population). Sometimes, for reasons of practicality or
convenience, the sampled population is more restricted than the target population. In such
cases, precautions must be taken to ensure that the conclusions refer only to the sampled
population.
Before selecting the sample, the population must be divided into parts that are called
sampling units or units. These units must cover the whole of the population and they must not
overlap, in the sense that every element in the population belongs to one and only one unit.
Sometimes the choice of the unit is obvious, as in the case of the population of Americans so
often used for opinion polling. In sampling individuals in a town, the unit might be an
individual person, the members of a family, or all persons living in the same city block. In
sampling an agricultural crop, the unit might be a field, a farm, or an area of land whose
shape and dimensions are at our disposal. The construction of this list of sampling units,
called a frame, is often one of the major practical problems.
Types of Sampling
Although there are a number of different methods that might be used to create a sample, they
generally can be grouped into one of two categories: probability samples or non-probability
samples. We consider the different types of probability samples first.
Probability Sampling
The idea behind this type is random selection. More specifically, each sample from the
population of interest has a known probability of selection under a given sampling scheme.
There are four categories of probability samples described below.
Simple Random Sampling
The most widely known type of a random sample is the simple random sample (SRS). This is
characterized by the fact that the probability of selection is the same for every case in the
population. Simple random sampling is a method of selecting n units from a population of
size N such that every possible sample of size n has an equal chance of being drawn.
An example may make this easier to understand. Imagine you want to carry out a survey of
100 voters in a small town with a population of 1,000 eligible voters. With a town this size,
there are "old-fashioned" ways to draw a sample. For example, you could write the name of
each voter on a separate slip of paper, put all the slips into a box and draw 100 of them at
random. You shake the box, draw a slip and set it aside, shake again, draw another, set it
aside, and so on until you have 100 slips of paper. These 100 form your sample. This sample
would be drawn through a simple random sampling procedure: at each draw, every name still in
the box has the same probability of being chosen.
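The box-and-slips procedure corresponds to sampling without replacement, which Python's standard `random` module provides directly. The sketch below uses made-up voter labels, and the helper name `simple_random_sample` is an illustrative choice.

```python
import random


def simple_random_sample(population: list, n: int, seed=None) -> list:
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, like setting each slip aside.
    return rng.sample(population, n)


# 1,000 eligible voters, labelled voter_0001 ... voter_1000 (hypothetical labels).
voters = [f"voter_{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(voters, 100, seed=42)
print(len(sample), sample[:5])
```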
In real-world social research, designs that employ simple random sampling are difficult to
come by. We can imagine some situations where it might be possible - you want to interview
a sample of doctors in a hospital about work conditions. So you get a list of all the physicians
that work in the hospital, write their names on a piece of paper, put those pieces of paper in
the box, shake and draw. But in most real-world instances it is impossible to list everything
on a piece of paper and put it in a box, then randomly draw until the desired sample size
is reached.
There are many reasons why one would choose a different type of probability sample in
practice.
Stratified Random Sampling
In this form of sampling, the population is first divided into two or more mutually exclusive
segments based on some categories of variables of interest in the research. It is designed to
organize the population into homogenous subsets before sampling, then drawing a random
sample within each subset. With stratified random sampling the population of $N$ units is
divided into subpopulations of $N_1, N_2, \ldots, N_L$ units respectively. These
subpopulations, called strata, are non-overlapping and together they comprise the whole of
the population, so that $N_1 + N_2 + \cdots + N_L = N$. When these have been determined, a
sample is drawn from each, with a separate draw for each of the different strata. The sample
sizes within the strata are denoted by $n_1, n_2, \ldots, n_L$ respectively. If a SRS is taken within
each stratum, then the whole sampling procedure is described as stratified random sampling.
The primary benefit of this method is to ensure that cases from smaller strata of the
population are included in sufficient numbers to allow comparison. An example makes it
easier to understand. Say that you're interested in how job satisfaction varies by race among a
group of employees at a firm. To explore this issue, we need to create a sample of the
employees of the firm. However, the employee population at this particular firm is
predominantly White: the proportional allocation used in this example implies roughly 75%
White, 9% African American, 10% Asian and 6% Latino employees.
If we were to take a simple random sample of employees, there's a good chance that we
would end up with very small numbers of Blacks, Asians, and Latinos. That could be
disastrous for our research, since we might end up with too few cases for comparison in one
or more of the smaller groups.
Rather than taking a simple random sample from the firm's population at large, in a stratified
sampling design, we ensure that appropriate numbers of elements are drawn from each racial
group in proportion to the percentage of the population as a whole. Say we want a sample of
1000 employees - we would stratify the sample by race (group of White employees, group of
African American employees, etc.), then randomly draw out 750 employees from the White
group, 90 from the African American, 100 from the Asian, and 60 from the Latino. This
yields a sample that is proportionately representative of the firm as a whole.
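Proportional allocation of this kind can be sketched as below. The firm size of 10,000 employees and the stratum sizes are assumptions chosen to match the 750/90/100/60 split in the example; the function names are illustrative. Leftover units from rounding are given to the strata with the largest fractional shares, which is one common convention rather than the only one.

```python
import random


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split a total sample of n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}  # round down
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def stratified_sample(strata: dict[str, list], n: int, seed=None) -> dict[str, list]:
    """Take a separate simple random sample within each stratum."""
    rng = random.Random(seed)
    alloc = proportional_allocation({k: len(v) for k, v in strata.items()}, n)
    return {name: rng.sample(units, alloc[name]) for name, units in strata.items()}


# Assumed stratum sizes for a hypothetical firm of 10,000 employees.
sizes = {"White": 7500, "African American": 900, "Asian": 1000, "Latino": 600}
print(proportional_allocation(sizes, 1000))
```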
Stratification is a common technique. There are many reasons for this, such as:
1. If data of known precision are wanted for certain subpopulations, then each of these
should be treated as a population in its own right.
2. Administrative convenience may dictate the use of stratification, for example, if the
agency administering a survey has regional offices, each of which can supervise the
survey for a part of the population.
3. Sampling problems may be inherent with certain sub populations, such as people
living in institutions (e.g. hotels, hospitals, prisons).
4. Stratification may improve the estimates of characteristics of the whole population. It
may be possible to divide a heterogeneous population into sub-populations, each of
which is internally homogenous. If these strata are homogenous, i.e., the
measurements vary little from one unit to another; a precise estimate of any stratum
mean can be obtained from a small sample in that stratum. The estimate can then be
combined into a precise estimate for the whole population.
5. There is also a statistical advantage in the method, as a stratified random sample
nearly always results in a smaller variance for the estimated mean or other population
parameters of interest.
Systematic Sampling
This method of sampling is at first glance very different from SRS. In practice, it is a variant
of simple random sampling that involves some listing of elements - every nth element of the
list is then drawn for inclusion in the sample. Say you have a list of 10,000 people and you
want a sample of 1,000.
Creating such a sample includes three steps:
1. Divide number of cases in the population by the desired sample size. In this example,
dividing 10,000 by 1,000 gives a value of 10.
2. Select a random number between one and the value attained in Step 1. In this
example, we choose a number between 1 and 10 - say we pick 7.
3. Starting with case number chosen in Step 2, take every tenth record (7, 17, 27, etc.).
More generally, suppose that the N units in the population are ranked 1 to N in some order
(e.g., alphabetic). To select a sample of n units, we take a unit at random, from the 1st k units
and take every k-th unit thereafter.
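The three steps can be sketched in Python as follows. The function name is illustrative, and the sketch assumes the list length is at least the sample size; when N is not an exact multiple of n, the last few units on the list can never be selected, which is a known limitation of this simple version.

```python
import random


def systematic_sample(units: list, n: int, seed=None) -> list:
    """Take a random start among the first k units, then every k-th unit."""
    k = len(units) // n  # Step 1: sampling interval
    if k < 1:
        raise ValueError("sample size exceeds the number of units")
    rng = random.Random(seed)
    start = rng.randint(1, k)  # Step 2: random start between 1 and k (1-based)
    # Step 3: positions start, start + k, start + 2k, ... (n positions in all)
    return [units[pos - 1] for pos in range(start, start + k * n, k)]


records = list(range(1, 10_001))  # case numbers 1 .. 10,000
sample = systematic_sample(records, 1_000, seed=7)
print(sample[:3], len(sample))
```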
The advantages of systematic sampling method over simple random sampling include:
1. It is easier to draw a sample and often easier to execute without mistakes. This is a
particular advantage when the drawing is done in the field.
2. Intuitively, you might think that systematic sampling might be more precise than SRS.
In effect it stratifies the population into n strata, consisting of the 1st k units, the 2nd k
units, and so on. Thus, we might expect the systematic sample to be as precise as a
stratified random sample with one unit per stratum. The difference is that with the
systematic one the units occur at the same relative position in the stratum whereas
with the stratified, the position in the stratum is determined separately by
randomization within each stratum.
Cluster Sampling
In some instances the sampling unit consists of a group or cluster of smaller units that we call
elements or subunits (these are the units of analysis for your study). There are two main
reasons for the widespread application of cluster sampling. Although the first intention may
be to use the elements as sampling units, it is found in many surveys that no reliable list of
elements in the population is available and that it would be prohibitively expensive to
construct such a list. In many countries there are no complete and updated lists of the people,
the houses or the farms in any large geographical region.
Even when a list of individual houses is available, economic considerations may point to the
choice of a larger cluster unit. For a given size of sample, a small unit usually gives more
precise results than a large unit. For example a SRS of 600 houses covers a town more evenly
than 20 city blocks containing an average of 30 houses apiece. But greater field costs are
incurred in locating 600 houses and in traveling between them than in covering 20 city
blocks. When cost is balanced against precision, the larger unit may prove superior.
Important things about cluster sampling:
1. Most large scale surveys are done using cluster sampling;
2. Clustering may be combined with stratification, typically by clustering within strata;
3. In general, for a given sample size n cluster samples are less accurate than the other
types of sampling in the sense that the parameters you estimate will have greater
variability than an SRS, stratified random or systematic sample.
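A one-stage cluster sample along the lines of the city-block example can be sketched as below: whole blocks are drawn at random and every house in a chosen block is included. The town of 200 blocks with exactly 30 houses each is an assumption made for illustration, as are the names used.

```python
import random


def cluster_sample(clusters: dict[str, list[str]], n_clusters: int, seed=None) -> list[str]:
    """Draw whole clusters at random and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical town: 200 city blocks of 30 houses each.
blocks = {
    f"block_{b:03d}": [f"house_{b:03d}_{h:02d}" for h in range(1, 31)]
    for b in range(1, 201)
}
houses = cluster_sample(blocks, 20, seed=1)
print(len(houses))
```

The 600 houses obtained this way sit in only 20 locations, which is what makes fieldwork cheaper and, at the same time, the estimates more variable than those from an SRS of 600 houses.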
Non-probability Sampling
Social research is often conducted in situations where a researcher cannot select the kinds of
probability samples used in large-scale social surveys. For example, say you wanted to study
homelessness - there is no list of homeless individuals nor are you likely to create such a list.
However, you need to get some kind of a sample of respondents in order to conduct your
research. To gather such a sample, you would likely use some form of non-probability
sampling.
To reiterate, the primary difference between probability methods of sampling and
non-probability methods is that in the latter you do not know the likelihood that any element
of a population will be selected for study.
There are four primary types of non-probability sampling methods:
Availability sampling is a method of choosing subjects who are available or easy to find. This
method is also sometimes referred to as haphazard, accidental, or convenience sampling. The
primary advantage of the method is that it is very easy to carry out, relative to other methods.
A researcher can merely stand out on his/her favorite street corner or in his/her favorite
tavern and hand out surveys. One place this used to show up often is in university courses.
Years ago, researchers often would conduct surveys of students in their large lecture courses.
For example, all students taking introductory sociology courses would have been given a
survey and compelled to fill it out. There are some advantages to this design - it is easy to do,
particularly with a captive audience, and in some schools you can attain a large number of
interviews through this method.
The primary problem with availability sampling is that you can never be certain what
population the participants in the study represent. The population is unknown, the method for
selecting cases is haphazard, and the cases studied probably don't represent any population
you could come up with.
However, there are some situations in which this kind of design has advantages - for
example, survey designers often want to have some people respond to their survey before it is
given out in the "real" research setting as a way of making certain the questions make sense
to respondents. For this purpose, availability sampling is not a bad way to get a group to take
a survey, though in this case researchers care less about the specific responses given than
whether the instrument is confusing or makes people feel bad.
Despite the known flaws with this design, it's remarkably common. Ask a provocative
question, give a telephone number and web site address ("Vote now at CNN.com"), announce
the results of the poll. This method provides some form of statistical data on a current issue,
but it is entirely unknown what population the results of such polls represent. At best, a
researcher could make some conditional statement about people who were watching CNN at a
particular point in time and who cared enough about the issue in question to log on or call in.
Quota sampling is designed to overcome the most obvious flaw of availability sampling.
Rather than taking just anyone, you set quotas to ensure that the sample you get represents
certain characteristics in proportion to their prevalence in the population. Note that for this
method, you have to know something about the characteristics of the population ahead of
time. Say you want to make sure you have a sample proportional to the population in terms of
gender - you have to know what percentage of the population is male and female, then keep
collecting respondents until your sample matches. Marketing studies are particularly fond of
this form of research.
The primary problem with this form of sampling is that even when we know that a quota
sample is representative of the particular characteristics for which quotas have been set, we
have no way of knowing if the sample is representative in terms of any other characteristics. If
we set quotas for gender and age, we are likely to attain a sample with good
representativeness on age and gender, but one that may not be very representative in terms of
income and education or other factors.
Moreover, because researchers can set quotas for only a small fraction of the characteristics
relevant to a study, quota sampling is really not much better than availability sampling. To
reiterate, you must know the characteristics of the entire population to set quotas; otherwise
there's not much point to setting up quotas. Finally, interviewers often introduce bias when
allowed to self-select respondents, which is usually the case in this form of research. In
choosing males 18-25, interviewers are more likely to choose those that are better-dressed,
seem more approachable or less threatening. That may be understandable from a practical
point of view, but it introduces bias into research findings.
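The mechanics of filling quotas can be sketched as follows. Respondents are taken in whatever order they become available, which is exactly where the self-selection bias described above enters; the function name, the quota figures and the respondent data are all hypothetical.

```python
def quota_sample(respondents, quotas, key):
    """Accept respondents in arrival order until every quota is filled.

    respondents -- iterable of available people, in the order encountered
    quotas      -- mapping from group label to the number wanted
    key         -- function returning the group label of a respondent
    """
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        group = key(person)
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met; stop recruiting
    return accepted


passers_by = [
    {"name": "A", "gender": "male"},
    {"name": "B", "gender": "male"},
    {"name": "C", "gender": "male"},
    {"name": "D", "gender": "female"},
    {"name": "E", "gender": "female"},
    {"name": "F", "gender": "female"},
]
chosen = quota_sample(passers_by, {"male": 2, "female": 2}, key=lambda p: p["gender"])
print([p["name"] for p in chosen])
```

Note that nothing in the procedure is random: the sample matches the quotas on gender, but who ends up in it depends entirely on who was encountered first.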
Purposive sampling is a sampling method in which elements are chosen based on the purpose of
the study. Purposive sampling may involve studying the entire population of some limited
group (sociology faculty at Columbia) or a subset of a population (Columbia faculty who
have won Nobel Prizes). As with other non-probability sampling methods, purposive
sampling does not produce a sample that is representative of a larger population, but it can be
exactly what is needed in some cases - study of organization, community, or some other
clearly defined and relatively limited group.
Snowball sampling is a method in which a researcher identifies one member of some
population of interest, speaks to him/her, then asks that person to identify others in the
population that the researcher might speak to. This person is then asked to refer the researcher
to yet another person, and so on.
Snowball sampling is very good for cases where members of a special population are difficult
to locate. For example, several studies of Mexican migrants in Los Angeles have used
snowball sampling to get respondents.
The method also has an interesting application to group membership - if you want to look at
patterns of recruitment to a community organization over time, you might begin by
interviewing fairly recent recruits, asking them who introduced them to the group. Then
interview the people named, asking them who recruited them to the group.
The method creates a sample with questionable representativeness. A researcher is not sure
who is in the sample. In effect, snowball sampling often leads the researcher into a realm
he/she knows little about. It can be difficult to determine how a sample compares to a larger
population. There is also the issue of whom respondents refer you to: friends refer friends,
and respondents are less likely to refer people they dislike or fear.
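The referral process can be pictured as following links in a network of acquaintances. The sketch below treats referrals as a known dictionary, which is a simplifying assumption: in real fieldwork the referrals are only discovered during interviews. The names and the function `snowball_sample` are hypothetical.

```python
from collections import deque


def snowball_sample(referrals: dict[str, list[str]], seed_person: str, max_size: int) -> list[str]:
    """Start from one known member and follow referrals until max_size is reached."""
    sampled = [seed_person]
    seen = {seed_person}
    queue = deque([seed_person])
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen:  # interview each person at most once
                seen.add(contact)
                sampled.append(contact)
                queue.append(contact)
                if len(sampled) >= max_size:
                    break
    return sampled


# Hypothetical referral network: each person names the people they would refer.
referrals = {
    "ana": ["ben", "caro"],
    "ben": ["caro", "dev"],
    "caro": ["eli"],
    "dev": [],
}
print(snowball_sample(referrals, "ana", max_size=4))
```

Anyone who is not reachable through a chain of referrals from the starting person can never enter the sample, which is one concrete form of the representativeness problem noted above.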