Sampling and Sample Design
Sampling
 Sampling is a process used in statistical analysis in which a
predetermined number of observations are taken from a
larger population.
 The methodology used to sample from a larger population
depends on the type of analysis being performed.
 When you collect any sort of data, especially quantitative
data, whether through observation, surveys or secondary
sources, you need to decide which data to collect
and from whom.
 The set of units you select is called the sample.
 There are a variety of ways to select your sample, and to
make sure that it gives you results that will be reliable and
credible.
Jyoti Rastogi (Assistant
Professor)
Principles Behind Choosing a Sample
A sample must be:
 Representative of the population. In other words, it should
contain similar proportions of subgroups as the whole population,
and not exclude any particular groups, either by method of
sampling or by design, or by who chooses to respond.
 Large enough to give you enough information to avoid
errors. It does not need to be a specific proportion of your
population, but it does need to be at least a certain size so that
you know that your answers are likely to be broadly correct.
If your sample is not representative, you can introduce bias into
the study. If it is not large enough, the study will be imprecise.
However, if you get the relationship between sample and
population right, then you can draw strong conclusions
about the nature of the population.
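The point that a sample needs a certain absolute size, rather than a fixed proportion of the population, can be illustrated with a small simulation. This is only a sketch under assumptions: the population is hypothetical (30% of members have some attribute), and the function name `spread_of_estimates` is made up for illustration.

```python
import random
import statistics

def spread_of_estimates(population_size, sample_size, trials=200, seed=0):
    """Spread (standard deviation) of the sample proportion over repeated samples."""
    rng = random.Random(seed)
    # Hypothetical population: 30% of members have the attribute of interest.
    n_with_attribute = int(0.3 * population_size)
    population = [1] * n_with_attribute + [0] * (population_size - n_with_attribute)
    estimates = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # simple random sample
        estimates.append(sum(sample) / sample_size)
    return statistics.pstdev(estimates)

# Compare two population sizes and two sample sizes.
for population_size in (10_000, 100_000):
    for sample_size in (100, 1_000):
        spread = spread_of_estimates(population_size, sample_size)
        print(population_size, sample_size, round(spread, 4))
```

In runs of this sketch the spread shrinks markedly when the sample grows from 100 to 1,000, while a tenfold change in population size makes little difference.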
Sampling Design
 A sample design is the framework, or road map, that
serves as the basis for the selection of a survey sample
and affects many other important aspects of a survey as
well.
 In a broad context, survey researchers are interested in
obtaining some type of information through a survey for
some population, or universe, of interest.
 One must define a sampling frame that represents the
population of interest, from which a sample is to be drawn.
 The sampling frame may be identical to the population, or
it may be only part of it and therefore subject to some
undercoverage, or it may have an indirect relationship to
the population (e.g. the population is preschool children
and the frame is a listing of preschools). ...
Sampling Design Process
Defining the Population
Defining the Sample Unit
Determining the Sample Frame
Selecting a Sampling Technique
Determining the Sample Size
Execution of Sampling Process
Sampling Techniques
Probability or Random
 Simple Random Sampling
 Systematic Sampling
 Stratified Sampling
 Cluster Sampling
 Area Sampling
 Multi Stage Sampling
Non-probability or Non-random
 Judgement Sampling
 Convenience Sampling
 Quota Sampling
 Panel Sampling
 Snowball Sampling
Probability Sampling
 Probability sampling methods allow the
researcher to be precise about the
relationship between the sample and the
population.
 This means that you can quantify how likely
your sample is to be representative, and you
can also put a number on how certain you are
about your findings.
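One common way of "putting a number" on certainty is a confidence interval. The sketch below uses the normal approximation for a proportion estimated from a simple random sample; the survey figures (240 "yes" answers out of 400) and the function name are assumptions made up for illustration, not values from the slides.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 gives about 95%)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical survey: 240 of 400 randomly sampled respondents answer "yes".
low, high = proportion_confidence_interval(successes=240, n=400)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

The approximation assumes a probability sample that is reasonably large; it is not meaningful for non-probability samples.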
Simple Random
 In simple random sampling, every member of the population
has an equal chance of being chosen. The drawback is that, by chance,
a particular sample may not be genuinely representative: small but
important sub-sections of the population may not be included.
Advantages
 Simplicity
 Requires little prior knowledge of the population
Disadvantages
 Lower accuracy
 Higher cost
 Lower efficiency
 Samples may be clustered spatially
 Samples may not be representative of the feature attribute(s)
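A simple random sample can be sketched in a few lines with Python's standard `random` module. The population of 20 labelled members and the helper name `simple_random_sample` are assumptions for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct members; every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)  # sampling without replacement

# Hypothetical population of 20 labelled members.
members = [f"member_{i:02d}" for i in range(1, 21)]
sample = simple_random_sample(members, k=5, seed=42)
print(sample)
```

Fixing the seed only makes the draw repeatable; leaving it as `None` gives a fresh draw each run.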
Procedure of Simple Random Sampling
Simple random sampling can be carried out by:
 Lottery Method
 Random Number Tables
Lottery Method
 The lottery method is the most primitive and
mechanical example of random sampling.
 In this method you number each member of the
population consecutively, writing each number on a
separate piece of paper. The pieces of paper are
folded and mixed in a box. Lastly, the sample is
drawn by picking folded pieces of paper from the box
at random.
 The lottery method suffers from a few drawbacks. Writing
N slips is cumbersome, and shuffling a large number of
slips is difficult when the population is very large.
Human bias may also enter while choosing the
slips. Random number tables can be used as an
alternative.
Random Number Tables Method
 These consist of columns of numbers which have been randomly
prepared. A number of random number tables are available, e.g. the Fisher and
Yates tables, Tippett’s random numbers, etc. Listed below is a sequence
of two-digit random numbers from the Fisher & Yates table:
 61, 44, 65, 22, 01, 67, 76, 23, 57, 58, 54, 11, 33, 86, 07, 26, 75, 76,
64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25, 34, 67, 76, 73,
27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64, 81, 79 and 38.
 The first step involves assigning a unique number to each member of
the population, e.g. if the population comprises 20 people then all
individuals are numbered from 01 to 20. If we are to collect a sample
of 5 units, we read through the table and keep the first 5 two-digit
numbers that fall between 01 and 20. E.g. using the above table the units having the
following five numbers will form the sample: 01, 11, 07, 19 and 16. If
the sampling is without replacement and a particular random number
repeats itself, then it is not taken again and the next number that
fits our criteria is chosen.
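The table-reading procedure above can be sketched as follows, using the sequence quoted on the slide. The function name `sample_from_table` is hypothetical; the rule it applies (skip numbers outside 01 to 20, skip repeats) is the one described in the text.

```python
# Two-digit random numbers as quoted above from the Fisher & Yates table.
TABLE = [
    61, 44, 65, 22, 1, 67, 76, 23, 57, 58, 54, 11, 33, 86, 7, 26, 75, 76,
    64, 22, 19, 35, 74, 49, 86, 58, 69, 52, 27, 34, 91, 25, 34, 67, 76, 73,
    27, 16, 53, 18, 19, 69, 32, 52, 38, 72, 38, 64, 81, 79, 38,
]

def sample_from_table(table, population_size, k):
    """Read the table in order, keeping the first k distinct in-range numbers."""
    chosen = []
    for number in table:
        # Skip numbers that match no unit, and repeats (without replacement).
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == k:
            break
    return chosen

print(sample_from_table(TABLE, population_size=20, k=5))  # [1, 11, 7, 19, 16]
```

The result matches the units 01, 11, 07, 19 and 16 given in the example.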
Systematic random
 Systematic random sampling relies on having a list of the
population, which should ideally be randomly ordered. The
researcher then takes every nth name from the list.
Advantages
 There is no need to assign a unique number to each element.
 It is statistically more efficient if the population elements have
similar characteristics.
Disadvantages
 If there is “periodicity” in the population that coincides with the
sampling interval, the randomness is lost.
 If there is a “monotonic trend” in the population, i.e. the sampling
frame has been arranged in some order such as chronological
order or from smallest to largest, the result can depend on where
in the list the selection starts.
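A minimal sketch of taking every nth element, assuming a random starting point within the first interval (a common convention that the slide does not state) and a hypothetical frame of 100 numbered units:

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every nth element of the frame, starting at a random offset."""
    rng = random.Random(seed)
    interval = len(frame) // k       # the sampling interval n
    start = rng.randrange(interval)  # assumed: random start in the first interval
    return [frame[start + i * interval] for i in range(k)]

# Hypothetical frame: units numbered 1 to 100, sample of 10.
frame = list(range(1, 101))
picked = systematic_sample(frame, k=10, seed=1)
print(picked)
```

If the list has a cycle whose length matches the interval, every selected unit falls at the same point of the cycle, which is the periodicity problem noted above.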
Stratified Random
 An alternative method is stratified random
sampling. This method divides the population into smaller
homogeneous groups, called strata, and then takes a
random sample from each stratum.
 Ex- The students of a school can be divided into strata on
the basis of gender, courses offered, age and
so forth. The population is first partitioned into
strata, and then a simple random sample is taken
from every stratum.
Types of Stratified Sampling
 Proportionate Stratified Sampling
 Disproportionate Stratified Sampling
Proportionate Stratified Sampling
 In this method the number of units selected from each stratum is
proportionate to the stratum’s share of the population.
 Ex- In a college there are 2500 students in total, of whom
1500 are enrolled in undergraduate courses and 1000
are enrolled in postgraduate courses. If a sample of 100 is
to be chosen using proportionate stratified sampling, then
the sample would contain 60 undergraduate students
and 40 postgraduate students. Thus the two
strata are represented in the sample in the same proportion
as in the population.
 This method is most suitable when the purpose of sampling
is to estimate the population value of some characteristic
and the within-stratum variances do not differ.
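The allocation in the college example can be reproduced with a short calculation. The function name `proportionate_allocation` is hypothetical, and the rounding step is an assumption: with other figures, rounded allocations may not add up exactly to the intended sample size.

```python
def proportionate_allocation(strata_sizes, sample_size):
    """Allocate the sample to strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

# Figures from the college example above.
college = {"undergraduate": 1500, "postgraduate": 1000}
allocation = proportionate_allocation(college, sample_size=100)
print(allocation)  # {'undergraduate': 60, 'postgraduate': 40}
```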
Disproportionate Stratified Sampling
 In disproportionate stratified random sampling, the different
strata do not have the same sampling fractions as each other.
 For instance, if your four strata contain 200, 400, 600, and
800 people, you may choose to have different sampling
fractions for each stratum.
 Perhaps the first stratum with 200 people has a sampling
fraction of ½, resulting in 100 people selected for the sample,
while the last stratum with 800 people has a sampling fraction
of ¼, resulting in 200 people selected for the sample.
 The precision of using disproportionate stratified random
sampling is highly dependent on the sampling fractions chosen
and used by the researcher.
 Here, the researcher must be very careful and know exactly
what they are doing. Mistakes made in choosing and using
sampling fractions could result in a stratum that is
over-represented or under-represented, resulting in skewed
results.
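The four-strata example can be worked through in code. The fractions for the first and last strata (½ and ¼) come from the text; the fractions for the two middle strata are assumptions, since the slide does not give them. The weight shown (the inverse of the sampling fraction) is a standard way to correct for unequal fractions at the analysis stage and is also not part of the slide.

```python
from fractions import Fraction

strata_sizes = {"stratum_1": 200, "stratum_2": 400, "stratum_3": 600, "stratum_4": 800}
sampling_fractions = {
    "stratum_1": Fraction(1, 2),  # from the text
    "stratum_2": Fraction(1, 4),  # assumed for illustration
    "stratum_3": Fraction(1, 4),  # assumed for illustration
    "stratum_4": Fraction(1, 4),  # from the text
}

design = {}
for name, size in strata_sizes.items():
    fraction = sampling_fractions[name]
    design[name] = {
        "sample_size": int(size * fraction),
        # Each sampled person stands for 1/fraction people in the stratum.
        "weight": 1 / fraction,
    }

for name, row in design.items():
    print(name, row["sample_size"], row["weight"])

# With these weights the sample represents the whole population again.
represented = sum(row["sample_size"] * row["weight"] for row in design.values())
print("population represented:", represented)
```

Without such weights, the first stratum would count for twice its true share, which is the kind of over-representation warned about above.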
Advantages of Stratified Sampling
 Stratified random sampling is superior to simple random
sampling because the process of stratifying reduces sampling
error and ensures a greater level of representation.
 With stratified random sampling, adequate
representation of all subgroups can be ensured.
 When there is homogeneity within strata and heterogeneity
between strata, the estimates can be at least as precise as (or even more
precise than) those from simple random sampling.
Disadvantages of Stratified Sampling
 The application of stratified random sampling requires the
knowledge of strata membership a priori. The requirement to be
able to easily distinguish between strata in the sample frame may
create difficulties in practice.
 Research process may take longer and prove to be more
expensive due to the extra stage in the sampling procedure.
 The choice of stratified sampling method adds certain complexity
to the analysis plan.
Cluster sampling
 Cluster sampling is used in statistics when natural
groups are present in a population.
 It is designed to address the problems of a geographically
widespread population. Random sampling from a large
population is likely to lead to high costs of access. This can
be overcome by dividing the population into clusters,
selecting only two or three clusters, and sampling from
within those. For example, if you wished to find out about
the use of transport in urban areas in the UK, you could
randomly select just two or three cities, and then sample
fully from within these.
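The city example can be sketched as follows: a few clusters are chosen at random and everyone inside them is surveyed. The city names, resident lists and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: six cities with 100 residents each.
cities = {
    f"city_{c}": [f"resident_{c}_{i}" for i in range(100)]
    for c in "abcdef"
}
surveyed = cluster_sample(cities, n_clusters=2, seed=7)
for city, residents in surveyed.items():
    print(city, len(residents))
```

Only the clusters are selected at random here; no sampling happens inside a chosen city.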
Difference Between Cluster Sampling and Stratified Sampling
 For a stratified random sample, a population is divided into
strata, or sub-populations, before sampling.
 At first glance, the two techniques seem very similar.
However, in cluster sampling the cluster itself is
the sampling unit; in stratified sampling, analysis is done
on elements within each stratum.
 In cluster sampling, a researcher studies only the selected
clusters; with stratified sampling, a random sample is
drawn from every stratum.
Area Sampling
 Area sampling involves sampling from a map, an aerial
photograph, or a similar area frame. It is often the sampling
method of choice when a list-based sampling frame isn’t available.
 For example, a city map can be divided into equal-size blocks,
from which random samples can be drawn. Area
sampling is most often associated with maps.
Clusters and Subsampling
 The samples drawn from an area frame are often referred to
as clusters. These clusters may be subsampled several more
times.
 For example, let’s say you wanted to sample from a population of
middle school students. The first sample might be drawn from a
list of school districts, the second sample from a list of schools,
the third a list of classes and then finally a list of students within
those classes. The “frame” in this example is the four successive
layers.
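The idea of dividing a city map into equal-size blocks and drawing blocks at random can be sketched like this. The grid dimensions and the function name are assumptions for illustration; a real area frame would come from a map or aerial photograph.

```python
import random

def sample_area_blocks(n_rows, n_cols, k, seed=None):
    """Treat the map as a grid of equal-size blocks and draw k blocks at random."""
    rng = random.Random(seed)
    blocks = [(row, col) for row in range(n_rows) for col in range(n_cols)]
    return rng.sample(blocks, k)

# Hypothetical city map divided into an 8 x 10 grid; 12 blocks are selected.
selected_blocks = sample_area_blocks(n_rows=8, n_cols=10, k=12, seed=3)
print(selected_blocks)
```

Each selected block would then be visited, or subsampled further as described above.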
Area Sampling
Advantages
 Area frames can be used for multiple variables at the same time. For
example, an area sample on a city can collect data on land use,
population and income statistics.
 There’s no overlap between sampling units, and every unit has an equal
chance of being selected. This complete coverage results in unbiased
estimates.
Disadvantages
 Although the area frames can be used in subsequent surveys, they can
quickly become outdated (for example, if a city undergoes tremendous
growth).
 Area frames can be costly to build.
 Outliers can be a problem, especially if your map has a few particularly
dense or sparse areas (for example, a city that has a national park in its
boundaries might have zero population in some areas and a huge
population in others).
Multistage Sampling
 Multi-stage sampling (also known as multi-stage cluster
sampling) is a more complex form of cluster sampling which
contains two or more stages in sample selection.
 A combination of stratified sampling or cluster
sampling and simple random sampling is usually used.
Advantages of Multi-Stage Sampling
 Effective in primary data collection from geographically
dispersed populations when face-to-face contact is required
(e.g. semi-structured in-depth interviews).
 Cost-effectiveness and time-effectiveness.
 High level of flexibility.
Disadvantages of Multi-Stage Sampling
 High level of subjectivity.
 Research findings can never be 100% representative of
population.
 The presence of group-level information is required.
Real Life Examples of Multistage Sampling
 The Census Bureau uses multistage sampling for the U.S.
National Center for Health Statistics’ National Health Interview
Survey (NHIS), which draws a multistage probability sample of 42,000
households in 376 probability sampling units.
 The Gallup poll uses multistage sampling. For example, they
might randomly choose a certain number of area codes then
randomly sample a number of phone numbers from within each
area code.
 Johnston et al.’s survey on drug use in high schools used three-stage
sampling: geographic areas, followed by high schools within
those areas, followed by senior students in those schools.
 The Australian Bureau of Statistics divides cities into
“collection districts”, then blocks, then households. Each stage
uses random sampling, creating a need to list specific households
only after the final stage of sampling.
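The staged selection in these examples (areas, then schools, then students) can be sketched with nested random draws. The frame below is entirely hypothetical, as are the function name and the numbers drawn at each stage.

```python
import random

def multistage_sample(frame, n_districts, n_schools, n_students, seed=None):
    """Three-stage sample: districts, then schools within them, then students."""
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(frame), n_districts):           # stage 1
        schools = frame[district]
        for school in rng.sample(sorted(schools), n_schools):         # stage 2
            selected.extend(rng.sample(schools[school], n_students))  # stage 3
    return selected

# Hypothetical frame: 5 districts, 4 schools per district, 30 students per school.
frame = {
    f"district_{d}": {
        f"school_{d}_{s}": [f"student_{d}_{s}_{i}" for i in range(30)]
        for s in range(4)
    }
    for d in range(5)
}
students = multistage_sample(frame, n_districts=2, n_schools=2, n_students=5, seed=11)
print(len(students), students[:3])
```

Student lists are only needed for the schools chosen at the second stage, which is where the cost saving of multistage designs comes from.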
Non-Probability Sampling
To be continued...
