Data sampling.pptx

• Data sampling is a statistical analysis technique used to select,
manipulate and analyze a representative subset of data points to
identify patterns and trends in the larger data set being examined. It
enables data scientists, predictive modelers and other data analysts to
work with a small, manageable amount of data about a
statistical population to build and run analytical models more quickly,
while still producing accurate findings.
• Sampling can be particularly useful with data sets that are too large to
efficiently analyze in full, for example in big data
analytics applications or surveys. Identifying and analyzing a
representative sample is more efficient and cost-effective than
surveying the entirety of the data or population.

Populations and Samples
• Population: A population is the group of elements that share
common characteristics. It is the collection of observations
about which we would like to make inferences.
• Sample: A sample is a subset of the population.
• Sampling: Sampling is the process of selecting a sample from the
population. The population is divided into sampling units, which are
non-overlapping collections of elements from the population.

• An important consideration, though, is the size of the required
data sample and the possibility of introducing a sampling error. In
some cases, a small sample can reveal the most important
information about a data set. In others, using a larger sample can
increase the likelihood of accurately representing the data as a
whole, even though a larger sample is harder to manipulate and
interpret.

Sampling Error
• Sampling error is the difference between a value estimated from a
sample (such as the sample mean) and the true value for the population.
• The core assumption of data sampling is that a sample drawn from the
population is representative of it, so that the sample mean is a good
estimate of the population mean.
• The degree to which the sample mean differs from the population mean
is the sampling error.
• We can reduce sampling error by following sampling best
practices, like having a large enough sample size, choosing the
right kind of sampling to do, and avoiding sampling bias.
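As a rough illustration of sampling error, and of how sample size affects it, the following Python sketch draws many simple random samples from a synthetic population and measures how far each sample mean lands from the population mean. The normally distributed population, the function names and the number of trials are assumptions made for illustration only.

```python
import random
import statistics


def sampling_error(sample, population_mean):
    """Sampling error of one sample: sample mean minus population mean."""
    return statistics.fmean(sample) - population_mean


def mean_abs_sampling_error(population, n, trials=500, seed=0):
    """Average absolute sampling error over many simple random samples of size n."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)  # true population mean, computed once
    errors = [abs(sampling_error(rng.sample(population, n), mu)) for _ in range(trials)]
    return statistics.fmean(errors)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical population: 10,000 values from a normal distribution (mean 50, sd 10)
    population = [rng.gauss(50, 10) for _ in range(10_000)]
    for n in (10, 100, 1000):
        print(f"n={n:5d}  mean |error| = {mean_abs_sampling_error(population, n):.3f}")
```

Running it should show the average error shrinking as n grows, which is the sense in which a "large enough sample size" reduces sampling error.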

Data Sampling Methods
When taking a sample from a larger population, you must
make sure that the sample is of an appropriate size and
free of bias.
There are two types of sampling:
• Probability sampling
• Non-probability sampling

Probability Sampling:
Every element in the population has a known, nonzero chance of
being selected (in simple random sampling, an equal chance). A
sampling method is biased if some members of the population are
systematically more or less likely to be in the sample than intended.
Different types of probability sampling
• Simple random sampling
• Stratified sampling
• Systematic sampling
• Cluster sampling

Simple random sampling:
• It is a method of sampling in which every element of the
universe (population) has an equal probability of being chosen. For
example, individuals can be chosen by drawing lots. The advantage of
this method is that it is free from personal bias, and the population
is fairly represented by the sample.
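A minimal sketch of simple random sampling in Python, assuming the population fits in a list. It relies on the standard library's `random.sample`, which draws distinct elements without replacement so that every element is equally likely to be chosen; the function name and the example population are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements without replacement; every element is
    equally likely to be chosen (like drawing names by lottery)."""
    rng = random.Random(seed)
    return rng.sample(population, n)


if __name__ == "__main__":
    population = list(range(1, 101))  # hypothetical population of 100 member IDs
    print(simple_random_sample(population, 10, seed=1))
```

Passing a seed makes the draw reproducible, which is useful when an analysis has to be rerun on the same sample.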

Stratified sampling:
• The population is broken down into non-overlapping subgroups called
strata. Elements within each stratum are homogeneous with respect to the
characteristic of interest, while the strata differ from one another.
Random samples are then taken from each stratum, so that the entire
population is represented. The advantage of this method is that it covers
every subgroup of the population. However, bias can be introduced when
the population is classified into strata.
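The following Python sketch shows one common way to stratify: proportional allocation, where the same fraction is sampled at random from every stratum. The `stratum_of` function, the rule of keeping at least one element per stratum, and the example data are assumptions for illustration, not the only way to allocate a stratified sample.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Proportional stratified sampling.

    stratum_of(item) returns the stratum label of an item; each item belongs
    to exactly one stratum, so the strata do not overlap. A simple random
    sample of roughly `fraction` of each stratum is drawn, with at least one
    item per stratum so that every subgroup is represented.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # this stratum's share
        sample.extend(rng.sample(members, k))
    return sample


if __name__ == "__main__":
    # Hypothetical population: 100 people, 60 in "north" and 40 in "south"
    people = [(i, "north" if i < 60 else "south") for i in range(100)]
    sample = stratified_sample(people, lambda p: p[1], 0.1, seed=3)
    print(sorted(sample))
```

With a 10% fraction the sample contains 6 people from "north" and 4 from "south", mirroring the 60/40 split in the population.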

Systematic sampling:
• Samples are selected from the population according to a
predetermined rule: every nth element of the population is selected
as a sample. Arrange all the elements in a sequence, then select the
samples from the population at regular intervals.
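A minimal Python sketch of systematic sampling, assuming the population is already arranged in a sequence. Here `n` is the desired sample size and `k` is the sampling interval (the "every nth element" rule above). Choosing the starting point at random within the first interval is a common convention that is assumed here; the function name is hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Systematic sampling: choose a random start within the first interval,
    then take every k-th element of the ordered population, where
    k = len(population) // n is the sampling interval."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n                  # sampling interval
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    return [population[start + i * k] for i in range(n)]


if __name__ == "__main__":
    population = list(range(1000))  # hypothetical ordered list of 1,000 records
    print(systematic_sample(population, 10, seed=5))
```

One caveat: if the ordering has a periodic pattern that lines up with the interval `k`, every selected element may share that pattern, so the order of the list matters.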

Cluster sampling:
• The population is broken down into many different clusters
(subgroups), and then some of these clusters are randomly selected to
form the sample. For example, clusters may be defined by location,
age, sex, etc.
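The Python sketch below illustrates one-stage cluster sampling, in which whole clusters are chosen at random and every element of a chosen cluster is kept. This is an assumed variant; two-stage designs instead subsample within each chosen cluster. The `cluster_of` function and the school-based example data are hypothetical.

```python
import random
from collections import defaultdict


def cluster_sample(population, cluster_of, num_clusters, seed=None):
    """One-stage cluster sampling: group items into clusters, choose
    num_clusters clusters at random, and keep every item in the chosen
    clusters. Returns (chosen cluster labels, sampled items)."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for item in population:
        clusters[cluster_of(item)].append(item)
    # random.sample needs a sequence, so sort the labels into a list first
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [item for label in chosen for item in clusters[label]]
    return chosen, sample


if __name__ == "__main__":
    # Hypothetical population: 50 students spread over 5 schools (the clusters)
    students = [(i, f"school{i % 5}") for i in range(50)]
    chosen, sample = cluster_sample(students, lambda s: s[1], 2, seed=7)
    print(chosen, len(sample))
```

Unlike stratified sampling, which samples from every subgroup, this samples only from the selected subgroups. That is cheaper to carry out, but the result can be less representative if the clusters differ from one another.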

Different types of non-probability sampling
• Purposive sampling
• Convenience sampling
• Quota sampling
• Snowball/referral sampling

Purposive sampling:
• Purposive sampling is also known as judgment sampling.
Samples are selected based on the purpose or intention of the
research. The method is flexible enough to allow the inclusion of
items that are of special significance.
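As a small Python sketch of purposive sampling, the researcher's judgment can be modeled as a criterion function that decides which items are significant enough to include. The `is_significant` criterion, the optional `limit`, and the patient records are hypothetical; in practice the judgment is usually made by hand rather than encoded as a rule.

```python
def purposive_sample(population, is_significant, limit=None):
    """Judgment (purposive) sampling: keep the items that the researcher's
    criterion marks as significant for the study, optionally up to `limit`.
    No randomness is involved; the criterion alone decides inclusion."""
    selected = [item for item in population if is_significant(item)]
    return selected if limit is None else selected[:limit]


if __name__ == "__main__":
    # Hypothetical patient records: (patient_id, age, has_rare_condition)
    patients = [(1, 34, False), (2, 71, True), (3, 58, True), (4, 80, False)]
    # Hypothetical criterion: the study is about the rare condition
    print(purposive_sample(patients, lambda p: p[2]))  # [(2, 71, True), (3, 58, True)]
```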

Convenience sampling:
• Convenience sampling is one of the easiest sampling methods.
Samples are selected based on availability, choosing the samples
that are convenient for the researcher.
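A minimal Python sketch of convenience sampling, assuming the available units arrive as an iterable (for example, people passing a survey desk in order). The function simply takes the first `n` units. That is the whole point: who ends up in the sample depends on availability rather than on chance. The names and data are hypothetical.

```python
from itertools import islice


def convenience_sample(available, n):
    """Convenience sampling: take the first n units that happen to be
    available. There is no random selection step at all."""
    return list(islice(available, n))


if __name__ == "__main__":
    # Hypothetical stream of passers-by in the order they arrive
    passers_by = iter(["Ana", "Ben", "Chen", "Dev", "Eli", "Fay"])
    print(convenience_sample(passers_by, 3))  # ['Ana', 'Ben', 'Chen']
```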

Quota sampling:
• It is similar to stratified sampling, except that samples are
collected non-randomly in each subgroup until the desired quota is
met. The proportion of each subgroup in the sample need not match its
proportion in the population.
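The Python sketch below models quota sampling as a pass over units in the order they become available. A unit is accepted only while its subgroup's quota is unfilled, and collection stops once every quota is met. The `group_of` function, the quota dictionary and the arrival data are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Quota sampling: accept each arriving unit only while its subgroup's
    quota is still unfilled, and stop once every quota is met. Selection
    within a subgroup is by arrival order, not at random."""
    counts = {g: 0 for g in quotas}
    sample = []
    for item in arrivals:
        g = group_of(item)
        if g in quotas and counts[g] < quotas[g]:
            sample.append(item)
            counts[g] += 1
            if counts == quotas:  # all quotas filled
                break
    return sample


if __name__ == "__main__":
    # Hypothetical arrivals: every third person is "F", the rest are "M"
    arrivals = [(i, "F" if i % 3 == 0 else "M") for i in range(30)]
    print(quota_sample(arrivals, lambda p: p[1], {"F": 3, "M": 3}))
    # [(0, 'F'), (1, 'M'), (2, 'M'), (3, 'F'), (4, 'M'), (6, 'F')]
```

Here the quotas (3 and 3) are equal even though "F" makes up only a third of the arrivals, which illustrates that the sample proportions need not follow the population proportions.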

Snowball/referral sampling:
• Snowball or referral sampling is a method commonly used in
medical and social science surveys where the population is unknown
and it is difficult to obtain a sample. Researchers therefore ask
existing participants to refer others who fit the population and
can serve as further samples. Since it is based on referrals, there
is a chance of bias.
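One way to sketch snowball sampling in Python is as a breadth-first walk over a referral network. Researchers start from a few known participants (the seeds) and then recruit the people those participants refer, wave by wave. Representing referrals as a dictionary and recruiting in breadth-first order are both assumptions of this sketch; the names and the network are hypothetical.

```python
from collections import deque


def snowball_sample(seeds, referrals, n):
    """Snowball (referral) sampling: begin with known participants (seeds),
    then add the people they refer, wave by wave, until n participants are
    recruited or no new referrals remain.

    referrals maps each participant to the people they would refer.
    """
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < n:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # do not recruit anyone twice
                seen.add(referred)
                queue.append(referred)
    return sample


if __name__ == "__main__":
    # Hypothetical referral network among members of a hard-to-reach population
    referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
    print(snowball_sample(["A"], referrals, 4))  # ['A', 'B', 'C', 'D']
```

The sketch also shows where the bias comes from. Anyone who is not reachable through referrals from the seeds can never enter the sample.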

Kinds of Sampling Bias
Sampling bias is a bias in which samples are collected in such a
way that some elements of the intended population have a lower or
higher probability of being sampled than others.
The following are different types of sampling bias:
• Response Bias: A response (or data) bias is a systematic bias that
occurs during data collection and influences the responses.

• Voluntary Response Bias: Occurs when individuals choose for
themselves whether to participate, so people with strong opinions
tend to be over-represented.
• Non-Response Bias: Occurs when units selected as part of the
sampling procedure do not respond, in whole or in part.
• Convenience Bias: Occurs when the sample is taken from individuals
who are conveniently available.
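To see how non-response or voluntary response can bias an estimate, the following Python simulation lets each unit respond with a probability that depends on the value being measured. It then compares the respondents' mean with the true population mean. The satisfaction-score population and the response probabilities (dissatisfied customers responding more often) are invented assumptions for illustration.

```python
import random
import statistics


def respondent_mean(population, response_prob, seed=None):
    """Mean of the units that actually respond, where each unit responds
    independently with probability response_prob(value). If the response
    probability depends on the value being measured, this mean is biased."""
    rng = random.Random(seed)
    responders = [x for x in population if rng.random() < response_prob(x)]
    return statistics.fmean(responders)  # raises StatisticsError if nobody responds


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical satisfaction scores (1 to 10) for 10,000 customers
    population = [rng.randint(1, 10) for _ in range(10_000)]
    true_mean = statistics.fmean(population)
    # Assumption: dissatisfied customers (score <= 3) respond far more often
    biased_mean = respondent_mean(population, lambda x: 0.8 if x <= 3 else 0.2, seed=1)
    print(f"population mean {true_mean:.2f}, respondent mean {biased_mean:.2f}")
```

If everyone responds (a response probability of 1.0), the respondent mean equals the population mean. Once the chance of responding depends on the answer, the respondent mean drifts away from it, and collecting more responses of the same kind does not remove that bias.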
