SlideShare a Scribd company logo
1 of 82
Download to read offline
SAMPLING
METHODS
Data science and AI Certification Course
Visit: Learnbay.co
Populations vs. Samples
oWho = Population:
nall individuals of interest
n US Voters, Dentists, College students, Children
oWhat = Parameter
nCharacteristic of population
oProblem: can’t study/survey whole pop
oSolution: Use a sample for the “who”
nsubset, selected from population
ncalculate a statistic for the “what” Visit: Learnbay.co
Types of Sampling
oSimple Random Sampling
oStratified Random Sampling
oCluster Sampling
oSystematic Sampling
oRepresentative Sampling
(Can be stratified random or quota sampling)
oConvenience or Haphazard Sampling
oSampling with Replacement vs. Sampling
without Replacement Visit: Learnbay.co
Types of Statistics
§ Descriptive statistics:
§ Organize and summarize scores from samples
§ Inferential statistics:
§ Infer information about the population based on
what we know from sample data
§ Decide if an experimental manipulation has had
an effect
Visit: Learnbay.co
THE BIG PICTURE OF STATISTICS
Theory
Question to answer / Hypothesis to test
Design Research Study
Collect Data
(measurements, observations)
Organize and make sense of the #s
USING STATISTICS!
Depends on our goal:
Describe characteristics Test hypothesis, Make conclusions,
organize, summarize, condense data interpret data, understand relations
DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS
Visit: Learnbay.co
Types of Variables
oQuantitative
n Measured in amounts
n Ht, Wt, Test score
lDiscrete:
l separate categories
l Letter grade
oQualitative
n Measured in categories
n Gender, race, diagnosis
lContinuous:
l infinite values in between
l GPA
Visit: Learnbay.co
Scales of Measurement
oNominal Scale: Categories, labels, data carry no numerical
value
oOrdinal Scale: Rank ordered data, but no information about
distance between ranks
oInterval Scale: Degree of distance between scores can be
assessed with standard sized intervals
oRatio Scale: Same as interval scale with an absolute zero
point.
Visit: Learnbay.co
SAMPLING
o A sample is “a smaller (but hopefully representative)
collection of units from a population used to determine
truths about that population” (Field, 2005)
o Why sample?
n Resources (time, money) and workload
n Gives results with known accuracy that can be calculated
mathematically
o The sampling frame is the list from which the potential
respondents are drawn
n Registrar’s office
n Class rosters
n Must assess sampling frame errors
Visit: Learnbay.co
SAMPLING……
oWhat is your population of interest?
pTo whom do you want to generalize your
results?
nAll doctors
nSchool children
nIndians
nWomen aged 15-45 years
nOther
oCan you sample the entire population? Visit: Learnbay.co
SAMPLING…….
o3 factors that influence sample representative-
ness
pSampling procedure
pSample size
pParticipation (response)
oWhen might you sample the entire population?
pWhen your population is very small
pWhen you have extensive resources
pWhen you don’t expect a very high response
Visit: Learnbay.co
SAMPLING BREAKDOWN
Visit: Learnbay.co
SAMPLING…….
TARGET POPULATION
STUDY POPULATION
SAMPLE
Visit: Learnbay.co
Types of Samples
o Probability (Random) Samples
o Simple random sample
n Systematic random sample
n Stratified random sample
n Multistage sample
n Multiphase sample
n Cluster sample
o Non-Probability Samples
n Convenience sample
n Purposive sample
n Quota Visit: Learnbay.co
SIMPLE RANDOM SAMPLING
• Applicable when population is small, homogeneous &
readily available
• All subsets of the frame are given an equal
probability. Each element of the frame thus has an
equal probability of selection.
• It provides for greatest number of possible samples.
This is done by assigning a number to each unit in the
sampling frame.
• A table of random number or lottery system is used
to determine which units are to be selected.
Visit: Learnbay.co
SIMPLE RANDOM SAMPLING….
o Estimates are easy to calculate.
o Disadvantages
o If sampling frame large, this method impracticable.
o Minority subgroups of interest in population may not
be present in sample in sufficient numbers for study.
Visit: Learnbay.co
SYSTEMATIC SAMPLING
o Systematic sampling relies on arranging the target
population according to some ordering scheme and then
selecting elements at regular intervals through that
ordered list.
o Systematic sampling involves a random start and then
proceeds with the selection of every kth element from
then onwards. In this case, k=(population size/sample size).
o It is important that the starting point is not automatically
the first in the list, but is instead randomly chosen from
within the first to the kth element in the list.
o A simple example would be to select every 10th name from
the telephone directory (an 'every 10th' sample, also
referred to as 'sampling with a skip of 10').
Visit: Learnbay.co
SYSTEMATIC SAMPLING……
As described above, systematic sampling is a method, because all
elements have the same probability of selection (in the example
given, one in ten). It is not 'simple random sampling' because
different subsets of the same size have different selection
probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten
probability of selection, but the set {4,13,24,34,...} has zero
probability of selection.
Visit: Learnbay.co
SYSTEMATIC SAMPLING……
o ADVANTAGES:
o Sample easy to select
o Suitable sampling frame can be identified easily
o Sample evenly spread over entire reference
population
o DISADVANTAGES:
o Sample may be biased if hidden periodicity in
population coincides with that of selection.
o Difficult to assess precision of estimate from one
survey.
Visit: Learnbay.co
STRATIFIED SAMPLING
Where population embraces a number of distinct categories,
the frame can be organized into separate "strata." Each
stratum is then sampled as an independent sub-population,
out of which individual elements can be randomly selected.
— Every unit in a stratum has same chance of being selected.
— Using same sampling fraction for all strata ensures
proportionate representation in the sample.
— Adequate representation of minority subgroups of interest
can be ensured by stratification & varying sampling fraction
between strata as required.
Visit: Learnbay.co
STRATIFIED SAMPLING……
o Finally, since each stratum is treated as an independent population,
different sampling approaches can be applied to different strata.
o Drawbacks to using stratified sampling.
o First, sampling frame of entire population has to be prepared
separately for each stratum
o Second, when examining multiple criteria, stratifying variables may
be related to some, but not to others, further complicating the
design, and potentially reducing the utility of the strata.
o Finally, in some cases (such as designs with a large number of
strata, or those with a specified minimum sample size per group),
stratified sampling can potentially require a larger sample than
would other methods
Visit: Learnbay.co
STRATIFIED SAMPLING…….
Draw a sample from each stratum
Visit: Learnbay.co
CLUSTER SAMPLING
oCluster sampling is an example of 'two-stage
sampling' .
o First stage a sample of areas is chosen;
o Second stage a sample of respondents within
those areas is selected.
o Population divided into clusters of homogeneous
units, usually based on geographical contiguity.
oSampling units are groups rather than individuals.
oA sample of such clusters is then selected.
oAll units from the selected clusters are studied.
Visit: Learnbay.co
CLUSTER SAMPLING…….
oAdvantages :
oCuts down on the cost of preparing a sampling
frame.
oThis can reduce travel and other administrative
costs.
oDisadvantages: sampling error is higher for a
simple random sample of same size.
oOften used to evaluate vaccination coverage
Visit: Learnbay.co
CLUSTER SAMPLING…….
• Identification of clusters
– List all cities, towns, villages & wards of cities with their
population falling in target area under study.
– Calculate cumulative population & divide by 30, this gives
sampling interval.
– Select a random no. less than or equal to sampling
interval having same no. of digits. This forms 1st cluster.
– Random no.+ sampling interval = population of 2nd cluster.
– Second cluster + sampling interval = 4th cluster.
– Last or 30th cluster = 29th cluster + sampling interval
Visit: Learnbay.co
CLUSTER SAMPLING…….
Two types of cluster sampling methods.
One-stage sampling. All of the elements within
selected clusters are included in the sample.
Two-stage sampling. A subset of elements within
selected clusters are randomly selected for
inclusion in the sample.
Visit: Learnbay.co
Difference Between Strata and
Clusters
— Although strata and clusters are both non-overlapping
subsets of the population, they differ in several ways.
— All strata are represented in the sample; but only a
subset of clusters are in the sample.
— With stratified sampling, the best survey results
occur when elements within strata are internally
homogeneous. However, with cluster sampling, the
best results occur when elements within clusters are
internally heterogeneous
Visit: Learnbay.co
CONVENIENCE SAMPLING
o Sometimes known as grab or opportunity sampling or
accidental or haphazard sampling.
o A type of nonprobability sampling which involves the sample
being drawn from that part of the population which is close to
hand. That is, readily available and convenient.
o The researcher using such a sample cannot scientifically make
generalizations about the total population from this sample
because it would not be representative enough.
o For example, if the interviewer was to conduct a survey at a
shopping center early in the morning on a given day, the people
that he/she could interview would be limited to those given
there at that given time, which would not represent the views
of other members of society in such an area, if the survey was
to be conducted at different times of day and several times
per week.
o This type of sampling is most useful for pilot testing.
o In social science research, snowball sampling is a similar
technique, where existing study subjects are used to recruit
more subjects into the sample.
Visit: Learnbay.co
CONVENIENCE SAMPLING…….
— Use results that are easy to get
28
Visit: Learnbay.co
Statistical Inference
The process of making guesses about the
truth from a sample.
Sample
(observation)
Make guesses about
the whole
population
Truth (not
observable)
N
x
N
i
i
2
12
)( µ
s
-
=
å=
N
x
N
i
å=
= 1
µ
Population
parameters
1
)(
ˆ
2
122
-
-
==
å=
n
Xx
s
n
n
i
i
s
n
x
X
n
i
n
å=
== 1
ˆµ
Sample
statistics
*hat notation ^ is often used to indicate
“estimate”
Visit:
Learnbay.co
Interval Estimation
! Population Mean: s Known
! Population Mean: s Unknown
! Determining the Sample Size
! Population Proportion
Visit: Learnbay.co
A point estimator cannot be expected to provide the
exact value of the population parameter.
An interval estimate can be computed by adding and
subtracting a margin of error to the point estimate.
Point Estimate +/- Margin of Error
The purpose of an interval estimate is to provide
information about how close the point estimate is to
the value of the parameter.
Margin of Error and the Interval Estimate
Visit: Learnbay.co
The general form of an interval estimate of a
population mean is
Margin of Error and the Interval Estimate
Margin of Errorx ±
Visit: Learnbay.co
33Slide
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied
or duplicated, or posted to a publicly accessible website, in whole or in part.
µ
a/2 a/2
1 - a of all
values
Sampling
distribution
of
[------------------------- -------------------------]
[------------------------- -------------------------]
[------------------------- -------------------------]
interval
does not
include µ interval
includes µ
interval
includes µ
Interval Estimate of a Population Mean:
s Known
x
x
x
z xa s/2z xa s/2
x
x
x Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Known
! In order to develop an interval estimate of a
population mean, the margin of error must be
computed using either:
• the population standard deviation s , or
• the sample standard deviation s
! s is rarely known exactly, but often a good estimate
can be obtained based on historical data or other
information.
! We refer to such cases as the s known case.
Visit: Learnbay.co
There is a 1 - a probability that the value of a
sample mean will provide a margin of error of
or less.
µ
a/2 a/21 - a of all
values
Sampling
distribution
of
Interval Estimate of a Population Mean:
s Known
z xa s/2
x
x
x
z xa s/2
z xa s/2
Visit: Learnbay.co
µ
a/2 a/2
1 - a of all
values
Sampling
distribution
of
[------------------------- -------------------------]
[------------------------- -------------------------]
[------------------------- -------------------------]
interval
does not
include µ interval
includes µ
interval
includes µ
Interval Estimate of a Population Mean:
s Known
x
x
x
z xa s/2z xa s/2
x
x
x
Visit: Learnbay.co
! Interval Estimate of µ
Interval Estimate of a Population Mean:
s Known
where: is the sample mean
1 -a is the confidence coefficient
za/2 is the z value providing an area of
a/2 in the upper tail of the standard
normal probability distribution
s is the population standard deviation
n is the sample size
x z
n
± a
s
/2
x
Visit: Learnbay.co
! Interval Estimate of µ
Interval Estimate of a Population Mean:
s Known
where: is the sample mean
1 -a is the confidence coefficient
za/2 is the z value providing an area of
a/2 in the upper tail of the standard
normal probability distribution
s is the population standard deviation
n is the sample size
x z
n
± a
s
/2
x
Visit: Learnbay.co
Meaning of Confidence
Because 90% of all the intervals constructed using
will contain the population mean,
we say we are 90% confident that the interval
includes the population mean µ.
We say that this interval has been established at the
90% confidence level.
The value .90 is referred to as the confidence
coefficient.
xx s645.1±
xx s645.1±
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Known
! Example: Discount Sounds
Discount Sounds has 260 retail outlets throughout
the United States. The firm is evaluating a potential
location for a new outlet, based in part, on the mean
annual income of the individuals in the marketing
area of the new location.
A sample of size n = 36 was taken; the sample
mean income is $41,100. The population is not
believed to be highly skewed. The population
standard deviation is estimated to be $4,500, and the
confidence coefficient to be used in the interval
estimate is .95.
Visit: Learnbay.co
95% of the sample means that can be observed
are within + 1.96 of the population mean µ.
The margin of error is:
Thus, at 95% confidence,
the margin of error is $1,470.
Interval Estimate of a Population Mean:
s Known
! Example: Discount Sounds
sx
a
s æ ö
= =ç ÷
è ø
/2
4,500
1.96 1,470
36
z
n
Visit: Learnbay.co
Interval estimate of µ is:
Interval Estimate of a Population Mean:
s Known
We are 95% confident that the interval contains the
population mean.
$41,100 + $1,470
or
$39,630 to $42,570
! Example: Discount Sounds
Visit: Learnbay.co
Interval estimate of µ is:
Interval Estimate of a Population Mean:
s Known
We are 95% confident that the interval contains the
population mean.
$41,100 + $1,470
or
$39,630 to $42,570
! Example: Discount Sounds
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Known
! Adequate Sample Size
In most applications, a sample size of n = 30 is
adequate.
If the population distribution is highly skewed or
contains outliers, a sample size of 50 or more is
recommended.
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Known
! Adequate Sample Size (continued)
If the population is believed to be at least
approximately normal, a sample size of less than 15
can be used.
If the population is not normally distributed but is
roughly symmetric, a sample size as small as 15
will suffice.
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Unknown
! If an estimate of the population standard deviation s
cannot be developed prior to sampling, we use the
sample standard deviation s to estimate s .
! This is the s unknown case.
! In this case, the interval estimate for µ is based on the
t distribution.
! (We’ll assume for now that the population is
normally distributed.)
Visit: Learnbay.co
William Gosset, writing under the name “Student”,
is the founder of the t distribution.
t Distribution
Gosset was an Oxford graduate in mathematics
and worked for the Guinness Brewery in Dublin.
He developed the t distribution while working on
small-scale materials and temperature experiments.
Visit: Learnbay.co
The t distribution is a family of similar probability
distributions.
t Distribution
A specific t distribution depends on a parameter
known as the degrees of freedom.
Degrees of freedom refer to the number of
independent pieces of information that go into the
computation of s.
Visit: Learnbay.co
t Distribution
A t distribution with more degrees of freedom has
less dispersion.
As the degrees of freedom increases, the difference
between the t distribution and the standard
normal probability distribution becomes smaller
and smaller.
Visit: Learnbay.co
For more than 100 degrees of freedom, the standard
normal z value provides a good approximation to
the t value.
t Distribution
The standard normal z values can be found in the
infinite degrees ( ) row of the t distribution table.¥
Visit: Learnbay.co
t Distribution
Degrees Area in Upper Tail
of Freedom .20 .10 .05 .025 .01 .005
. . . . . . .
50 .849 1.299 1.676 2.009 2.403 2.678
60 .848 1.296 1.671 2.000 2.390 2.660
80 .846 1.292 1.664 1.990 2.374 2.639
100 .845 1.290 1.660 1.984 2.364 2.626
.842 1.282 1.645 1.960 2.326 2.576
Standard normal
z values
¥
Visit: Learnbay.co
! Interval Estimate
where: 1 -a = the confidence coefficient
ta/2 = the t value providing an area of a/2
in the upper tail of a t distribution
with n - 1 degrees of freedom
s = the sample standard deviation
Interval Estimate of a Population Mean:
s Unknown
x t
s
n
± a/2
Visit: Learnbay.co
A reporter for a student newspaper is writing an
article on the cost of off-campus housing. A sample of
16 one-bedroom apartments within a half-mile of
campus resulted in a sample mean of $750 per month
and a sample standard deviation of $55.
Interval Estimate of a Population Mean:
s Unknown
! Example: Apartment Rents
Let us provide a 95% confidence interval estimate
of the mean rent per month for the population of one-
bedroom efficiency apartments within a half-mile of
campus. We will assume this population to be
normally distributed.
Visit: Learnbay.co
At 95% confidence, a = .05, and a/2 = .025.
In the t distribution table we see that t.025 = 2.131.
t.025 is based on n - 1 = 16 - 1 = 15 degrees of freedom.
Interval Estimate of a Population Mean:
s Unknown
Degrees Area in Upper Tail
of Freedom .20 .100 .050 .025 .010 .005
15 .866 1.341 1.753 2.131 2.602 2.947
16 .865 1.337 1.746 2.120 2.583 2.921
17 .863 1.333 1.740 2.110 2.567 2.898
18 .862 1.330 1.734 2.101 2.520 2.878
19 .861 1.328 1.729 2.093 2.539 2.861
. . . . . . .
Visit: Learnbay.co
We are 95% confident that the mean rent per month
for the population of one-bedroom apartments within
a half-mile of campus is between $720.70 and $779.30.
! Interval Estimate
Interval Estimate of a Population Mean:
s Unknown
Margin
of Error
27
x t
s
n
± .025
55
750 2.131 750 29.30
16
± = ±
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Unknown
! Adequate Sample Size
If the population distribution is highly skewed or
contains outliers, a sample size of 50 or more is
recommended.
In most applications, a sample size of n = 30 is
adequate when using the expression to
develop an interval estimate of a population mean.
Visit: Learnbay.co
Interval Estimate of a Population Mean:
s Unknown
! Adequate Sample Size (continued)
If the population is believed to be at least
approximately normal, a sample size of less than 15
can be used.
If the population is not normally distributed but is
roughly symmetric, a sample size as small as 15
will suffice.
Visit: Learnbay.co
Summary of Interval Estimation Procedures
for a Population Mean
Can the
population standard
deviation s be assumed
known ?
Use
Yes No
Use
s Known
Case
s Unknown
Case
Use the sample
standard deviation
s to estimate s
/2
s
x t
n
a±/2x z
n
a
s
±
Visit: Learnbay.co
Let E = the desired margin of error.
E is the amount added to and subtracted from the
point estimate to obtain an interval estimate.
Sample Size for an Interval Estimate
of a Population Mean
If a desired margin of error is selected prior to
sampling, the sample size necessary to satisfy the
margin of error can be determined.
Visit: Learnbay.co
Sample Size for an Interval Estimate
of a Population Mean
! Margin of Error
! Necessary Sample Size
E z
n
= a
s
/2
n
z
E
=
( )/a s2
2 2
2
Visit: Learnbay.co
Sample Size for an Interval Estimate
of a Population Mean
The Necessary Sample Size equation requires a
value for the population standard deviation s .
If s is unknown, a preliminary or planning value
for s can be used in the equation.
1. Use the estimate of the population standard
deviation computed in a previous study.
2. Use a pilot study to select a preliminary study and
use the sample standard deviation from the study.
3. Use judgment or a “best guess” for the value of s .
Visit: Learnbay.co
Recall that Discount Sounds is evaluating a
potential location for a new retail outlet, based in
part, on the mean annual income of the individuals in
the marketing area of the new location.
Sample Size for an Interval Estimate
of a Population Mean
! Example: Discount Sounds
Suppose that Discount Sounds’ management team
wants an estimate of the population mean such that
there is a .95 probability that the sampling error is
$500 or less.
How large a sample size is needed to meet the
required precision?
Visit: Learnbay.co
At 95% confidence, z.025 = 1.96. Recall that s = 4,500.
Sample Size for an Interval Estimate
of a Population Mean
A sample of size 312 is needed to reach a desired
precision of + $500 at 95% confidence.
z
n
a
s
/2 500=
2 2
2
(1.96) (4,500)
311.17 312
(500)
n = = =
Visit: Learnbay.co
The general form of an interval estimate of a
population proportion is
Interval Estimate
of a Population Proportion
Margin of Errorp ±
Visit: Learnbay.co
Interval Estimate
of a Population Proportion
The sampling distribution of plays a key role in
computing the margin of error for this interval
estimate.
The sampling distribution of can be approximated
by a normal distribution whenever np > 5 and
n(1 – p) > 5.
p
p
Visit: Learnbay.co
a/2 a/2
Interval Estimate
of a Population Proportion
! Normal Approximation of Sampling Distribution of
Sampling
distribution
of
p
1 - a of all
values
p
p
(1 )
p
p p
n
s
-
=
p
/2 pza s/2 pza s
p
Visit: Learnbay.co
! Interval Estimate
Interval Estimate
of a Population Proportion
where: 1 -a is the confidence coefficient
za/2 is the z value providing an area of
a/2 in the upper tail of the standard
normal probability distribution
is the sample proportion
p z
p p
n
±
-
a/
( )
2
1
p
Visit: Learnbay.co
Political Science, Inc. (PSI) specializes in voter polls
and surveys designed to keep political office seekers
informed of their position in a race.
Using telephone surveys, PSI interviewers ask
registered voters who they would vote for if the
election were held that day.
Interval Estimate
of a Population Proportion
! Example: Political Science, Inc.
Visit: Learnbay.co
In a current election campaign, PSI has just found
that 220 registered voters, out of 500 contacted, favor
a particular candidate. PSI wants to develop a 95%
confidence interval estimate for the proportion of the
population of registered voters that favor the
candidate.
Interval Estimate
of a Population Proportion
! Example: Political Science, Inc.
Visit: Learnbay.co
where: n = 500, = 220/500 = .44, za/2 = 1.96
Interval Estimate
of a Population Proportion
PSI is 95% confident that the proportion of all voters
that favor the candidate is between .3965 and .4835.
= .44 + .0435
p z
p p
n
±
-
a/
( )
2
1
p
.44(1 .44)
.44 1.96
500
-
±
Visit: Learnbay.co
Solving for the necessary sample size, we get
! Margin of Error
Sample Size for an Interval Estimate
of a Population Proportion
However, will not be known until after we have
selected the sample. We will use the planning value
p* for .
/2
(1 )p p
E z
n
a
-
=
2
/2
2
( ) (1 )z p p
n
E
a -
=
p
p
Visit: Learnbay.co
Sample Size for an Interval Estimate
of a Population Proportion
The planning value p* can be chosen by:
1. Using the sample proportion from a previous
sample of the same or similar units, or
2. Selecting a preliminary sample and using the
sample proportion from this sample.
3. Use judgment or a “best guess” for a p* value.
4. Otherwise, use .50 as the p* value.
! Necessary Sample Size
2 * *
/2
2
( ) (1 )z p p
n
E
a -
=
Visit: Learnbay.co
Suppose that PSI would like a .99 probability that
the sample proportion is within + .03 of the
population proportion.
How large a sample size is needed to meet the
required precision? (A previous sample of similar
units yielded .44 for the sample proportion.)
Sample Size for an Interval Estimate
of a Population Proportion
! Example: Political Science, Inc.
Visit: Learnbay.co
At 99% confidence, z.005 = 2.576. Recall that p* = .44.
A sample of size 1817 is needed to reach a desired
precision of + .03 at 99% confidence.
Sample Size for an Interval Estimate
of a Population Proportion
2 * * 2
/2
2 2
( ) (1 ) (2.576) (.44)(.56)
1817
(.03)
z p p
n
E
a -
= = @
* *
/2
(1 )
.03
p p
z
n
a
-
=
Visit: Learnbay.co
Note: We used .44 as the best estimate of p in the
preceding expression. If no information is available
about p, then .5 is often assumed because it provides
the highest possible sample size. If we had used
p = .5, the recommended n would have been 1843.
Sample Size for an Interval Estimate
of a Population Proportion
Visit: Learnbay.co
Z-score Review
o A sample mean (X-bar) approximates a population mean (μ)
o The standard error provides a measure of
n how well a sample mean approximates the population mean
n determines how much difference between X-bar and μ is reasonable to
expect just by chance
o The z-score is a statistic used to quantify this inference
o obtained difference between data and hypothesis/standard
distance expected by chance
s
µ-
=
X
z
Visit: Learnbay.co
What’s the problem with z?
o Need to know the population mean and variance!!!
Not always available.
Visit: Learnbay.co
What is the t statistic?
o “Cousin” of the z statistic that does not require the
population mean (μ) or variance (σ2)to be known
o Can be used to test hypotheses about a completely
unknown population (when the only information
about the population comes from the sample)
o Required: a sample and a reasonable hypothesis
about the population mean (μ)
o Can be used with one sample or to compare two
samples
Visit: Learnbay.co
What is the t statistic?
o “Cousin” of the z statistic that does not require the
population mean (μ) or variance (σ2)to be known
o Can be used to test hypotheses about a completely
unknown population (when the only information about
the population comes from the sample)
o Required: a sample and a reasonable hypothesis
about the population mean (μ)
o Can be used with one sample or to compare two
samples
The t Distribution
We use t when the population variance is unknown (the
usual case) and sample size is small (N<100, the usual case).
If you use a stat package for testing hypotheses about means,
you will use t.
The t distribution is a short, fat relative of the normal. The shape of t depends on its
df. As N becomes infinitely large, t becomes normal.
Visit: Learnbay.co
From Z to t…
1
)( 2
-
-
=
å
n
XX
s
n
s
sX
=
X
hyp
s
X
t
µ-
=
X
hypX
Z
s
µ-
=
nX
s
s =
N
X 2
)( µ
s
-S
=
Note
lowercase
“s”.
)1(
)( 22
-
S-S
=
nn
XXn
s2
22
)(
N
XXN S-S
=s
Visit: Learnbay.co
Thank You…..!!!!!
Visit: Learnbay.co

More Related Content

What's hot

Stratified sampling
Stratified samplingStratified sampling
Stratified sampling
suncil0071
 

What's hot (20)

Sampling and sample size determination
Sampling and sample size determinationSampling and sample size determination
Sampling and sample size determination
 
Sampling.pptx
Sampling.pptxSampling.pptx
Sampling.pptx
 
Probability & Non-Probability.pptx
Probability & Non-Probability.pptxProbability & Non-Probability.pptx
Probability & Non-Probability.pptx
 
4.sampling design
4.sampling design4.sampling design
4.sampling design
 
Systematic sampling
Systematic samplingSystematic sampling
Systematic sampling
 
Stratified sampling
Stratified samplingStratified sampling
Stratified sampling
 
Sampling techniques
Sampling techniquesSampling techniques
Sampling techniques
 
Ch04 sampling
Ch04 samplingCh04 sampling
Ch04 sampling
 
Sampling
SamplingSampling
Sampling
 
Probability Sampling
Probability SamplingProbability Sampling
Probability Sampling
 
Types of Sampling
Types of SamplingTypes of Sampling
Types of Sampling
 
Sampling bigslides
Sampling bigslidesSampling bigslides
Sampling bigslides
 
Sampling
SamplingSampling
Sampling
 
Data Science interview questions of Statistics
Data Science interview questions of Statistics Data Science interview questions of Statistics
Data Science interview questions of Statistics
 
Systematic Random Sampling
Systematic Random SamplingSystematic Random Sampling
Systematic Random Sampling
 
SAMPLING ; SAMPLING TECHNIQUES – RANDOM SAMPLING (SIMPLE RANDOM SAMPLING)
SAMPLING ; SAMPLING TECHNIQUES – RANDOM SAMPLING  (SIMPLE RANDOM SAMPLING)SAMPLING ; SAMPLING TECHNIQUES – RANDOM SAMPLING  (SIMPLE RANDOM SAMPLING)
SAMPLING ; SAMPLING TECHNIQUES – RANDOM SAMPLING (SIMPLE RANDOM SAMPLING)
 
Evaluating classification algorithms
Evaluating classification algorithmsEvaluating classification algorithms
Evaluating classification algorithms
 
Sampling:Medical Statistics Part III
Sampling:Medical Statistics Part IIISampling:Medical Statistics Part III
Sampling:Medical Statistics Part III
 
Experimental designs
Experimental designsExperimental designs
Experimental designs
 
Jayesh sampling technique semi (1) (1)
Jayesh  sampling technique semi (1) (1)Jayesh  sampling technique semi (1) (1)
Jayesh sampling technique semi (1) (1)
 

Similar to Sampling Methods

Similar to Sampling Methods (20)

Population vs Samples in data science
Population vs Samples in data science Population vs Samples in data science
Population vs Samples in data science
 
43911
4391143911
43911
 
sampling
samplingsampling
sampling
 
sampling-techniques.ppsx
sampling-techniques.ppsxsampling-techniques.ppsx
sampling-techniques.ppsx
 
Sampling.ppt
Sampling.pptSampling.ppt
Sampling.ppt
 
File
FileFile
File
 
sampling methods...
sampling methods...sampling methods...
sampling methods...
 
Sampling
SamplingSampling
Sampling
 
Sampling Design
Sampling DesignSampling Design
Sampling Design
 
Sampling method.ppt
Sampling method.pptSampling method.ppt
Sampling method.ppt
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
Soil Sampling Soil Sampling Soil Sampling Soil Sampling
Soil Sampling Soil Sampling Soil Sampling Soil SamplingSoil Sampling Soil Sampling Soil Sampling Soil Sampling
Soil Sampling Soil Sampling Soil Sampling Soil Sampling
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
sampling%20method.pptx
sampling%20method.pptxsampling%20method.pptx
sampling%20method.pptx
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 
Phyton class by Pavan - Study notes incl
Phyton class by Pavan - Study notes inclPhyton class by Pavan - Study notes incl
Phyton class by Pavan - Study notes incl
 
43911.ppt
43911.ppt43911.ppt
43911.ppt
 

More from Learnbay Datascience

More from Learnbay Datascience (20)

Top data science projects
Top data science projectsTop data science projects
Top data science projects
 
Python my SQL - create table
Python my SQL - create tablePython my SQL - create table
Python my SQL - create table
 
Python my SQL - create database
Python my SQL - create databasePython my SQL - create database
Python my SQL - create database
 
Python my sql database connection
Python my sql   database connectionPython my sql   database connection
Python my sql database connection
 
Python - mySOL
Python - mySOLPython - mySOL
Python - mySOL
 
AI - Issues and Terminology
AI - Issues and TerminologyAI - Issues and Terminology
AI - Issues and Terminology
 
AI - Fuzzy Logic Systems
AI - Fuzzy Logic SystemsAI - Fuzzy Logic Systems
AI - Fuzzy Logic Systems
 
AI - working of an ns
AI - working of an nsAI - working of an ns
AI - working of an ns
 
Artificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksArtificial Intelligence- Neural Networks
Artificial Intelligence- Neural Networks
 
AI - Robotics
AI - RoboticsAI - Robotics
AI - Robotics
 
Applications of expert system
Applications of expert systemApplications of expert system
Applications of expert system
 
Components of expert systems
Components of expert systemsComponents of expert systems
Components of expert systems
 
Artificial intelligence - expert systems
 Artificial intelligence - expert systems Artificial intelligence - expert systems
Artificial intelligence - expert systems
 
AI - natural language processing
AI - natural language processingAI - natural language processing
AI - natural language processing
 
Ai popular search algorithms
Ai   popular search algorithmsAi   popular search algorithms
Ai popular search algorithms
 
AI - Agents & Environments
AI - Agents & EnvironmentsAI - Agents & Environments
AI - Agents & Environments
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areas
 
Artificial intelligence composed
Artificial intelligence composedArtificial intelligence composed
Artificial intelligence composed
 
Artificial intelligence intelligent systems
Artificial intelligence   intelligent systemsArtificial intelligence   intelligent systems
Artificial intelligence intelligent systems
 
Applications of ai
Applications of aiApplications of ai
Applications of ai
 

Recently uploaded

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Recently uploaded (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

Sampling Methods

  • 1. SAMPLING METHODS Data science and AI Certification Course Visit: Learnbay.co
  • 2. Populations vs. Samples oWho = Population: nall individuals of interest n US Voters, Dentists, College students, Children oWhat = Parameter nCharacteristic of population oProblem: can’t study/survey whole pop oSolution: Use a sample for the “who” nsubset, selected from population ncalculate a statistic for the “what” Visit: Learnbay.co
  • 3. Types of Sampling oSimple Random Sampling oStratified Random Sampling oCluster Sampling oSystematic Sampling oRepresentative Sampling (Can be stratified random or quota sampling) oConvenience or Haphazard Sampling oSampling with Replacement vs. Sampling without Replacement Visit: Learnbay.co
  • 4. Types of Statistics § Descriptive statistics: § Organize and summarize scores from samples § Inferential statistics: § Infer information about the population based on what we know from sample data § Decide if an experimental manipulation has had an effect Visit: Learnbay.co
  • 5. THE BIG PICTURE OF STATISTICS Theory Question to answer / Hypothesis to test Design Research Study Collect Data (measurements, observations) Organize and make sense of the #s USING STATISTICS! Depends on our goal: Describe characteristics Test hypothesis, Make conclusions, organize, summarize, condense data interpret data, understand relations DESCRIPTIVE STATISTICS INFERENTIAL STATISTICS Visit: Learnbay.co
  • 6. Types of Variables oQuantitative n Measured in amounts n Ht, Wt, Test score lDiscrete: l separate categories l Letter grade oQualitative n Measured in categories n Gender, race, diagnosis lContinuous: l infinite values in between l GPA Visit: Learnbay.co
  • 7. Scales of Measurement oNominal Scale: Categories, labels, data carry no numerical value oOrdinal Scale: Rank ordered data, but no information about distance between ranks oInterval Scale: Degree of distance between scores can be assessed with standard sized intervals oRatio Scale: Same as interval scale with an absolute zero point. Visit: Learnbay.co
  • 8. SAMPLING o A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005) o Why sample? n Resources (time, money) and workload n Gives results with known accuracy that can be calculated mathematically o The sampling frame is the list from which the potential respondents are drawn n Registrar’s office n Class rosters n Must assess sampling frame errors Visit: Learnbay.co
  • 9. SAMPLING…… oWhat is your population of interest? pTo whom do you want to generalize your results? nAll doctors nSchool children nIndians nWomen aged 15-45 years nOther oCan you sample the entire population? Visit: Learnbay.co
  • 10. SAMPLING……. o3 factors that influence sample representative- ness pSampling procedure pSample size pParticipation (response) oWhen might you sample the entire population? pWhen your population is very small pWhen you have extensive resources pWhen you don’t expect a very high response Visit: Learnbay.co
  • 13. Types of Samples o Probability (Random) Samples o Simple random sample n Systematic random sample n Stratified random sample n Multistage sample n Multiphase sample n Cluster sample o Non-Probability Samples n Convenience sample n Purposive sample n Quota Visit: Learnbay.co
  • 14. SIMPLE RANDOM SAMPLING • Applicable when population is small, homogeneous & readily available • All subsets of the frame are given an equal probability. Each element of the frame thus has an equal probability of selection. • It provides for greatest number of possible samples. This is done by assigning a number to each unit in the sampling frame. • A table of random number or lottery system is used to determine which units are to be selected. Visit: Learnbay.co
  • 15. SIMPLE RANDOM SAMPLING…. o Estimates are easy to calculate. o Disadvantages o If sampling frame large, this method impracticable. o Minority subgroups of interest in population may not be present in sample in sufficient numbers for study. Visit: Learnbay.co
  • 16. SYSTEMATIC SAMPLING o Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. o Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k=(population size/sample size). o It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list. o A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10'). Visit: Learnbay.co
  • 17. SYSTEMATIC SAMPLING…… As described above, systematic sampling is a method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities - e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection. Visit: Learnbay.co
  • 18. SYSTEMATIC SAMPLING…… o ADVANTAGES: o Sample easy to select o Suitable sampling frame can be identified easily o Sample evenly spread over entire reference population o DISADVANTAGES: o Sample may be biased if hidden periodicity in population coincides with that of selection. o Difficult to assess precision of estimate from one survey. Visit: Learnbay.co
  • 19. STRATIFIED SAMPLING Where population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. — Every unit in a stratum has same chance of being selected. — Using same sampling fraction for all strata ensures proportionate representation in the sample. — Adequate representation of minority subgroups of interest can be ensured by stratification & varying sampling fraction between strata as required. Visit: Learnbay.co
  • 20. STRATIFIED SAMPLING…… o Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata. o Drawbacks to using stratified sampling. o First, sampling frame of entire population has to be prepared separately for each stratum o Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata. o Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), stratified sampling can potentially require a larger sample than would other methods Visit: Learnbay.co
  • 21. STRATIFIED SAMPLING……. Draw a sample from each stratum Visit: Learnbay.co
  • 22. CLUSTER SAMPLING oCluster sampling is an example of 'two-stage sampling' . o First stage a sample of areas is chosen; o Second stage a sample of respondents within those areas is selected. o Population divided into clusters of homogeneous units, usually based on geographical contiguity. oSampling units are groups rather than individuals. oA sample of such clusters is then selected. oAll units from the selected clusters are studied. Visit: Learnbay.co
  • 23. CLUSTER SAMPLING……. oAdvantages : oCuts down on the cost of preparing a sampling frame. oThis can reduce travel and other administrative costs. oDisadvantages: sampling error is higher for a simple random sample of same size. oOften used to evaluate vaccination coverage Visit: Learnbay.co
  • 24. CLUSTER SAMPLING……. • Identification of clusters – List all cities, towns, villages & wards of cities with their population falling in target area under study. – Calculate cumulative population & divide by 30, this gives sampling interval. – Select a random no. less than or equal to sampling interval having same no. of digits. This forms 1st cluster. – Random no.+ sampling interval = population of 2nd cluster. – Second cluster + sampling interval = 4th cluster. – Last or 30th cluster = 29th cluster + sampling interval Visit: Learnbay.co
  • 25. CLUSTER SAMPLING……. Two types of cluster sampling methods. One-stage sampling. All of the elements within selected clusters are included in the sample. Two-stage sampling. A subset of elements within selected clusters are randomly selected for inclusion in the sample. Visit: Learnbay.co
  • 26. Difference Between Strata and Clusters — Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways. — All strata are represented in the sample; but only a subset of clusters are in the sample. — With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous Visit: Learnbay.co
  • 27. CONVENIENCE SAMPLING o Sometimes known as grab or opportunity sampling or accidental or haphazard sampling. o A type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand. That is, readily available and convenient. o The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough. o For example, if the interviewer was to conduct a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those given there at that given time, which would not represent the views of other members of society in such an area, if the survey was to be conducted at different times of day and several times per week. o This type of sampling is most useful for pilot testing. o In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit more subjects into the sample. Visit: Learnbay.co
  • 28. CONVENIENCE SAMPLING……. — Use results that are easy to get 28 Visit: Learnbay.co
  • 29. Statistical Inference The process of making guesses about the truth from a sample. Sample (observation) Make guesses about the whole population Truth (not observable) N x N i i 2 12 )( µ s - = å= N x N i å= = 1 µ Population parameters 1 )( ˆ 2 122 - - == å= n Xx s n n i i s n x X n i n å= == 1 ˆµ Sample statistics *hat notation ^ is often used to indicate “estimate” Visit: Learnbay.co
  • 30. Interval Estimation ! Population Mean: s Known ! Population Mean: s Unknown ! Determining the Sample Size ! Population Proportion Visit: Learnbay.co
  • 31. A point estimator cannot be expected to provide the exact value of the population parameter. An interval estimate can be computed by adding and subtracting a margin of error to the point estimate. Point Estimate +/- Margin of Error The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter. Margin of Error and the Interval Estimate Visit: Learnbay.co
  • 32. The general form of an interval estimate of a population mean is Margin of Error and the Interval Estimate Margin of Errorx ± Visit: Learnbay.co
  • 33. 33Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. µ a/2 a/2 1 - a of all values Sampling distribution of [------------------------- -------------------------] [------------------------- -------------------------] [------------------------- -------------------------] interval does not include µ interval includes µ interval includes µ Interval Estimate of a Population Mean: s Known x x x z xa s/2z xa s/2 x x x Visit: Learnbay.co
  • 34. Interval Estimate of a Population Mean: s Known ! In order to develop an interval estimate of a population mean, the margin of error must be computed using either: • the population standard deviation s , or • the sample standard deviation s ! s is rarely known exactly, but often a good estimate can be obtained based on historical data or other information. ! We refer to such cases as the s known case. Visit: Learnbay.co
  • 35. There is a 1 - a probability that the value of a sample mean will provide a margin of error of or less. µ a/2 a/21 - a of all values Sampling distribution of Interval Estimate of a Population Mean: s Known z xa s/2 x x x z xa s/2 z xa s/2 Visit: Learnbay.co
  • 36. µ a/2 a/2 1 - a of all values Sampling distribution of [------------------------- -------------------------] [------------------------- -------------------------] [------------------------- -------------------------] interval does not include µ interval includes µ interval includes µ Interval Estimate of a Population Mean: s Known x x x z xa s/2z xa s/2 x x x Visit: Learnbay.co
  • 37. ! Interval Estimate of µ Interval Estimate of a Population Mean: s Known where: is the sample mean 1 -a is the confidence coefficient za/2 is the z value providing an area of a/2 in the upper tail of the standard normal probability distribution s is the population standard deviation n is the sample size x z n ± a s /2 x Visit: Learnbay.co
  • 38. ! Interval Estimate of µ Interval Estimate of a Population Mean: s Known where: is the sample mean 1 -a is the confidence coefficient za/2 is the z value providing an area of a/2 in the upper tail of the standard normal probability distribution s is the population standard deviation n is the sample size x z n ± a s /2 x Visit: Learnbay.co
  • 39. Meaning of Confidence Because 90% of all the intervals constructed using will contain the population mean, we say we are 90% confident that the interval includes the population mean µ. We say that this interval has been established at the 90% confidence level. The value .90 is referred to as the confidence coefficient. xx s645.1± xx s645.1± Visit: Learnbay.co
  • 40. Interval Estimate of a Population Mean: s Known ! Example: Discount Sounds Discount Sounds has 260 retail outlets throughout the United States. The firm is evaluating a potential location for a new outlet, based in part, on the mean annual income of the individuals in the marketing area of the new location. A sample of size n = 36 was taken; the sample mean income is $41,100. The population is not believed to be highly skewed. The population standard deviation is estimated to be $4,500, and the confidence coefficient to be used in the interval estimate is .95. Visit: Learnbay.co
  • 41. 95% of the sample means that can be observed are within + 1.96 of the population mean µ. The margin of error is: Thus, at 95% confidence, the margin of error is $1,470. Interval Estimate of a Population Mean: s Known ! Example: Discount Sounds sx a s æ ö = =ç ÷ è ø /2 4,500 1.96 1,470 36 z n Visit: Learnbay.co
  • 42. Interval estimate of µ is: Interval Estimate of a Population Mean: s Known We are 95% confident that the interval contains the population mean. $41,100 + $1,470 or $39,630 to $42,570 ! Example: Discount Sounds Visit: Learnbay.co
  • 43. Interval estimate of µ is: Interval Estimate of a Population Mean: s Known We are 95% confident that the interval contains the population mean. $41,100 + $1,470 or $39,630 to $42,570 ! Example: Discount Sounds Visit: Learnbay.co
  • 44. Interval Estimate of a Population Mean: s Known ! Adequate Sample Size In most applications, a sample size of n = 30 is adequate. If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. Visit: Learnbay.co
  • 45. Interval Estimate of a Population Mean: s Known ! Adequate Sample Size (continued) If the population is believed to be at least approximately normal, a sample size of less than 15 can be used. If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice. Visit: Learnbay.co
  • 46. Interval Estimate of a Population Mean: s Unknown ! If an estimate of the population standard deviation s cannot be developed prior to sampling, we use the sample standard deviation s to estimate s . ! This is the s unknown case. ! In this case, the interval estimate for µ is based on the t distribution. ! (We’ll assume for now that the population is normally distributed.) Visit: Learnbay.co
  • 47. William Gosset, writing under the name “Student”, is the founder of the t distribution. t Distribution Gosset was an Oxford graduate in mathematics and worked for the Guinness Brewery in Dublin. He developed the t distribution while working on small-scale materials and temperature experiments. Visit: Learnbay.co
  • 48. The t distribution is a family of similar probability distributions. t Distribution A specific t distribution depends on a parameter known as the degrees of freedom. Degrees of freedom refer to the number of independent pieces of information that go into the computation of s. Visit: Learnbay.co
  • 49. t Distribution A t distribution with more degrees of freedom has less dispersion. As the degrees of freedom increases, the difference between the t distribution and the standard normal probability distribution becomes smaller and smaller. Visit: Learnbay.co
  • 50. For more than 100 degrees of freedom, the standard normal z value provides a good approximation to the t value. t Distribution The standard normal z values can be found in the infinite degrees ( ) row of the t distribution table.¥ Visit: Learnbay.co
  • 51. t Distribution Degrees Area in Upper Tail of Freedom .20 .10 .05 .025 .01 .005 . . . . . . . 50 .849 1.299 1.676 2.009 2.403 2.678 60 .848 1.296 1.671 2.000 2.390 2.660 80 .846 1.292 1.664 1.990 2.374 2.639 100 .845 1.290 1.660 1.984 2.364 2.626 .842 1.282 1.645 1.960 2.326 2.576 Standard normal z values ¥ Visit: Learnbay.co
  • 52. ! Interval Estimate where: 1 -a = the confidence coefficient ta/2 = the t value providing an area of a/2 in the upper tail of a t distribution with n - 1 degrees of freedom s = the sample standard deviation Interval Estimate of a Population Mean: s Unknown x t s n ± a/2 Visit: Learnbay.co
  • 53. A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 16 one-bedroom apartments within a half-mile of campus resulted in a sample mean of $750 per month and a sample standard deviation of $55. Interval Estimate of a Population Mean: s Unknown ! Example: Apartment Rents Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one- bedroom efficiency apartments within a half-mile of campus. We will assume this population to be normally distributed. Visit: Learnbay.co
  • 54. At 95% confidence, a = .05, and a/2 = .025. In the t distribution table we see that t.025 = 2.131. t.025 is based on n - 1 = 16 - 1 = 15 degrees of freedom. Interval Estimate of a Population Mean: s Unknown Degrees Area in Upper Tail of Freedom .20 .100 .050 .025 .010 .005 15 .866 1.341 1.753 2.131 2.602 2.947 16 .865 1.337 1.746 2.120 2.583 2.921 17 .863 1.333 1.740 2.110 2.567 2.898 18 .862 1.330 1.734 2.101 2.520 2.878 19 .861 1.328 1.729 2.093 2.539 2.861 . . . . . . . Visit: Learnbay.co
  • 55. We are 95% confident that the mean rent per month for the population of one-bedroom apartments within a half-mile of campus is between $720.70 and $779.30. ! Interval Estimate Interval Estimate of a Population Mean: s Unknown Margin of Error 27 x t s n ± .025 55 750 2.131 750 29.30 16 ± = ± Visit: Learnbay.co
  • 56. Interval Estimate of a Population Mean: s Unknown ! Adequate Sample Size If the population distribution is highly skewed or contains outliers, a sample size of 50 or more is recommended. In most applications, a sample size of n = 30 is adequate when using the expression to develop an interval estimate of a population mean. Visit: Learnbay.co
  • 57. Interval Estimate of a Population Mean: s Unknown ! Adequate Sample Size (continued) If the population is believed to be at least approximately normal, a sample size of less than 15 can be used. If the population is not normally distributed but is roughly symmetric, a sample size as small as 15 will suffice. Visit: Learnbay.co
  • 58. Summary of Interval Estimation Procedures for a Population Mean Can the population standard deviation s be assumed known ? Use Yes No Use s Known Case s Unknown Case Use the sample standard deviation s to estimate s /2 s x t n a±/2x z n a s ± Visit: Learnbay.co
  • 59. Let E = the desired margin of error. E is the amount added to and subtracted from the point estimate to obtain an interval estimate. Sample Size for an Interval Estimate of a Population Mean If a desired margin of error is selected prior to sampling, the sample size necessary to satisfy the margin of error can be determined. Visit: Learnbay.co
  • 60. Sample Size for an Interval Estimate of a Population Mean ! Margin of Error ! Necessary Sample Size E z n = a s /2 n z E = ( )/a s2 2 2 2 Visit: Learnbay.co
  • 61. Sample Size for an Interval Estimate of a Population Mean The Necessary Sample Size equation requires a value for the population standard deviation s . If s is unknown, a preliminary or planning value for s can be used in the equation. 1. Use the estimate of the population standard deviation computed in a previous study. 2. Use a pilot study to select a preliminary study and use the sample standard deviation from the study. 3. Use judgment or a “best guess” for the value of s . Visit: Learnbay.co
  • 62. Recall that Discount Sounds is evaluating a potential location for a new retail outlet, based in part, on the mean annual income of the individuals in the marketing area of the new location. Sample Size for an Interval Estimate of a Population Mean ! Example: Discount Sounds Suppose that Discount Sounds’ management team wants an estimate of the population mean such that there is a .95 probability that the sampling error is $500 or less. How large a sample size is needed to meet the required precision? Visit: Learnbay.co
  • 63. At 95% confidence, z.025 = 1.96. Recall that s = 4,500. Sample Size for an Interval Estimate of a Population Mean A sample of size 312 is needed to reach a desired precision of + $500 at 95% confidence. z n a s /2 500= 2 2 2 (1.96) (4,500) 311.17 312 (500) n = = = Visit: Learnbay.co
  • 64. The general form of an interval estimate of a population proportion is Interval Estimate of a Population Proportion Margin of Errorp ± Visit: Learnbay.co
  • 65. Interval Estimate of a Population Proportion The sampling distribution of plays a key role in computing the margin of error for this interval estimate. The sampling distribution of can be approximated by a normal distribution whenever np > 5 and n(1 – p) > 5. p p Visit: Learnbay.co
  • 66. a/2 a/2 Interval Estimate of a Population Proportion ! Normal Approximation of Sampling Distribution of Sampling distribution of p 1 - a of all values p p (1 ) p p p n s - = p /2 pza s/2 pza s p Visit: Learnbay.co
  • 67. ! Interval Estimate Interval Estimate of a Population Proportion where: 1 -a is the confidence coefficient za/2 is the z value providing an area of a/2 in the upper tail of the standard normal probability distribution is the sample proportion p z p p n ± - a/ ( ) 2 1 p Visit: Learnbay.co
  • 68. Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held that day. Interval Estimate of a Population Proportion ! Example: Political Science, Inc. Visit: Learnbay.co
  • 69. In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favor the candidate. Interval Estimate of a Population Proportion ! Example: Political Science, Inc. Visit: Learnbay.co
  • 70. where: n = 500, = 220/500 = .44, za/2 = 1.96 Interval Estimate of a Population Proportion PSI is 95% confident that the proportion of all voters that favor the candidate is between .3965 and .4835. = .44 + .0435 p z p p n ± - a/ ( ) 2 1 p .44(1 .44) .44 1.96 500 - ± Visit: Learnbay.co
  • 71. Solving for the necessary sample size, we get ! Margin of Error Sample Size for an Interval Estimate of a Population Proportion However, will not be known until after we have selected the sample. We will use the planning value p* for . /2 (1 )p p E z n a - = 2 /2 2 ( ) (1 )z p p n E a - = p p Visit: Learnbay.co
  • 72. Sample Size for an Interval Estimate of a Population Proportion The planning value p* can be chosen by: 1. Using the sample proportion from a previous sample of the same or similar units, or 2. Selecting a preliminary sample and using the sample proportion from this sample. 3. Use judgment or a “best guess” for a p* value. 4. Otherwise, use .50 as the p* value. ! Necessary Sample Size 2 * * /2 2 ( ) (1 )z p p n E a - = Visit: Learnbay.co
  • 73. Suppose that PSI would like a .99 probability that the sample proportion is within + .03 of the population proportion. How large a sample size is needed to meet the required precision? (A previous sample of similar units yielded .44 for the sample proportion.) Sample Size for an Interval Estimate of a Population Proportion ! Example: Political Science, Inc. Visit: Learnbay.co
  • 74. At 99% confidence, z.005 = 2.576. Recall that p* = .44. A sample of size 1817 is needed to reach a desired precision of + .03 at 99% confidence. Sample Size for an Interval Estimate of a Population Proportion 2 * * 2 /2 2 2 ( ) (1 ) (2.576) (.44)(.56) 1817 (.03) z p p n E a - = = @ * * /2 (1 ) .03 p p z n a - = Visit: Learnbay.co
  • 75. Note: We used .44 as the best estimate of p in the preceding expression. If no information is available about p, then .5 is often assumed because it provides the highest possible sample size. If we had used p = .5, the recommended n would have been 1843. Sample Size for an Interval Estimate of a Population Proportion Visit: Learnbay.co
  • 76. Z-score Review o A sample mean (X-bar) approximates a population mean (μ) o The standard error provides a measure of n how well a sample mean approximates the population mean n determines how much difference between X-bar and μ is reasonable to expect just by chance o The z-score is a statistic used to quantify this inference o obtained difference between data and hypothesis/standard distance expected by chance s µ- = X z Visit: Learnbay.co
  • 77. What’s the problem with z? o Need to know the population mean and variance!!! Not always available. Visit: Learnbay.co
  • 78. What is the t statistic? o “Cousin” of the z statistic that does not require the population mean (μ) or variance (σ2)to be known o Can be used to test hypotheses about a completely unknown population (when the only information about the population comes from the sample) o Required: a sample and a reasonable hypothesis about the population mean (μ) o Can be used with one sample or to compare two samples Visit: Learnbay.co
  • 79. What is the t statistic? o “Cousin” of the z statistic that does not require the population mean (μ) or variance (σ2)to be known o Can be used to test hypotheses about a completely unknown population (when the only information about the population comes from the sample) o Required: a sample and a reasonable hypothesis about the population mean (μ) o Can be used with one sample or to compare two samples
  • 80. The t Distribution We use t when the population variance is unknown (the usual case) and sample size is small (N<100, the usual case). If you use a stat package for testing hypotheses about means, you will use t. The t distribution is a short, fat relative of the normal. The shape of t depends on its df. As N becomes infinitely large, t becomes normal. Visit: Learnbay.co
  • 81. From Z to t… 1 )( 2 - - = å n XX s n s sX = X hyp s X t µ- = X hypX Z s µ- = nX s s = N X 2 )( µ s -S = Note lowercase “s”. )1( )( 22 - S-S = nn XXn s2 22 )( N XXN S-S =s Visit: Learnbay.co