UNIT-V
Population & Sample
 Population in statistics means
the whole of the information
that comes under the purview
of statistical investigation.
 It is the totality of all the
observations of a statistical
inquiry.
 It is also known as
“UNIVERSE”
 A population may be finite or
infinite
 A part of the population
selected for study is called a
SAMPLE.
 Hence, Sample is nothing but
the selection of a group of
items from a population in
such a way that this group
represents the population.
 The number of individuals
included in a finite sample
is called the SIZE OF THE
SAMPLE.
Parameter & Statistic
 Any statistical measure
(such as mean, mode ,
S.D.) computed from
population data is known
as PARAMETER.
 Any statistical measure
computed from sample data
is known as STATISTIC.
 STATISTIC computed from a
sample drawn from the
parent population plays an
important role in
 A) The Theory of Estimation
 B)Testing of Hypothesis
Notations used

Statistical Measure     Population    Sample
Mean                    µ             x̄
Standard deviation      σ             s
Size                    N             n
Sampling & Sampling Theory
 It is the process of selecting a
sample from the population.
 Sampling can also be defined
as the process of drawing a
sample from the population &
compiling a suitable statistic
in order to estimate the
parameter drawn from the
parent population & to test
the significance of the
statistic computed from such
sample.
 Sampling theory is based on
Sampling
 It deals with statistical
inferences drawn from sampling
results, which are of three types:
i. Statistical Estimation,
ii. Tests of significance, and
iii. Statistical inference
Objects of Sampling theory
 To estimate population parameter on the
basis of sample statistic.
 To set the limits of accuracy & degree of
confidence of the estimates of the population
parameter computed on the basis of sample
statistic.
 To test significance about the population
characteristic on the basis of sample statistic.
Methods of Sampling
Random (Probability) Sampling:
 Simple Random Sampling
 Stratified Sampling
 Systematic Sampling
 Multi-stage Sampling

Non-random Sampling:
 Judgment Sampling
 Quota Sampling
 Convenience Sampling
Random Sampling Methods
Simple Random sampling
 This method refers to the sampling technique in
which each and every item of the population is given
a chance of being included in the sample;
 The selection is free from personal bias;
 This method is also known as method of chance
selection.
 It is sometimes also referred to as “representative
sampling” (if the sample is chosen at random and if
the size of the sample is sufficiently large, it’ll
represent all groups in the population)
Contd..
 It is a probability sampling because every
item of the population has an equal
opportunity of being selected in the sample;
 Methods of obtaining a Simple Random
Sample:
1. Lottery method
2. Table of random numbers ( a number of
random tables are available such as Tippets
table; Fisher and Yates numbers; Kendall and
Babington Smith numbers)
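The lottery method above can be mimicked in a few lines of Python using the standard library. This is a minimal sketch; the population of 1000 numbered items and the sample size of 10 are illustrative assumptions, not figures from the text.

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical population of 1000 numbered items
population = list(range(1, 1001))

# Simple random sample of size 10: every item has an equal
# chance of inclusion, mimicking the lottery method
sample = random.sample(population, k=10)
print(sample)
```

`random.sample` draws without replacement, so no item can appear twice in the sample.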
Stratified Sampling
 It is one of the restricted random methods which by
using available information concerning the data
attempts to design a more efficient sample than that
obtained by the simple random procedure;
 The process of stratification requires that the
population be divided into homogeneous
groups or classes called strata;
 then a sample may be taken from each group by
simple random method
 And the resulting sample is called a stratified sample
Contd..
 A stratified sample may be either proportional
or disproportionate.
 In a proportional stratified sampling plan, the
number of items drawn from each stratum is
proportional to the size of the strata.
 For example, if the population is divided into 4
strata, their respective sizes being 15%, 10%, 20%
and 55% of the population, and a sample of 1000 is
to be drawn, the desired proportional sample may
be obtained in the following manner:
Contd..
From stratum one     1000 (0.15) = 150 items
From stratum two     1000 (0.10) = 100 items
From stratum three   1000 (0.20) = 200 items
From stratum four    1000 (0.55) = 550 items
Sample size                        1000 items
Disproportionate Stratified sampling includes
procedures of taking an equal number of items
from each stratum irrespective of its size.
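The proportional allocation worked above can be sketched in Python. The stratum names are labels for the four strata in the example; the shares and total sample size are taken from the text.

```python
# Proportional stratified allocation: items drawn from each
# stratum are proportional to the stratum's share of the population
strata_shares = {"one": 0.15, "two": 0.10, "three": 0.20, "four": 0.55}
sample_size = 1000

allocation = {name: round(sample_size * share)
              for name, share in strata_shares.items()}
print(allocation)  # {'one': 150, 'two': 100, 'three': 200, 'four': 550}
```

For a disproportionate plan one would instead assign `sample_size // len(strata_shares)` items to every stratum regardless of its size.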
Systematic Sampling
 This method is popularly used in such cases
where a complete list of the population from
which sampling is to be drawn is available;
 The method is to select every kth item from
the list where ‘k’ refers to the sampling
interval;
 k = size of population / sample size (N/n);
 The starting point between the first & the kth
is selected at random
Contd..
 For example, if a complete list of 1000
students is available and we want to draw a
sample of 200 students; this means we must
take every 5th item.
 But the first item between one and five shall
be selected at random.
 Let it be three, now we shall go on adding 5 &
obtain numbers of desired sample.
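The 1000-student example can be sketched as follows. Because the starting point is chosen at random between 1 and k, the exact items drawn will vary from run to run; only the interval and sample size are fixed.

```python
import random

random.seed(3)
N, n = 1000, 200               # population and sample size from the example
k = N // n                     # sampling interval, k = 5
start = random.randint(1, k)   # random start between 1 and k
sample = list(range(start, N + 1, k))   # every kth item from the start
print(sample[:3], "... total", len(sample), "items")
```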
Cluster Sampling
 It is different from stratified sampling in that
each stratum consists of homogeneous items,
whereas the groups (clusters) are mutually
exclusive and not exactly homogeneous;
 Multi- stage sampling is a type of cluster
sampling;
Multi-stage Sampling
 As the name suggests this method refers to a
sampling procedure which is carried out in several
stages;
 The material is regarded as made up of a number of
first stage sampling units, each made up of a
number of second stage units;
 At first the first stage units are sampled by some
suitable method such as random sampling, then, a
sample of second stage is selected from each of the
selected first stage units again by some suitable
method which may be the same or different from
the method employed for the first stage units.
Non-Random Sampling Methods
Judgment Sampling
 In this method of sampling the choice of
sample items depends exclusively on the
judgment of the investigator;
 This method, though simple, is not scientific;
 This method is used in solving many types of
economic & business problems such as
i. When sample size is small;
ii. With the help of Judgment sampling,
estimation can be made available quickly;
Quota Sampling
 It is a type of judgment sampling;
 In a quota sample, quotas are set up
according to given criteria but within quotas
the selection of sample items depends on
personal judgment.
Convenience Sampling
 It is also known as the Chunk;
 A Chunk is a fraction of one population taken
for investigation because of its convenient
availability;
 Hence chunk is selected neither by
probability nor by judgment but by
convenience;
 Convenience samples are sometimes called
accidental samples because those entering
into the sample enter by ‘accident’;
Errors in Sampling
Discrepancies between the statistical measure of the population
(Parameter) & that of the sample drawn from the same population (Statistic).

Sampling Errors:
 These are of two types:
a. Biased errors arise due to any bias
in selection, estimation, etc.
b. Unbiased errors arise due to
chance factors
 They occur primarily due to the
following reasons:
1. Faulty selection of the
sample
2. Substitution

Non-Sampling Errors:
 May arise in the following
ways:
1. Due to negligence &
carelessness on the part of the
investigator;
2. Due to incomplete
investigation & sample
survey;
3. Due to negligence & non-
response on the part of the
respondents;
4. Errors in data processing.
Principles of Sampling
 Principle of “Statistical Regularity”: This
principle lays down that a moderately large
number of items chosen at random from a
large group are almost sure on an average to
possess the characteristics of the large group.
 Principle of “Inertia of Large Numbers”: this
principle is a corollary of the above principle.
It states that, other things being equal, the larger
the size of the sample, the more accurate the
results are likely to be.
Theory of Estimation
 Statistical estimation is the procedure of
using a sample statistic to estimate a
population parameter.
 A statistic used to estimate a parameter is
called an estimator, and
 The value taken by the estimator is called an
estimate.
 For example, the sample mean is an estimator of
the population mean, and a particular value of it
(say 7.65) is an estimate.
Statistical estimation is divided
into two major categories:
Point Estimation Interval Estimation
 In point estimation, a
single statistic is used to
provide an estimate of the
population parameter;
 Change in sample will
cause deviation in
estimate;
 An interval estimate is a
range of values within
which a researcher can say
with some confidence that
the population parameter
falls;
 This range is called
confidence interval;
Qualities of a good
estimator:
 A good estimator is one which is as close to the
true value of the parameter as possible.
 A good estimator must possess the following
characteristics:
i. Unbiasedness
ii. Consistency
iii. Efficiency and
iv. Sufficiency
Contd..
 Unbiasedness: this is a desirable property for a good
estimator to have; “unbiasedness” refers to the fact
that the sample mean is an unbiased estimator of the
population mean, because the mean of the sampling
distribution of sample means taken from the same
population is equal to the population mean itself;
 Efficiency: it refers to the size of the standard error
of the statistic; if two statistics computed from
samples of the same size are compared to decide
which is the better estimator, the statistic that has
the smaller standard error (standard deviation of the
sampling distribution) is selected.
Contd..
 Consistency: a statistic is a consistent estimator
if, as the sample size increases, it becomes almost
certain that the value of the statistic comes very
close to the value of the population parameter;
 Sufficiency: an estimator is sufficient if it makes
so much use of the information in the sample
that no other estimator could extract from the
sample additional information about the
population parameter being estimated;
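Unbiasedness and consistency of the sample mean can be illustrated by simulation. This is a minimal sketch; the population here is synthetic (10,000 draws from a normal distribution with mean 50 and S.D. 10), an assumption made only for the demonstration.

```python
import random
import statistics

random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # the population mean µ

# Unbiasedness: the average of many sample means is close to µ
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(2_000)]
print(round(statistics.mean(sample_means), 2), "vs µ =", round(mu, 2))

# Consistency: a single large sample tends to land close to µ
big_sample_mean = statistics.mean(random.sample(population, 5_000))
print(round(big_sample_mean, 2))
```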
Hypothesis Testing
 Hypothesis testing is based on hypothesis;
 “Hypothesis” is an assumption about an
unknown population parameter;
 Hypothesis testing is a well defined
procedure which helps in deciding objectively
whether to accept or reject the hypothesis
based on the information available from the
sample;
Hypothesis Testing Procedure
STEP 1: SET NULL & ALTERNATIVE HYPOTHESIS:
 The assumption which we want to test is called
the NULL hypothesis;
 It is symbolized as Ho;
 Null hypothesis is set with no difference (i.e.
status quo) & considered true, unless and until it
is proved by the collected sample data;
 Example, Ho :µ =500
“the null hypothesis is that the population mean is equal to 500”
Contd..
 The Alternative hypothesis, generally referred by
H1 or Ha is the logical opposite of the null
hypothesis;
 H1: µ ≠ 500 (or one-sided: H1: µ > 500, or H1: µ < 500)
 In other words, when null hypothesis is found to
be true, the alternative hypothesis must be false;
or vice versa;
 Rejection of the null hypothesis indicates that the
differences have statistical significance, while
acceptance of the null hypothesis indicates that
the differences are due to chance;
STEP2: SET UP A SUITABLE LEVEL
OF SIGNIFICANCE
 The level of significance, generally denoted by ‘α’,
is the probability of rejecting the null hypothesis
even when it is true;
 The level of significance is also known as the size
of rejection region or size of critical region;
 It is generally specified before any samples are
drawn, so that results obtained will not influence
the direction to be taken;
 Any level of significance can be adopted; in
practice we take either the 5% or the 1% level of
significance;
Contd..
 When we take 5% level of significance then there
are about 5 chances out of 100 that we would
reject the null hypothesis when it should be
accepted i.e. we are about 95% confident that
we have made the right decision;
 When the null hypothesis is rejected at α=0.05,
the test result is said to be significant;
 When the null hypothesis is rejected at α=0.01,
test result is said to be highly significant;
STEP3: DETERMINATION OF A
SUITABLE TEST STATISTIC
 Many of the test statistic that we shall
encounter will have the following form:
 Test statistic = (sample statistic − hypothesized population parameter)
                   / (standard error of the sample statistic)
STEP4 : SET THE DECISION
RULE
 The next step for the researcher is to
establish a critical region
 Acceptance region : when null hypothesis is
accepted;
 Rejection region ; when null hypothesis is
rejected;
STEP5: COLLECT THE SAMPLE
DATA
 Data is now collected;
 Appropriate sample statistic are computed;
STEP6: ANALYSE THE DATA
 This involves selection of an appropriate
probability distribution for a particular test;
 For example, when the sample is small (n<30)
the normal probability distribution (Z) is not an
accurate choice; the (t) distribution needs to be
used in this case;
 Some commonly used testing procedures are
Z, t, F & Chi square
STEP7: ARRIVE AT A STATISTICAL
CONCLUSION & BUSINESS IMPLICATION
 Statistical conclusion is a decision to accept
or reject a null hypothesis;
 This depends on whether the computed test
statistic falls in acceptance region or rejection
region;
Types of Errors in Hypothesis
Testing
                    Condition
Decision       Ho: true              Ho: false
Accept Ho      Correct Decision      Type II error (β)
Reject Ho      Type I error (α)      Correct Decision
Z-test
 Hypothesis testing for large samples i.e. n>= 30;
 Based on the assumption that the population from
which the sample is drawn has a normal
distribution;
 As a result, the sampling distribution of mean is also
normally distributed;
Application:
1. For testing hypothesis about a single population
mean;
2. Hypothesis testing for the difference between two
population means;
3. Hypothesis testing for attributes.
Formula for single population
mean (infinite population)
 Z = (x̄ − µ) / (σ / √n)
Where,
µ = population mean
x̄ = sample mean
σ = population standard deviation
n = sample size
Q: A marketing research firm conducted a survey 10 years ago & found that
the average household income of a particular geographic area is Rs 10000.
Mr. Gupta, who recently joined the firm as a VP, expresses doubts. To
verify the data, the firm decides to take a random sample of 200
households, which yields a sample mean of Rs 11000. Assume that the
population S.D. is Rs 1200. Verify Mr. Gupta’s doubts using α=0.05.
 Step 1: set null & alternative hypothesis
Ho: µ=10000
H1: µ≠10000
 Step2: Determine the appropriate statistical test
Since sample size >=30, so z-test can be used for hypothesis testing
 Step3: set the level of significance
The level of significance is known (α=0.05)
 Step4: Set the decision rule
Acceptance region covers 95% of the area & rejection region 5%
Critical values can be obtained from the table (± 1.96)
 Step 5: Collect the sample data
A sample of 200 respondents yields a sample mean of Rs 11000
 Step 6: Analyze the data
n = 200; µ = 10000; x̄ = 11000; σ = 1200
 Z = (x̄ − µ) / (σ / √n) = (11000 − 10000) / (1200 / √200) = 11.79
 Step 7: Arrive at a statistical conclusion & business
implication
Z value is 11.79 which is greater than +1.96, hence null
hypothesis is rejected and alternative hypothesis is
accepted. Hence Mr. Gupta’s doubt about household
income was right.
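The computation in Step 6 can be checked with a few lines of Python, using only the figures given in the example:

```python
import math

# Figures from the household-income example
x_bar, mu, sigma, n = 11_000, 10_000, 1_200, 200

# Z for a single population mean, sigma known: (x̄ - µ) / (σ/√n)
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(round(z, 2))  # 11.79, far outside ±1.96, so reject Ho
```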
Formula for single population
mean (finite population)
 Z = (x̄ − µ) / ((σ / √n) × √((N − n) / (N − 1)))
When the population standard deviation is
not known:
 Z = (x̄ − µ) / (s / √n)
where s = sample standard deviation
Hypothesis testing for the difference
between two population means
 Z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)
Hypothesis for attributes
 Z = (x − µ) / √(npq)
Where,
n = sample size
µ = np
p = probability of happening
q = 1 − p, the chance of not happening
Q: In 600 throws of a six-faced die, odd points appeared 360 times.
Would you say that the die is fair at the 5% level of significance?
 Ho: the die is fair
 p = q = ½
 n = 600
 np = 300
 x = 360
Z = (x − np) / √(npq) = (360 − 300) / √(600 × ½ × ½) = 4.9
Z is greater than 1.96 (at 5%),
so Ho is rejected.
Hence, the die is not fair.
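The dice calculation can be verified directly from the figures in the question:

```python
import math

n, p, q = 600, 0.5, 0.5     # throws; probability of odd / even points
x = 360                     # observed number of odd-point throws
np_ = n * p                 # expected number under Ho: µ = np = 300

# Z for attributes: (x - np) / √(npq)
z = (x - np_) / math.sqrt(n * p * q)
print(round(z, 1))  # 4.9 > 1.96, so Ho is rejected: the die is not fair
```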
t-test
 Given by W.S. Gosset in 1908 under the pen
name “Student”; hence it is known as Student’s t-test
 The t-test can be applied when:
1. A researcher draws a small random
sample (n<30) to estimate the population mean (µ);
2. The population standard deviation (σ)
is unknown;
3. The population is normally distributed
Application of t-test
 Hypothesis testing for single population
mean;
 Hypothesis testing for the difference
between two independent population means;
 Hypothesis testing for the difference
between two dependent population means;
Hypothesis testing for single
population mean
 t = (x̄ − µ) / (s / √n)
with degrees of freedom (n − 1)
Where,
µ = population mean
x̄ = sample mean
s = sample standard deviation
n = sample size
Q: Royal Tyre has launched a new brand of tyres for tractors & claims that
under normal circumstances the average life of the tyres is 40000 km.
A retailer wants to test this claim & has taken a random sample of 8
tyres. He tests the life of the tyres under normal circumstances. The
results obtained are:
Tyres 1 2 3 4 5 6 7 8
Km 35 000 38 000 42 000 41 000 39 000 41 500 43 000 38 500
Use α = 0.05 for testing the hypothesis
Step1: Set null & alternative hypothesis
Null hypothesis: Ho: µ = 40 000
Alternative hypothesis: H1: µ ≠ 40 000
Step2:Determine the appropriate statistical test
The sample size is less than 30, so t test will be an appropriate test
Step3:Set the level of significance
The level of significance, i.e. α = 0.05
Step4: Set the decision rule
The t distribution value for a two-tailed test is t0.025 = 2.365 for degrees of freedom 7.
so if the computed t value is outside the ± 2.365 range, the null hypothesis will be rejected;
otherwise it will be accepted.
 Step 5: Collect the sample data:
 Step 6: Analyze the data
x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n − 1 = 7
Table value of t0.025,7 = 2.365
 t = (x̄ − µ) / (s / √n) = (39750 − 40000) / (2618.61 / √8) = −0.27
 Step 7: Arrive at a statistical conclusion &
Business implication
The observed t value is -0.27 which falls within the
acceptance region & hence null hypothesis is
accepted i.e. Ho: µ = 40 000
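The sample mean, sample standard deviation, and t value from the tyre example can be reproduced with the standard library:

```python
import math
import statistics

# Tyre lives (km) from the sample of 8 tyres
km = [35_000, 38_000, 42_000, 41_000, 39_000, 41_500, 43_000, 38_500]
mu = 40_000                      # claimed average life under Ho

x_bar = statistics.mean(km)      # sample mean, 39750
s = statistics.stdev(km)         # sample standard deviation, ≈ 2618.61

# t for a single population mean: (x̄ - µ) / (s/√n)
t = (x_bar - mu) / (s / math.sqrt(len(km)))
print(round(t, 2))  # -0.27, inside ±2.365, so Ho is accepted
```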
Hypothesis testing for the difference
between two independent population means
 t = ((x̄1 − x̄2) − (µ1 − µ2)) / (s_pooled × √(1/n1 + 1/n2))
 σ can be estimated by pooling the two sample
variances & computing the pooled standard
deviation:
 s_pooled = √[ (s1²(n1 − 1) + s2²(n2 − 1)) / (n1 + n2 − 2) ]
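The pooled standard deviation can be computed as a small helper. The sample figures below (s1 = 4, n1 = 10, s2 = 6, n2 = 12) are hypothetical, chosen only to exercise the formula:

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent samples:
    sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

print(round(pooled_sd(4, 10, 6, 12), 3))
```

When both samples have the same standard deviation, the pooled value equals it, which is a quick sanity check on the formula.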
F-test
 Is named after R.A. Fisher who first studied it in
1934;
 This distribution is usually defined in terms of the
ratio of the variances of two normally distributed
populations
 The quantity
 F = (s1² / σ1²) / (s2² / σ2²)
is distributed as F with (n1 − 1) & (n2 − 1) degrees
of freedom
Contd..
 Where
s1² = Σ(x1 − x̄1)² / (n1 − 1)
s2² = Σ(x2 − x̄2)² / (n2 − 1)
Chi Square test
 Chi square is related to categorical data (as
counting of frequencies from one or more
variables);
 Some researchers place chi-square in the
category of Non-parametric tests
 The χ² test was developed by Karl Pearson in
1900;
 the symbol χ stands for the Greek letter
“chi”;
 χ² is a function of its degrees of freedom;
Contd..
 Being a sum of squared quantities, the χ²
distribution can never take a negative value;
 χ² is a continuous probability distribution
with range zero to infinity;
 χ² = Σ (O − E)² / E
with df = (r − 1)(c − 1)
where E = (row total × column total) / grand total
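The formula can be sketched for a hypothetical 2×2 contingency table (the observed counts below are illustrative, not from the text), computing each expected frequency from the row, column and grand totals:

```python
# Hypothetical 2x2 contingency table of observed frequencies
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
grand = sum(row_totals)                            # 100

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected = (row x col) / grand
        chi_sq += (o - e) ** 2 / e                 # accumulate (O-E)^2 / E

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1)
print(chi_sq, "with df =", df)  # 4.0 with df = 1
```

The computed 4.0 would then be compared against the critical χ² value for 1 degree of freedom at the chosen level of significance.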
Decision rule
 If calculated χ² > critical χ², reject the null
hypothesis;
 If calculated χ² < critical χ², accept the null
hypothesis;
Conditions to apply chi- square
test
 Data should not be in percentages or ratios;
rather, they should be expressed in original units;
 The sample should consist of at least 50
observations, should be drawn randomly, &
individual observations in the sample should be
independent of each other;

Sampling distribution concepts

  • 1.
  • 2.
    Population & Sample PopulationSample  Population in statistics means the whole of the information that comes under the purview of statistical investigation.  It is the totality of all the observations of a statistical inquiry.  It is also known as “UNIVERSE”  A population may be finite or infinite  A part of the population selected for study is called a SAMPLE.  Hence, Sample is nothing but the selection of a group of items from a population in such a way that this group represents the population.  The number of individuals included in the finite sample is called the SIZE OFTHE SAMPLE.
  • 3.
    Parameter & Statistic ParameterStatistic  Any statistical measure (such as mean, mode , S.D.) computed from population data is known as PARAMETER.  Any statistical measure computed from sample data is known as STATISTIC.  STATISTIC computed from a sample drawn from the parent population plays an important role in  A)TheTheory of Estimation  B)Testing of Hypothesis
  • 4.
    Notations used Notations Statistical MeasurePopulation Sample Mean µ X Standard deviation σ S Size N n
  • 5.
    Sampling & SamplingTheory Sampling Sampling theory  It is the process of selecting a sample from the population.  Sampling can also be defined as the process of drawing a sample from the population & compiling a suitable statistic in order to estimate the parameter drawn from the parent population & to test the significance of the statistic computed from such sample.  Sampling theory is based on Sampling  It deals with statistical inferences drawn from sampling results, which are of three types: i. Statistical Estimation, ii. Tests of significance, and iii. Statistical inference
  • 6.
    Objects of Samplingtheory  To estimate population parameter on the basis of sample statistic.  To set the limits of accuracy & degree of confidence of the estimates of the population parameter computed on the basis of sample statistic.  To test significance about the population characteristic on the basis of sample statistic.
  • 7.
    Methods of Sampling Random(Probability) Sampling Non-random Sampling  Simple Random sampling  Stratified Sampling  Systematic Sampling  Multi-stage Sampling  Judgment Sampling  Quota Sampling  Convenience Sampling
  • 8.
  • 9.
    Simple Random sampling This method refers to the sampling technique in which each and every item of the population is given a chance of being included in the sample;  The selection is free from personal bias;  This method is also known as method of chance selection.  It is sometimes also referred to as “representative sampling” (if the sample is chosen at random and if the size of the sample is sufficiently large, it’ll represent all groups in the population)
  • 10.
    Contd..  It isa probability sampling because every item of the population has an equal opportunity of being selected in the sample;  Methods of obtaining a Simple Random Sample: 1. Lottery method 2. Table of random numbers ( a number of random tables are available such as Tippets table; Fisher and Yates numbers; Kendall and Babington Smith numbers)
  • 11.
    Stratified Sampling  Itis one of the restricted random methods which by using available information concerning the data attempts to design a more efficient sample than that obtained by the simple random procedure;  The process of stratification requires that the populationmay be divided into homogeneous groups or classes called strata  then a sample may be taken from each group by simple random method  And the resulting sample is called a stratified sample
  • 12.
    Contd..  A stratifiedsample may be either proportional or disproportionate.  In a proportional stratified sampling plan, the number of items drawn from each stratum is proportional to the size of the strata.  For example, if the population is divided into 4 strata, their respective sizes being 15, 10,20 ,55 % of the population and a sample of 1000 is to be drawn, the desired proportional sample may be obtained in the following manner:
  • 13.
    Contd.. From stratum one1000 (0.15) 150 items From stratum two 1000 (0.10) 100 From stratum three 1000 (0.20) 200 From stratum four 1000 (0.55) 550 Sample Size 1000 Disproportionate Stratified sampling includes procedures of taking an equal number of items from each stratum irrespective of its size.
  • 14.
    Systematic Sampling  Thismethod is popularly used in such cases where a complete list of the population from which sampling is to be drawn is available;  The method is to select every kth item from the list where ‘k’ refers to the sampling interval;  k = size of population / sample size (N/n);  The starting point between the first & the kth is selected at random
  • 15.
    Contd..  For example,if a complete list of 1000 students is available and we want to draw a sample of 200 students; this means we must take every 5th item.  But the first item between one and five shall be selected at random.  Let it be three, now we shall go on adding 5 & obtain numbers of desired sample.
  • 16.
    Cluster Sampling  Itis different from stratified sampling in a way that each strata consists of homogeneous items but the groups in clusters are mutually exclusive and not exactly homogeneous;  Multi- stage sampling is a type of cluster sampling;
  • 17.
    Multi-stage Sampling  Asthe name suggests this method refers to a sampling procedure which is carried out in several stages;  The material is regarded as made up of a number of first stage sampling units, each made up of a number of second stage units;  At first the first stage units are sampled by some suitable method such as random sampling, then, a sample of second stage is selected from each of the selected first stage units again by some suitable method which may be the same or different from the method employed for the first stage units.
  • 18.
  • 19.
    Judgment Sampling  Inthis method of sampling the choice of sample items depends exclusively on the judgment of the of the investigator;  This method, though simple , is not scientific;  This method is used in solving many types of economic & business problems such as i. When sample size is small; ii. With the help of Judgment sampling, estimation can be made available quickly;
  • 20.
    Quota Sampling  Itis a type of judgment sampling;  In a quota sample, quotas are set up according to given criteria but within quotas the selection of sample items depends on personal judgment.
  • 21.
    Convenience Sampling  Itis also known as the Chunk;  A Chunk is a fraction of one population taken for investigation because of its convenient availability;  Hence chunk is selected neither by probability nor by judgment but by convenience;  Convenience samples are sometimes called accidental samples because those entering into the sample enter by ‘accident’;
  • 22.
    Errors in Sampling:Discrepancies in Statistical measure of population (Parameter) & of the sample drawn from the same population (Statistic). Sampling Errors Non Sampling Errors  These are of two types a. Biased arise due to any bias in selection , estimation, tec b. Unbiased errors arise due to chance factors  Occurs primarily due to the following reasons: 1. Faulty selection of the sample 2. Substitution  May arise in the following ways: 1. Due to negligence & carelessness on the part of investigator; 2. Due to incomplete investigation & sample survey; 3. Due to negligence & non response on the part of the respondents; 4. Errors in data processing.
  • 23.
    Principles of Sampling Principle of “Statistical Regularity”: This principle lays down that a moderately large number of items chosen at random from a large group are almost sure on an average to possess the characteristics of the large group.  Principle of “Inertia of Large Numbers”: this is principle is corollary of the above principle. It states that, other things being equal, larger the size of sample, more accurate the results are likely to be.
  • 24.
    Theory of Estimation Statistical estimation is the procedure of using a sample statistic to estimate a population parameter.  A Statistic is used to estimate a parameter is called an estimator, and  The value taken by the estimator is called an estimate.  for example, the sample mean(say 7.65) is an estimator of the population mean.
  • 25.
    Statistical estimation isdivided into two major categories: Point Estimation Interval Estimation  In point estimation, a single statistic is used to provide an estimate of the population parameter;  Change in sample will cause deviation in estimate;  An interval estimate is a range of values within which a researcher can say with some confidence that the population parameter falls;  This range is called confidence interval;
  • 26.
    Qualities of agood estimator:  A good estimator is one which is close to the true value of the parameter as possible.  A good estimator must possess the following characteristics: i. Unbiasedness ii. Consistency iii. Efficiency and iv. Sufficiency
  • 27.
    Contd..  Unbiasedness: thisis a desirable property for a good estimator to have; “unbiasedness” refers to the fact that a sample mean is an unbiased estimator of a population mean because the mean of the sampling distribution of a sample means taken from the same population is equal to the population mean itself;  Efficiency: it refers to the size of the standard error of the statistic; if two statistic are compared from a sample of the same size & try to decide which is a good estimator; the statistic that has a smaller standard error or standard deviation of the sampling distribution will be selected.
  • 28.
    Contd..  Consistency: astatistic is a consistent estimator if the sample size increases, it becomes almost certain that the value of statistic comes very close to the value of the population parameter;  Sufficiency: an estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population estimator being estimated;
  • 29.
    Hypothesis Testing  Hypothesistesting is based on hypothesis;  “Hypothesis” is an assumption about an unknown population parameter;  Hypothesis testing is a well defined procedure which helps in deciding objectively whether to accept or reject the hypothesis based on the information available from the sample;
  • 30.
    Hypothesis Testing Procedure STEP1: SET NULL & ALTERNATIVE HYPOTHESIS:  The assumption which we want to test is called the NULL hypothesis;  It is symbolized as Ho;  Null hypothesis is set with no difference (i.e. status quo) & considered true, unless and until it is proved by the collected sample data;  Example, Ho :µ =500 “the null hypothesis is that the population mean is equal to 500”
  • 31.
    Contd..  The Alternativehypothesis, generally referred by H1 or Ha is the logical opposite of the null hypothesis;  H1 :µ ≠500; ( Ho :µ >500; or H1 :µ <500)  In other words, when null hypothesis is found to be true, the alternative hypothesis must be false; or vice versa;  Rejection in null hypothesis indicates that the difference have statistical significance & acceptance in null hypothesis indicates that the difference are due to chance;
  • 32.
    STEP2: SET UPA SUITABLE SIGNIFICANCE  The level of significance, generally denoted by ‘α’ is the probability, which is attached to a null hypothesis, which may be rejected even when it is true;  The level of significance is also known as the size of rejection region or size of critical region;  It is generally specified before any samples are drawn, so that results obtained will not influence the direction to be taken;  Any level of significance can be adopted in practice we either take 5% or 1% level of significance;
  • 33.
    Contd..  When wetake 5% level of significance then there are about 5 chances out of 100 that we would reject the null hypothesis when it should be accepted i.e. we are about 95% confident that we have made the right decision;  When the null hypothesis is rejected at α=0.5, test result is said to be significant;  When the null hypothesis is rejected at α=0.01, test result is said to be highly significant;
    STEP 3: DETERMINATION OF A SUITABLE TEST STATISTIC  Many of the test statistics that we shall encounter have the following form:  Test statistic = (Sample statistic - Hypothesized population parameter) / (Standard error of the sample statistic)
    STEP 4: SET THE DECISION RULE  The next step for the researcher is to establish a critical region  Acceptance region: where the null hypothesis is accepted;  Rejection region: where the null hypothesis is rejected;
    STEP 5: COLLECT THE SAMPLE DATA  The data is now collected;  Appropriate sample statistics are computed;
    STEP 6: ANALYSE THE DATA  This involves selection of an appropriate probability distribution for the particular test;  For example, when the sample is small (n < 30) the normal probability distribution (Z) is not an accurate choice; the (t) distribution needs to be used in this case;  Some commonly used testing procedures are Z, t, F & chi-square
    STEP 7: ARRIVE AT A STATISTICAL CONCLUSION & BUSINESS IMPLICATION  The statistical conclusion is a decision to accept or reject the null hypothesis;  This depends on whether the computed test statistic falls in the acceptance region or the rejection region;
    Types of Errors in Hypothesis Testing

    Decision    | Condition: Ho true   | Condition: Ho false
    Accept Ho   | Correct decision     | Type II error (β)
    Reject Ho   | Type I error (α)     | Correct decision
    Z-test  Hypothesis testingfor large samples i.e. n>= 30;  Based on the assumption that the population , from which the sample is drawn, has a normal distribution;  As a result, the sampling distribution of mean is also normally distributed; Application: 1. For testing hypothesis about a single population mean; 2. Hypothesis testing for the difference between two population means; 3. Hypothesis testing for attributes.
    Formula for single population mean (infinite population)  Z = (x̄ - µ) / (σ / √n) Where, µ = population mean, x̄ = sample mean, σ = population standard deviation, n = sample size
    Q: A marketing research firm conducted a survey 10 yrs ago & found that the average household income of a particular geographic region was Rs 10,000. Mr. Gupta, who recently joined the firm as a VP, expresses doubts about this figure. For verifying the data, the firm decides to take a random sample of 200 households, which yields a sample mean of Rs 11,000. Assume that the population S.D. is Rs 1,200. Verify Mr. Gupta's doubts using α = 0.05.  Step 1: Set null & alternative hypothesis Ho: µ = 10000; H1: µ ≠ 10000  Step 2: Determine the appropriate statistical test Since the sample size >= 30, the z-test can be used for hypothesis testing  Step 3: Set the level of significance The level of significance is given (α = 0.05)  Step 4: Set the decision rule The acceptance region covers 95% of the area & the rejection region 5%; the critical values can be read from the table (± 1.96)
     Step 5: Collect the sample data A sample of 200 respondents yields a sample mean of Rs 11,000  Step 6: Analyze the data n = 200; µ = 10000; x̄ = 11000; σ = 1200  Z = (x̄ - µ) / (σ / √n) = (11000 - 10000) / (1200 / √200) = 11.79  Step 7: Arrive at a statistical conclusion & business implication The Z value is 11.79, which is greater than +1.96, hence the null hypothesis is rejected and the alternative hypothesis is accepted. Hence Mr. Gupta's doubt about the household income figure was right.
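The seven-step procedure for this example can be sketched in a few lines of Python (a minimal sketch using only the figures given above; 1.96 is the two-tailed critical value at α = 0.05):

```python
import math

# Z-test for a single population mean, reproducing the
# household-income example above.
def z_single_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n))"""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z = z_single_mean(x_bar=11000, mu=10000, sigma=1200, n=200)
print(round(z, 2))      # 11.79
print(abs(z) > 1.96)    # True -> reject Ho at alpha = 0.05
```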
    Formula for single population mean (finite population)  Z = (x̄ - µ) / ( (σ / √n) × √((N - n) / (N - 1)) ) When the population standard deviation is not known:  Z = (x̄ - µ) / (s / √n) where s = sample standard deviation
    Hypothesis testing for the difference between two population means  Z = ((x̄1 - x̄2) - (µ1 - µ2)) / √(σ1²/n1 + σ2²/n2)
    Hypothesis testing for attributes Z = (x - µ) / √(npq) Where, n = sample size, µ = np, p = probability of success, q = 1 - p = probability of failure
    Q: In 600 throws of a 6-faced dice, odd points appeared 360 times. Would you say that the dice is fair at the 5% level of significance?  Ho: the dice is fair  p = q = ½  n = 600  np = 300  x = 360 Z = (x - np) / √(npq) = (360 - 300) / √(600 × ½ × ½) = 4.9 Z is greater than 1.96 (at 5%), so Ho is rejected. Hence, the dice is not fair.
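The dice calculation above can be checked with a short Python sketch:

```python
import math

# Z-test for attributes, reproducing the dice example above.
n, p, q = 600, 0.5, 0.5          # 600 throws, P(odd) = 1/2 under Ho
x = 360                          # observed number of odd points
mu = n * p                       # expected number under Ho: np = 300
z = (x - mu) / math.sqrt(n * p * q)
print(round(z, 1))               # 4.9
print(z > 1.96)                  # True -> reject Ho: the dice is not fair
```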
    t-test  Given byW.S. Gosset in 1908 under the pen name of student’s test  t-test can be applied when: 1. When a researcher draws a small random sample (n<30) to estimate the population (µ); 2. When the population standard deviation (σ) is unknown; 3. The population is normally distributed
    Application of t-test  Hypothesis testing for a single population mean;  Hypothesis testing for the difference between two independent population means;  Hypothesis testing for the difference between two dependent population means;
    Hypothesis testing for a single population mean  t = (x̄ - µ) / (s / √n) with degrees of freedom (n - 1) Where, µ = population mean, x̄ = sample mean, s = sample standard deviation, n = sample size
    Q: Royal Tyre has launched a new brand of tyres for tractors & claims that under normal circumstances the average life of the tyres is 40,000 km. A retailer wants to test this claim & has taken a random sample of 8 tyres. He tests the life of the tyres under normal circumstances. The results obtained are:

    Tyre:  1      2      3      4      5      6      7      8
    Km:    35000  38000  42000  41000  39000  41500  43000  38500

    Use α = 0.05 for testing the hypothesis. Step 1: Set null & alternative hypothesis Null hypothesis Ho: µ = 40000; Alternative hypothesis H1: µ ≠ 40000 Step 2: Determine the appropriate statistical test The sample size is less than 30, so the t-test is the appropriate test Step 3: Set the level of significance The level of significance is α = 0.05 Step 4: Set the decision rule The t distribution value for a two-tailed test is t0.025 = 2.365 for 7 degrees of freedom, so if the computed t value falls outside the ± 2.365 range, the null hypothesis will be rejected; otherwise accepted.
     Step 5: Collect the sample data:

    Tyre:  1      2      3      4      5      6      7      8
    Km:    35000  38000  42000  41000  39000  41500  43000  38500

     Step 6: Analyze the data x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n - 1 = 7; table value t0.025,7 = 2.365  t = (x̄ - µ) / (s / √n) = (39750 - 40000) / (2618.61 / √8) = -0.27  Step 7: Arrive at a statistical conclusion & business implication The observed t value is -0.27, which falls within the acceptance region, & hence the null hypothesis Ho: µ = 40000 is accepted
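The tyre example can be reproduced with Python's standard library (the critical value 2.365 for df = 7 is the table value quoted above):

```python
import math
import statistics

# One-sample t-test reproducing the tyre-life example above.
km = [35000, 38000, 42000, 41000, 39000, 41500, 43000, 38500]
mu = 40000                            # claimed average life
n = len(km)
x_bar = statistics.mean(km)           # 39750
s = statistics.stdev(km)              # sample S.D., approx. 2618.61
t = (x_bar - mu) / (s / math.sqrt(n))
print(round(t, 2))                    # -0.27
print(abs(t) <= 2.365)                # True -> accept Ho: mu = 40000
```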
    Hypothesis testing for the difference between two independent population means  t = ((x̄1 - x̄2) - (µ1 - µ2)) / ( s_pooled × √(1/n1 + 1/n2) )  The common standard deviation can be estimated by pooling the two sample variances & computing the pooled standard deviation  s_pooled = √( (s1²(n1 - 1) + s2²(n2 - 1)) / (n1 + n2 - 2) )
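A sketch of the pooled computation in Python (the sample summaries below are hypothetical):

```python
import math

# Two-sample t-test with pooled standard deviation.
def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2, mu_diff=0):
    # s_pooled = sqrt( (s1^2 (n1-1) + s2^2 (n2-1)) / (n1 + n2 - 2) )
    sp = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    return ((x1_bar - x2_bar) - mu_diff) / (sp * math.sqrt(1/n1 + 1/n2))

# Hypothetical samples: means 52 vs 50, S.D.s 4 and 5, sizes 12 and 10
t = pooled_t(52, 50, s1=4, s2=5, n1=12, n2=10)
print(round(t, 2))    # 1.04, with n1 + n2 - 2 = 20 degrees of freedom
```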
    F-test  Is namedafter R.A. Fisher who first studied it in 1934;  This distribution is usually defined in terms of the ratio of the variances of two normally distributed populations  The quantity s1 2 / σ1 2 s2 2 / σ2 2 is distributed as F-distributed with (n1 – 1) & (n2 -1) degree of freedom
    Contd..  Where s1 2 =Σ (x1 – x1)2 (n1 – 1) s2 2 = Σ (x2 – x2)2 (n2 – 1)
    Chi-square test  Chi-square is related to categorical data (i.e. counts of frequencies from one or more variables);  Some researchers place chi-square in the category of non-parametric tests  The χ² test was developed by Karl Pearson in 1900;  the symbol χ stands for the Greek letter "chi";  χ² is a function of its degrees of freedom;
    Contd..  Being asum of square quantities X2 distribution can never be a negative value;  X2 is a continuous probability distribution with range zero to infinity;  X2 = Σ (O-E)2 E With df =(r-1)(c-1) E= row total x column total Grand total
    Decision rule  IfX2 calculated > X2 critical, reject the null hypothesis;  If X2 calculated < X2 critical, accept the null hypothesis;
    Conditions to apply the chi-square test  Data should not be in percentages or ratios; rather, they should be expressed in original units;  The sample should consist of at least 50 observations, should be drawn randomly, & individual observations in the sample should be independent of each other;