UNIT-V
Population & Sample
 Population in statistics means
the whole of the information
that comes under the purview
of statistical investigation.
 It is the totality of all the
observations of a statistical
inquiry.
 It is also known as
“UNIVERSE”
 A population may be finite or
infinite
 A part of the population
selected for study is called a
SAMPLE.
 Hence, Sample is nothing but
the selection of a group of
items from a population in
such a way that this group
represents the population.
 The number of individuals
included in a finite sample
is called the SIZE OF THE
SAMPLE.
Parameter & Statistic
 Any statistical measure
(such as mean, mode ,
S.D.) computed from
population data is known
as PARAMETER.
 Any statistical measure
computed from sample data
is known as STATISTIC.
 STATISTIC computed from a
sample drawn from the
parent population plays an
important role in
 A) The Theory of Estimation
 B)Testing of Hypothesis
Notations used

Statistical Measure     Population    Sample
Mean                    µ             x̄
Standard deviation      σ             s
Size                    N             n
Sampling & Sampling Theory
 It is the process of selecting a
sample from the population.
 Sampling can also be defined
as the process of drawing a
sample from the population &
compiling a suitable statistic
in order to estimate the
parameter drawn from the
parent population & to test
the significance of the
statistic computed from such
sample.
 Sampling theory is based on
Sampling
 It deals with statistical
inferences drawn from sampling
results, which are of three types:
i. Statistical Estimation,
ii. Tests of significance, and
iii. Statistical inference
Objects of Sampling theory
 To estimate population parameter on the
basis of sample statistic.
 To set the limits of accuracy & degree of
confidence of the estimates of the population
parameter computed on the basis of sample
statistic.
 To test significance about the population
characteristic on the basis of sample statistic.
Methods of Sampling
Random (Probability) Sampling:
 Simple Random Sampling
 Stratified Sampling
 Systematic Sampling
 Multi-stage Sampling

Non-random Sampling:
 Judgment Sampling
 Quota Sampling
 Convenience Sampling
Random Sampling Methods
Simple Random sampling
 This method refers to the sampling technique in
which each and every item of the population is given
a chance of being included in the sample;
 The selection is free from personal bias;
 This method is also known as method of chance
selection.
 It is sometimes also referred to as “representative
sampling” (if the sample is chosen at random and if
the size of the sample is sufficiently large, it’ll
represent all groups in the population)
Contd..
 It is a probability sampling because every
item of the population has an equal
opportunity of being selected in the sample;
 Methods of obtaining a Simple Random
Sample:
1. Lottery method
2. Table of random numbers ( a number of
random tables are available such as Tippets
table; Fisher and Yates numbers; Kendall and
Babington Smith numbers)
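The lottery method above can be mimicked in a few lines of Python using the standard library. This is a minimal sketch; the population of 1000 numbered items and the sample size of 10 are illustrative assumptions, not figures from the text.

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical population of 1000 numbered items
population = list(range(1, 1001))

# Simple random sample of size 10: every item has an equal
# chance of inclusion, mimicking the lottery method
sample = random.sample(population, k=10)
print(sample)
```

`random.sample` draws without replacement, so no item can appear twice in the sample.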
Stratified Sampling
 It is one of the restricted random methods which by
using available information concerning the data
attempts to design a more efficient sample than that
obtained by the simple random procedure;
 The process of stratification requires that the
population be divided into homogeneous
groups or classes called strata;
 then a sample may be taken from each group by
simple random method
 And the resulting sample is called a stratified sample
Contd..
 A stratified sample may be either proportional
or disproportionate.
 In a proportional stratified sampling plan, the
number of items drawn from each stratum is
proportional to the size of the strata.
 For example, if the population is divided into 4
strata, their respective sizes being 15%, 10%, 20%
and 55% of the population, and a sample of 1000 is
to be drawn, the desired proportional sample may
be obtained in the following manner:
Contd..
From stratum one     1000 (0.15) = 150 items
From stratum two     1000 (0.10) = 100 items
From stratum three   1000 (0.20) = 200 items
From stratum four    1000 (0.55) = 550 items
Sample size                        1000 items
Disproportionate Stratified sampling includes
procedures of taking an equal number of items
from each stratum irrespective of its size.
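The proportional allocation worked above can be sketched in Python. The stratum names are labels for the four strata in the example; the shares and total sample size are taken from the text.

```python
# Proportional stratified allocation: items drawn from each
# stratum are proportional to the stratum's share of the population
strata_shares = {"one": 0.15, "two": 0.10, "three": 0.20, "four": 0.55}
sample_size = 1000

allocation = {name: round(sample_size * share)
              for name, share in strata_shares.items()}
print(allocation)  # {'one': 150, 'two': 100, 'three': 200, 'four': 550}
```

For a disproportionate plan one would instead assign `sample_size // len(strata_shares)` items to every stratum regardless of its size.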
Systematic Sampling
 This method is popularly used in such cases
where a complete list of the population from
which sampling is to be drawn is available;
 The method is to select every kth item from
the list where ‘k’ refers to the sampling
interval;
 k = size of population / sample size (N/n);
 The starting point between the first & the kth
is selected at random
Contd..
 For example, if a complete list of 1000
students is available and we want to draw a
sample of 200 students; this means we must
take every 5th item.
 But the first item between one and five shall
be selected at random.
 Let it be three, now we shall go on adding 5 &
obtain numbers of desired sample.
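The 1000-student example can be sketched as follows. Because the starting point is chosen at random between 1 and k, the exact items drawn will vary from run to run; only the interval and sample size are fixed.

```python
import random

random.seed(3)
N, n = 1000, 200               # population and sample size from the example
k = N // n                     # sampling interval, k = 5
start = random.randint(1, k)   # random start between 1 and k
sample = list(range(start, N + 1, k))   # every kth item from the start
print(sample[:3], "... total", len(sample), "items")
```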
Cluster Sampling
 It is different from stratified sampling in that
each stratum consists of homogeneous items,
whereas the groups (clusters) are mutually
exclusive and not exactly homogeneous;
 Multi- stage sampling is a type of cluster
sampling;
Multi-stage Sampling
 As the name suggests this method refers to a
sampling procedure which is carried out in several
stages;
 The material is regarded as made up of a number of
first stage sampling units, each made up of a
number of second stage units;
 At first the first stage units are sampled by some
suitable method such as random sampling, then, a
sample of second stage is selected from each of the
selected first stage units again by some suitable
method which may be the same or different from
the method employed for the first stage units.
Non-Random Sampling Methods
Judgment Sampling
 In this method of sampling the choice of
sample items depends exclusively on the
judgment of the investigator;
 This method, though simple, is not scientific;
 This method is used in solving many types of
economic & business problems such as
i. When sample size is small;
ii. With the help of Judgment sampling,
estimation can be made available quickly;
Quota Sampling
 It is a type of judgment sampling;
 In a quota sample, quotas are set up
according to given criteria but within quotas
the selection of sample items depends on
personal judgment.
Convenience Sampling
 It is also known as the Chunk;
 A Chunk is a fraction of one population taken
for investigation because of its convenient
availability;
 Hence chunk is selected neither by
probability nor by judgment but by
convenience;
 Convenience samples are sometimes called
accidental samples because those entering
into the sample enter by ‘accident’;
Errors in Sampling
Discrepancies between the statistical measure of the population
(Parameter) & that of the sample drawn from the same population (Statistic).

Sampling Errors:
 These are of two types:
a. Biased errors arise due to any bias
in selection, estimation, etc.
b. Unbiased errors arise due to
chance factors
 They occur primarily due to the
following reasons:
1. Faulty selection of the
sample
2. Substitution

Non-Sampling Errors:
 May arise in the following
ways:
1. Due to negligence &
carelessness on the part of the
investigator;
2. Due to incomplete
investigation & sample
survey;
3. Due to negligence & non-
response on the part of the
respondents;
4. Errors in data processing.
Principles of Sampling
 Principle of “Statistical Regularity”: This
principle lays down that a moderately large
number of items chosen at random from a
large group are almost sure on an average to
possess the characteristics of the large group.
 Principle of “Inertia of Large Numbers”: this
principle is a corollary of the above principle.
It states that, other things being equal, the larger
the size of the sample, the more accurate the
results are likely to be.
Theory of Estimation
 Statistical estimation is the procedure of
using a sample statistic to estimate a
population parameter.
 A statistic used to estimate a parameter is
called an estimator, and
 The value taken by the estimator is called an
estimate.
 For example, the sample mean is an estimator of
the population mean, and a particular value of it
(say 7.65) is an estimate.
Statistical estimation is divided
into two major categories:
Point Estimation Interval Estimation
 In point estimation, a
single statistic is used to
provide an estimate of the
population parameter;
 Change in sample will
cause deviation in
estimate;
 An interval estimate is a
range of values within
which a researcher can say
with some confidence that
the population parameter
falls;
 This range is called
confidence interval;
Qualities of a good
estimator:
 A good estimator is one which is as close to the
true value of the parameter as possible.
 A good estimator must possess the following
characteristics:
i. Unbiasedness
ii. Consistency
iii. Efficiency and
iv. Sufficiency
Contd..
 Unbiasedness: this is a desirable property for a good
estimator to have; “unbiasedness” refers to the fact
that the sample mean is an unbiased estimator of the
population mean, because the mean of the sampling
distribution of sample means taken from the same
population is equal to the population mean itself;
 Efficiency: it refers to the size of the standard error
of the statistic; if two statistics computed from
samples of the same size are compared to decide
which is the better estimator, the statistic that has
the smaller standard error (standard deviation of the
sampling distribution) is selected.
Contd..
 Consistency: a statistic is a consistent estimator
if, as the sample size increases, it becomes almost
certain that the value of the statistic comes very
close to the value of the population parameter;
 Sufficiency: an estimator is sufficient if it makes
so much use of the information in the sample
that no other estimator could extract from the
sample additional information about the
population parameter being estimated;
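Unbiasedness and consistency of the sample mean can be illustrated by simulation. This is a minimal sketch; the population here is synthetic (10,000 draws from a normal distribution with mean 50 and S.D. 10), an assumption made only for the demonstration.

```python
import random
import statistics

random.seed(0)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # the population mean µ

# Unbiasedness: the average of many sample means is close to µ
sample_means = [statistics.mean(random.sample(population, 30))
                for _ in range(2_000)]
print(round(statistics.mean(sample_means), 2), "vs µ =", round(mu, 2))

# Consistency: a single large sample tends to land close to µ
big_sample_mean = statistics.mean(random.sample(population, 5_000))
print(round(big_sample_mean, 2))
```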
Hypothesis Testing
 Hypothesis testing is based on hypothesis;
 “Hypothesis” is an assumption about an
unknown population parameter;
 Hypothesis testing is a well defined
procedure which helps in deciding objectively
whether to accept or reject the hypothesis
based on the information available from the
sample;
Hypothesis Testing Procedure
STEP 1: SET NULL & ALTERNATIVE HYPOTHESIS:
 The assumption which we want to test is called
the NULL hypothesis;
 It is symbolized as Ho;
 Null hypothesis is set with no difference (i.e.
status quo) & considered true, unless and until it
is proved by the collected sample data;
 Example, Ho :µ =500
“the null hypothesis is that the population mean is equal to 500”
Contd..
 The Alternative hypothesis, generally referred by
H1 or Ha is the logical opposite of the null
hypothesis;
 H1: µ ≠ 500 (or one-sided: H1: µ > 500, or H1: µ < 500)
 In other words, when null hypothesis is found to
be true, the alternative hypothesis must be false;
or vice versa;
 Rejection of the null hypothesis indicates that the
differences have statistical significance, while
acceptance of the null hypothesis indicates that
the differences are due to chance;
STEP2: SET UP A SUITABLE LEVEL
OF SIGNIFICANCE
 The level of significance, generally denoted by ‘α’,
is the probability of rejecting the null hypothesis
even when it is true;
 The level of significance is also known as the size
of rejection region or size of critical region;
 It is generally specified before any samples are
drawn, so that results obtained will not influence
the direction to be taken;
 Any level of significance can be adopted; in
practice we take either the 5% or the 1% level of
significance;
Contd..
 When we take 5% level of significance then there
are about 5 chances out of 100 that we would
reject the null hypothesis when it should be
accepted i.e. we are about 95% confident that
we have made the right decision;
 When the null hypothesis is rejected at α=0.05,
the test result is said to be significant;
 When the null hypothesis is rejected at α=0.01,
test result is said to be highly significant;
STEP3: DETERMINATION OF A
SUITABLE TEST STATISTIC
 Many of the test statistic that we shall
encounter will have the following form:
 Test statistic = (sample statistic − hypothesized population parameter)
                   / (standard error of the sample statistic)
STEP4 : SET THE DECISION
RULE
 The next step for the researcher is to
establish a critical region
 Acceptance region : when null hypothesis is
accepted;
 Rejection region ; when null hypothesis is
rejected;
STEP5: COLLECT THE SAMPLE
DATA
 Data is now collected;
 Appropriate sample statistic are computed;
STEP6: ANALYSE THE DATA
 This involves selection of an appropriate
probability distribution for a particular test;
 For example, when the sample is small (n<30)
the normal probability distribution (Z) is not an
accurate choice; the (t) distribution needs to be
used in this case;
 Some commonly used testing procedures are
Z, t, F & Chi square
STEP7: ARRIVE AT A STATISTICAL
CONCLUSION & BUSINESS IMPLICATION
 Statistical conclusion is a decision to accept
or reject a null hypothesis;
 This depends on whether the computed test
statistic falls in acceptance region or rejection
region;
Types of Errors in Hypothesis
Testing
                    Condition
Decision       Ho: true              Ho: false
Accept Ho      Correct Decision      Type II error (β)
Reject Ho      Type I error (α)      Correct Decision
Z-test
 Hypothesis testing for large samples i.e. n>= 30;
 Based on the assumption that the population from
which the sample is drawn has a normal
distribution;
 As a result, the sampling distribution of mean is also
normally distributed;
Application:
1. For testing hypothesis about a single population
mean;
2. Hypothesis testing for the difference between two
population means;
3. Hypothesis testing for attributes.
Formula for single population
mean (infinite population)
 Z = (x̄ − µ) / (σ / √n)
Where,
µ = population mean
x̄ = sample mean
σ = population standard deviation
n = sample size
Q: A marketing research firm conducted a survey 10 years ago & found that
the average household income of a particular geographic area is Rs 10000.
Mr. Gupta, who recently joined the firm as a VP, expresses doubts. To
verify the data, the firm decides to take a random sample of 200
households, which yields a sample mean of Rs 11000. Assume that the
population S.D. is Rs 1200. Verify Mr. Gupta’s doubts using α=0.05.
 Step 1: set null & alternative hypothesis
Ho: µ=10000
H1: µ≠10000
 Step2: Determine the appropriate statistical test
Since sample size >=30, so z-test can be used for hypothesis testing
 Step3: set the level of significance
The level of significance is known (α=0.05)
 Step4: Set the decision rule
Acceptance region covers 95% of the area & rejection region 5%
Critical values can be obtained from the table (± 1.96)
 Step 5: Collect the sample data
A sample of 200 respondents yields a sample mean of Rs 11000
 Step 6: Analyze the data
n = 200; µ = 10000; x̄ = 11000; σ = 1200
 Z = (x̄ − µ) / (σ / √n) = (11000 − 10000) / (1200 / √200) = 11.79
 Step 7: Arrive at a statistical conclusion & business
implication
Z value is 11.79 which is greater than +1.96, hence null
hypothesis is rejected and alternative hypothesis is
accepted. Hence Mr. Gupta’s doubt about household
income was right.
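The computation in Step 6 can be checked with a few lines of Python, using only the figures given in the example:

```python
import math

# Figures from the household-income example
x_bar, mu, sigma, n = 11_000, 10_000, 1_200, 200

# Z for a single population mean, sigma known: (x̄ - µ) / (σ/√n)
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(round(z, 2))  # 11.79, far outside ±1.96, so reject Ho
```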
Formula for single population
mean (finite population)
 Z = (x̄ − µ) / ((σ / √n) × √((N − n) / (N − 1)))
When the population standard deviation is
not known:
 Z = (x̄ − µ) / (s / √n)
where s = sample standard deviation
Hypothesis testing for the difference
between two population means
 Z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(σ1²/n1 + σ2²/n2)
Hypothesis for attributes
 Z = (x − µ) / √(npq)
Where,
n = sample size
µ = np
p = probability of happening
q = 1 − p, the chance of not happening
Q: In 600 throws of a six-faced die, odd points appeared 360 times.
Would you say that the die is fair at the 5% level of significance?
 Ho: the die is fair
 p = q = ½
 n = 600
 np = 300
 x = 360
Z = (x − np) / √(npq) = (360 − 300) / √(600 × ½ × ½) = 4.9
Z is greater than 1.96 (at 5%),
so Ho is rejected.
Hence, the die is not fair.
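The dice calculation can be verified directly from the figures in the question:

```python
import math

n, p, q = 600, 0.5, 0.5     # throws; probability of odd / even points
x = 360                     # observed number of odd-point throws
np_ = n * p                 # expected number under Ho: µ = np = 300

# Z for attributes: (x - np) / √(npq)
z = (x - np_) / math.sqrt(n * p * q)
print(round(z, 1))  # 4.9 > 1.96, so Ho is rejected: the die is not fair
```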
t-test
 Given by W.S. Gosset in 1908 under the pen
name “Student”; hence it is known as Student’s t-test
 The t-test can be applied when:
1. A researcher draws a small random
sample (n<30) to estimate the population mean (µ);
2. The population standard deviation (σ)
is unknown;
3. The population is normally distributed
Application of t-test
 Hypothesis testing for single population
mean;
 Hypothesis testing for the difference
between two independent population means;
 Hypothesis testing for the difference
between two dependent population means;
Hypothesis testing for single
population mean
 t = (x̄ − µ) / (s / √n)
with degrees of freedom (n − 1)
Where,
µ = population mean
x̄ = sample mean
s = sample standard deviation
n = sample size
Q: Royal Tyre has launched a new brand of tyres for tractors & claims that
under normal circumstances the average life of the tyres is 40000 km.
A retailer wants to test this claim & has taken a random sample of 8
tyres. He tests the life of the tyres under normal circumstances. The
results obtained are:
Tyres 1 2 3 4 5 6 7 8
Km 35 000 38 000 42 000 41 000 39 000 41 500 43 000 38 500
Use α = 0.05 for testing the hypothesis
Step1: Set null & alternative hypothesis
Null hypothesis: Ho: µ = 40 000
Alternative hypothesis: H1: µ ≠ 40 000
Step2:Determine the appropriate statistical test
The sample size is less than 30, so t test will be an appropriate test
Step3:Set the level of significance
The level of significance, i.e. α = 0.05
Step4: Set the decision rule
The t distribution value for a two-tailed test is t0.025 = 2.365 for degrees of freedom 7.
so if the computed t value is outside the ± 2.365 range, the null hypothesis will be rejected;
otherwise it will be accepted.
 Step 5: Collect the sample data:
 Step 6: Analyze the data
x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n − 1 = 7
Table value of t0.025,7 = 2.365
 t = (x̄ − µ) / (s / √n) = (39750 − 40000) / (2618.61 / √8) = −0.27
 Step 7: Arrive at a statistical conclusion &
Business implication
The observed t value is -0.27 which falls within the
acceptance region & hence null hypothesis is
accepted i.e. Ho: µ = 40 000
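The sample mean, sample standard deviation, and t value from the tyre example can be reproduced with the standard library:

```python
import math
import statistics

# Tyre lives (km) from the sample of 8 tyres
km = [35_000, 38_000, 42_000, 41_000, 39_000, 41_500, 43_000, 38_500]
mu = 40_000                      # claimed average life under Ho

x_bar = statistics.mean(km)      # sample mean, 39750
s = statistics.stdev(km)         # sample standard deviation, ≈ 2618.61

# t for a single population mean: (x̄ - µ) / (s/√n)
t = (x_bar - mu) / (s / math.sqrt(len(km)))
print(round(t, 2))  # -0.27, inside ±2.365, so Ho is accepted
```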
Hypothesis testing for the difference
between two independent population means
 t = ((x̄1 − x̄2) − (µ1 − µ2)) / (s_pooled × √(1/n1 + 1/n2))
 σ can be estimated by pooling the two sample
variances & computing the pooled standard
deviation:
 s_pooled = √[ (s1²(n1 − 1) + s2²(n2 − 1)) / (n1 + n2 − 2) ]
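The pooled standard deviation can be computed as a small helper. The sample figures below (s1 = 4, n1 = 10, s2 = 6, n2 = 12) are hypothetical, chosen only to exercise the formula:

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation of two independent samples:
    sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2))."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

print(round(pooled_sd(4, 10, 6, 12), 3))
```

When both samples have the same standard deviation, the pooled value equals it, which is a quick sanity check on the formula.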
F-test
 Is named after R.A. Fisher who first studied it in
1934;
 This distribution is usually defined in terms of the
ratio of the variances of two normally distributed
populations
 The quantity
 F = (s1² / σ1²) / (s2² / σ2²)
is distributed as F with (n1 − 1) & (n2 − 1) degrees
of freedom
Contd..
 Where
s1² = Σ(x1 − x̄1)² / (n1 − 1)
s2² = Σ(x2 − x̄2)² / (n2 − 1)
Chi Square test
 Chi square is related to categorical data (as
counting of frequencies from one or more
variables);
 Some researchers place chi-square in the
category of Non-parametric tests
 The χ² test was developed by Karl Pearson in
1900;
 the symbol χ stands for the Greek letter
“chi”;
 χ² is a function of its degrees of freedom;
Contd..
 Being a sum of squared quantities, the χ²
distribution can never take a negative value;
 χ² is a continuous probability distribution
with range zero to infinity;
 χ² = Σ (O − E)² / E
with df = (r − 1)(c − 1)
where E = (row total × column total) / grand total
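The formula can be sketched for a hypothetical 2×2 contingency table (the observed counts below are illustrative, not from the text), computing each expected frequency from the row, column and grand totals:

```python
# Hypothetical 2x2 contingency table of observed frequencies
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]        # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 50]
grand = sum(row_totals)                            # 100

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand  # expected = (row x col) / grand
        chi_sq += (o - e) ** 2 / e                 # accumulate (O-E)^2 / E

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r-1)(c-1)
print(chi_sq, "with df =", df)  # 4.0 with df = 1
```

The computed 4.0 would then be compared against the critical χ² value for 1 degree of freedom at the chosen level of significance.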
Decision rule
 If calculated χ² > critical χ², reject the null
hypothesis;
 If calculated χ² < critical χ², accept the null
hypothesis;
Conditions to apply chi- square
test
 Data should not be in percentages or ratios;
rather, they should be expressed in original units;
 The sample should consist of at least 50
observations, should be drawn randomly, &
individual observations in the sample should be
independent of each other;

Sampling distribution concepts

  • 1.
  • 2.
    Population & Sample PopulationSample  Population in statistics means the whole of the information that comes under the purview of statistical investigation.  It is the totality of all the observations of a statistical inquiry.  It is also known as “UNIVERSE”  A population may be finite or infinite  A part of the population selected for study is called a SAMPLE.  Hence, Sample is nothing but the selection of a group of items from a population in such a way that this group represents the population.  The number of individuals included in the finite sample is called the SIZE OFTHE SAMPLE.
  • 3.
    Parameter & Statistic ParameterStatistic  Any statistical measure (such as mean, mode , S.D.) computed from population data is known as PARAMETER.  Any statistical measure computed from sample data is known as STATISTIC.  STATISTIC computed from a sample drawn from the parent population plays an important role in  A)TheTheory of Estimation  B)Testing of Hypothesis
  • 4.
    Notations used Notations Statistical MeasurePopulation Sample Mean µ X Standard deviation σ S Size N n
  • 5.
    Sampling & SamplingTheory Sampling Sampling theory  It is the process of selecting a sample from the population.  Sampling can also be defined as the process of drawing a sample from the population & compiling a suitable statistic in order to estimate the parameter drawn from the parent population & to test the significance of the statistic computed from such sample.  Sampling theory is based on Sampling  It deals with statistical inferences drawn from sampling results, which are of three types: i. Statistical Estimation, ii. Tests of significance, and iii. Statistical inference
  • 6.
    Objects of Samplingtheory  To estimate population parameter on the basis of sample statistic.  To set the limits of accuracy & degree of confidence of the estimates of the population parameter computed on the basis of sample statistic.  To test significance about the population characteristic on the basis of sample statistic.
  • 7.
    Methods of Sampling Random(Probability) Sampling Non-random Sampling  Simple Random sampling  Stratified Sampling  Systematic Sampling  Multi-stage Sampling  Judgment Sampling  Quota Sampling  Convenience Sampling
  • 8.
  • 9.
    Simple Random sampling This method refers to the sampling technique in which each and every item of the population is given a chance of being included in the sample;  The selection is free from personal bias;  This method is also known as method of chance selection.  It is sometimes also referred to as “representative sampling” (if the sample is chosen at random and if the size of the sample is sufficiently large, it’ll represent all groups in the population)
  • 10.
    Contd..  It isa probability sampling because every item of the population has an equal opportunity of being selected in the sample;  Methods of obtaining a Simple Random Sample: 1. Lottery method 2. Table of random numbers ( a number of random tables are available such as Tippets table; Fisher and Yates numbers; Kendall and Babington Smith numbers)
  • 11.
    Stratified Sampling  Itis one of the restricted random methods which by using available information concerning the data attempts to design a more efficient sample than that obtained by the simple random procedure;  The process of stratification requires that the populationmay be divided into homogeneous groups or classes called strata  then a sample may be taken from each group by simple random method  And the resulting sample is called a stratified sample
  • 12.
    Contd..  A stratifiedsample may be either proportional or disproportionate.  In a proportional stratified sampling plan, the number of items drawn from each stratum is proportional to the size of the strata.  For example, if the population is divided into 4 strata, their respective sizes being 15, 10,20 ,55 % of the population and a sample of 1000 is to be drawn, the desired proportional sample may be obtained in the following manner:
  • 13.
    Contd.. From stratum one1000 (0.15) 150 items From stratum two 1000 (0.10) 100 From stratum three 1000 (0.20) 200 From stratum four 1000 (0.55) 550 Sample Size 1000 Disproportionate Stratified sampling includes procedures of taking an equal number of items from each stratum irrespective of its size.
  • 14.
    Systematic Sampling  Thismethod is popularly used in such cases where a complete list of the population from which sampling is to be drawn is available;  The method is to select every kth item from the list where ‘k’ refers to the sampling interval;  k = size of population / sample size (N/n);  The starting point between the first & the kth is selected at random
  • 15.
    Contd..  For example,if a complete list of 1000 students is available and we want to draw a sample of 200 students; this means we must take every 5th item.  But the first item between one and five shall be selected at random.  Let it be three, now we shall go on adding 5 & obtain numbers of desired sample.
  • 16.
    Cluster Sampling  Itis different from stratified sampling in a way that each strata consists of homogeneous items but the groups in clusters are mutually exclusive and not exactly homogeneous;  Multi- stage sampling is a type of cluster sampling;
  • 17.
    Multi-stage Sampling  Asthe name suggests this method refers to a sampling procedure which is carried out in several stages;  The material is regarded as made up of a number of first stage sampling units, each made up of a number of second stage units;  At first the first stage units are sampled by some suitable method such as random sampling, then, a sample of second stage is selected from each of the selected first stage units again by some suitable method which may be the same or different from the method employed for the first stage units.
  • 18.
  • 19.
    Judgment Sampling  Inthis method of sampling the choice of sample items depends exclusively on the judgment of the of the investigator;  This method, though simple , is not scientific;  This method is used in solving many types of economic & business problems such as i. When sample size is small; ii. With the help of Judgment sampling, estimation can be made available quickly;
  • 20.
    Quota Sampling  Itis a type of judgment sampling;  In a quota sample, quotas are set up according to given criteria but within quotas the selection of sample items depends on personal judgment.
  • 21.
    Convenience Sampling  Itis also known as the Chunk;  A Chunk is a fraction of one population taken for investigation because of its convenient availability;  Hence chunk is selected neither by probability nor by judgment but by convenience;  Convenience samples are sometimes called accidental samples because those entering into the sample enter by ‘accident’;
  • 22.
    Errors in Sampling:Discrepancies in Statistical measure of population (Parameter) & of the sample drawn from the same population (Statistic). Sampling Errors Non Sampling Errors  These are of two types a. Biased arise due to any bias in selection , estimation, tec b. Unbiased errors arise due to chance factors  Occurs primarily due to the following reasons: 1. Faulty selection of the sample 2. Substitution  May arise in the following ways: 1. Due to negligence & carelessness on the part of investigator; 2. Due to incomplete investigation & sample survey; 3. Due to negligence & non response on the part of the respondents; 4. Errors in data processing.
  • 23.
    Principles of Sampling Principle of “Statistical Regularity”: This principle lays down that a moderately large number of items chosen at random from a large group are almost sure on an average to possess the characteristics of the large group.  Principle of “Inertia of Large Numbers”: this is principle is corollary of the above principle. It states that, other things being equal, larger the size of sample, more accurate the results are likely to be.
  • 24.
    Theory of Estimation Statistical estimation is the procedure of using a sample statistic to estimate a population parameter.  A Statistic is used to estimate a parameter is called an estimator, and  The value taken by the estimator is called an estimate.  for example, the sample mean(say 7.65) is an estimator of the population mean.
  • 25.
    Statistical estimation isdivided into two major categories: Point Estimation Interval Estimation  In point estimation, a single statistic is used to provide an estimate of the population parameter;  Change in sample will cause deviation in estimate;  An interval estimate is a range of values within which a researcher can say with some confidence that the population parameter falls;  This range is called confidence interval;
  • 26.
    Qualities of agood estimator:  A good estimator is one which is close to the true value of the parameter as possible.  A good estimator must possess the following characteristics: i. Unbiasedness ii. Consistency iii. Efficiency and iv. Sufficiency
  • 27.
    Contd..  Unbiasedness: thisis a desirable property for a good estimator to have; “unbiasedness” refers to the fact that a sample mean is an unbiased estimator of a population mean because the mean of the sampling distribution of a sample means taken from the same population is equal to the population mean itself;  Efficiency: it refers to the size of the standard error of the statistic; if two statistic are compared from a sample of the same size & try to decide which is a good estimator; the statistic that has a smaller standard error or standard deviation of the sampling distribution will be selected.
  • 28.
    Contd..  Consistency: astatistic is a consistent estimator if the sample size increases, it becomes almost certain that the value of statistic comes very close to the value of the population parameter;  Sufficiency: an estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population estimator being estimated;
  • 29.
    Hypothesis Testing  Hypothesistesting is based on hypothesis;  “Hypothesis” is an assumption about an unknown population parameter;  Hypothesis testing is a well defined procedure which helps in deciding objectively whether to accept or reject the hypothesis based on the information available from the sample;
  • 30.
    Hypothesis Testing Procedure STEP1: SET NULL & ALTERNATIVE HYPOTHESIS:  The assumption which we want to test is called the NULL hypothesis;  It is symbolized as Ho;  Null hypothesis is set with no difference (i.e. status quo) & considered true, unless and until it is proved by the collected sample data;  Example, Ho :µ =500 “the null hypothesis is that the population mean is equal to 500”
  • 31.
    Contd..  The Alternativehypothesis, generally referred by H1 or Ha is the logical opposite of the null hypothesis;  H1 :µ ≠500; ( Ho :µ >500; or H1 :µ <500)  In other words, when null hypothesis is found to be true, the alternative hypothesis must be false; or vice versa;  Rejection in null hypothesis indicates that the difference have statistical significance & acceptance in null hypothesis indicates that the difference are due to chance;
  • 32.
    STEP2: SET UPA SUITABLE SIGNIFICANCE  The level of significance, generally denoted by ‘α’ is the probability, which is attached to a null hypothesis, which may be rejected even when it is true;  The level of significance is also known as the size of rejection region or size of critical region;  It is generally specified before any samples are drawn, so that results obtained will not influence the direction to be taken;  Any level of significance can be adopted in practice we either take 5% or 1% level of significance;
  • 33.
    Contd..  When wetake 5% level of significance then there are about 5 chances out of 100 that we would reject the null hypothesis when it should be accepted i.e. we are about 95% confident that we have made the right decision;  When the null hypothesis is rejected at α=0.5, test result is said to be significant;  When the null hypothesis is rejected at α=0.01, test result is said to be highly significant;
    STEP 3: DETERMINATION OF A SUITABLE TEST STATISTIC  Many of the test statistics that we shall encounter have the following form:  Test statistic = (Sample statistic - Hypothesized population parameter) / (Standard error of the sample statistic)
    STEP 4: SET THE DECISION RULE  The next step for the researcher is to establish a critical region  Acceptance region: where the null hypothesis is accepted;  Rejection region: where the null hypothesis is rejected;
    STEP 5: COLLECT THE SAMPLE DATA  The data is now collected;  Appropriate sample statistics are computed;
    STEP 6: ANALYSE THE DATA  This involves selection of an appropriate probability distribution for the particular test;  For example, when the sample is small (n < 30) the normal probability distribution (Z) is not an accurate choice; the (t) distribution needs to be used in this case;  Some commonly used testing procedures are Z, t, F & chi-square
    STEP 7: ARRIVE AT A STATISTICAL CONCLUSION & BUSINESS IMPLICATION  The statistical conclusion is a decision to accept or reject the null hypothesis;  This depends on whether the computed test statistic falls in the acceptance region or the rejection region;
    Types of Errors in Hypothesis Testing

    Decision    | Condition: Ho true   | Condition: Ho false
    Accept Ho   | Correct decision     | Type II error (β)
    Reject Ho   | Type I error (α)     | Correct decision
    Z-test  Hypothesis testingfor large samples i.e. n>= 30;  Based on the assumption that the population , from which the sample is drawn, has a normal distribution;  As a result, the sampling distribution of mean is also normally distributed; Application: 1. For testing hypothesis about a single population mean; 2. Hypothesis testing for the difference between two population means; 3. Hypothesis testing for attributes.
    Formula for single population mean (infinite population)  Z = (x̄ - µ) / (σ / √n) Where, µ = population mean, x̄ = sample mean, σ = population standard deviation, n = sample size
    Q: A marketing research firm conducted a survey 10 yrs ago & found that the average household income of a particular geographic region was Rs 10,000. Mr. Gupta, who recently joined the firm as a VP, expresses doubts about this figure. For verifying the data, the firm decides to take a random sample of 200 households, which yields a sample mean of Rs 11,000. Assume that the population S.D. is Rs 1,200. Verify Mr. Gupta's doubts using α = 0.05.  Step 1: Set null & alternative hypothesis Ho: µ = 10000; H1: µ ≠ 10000  Step 2: Determine the appropriate statistical test Since the sample size >= 30, the z-test can be used for hypothesis testing  Step 3: Set the level of significance The level of significance is given (α = 0.05)  Step 4: Set the decision rule The acceptance region covers 95% of the area & the rejection region 5%; the critical values can be read from the table (± 1.96)
     Step 5: Collect the sample data A sample of 200 respondents yields a sample mean of Rs 11,000  Step 6: Analyze the data n = 200; µ = 10000; x̄ = 11000; σ = 1200  Z = (x̄ - µ) / (σ / √n) = (11000 - 10000) / (1200 / √200) = 11.79  Step 7: Arrive at a statistical conclusion & business implication The Z value is 11.79, which is greater than +1.96, hence the null hypothesis is rejected and the alternative hypothesis is accepted. Hence Mr. Gupta's doubt about the household income figure was right.
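The seven-step procedure for this example can be sketched in a few lines of Python (a minimal sketch using only the figures given above; 1.96 is the two-tailed critical value at α = 0.05):

```python
import math

# Z-test for a single population mean, reproducing the
# household-income example above.
def z_single_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n))"""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z = z_single_mean(x_bar=11000, mu=10000, sigma=1200, n=200)
print(round(z, 2))      # 11.79
print(abs(z) > 1.96)    # True -> reject Ho at alpha = 0.05
```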
    Formula for single population mean (finite population)  Z = (x̄ - µ) / ( (σ / √n) × √((N - n) / (N - 1)) ) When the population standard deviation is not known:  Z = (x̄ - µ) / (s / √n) where s = sample standard deviation
    Hypothesis testing for the difference between two population means  Z = ((x̄1 - x̄2) - (µ1 - µ2)) / √(σ1²/n1 + σ2²/n2)
    Hypothesis testing for attributes Z = (x - µ) / √(npq) Where, n = sample size, µ = np, p = probability of success, q = 1 - p = probability of failure
    Q: In 600 throws of a 6-faced dice, odd points appeared 360 times. Would you say that the dice is fair at the 5% level of significance?  Ho: the dice is fair  p = q = ½  n = 600  np = 300  x = 360 Z = (x - np) / √(npq) = (360 - 300) / √(600 × ½ × ½) = 4.9 Z is greater than 1.96 (at 5%), so Ho is rejected. Hence, the dice is not fair.
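The dice calculation above can be checked with a short Python sketch:

```python
import math

# Z-test for attributes, reproducing the dice example above.
n, p, q = 600, 0.5, 0.5          # 600 throws, P(odd) = 1/2 under Ho
x = 360                          # observed number of odd points
mu = n * p                       # expected number under Ho: np = 300
z = (x - mu) / math.sqrt(n * p * q)
print(round(z, 1))               # 4.9
print(z > 1.96)                  # True -> reject Ho: the dice is not fair
```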
    t-test  Given byW.S. Gosset in 1908 under the pen name of student’s test  t-test can be applied when: 1. When a researcher draws a small random sample (n<30) to estimate the population (µ); 2. When the population standard deviation (σ) is unknown; 3. The population is normally distributed
    Application of t-test  Hypothesis testing for a single population mean;  Hypothesis testing for the difference between two independent population means;  Hypothesis testing for the difference between two dependent population means;
    Hypothesis testing for a single population mean  t = (x̄ - µ) / (s / √n) with degrees of freedom (n - 1) Where, µ = population mean, x̄ = sample mean, s = sample standard deviation, n = sample size
    Q: Royal Tyre has launched a new brand of tyres for tractors & claims that under normal circumstances the average life of the tyres is 40,000 km. A retailer wants to test this claim & has taken a random sample of 8 tyres. He tests the life of the tyres under normal circumstances. The results obtained are:

    Tyre:  1      2      3      4      5      6      7      8
    Km:    35000  38000  42000  41000  39000  41500  43000  38500

    Use α = 0.05 for testing the hypothesis. Step 1: Set null & alternative hypothesis Null hypothesis Ho: µ = 40000; Alternative hypothesis H1: µ ≠ 40000 Step 2: Determine the appropriate statistical test The sample size is less than 30, so the t-test is the appropriate test Step 3: Set the level of significance The level of significance is α = 0.05 Step 4: Set the decision rule The t distribution value for a two-tailed test is t0.025 = 2.365 for 7 degrees of freedom, so if the computed t value falls outside the ± 2.365 range, the null hypothesis will be rejected; otherwise accepted.
     Step 5: Collect the sample data:

    Tyre:  1      2      3      4      5      6      7      8
    Km:    35000  38000  42000  41000  39000  41500  43000  38500

     Step 6: Analyze the data x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n - 1 = 7; table value t0.025,7 = 2.365  t = (x̄ - µ) / (s / √n) = (39750 - 40000) / (2618.61 / √8) = -0.27  Step 7: Arrive at a statistical conclusion & business implication The observed t value is -0.27, which falls within the acceptance region, & hence the null hypothesis Ho: µ = 40000 is accepted
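The tyre example can be reproduced with Python's standard library (the critical value 2.365 for df = 7 is the table value quoted above):

```python
import math
import statistics

# One-sample t-test reproducing the tyre-life example above.
km = [35000, 38000, 42000, 41000, 39000, 41500, 43000, 38500]
mu = 40000                            # claimed average life
n = len(km)
x_bar = statistics.mean(km)           # 39750
s = statistics.stdev(km)              # sample S.D., approx. 2618.61
t = (x_bar - mu) / (s / math.sqrt(n))
print(round(t, 2))                    # -0.27
print(abs(t) <= 2.365)                # True -> accept Ho: mu = 40000
```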
    Hypothesis testing for the difference between two independent population means  t = ((x̄1 - x̄2) - (µ1 - µ2)) / ( s_pooled × √(1/n1 + 1/n2) )  The common standard deviation can be estimated by pooling the two sample variances & computing the pooled standard deviation  s_pooled = √( (s1²(n1 - 1) + s2²(n2 - 1)) / (n1 + n2 - 2) )
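A sketch of the pooled computation in Python (the sample summaries below are hypothetical):

```python
import math

# Two-sample t-test with pooled standard deviation.
def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2, mu_diff=0):
    # s_pooled = sqrt( (s1^2 (n1-1) + s2^2 (n2-1)) / (n1 + n2 - 2) )
    sp = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))
    return ((x1_bar - x2_bar) - mu_diff) / (sp * math.sqrt(1/n1 + 1/n2))

# Hypothetical samples: means 52 vs 50, S.D.s 4 and 5, sizes 12 and 10
t = pooled_t(52, 50, s1=4, s2=5, n1=12, n2=10)
print(round(t, 2))    # 1.04, with n1 + n2 - 2 = 20 degrees of freedom
```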
    F-test  Is namedafter R.A. Fisher who first studied it in 1934;  This distribution is usually defined in terms of the ratio of the variances of two normally distributed populations  The quantity s1 2 / σ1 2 s2 2 / σ2 2 is distributed as F-distributed with (n1 – 1) & (n2 -1) degree of freedom
    Contd..  Where s1 2 =Σ (x1 – x1)2 (n1 – 1) s2 2 = Σ (x2 – x2)2 (n2 – 1)
    Chi-square test  Chi-square is related to categorical data (i.e. counts of frequencies from one or more variables);  Some researchers place chi-square in the category of non-parametric tests  The χ² test was developed by Karl Pearson in 1900;  the symbol χ stands for the Greek letter "chi";  χ² is a function of its degrees of freedom;
    Contd..  Being asum of square quantities X2 distribution can never be a negative value;  X2 is a continuous probability distribution with range zero to infinity;  X2 = Σ (O-E)2 E With df =(r-1)(c-1) E= row total x column total Grand total
    Decision rule  IfX2 calculated > X2 critical, reject the null hypothesis;  If X2 calculated < X2 critical, accept the null hypothesis;
    Conditions to apply the chi-square test  Data should not be in percentages or ratios; rather, they should be expressed in original units;  The sample should consist of at least 50 observations, should be drawn randomly, & individual observations in the sample should be independent of each other;