2. What is the Chi-Square Test?
The χ² (pronounced "chi-square") test was first
used by Karl Pearson in the year 1900.
The chi-square test is a non-parametric test:
it is not based on any assumption about the
distribution of any variable.
@Ravindra Nath Shukla (PhD Scholar) ABV-IIITM
3. What is the Chi-Square Test?
The test statistic follows a specific
distribution known as the chi-square distribution.
In general, the χ² test is used to measure the
difference between what is observed and
what is expected according to an assumed
hypothesis.
4. The formula for computing chi-square is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected or theoretical frequency.
The quantity χ² describes the magnitude of the discrepancy
between theory and observation,
i.e., with the help of the χ² test we can know whether a
given discrepancy between theory and observation can
be attributed to chance or to the failure of the theory
to fit the observed facts.
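As a minimal sketch, the formula can be written as a small Python function. The function name `chi_square` and the coin-toss counts are illustrative assumptions and do not come from the slides.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data (not from the slides): 100 coin tosses gave
# 58 heads and 42 tails; a fair coin would be expected to give 50 and 50.
observed = [58, 42]
expected = [50, 50]

statistic = chi_square(observed, expected)
print(f"chi-square = {statistic:.2f}")
```

The larger the statistic, the larger the discrepancy between the observed and the expected frequencies.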
5. APPLICATIONS OF A CHI SQUARE TEST
This test can be used for :
1. Tests of Hypothesis Concerning Variance
2. Significance test
3. Test of independence of attributes
4. Goodness of fit of distributions
5. Test of homogeneity.
6. 1) Tests of Hypothesis Concerning Variance
For a normally distributed population with
population variance σ², if we draw a random
sample of size n with sample variance s², then
the statistic below follows a χ² distribution with
n − 1 degrees of freedom. The d.f. for the χ²
distribution is denoted by ʋ, so ʋ = n − 1.
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{\sum (x - \bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}$$
7. 1) Tests of Hypothesis Concerning Variance
In testing a hypothesis about the variance of a
normally distributed population,
the null hypothesis is
$$H_0: \sigma^2 = \sigma_0^2$$
where $\sigma_0^2$ is some specified value of the population
variance.
8. Illustration 1:
Weights in kilograms of 10 shipments are given
below:
38, 40, 45, 53, 47, 43, 55, 48, 52, 49.
Can we say that the variance of the distribution of
weights of all shipments from which the above
sample of 10 shipments was drawn is equal to 20
square kilograms?
9. Solution :
Let the null hypothesis be that the variance of the distribution of shipment
weights is 20 square kilograms, or
$$H_0: \sigma^2 = 20$$
Mean value of the sample:
$$\bar{x} = \frac{\sum x}{n} = \frac{470}{10} = 47$$
Weight (in kg)
x            (x − x̄)        (x − x̄)²
38           −9             81
40           −7             49
45           −2              4
53           +6             36
47            0              0
43           −4             16
55           +8             64
48           +1              1
52           +5             25
49           +2              4
∑x = 470     ∑(x − x̄) = 0   ∑(x − x̄)² = 280
$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{\sum (x - \bar{x})^2}{\sigma^2} = \frac{280}{20} = 14$$
The degrees of freedom for the χ² distribution are
ʋ = n−1 = 10 − 1 = 9
From the χ² table, the value for 9 d.f. at the 5%
level of significance is 16.919.
Since the calculated value of χ² (14) is less than the tabulated
value of χ², it is insignificant and the null hypothesis is
accepted: the variance of the distribution of weights
of all shipments in the population may be taken as 20 square kilograms.
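The arithmetic of Illustration 1 can be checked with a short Python sketch. The variable names are illustrative assumptions; the data, the hypothesised variance and the table value 16.919 are taken from the slides.

```python
# Shipment weights in kg (Illustration 1).
weights = [38, 40, 45, 53, 47, 43, 55, 48, 52, 49]

sigma0_sq = 20           # hypothesised population variance under H0
critical_value = 16.919  # table value for 9 d.f. at the 5% level (from the slides)

n = len(weights)
mean = sum(weights) / n

# Sum of squared deviations from the mean; this equals (n - 1) * s^2.
sum_sq_dev = sum((x - mean) ** 2 for x in weights)

chi_sq = sum_sq_dev / sigma0_sq
df = n - 1

print(f"mean = {mean}, sum of squared deviations = {sum_sq_dev}")
print(f"chi-square = {chi_sq} with {df} d.f.")
print("reject H0" if chi_sq > critical_value else "accept H0")
```

The sketch follows the slide's decision rule of comparing the statistic with a single tabulated value.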
10. 2) SIGNIFICANCE TEST
This test enables us to test the significance of null
hypothesis using χ² test.
Illustration 2 :-
12. 3) TEST OF INDEPENDENCE OF ATTRIBUTES
This test enables us to explain whether or not two
attributes are associated.
For instance, the χ² test is useful when we want to know
whether a new medicine is effective in controlling fever
or not.
In such a situation, we proceed with the null hypothesis
that the two attributes (viz., new medicine and control
of fever) are independent which means that new
medicine is not effective in controlling fever.
13. 3) TEST OF INDEPENDENCE OF ATTRIBUTES
In the test of independence, the population and
sample are classified according to some
attributes.
To test the independence of attributes we use a
contingency table, as follows:
Let us designate the two attributes as A and B
where, attribute A is assumed to have r categories
and attribute B is assumed to have c categories.
Furthermore, assume the total number of
observations in the problem is N.
These observations are represented in
a table where O_ij represents the observation in the
ith row and jth column. Such a table in matrix
form is called a contingency table.
16. Expected frequencies (fe):
fe for male students choosing French = 55 × 60 / 90 = 36.67
fe for female students choosing French = 35 × 60 / 90 = 23.33
fe for male students choosing Russian = 55 × 30 / 90 = 18.33
fe for female students choosing Russian = 35 × 30 / 90 = 11.67
17.
O     E        (O − E)²    (O − E)² / E
39    36.67    5.4289      0.148047
21    23.33    5.4289      0.232700
16    18.33    5.4289      0.296176
14    11.67    5.4289      0.465201
               Σ(O − E)² / E = 1.142125
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 1.142$$
For a contingency table, the degrees of freedom are calculated as:
ʋ = (r−1)(c−1) = (2−1)(2−1) = 1
From the χ² table, the value for ʋ = 1 (d.f. = 1) at the 5% level of
significance is 3.84.
Since the calculated value of χ² is less than the table value, the null
hypothesis is accepted. Hence, there is no relationship between choice of
language and gender.
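The same calculation can be sketched in Python. The observed counts and marginal totals are those used in the worked figures above (39, 21, 16, 14 with totals 55, 35, 60, 30 and N = 90); the function names are illustrative assumptions. Because the sketch keeps the expected frequencies unrounded, it gives about 1.145 rather than the 1.142 obtained from expected values rounded to two decimals; the conclusion is unchanged.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: male, female. Columns: French, Russian.
observed = [[39, 16],
            [21, 14]]

statistic, df = chi_square_independence(observed)
critical_value = 3.84  # 1 d.f., 5% level (from the slides)

print(f"chi-square = {statistic:.3f} with {df} d.f.")
print("reject H0" if statistic > critical_value else "accept H0")
```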
18. Exercise:
1. A sample of 200 persons with a particular disease was selected. Out of
these, 100 were given a drug and the others were not given any drug.
The results are as follows:
No. of persons
             Drug    No drug    Total
Cured          65         55      120
Not cured      35         45       80
Total         100        100      200
Test whether the drug is effective or not.
19. Exercise:
2. A certain drug is claimed to be effective in curing colds. In an
experiment on 500 persons with a cold, half of them were given the drug
and half of them were given sugar pills. The patients' reactions to
the treatment are recorded in the following table:
               Helped    Harmed    No effect    Total
Drug              150        30           70      250
Sugar pills       130        40           80      250
Total             280        70          150      500
On the basis of these data, can it be concluded that there is a
significant difference in the effect of the drug and the sugar pills?
20. 4) TEST OF GOODNESS OF FIT OF DISTRIBUTIONS:
Tests of goodness of fit are used when we want
to determine whether an actual sample
distribution matches a known theoretical
distribution.
They enable us to ascertain how well a
theoretical distribution such as the Binomial,
Poisson or Normal fits the empirical distribution,
i.e., how well the sample data fit the distribution.
21. 4) TEST OF GOODNESS OF FIT OF DISTRIBUTIONS:
The χ² test formula for goodness of fit is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequency and E = expected or theoretical frequency.
The null hypothesis is
H0: the sample is drawn from the theoretical population distribution.
The alternative hypothesis is
Ha: the sample is not drawn from the theoretical population distribution.
22. If χ² (calculated) < χ² (tabulated) with (n−1) d.f.,
then the null hypothesis is accepted.
If χ² (calculated) > χ² (tabulated) with (n−1) d.f.,
then the null hypothesis is rejected.
If the null hypothesis is accepted, it can be
concluded that the observed distribution follows
the theoretical distribution.
23. Illustration 4:
The number of spare parts required in a factory
was found to vary from day to day. In a sample
study, the following information was obtained:
Day:                      Mon.   Tue.   Wed.   Thu.   Fri.   Sat.   Total
No. of parts demanded:    1124   1125   1110   1120   1125   1115   6720
Test the hypothesis that the number of parts
demanded does not depend on the day of the week.
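A minimal Python sketch of this goodness-of-fit test against a uniform distribution follows. Two points are assumptions rather than content of the slides: the six daily figures as printed add up to 6719, not the stated total of 6720, so the sketch derives the expected frequency from the daily figures themselves; and the 5% critical value for 5 d.f. (11.070) is a standard table value that the slides do not quote.

```python
# Parts demanded on each day, as printed in the table.
observed = {"Mon": 1124, "Tue": 1125, "Wed": 1110,
            "Thu": 1120, "Fri": 1125, "Sat": 1115}

total = sum(observed.values())

# Under H0 (demand does not depend on the day) every day has the same
# expected frequency: the total divided by the number of days.
expected = total / len(observed)

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

critical_value = 11.070  # standard table value, 5 d.f., 5% level (assumption)

print(f"total = {total}, expected per day = {expected:.2f}")
print(f"chi-square = {chi_sq:.4f} with {df} d.f.")
print("reject H0" if chi_sq > critical_value else "accept H0")
```

The statistic is far below the critical value, so on these figures the hypothesis that demand does not depend on the day of the week is accepted.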
24. Illustration 5:
A survey of 320 families with 5 children each
revealed the following distribution of
births:
No. of boys born:      5     4     3     2     1     0
No. of girls born:     0     1     2     3     4     5
No. of families:      14    56   110    88    40    12
Is this result consistent with the hypothesis that
male and female births are equally probable?
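One way to work through this illustration is sketched below in Python. Under the hypothesis of equally probable births, the number of boys in a family of 5 follows a binomial distribution with p = 0.5, so the expected number of families with k boys is 320 × C(5, k) / 32. The variable names and the 5% critical value for 5 d.f. (11.070, a standard table value) are assumptions not given in the slides.

```python
from math import comb

families = 320
children = 5
p_boy = 0.5  # H0: boys and girls are equally probable

# Observed number of families, keyed by the number of boys born.
observed = {5: 14, 4: 56, 3: 110, 2: 88, 1: 40, 0: 12}

# Binomial expected frequencies: N * C(5, k) * p^k * (1 - p)^(5 - k).
expected = {
    k: families * comb(children, k) * p_boy ** k * (1 - p_boy) ** (children - k)
    for k in observed
}

chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

critical_value = 11.070  # standard table value, 5 d.f., 5% level (assumption)

for k in sorted(observed, reverse=True):
    print(f"{k} boys: observed {observed[k]:3d}, expected {expected[k]:5.1f}")
print(f"chi-square = {chi_sq:.2f} with {df} d.f.")
print("reject H0" if chi_sq > critical_value else "accept H0")
```

The expected frequencies come out as 10, 50, 100, 100, 50 and 10, and the statistic is 7.16, which is below the critical value.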
25. 5) TEST OF HOMOGENEITY
This test explores the proposition that several
populations are homogeneous with respect to some
characteristic of interest,
i.e., it is used to test whether the occurrence of events
is uniform or not.
E.g., whether the admission of patients to a government hospital
is uniform across all days of the week can be tested with
the help of the chi-square test.
26. 5) TEST OF HOMOGENEITY
The analytical procedure is the same as that discussed for
the test of independence.
It differs from the independence test in the following way:
In the independence test we are concerned with
whether the two attributes are independent or
not,
while in the test of homogeneity we are concerned with
whether the different samples come from the same
population or not.
27. 5) TEST OF HOMOGENEITY
We use a contingency table and make the calculations just as in
the independence test.
If χ² (calculated) < χ² (tabulated), then the null hypothesis
is accepted,
and it can be concluded that there is homogeneity or
uniformity in the occurrence of the events.
28. Illustration 6:
A random sample of 400 persons was selected from three
age groups, and each person was asked to specify which
of 3 types of TV program they prefer. The
results are shown in the following table:
Test the hypothesis that the populations are homogeneous
with respect to the types of television program people
prefer.
29. Conditions for the Application of χ² Test
The following five basic conditions must be met in order for
chi-square analysis to be applied:
(1) The experimental data (sample observation) must be
independent of each other.
(2) The sample data must be drawn at random from the
target population.
30. (3) The data should be expressed in original units for
convenience of comparison, and not in percentage or ratio
form.
(4) No class or cell should contain fewer than five observations.
(5) With fewer than 5 observations in a class, the value of χ² is
overestimated, which results in too many rejections of the null
hypothesis.
31. YATES'S CORRECTION
The chi-square distribution is a continuous distribution,
but it is used with discrete data from a contingency table.
When the expected frequencies are large, this
approximation is appropriate.
In a 2 × 2 table, when expected frequencies are small,
a correction proposed by F. Yates in the year 1934,
called "Yates's correction for continuity", is applied.
32. YATES'S CORRECTION
The correction consists of:
$$\chi^2_{\text{corrected}} = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
where |O−E| means the absolute difference.
In general, the correction is made only when the
number of degrees of freedom is ʋ = 1.
For large samples, this yields practically the same
results as the uncorrected χ².
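The effect of the correction can be sketched in Python on the 2 × 2 drug table from the first exercise (65, 55, 35, 45). The function names are illustrative assumptions, and the sketch applies the formula exactly as stated, subtracting 0.5 from each absolute difference.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Chi-square for a contingency table, optionally with Yates's correction."""
    expected = expected_frequencies(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff -= 0.5  # continuity correction, used when d.f. = 1
            total += diff ** 2 / e
    return total


# Rows: cured, not cured. Columns: drug, no drug.
table = [[65, 55],
         [35, 45]]

uncorrected = chi_square(table)
corrected = chi_square(table, yates=True)

print(f"uncorrected chi-square = {uncorrected:.4f}")
print(f"corrected chi-square   = {corrected:.4f}")
```

The corrected value is always the smaller of the two, which makes the test more conservative.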
33. IMPORTANT CHARACTERISTICS OF A CHI SQUARE TEST
This test (as a non-parametric test) is based on
frequencies and not on the parameters like mean
and standard deviation.
The test is used for testing the hypothesis and is
not useful for estimation.
This test can also be applied to a complex
contingency table with several classes and as
such is a very useful test in research work.
34. IMPORTANT CHARACTERISTICS OF A CHI SQUARE TEST
This test is an important non-parametric test because
no rigid assumptions are necessary regarding
the type of population, no parameter
values are needed, and relatively little mathematical
detail is involved.
35. STEPS INVOLVED IN CALCULATING 𝛘²
1. Calculate the expected frequencies corresponding to the observed
frequencies:
A. For the independence and homogeneity tests, use a
contingency table and the formula for the expected
frequencies fe:
$$f_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
B. For goodness of fit:
a) If no distribution is given, use the uniform distribution:
$$f_e = \frac{1}{n} \sum f_o$$
b) If a particular distribution is given (binomial, Poisson, etc.), use the
formula of that distribution to calculate the expected frequencies.
36. STEPS INVOLVED IN CALCULATING 𝛘²
2. Then χ² is calculated as follows:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
3. Calculate the degrees of freedom for the χ² distribution:
for goodness of fit: ʋ = n−1
for independence and homogeneity: ʋ = (r−1)(c−1)
4. Compare the calculated χ² with the critical value of χ² at ʋ d.f.
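The last two steps can be sketched in Python as a pair of degrees-of-freedom helpers and a decision function. All names here are illustrative assumptions. In the lookup table, the values for 1 and 9 d.f. are the ones quoted in the worked examples (3.84 rounded from 3.841, and 16.919); the remaining entries are standard 5% table values added for illustration.

```python
# Critical values of chi-square at the 5% level of significance.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 9: 16.919}


def df_goodness_of_fit(n_classes):
    """Degrees of freedom for a goodness-of-fit test: n - 1."""
    return n_classes - 1


def df_contingency(rows, cols):
    """Degrees of freedom for independence or homogeneity: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)


def decide(chi_sq, df, critical_values=CRITICAL_5_PERCENT):
    """Compare the calculated chi-square with the tabulated value."""
    critical = critical_values[df]
    return "reject H0" if chi_sq > critical else "accept H0"


# Values from the worked examples: 1.142 for the 2 x 2 language table,
# and 14 with 9 d.f. for the variance test on shipment weights.
print(decide(1.142, df_contingency(2, 2)))
print(decide(14, 9))
```

A dictionary keeps the sketch dependency-free; a statistics library could supply critical values for any degrees of freedom instead.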
37. Exercise:
1. The figures given below are (a) the theoretical frequencies of a
distribution and (b) the frequencies of the normal distribution having the same
mean, standard deviation and total frequency as in (a):
(a) 1 12 66 220 495 792 924 792 495 220 66 12 1
(b) 2 15 66 210 484 799 943 799 484 210 66 15 2
Test whether the normal distribution provides a good fit to the data.
38. Exercise:
2. There are 1,000 workers in a factory. They were exposed to an
epidemic, and the numbers of workers affected and not affected are given. The
factory administration launched a vaccination campaign. The data on
epidemic attack and vaccination are as follows:
                        Vaccinated    Not vaccinated    Total
Affected by epidemic           200               500      700
Not affected                   200               100      300
Total                          400               600     1000
On the basis of this information, can it be said that vaccination and
epidemic contamination (being affected by the epidemic) are independent?