1. Kuvempu University
Dept. of PG Studies and Research in Microbiology
Kuvempu University, JananaSahyadri, Shankaraghatta.
Seminar on
Chi-Square Test
Presented by
SHIFA NAZ
1st Year (1st Sem)
Dept. of Microbiology
Kuvempu University,
Shankaraghatta.
Guided by
Dr. N. B. Thippeswamy
Chairman
Dept. of Microbiology
Kuvempu University,
Shankaraghatta.
2. Content
• INTRODUCTION
• DEGREES OF FREEDOM
• CHARACTERISTICS OF CHI-SQUARE TEST
• ASSUMPTIONS FOR VALIDITY OF CHI-SQUARE TEST
• APPLICATIONS OF CHI-SQUARE TEST
• EXAMPLES
• SUMMARY
• CONCLUSION
• REFERENCE
3. Chi-square test
INTRODUCTION
Tests of significance such as the z, t and F tests are applied to
quantitative data like plant height or plant yield. These tests are based
on the assumption that the samples are drawn from normally distributed
populations. However, there are many situations in which it is not possible
to make any dependable assumption about the parent distribution from which
the samples have been drawn. Moreover, in agricultural as well as biological
research, apart from quantitative characters, one has to deal with
qualitative data, like flower colour or seed colour, in which observations
are classified into a particular category, class or group.
4. • The results of breeding experiments and genetical analysis come
under this type of analysis. When the observations fall into different
frequency classes, one has to calculate the expected values to find
out the deviation between the observed and the expected frequencies. In
all genetical studies it becomes necessary to test the significance of the
overall deviation between the observed and the expected frequencies.
The significance test of this deviation is known as the Chi-square test
or χ² test.
5. • The χ² test (pronounced as chi-square test) is particularly useful in
genetical studies as a means of testing whether or not the recorded data are
in agreement with a hypothesis, generally one based on Mendelian theory.
• The symbol χ is the Greek letter chi. The χ² test was first used by Karl
Pearson in the year 1900. The quantity χ² describes the magnitude of the
differences between the observed and the expected frequencies. This test
helps to find out whether such differences are significant or not. It is
defined as:
χ² = Σ (fo-fe)²/fe
Where; fo = observed frequency
fe = expected frequency.
6. If the observed frequencies and the expected frequencies are identical, the
computed value will be zero. To determine the value of χ², the steps required
are:
i. Calculate the expected frequencies (fe).
ii. Find the difference between the observed frequencies (fo) and the
expected frequencies (fe). If the deviation (fo-fe) is large, the squared
deviation (fo-fe)² is also large.
iii. Square each deviation (fo-fe), divide each squared value by the
respective value of fe, and obtain the total Σ (fo-fe)²/fe. This will be the
value of χ², which ranges from zero to infinity.
7. iv. The calculated value of χ² is compared with the table value for the
given degrees of freedom at either the 5% or the 1% level of significance. If
the calculated value of χ² is less than the tabulated value at a
particular level of significance, the difference between the observed
and the expected frequencies is not significant, and could have arisen
due to fluctuations of sampling. On the other hand, when the
calculated value is more than the tabulated value, the difference
between the observed and the expected values is significant.
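Steps i to iv can be sketched in Python as follows. This is a minimal
illustration, not a full statistical routine: the function names and the
example counts are hypothetical, and the small lookup table holds only the
standard 5% critical values for 1 to 4 degrees of freedom.

```python
# Standard tabulated chi-square values at the 5% level, keyed by d.f.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


def is_significant(statistic, df, table=CRITICAL_5_PERCENT):
    """True when the calculated value is more than the tabulated value."""
    return statistic > table[df]


# Hypothetical counts for two classes expected in a 1:1 ratio.
observed = [60, 40]
expected = [50, 50]

value = chi_square(observed, expected)
print(value, is_significant(value, df=1))
```

Identical observed and expected frequencies give a value of zero, as stated
in the text.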
8. Degrees of freedom (d.f.): In χ² analysis, to compare the calculated
value of χ² with the tabulated value, we have to determine the degrees of
freedom. The degrees of freedom are calculated from the number of classes:
the number of degrees of freedom in a χ² test is equal to the number of
classes minus one. In most Mendelian problems, if there are two, three or
four classes, the degrees of freedom would be 2-1, 3-1 and 4-1,
respectively. In a contingency table, the degrees of freedom are
calculated in a different manner: d.f. = (r-1)(c-1)
Where; r = number of rows in the table
c = number of columns in the table.
9. • Thus in a 2 x 2 contingency table, the degrees of freedom are (2-1)(2-1) = 1.
Similarly, in a 3 x 3 contingency table, the degrees of freedom are
(3-1)(3-1) = 4, and in a 3 x 4 contingency table they are (3-1)(4-1) = 6.
Likewise, in a 6 x 4 contingency table, the degrees of freedom will be
(6-1)(4-1) = 15. The tables given below show contingency tables of this
type for various d.f.
• A 2x2 contingency table illustrating the determination of the number of
degrees of freedom.
d.f. = (r-1)(c-1)
= (2-1)(2-1)
= 1x1
= 1
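               Column 1   Column 2   Rows total
Row 1             -          x          RT 1
Row 2             x          *          RT 2
Column total     CT 1       CT 2
(Open cells, shown as -, can be filled freely; the cells marked x or * are
then fixed by the row and column totals.)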
10. • A 3x4 contingency table illustrating the determination of the number of
degrees of freedom.
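               Col. 1   Col. 2   Col. 3   Col. 4   Rows total
Row 1            -        -        -        X         RT 1
Row 2            -        -        -        X         RT 2
Row 3            X        X        X        *         RT 3
Column total    CT 1     CT 2     CT 3     CT 4
(Open cells, shown as -, can be filled freely; the cells marked X or * are
then fixed by the row and column totals.)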
d.f. = (r-1)(c-1)
= (3-1) (4-1)
= 2x3
= 6
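The two rules for degrees of freedom can be written as small helper functions.
The function names below are illustrative only; the table shapes are the ones
worked out in the slides above.

```python
def df_goodness_of_fit(num_classes):
    """Degrees of freedom for a set of classes: classes minus one."""
    return num_classes - 1


def df_contingency(rows, cols):
    """Degrees of freedom for an r x c contingency table: (r-1)(c-1)."""
    return (rows - 1) * (cols - 1)


# Mendelian problems with two, three and four classes.
for classes in (2, 3, 4):
    print(classes, "classes ->", df_goodness_of_fit(classes), "d.f.")

# Contingency tables discussed in the text.
for rows, cols in [(2, 2), (3, 3), (3, 4), (6, 4)]:
    print(f"{rows} x {cols} table ->", df_contingency(rows, cols), "d.f.")
```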
11. CHARACTERISTICS OF CHI-SQUARE TEST
The chi-square distribution has some important characteristics:
i. This test is based on frequencies, whereas tests based on theoretical
distributions rely on the mean and standard deviation.
ii. Other distributions can be used for testing the significance of the
difference between a single expected value and an observed proportion.
This test, however, can be used for testing the difference between the
entire set of expected and observed frequencies.
iii. A new chi-square distribution is formed for every increase in the number
of degrees of freedom.
iv. This test is applied for testing the hypothesis but is not useful for
estimation.
12. ASSUMPTIONS FOR VALIDITY OF CHI-SQUARE TEST
There are a few assumptions for the validity of chi-square test:
i. All the observations must be independent. No individual item should be
included twice or a number of times in the sample.
ii. The total number of observations should be large. The chi-square test
should not be used if n < 50.
iii. All the events must be mutually exclusive.
iv. For comparison purposes, the data must be in original units.
v. If a theoretical frequency is less than 5, we pool it with the
preceding or the succeeding frequency, so that the resulting sum is 5 or
more.
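The pooling rule in assumption v can be illustrated with a short sketch. The
strategy used here (accumulate classes from left to right until the expected
frequency reaches 5, and fold any small remainder into the last pooled class)
is only one possible way to pool; the function name and the example counts
are hypothetical.

```python
def pool_small_classes(observed, expected, minimum=5):
    """Merge adjacent classes until every expected frequency is >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0
    for fo, fe in zip(observed, expected):
        acc_obs += fo
        acc_exp += fe
        if acc_exp >= minimum:
            # The running class is large enough: close it and start a new one.
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0
    if acc_obs or acc_exp:
        if pooled_exp:
            # A small remainder is pooled with the preceding class.
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical data: the first and the last two expected values are below 5.
observed = [2, 8, 15, 20, 4, 1]
expected = [3, 9, 14, 18, 4, 2]
print(pool_small_classes(observed, expected))
```

After pooling, the degrees of freedom would be counted from the pooled
classes rather than the original ones.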
13. APPLICATIONS OF CHI-SQUARE TEST
The chi-square test is applicable to varied problems in agriculture and
biology, besides other statistical analyses. It is used:
• To test the goodness of fit.
• To test the independence of attributes.
• To test the homogeneity of independent estimates of the population variance.
14. CHI-SQUARE TEST AS A TEST OF GOODNESS OF FIT
• The chi-square test can be applied in various problems of
biostatistics. Karl Pearson developed a test of significance in the year
1900. This test is known as chi-square test of goodness of fit and is
used to test whether there is a significant difference between an
observed frequency distribution and the theoretical frequency
distribution. In this way one can determine the goodness of fit of a
theoretical distribution.
15. • If the observed frequencies are close to the corresponding expected
frequencies, the chi-square value will be small indicating good fit.
However, if the observed frequencies differ considerably from the expected
frequencies, the chi-square value will be large and the fit is poor. Therefore,
a good fit leads to the acceptance of null hypothesis, whereas a poor fit
leads to its rejection. Under the null hypothesis, it is assumed that there is
no significant difference between the observed and the theoretical values.
If the computed chi-square value is less than the tabulated value at a
given level of significance, the discrepancy between the observed and the
expected frequencies may be attributed to fluctuations in sampling. If it
is greater than the tabulated value, the discrepancy may be due to some
inadequacy of the theory to fit the observed data.
16. • It is important to note that no theoretical frequency should be small.
Preferably each should be larger than 10, but in any case not less than 5.
When a frequency is less than 5, we use the technique of "pooling" it with
the preceding or succeeding frequency, so that the resulting sum is 5 or
more. In those cases a correction is to be applied. This is done by
subtracting the correction value of 0.5 from the absolute value of each
deviation |fo-fe| before squaring. Symbolically:
• χ²(corrected) = Σ [|fo-fe|-0.5]² / fe
• This correction is applied only when the number of degrees of freedom is small.
One should compare both the corrected and the uncorrected values of χ². If both
values lead to the same conclusion regarding the hypothesis, either
acceptance or rejection at the 5% level of probability, difficulties are rarely
encountered. If they lead to different conclusions, one should increase the
sample size.
17. The following examples would illustrate the application of chi-
square test.
Example 1: In F2 generation, Mendel obtained 621 tall plants and 187 dwarf
plants out of the total of 808. Test whether these two types of plants are in
accordance with the Mendelian monohybrid ratio of 3:1 or that they deviate
from this ratio.
Solution: Null hypothesis: Ho: It is presumed that the tall and the dwarf
plants are segregating in a 3:1 ratio.
• The data are set out in the table. The expected number of plants in each
class is calculated as (3/4) x 808 = 606 for tall plants and
(1/4) x 808 = 202 for dwarf plants, according to the hypothetical ratio of
3 tall : 1 dwarf.
18. • Formula applied:
χ² = Σ (fo-fe)²/fe
= (15)²/606 + (-15)²/202
= 225/606 + 225/202
= 0.3713 + 1.1139
χ² = 1.4852
Since there are two phenotypes, the number of degrees of freedom is 2-1 = 1.
The calculated χ² value 1.4852 is less than the tabulated value 3.84 at the 5%
level of probability for one degree of freedom. Therefore, the difference
between the observed and the expected frequencies is not significant, and the
null hypothesis is accepted. The results for the two groups of plants are in
agreement with the theoretical ratio of 3:1.
                          Tall plants   Dwarf plants   Total
Observed frequency (fo)       621           187         808
Expected frequency (fe)       606           202         808
Deviation (fo-fe)              15           -15
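The arithmetic of Example 1 can be checked with a few lines of Python. This is
only a sketch: the variable names are illustrative, and 3.84 is the tabulated
5% value for one degree of freedom quoted in the example.

```python
# Observed counts from Example 1 and the hypothetical 3:1 ratio.
observed = {"tall": 621, "dwarf": 187}
ratio = {"tall": 3, "dwarf": 1}

total = sum(observed.values())
ratio_sum = sum(ratio.values())

# Expected frequency of each class under the 3:1 hypothesis.
expected = {name: total * part / ratio_sum for name, part in ratio.items()}

# Chi-square: sum of squared deviations divided by expected frequencies.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1
critical_5_percent = 3.84  # tabulated value for 1 d.f. at the 5% level

print("expected:", expected)
print("chi-square:", round(chi_sq, 3), "d.f.:", df)
print("significant at 5%:", chi_sq > critical_5_percent)
```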
19. • Correction: It has been stated that if the number of degrees of freedom
is small, one should use this correction. In the above example, there is one
degree of freedom. Therefore, there is every possibility of underestimating
those probabilities that are given in the tabulated form. A correction is to
be applied here. This is done by subtracting the correction value of 0.5 from
the absolute value of each deviation before squaring.
Symbolically:
χ²(corrected) = Σ [|fo-fe|-0.5]² / fe
Thus in the above example, the χ² value should be:
χ² = (|15|-0.5)²/606 + (|-15|-0.5)²/202
= (14.5)²/606 + (14.5)²/202
= 210.25/606 + 210.25/202
= 0.3469 + 1.0408
χ² = 1.3877
After applying the correction, the χ² value is still less than the tabulated
value 3.84, so the result is not altered.
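The comparison of corrected and uncorrected values can be sketched as below.
The code uses the absolute-value form of the correction, |fo-fe| - 0.5; the
function name and its `correction` parameter are illustrative assumptions,
and the data are those of Example 1.

```python
def chi_square(observed, expected, correction=0.0):
    """Sum of (|fo - fe| - correction)^2 / fe over all classes."""
    return sum(
        (abs(fo - fe) - correction) ** 2 / fe
        for fo, fe in zip(observed, expected)
    )


observed = [621, 187]   # tall, dwarf
expected = [606, 202]   # 3:1 ratio applied to 808 plants
critical_5_percent = 3.84  # tabulated value for 1 d.f. at the 5% level

uncorrected = chi_square(observed, expected)
corrected = chi_square(observed, expected, correction=0.5)

# Both values should lead to the same decision about the hypothesis.
same_conclusion = (uncorrected > critical_5_percent) == (
    corrected > critical_5_percent
)
print(round(uncorrected, 3), round(corrected, 3), same_conclusion)
```

With this form the correction always lowers the statistic, since every
absolute deviation is reduced before squaring.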
21. Example 2: In a Mendelian experiment on breeding, four types of plants are
expected to occur in the proportion of 9:3:3: 1. The observed frequencies are: 891
round and yellow, 316 wrinkled and yellow, 290 round and green, and 119
wrinkled and green. Find the chi-square and examine the correspondence
between the theory, and the experiment.
Solution: Null hypothesis: Ho: It is assumed that the theoretical values
correspond to the experimental values.
Expected frequencies
(Total number of observed plants: 891+316+ 290 + 119 = 1616)
• Round and yellow = 9/16 x1616 = 909
• Wrinkled and yellow = 3/16 x 1616 = 303
• Round and green = 3/16 x 1616 = 303
• Wrinkled and green = 1/16 x 1616 = 101
22. Formula applied: χ² = Σ (fo-fe)²/fe
= (-18)²/909 + (13)²/303 + (-13)²/303 + (18)²/101
= 324/909 + 169/303 + 169/303 + 324/101
= 0.3564 + 0.5578 + 0.5578 + 3.2079
χ² = 4.6799
Conclusion: The calculated χ² value is 4.6799. The tabulated χ² value at the 5%
level of probability is 7.815 for 3 degrees of freedom. This shows that the
calculated χ² value is less than the tabulated χ² value. Therefore, the
difference between the observed and the expected frequencies is not
significant. Hence, the null hypothesis stating that the observed and the
expected frequencies have no difference may be true and it can be accepted.
                          Round    Wrinkled   Round   Wrinkled   Total
                          yellow   yellow     green   green
Observed frequency (fo)    891      316        290     119       1616
Expected frequency (fe)    909      303        303     101       1616
Deviation (fo-fe)          -18       13        -13      18
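Example 2 can be reproduced with a small general-purpose helper. This is a
sketch under stated assumptions: the function name `goodness_of_fit` is
hypothetical, and 7.815 is the standard tabulated 5% value for three degrees
of freedom.

```python
def goodness_of_fit(observed, ratio):
    """Return expected frequencies, chi-square and d.f. for a given ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * part / ratio_sum for part in ratio]
    statistic = sum(
        (fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)
    )
    return expected, statistic, len(observed) - 1


# Round yellow, wrinkled yellow, round green, wrinkled green.
observed = [891, 316, 290, 119]
expected, statistic, df = goodness_of_fit(observed, ratio=[9, 3, 3, 1])

critical_5_percent = 7.815  # tabulated value for 3 d.f. at the 5% level

print("expected:", expected)
print("chi-square:", round(statistic, 3), "d.f.:", df)
print("significant at 5%:", statistic > critical_5_percent)
```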
23. SUMMARY
When the observations fall into different frequency classes, one has to
calculate the expected values to find out the deviation between the observed
and the expected frequencies. In all genetical studies it becomes necessary to
test the significance of the overall deviation between the observed and the
expected frequencies. The significance test of this deviation is known as the
Chi-square test or χ² test.
The degrees of freedom are calculated from the number of classes: the number
of degrees of freedom in a χ² test is equal to the number of classes minus one.
The chi-square test is applicable to varied problems in agriculture and
biology besides other statistical analyses. For example, it can be used to
test the goodness of fit, population variance, etc.
24. CONCLUSION
The Chi-square test is a non-parametric (distribution free) tool designed to
analyze group differences. Like all non-parametric statistics, the Chi-square
is robust with respect to the distribution of the data. Specifically, it
does not require equality of variances among the study groups.
This test is flexible in handling data from both two-group and
multiple-group studies.
This test has certain limitations too: it requires an appropriate sample
size, and it is difficult to interpret when there are large numbers of
categories (20 or more).
25. References
• Irfan A. Khan and Atiya Khanum; Fundamentals of Biostatistics. Pages 315-325.