Chi-square distribution

Bipul Kumar Sarker
Lecturer
BBA Professional
Habibullah Bahar University College
Chapter-07, Part-02

Introduction:
• The chi-square test is one of the most commonly used non-parametric
tests. Its test statistic follows a chi-square sampling distribution
when the null hypothesis is true.
• It was introduced by Karl Pearson as a test of association. The Greek
letter $\chi^2$ is used to denote this test.
• It can be applied when there are few or no assumptions about the
population parameter.
• It can be applied to categorical or qualitative data using a
contingency table.
• It is used to evaluate unpaired (unrelated) samples and proportions.
Bipul Kumar Sarker, Lecturer (BBA Professional), HBUC

Definition:
The chi-square ($\chi^2$) test is one of the simplest and most widely used
non-parametric tests in statistical work.
The $\chi^2$ test was first used by Karl Pearson in the year 1900. The quantity $\chi^2$
describes the magnitude of the discrepancy between theory and observation.
It is defined as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Where,
$O_i$ refers to the observed frequencies
and $E_i$ refers to the expected frequencies.

Note:
• If $\chi^2$ is zero, the observed and expected frequencies coincide exactly.
• The greater the discrepancy between the observed and expected frequencies, the
greater the value of $\chi^2$.

Chi-square Distribution:
The square of a standard normal variate is a chi-square variate with 1 degree of
freedom, i.e.
if X is normally distributed with mean $\mu$ and standard deviation $\sigma$, then
$$\left(\frac{X - \mu}{\sigma}\right)^2$$
is a chi-square variate ($\chi^2$) with 1 d.f. The distribution of chi-square depends on the
degrees of freedom. There is a different distribution for each number of degrees of
freedom.
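The statement above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: the function name, the values of mu and sigma, the number of draws and the seed are all illustrative assumptions. It squares standardized normal draws and compares their sample mean and variance with the theoretical values 1 and 2 for a chi-square variate with 1 d.f.

```python
import random


def simulate_chi_square_1df(mu, sigma, draws, seed=0):
    """Draw X ~ N(mu, sigma) and return the values ((X - mu) / sigma) ** 2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [((rng.gauss(mu, sigma) - mu) / sigma) ** 2 for _ in range(draws)]


# Hypothetical population parameters, chosen only for illustration.
values = simulate_chi_square_1df(mu=50.0, sigma=10.0, draws=50_000)

sample_mean = sum(values) / len(values)
sample_var = sum((v - sample_mean) ** 2 for v in values) / (len(values) - 1)

# For 1 d.f. the theoretical mean is 1 and the theoretical variance is 2.
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```

With a large number of draws the printed values should lie close to 1 and 2, and every simulated value is non-negative because it is a square.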

Properties of Chi-square distribution:
1. The mean of the $\chi^2$ distribution is equal to the number of degrees of freedom (n).
2. The variance of the $\chi^2$ distribution is equal to twice the number of degrees of freedom, i.e. 2n.
3. The median of the $\chi^2$ distribution divides the area under the curve into two equal parts, each part
being 0.5.
4. The mode of the $\chi^2$ distribution is equal to (n − 2), for n ≥ 2.

5. Since chi-square values are always positive, the chi-square curve is always positively
skewed.
6. Since chi-square values increase with the increase in the degrees of freedom, there
is a new chi-square distribution with every increase in the number of degrees of
freedom.
7. The lowest value of chi-square is zero and the highest value is infinity, i.e. $\chi^2 \ge 0$.

#5. The $\chi^2$ distribution is not symmetrical and all its values are positive. The distribution
is described by its degrees of freedom. For each number of degrees of freedom there is a
different asymmetric curve.

8. As the degrees of freedom increase, the chi-square curve approaches a normal
distribution.

9. When two chi-square variates $\chi_1^2$ and $\chi_2^2$ are independent and follow $\chi^2$
distributions with $n_1$ and $n_2$ degrees of freedom, their sum $\chi_1^2 + \chi_2^2$ follows a $\chi^2$
distribution with $(n_1 + n_2)$ degrees of freedom.
10. When n (d.f.) > 30, the distribution of $\sqrt{2\chi^2}$ approximately follows a normal
distribution. The mean of the distribution of $\sqrt{2\chi^2}$ is $\sqrt{2n - 1}$ and the standard
deviation is equal to 1.
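The large-sample property can be put to work when a table stops at 30 d.f. The sketch below assumes the square-root form of the approximation, in which $\sqrt{2\chi^2}$ is roughly normal with mean $\sqrt{2n - 1}$ and standard deviation 1. The function names are illustrative and do not come from the slides.

```python
import math


def fisher_z(chi_square, n):
    """Approximate standard normal deviate: sqrt(2 * chi2) - sqrt(2n - 1)."""
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * n - 1.0)


def approx_critical_value(n, z_alpha):
    """Invert the approximation: chi2 is about (z_alpha + sqrt(2n - 1)) ** 2 / 2."""
    return (z_alpha + math.sqrt(2.0 * n - 1.0)) ** 2 / 2.0


Z_05 = 1.645  # upper 5% point of the standard normal distribution

# Approximate upper 5% critical value of chi-square for n = 40 d.f.
print(round(approx_critical_value(40, Z_05), 2))
```

For n = 40 the tabulated 5% value is about 55.76, and the approximation should land within a few tenths of it. The agreement improves as n grows.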

Applications of a chi-square test:
This test can be used for:
i. Goodness of fit of distributions
ii. Test of independence of attributes
iii. Test of homogeneity

Conditions for applying the $\chi^2$ test:
1. The data must be in the form of frequencies.
2. All the items in the sample must be independent.
3. N, the total frequency, should be reasonably large, say greater than 50.
4. No theoretical cell frequency should be less than 5. If any is less than 5, the
frequencies should be pooled with adjacent cells so that each pooled frequency is 5 or more.
5. The $\chi^2$ test is wholly dependent on the degrees of freedom.
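Condition 4 (pooling small expected frequencies) can be sketched in code. This is one possible pooling rule, assumed here for illustration: adjacent classes are merged from left to right until each pooled expected frequency reaches the minimum, and any leftover tail is merged into the last pooled class. The function name and the frequencies are hypothetical.

```python
def pool_small_cells(observed, expected, minimum=5.0):
    """Merge adjacent classes until every pooled expected frequency >= minimum."""
    pooled_obs, pooled_exp = [], []
    acc_obs, acc_exp = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        if acc_exp >= minimum:
            # The accumulated class is large enough, so close it.
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs, acc_exp = 0.0, 0.0
    if acc_obs or acc_exp:
        if pooled_exp:
            # Leftover tail is too small on its own: merge it into the last class.
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


# Hypothetical frequencies: the first and the last two expected values are below 5.
obs, exp = pool_small_cells([2, 8, 20, 15, 4, 1], [3, 9, 18, 14, 4.5, 1.5])
print(obs, exp)
```

After pooling, the degrees of freedom should be counted from the number of classes that remain, not from the original number.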

Chi Square formula:
The chi-squared test is used to determine whether there is a significant difference
between the expected frequencies and the observed frequencies in one or more
categories.
The value of $\chi^2$ is calculated as:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

The observed frequencies are the frequencies obtained from the observation,
which are sample frequencies.
The expected frequencies are the calculated frequencies.

Steps in solving problems related to Chi-Square test
STEP 1
• Calculate the expected frequencies.
STEP 2
• Take the difference between the observed and expected
frequencies and obtain the squares of these differences,
$(O - E)^2$.
STEP 3
• Divide the values obtained in Step 2 by the respective
expected frequency, E, and add all the values to get the
value according to the formula:
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
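The three steps can be sketched in a few lines of Python. The data are hypothetical (a die thrown 120 times, tested against the assumption that all six faces are equally likely) and the function name is illustrative, not taken from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        squared_diff = (o - e) ** 2   # Step 2: squared difference
        total += squared_diff / e     # Step 3: divide by E and accumulate
    return total


# Hypothetical observed counts for faces 1 to 6 of a die thrown 120 times.
observed = [15, 25, 18, 22, 24, 16]

# Step 1: expected frequencies under the assumption of a fair die.
n_total = sum(observed)
expected = [n_total / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square = {chi2:.2f} with {len(observed) - 1} d.f.")
```

Here the only constraint is that the expected frequencies add up to the total of 120, so the statistic is referred to a chi-square table with 6 − 1 = 5 d.f.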

Degrees of Freedom:
It denotes the extent of independence (freedom) enjoyed by a given set of observed
frequencies. Suppose we are given a set of n observed frequencies which are
subjected to k independent constraints (restrictions); then
d.f. = (Number of frequencies) − (Number of independent constraints on them) = n − k
In other terms, for a contingency table,
d.f. = (r − 1)(c − 1)
Where,
r = the number of rows
c = the number of columns

Example:
d.f. = (Number of frequencies) − (Number of independent constraints on them)
d.f. = 6 − 1 = 5

Example:
In other terms,
d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
Where,
r = the number of rows
c = the number of columns
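The two ways of counting degrees of freedom agree for a contingency table. As a short worked derivation, assuming the row totals and column totals are fixed: an r × c table has rc cell frequencies, and the r row totals and c column totals share one grand total, which leaves r + c − 1 independent constraints.

```latex
\begin{aligned}
\text{number of frequencies} &= rc \\
\text{number of independent constraints} &= r + c - 1 \\
\text{d.f.} &= rc - (r + c - 1) = (r - 1)(c - 1)
\end{aligned}
```

For a 2 × 2 table this gives 4 − 3 = 1, the same value as (2 − 1)(2 − 1) = 1.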

Contingency table
When a table is prepared by enumerating qualitative data and entering the actual
frequencies, and the table represents the occurrence of two sets of events, it is
called a contingency table. It is also called an association table.

Contingency table
• A contingency table is a type of table in a matrix format that displays
the frequency distribution of the variables.
• They provide a basic picture of the interrelation between two variables
and can help find interactions between them.
• The chi-square statistic compares the observed count in each table cell
to the count which would be expected under the assumption of no
association between the row and column classifications.
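Building a contingency table by enumeration can be sketched as follows. The records, category labels and function name are hypothetical, chosen only to show how raw paired observations are counted into cells.

```python
from collections import Counter


def build_contingency_table(pairs):
    """Count (row category, column category) pairs into a table of frequencies."""
    pairs = list(pairs)
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    # A Counter returns 0 for combinations that never occur.
    table = [[counts[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, table


# Hypothetical raw records: one (income group, school type) pair per family.
records = [
    ("Low", "Government"), ("Low", "Private"), ("High", "Private"),
    ("Low", "Government"), ("High", "Private"), ("High", "Government"),
]

rows, cols, table = build_contingency_table(records)
for label, counts_row in zip(rows, table):
    print(label, dict(zip(cols, counts_row)))
```

Each cell holds the number of records that fall in that row and column category, which is the observed frequency used by the chi-square statistic.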

Table: Critical values of $\chi^2$

Exercise-01:
Two random samples drawn from two normal populations are:
Sample-I    20 16 26 27 22 23 18 24 19 25
Sample-II   27 33 42 35 32 34 38 28 41 43 30 37
Obtain the estimates of the variance of the populations and test at the 5% level of significance
whether the two populations have the same variance.
Testing the ratio of variance

Solution:
We want to test the null hypothesis,
$H_0$: The two samples are drawn from two populations having the same variance,
i.e. $\sigma_1^2 = \sigma_2^2$
vs. $H_1$: The two samples are drawn from two populations having different variances,
i.e. $\sigma_1^2 \ne \sigma_2^2$

Table for necessary calculation
$X_i$   $X_i - \bar{X}_i$   $(X_i - \bar{X}_i)^2$   $X_j$   $X_j - \bar{X}_j$   $(X_j - \bar{X}_j)^2$
20      -2                  4                       27      -8                  64
16      -6                  36                      42      7                   49
26      4                   16                      35      0                   0
27      5                   25                      32      -3                  9
22      0                   0                       34      -1                  1
23      1                   1                       38      3                   9
18      -4                  16                      28      -7                  49
24      2                   4                       41      6                   36
19      -3                  9                       43      8                   64
25      3                   9                       30      -5                  25
                                                    37      2                   4
                                                    33      -2                  4
$\sum X_i = 220$   $\sum (X_i - \bar{X}_i)^2 = 120$   $\sum X_j = 420$   $\sum (X_j - \bar{X}_j)^2 = 314$

Now,
$$\bar{X}_i = \frac{\sum X_i}{n_1} = \frac{220}{10} = 22$$
$$\bar{X}_j = \frac{\sum X_j}{n_2} = \frac{420}{12} = 35$$

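We know, the test statistic,
The statistic F is defined by the ratio
$$F = \frac{S_2^2}{S_1^2}$$
which follows the F-distribution with d.f. $(v_2 - 1, v_1 - 1)$, where $v_1 = 10$ and $v_2 = 12$ are the sample sizes.
Here,
$$S_1^2 = \frac{\sum (X_i - \bar{X}_i)^2}{v_1 - 1} = \frac{120}{9} = 13.33, \qquad S_2^2 = \frac{\sum (X_j - \bar{X}_j)^2}{v_2 - 1} = \frac{314}{11} = 28.55$$
$$F = \frac{28.55}{13.33} = 2.14$$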

Level of Significance:
Let us consider the level of significance 𝛼 = 5% = 0.05
Critical Value Or Expected Value:
At the 5% level of significance, the critical value of the F-distribution is $F_{v_2 - 1,\, v_1 - 1} = F_{11,\,9} \approx 3.10$
Comment:
Since $F_{cal} < F_{tab}$, we accept the null hypothesis at the 5% level of significance and conclude that the
two samples may be regarded as drawn from populations having the same variance.
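The arithmetic of Exercise-01 can be reproduced with Python's standard library as a cross-check. The sketch uses the ten Sample-I values that appear in the calculation table (including 19) and, as an assumed convention, places the larger variance estimate in the numerator. The function name is illustrative.

```python
from statistics import variance

sample_1 = [20, 16, 26, 27, 22, 23, 18, 24, 19, 25]
sample_2 = [27, 33, 42, 35, 32, 34, 38, 28, 41, 43, 30, 37]


def variance_ratio(a, b):
    """Return (F, numerator d.f., denominator d.f.) with the larger variance on top."""
    var_a = variance(a)  # unbiased estimate: sum of squares / (n - 1)
    var_b = variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1


f_cal, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"S1^2 = {variance(sample_1):.2f}, S2^2 = {variance(sample_2):.2f}")
print(f"F = {f_cal:.2f} with d.f. ({df_num}, {df_den})")
```

The calculated F is then compared with the tabulated critical value for the same pair of degrees of freedom.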

Exercise-02:
Two random samples drawn from normal populations are:
1st Sample 22 24 34 36 45 18
2nd Sample 27 28 33 24 47 17 16 20
Test whether the two populations have the same variance.
BBA Professional - 2016

Solution (Exercise-03):

$H_0$: There is no difference in the variance of yield of wheat,
i.e. $\sigma_1^2 = \sigma_2^2$
vs. $H_1$: There is a difference in the variance of yield of wheat,
i.e. $\sigma_1^2 \ne \sigma_2^2$

We know, the test statistic,
The statistic F is defined by the ratio
$$F_0 = \frac{S_1^2}{S_2^2}$$
which follows the F-distribution with d.f. $(v_1 - 1, v_2 - 1)$.

Level of Significance:
Let us consider the level of significance 𝛼 = 5% = 0.05
Critical Value Or Expected Value:
At the 5% level of significance, the critical value of the F-distribution is $F_{v_1 - 1,\, v_2 - 1} = F_{7,\,5} = 4.88$
Comment:
Since $F_{cal} < F_{tab}$, we accept the null hypothesis at the 5% level of significance and conclude that
there is no difference in the variances of yield of wheat.

Example-04: BBA Professional - 2004
By random sampling, one thousand families from Dhaka city were selected to test the belief
that high-income families usually send their children to private schools and low-income
families send their children to government schools. The following results were obtained:
Income    School                   Total
          Private    Government
Low       370        430           800
High      130        70            200
Total     500        500           1000
Test whether income and type of school are independent.

Solution:
$H_0$: Income and type of school are independent
$H_1$: Income and type of school are not independent
Test statistic:
We define our test statistic as follows,
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$
Where,
O refers to the observed frequencies
E refers to the expected frequencies.

The expected frequencies are computed as follows:
$$E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total Number of Observations}}$$
$$E_{11} = \frac{R_1 \times C_1}{N} = \frac{800 \times 500}{1000} = 400$$
$$E_{12} = \frac{R_1 \times C_2}{N} = \frac{800 \times 500}{1000} = 400$$
$$E_{21} = \frac{R_2 \times C_1}{N} = \frac{200 \times 500}{1000} = 100$$
$$E_{22} = \frac{R_2 \times C_2}{N} = \frac{200 \times 500}{1000} = 100$$
Income          School                                 Total
                Private ($C_1$)   Government ($C_2$)
Low ($R_1$)     370               430                  800
High ($R_2$)    130               70                   200
Total           500               500                  1000

Table for calculation of $\chi^2$
Observed Frequencies (O)   Expected Frequencies (E)   $(O - E)^2$   $(O - E)^2 / E$
370                        400                        900           2.25
430                        400                        900           2.25
130                        100                        900           9.00
70                         100                        900           9.00
Total                                                               $\sum (O - E)^2 / E = 22.50$
Now,
$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} = 22.50$$

Significance Level:
Let us consider, significance level, 𝛼 = 5% = 0.05
Critical Value:
At the 5% level of significance, the critical value of $\chi^2$ is 3.84, where d.f. = (2 − 1)(2 − 1) = 1
Comment:
Since the calculated value of $\chi^2 = 22.5$ is greater than the tabulated value of $\chi^2 = 3.84$, i.e.
$\chi^2_{cal} > \chi^2_{tab}$, we reject the null hypothesis at the 5% level of significance.
Therefore, we conclude that there is an association between income and type of school.
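The whole of Example-04 can be cross-checked with a short script. This is a sketch using only the figures given in the example. The function name is illustrative, and the critical value 3.84 is copied from the text, not computed.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, expected frequencies, d.f.) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E = (row total x column total) / total number of observations
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, df


# Rows: Low, High income. Columns: Private, Government school.
observed = [[370, 430], [130, 70]]
statistic, expected, df = chi_square_independence(observed)

CRITICAL_5_PERCENT_DF1 = 3.84  # tabulated value quoted in the text for 1 d.f.
reject_h0 = statistic > CRITICAL_5_PERCENT_DF1

print(f"chi-square = {statistic:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

The same function works for larger tables, provided the critical value is taken from the table row that matches the resulting degrees of freedom.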
