1. 28-08-2012
Nonparametric methods
Chi-square Test for homogeneity/independence
Goodness of fit Tests: Chi-square & K-S
Session XVIII

A newspaper turned to a survey firm to investigate whether there is any
relationship between the level of education and the frequency of its readership.

                                      Level of Education
Frequency of      Did not pass   Passed only   Graduate   Post-graduate or
readership        Xth grade      Xth grade                professional degree holder
Never             22             12            8          3
Sometimes         25             10            12         6
Almost always     18             15            9          5

Is there any significant evidence for dependence between
the level of education and readership?
Chi-square test of Independence
• Independence or homogeneity
• Useful to gauge dependence between qualitative or categorical variables
• Find the expected frequency of each cell under independence (null hypothesis).
• Expected frequency in each cell should be at least 5; merge groups otherwise.

Expected frequency in the (1,1) cell:
$$f_e = n\,P[X=1,\,Y=1] = n\,P[X=1]\,P[Y=1] \ \text{(if indep.)} = n \times \frac{\text{first row total}}{n} \times \frac{\text{first column total}}{n}$$
In general,
$$f_e = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

Test statistic is
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left(\sum \frac{f_o^2}{f_e}\right) - n$$
which has a chi-square distribution with $(r-1)\times(c-1)$ d.f. under $H_0$,
where r and c are the no. of groups in row and column respectively.

Degrees of freedom
• Joint distribution has (rc-1) parameters (joint probabilities) in general.
• Under independence between categorical variables, the joint distribution is completely specified by (r-1) + (c-1) parameters (marginal probabilities).
• So the "degrees of freedom" eaten by "independence" is given by (rc-1)-(r-1)-(c-1) = (r-1)*(c-1).
• This is indeed the degrees of freedom associated with the test (chi-square distribution of the test statistic when H0 is true).
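As a rough illustration of the recipe above, the following Python sketch computes the expected frequencies, the test statistic and the degrees of freedom for the newspaper survey counts. The names `chi_square_independence` and `chi2_sf_even` are hypothetical helpers introduced here, not part of the lecture. The p-value helper relies on the closed-form tail of the chi-square distribution, which holds only for even d.f.; in practice a statistics library would normally be used instead.

```python
import math

# Observed counts from the newspaper survey.
# Rows: Never, Sometimes, Almost always.
# Columns: below Xth, Xth, graduate, post-graduate.
observed = [
    [22, 12, 8, 3],
    [25, 10, 12, 6],
    [18, 15, 9, 5],
]


def chi_square_independence(table):
    """Return (expected table, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # f_e = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (fo - fe) ** 2 / fe
        for row_o, row_e in zip(table, expected)
        for fo, fe in zip(row_o, row_e)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable; even df only (assumption)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    term, total = 1.0, 1.0
    for r in range(1, df // 2):
        term *= x / (2 * r)
        total += term
    return math.exp(-x / 2) * total


expected, stat, df = chi_square_independence(observed)
p_value = chi2_sf_even(stat, df)
print(round(stat, 4), df, round(p_value, 4))
```

With the survey counts this should give a statistic of about 3.30 on 6 d.f. and a p-value of about 0.77, so independence would not be rejected at the usual levels.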
Observed Frequency Table
                                  Level of Education
Freq. of readership     <X       X        graduate   PG       row-total
Never                   22       12       8          3        45
Sometimes               25       10       12         6        53
Almost always           18       15       9          5        47
column total            65       37       29         14       145

Expected Frequency Table
                                  Level of Education
Freq. of readership     <X       X        graduate   PG       row-total
Never                   20.17    11.48    9.00       4.34     45
Sometimes               23.76    13.52    10.60      5.12     53
Almost always           21.07    11.99    9.40       4.54     47
column total            65       37       29         14       145

Cell contributions (fo - fe)^2 / fe
                                  Level of Education
Freq. of readership     <X          X           graduate    PG
Never                   0.165576    0.023299    0.111111    0.416256
Sometimes               0.064862    0.918325    0.184906    0.152282
Almost always           0.447034    0.753886    0.017021    0.04705

test statistic   3.3016
d.f.             6
p-value          0.770151

The following win-loss table shows India's performance against
four top teams in one-day games played in India. Does it show any
significant evidence that India does not play equally well against strong
teams at home?

             Win    Loss
Australia    9      8
Pakistan     4      10
S.Africa     9      6
W.Indies     9      17
Total        31     41

H0: π1 = π2 = π3 = π4
Pooled estimate of proportion is 31/72.
Expected no. of wins at home against Aus (under H0) is 17*31/72.
Same argument in new cap!
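The win-loss question uses the same calculation: under H0 every team's expected wins are its number of games times the pooled proportion 31/72. A minimal sketch follows; the variable names are illustrative, and the p-value line uses the closed-form chi-square tail that is valid for 3 d.f. only.

```python
import math

# (wins, losses) for India against each team, copied from the win-loss table.
results = {
    "Australia": (9, 8),
    "Pakistan": (4, 10),
    "S.Africa": (9, 6),
    "W.Indies": (9, 17),
}

total_wins = sum(w for w, _ in results.values())
total_games = sum(w + l for w, l in results.values())
pooled = total_wins / total_games  # pooled win proportion under H0

stat = 0.0
expected_wins = {}
for team, (wins, losses) in results.items():
    games = wins + losses
    e_win = games * pooled      # e.g. 17 * 31 / 72 for Australia
    e_loss = games - e_win
    expected_wins[team] = e_win
    stat += (wins - e_win) ** 2 / e_win + (losses - e_loss) ** 2 / e_loss

# (r - 1) * (c - 1) with r = 4 teams and c = 2 outcomes
df = (len(results) - 1) * (2 - 1)

# Closed-form upper tail of the chi-square distribution for 3 d.f. (assumes df == 3).
p_value = (math.erfc(math.sqrt(stat / 2))
           + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2))

print(round(stat, 3), df, round(p_value, 3))
```

No result is quoted on the slide for this example, so the printed statistic and p-value are left to be judged against whatever significance level is chosen.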
Validating Distributional assumptions

Tests for Goodness of Fit:
• Chi-square Test
• Kolmogorov-Smirnov Test

Breakdown in vacation: re-visited
Q. Is Poisson distribution justified?

#breakdown   #of months   prob     f-expected
0            3            0.082    4.9251
1            14           0.205    12.31275
2            16           0.257    15.39094
3            13           0.214    12.82578
4            9            0.134    8.016113
5            2            0.067    4.008057
6            2            0.028    1.670024
7            1            0.014    0.851239
Total        60           1.000    60

x        f-observed (fo)   prob     f-expected (fe)   (fo-fe)^2/fe
0        3                 0.082    4.9251            0.752474
1        14                0.205    12.31275          0.231209
2        16                0.257    15.39094          0.024102
3        13                0.214    12.82578          0.002367
4        9                 0.134    8.016113          0.120761
>= 5     5                 0.109    6.529319          0.358202
Total    60                1.000    60                1.489115

p-value 0.828568
Degrees of freedom = 6-1-1 = 4
Conclusion: yes, Poisson distribution seems very reasonable

Chi-square goodness of fit test
Test statistic is
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left(\sum \frac{f_o^2}{f_e}\right) - n$$
which has a chi-square distribution with $(k-1)$ d.f. under $H_0$, where k is the no. of cells.
• The data need to be discrete or grouped
• Expected frequency in each cell must be at least 5
• If some of the parameters of the null hypothesized distribution are not specified, they may be estimated from the data. However the d.f. need to be further reduced by the no. of parameters being estimated
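The goodness-of-fit calculation for the breakdown data can be sketched as follows. This assumes the Poisson mean is estimated by the sample mean of the counts (which reproduces the tabulated probability 0.082 for zero breakdowns) and that the cells for 5 or more breakdowns are merged, as in the table. The p-value line uses a closed form that is valid for 4 d.f. only.

```python
import math

# Number of months observed with 0, 1, ..., 7 breakdowns.
counts = {0: 3, 1: 14, 2: 16, 3: 13, 4: 9, 5: 2, 6: 2, 7: 1}
n = sum(counts.values())

# Assumption: the Poisson mean is estimated from the data (one parameter).
lam = sum(x * f for x, f in counts.items()) / n


def poisson_pmf(x, lam):
    """Poisson probability of exactly x events."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


cutoff = 5  # cells x >= 5 are merged so expected counts are not too small
probs = [poisson_pmf(x, lam) for x in range(cutoff)]
probs.append(1 - sum(probs))  # P(X >= 5)

observed = [counts[x] for x in range(cutoff)]
observed.append(sum(f for x, f in counts.items() if x >= cutoff))
expected = [n * p for p in probs]

stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# k - 1 cells, minus 1 more for the estimated mean: 6 - 1 - 1 = 4
df = len(observed) - 1 - 1

# Closed-form chi-square upper tail, valid for df == 4 only.
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(round(lam, 2), round(stat, 4), df, round(p_value, 4))
```

The output should agree with the table: a statistic of about 1.489 on 4 d.f. and a p-value of about 0.83.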
Goodness of Fit Tests
• To check for the validity of distributional assumption (discrete or continuous)
  – EXAMPLES
    • Population normal (to use t-tests)
    • Breakdown in vacation (Poisson distribution)
• Based on the difference between expected and observed (relative) frequencies
• Always 1-tailed test (reject for large values of the TS)

Kolmogorov-Smirnov Test
[Figure: expected and observed cumulative distribution F(x) plotted against x]
Test statistic is $\max |F_e - F_o|$
• Based on comparing (expected) cumulative probability with (observed) cumulative rel. freq.
• Cut-off values for the T.S. can be looked up from Appendix Table 8
• The data need not be grouped; for ungrouped data, the cumulative relative frequency needs to be considered only at the data points
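A minimal sketch of the K-S statistic for the breakdown counts is given below. It assumes a Poisson mean of 2.5 (the sample mean of the counts; an assumption, since the value is not stated on these slides) and takes 1.07/√n as the 20% cut-off, which is the value the slides use for this example.

```python
import math

# Number of months observed with 0, 1, ..., 7 breakdowns.
counts = [3, 14, 16, 13, 9, 2, 2, 1]
n = sum(counts)
lam = 2.5  # assumed Poisson mean (sample mean of the counts)

cum_observed = 0.0  # observed cumulative relative frequency F_o
cum_expected = 0.0  # expected cumulative probability F_e
d_n = 0.0           # running maximum of |F_e - F_o|
for x, f in enumerate(counts):
    cum_observed += f / n
    cum_expected += math.exp(-lam) * lam ** x / math.factorial(x)
    d_n = max(d_n, abs(cum_expected - cum_observed))

# Large-sample cut-off at the 20% level, as used on the slides.
critical = 1.07 / math.sqrt(n)
reject = d_n > critical

print(round(d_n, 4), round(critical, 4), reject)
```

The largest gap should be about 0.032, at x = 0, well below the cut-off of about 0.138, so H0 is not rejected at the 20% level.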
Re-do checking for Poisson Distribution

x        f-observed   F-observed   F-expected   difference
0        3            0.05         0.082        0.032085
1        14           0.283333     0.287        0.003964
2        16           0.55         0.544        0.006187
3        13           0.766667     0.758        0.009091
4        9            0.916667     0.891        0.025489
5        2            0.95         0.958        0.007979
6        2            0.983333     0.986        0.002479
7        1            1            0.996        0.004247
Total    60                        TS           0.032085

At 20% level of significance, the C.R. is $D_n > \frac{1.07}{\sqrt{60}} = 0.1381$
So fail to reject $H_0$ at 20% level of significance

Comparison between Chi-square and K-S as G.O.F. Tests
• Both can be applied for grouped or ungrouped data; however, natural choice
  – Chi-square test for grouped data
  – K-S test for ungrouped data
• K-S is applicable for small sample size also
• Chi-square test can be also applied for
  – qualitative random variables
  – when parameters are not specified
• K-S test is a nonparametric procedure
• Chi-square test is more powerful