# Session 18

### Session 18

1. 28-08-2012

   **Session XVIII: Nonparametric methods**

   - Chi-square test for homogeneity/independence
   - Goodness-of-fit tests: chi-square and K-S

   **Example: education and readership**

   A newspaper turned to a survey firm to investigate whether there is any relationship between the level of education and the frequency of its readership.

   | Frequency of readership | Did not pass Xth grade | Passed only Xth grade | Graduate | Post-graduate or professional degree holder |
   |---|---|---|---|---|
   | Never | 22 | 12 | 8 | 3 |
   | Sometimes | 25 | 10 | 12 | 6 |
   | Almost always | 18 | 15 | 9 | 5 |

   Is there any significant evidence for dependence between the level of education and readership?

   **Chi-square test of independence**

   - Tests independence or homogeneity.
   - Useful to gauge dependence between qualitative or categorical variables.
   - Find the expected frequency of each cell under independence (the null hypothesis).
   - The expected frequency in each cell should be at least 5; otherwise, merge groups.

   Expected frequency in the (1,1) cell, if the variables are independent:

   $$f_e = n\,P[X = 1, Y = 1] = n\,P[X = 1]\,P[Y = 1] = n \times \frac{\text{first row total}}{n} \times \frac{\text{first column total}}{n} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$

   Test statistic:

   $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left(\sum \frac{f_o^2}{f_e}\right) - n$$

   which has a chi-square distribution with $(r - 1) \times (c - 1)$ d.f. under $H_0$, where $r$ and $c$ are the numbers of groups in the rows and columns respectively.

   **Degrees of freedom**

   - In general, the joint distribution has $(rc - 1)$ parameters (joint probabilities).
   - Under independence between the categorical variables, the joint distribution is completely specified by $(r - 1) + (c - 1)$ parameters (marginal probabilities).
   - So the "degrees of freedom" eaten by "independence" is $(rc - 1) - (r - 1) - (c - 1) = (r - 1)(c - 1)$.
   - This is indeed the degrees of freedom associated with the test (the chi-square distribution of the test statistic when $H_0$ is true).

   **Observed frequency table**

   | Freq. of readership | <X | X | Graduate | PG | Row total |
   |---|---|---|---|---|---|
   | Never | 22 | 12 | 8 | 3 | 45 |
   | Sometimes | 25 | 10 | 12 | 6 | 53 |
   | Almost always | 18 | 15 | 9 | 5 | 47 |
   | Column total | 65 | 37 | 29 | 14 | 145 |

   **Expected frequency table**

   | Freq. of readership | <X | X | Graduate | PG | Row total |
   |---|---|---|---|---|---|
   | Never | 20.17 | 11.48 | 9.00 | 4.34 | 45 |
   | Sometimes | 23.76 | 13.52 | 10.60 | 5.12 | 53 |
   | Almost always | 21.07 | 11.99 | 9.40 | 4.54 | 47 |
   | Column total | 65 | 37 | 29 | 14 | 145 |

   **Cell contributions $(f_o - f_e)^2 / f_e$**

   | Freq. of readership | <X | X | Graduate | PG |
   |---|---|---|---|---|
   | Never | 0.165576 | 0.023299 | 0.111111 | 0.416256 |
   | Sometimes | 0.064862 | 0.918325 | 0.184906 | 0.152282 |
   | Almost always | 0.447034 | 0.753886 | 0.017021 | 0.04705 |

   Test statistic 3.3016, d.f. 6, p-value 0.770151.

   **Example: India's one-day record at home**

   The following win-loss table shows India's performance against four top teams in one-day games played in India. Does it show any significant evidence that India does not play equally well against strong teams at home?

   | Opponent | Win | Loss |
   |---|---|---|
   | Australia | 9 | 8 |
   | Pakistan | 4 | 10 |
   | S. Africa | 9 | 6 |
   | W. Indies | 9 | 17 |
   | Total | 31 | 41 |

   $H_0: \pi_1 = \pi_2 = \pi_3 = \pi_4$

   The pooled estimate of the proportion of wins is 31/72. The expected number of wins at home against Australia (under $H_0$) is $17 \times 31/72$. Same argument in a new cap!
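The expected-frequency and test-statistic formulas on this slide can be checked with a short script. This is a minimal sketch, not the instructor's method: the helper names `chi2_sf` and `chi_square_table` are made up for illustration, and the p-value uses the standard closed-form series for a chi-square upper tail with integer degrees of freedom. The same function is applied to the win-loss table, since the homogeneity test uses the same arithmetic.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P[chi-square(df) > x] for integer df >= 1."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def chi_square_table(observed):
    """Return (expected table, statistic, d.f., p-value) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df, chi2_sf(stat, df)

# Readership by education (rows: Never, Sometimes, Almost always)
readership = [[22, 12, 8, 3], [25, 10, 12, 6], [18, 15, 9, 5]]
exp_r, stat_r, df_r, p_r = chi_square_table(readership)
print(f"readership: chi2 = {stat_r:.4f}, d.f. = {df_r}, p = {p_r:.6f}")

# India at home (rows: Australia, Pakistan, S. Africa, W. Indies; columns: win, loss)
cricket = [[9, 8], [4, 10], [9, 6], [9, 17]]
exp_c, stat_c, df_c, p_c = chi_square_table(cricket)
print(f"cricket: chi2 = {stat_c:.4f}, d.f. = {df_c}, p = {p_c:.4f}")
```

For the readership table the script should reproduce the slide's statistic, degrees of freedom and p-value. The slide gives no result for the cricket table, so that output is only an illustration of the same argument.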
2. **Validating distributional assumptions**

   Tests for goodness of fit:

   - Chi-square test
   - Kolmogorov-Smirnov test

   **Goodness-of-fit tests**

   - Used to check the validity of a distributional assumption (discrete or continuous). Examples:
     - population normal (to use t-tests)
     - breakdown in vacation (Poisson distribution)
   - Based on the difference between expected and observed (relative) frequencies.
   - Always a one-tailed test (reject for large values of the test statistic).

   **Chi-square goodness-of-fit test**

   Test statistic:

   $$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left(\sum \frac{f_o^2}{f_e}\right) - n$$

   which has a chi-square distribution with $(k - 1)$ d.f. under $H_0$, where $k$ is the number of cells.

   - The data need to be discrete or grouped.
   - The expected frequency in each cell must be at least 5.
   - If some of the parameters of the null hypothesized distribution are not specified, they may be estimated from the data. However, the d.f. need to be further reduced by the number of parameters being estimated.

   **Breakdown in vacation: revisited**

   Q. Is the Poisson distribution justified?

   | # breakdowns | # of months | prob | f-expected |
   |---|---|---|---|
   | 0 | 3 | 0.082 | 4.9251 |
   | 1 | 14 | 0.205 | 12.31275 |
   | 2 | 16 | 0.257 | 15.39094 |
   | 3 | 13 | 0.214 | 12.82578 |
   | 4 | 9 | 0.134 | 8.016113 |
   | 5 | 2 | 0.067 | 4.008057 |
   | 6 | 2 | 0.028 | 1.670024 |
   | 7 | 1 | 0.014 | 0.851239 |
   | Total | 60 | 1.000 | 60 |

   After merging the cells for 5 or more breakdowns:

   | x | f-observed ($f_o$) | prob | f-expected ($f_e$) | $(f_o - f_e)^2 / f_e$ |
   |---|---|---|---|---|
   | 0 | 3 | 0.082 | 4.9251 | 0.752474 |
   | 1 | 14 | 0.205 | 12.31275 | 0.231209 |
   | 2 | 16 | 0.257 | 15.39094 | 0.024102 |
   | 3 | 13 | 0.214 | 12.82578 | 0.002367 |
   | 4 | 9 | 0.134 | 8.016113 | 0.120761 |
   | >= 5 | 5 | 0.109 | 6.529319 | 0.358202 |
   | Total | 60 | 1.000 | 60 | 1.489115 |

   Degrees of freedom = 6 - 1 - 1 = 4; p-value 0.828568.

   Conclusion: yes, the Poisson distribution seems very reasonable.

   **Kolmogorov-Smirnov test**

   Test statistic: $\max |F_e - F_o|$

   [Figure: expected and observed cumulative distribution functions, F(x) plotted against x]

   - Based on comparing the (expected) cumulative probability with the (observed) cumulative relative frequency.
   - Cut-off values for the test statistic can be looked up from Appendix Table 8.
   - The data need not be grouped; for ungrouped data, the cumulative relative frequency needs to be considered only at the data points.
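The Poisson goodness-of-fit calculation on this slide can be reproduced in a few lines. This is a sketch under two assumptions not stated on the slide: the Poisson mean is estimated by the sample mean of the 60 months (which works out to 2.5 and matches the slide's probabilities), and the last cell collects everything from 5 breakdowns upward, including counts beyond 7. The variable names are illustrative only.

```python
import math

# Months with 0, 1, ..., 7 breakdowns (from the slide)
observed = [3, 14, 16, 13, 9, 2, 2, 1]
n = sum(observed)

# Assumption: estimate the Poisson mean from the data (one parameter estimated)
lam = sum(x * f for x, f in enumerate(observed)) / n

def poisson_pmf(x, mean):
    return math.exp(-mean) * mean ** x / math.factorial(x)

# Cells 0..4 plus a merged ">= 5" cell that takes the remaining probability
probs = [poisson_pmf(x, lam) for x in range(5)]
probs.append(1 - sum(probs))
f_obs = observed[:5] + [sum(observed[5:])]
f_exp = [n * p for p in probs]

stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(f_obs, f_exp))

# d.f. = cells - 1 - number of estimated parameters
df = len(f_obs) - 1 - 1

# Closed-form chi-square upper tail, valid only because df == 4 here
p_value = math.exp(-stat / 2) * (1 + stat / 2)

print(f"mean = {lam}, chi2 = {stat:.6f}, d.f. = {df}, p = {p_value:.6f}")
```

The extra `- 1` in the degrees of freedom is the reduction for the estimated mean, which is why the slide uses 4 rather than 5.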
3. **Re-do checking for Poisson distribution**

   | x | f-observed | F-observed | F-expected | difference |
   |---|---|---|---|---|
   | 0 | 3 | 0.05 | 0.082 | 0.032085 |
   | 1 | 14 | 0.283333 | 0.287 | 0.003964 |
   | 2 | 16 | 0.55 | 0.544 | 0.006187 |
   | 3 | 13 | 0.766667 | 0.758 | 0.009091 |
   | 4 | 9 | 0.916667 | 0.891 | 0.025489 |
   | 5 | 2 | 0.95 | 0.958 | 0.007979 |
   | 6 | 2 | 0.983333 | 0.986 | 0.002479 |
   | 7 | 1 | 1 | 0.996 | 0.004247 |
   | Total | 60 | | | TS = 0.032085 |

   At the 20% level of significance, the critical region is

   $$D_n > \frac{1.07}{\sqrt{60}} = 0.1381$$

   So we fail to reject $H_0$ at the 20% level of significance.

   **Comparison between chi-square and K-S as G.O.F. tests**

   - Both can be applied to grouped or ungrouped data; however, the natural choice is:
     - chi-square test for grouped data
     - K-S test for ungrouped data
   - K-S is applicable for small sample sizes also.
   - The chi-square test can also be applied:
     - for qualitative random variables
     - when parameters are not specified
   - The K-S test is a nonparametric procedure.
   - The chi-square test is more powerful.
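The K-S table above can be rebuilt by accumulating observed relative frequencies and Poisson probabilities side by side. This is a minimal sketch that assumes a Poisson mean of 2.5 (the sample mean of the breakdown data, which reproduces the slide's F-expected column) and takes the 1.07 constant for the 20% level directly from the slide. The variable names are illustrative.

```python
import math

# Months with 0, 1, ..., 7 breakdowns (from the slide)
observed = [3, 14, 16, 13, 9, 2, 2, 1]
n = sum(observed)
lam = 2.5  # assumed Poisson mean (sample mean of the data)

running_count = 0
cum_exp = 0.0
diffs = []
for x, f in enumerate(observed):
    running_count += f
    cum_exp += math.exp(-lam) * lam ** x / math.factorial(x)
    # |F_expected - F_observed| at each data point
    diffs.append(abs(cum_exp - running_count / n))

d_stat = max(diffs)               # K-S test statistic
critical = 1.07 / math.sqrt(n)    # 20% cut-off, constant taken from the slide
reject = d_stat > critical

print(f"D = {d_stat:.6f}, critical value = {critical:.4f}, reject H0: {reject}")
```

The largest gap occurs at x = 0, and it is well below the cut-off, which is the slide's "fail to reject" conclusion.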