
Session 18


  1. 28-08-2012

     Session XVIII: Nonparametric methods
       • Chi-square test for homogeneity/independence
       • Goodness-of-fit tests: chi-square and K-S

     Example: newspaper readership

     A newspaper turned to a survey firm to investigate whether there is any relationship between the level of education and the frequency of its readership. Is there any significant evidence of dependence between the level of education and readership?

     Chi-square test of independence
       • Tests independence or homogeneity.
       • Useful to gauge dependence between qualitative or categorical variables.
       • Find the expected frequency of each cell under independence (the null hypothesis).
       • The expected frequency in each cell should be at least 5; otherwise merge groups.
       • Expected frequency in the (1,1) cell: $f_e = n\,P[X=1, Y=1] = n\,P[X=1]\,P[Y=1]$ (if independent) $= n \times \frac{\text{first row total}}{n} \times \frac{\text{first column total}}{n} = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$.
       • Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left( \sum \frac{f_o^2}{f_e} \right) - n$, which has a chi-square distribution with $(r-1) \times (c-1)$ d.f. under $H_0$, where $r$ and $c$ are the numbers of groups in the rows and columns respectively.

     Degrees of freedom
       • In general, the joint distribution has $(rc - 1)$ parameters (joint probabilities).
       • Under independence between the categorical variables, the joint distribution is completely specified by $(r-1) + (c-1)$ parameters (marginal probabilities).
       • So the "degrees of freedom" eaten by "independence" is $(rc-1) - (r-1) - (c-1) = (r-1)(c-1)$.
       • This is indeed the degrees of freedom associated with the test (the chi-square distribution of the test statistic when $H_0$ is true).

     Level of education: Below X = did not pass Xth grade; X = passed only Xth grade; Graduate; PG = post-graduate or professional degree holder.

     Observed frequency table

       Freq. of readership   Below X      X   Graduate     PG   Row total
       Never                      22     12          8      3          45
       Sometimes                  25     10         12      6          53
       Almost always              18     15          9      5          47
       Column total               65     37         29     14         145

     Expected frequency table

       Freq. of readership   Below X      X   Graduate     PG   Row total
       Never                   20.17  11.48       9.00   4.34          45
       Sometimes               23.76  13.52      10.60   5.12          53
       Almost always           21.07  11.99       9.40   4.54          47
       Column total               65     37         29     14         145

     Contribution of each cell, $(f_o - f_e)^2 / f_e$

       Freq. of readership    Below X          X   Graduate         PG
       Never                 0.165576   0.023299   0.111111   0.416256
       Sometimes             0.064862   0.918325   0.184906   0.152282
       Almost always         0.447034   0.753886   0.017021   0.047050

       Test statistic   3.3016
       d.f.             6
       p-value          0.770151

     Example: India at home

     The following win-loss table shows India's performance against four top teams in one-day games played in India. Does it show any significant evidence that India does not play equally well against strong teams at home?

       Opponent     Win   Loss
       Australia      9      8
       Pakistan       4     10
       S. Africa      9      6
       W. Indies      9     17
       Total         31     41

     $H_0: \pi_1 = \pi_2 = \pi_3 = \pi_4$, where $\pi_i$ is the proportion of wins against team $i$.

     The pooled estimate of the proportion is $31/72$. The expected number of wins at home against Australia (under $H_0$) is $17 \times 31/72$. Same argument in a new cap!
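     The independence test on the readership table can be reproduced with a short script. This is a minimal sketch using only the Python standard library, not the method used to prepare the slides. The function names `chi_square_independence` and `chi2_sf_even_df` are illustrative, and the p-value helper relies on a closed form that holds only for even degrees of freedom (here 6).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def chi_square_independence(observed):
    """Return (statistic, d.f., expected table) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # f_e = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Rows: Never, Sometimes, Almost always; columns: Below X, X, Graduate, PG
readership = [[22, 12, 8, 3], [25, 10, 12, 6], [18, 15, 9, 5]]
stat, df, expected = chi_square_independence(readership)
p_value = chi2_sf_even_df(stat, df)
print(f"chi-square = {stat:.4f}, d.f. = {df}, p-value = {p_value:.6f}")
```

     The same function accepts the 4 x 2 win-loss table, but that test has 3 d.f., so the even-d.f. p-value helper does not apply there.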
  2. Validating distributional assumptions

     Tests for goodness of fit:
       • Chi-square test
       • Kolmogorov-Smirnov test

     Goodness-of-fit tests
       • Check the validity of a distributional assumption (discrete or continuous). Examples:
           • Population normal (to use t-tests)
           • Breakdown in vacation (Poisson distribution)
       • Based on the difference between expected and observed (relative) frequencies.
       • Always a 1-tailed test (reject for large values of the test statistic).

     Chi-square goodness-of-fit test

     The test statistic is $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \left( \sum \frac{f_o^2}{f_e} \right) - n$, which has a chi-square distribution with $(k-1)$ d.f. under $H_0$, where $k$ is the number of cells.
       • The data need to be discrete or grouped.
       • The expected frequency in each cell must be at least 5.
       • If some of the parameters of the null hypothesized distribution are not specified, they may be estimated from the data. However, the d.f. then need to be further reduced by the number of parameters being estimated.

     Breakdown in vacation: re-visited

     Q. Is the Poisson distribution justified?

       # breakdowns   # of months    prob   f-expected
       0                        3   0.082     4.9251
       1                       14   0.205    12.31275
       2                       16   0.257    15.39094
       3                       13   0.214    12.82578
       4                        9   0.134     8.016113
       5                        2   0.067     4.008057
       6                        2   0.028     1.670024
       7                        1   0.014     0.851239
       Total                   60   1.000    60

     After merging the cells for 5 or more breakdowns:

       x       f-observed ($f_o$)    prob   f-expected ($f_e$)   $(f_o - f_e)^2 / f_e$
       0                        3   0.082     4.9251             0.752474
       1                       14   0.205    12.31275            0.231209
       2                       16   0.257    15.39094            0.024102
       3                       13   0.214    12.82578            0.002367
       4                        9   0.134     8.016113           0.120761
       >= 5                     5   0.109     6.529319           0.358202
       Total                   60   1.000    60                  1.489115

       p-value              0.828568
       Degrees of freedom   6 - 1 - 1 = 4

     Conclusion: yes, the Poisson distribution seems very reasonable.

     Kolmogorov-Smirnov test

     The test statistic is $\max |F_e - F_o|$.

     [Figure: expected and observed cumulative distribution functions $F(x)$ plotted against $x$]

       • Based on comparing the (expected) cumulative probability with the (observed) cumulative relative frequency.
       • Cut-off values for the test statistic can be looked up in Appendix Table 8.
       • The data need not be grouped; for ungrouped data, the cumulative relative frequency needs to be considered only at the data points.
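     The chi-square goodness-of-fit calculation for the breakdown data can be sketched as follows. This is an illustration under stated assumptions, not the original worksheet: the Poisson mean is estimated from the data as the sample mean (which is why one extra degree of freedom is subtracted), cells for 5 or more breakdowns are merged as in the table, and `chi2_sf_even_df` is an illustrative helper valid only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

# Number of breakdowns -> number of months with that many breakdowns
months = {0: 3, 1: 14, 2: 16, 3: 13, 4: 9, 5: 2, 6: 2, 7: 1}
n = sum(months.values())
lam = sum(k * f for k, f in months.items()) / n  # estimated Poisson mean

cut = 5  # merge all counts >= 5 into a single cell
observed = [months[k] for k in range(cut)]
observed.append(sum(f for k, f in months.items() if k >= cut))
probs = [poisson_pmf(k, lam) for k in range(cut)]
probs.append(1.0 - sum(probs))  # P[X >= 5]
expected = [n * p for p in probs]

stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1 - 1  # k - 1, minus one estimated parameter
p_value = chi2_sf_even_df(stat, df)
print(f"lambda = {lam}, chi-square = {stat:.6f}, d.f. = {df}, p-value = {p_value:.6f}")
```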
  3. Re-do: checking for the Poisson distribution (K-S test)

       x       f-observed   F-observed   F-expected   difference
       0                3     0.05         0.082       0.032085
       1               14     0.283333     0.287       0.003964
       2               16     0.55         0.544       0.006187
       3               13     0.766667     0.758       0.009091
       4                9     0.916667     0.891       0.025489
       5                2     0.95         0.958       0.007979
       6                2     0.983333     0.986       0.002479
       7                1     1            0.996       0.004247
       Total           60                  TS          0.032085

     At the 20% level of significance, the critical region is $D_n > \frac{1.07}{\sqrt{60}} = 0.1381$. So we fail to reject $H_0$ at the 20% level of significance.

     Comparison between chi-square and K-S as goodness-of-fit tests
       • Both can be applied to grouped or ungrouped data; however, the natural choice is:
           • Chi-square test for grouped data
           • K-S test for ungrouped data
       • K-S is applicable for small sample sizes also.
       • The chi-square test can also be applied:
           • to qualitative random variables
           • when parameters are not specified
       • The K-S test is a nonparametric procedure.
       • The chi-square test is more powerful.
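     The K-S calculation for the breakdown data can be sketched in a few lines. This is a minimal illustration that mirrors the table: it compares the cumulative relative frequency with the Poisson cumulative probability at each observed value only. The Poisson mean is assumed to be the sample mean, and the constant 1.07 for the 20% level is taken from the slide rather than derived.

```python
import math

def poisson_cdf(k, lam):
    """P[X <= k] for a Poisson random variable with mean lam."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# Number of breakdowns -> number of months with that many breakdowns
months = {0: 3, 1: 14, 2: 16, 3: 13, 4: 9, 5: 2, 6: 2, 7: 1}
n = sum(months.values())
lam = sum(k * f for k, f in months.items()) / n  # assumed: sample mean as Poisson mean

cumulative = 0
d_n = 0.0
for k in sorted(months):
    cumulative += months[k]
    f_observed = cumulative / n        # observed cumulative relative frequency
    f_expected = poisson_cdf(k, lam)   # expected cumulative probability
    d_n = max(d_n, abs(f_expected - f_observed))

critical = 1.07 / math.sqrt(n)  # 20% level cut-off, constant as quoted on the slide
reject = d_n > critical
print(f"D_n = {d_n:.6f}, critical value = {critical:.4f}, reject H0: {reject}")
```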
