Lecture 07 Category Shaoqi Rao Rev

Published in: Business, Technology

  1. 1. <ul><li>Chapter 6 </li></ul><ul><li>Chi-Square Test for Categorical Variable </li></ul><ul><li>Shaoqi Rao, PhD </li></ul><ul><li>2009.11.9 </li></ul>Slides adapted from Dr. Zhang Jinxin’s
  2. 2. 6.1 Basic logic of the $\chi^2$ test <ul><li>Given a set of observed frequencies $A_1, A_2, A_3, \ldots$, we test whether the data follow a certain theory. </li></ul><ul><li>If the theory is true, it implies a set of theoretical frequencies $T_1, T_2, T_3, \ldots$ </li></ul><ul><li>Compare $A_1, A_2, A_3, \ldots$ with $T_1, T_2, T_3, \ldots$: if they are quite different, the theory might not be true; otherwise, the theory is acceptable. </li></ul>
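  3. 3. 6.1.1 Chi-square distribution <ul><li>$\chi^2 = \sum_i \frac{(A_i - T_i)^2}{T_i}$ approximately follows a $\chi^2$ distribution. </li></ul><ul><li>It measures the agreement between observed and expected frequencies. </li></ul>DF = k - 1 - (number of parameters estimated to obtain $f_i$). For a contingency table, DF = (number of rows - 1)(number of columns - 1).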
  4. 4. $\chi^2$ distribution
  5. 5. 6.1.2 $\chi^2$ Test for Goodness of Fit (Large Sample)
     Table 1 Frequency distribution and goodness of fit based on 136 measurements of a phantom

     Interval   A    Φ(X1)     Φ(X2)     P(X)      T = n*P(X)   (A-T)^2/T
     1.228-      2   0.00069   0.00466   0.00397    0.5405      3.94143
     1.234-      2   0.00466   0.02275   0.01809    2.4601      0.08605
     1.240-      7   0.02275   0.08076   0.05801    7.8889      0.10016
     1.246-     17   0.08076   0.21186   0.13110   17.8294      0.03859
     1.252-     25   0.21186   0.42074   0.20888   28.4083      0.40892
     1.258-     37   0.42074   0.65542   0.23468   31.9167      0.80961
     1.264-     25   0.65542   0.84134   0.18592   25.2855      0.00322
     1.270-     16   0.84134   0.94520   0.10386   14.1244      0.24906
     1.276-      4   0.94520   0.98610   0.04090    5.5618      0.43858
     1.282-      1   0.98610   0.99744   0.01135    1.5434      0.19130
     Total       -   -         -         -          -           6.26692
  6. 6. <ul><li>1. Setting up hypotheses: </li></ul><ul><li>$H_0$: the population follows $N(1.26, 0.01^2)$ </li></ul><ul><li>$H_1$: the population does not follow $N(1.26, 0.01^2)$; $\alpha = 0.05$ </li></ul><ul><li>2. Calculation of the statistic: $\chi^2 = 6.27$ (the total given in Table 1) </li></ul><ul><li>3. P-value: $\nu = k - 1 - 2 = 10 - 1 - 2 = 7$; since $\chi^2 = 6.27 < \chi^2_{0.05,7} = 14.07$, $P > 0.05$ </li></ul><ul><li>4. Conclusion: at significance level $\alpha = 0.05$, $H_0$ is not rejected. The measurements are consistent with the normal distribution. </li></ul>
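The four steps above can be retraced numerically. The following is a minimal sketch, assuming the class limits implied by Table 1 (lower limits starting at 1.228 with width 0.006) and taking the critical value 14.067 from a standard chi-square table; the variable names are illustrative only.

```python
from statistics import NormalDist

# Observed counts A from Table 1; lower limits start at 1.228, class width 0.006
observed = [2, 2, 7, 17, 25, 37, 25, 16, 4, 1]
lower, width = 1.228, 0.006
n = sum(observed)                        # 136 measurements
null = NormalDist(mu=1.26, sigma=0.01)   # distribution stated in H0

chi2 = 0.0
for i, a in enumerate(observed):
    x1 = lower + i * width
    x2 = x1 + width
    t = n * (null.cdf(x2) - null.cdf(x1))   # theoretical frequency T = n * P(X)
    chi2 += (a - t) ** 2 / t

df = len(observed) - 1 - 2   # k - 1 - 2, as on the slide
critical = 14.067            # table value for alpha = 0.05, df = 7 (assumed input)
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 > critical}")
```

Because the table rounds each probability to five decimals, the computed statistic may differ from the tabulated total in the last digits.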
  7. 7. <ul><li>6.2 Comparison between Two Independent Sample Proportions </li></ul><ul><li>In Chapter 4, the Z test can only be used for comparing $\pi$ with a given $\pi_0$ (one sample) or comparing $\pi_1$ with $\pi_2$ (two samples). </li></ul><ul><li>To compare more than two samples, the chi-square test is widely used. </li></ul>
  8. 8. Example 6.1 <ul><li>In a clinical survey, 215 patients with pulmonary heart disease were recruited in a hospital, of whom 164 had taken digitalis and 51 had not. Each patient received an ECG examination. The results are listed in Table 6.2. </li></ul>
  9. 11. ν = 1
  10. 12. $\chi^2$ test and Z test <ul><li>According to (4.25) </li></ul>
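The link between the two tests can be checked numerically: for a 2×2 table, the Pearson $\chi^2$ statistic equals the square of the two-sample Z statistic computed with a pooled proportion. The sketch below assumes that formula (4.25) is this pooled Z test, and it uses hypothetical counts, not the data of Example 6.1.

```python
import math

# Hypothetical 2x2 counts (NOT the data of Example 6.1):
# rows = two samples, columns = (event, no event)
a, b = 30, 70
c, d = 45, 55
n1, n2 = a + b, c + d
n = n1 + n2

# Two-sample Z test with a pooled proportion
p1, p2 = a / n1, c / n2
pc = (a + c) / n
z = (p1 - p2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))

# Pearson chi-square for the same table (shortcut formula for a 2x2 table)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"Z^2 = {z ** 2:.4f}, chi2 = {chi2:.4f}")
```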
  11. 13. Correction for continuity <ul><li>When $n \ge 40$ and some expected frequency satisfies $1 \le e_{ij} < 5$, a continuity correction is applied to the $\chi^2$ statistic. </li></ul>
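The slide's formula is not preserved in this transcript. A common form of the correction (Yates) subtracts 0.5 from each absolute difference before squaring; the sketch below assumes that form, and the function name and counts are hypothetical.

```python
def yates_chi2(a, b, c, d):
    """Continuity-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n   # expected frequency e_ij
            chi2 += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return chi2

# Hypothetical counts: n = 50 and the smallest expected frequency is 4.8
print(round(yates_chi2(2, 18, 10, 20), 3))
```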
  12. 14. Fisher’s exact test <ul><li>When $n < 40$ or some $e_{ij} < 1$, the $\chi^2$ test is not appropriate. Fisher’s exact test gives an exact P value on which to base the conclusion. </li></ul><ul><li>This is readily done in SPSS. </li></ul>
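For readers without SPSS, the exact P value for a 2×2 table can be sketched directly from the hypergeometric distribution. This assumes one common two-sided definition (summing the probabilities of all tables no more likely than the observed one); the function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum over all tables that are no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

print(fisher_exact_two_sided(3, 1, 1, 3))
```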
  13. 15. Example 6.9
  14. 16. Statistical description
  15. 17. Statistical inference
  16. 18. 6.3 The $\chi^2$ Tests for Binary Variable under a Paired Design <ul><li>Example 6.2 There are 260 serum samples. Each sample is divided into two portions, which are tested by two different immunological methods for rheumatoid factor. The results are listed in Table 6.4. The question is whether the results of the two methods are independent. </li></ul>
  17. 19. Test for independence between two binary variables (Example 6.2): $\chi^2 = 173.74$; 12/80 = 15%, 172/180 = 95.6%
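Table 6.4 itself is not reproduced in this transcript, but the figures on the slide (172 of 180, 12 of 80, 260 samples in total) determine a 2×2 table. The sketch below assumes that reconstruction and recomputes the Pearson statistic from it.

```python
# Counts reconstructed from the slide's figures (an assumption):
# 172 of 180 and 12 of 80 positive, 260 samples in total
table = [[172, 8],
         [12, 68]]

n = sum(map(sum, table))
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]

chi2 = 0.0
for i in range(2):
    for j in range(2):
        t = row_tot[i] * col_tot[j] / n   # theoretical frequency
        chi2 += (table[i][j] - t) ** 2 / t

print(f"n = {n}, chi2 = {chi2:.2f}")
```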
  18. 20. 6.3.2 Comparison between two sample proportions <ul><li>McNemar test </li></ul>$\chi^2 = \frac{(b - c)^2}{b + c}$, $\nu = 1$
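  19. 21. <ul><li>$H_0$: $\pi_1 = \pi_2$, $H_1$: $\pi_1 \ne \pi_2$, $\alpha = 0.05$ </li></ul><ul><li>When $H_0$ is true, the discordant counts $b$ and $c$ are each expected to equal $(b + c)/2$. </li></ul><ul><li>For a large sample ($b + c > 40$), the McNemar statistic approximately follows a $\chi^2$ distribution with $\nu = 1$. </li></ul><ul><li>If $\chi^2 > \chi^2_{0.05}$, then reject $H_0$. </li></ul>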
  20. 22. The Probability Expressions
      $H_0$: $\pi_{c1} = \pi_{r1}$, $H_1$: $\pi_{c1} \ne \pi_{r1}$
      Since $\pi_{c1} = \pi_{11} + \pi_{21}$ and $\pi_{r1} = \pi_{11} + \pi_{12}$, this test becomes:
      $H_0$: $\pi_{12} = \pi_{21}$, $H_1$: $\pi_{12} \ne \pi_{21}$

                  Trt B +     Trt B -     Total
      Trt A +     π11 (a)     π12 (b)     πr1
      Trt A -     π21 (c)     π22 (d)     πr2
      Total       πc1         πc2         1.0
  21. 23. Correction to McNemar test ($f_{12} + f_{21} < 40$): $\chi^2 = \frac{(|f_{12} - f_{21}| - 1)^2}{f_{12} + f_{21}} = 0.45$
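The McNemar statistic depends only on the two discordant counts. The sketch below assumes the usual uncorrected and continuity-corrected forms; the discordant counts 8 and 12 are inferred from the figures quoted for Example 6.2 rather than read from Table 6.4, and the critical value 3.841 comes from a standard chi-square table.

```python
def mcnemar_chi2(b, c):
    """McNemar statistic from the two discordant counts b and c (df = 1)."""
    if b + c < 40:
        # Few discordant pairs: apply the continuity correction
        return (abs(b - c) - 1) ** 2 / (b + c)
    return (b - c) ** 2 / (b + c)

# Discordant counts inferred for Example 6.2 (assumption: 8 and 12)
chi2 = mcnemar_chi2(8, 12)
critical = 3.841   # table value for alpha = 0.05, df = 1 (assumed input)
print(f"chi2 = {chi2:.2f}, reject H0: {chi2 > critical}")
```

With these counts the corrected statistic reproduces the 0.45 shown on the slide, so $H_0$ is not rejected at $\alpha = 0.05$.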
  22. 24. 6.4 The $\chi^2$ Test for R×C Contingency Table
  23. 25. The statistic for hypothesis test: $\chi^2 = \sum \frac{(A - T)^2}{T}$, $\nu = (R - 1)(C - 1)$; critical value $\chi^2_{0.05,4} = 9.488$
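A minimal sketch of the R×C computation follows, assuming the Pearson form of the statistic with theoretical frequencies taken from the row and column totals. The function name and the 3×3 table are hypothetical and are not taken from the lecture.

```python
def chi2_rxc(table):
    """Pearson chi-square statistic and degrees of freedom for an R x C table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, a in enumerate(row):
            t = row_tot[i] * col_tot[j] / n   # theoretical frequency
            chi2 += (a - t) ** 2 / t
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 3x3 table, so df = 4 and the 0.05 critical value is 9.488
chi2, df = chi2_rxc([[20, 30, 50], [30, 30, 40], [50, 30, 20]])
print(f"chi2 = {chi2:.3f}, df = {df}, exceeds 9.488: {chi2 > 9.488}")
```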
  24. 26. 6.4.2 Multiple comparison for R×C Table
      Group          +    -
      I              …    …
      II             …    …
      III            …    …
      IV             …    …
      V              …    …
      VI (control)   …    …
  25. 27. 6.4.3 Measurement of association for R×C table
  26. 28. Pearson contingency coefficient
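The slide's formula is not preserved in this transcript. The usual definition is $C = \sqrt{\chi^2 / (\chi^2 + n)}$, and the sketch below assumes it; the function name is hypothetical, and the inputs are the figures quoted for Example 6.2.

```python
import math

def contingency_coefficient(chi2, n):
    """Pearson contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi2 / (chi2 + n))

# Figures quoted for Example 6.2: chi2 = 173.74 with n = 260
print(round(contingency_coefficient(173.74, 260), 3))
```

C is 0 when the two variables are independent and stays below 1 however strong the association.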
  27. 29. <ul><li>Prerequisites for the $\chi^2$ test </li></ul><ul><li>As a rule of thumb: </li></ul><ul><li>the theoretical frequencies should be greater than 5 in more than 4/5 of the cells; </li></ul><ul><li>the theoretical frequency in every cell should be greater than 1. </li></ul><ul><li>Otherwise, Fisher’s exact test should be used. </li></ul>
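The rule of thumb above can be written as a small check on the theoretical frequencies. This is a sketch under the thresholds stated on the slide; the function name and the example tables are hypothetical.

```python
def chi2_prerequisites_ok(table):
    """Rule of thumb: T > 5 in more than 4/5 of the cells and T > 1 in every cell."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    expected = [r * c / n for r in row_tot for c in col_tot]
    share_above_5 = sum(t > 5 for t in expected) / len(expected)
    return share_above_5 > 0.8 and min(expected) > 1

print(chi2_prerequisites_ok([[20, 30], [30, 20]]))   # all theoretical frequencies are 25
print(chi2_prerequisites_ok([[3, 2], [1, 4]]))       # n = 10, every T is below 5
```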
