Stats chapter 14


Published in: Technology, Business

  1. 1. Chapter 14<br />Inference for Distributions of Categorical Variables: <br />Chi-Square Procedures<br />
  2. 2. 14.1 Test for Goodness of Fit<br />
  3. 3. The problem<br />Suppose we open a bag of M&M’s and count the number of M&M’s of each color.<br />How would we know if our color counts are at normal levels?<br />How would we know if our color counts were abnormal?<br />
  4. 4. Chi-Square Distribution<br />When we want to test the proportions of many counts (e.g. a two-way table or an array), we need to use a new distribution:<br />The Chi-Square Distribution (Chi = χ = “KAI”)<br />As you might suspect, this is another (the last of the year) PHANTOMS procedure.<br />The χ² distribution is found in Table D and in the [2nd] -> [VARS] (DIST) menu on your calculator.<br />
  6. 6. The 2 distribution<br />Like the t-distribution, the 2 distribution is variable. i.e. the distribution also has degrees of freedom.<br />It is single peaked, right skewed.<br />As the df increases, the peak decreases in height, moves to the right and becomes more symmetric/Normal.<br />As df increases, the 2 statistic needed for statistically significant results also increases<br />
  7. 7. The 2 distribution<br />
  8. 8. Chi-Square Goodness of Fit<br />When we want to check whether a distribution fits a hypothesized distribution, we use the “χ² goodness of fit test.”<br />This procedure is frequently used to see whether a distribution is not in equal proportions.<br />No, this will not be much different from what we have already been doing for the last 3 chapters.<br />
  9. 9. χ² GOF Test<br />Parameter<br />Unlike previous tests, you will not need to state a μ or a p.<br />You need to state where the distribution comes from.<br />Ex: We are investigating the color proportions of M&M’s in all 15 oz. bags of chocolate M&M’s.<br />
  10. 10. χ² GOF Test<br />Hypotheses<br />There are two styles for stating hypotheses.<br />Style 1<br />In this style, you refer to a written table, or state that all proportions are “equal.”<br />H0: the proportions of M&M’s colors are the same as in the table provided<br />Ha: at least one color proportion is different from the table<br />H0: the proportions of accidents for each day are equal<br />Ha: at least one day has a proportion that is not equal<br />
  11. 11. χ² GOF Test<br />Hypotheses (cont.)<br />Style 2<br />In this style, you write out the expected proportions.<br />H0: p_red = p_blue = p_yel = p_brn = p_grn = p_org = 1/6<br />Ha: at least one proportion is different from that stated above.<br />
  12. 12. χ² GOF Test<br />Hypotheses (cont)<br />Notice that the alternative hypothesis in each case is that at least one proportion is different from what was hypothesized.<br />
  13. 13. χ² GOF Test<br />Assumptions<br />1. All expected cell counts are greater than 1<br />2. No more than 20% of the expected cell counts are less than 5<br />(that’s a whole lot easier, yeah?)<br />Name of the Test<br />“χ² Goodness of Fit Test”<br />
  14. 14. χ² GOF Test<br />Test Statistic<br />Observed Count (O) is the count that we observed for each cell.<br />The sum of all the observed counts is ‘n’.<br />Expected Count (E) is the hypothesized proportion for each cell times the sample size ‘n’.<br />
  15. 15. χ² GOF Test<br />Test Statistic (cont)<br />If we opened up a bag of M&M’s and found the following counts:<br />Color: Red, Blue, Brown, Yellow, Green, Orange<br />O: 5, 3, 10, 6, 4, 3 (n = 31)<br />E: 5.17, 5.17, 5.17, 5.17, 5.17, 5.17<br />Note: expected counts are all equal to 31/6.<br />We are testing to see if M&M’s come in equal proportions.<br />
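
The expected counts and the two conditions from the Assumptions slide can be checked with a few lines of Python. This is only a sketch: the function name `conditions_met` is made up here, and the counts are the ones from the M&M's bag above.

```python
# Observed M&M's counts from the slide (Red, Blue, Brown, Yellow, Green, Orange).
observed = [5, 3, 10, 6, 4, 3]
n = sum(observed)  # 31

# Hypothesized proportions under H0: all six colors equally likely.
hypothesized = [1 / 6] * 6

# Expected count for each cell = n * hypothesized proportion.
expected = [n * p for p in hypothesized]


def conditions_met(expected_counts):
    """Check the two conditions on the expected counts (hypothetical helper)."""
    all_above_one = all(e > 1 for e in expected_counts)
    share_below_five = sum(e < 5 for e in expected_counts) / len(expected_counts)
    return all_above_one and share_below_five <= 0.20


print([round(e, 2) for e in expected])  # each expected count is 31/6
print(conditions_met(expected))
```

Here every expected count is 31/6, so no cell falls below 5 and both conditions hold.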
  16. 16. χ² GOF Test<br />Test Statistic (cont)<br />The test statistic is χ² (“kai squared”):<br />χ² = Σ (O – E)² / E<br />Degrees of freedom (df) = # of classes – 1<br />
  17. 17. χ² GOF Test<br />Test Statistic (cont.)<br />For the M&M’s counts: χ² = 6.739 with df = 6 – 1 = 5<br />
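
As a rough cross-check of the hand calculation, the statistic can be computed as the sum of (O – E)² / E over the six colors. The sketch below uses unrounded expected counts (31/6), so it gives about 6.74 rather than exactly the 6.739 quoted on the slides; the small gap likely comes from rounding along the way.

```python
# Observed counts from the M&M's example (Red, Blue, Brown, Yellow, Green, Orange).
observed = [5, 3, 10, 6, 4, 3]
n = sum(observed)

# Under H0 every color has proportion 1/6, so each expected count is n / 6.
expected = [n / 6] * len(observed)

# Chi-square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of classes - 1.
df = len(observed) - 1

print(round(chi_square, 2), df)
```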
  21. 21. χ² GOF Test<br />P Value<br />p val = P(χ²(df) > test statistic)<br />On the calculator: [2nd] -> [VARS] (DIST) -> χ²cdf<br />Usage: “χ²cdf(lower, upper, df)”<br />pval = P(χ²(5) > 6.739)<br />pval = 0.2409<br />
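
Away from the calculator, the same upper-tail area can be approximated with only the Python standard library. The sketch below is not what the TI does internally; it assumes the standard series for the regularized lower incomplete gamma function, which works well for the moderate statistics seen in class problems. The function name `chi_square_upper_tail` is invented for illustration.

```python
import math


def chi_square_upper_tail(statistic, df):
    """Approximate P(chi-square with `df` degrees of freedom > statistic).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x) with a = df / 2 and x = statistic / 2, then
    returns 1 - P(a, x).
    """
    if statistic <= 0:
        return 1.0
    a = df / 2
    x = statistic / 2
    term = 1.0 / a          # first term of the series
    total = term
    k = a
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= x / k       # next term: multiply by x / (a + n)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower


# Statistic and df from the M&M's example.
p_value = chi_square_upper_tail(6.739, 5)
print(round(p_value, 4))
```

The result should agree with the calculator's 0.2409 to about three decimal places.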
  22. 22. χ² GOF Test<br />Decision<br />As with the other tests, reject the null hypothesis when the p-value is below the chosen significance level (alpha).<br />Summary<br />Use the same 3 part summary:<br />1) Interpret the p value w.r.t. the sampling distribution<br />2) Make a decision with reference to an alpha level<br />3) Summarize the results in context of the problem<br />
  23. 23. χ² GOF Test<br />Summary (cont.)<br />“If the colors really came in equal proportions, approximately 24% of all random samples of 31 would give counts at least as far from the expected counts as ours.”<br />“Because this p value is greater than any acceptable alpha level, we fail to reject the null hypothesis.”<br />“We do not have sufficient evidence to conclude that the colors of M&M’s are not equally distributed.”<br />
  32. 32. Calculator methods<br />TI83/84<br />Begin by storing the observed counts in “L1”<br />Store the expected counts in “L2”<br />From the Home Screen evaluate: “sum((L1 – L2)²/L2)”<br />This is the value of χ².<br />Use χ²cdf from the “DIST” menu to find the p-value:<br />“χ²cdf(lower, upper, df)”<br />
  33. 33. 14.2 Inference For Two-Way Tables<br />
  34. 34. Comparing two groups<br />The table above compares the background music with the number of bottles of wine purchased.<br />Note that the information is presented in a two-way table with marginal distributions.<br />Is there a relationship between these two categorical variables?<br />
  35. 35. Comparing two groups<br />The test for a relationship presented on the preceding page is a χ² test.<br />In particular, this is a χ² test for homogeneity. It measures whether the observed cell counts are drastically different from the expected cell counts.<br />
  37. 37. Expected cell count for 2-way tables<br />Column percentage: the % of the population that is in the column<br />Expected count: the count of a cell if the rows “obeyed” the column percentages<br />Even for a small table, these calculations get cumbersome.<br />
  41. 41. Expected Counts<br />Expected = (row total × column total) / table total<br />
  46. 46. Expected Counts<br />For the cell with observed count 30 (row total 99, column total 84, table total 243):<br />Expected = (99 × 84) / 243 = 34.22<br />Let’s start with the PHANTOMS procedure<br />
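
Since these calculations get cumbersome by hand, here is a small sketch of the same rule in Python. The function names are invented for illustration, and the 2 × 3 table is made up purely to show the shape of the output; only the 99, 84 and 243 come from the slides.

```python
def expected_count(row_total, column_total, table_total):
    """Expected count for one cell: row total x column total / table total."""
    return row_total * column_total / table_total


def expected_table(observed):
    """Expected counts for every cell of a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [
        [expected_count(r, c, table_total) for c in column_totals]
        for r in row_totals
    ]


# The cell worked on the slides: row total 99, column total 84, table total 243.
print(round(expected_count(99, 84, 243), 2))  # 34.22

# Hypothetical 2 x 3 table, invented here only to demonstrate the function.
made_up_counts = [[10, 20, 30], [20, 20, 20]]
print(expected_table(made_up_counts))
```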
  47. 47. χ² Test for Homogeneity<br />Parameter<br />State where each proportion comes from and what each count represents.<br />“We are investigating the proportions of customers in the store who purchase French, Italian or other wine while listening to French, Italian or other music.”<br />
  48. 48. χ² Test for Homogeneity<br />Hypotheses<br />The null hypothesis is always “the distributions of (group A) are the same in all populations of (group B)”<br />The alternative hypothesis is always “the distributions of (group A) are not all the same”<br />“H0: the distributions of wine types are the same in all populations of music types<br />Ha: the distributions of wine types are not all the same”<br />
  49. 49. χ² Test for Homogeneity<br />Assumptions<br />(1) No more than 20% of the expected cell counts are less than 5<br />(2) All expected cell counts are > 1<br />(3) In a 2 x 2 table, all expected counts are greater than 5<br />
  50. 50. χ² Test for Homogeneity<br />“All expected cell counts are greater than 5”<br />
  51. 51. χ² Test for Homogeneity<br />Test Statistic<br />
  54. 54. χ² Test for Homogeneity<br />P Value<br />Decision<br />Reject null hypothesis<br />
  55. 55. χ² Test for Homogeneity<br />Summary<br />If the distributions of wine types really were the same for all music types, only approximately 0.1% of random samples of 243 would produce a table at least this far from the expected counts.<br />Because the p value is less than an α of 0.05, we reject the null hypothesis.<br />We have sufficient evidence at the 5% significance level to conclude that the distribution of wine types purchased is not the same for all music types.<br />
  60. 60. Calculator Methods<br />Methods on the TI84<br />Before you begin the test, you must enter the “observed counts” into MATRIX [A]<br />[2ND] -> [x⁻¹] (MATRIX) -> “EDIT” -> [1]<br />Input the correct matrix size and cell counts. (Use [ENTER] or the cursor keys to switch between fields.)<br />
  66. 66. Calculator Methods<br />Methods on the TI84 (cont.)<br />IMPORTANT: after inputting the observed matrix, quit and go to the home screen<br />[STAT] -> “TESTS” -> “χ²-Test”<br />Ensure that “Observed” is set to [A] and “Expected” is set to [B]<br />“Calculate”<br />The expected cell counts will be calculated and stored in Matrix [B] (go back to the Matrix menu to see the expected counts)<br />
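
For comparison with the calculator, the sketch below walks through the same steps in plain Python: build the expected table from the observed matrix, sum (O – E)² / E over all cells, and find the upper-tail p-value. Several things here are assumptions rather than slide content: the function names, the made-up 2 × 3 table, the series used for the tail area, and the degrees of freedom formula (rows – 1) × (columns – 1), which is the standard one for two-way tables but is not stated on the slides.

```python
import math


def upper_tail(statistic, df):
    """Approximate P(chi-square(df) > statistic) via the incomplete gamma series."""
    if statistic <= 0:
        return 1.0
    a, x = df / 2, statistic / 2
    term = 1.0 / a
    total = term
    k = a
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= x / k
        total += term
    return 1.0 - total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi_square_table_test(observed):
    """Return (statistic, df, p_value, expected) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    # Expected count = row total x column total / table total.
    expected = [
        [r * c / table_total for c in column_totals] for r in row_totals
    ]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, df, upper_tail(statistic, df), expected


# Hypothetical observed counts, invented only to exercise the function.
made_up_counts = [[10, 20, 30], [20, 20, 20]]
statistic, df, p_value, expected = chi_square_table_test(made_up_counts)
print(round(statistic, 3), df, round(p_value, 4))
```

The returned `expected` table plays the role of Matrix [B] on the calculator.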
  67. 67. χ² Tests<br />Occasionally, you will be asked to find the cell that “contributed the most to the χ² statistic.”<br />When this is asked, you must calculate the χ² statistic by hand and find the largest value of (O – E)² / E.<br />This is usually the cell that differs the most from its expected count.<br />Because each squared difference is divided by the expected count, the largest contribution is not always the cell with the largest raw difference.<br />
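
A short sketch of the by-hand search for the largest contribution follows. The function names and the 2 × 2 table are hypothetical; the table was chosen so that every cell differs from its expected count by exactly 2, which shows how dividing by E changes the ranking.

```python
def cell_contributions(observed, expected):
    """(O - E)^2 / E for every cell, in the same layout as the tables."""
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


def largest_contribution(observed, expected):
    """Return (row index, column index, contribution) of the largest cell."""
    contributions = cell_contributions(observed, expected)
    value, row, col = max(
        (value, i, j)
        for i, cells in enumerate(contributions)
        for j, value in enumerate(cells)
    )
    return row, col, value


# Hypothetical table: row totals 60 and 140, column totals 20 and 180.
made_up_observed = [[8, 52], [12, 128]]
# Expected counts from row total x column total / table total (table total 200).
made_up_expected = [[6.0, 54.0], [14.0, 126.0]]

row, col, value = largest_contribution(made_up_observed, made_up_expected)
print(row, col, round(value, 3))
```

All four raw differences are the same size here, yet the cell with the smallest expected count contributes the most.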
  68. 68. χ² Test for Independence<br />A similar test for two-way tables is the “χ² Test for Independence,” sometimes called the “χ² Test for Association.”<br />This test asks the question, “are the two variables related to each other?”<br />When there is no association, the observed two-way table is close to the expected table.<br />
  69. 69. χ² Test for Independence<br />This test really only differs from the test for homogeneity in the hypotheses and the conclusion.<br />Hypotheses<br />The null hypothesis is “there is no association between (group 1) and (group 2)”<br />The alternative hypothesis is “there is an association between (group 1) and (group 2)”<br />
  70. 70. χ² Test for Independence<br />Conclusion<br />Phrase your conclusions similarly to the ones we have been constructing.<br />When failing to reject H0: after interpreting the p value and comparing the p value to alpha, state that there is “not sufficient evidence to conclude that an association exists between (group 1) and (group 2)”<br />Likewise, when rejecting H0, state that “there is sufficient evidence to conclude that an association exists between (group 1) and (group 2)”<br />
  71. 71. Assignment 14.2<br />Page 877 #29, 31, 32, 33<br />
