Mann Whitney U Test And Chi Squared



  1. Mann-Whitney U Test and Chi Squared. More tests of significance. Yippee!
  2. The Mann-Whitney U Test. <ul><li>Use this test when you are comparing two independent sets of readings, such as readings taken in two different places. </li></ul><ul><li>For example, the microclimate of a wood compared with that of a field. </li></ul><ul><li>There are three versions of the test; which one you use depends on how many readings you have in each sample. </li></ul>
  3. Version 1: fewer than 9 readings in each sample. <ul><li>Stage 1: Call one sample A and the other B. </li></ul><ul><li>Stage 2: Call the number of values in the smaller sample n1 and the number of values in the larger sample n2. </li></ul><ul><li>Stage 3: Place all the values together in rank order (from smallest to largest). </li></ul><ul><li>Stage 4: Inspect each B in turn and count the number of A’s that precede it. Add up these counts to get a U value. </li></ul><ul><li>Stage 5: Repeat stage 4, but this time inspect each A in turn and count the number of B’s that precede it. Add up these counts to get a second U value. </li></ul><ul><li>Stage 6: Take the smaller of the two U values and look up the probability associated with it in the appropriate table. Multiply this by 100 to get the percentage probability that the difference between your two sample sets could have occurred by chance. </li></ul>
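A minimal Python sketch of the two counting stages (inspecting each B, then each A), assuming the readings are held in two plain lists. The function name `u_by_counting` is illustrative, and counting a tied A and B as half a precedence is an assumed convention that the slides do not cover. It uses the wood temperatures from the example.

```python
def u_by_counting(a, b):
    """Return (U_B, U_A) for two small samples using the precedence method.

    U_B: for each B value, count the A values that come before it.
    U_A: for each A value, count the B values that come before it.
    A tie between an A and a B counts as half (an assumed convention).
    """
    u_b = sum((x < y) + 0.5 * (x == y) for y in b for x in a)
    u_a = sum((y < x) + 0.5 * (x == y) for x in a for y in b)
    return u_b, u_a


# Temperatures from the worked example: A outside the wood, B inside it.
a_outside = [3.5, 3.7, 4.4, 4.0, 4.6, 4.5]
b_inside = [3.2, 3.3, 4.2, 3.4, 3.6, 4.3]

u_b, u_a = u_by_counting(a_outside, b_inside)
print(u_b, u_a, min(u_b, u_a))  # 7.0 29.0 7.0
```

The two U values always add up to n1 × n2 (here 7 + 29 = 36), which is a handy check on the counting.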
  4. Example. <ul><li>A: temperatures outside the wood: 3.5, 3.7, 4.4, 4.0, 4.6, 4.5 </li></ul><ul><li>B: temperatures inside the wood: 3.2, 3.3, 4.2, 3.4, 3.6, 4.3 </li></ul><ul><li>n1 = 6 </li></ul><ul><li>n2 = 6 </li></ul><ul><li>Ranked order: B B B A B A A B B A A A </li></ul><ul><li>3.2 3.3 3.4 3.5 3.6 3.7 4.0 4.2 4.3 4.4 4.5 4.6 </li></ul><ul><li>Stage 4: U = 0 + 0 + 0 + 1 + 3 + 3 = 7 </li></ul><ul><li>Stage 5: U = 3 + 4 + 4 + 6 + 6 + 6 = 29 </li></ul><ul><li>Stage 6: the smaller U is 7. The correct table is D, which gives a probability of 0.047; 0.047 × 100 = 4.7%. </li></ul><ul><li>So there is only a 4.7% probability that the temperatures outside the wood would come out this much warmer than those inside it by chance alone. </li></ul>
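Where does 0.047 come from? A sketch, assuming table D lists the one-tailed probability of getting a U value this small or smaller when there is no real difference between the samples. In that case every way of interleaving the six A’s and six B’s in the ranking is equally likely, so the probability is the fraction of those 924 arrangements whose U is 7 or less. `exact_prob_u_at_most` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb


def exact_prob_u_at_most(u, n1, n2):
    """P(U <= u) when every ordering of the combined ranking is equally likely.

    Brute force: choose which of the n1 + n2 rank positions belong to
    sample A, compute U (number of A-before-B pairs) for that arrangement,
    and count the arrangements with U <= u. Fine for small samples.
    """
    total = n1 + n2
    hits = 0
    for a_positions in combinations(range(total), n1):
        a_set = set(a_positions)
        b_positions = [p for p in range(total) if p not in a_set]
        # For each B, count how many A's sit below it in the ranking.
        u_arrangement = sum(1 for b in b_positions for a in a_positions if a < b)
        if u_arrangement <= u:
            hits += 1
    return hits / comb(total, n1)


p = exact_prob_u_at_most(7, 6, 6)
print(round(p, 3))  # 0.047
```

Exactly 43 of the 924 arrangements give U ≤ 7, and 43 / 924 rounds to 0.047.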
  5. Version 2: fewer than 20 readings in one sample (n1) and 9 to 20 readings in the other. <ul><li>Stage 1: Tabulate the readings in two separate columns. Rank all the readings together as one group; the smallest value is rank 1. </li></ul><ul><li>Stage 2: Add the ranks for the n1 sample (to get ∑R1). Add the ranks for the n2 sample (∑R2). </li></ul><ul><li>Stage 3: Calculate \(U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - \sum R_1\) </li></ul>
  6. <ul><li>n1 = number of readings in one sample area. </li></ul><ul><li>n2 = number of readings in the other area; n1 is the smaller of the two numbers if they differ. </li></ul><ul><li>∑R1 = sum of the ranks of the n1 readings. </li></ul>
  7. <ul><li>Stage 4: Calculate the similar formula \(U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum R_2\) </li></ul><ul><li>∑R2 = sum of the ranks of the n2 readings. </li></ul><ul><li>Stage 5: You now have two U values. Select the smaller of the two. </li></ul><ul><li>Using your values of n1 and n2, look up the critical value of U in table J. </li></ul><ul><li>If the critical value of U is greater than or equal to your calculated U, there is a 5% or smaller probability that the difference between the two sample sets occurred by chance. </li></ul><ul><li>If the critical value of U is less than your calculated U, there is a greater than 5% probability that the difference between the two sample sets occurred by chance alone. </li></ul>
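A short Python sketch of stages 1 to 5, assuming the samples are plain lists with `sample1` as the smaller (n1) sample. Tied readings get the average of the ranks they span, a common convention that the slides do not mention. The function names are illustrative, and the data are the office rents from the example that follows.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (U1, U2, sum_r1, sum_r2) using the rank-sum formulas."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank as one group
    sum_r1 = sum(ranks[:n1])
    sum_r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - sum_r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - sum_r2
    return u1, u2, sum_r1, sum_r2


west_london = [33.9, 34.9, 34.2, 34.5, 28.4, 36.1, 35.6, 36.6, 31.2, 32.6]
docklands = [26.2, 30.6, 36.4, 30.0, 30.9, 32.9, 28.3, 33.2, 32.0, 31.1, 31.0, 26.6, 30.4]

u1, u2, r1, r2 = mann_whitney_u(west_london, docklands)
print(r1, r2, u1, u2, min(u1, u2))  # 162.0 114.0 23.0 107.0 23.0
```

The smaller U (23) is the value compared with the critical value from table J.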
  8. Example. <ul><li>Aim: to see whether office rents are higher in West London than in London Docklands. </li></ul><ul><li>Docklands (13 readings): 26.2, 30.6, 36.4, 30.0, 30.9, 32.9, 28.3, 33.2, 32.0, 31.1, 31.0, 26.6, 30.4 </li></ul><ul><li>West London (10 readings): 33.9, 34.9, 34.2, 34.5, 28.4, 36.1, 35.6, 36.6, 31.2, 32.6 </li></ul>
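  9. Stages 1 and 2. <ul><li>West London, n1 = 10 (value, rank R1): 33.9 (16), 34.9 (19), 34.2 (17), 34.5 (18), 28.4 (4), 36.1 (21), 35.6 (20), 36.6 (23), 31.2 (11), 32.6 (13) </li></ul><ul><li>∑R1 = 162 </li></ul><ul><li>Docklands, n2 = 13 (value, rank R2): 26.2 (1), 30.6 (7), 36.4 (22), 30.0 (5), 30.9 (8), 32.9 (14), 28.3 (3), 33.2 (15), 32.0 (12), 31.1 (10), 31.0 (9), 26.6 (2), 30.4 (6) </li></ul><ul><li>∑R2 = 114 </li></ul>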
  10. Stage 3 <ul><li>\(U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - \sum R_1\) </li></ul><ul><li>\(U_1 = 10 \times 13 + \frac{10 (10 + 1)}{2} - 162 = 23\) </li></ul>
  11. Stage 4 <ul><li>\(U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - \sum R_2\) </li></ul><ul><li>\(U_2 = 10 \times 13 + \frac{13 (13 + 1)}{2} - 114 = 107\) </li></ul>
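A quick arithmetic check of stages 3 and 4: the ranks 1 to 23 always add up to 23 × 24 / 2 = 276, and it follows that U1 + U2 must equal n1 × n2.

```latex
\begin{aligned}
U_1 &= 10 \times 13 + \frac{10 \times 11}{2} - 162 = 130 + 55 - 162 = 23 \\
U_2 &= 10 \times 13 + \frac{13 \times 14}{2} - 114 = 130 + 91 - 114 = 107 \\
U_1 + U_2 &= 23 + 107 = 130 = n_1 n_2
\end{aligned}
```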
  12. Stage 5 <ul><li>n1 = 10 </li></ul><ul><li>n2 = 13 </li></ul><ul><li>The smaller U value is 23. Critical value from table J = 37. </li></ul><ul><li>37 is more than 23, so there is a 5% or smaller probability that the higher office rents in West London compared with the Docklands occurred by chance alone. </li></ul>
  13. Samples of more than 20 <ul><li>Work through stages 1 to 3 as already described. This gives you the value of U, n1 and n2. </li></ul><ul><li>Stage 4: Calculate Z </li></ul><ul><li>\(Z = \dfrac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}\) </li></ul>
  14. Stage 5 <ul><li>Look up the probability that goes with your Z value in a table of the normal distribution. Multiply this by 100 to find the percentage probability that the difference between the two sample sets could be due to chance. </li></ul>
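A sketch of the Z calculation and probability look-up for large samples, using Python's standard `math.erfc` to get the normal-distribution tail probability in place of a printed table. No correction for tied ranks is applied (an assumption). The office-rent figures are reused only to show the arithmetic; with n1 = 10 and n2 = 13 they are below the size this version is meant for.

```python
import math


def mann_whitney_z(u, n1, n2):
    """Normal approximation for U (large samples, no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def one_tailed_p(z):
    """Probability of a standard normal value at least |z| from 0, on one side."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))


# Office-rent figures reused only to illustrate the arithmetic.
z = mann_whitney_z(23, 10, 13)
p = one_tailed_p(z)
print(round(z, 2), round(p * 100, 2))  # z is about -2.6
```

Multiplying `p` by 100 gives the percentage probability, as in stage 5.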
  15. Chi Squared. <ul><li>The Chi Squared test can only be used on data that have the following characteristics. </li></ul><ul><li>The data must be in the form of frequencies counted in each of a number of categories. Data on the interval scale (data with a precise numerical meaning, such as height above sea level or the population of a town) can be grouped into categories so that the test can be used. </li></ul><ul><li>The total number of observations must be more than 20. </li></ul><ul><li>No expected frequency should be less than 5. </li></ul><ul><li>The observations must be independent: one observation should not influence another. </li></ul>
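The count-based conditions above can be checked mechanically before running the test. A minimal sketch with a hypothetical helper `chi_squared_conditions`; it cannot check independence, which depends on how the data were collected.

```python
def chi_squared_conditions(observed, expected):
    """Return a list of problems with the count-based conditions (empty if none).

    Independence of the observations cannot be checked from the numbers.
    """
    problems = []
    if any(o < 0 or not float(o).is_integer() for o in observed):
        problems.append("observed values must be counts (whole numbers)")
    if sum(observed) <= 20:
        problems.append("total observations must be more than 20")
    if any(e < 5 for e in expected):
        problems.append("every expected frequency must be at least 5")
    return problems


print(chi_squared_conditions([7, 58, 15, 20], [25, 25, 25, 25]))  # []
```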
  16. Chi Squared for 2 variables. <ul><li>Stage 1: Tabulate the observed frequencies, labelled O. The example in a few slides shows how to do this. </li></ul><ul><li>Stage 2: Calculate the expected frequency, E, for each category: the count you would expect to find if the categories had no influence on the counts. </li></ul><ul><li>Stage 3: Calculate the following formula. </li></ul>
  17. <ul><li>\(\chi^2 = \sum \frac{(O - E)^2}{E}\) </li></ul><ul><ul><li>\(\chi^2\) = Chi Squared figure. </li></ul></ul><ul><ul><li>O = observed frequency. </li></ul></ul><ul><ul><li>E = expected frequency. </li></ul></ul>
  18. Stages 4 and 5 <ul><li>Stage 4: Calculate the degrees of freedom. This is simply one less than the total number of categories. </li></ul><ul><li>df = n - 1 </li></ul><ul><li>where df = degrees of freedom </li></ul><ul><li>and n = number of categories in the test. </li></ul><ul><li>Stage 5: Using the worksheet, the calculated value of χ² and the degrees of freedom, read off the probability that the frequencies you are testing could have occurred by chance. </li></ul>
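A Python sketch of stages 2 to 4 for a single row of counts, assuming that when no expected values are supplied every category is equally likely (as in the stream example). `chi_squared` is an illustrative name, not a library function.

```python
def chi_squared(observed, expected=None):
    """Return (chi-squared statistic, degrees of freedom) for one row of counts.

    If expected is omitted, each expected frequency is the total count
    divided by the number of categories.
    """
    n = len(observed)
    if expected is None:
        expected = [sum(observed) / n] * n
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, n - 1  # df = number of categories minus one


streams = {"Chalk": 7, "Granite": 58, "Limestone": 15, "Sandstone": 20}
stat, df = chi_squared(list(streams.values()))
print(round(stat, 2), df)  # 61.52 3
```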
  19. Example. <ul><li>You have visited four areas of equal size, each on a different rock type, and counted the number of streams in each area. </li></ul><ul><li>The results were as follows. </li></ul><ul><li>Rock type: number of streams </li></ul><ul><li>Chalk: 7 </li></ul><ul><li>Granite: 58 </li></ul><ul><li>Limestone: 15 </li></ul><ul><li>Sandstone: 20 </li></ul>
  20. What do we want to know? <ul><li>You now wish to know whether these results truly reflect the nature of each rock type or could simply be the result of chance. </li></ul><ul><li>The Chi Squared test can be used because: </li></ul><ul><li>The data are in the form of counts. </li></ul><ul><li>The total number of streams observed (100) exceeds 20. </li></ul><ul><li>The expected frequency in every category is at least 5. The expected frequency is the number of streams you would expect if rock type had no influence on stream density; here it is the total number of streams (100) divided by the number of rock types (4), which is 25. </li></ul><ul><li>The observations are independent (the number of streams on one rock type does not influence the number of streams on another). </li></ul><ul><li>The hypothesis we are testing is that there is a significant difference between the sample data sets, i.e. rock type does influence stream density. </li></ul>
  21. The Data. <ul><li>Chalk: O = 7, E = 25, (O − E)²/E = 13.0 </li></ul><ul><li>Granite: O = 58, E = 25, (O − E)²/E = 43.6 </li></ul><ul><li>Limestone: O = 15, E = 25, (O − E)²/E = 4.0 </li></ul><ul><li>Sandstone: O = 20, E = 25, (O − E)²/E = 1.0 </li></ul>
  22. Formula <ul><li>\(\chi^2 = \sum \frac{(O - E)^2}{E} = 61.6\) </li></ul><ul><li>df = n - 1 </li></ul><ul><li>df = 4 - 1 = 3 </li></ul>
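Working the sum through by hand shows where the figure comes from. Unrounded, the total is 61.52; the slide's 61.6 comes from adding the rounded values 13.0 + 43.6 + 4.0 + 1.0. Either figure leads to the same conclusion.

```latex
\chi^2 = \frac{(7-25)^2}{25} + \frac{(58-25)^2}{25} + \frac{(15-25)^2}{25} + \frac{(20-25)^2}{25}
       = \frac{324 + 1089 + 100 + 25}{25} = \frac{1538}{25} = 61.52
```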
  23. Stage 5 <ul><li>From the graph, read off the degrees of freedom (3) on the horizontal axis against the χ² value (61.6) on the vertical axis. </li></ul><ul><li>The resulting point is above the line marked 0.1 chance in 100. This means that the probability that these data could be due to chance alone is less than 0.1 in 100 (1 in 1,000). </li></ul>
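Instead of reading the graph, the probability can be calculated. A sketch, assuming a whole number of degrees of freedom, that uses Python's `math.erfc`, `math.exp` and `math.lgamma` with the standard recurrence for the chi-squared upper tail. The helper name `chi_squared_upper_tail` is hypothetical.

```python
import math


def chi_squared_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for whole-number df >= 1.

    Starts from the closed forms for df = 1 or df = 2, then steps up two
    degrees of freedom at a time using
    Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        # Work in logs so large x or df does not overflow.
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


p = chi_squared_upper_tail(61.52, 3)
print(p < 0.001)  # True
```

The same function reproduces the usual table values, for example about 0.05 at χ² = 7.815 with 3 degrees of freedom.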
  24. <ul><li>That is the end of the Mann-Whitney U test and Chi Squared. Now go and lie down. </li></ul><ul><li>Next week: Correlation and Spearman’s Rank Correlation Coefficient. </li></ul>