Statistical Methods



  1. 1. Statistical Methods.
  2. 2. Why Statistics. <ul><li>Statistics is used to take the analysis of data one stage beyond what can be achieved with maps and diagrams. </li></ul><ul><li>You can gain a primitive insight into patterns at a glance but mathematical manipulation usually gives greater precision. </li></ul><ul><li>This allows us to discover things which might otherwise go unnoticed. </li></ul>
  3. 3. The need for justification. <ul><li>Justifying mathematical manipulation is vital. </li></ul><ul><li>It is vital to be aware that statistics is an aid to analysis and no more. </li></ul><ul><li>Too often students make statistical calculations in geographical projects without adequate justification. </li></ul><ul><li>Before statistics is used it is essential to ask yourself two questions. </li></ul>
  4. 4. Question 1. <ul><li>Why am I using this technique? </li></ul><ul><li>In the exam, be absolutely clear about what a statistical test can show and how it shows it. </li></ul>
  5. 5. Question 2. <ul><li>Is the data appropriate to this particular technique? </li></ul><ul><li>Each technique requires data to be arranged in a particular form. </li></ul><ul><li>If it isn’t, the technique cannot be used. </li></ul><ul><li>If your data is not good in the first place, the use of a complex statistical technique will not help you. </li></ul><ul><li>“Rubbish in, rubbish out.” </li></ul>
  6. 6. Mean, Mode, Median. <ul><li>To be used when faced with a large amount of data. </li></ul><ul><li>For example: the average temperature of a place every day for two years. </li></ul><ul><li>Such data is far easier to handle once it has been summarised. </li></ul><ul><li>Summarising is relatively easy, and there are three common methods. </li></ul>
  7. 7. 1- Mean <ul><li>What most people call the average is the mean. </li></ul><ul><li>You find it by adding all the numbers together and then dividing by the total number of data values. </li></ul><ul><li>The mean is shown by the symbol x̄. </li></ul><ul><li>The mean can be distorted by even one extreme value, which can be a problem. </li></ul><ul><li>However, it is the most commonly used measure as it can be used for further mathematical processing. </li></ul>
  8. 8. Find the mean of these data values- <ul><li>3, 4, 4, 4, 6, 6, 9. </li></ul><ul><li>36 ÷ 7 = 5.1 (to one decimal place) </li></ul><ul><li>x̄ = 5.1 </li></ul>
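As a rough check that is not part of the original slides, the mean calculation on slide 8 could be reproduced in Python. The variable names below are illustrative assumptions.

```python
# Data values from slide 8.
values = [3, 4, 4, 4, 6, 6, 9]

# Add all the numbers together, then divide by how many there are.
total = sum(values)          # 36
x_bar = total / len(values)  # 36 / 7

print(round(x_bar, 1))  # 5.1
```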
  9. 9. 2- The Mode. <ul><li>The mode is simply the most frequently occurring event. </li></ul><ul><li>If we are using simple numbers then the mode is the most frequently occurring number. </li></ul><ul><li>If we are looking at data on the nominal scale (grouped into categories) the mode is the most common category. </li></ul><ul><li>The mode is very quick to calculate, but it cannot be used for further mathematical processing. </li></ul><ul><li>It is not affected by extreme values. </li></ul>
  10. 10. Find the mode of this data set. <ul><li>3, 4, 4, 4, 6, 9. </li></ul><ul><li>Mode (most frequently occurring number)= 4 </li></ul>
  11. 11. Find the mode of this nominal data. <ul><li>Land use (hectares): Clover 10; Rye 12; Vegetables 15; Fruit 3; Wheat 29; Barley 18; Pasture 17. </li></ul><ul><li>Mode (most frequently occurring category) = Wheat. </li></ul>
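A minimal sketch of both mode examples (slides 10 and 11), assuming the land-use figures are stored in a dictionary. The names `numbers`, `hectares` and `modal_category` are illustrative, not from the slides.

```python
from statistics import mode

# Slide 10: the mode of simple numbers is the most frequent value.
numbers = [3, 4, 4, 4, 6, 9]
number_mode = mode(numbers)

# Slide 11: for nominal data, take the category with the largest total.
hectares = {
    "Clover": 10, "Rye": 12, "Vegetables": 15, "Fruit": 3,
    "Wheat": 29, "Barley": 18, "Pasture": 17,
}
modal_category = max(hectares, key=hectares.get)

print(number_mode)     # 4
print(modal_category)  # Wheat
```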
  12. 12. 3- The Median. <ul><li>The median is the central value in a series of ranked values. </li></ul><ul><li>If there is an even number of values, the median is the midpoint between the two centrally placed values. </li></ul><ul><li>The median is not affected by extreme values, but it cannot be used for further mathematical processing. </li></ul>
  13. 13. Find the median of this data set. <ul><li>3, 4, 4, 4, 6, 9. </li></ul><ul><li>Median (central value)= 4. </li></ul>
  14. 14. Now find the median of this data set. <ul><li>3, 4, 4, 6, 6, 9. </li></ul><ul><li>Median (central value)= 5 </li></ul>
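The two median examples above can be sketched in Python as follows. This is an illustrative helper, not something from the slides, and the function name `median_of` is an assumption.

```python
def median_of(values):
    """Return the central value of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single central value exists.
        return ranked[mid]
    # Even number of values: midpoint of the two central values.
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_of([3, 4, 4, 4, 6, 9]))  # 4.0
print(median_of([3, 4, 4, 6, 6, 9]))  # 5.0
```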
  15. 15. Spread around the median and mean. <ul><li>The median, mean and mode all give us a summary value for a set of data. </li></ul><ul><li>On their own, however, they give us no idea of the spread of data around the summary value, which can be misleading. </li></ul><ul><li>For example… </li></ul>
  16. 16. <ul><li>I collected the following rainfall data. </li></ul><ul><li>Rainfall (mm) by year: 1990: 0; 1991: 0; 1992: 3; 1993: 0; 1994: 97. </li></ul><ul><li>The mean for this data is 20mm. </li></ul><ul><li>But that gives an untrue picture of what really happened. </li></ul><ul><li>There is a great “deviation about the mean”. </li></ul><ul><li>Deviation can be measured statistically as follows. </li></ul>
  17. 17. Spread around the median: the interquartile range. <ul><li>The Interquartile range is a measure of the spread of the values around their median. </li></ul><ul><li>The greater the spread the higher the interquartile range. </li></ul>
  18. 18. Method. <ul><li>Stage 1- Place the values in rank order, smallest to largest. </li></ul><ul><li>Stage 2- Find the upper quartile. This is found by taking the 25% highest values and finding the mid-point between the lowest of these and the next lowest value. </li></ul><ul><li>Stage 3- Find the lower quartile. This is obtained by taking the 25% lowest values and finding the mid-point between the highest of these and the next highest value. </li></ul><ul><li>Stage 4- Find the difference between the upper and lower quartiles. This is the interquartile range, a crude index of the spread of the values around the median. </li></ul><ul><li>The higher the range, the greater the spread. </li></ul>
  19. 19. Over to you. <ul><li>Copy out the data on the next slide. </li></ul><ul><li>Then find the interquartile range, remembering to follow all four stages. </li></ul>
  20. 20. Average temperature by month: January 4; February 5; March 7; April 9; May 12; June 15; July 17; August 17; September 15; October 11; November 7; December 5.
  21. 21. Answer <ul><li>Ranked, the data look like this. </li></ul><ul><li>4 5 5 7 7 9 11 12 15 15 17 17 </li></ul><ul><li>Lower quartile = 6; Median = 10; Upper quartile = 15 </li></ul><ul><li>Interquartile range: (15-6) = 9. </li></ul>
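The four-stage method from slide 18 could be sketched in Python as below. The function name is an assumption, and the sketch assumes the number of values divides evenly by four, as the twelve monthly temperatures do. Library quartile routines use different interpolation rules and may give slightly different figures.

```python
def interquartile_range(values):
    """Quartiles and IQR using the four-stage method from slide 18."""
    # Stage 1: rank the values, smallest to largest.
    ranked = sorted(values)
    k = len(ranked) // 4  # size of each 25% group (assumes len % 4 == 0)
    if k < 1:
        raise ValueError("need at least four values")
    # Stage 2: midpoint between the lowest of the top 25% and the next lowest.
    upper_q = (ranked[-k] + ranked[-k - 1]) / 2
    # Stage 3: midpoint between the highest of the bottom 25% and the next highest.
    lower_q = (ranked[k - 1] + ranked[k]) / 2
    # Stage 4: the difference between the quartiles.
    return lower_q, upper_q, upper_q - lower_q

# Monthly average temperatures, January to December (slide 20).
temps = [4, 5, 7, 9, 12, 15, 17, 17, 15, 11, 7, 5]
lower_q, upper_q, iqr = interquartile_range(temps)
print(lower_q, upper_q, iqr)  # 6.0 15.0 9.0
```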
  22. 22. Spread about the mean: Standard deviation. <ul><li>If we want to obtain some measure of the spread of our data about its mean we calculate its standard deviation. </li></ul><ul><li>Two sets of figures can have the same mean but very different standard deviations. </li></ul>
  23. 23. Method. <ul><li>Stage 1- Tabulate the values (x) and their squares (x²). Add these values (∑x and ∑x²). </li></ul><ul><li>Stage 2- Find the mean of all the values of x (x̄) and square it (x̄²). </li></ul><ul><li>Stage 3- Calculate the formula </li></ul><ul><li>σ = √(∑x² ÷ n − x̄²) </li></ul>
  24. 24. <ul><li>σ = standard deviation. </li></ul><ul><li>√ = the square root of. </li></ul><ul><li>∑ = the sum of. </li></ul><ul><li>n = the number of values. </li></ul><ul><li>x̄ = the mean of the values. </li></ul>
  25. 25. Over to you. <ul><li>Number of vehicles passing a traffic count point. </li></ul><ul><li>Calculate the standard deviation of the following data. </li></ul>
  26. 26. Number of vehicles by day: Day 1: 50; Day 2: 75; Day 3: 80; Day 4: 92; Day 5: 60; Day 6: 70; Day 7: 63; Day 8: 42; Day 9: 75; Day 10: 82.
  27. 27. Answer. Each value x with its square x²: 50, 2 500; 75, 5 625; 80, 6 400; 92, 8 464; 60, 3 600; 70, 4 900; 63, 3 969; 42, 1 764; 75, 5 625; 82, 6 724.
  28. 28. Answer <ul><li>∑x = 689 </li></ul><ul><li>∑x² = 49 571 </li></ul><ul><li>x̄ = 689 ÷ 10 = 68.9 </li></ul><ul><li>x̄² = (68.9)² = 4747.2 </li></ul><ul><li>σ = √(∑x² ÷ n − x̄²) = √(49 571 ÷ 10 − 4747.2) = √209.9 </li></ul><ul><li>σ = 14.5 </li></ul>
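The worked answer can be checked with a short Python sketch that follows the three stages from the method slide. The variable names are illustrative assumptions. Note that the formula divides by n, so it is the population form of the standard deviation, which is what `statistics.pstdev` in the standard library also computes.

```python
from math import sqrt
from statistics import pstdev

# Vehicles counted on days 1 to 10 (slide 26).
counts = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]
n = len(counts)

# Stage 1: sum the values and their squares.
sum_x = sum(counts)                  # 689
sum_x2 = sum(x * x for x in counts)  # 49571

# Stage 2: find the mean and square it.
x_bar = sum_x / n                    # 68.9

# Stage 3: square root of (sum of squares / n minus mean squared).
sigma = sqrt(sum_x2 / n - x_bar ** 2)

print(round(sigma, 1))           # 14.5
print(round(pstdev(counts), 1))  # 14.5
```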
  29. 29. Phew!!!!!! <ul><li>The higher the standard deviation, the greater the spread of data around the mean. </li></ul><ul><li>The standard deviation is the best of the measures of spread as it takes into account all of the values under consideration. </li></ul>
  30. 30. Homework. <ul><li>Research the following tests of significance to find out their meaning. </li></ul><ul><li>The Mann-Whitney U test. </li></ul><ul><li>The Chi-Squared (χ²) test. </li></ul>