Describing quantitative data with numbers


Published in: Technology, Business


  1. Describing Quantitative Data with Numbers
  2. Summarizing distributions of univariate data
     1. Measuring center: median, mean
     2. Measuring spread: range, interquartile range, standard deviation
     3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
     4. Using boxplots
     5. The effect of changing units on summary measures
  3. Measuring Center
     • When describing the “center” of a set of data, we can use the mean or the median.
     • Mean: the “average” value
     • Median: the “center” value (Q2)
  4. Where Is the Center of the Distribution?
     • If you had to pick a single number to describe all the data, what would you pick?
     • It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle.
     • On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.
  5. Mean
     To find the mean of a set of observations, add their values and divide by the number of observations:
     $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
  6. Find the mean of: 2, 3, 4, 6, 8, 12
     $\bar{x} = \dfrac{2 + 3 + 4 + 6 + 8 + 12}{6} = \dfrac{35}{6} \approx 5.833$
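The arithmetic on slide 6 can be checked with a few lines of Python. This is only a sketch of the formula on slide 5; the function name `mean` is chosen here for illustration.

```python
def mean(values):
    """Arithmetic mean: add the observations, then divide by how many there are."""
    if not values:
        raise ValueError("mean() needs at least one observation")
    return sum(values) / len(values)


data = [2, 3, 4, 6, 8, 12]  # the observations from slide 6
x_bar = mean(data)
print(round(x_bar, 3))  # 5.833
```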
  7. Although the mean is the most popular measure of center, it is not always the most appropriate.
     • The mean is very sensitive to extreme observations (outliers).
     • Because outliers affect the mean, we say that the mean is NOT a resistant measure of center.
     • So if the mean is not a resistant measure of center, what is? The median.
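To see why the mean is not resistant, the sketch below takes the data from slide 6 and replaces 12 with a hypothetical mis-entered value of 120. It uses Python's standard `statistics` module; the outlier value is invented purely for illustration.

```python
import statistics

original = [2, 3, 4, 6, 8, 12]        # data from slide 6
with_outlier = [2, 3, 4, 6, 8, 120]   # hypothetical: 12 mis-entered as 120

for label, data in [("original", original), ("with outlier", with_outlier)]:
    # The mean is pulled toward the extreme value; the median stays put.
    print(f"{label:>12}: mean = {statistics.mean(data):.3f}, median = {statistics.median(data)}")
```

The single bad value drags the mean from about 5.8 to about 23.8, while the median remains 5 in both cases.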
  8. Median
     The median is the value with exactly half the data values below it and half above it.
     • It is the middle data value once the data values have been ordered, and it divides the histogram into two equal areas.
     • It has the same units as the data.
     • The median is not influenced by extreme observations, so we say that the median is a resistant measure of center.
  9. Finding the Median
     First sort the values (arrange them in order), then follow one of these rules:
     1. If the number of data values is even, the median is the mean of the two middle numbers.
     2. If the number of data values is odd, the median is the number located in the exact middle of the list.
  10. Even number of values: 5.40 1.10 0.42 0.73 0.48 1.10
      In order: 0.42 0.48 0.73 1.10 1.10 5.40
      There is no single middle value; the middle is shared by 0.73 and 1.10, so
      $\text{median} = \dfrac{0.73 + 1.10}{2} = 0.915$
      Odd number of values: 5.40 1.10 0.42 0.73 0.48 1.10 0.66
      In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40
      The exact middle value is 0.73, so the median is 0.73.
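The two rules on slide 9 translate almost directly into code. The sketch below is one straightforward way to write them (the name `median` is illustrative) and reproduces both worked examples on slide 10.

```python
def median(values):
    """Sort, then take the middle value (odd count)
    or the mean of the two middle values (even count)."""
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # exact middle of the list
    return (ordered[mid - 1] + ordered[mid]) / 2     # mean of the two middle numbers


even_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_data = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]
print(round(median(even_data), 3))  # 0.915
print(median(odd_data))             # 0.73
```

Python's built-in `statistics.median` follows the same two rules.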
  11. Mean vs Median
      | Mean | Median |
      |------|--------|
      | Average value of the variable | Typical value of the variable |
      | Not resistant to outliers | Resistant to outliers |
      | A good measure when the data are symmetric | A reliable measure regardless of the shape of the distribution |
      | Farther out in the long tail than the median when the data are skewed | Close to the center even when the data are skewed |
      | Easy to find | Less prone to mistakes |
  12. Check For Understanding
  13. Check For Understanding
  14. Measuring Spread
      • Range
      • Interquartile Range (IQR)
      • Standard Deviation
  15. Range
      • The distance between the largest and smallest values.
      • Range = Maximum − Minimum
      • Because it depends only on the two most extreme values, the range is useful only if there are no outliers.
  16. Interquartile Range
      How to find the IQR:
      1. Find the median.
      2. Find the median of each half of the data: the median of the lower half is the 1st quartile (Q1), and the median of the upper half is the 3rd quartile (Q3).
      3. Subtract the two quartiles: IQR = Q3 − Q1.
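The three IQR steps on slide 16 can be sketched as follows. The slide does not say what to do with the middle value when the number of observations is odd; this sketch assumes it is left out of both halves, which is a common textbook convention. Spreadsheets and libraries often use interpolation methods instead and may report slightly different quartiles. The function names are illustrative.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: when n is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower_half = ordered[: n // 2]          # values below the median position
    upper_half = ordered[(n + 1) // 2 :]    # values above the median position
    return median(lower_half), median(ordered), median(upper_half)


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


print(quartiles([2, 3, 4, 6, 8, 12]), iqr([2, 3, 4, 6, 8, 12]))  # (3, 5.0, 8) 5
```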
  17. Outliers
      One general rule of thumb is to flag as an outlier any data point that lies:
      • more than 1.5 × IQR below Q1, or
      • more than 1.5 × IQR above Q3.
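The 1.5 × IQR rule amounts to computing two “fences” and checking each value against them. In this sketch the quartiles are passed in directly, and the data are the hypothetical slide-6 values with 12 mis-entered as 120 (median-of-halves quartiles Q1 = 3, Q3 = 8). The function names are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the 1.5 x IQR rule."""
    spread = q3 - q1  # the IQR
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values, q1, q3):
    """List the values lying beyond either fence."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


# Hypothetical data: slide 6's values with 12 mis-entered as 120.
# Median-of-halves quartiles for this data are Q1 = 3 and Q3 = 8.
data = [2, 3, 4, 6, 8, 120]
print(outlier_fences(3, 8))        # (-4.5, 15.5)
print(flag_outliers(data, 3, 8))   # [120]
```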
  18. Check For Understanding
      • The “Descriptive Statistics” of test grades for a certain class are listed below.
        Mean = 74.71, Median = 76, Standard Deviation = 12.61, Minimum = 35, Maximum = 94, Q1 = 68, Q3 = 84
      • (a) Determine the IQR for these data.
      • (b) Using the answer from part (a), determine whether the lowest and highest values in the data are outliers.
  19. Standard Deviation
      The standard deviation measures the typical distance of the observations from the mean (roughly, an average deviation from the mean):
      $s_x = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
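The formula on slide 19 can be computed step by step: find the mean, square each deviation, divide the sum by n − 1, and take the square root. The sketch below does this by hand for the slide-6 data and compares it with `statistics.stdev`, the standard library's sample (n − 1) standard deviation. The name `sample_sd` is illustrative.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation s_x, dividing the squared deviations by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


data = [2, 3, 4, 6, 8, 12]  # data from slide 6
print(round(sample_sd(data), 3))
print(round(statistics.stdev(data), 3))  # the standard library's n - 1 version agrees
```

`statistics.pstdev` divides by n instead; it describes a whole population rather than a sample.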
  20. If the data are uniform or symmetric, use:
      • Center: mean
      • Spread: standard deviation
      If the data are skewed, use:
      • Center: median
      • Spread: five-number summary, range, IQR
  21. Distributions with Outliers
      • Since outliers affect the mean and standard deviation, it is usually better to use the median and IQR.
      • However, if the distribution is unimodal, you can use the mean and median and simply report the outliers separately.
      • If you find a simple explanation for an outlier (such as a recording error), you can eliminate it and use the mean and standard deviation, provided the remaining data are symmetric.
  22. Measuring Position
      • Quartiles
      • Percentiles
      • z-scores
      • We can use either z-scores or percentiles to describe the location of an observation in a distribution.
      • z-scores use the mean and standard deviation.
      • Percentiles use the observation’s position in the ordered data: the percentage of observations that fall below it.
  23. Percentiles/Quartiles
      • $P_k$ is the notation for the kth percentile.
      • $Q_n$ is the notation for the nth quartile.
      • $P_{25} = Q_1$, $P_{50} = Q_2 = \text{median}$, $P_{75} = Q_3$
  24. Finding Percentiles
      To find the percentile corresponding to a certain score x:
      $\text{Percentile} = \dfrac{\text{number of scores} < x}{\text{total number of scores}} \times 100$
      • Percentiles are often used when reporting academic scores such as SAT scores. Say you get a 620 on the math portion of the SAT, and your score report indicates that you are in the “78th percentile”. That means you scored better than 78% of all students taking that particular SAT.
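The percentile formula on slide 24 counts the scores strictly below x. The sketch below applies it to a small, made-up set of ten scores; the scores and the name `percentile_of` are hypothetical. Other sources define percentile rank slightly differently (for example, counting scores at or below x), so results can differ by convention.

```python
def percentile_of(scores, x):
    """Percentile of score x: (number of scores below x) / (total number of scores) * 100."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < x)  # strictly less than x, as in the formula
    return below * 100 / len(scores)


# Hypothetical set of 10 scores, for illustration only.
scores = [54, 60, 62, 68, 72, 76, 80, 85, 92, 98]
print(percentile_of(scores, 80))  # 60.0
```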
  25. Measuring Relative Standing With Standardized Values (z-Scores)
      • One way to compare an individual to the whole distribution is to describe its location in the distribution relative to the mean.
      • We do this by describing how many standard deviations the individual is away from the mean: $z = \dfrac{x - \bar{x}}{s_x}$
      • We call this the “standardized value,” or the “z-score.”
  26. Here is how to interpret z-scores:
      • A z-score less than 0 represents an element less than the mean.
      • A z-score greater than 0 represents an element greater than the mean.
      • A z-score equal to 0 represents an element equal to the mean.
      • A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; and so on.
      • A z-score equal to −1 represents an element that is 1 standard deviation less than the mean; a z-score equal to −2, 2 standard deviations less than the mean; and so on.
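A z-score is a single subtraction and division. As a sketch, the example below standardizes the highest and lowest test grades using the mean (74.71) and standard deviation (12.61) from slide 18; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Summary values taken from the test-grade example on slide 18.
mean, sd = 74.71, 12.61
print(round(z_score(94, mean, sd), 2))   # 1.53  (above the mean)
print(round(z_score(35, mean, sd), 2))   # -3.15 (well below the mean)
```

The positive z-score places the top grade about 1.5 standard deviations above the mean. The negative one places the lowest grade more than 3 standard deviations below it.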
  27. Five-Number Summary
      The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest:
      Minimum, Q1, Median, Q3, Maximum
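A five-number summary needs only the sorted data and the median-of-halves quartiles. This sketch assumes that for an odd number of observations the overall median is excluded from both halves, a common classroom convention; other software may compute quartiles differently. The function names are illustrative.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum), with quartiles as medians of the halves.

    Assumption: for odd n the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    return ordered[0], q1, median(ordered), q3, ordered[-1]


print(five_number_summary([2, 3, 4, 6, 8, 12]))  # (2, 3, 5.0, 8, 12)
```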
  28. Boxplots
      The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data: the boxplot.
  29. How to make a boxplot:
      1. Draw and label a number line that includes the range of the distribution.
      2. Draw a central box from Q1 to Q3.
      3. Mark the median M inside the box.
      4. Extend lines (whiskers) from the box out to the smallest and largest values that are not outliers.
      5. Plot any outliers as separate points.
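Before drawing, it helps to compute every number the boxplot steps call for. The sketch below returns the box edges, the median, the whisker ends (the smallest and largest non-outliers under the 1.5 × IQR rule), and the outliers. It assumes median-of-halves quartiles, and it uses the hypothetical slide-6 data with 12 mis-entered as 120. The name `boxplot_parts` is illustrative.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_parts(values, k=1.5):
    """Box edges, median, whisker ends, and outliers for a boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = median(ordered[: n // 2])          # assumption: odd n excludes the median
    q3 = median(ordered[(n + 1) // 2 :])
    spread = q3 - q1
    lower_fence, upper_fence = q1 - k * spread, q3 + k * spread
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "box": (q1, q3),
        "median": median(ordered),
        "whiskers": (inside[0], inside[-1]),  # smallest/largest non-outliers
        "outliers": outliers,                 # drawn as separate points
    }


print(boxplot_parts([2, 3, 4, 6, 8, 120]))
# {'box': (3, 8), 'median': 5.0, 'whiskers': (2, 8), 'outliers': [120]}
```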
  30. Comparing Boxplots
  31. Check For Understanding
  32. Effect of Changing Units
      Sometimes researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of center are affected when we change units:
      • If you add a constant to every value, the mean and median increase by that same constant. Example: suppose a set of scores has a mean of 5 and a median of 6. If you add 10 to every score, the new mean is 5 + 10 = 15 and the new median is 6 + 10 = 16.
      • If you multiply every value by a constant, the mean and median are also multiplied by that constant. Example: suppose a set of scores has a mean of 5 and a median of 6. If you multiply each score by 10, the new mean is 5 × 10 = 50 and the new median is 6 × 10 = 60.
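The sketch below checks both rules on a small hypothetical data set, [1, 6, 8], chosen so that its mean is 5 and its median is 6 as in the slide's examples. It also reports the sample standard deviation, which (a standard fact, not stated on the slide) is unchanged by adding a constant and is multiplied by the constant when every value is multiplied by a positive constant.

```python
import statistics


def summary(data):
    """Mean, median, and sample standard deviation of the data."""
    return statistics.mean(data), statistics.median(data), statistics.stdev(data)


scores = [1, 6, 8]                  # hypothetical scores with mean 5 and median 6
shifted = [x + 10 for x in scores]  # add 10 to every score
scaled = [x * 10 for x in scores]   # multiply every score by 10

for label, data in [("original", scores), ("+10", shifted), ("x10", scaled)]:
    m, med, sd = summary(data)
    print(f"{label:>8}: mean = {m}, median = {med}, sd = {sd:.3f}")
```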
  33. Check For Understanding
      The average score on a test is 150 with a standard deviation of 15. Each score is then increased by 25. What are the new mean and standard deviation?
  34. Check For Understanding
      The test grades from a college statistics class are shown below.
      85 72 64 65 98 78 75 76 82 80 61 92 72 58 65 74 92 85 74 76
      77 77 62 68 68 54 62 76 73 85 88 91 99 82 80 74 76 77 70 60
      (a) Construct two different graphs of these data.
      (b) Calculate the five-number summary and the mean and standard deviation of the data.
      (c) Describe the distribution of the data, citing both the plots and the summary statistics found in parts (a) and (b).