# Lesson 7 measures of dispersion part 1

Published in: Technology, Education
• Updated Version 12/12/2012

1. 1. Introduction to Statistics for Built Environment. Course Code: AED 1222. Compiled by the Department of Architecture and Environmental Design (AED), Centre for Foundation Studies (CFS), International Islamic University Malaysia.
2. 2. Lecture 8: Measures of variability/dispersion, Part I. Today's lecture: ● The range ● Quartiles and the interquartile range ● Percentiles ● Percentile rank ● The five-number summary
3. 3. What are measures of variation/dispersion? ● A measure of variation (dispersion) gives information on the spread or variability of the data values. ● If the data values are widely dispersed, the central location is less representative of the data as a whole. ● If the data values are closely clustered, the central location is considered more reliable.
4. 4. Measures of variation/dispersion cont. Shape of a distribution and variability/dispersion: ● Dispersion describes how the data are spread out, that is, their variability. ● Two distributions can have the same center but different variation/dispersion. (Figure: one distribution with close dispersion, one with wide dispersion.) Source: Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.
5. 5. Measures of variation/dispersion cont. Shape of a distribution and variability/dispersion: ● The shape is either symmetric or skewed. (Figure: positions of Q1, Q2 and Q3 in left-skewed, symmetric and right-skewed distributions.)
6. 6. What is the range? ● The range is simply the difference between the largest and the smallest observed values in a data set: Range = highest observed value - lowest observed value. Because it includes any outliers, the range is the full spread of the data. ● A great deal of information is ignored when computing the range, since only the largest and smallest data values are considered.
7. 7. The range cont. ● Simplest measure of variation. ● Difference between the largest and the smallest observations: Range = x_maximum - x_minimum. Example: find the range of data plotted on a number line from 0 to 14, with smallest value 1 and largest value 14: Range = 14 - 1 = 13.
8. 8. The range cont. ● The range of a data set is greatly influenced by the presence of just one unusually large or small value (an outlier). ● The range can be expressed as an interval such as 4–10, where 4 is the lowest value and 10 is the highest. ● Often it is expressed as the interval width. For example, the range 4–10 can also be expressed as a range of 6.
9. 9. The range cont. Disadvantages of the range: ● It ignores the way in which the data are distributed: two differently shaped data sets that both run from 7 to 12 have the same Range = 12 - 7 = 5. ● It is sensitive to outliers: for 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 the Range = 5 - 1 = 4, but for 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 the Range = 120 - 1 = 119.
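
The outlier sensitivity on this slide can be checked with a few lines of Python. This is only a minimal sketch; the helper name `data_range` is an illustrative assumption and is not part of the lecture.

```python
def data_range(values):
    """Return the range: largest observed value minus smallest observed value."""
    return max(values) - min(values)

# The two data sets from the slide; they differ only in the last value.
without_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```

Changing one value out of 25 moves the range from 4 to 119, while the other 24 values are untouched.
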
10. 10. The range cont. ● Another disadvantage of the range is that it does not measure the spread of the majority of values in a data set; it only measures the spread between the highest and lowest values. ● As a result, other measures are required in order to give a better picture of the data spread. ● The range is an informative supplement to other measures such as the standard deviation or the semi-interquartile range, but it should rarely be used as the only measure of spread.
11. 11. The range cont. Example 1: 1, 2, 4, 6, 12, 15, 19, 26. The smallest value is 1 and the largest value is 26, so Range = 26 - 1 = 25. The range does not take into account how clumped together the scores are. Question: Is the range a good measure of spread/dispersion? Question: Is the value 25 a good representative value?
12. 12. The range cont. Example 2: 425, 430, 430, 435, 435, 435, 435, 435, 440, 440, 440, 440, 440, 445, 445, 445, 445, 445, 450, 450, 450, 450, 450, 450, 450, 460, 460, 460, 465, 465, 465, 470, 470, 472, 475, 475, 475, 480, 480, 480, 480, 485, 490, 490, 490, 500, 500, 500, 500, 510, 510, 515, 525, 525, 525, 535, 549, 550, 570, 570, 575, 575, 580, 590, 600, 600, 600, 600, 615, 615. The smallest value is 425 and the largest value is 615, so Range = 615 - 425 = 190. The range does not take into account how clumped together the scores are. Question: Is the range a good measure of spread/dispersion?
13. 13. What are quartiles? ● In descriptive statistics, the quartiles are the three values that divide the sorted (arrayed) data set into four equal parts, so that each part represents one fourth of the sampled population. ● The median divides the data into two equal sets. ● The lower quartile, denoted Q1, is the middle value of the first set: 25% of the values are smaller than Q1 and 75% are larger. ● The upper quartile, denoted Q3, is the middle value of the second set: 75% of the values are smaller than Q3 and 25% are larger.
14. 14. The quartiles cont. ● Quartiles split the ranked data into 4 equal groups of 25% each, separated by Q1, Q2 and Q3. ● First quartile (Q1) = lower quartile = cuts off the lowest 25% of the data = 25th percentile. ● Second quartile (Q2) = median = cuts the data set in half (50% of the data) = 50th percentile. ● Third quartile (Q3) = upper quartile = cuts off the highest 25% of the data, or the lowest 75% = 75th percentile.
15. 15. The quartiles cont. Locating the position of the quartiles: the position of the observation at a given percentile y, with n data points sorted in ascending order, is i = L_y = (y/100) × n. ● Case 1: If L_y is a whole number, the value is found halfway between positions L_y and L_y + 1. ● Case 2: If L_y is not a whole number, round up to the next whole number (for example, L_y = 1.2 becomes 2).
16. 16. The quartiles cont. Example: find the first quartile (Q1) of the sample data in ordered array 11 12 13 16 16 17 18 21 22 (n = 9). Q1 is the 25th percentile, so i = (25/100)(9) = 2.25; round up and use the value in the 3rd position: Q1 = 13.
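
The location rule i = (y/100) × n and its two cases can be sketched in Python as follows. The function name `percentile_by_position` is an illustrative assumption, and the restriction to 0 < y < 100 is added here so that both positions used in Case 1 always exist; the slides do not state it.

```python
import math

def percentile_by_position(sorted_values, y):
    """Value at the y-th percentile using the location rule i = (y/100) * n.

    Assumes sorted_values is already in ascending order and 0 < y < 100.
    """
    if not 0 < y < 100:
        raise ValueError("y must be strictly between 0 and 100")
    n = len(sorted_values)
    i = y * n / 100  # multiply first so whole-number positions stay exact
    if i == int(i):
        # Case 1: whole number -> halfway between positions i and i + 1 (1-based)
        i = int(i)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Case 2: not a whole number -> round up to the next position
    return sorted_values[math.ceil(i) - 1]

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile_by_position(data, 25))  # 13 (i = 2.25, rounded up to position 3)
```
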
17. 17. What is the interquartile range? ● The interquartile range is another range used as a measure of spread. ● It is the difference between the upper and lower quartiles (Q3 - Q1), and it also indicates the dispersion of a data set. ● The interquartile range spans the middle 50% of a data set and limits the influence of outliers because, in effect, the highest and lowest quarters are removed.
18. 18. The interquartile range cont. ● Some outlier problems can be eliminated by using the interquartile range. ● It eliminates some high- and low-valued observations and calculates the range from the remaining values. ● Interquartile range = 3rd quartile - 1st quartile = upper quartile (Q3) - lower quartile (Q1).
19. 19. The interquartile range cont. Example: X minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, X maximum = 70, with 25% of the data in each of the four segments. Interquartile range = 57 - 30 = 27.
20. 20. An exercise. A year ago, Ali began working at a computer store. His supervisor asked him to keep a record of the number of sales he made each month. The following data set is a list of his sales for the last 12 months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37. Use Ali's sales records to find: ● the median ● the range ● the upper and lower quartiles ● the interquartile range
21. 21. Exercise cont. The values in an ascending array are: 1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57. Median position = (12 + 1) ÷ 2 = 6.5, so the median = (6th + 7th observations) ÷ 2 = (24 + 28) ÷ 2 = 26. Range = difference between the highest and lowest values = 57 - 1 = 56.
22. 22. Exercise cont. Lower quartile = middle value of the first half of the data: Q1 = the median of 1, 11, 15, 19, 20, 24 = (3rd + 4th observations) ÷ 2 = (15 + 19) ÷ 2 = 17. Upper quartile = middle value of the second half of the data: Q3 = the median of 28, 34, 37, 47, 50, 57 = (3rd + 4th observations) ÷ 2 = (37 + 47) ÷ 2 = 42. Interquartile range = Q3 - Q1 = 42 - 17 = 25.
23. 23. Exercise cont.
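
The exercise answers can be reproduced with Python's standard `statistics` module. This is a sketch of the median-of-halves approach used in the worked solution; the variable names are illustrative, and the way the halves are sliced (leaving out the middle value when n is odd) is an assumption that does not matter here because n = 12 is even.

```python
from statistics import median

sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
ordered = sorted(sales)

half = len(ordered) // 2
lower_half = ordered[:half]    # 1, 11, 15, 19, 20, 24
upper_half = ordered[-half:]   # 28, 34, 37, 47, 50, 57

med = median(ordered)          # mean of the 6th and 7th observations
data_range = ordered[-1] - ordered[0]
q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

print(med, data_range, q1, q3, iqr)  # 26.0 56 17.0 42.0 25.0
```
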
24. 24. What is a percentile? ● The a-th percentile is a value such that roughly a% of the data are smaller and (100 - a)% of the data are larger. ● There are three steps for computing a percentile: (1) organize the data into an ascending array; (2) count the number of values (n); (3) select the observation at position (a/100) × (n + 1).
25. 25. Percentiles cont. ● The p-th percentile in an ordered array of n values is the value in the i-th position, where i = (p/100)(n). ● If i is not an integer, round up to the next higher integer value. Example: find the 60th percentile in an ordered array of 19 values: i = (60/100)(19) = 11.4, so use the value in the i = 12th position.
26. 26. Percentiles cont. The position (a/100) × (n + 1) will not always be a nice whole number. Here are some scenarios: ● If the position is not a whole number, go halfway between the two adjacent observations. ● If the position is less than 1, select the smallest observation. ● If the position is greater than n, select the largest observation.
27. 27. Percentiles cont. Example: compute the 50th percentile for the data set 73, 58, 67, 93, 33, 18, 147. ● Arrayed data: 18, 33, 58, 67, 73, 93, 147. ● There are 7 observations (n = 7). ● Select the 0.50 × (7 + 1) = 4th observation. ● Therefore, the 50th percentile equals 67. Notice that there are three observations larger than 67 and three observations smaller than 67.
28. 28. Percentiles cont. Example (cont.), arrayed data 18, 33, 58, 67, 73, 93, 147: ● 20th percentile: 0.20 × (7 + 1) = 1.6. This is not a whole number, so we go halfway between the 1st and 2nd observations: (18 + 33) ÷ 2 = 25.5. ● 10th percentile: 0.10 × (7 + 1) = 0.8, which is less than 1, so we select the smallest observation, which is 18.
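
The (n + 1) position rule and its three scenarios can be sketched in Python as below. The function name `percentile_n_plus_1` is an illustrative assumption; the rule itself (halfway between neighbours, clamping to the smallest or largest observation) follows the slides rather than any particular library's interpolation method.

```python
import math

def percentile_n_plus_1(values, a):
    """a-th percentile using the position (a/100) * (n + 1), as in the slides."""
    ordered = sorted(values)
    n = len(ordered)
    pos = a * (n + 1) / 100
    if pos < 1:
        return ordered[0]      # position below 1: smallest observation
    if pos > n:
        return ordered[-1]     # position above n: largest observation
    if pos == int(pos):
        return ordered[int(pos) - 1]   # whole number: take that observation
    lower = math.floor(pos)
    # Not a whole number: halfway between the two adjacent observations
    return (ordered[lower - 1] + ordered[lower]) / 2

data = [73, 58, 67, 93, 33, 18, 147]
print(percentile_n_plus_1(data, 50))  # 67
print(percentile_n_plus_1(data, 20))  # 25.5
print(percentile_n_plus_1(data, 10))  # 18
```
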
29. 29. What is percentile rank? ● The percentile rank of a score is the percentage of scores in its frequency distribution that are lower than it. ● Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests. ● Formula: Pr = [(number of values below the score + 0.5) ÷ (total number of values)] × 100%.
30. 30. Percentile rank cont. Example: find the percentile rank of the score 17 in the sample data in ordered array 11 12 13 16 16 17 18 21 22 (n = 9). Pr(17) = [(number of values below 17 + 0.5) ÷ number of scores] × 100 = [(5 + 0.5) ÷ 9] × 100 ≈ 61, that is, the 61st percentile. This value indicates that the student who obtained a score of 17 marks scored higher than about 61% of the members of his/her class.
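
The percentile rank formula translates directly into Python. A minimal sketch, with `percentile_rank` as an illustrative name that is not taken from the lecture:

```python
def percentile_rank(values, score):
    """Percentile rank = (number of values below the score + 0.5) / n * 100."""
    below = sum(1 for v in values if v < score)
    return (below + 0.5) / len(values) * 100

scores = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(round(percentile_rank(scores, 17)))  # 61
```

Five scores (11, 12, 13, 16, 16) lie below 17, so the unrounded result is 5.5 ÷ 9 × 100, which the slide reports as the 61st percentile.
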
31. 31. What is the five-number summary? A five-number summary uses percentiles to describe a set of data. It consists of: ● MAX: the maximum value ● 75%: the 75th percentile ● 50%: the 50th percentile (the median) ● 25%: the 25th percentile ● MIN: the minimum value. The five-number summary splits the data into four regions, each of which contains roughly 25% of the data.
32. 32. The five-number summary cont. Example: find the five-number summary for the data set 1, 3, 4, 5, 6, 7, 9. ● The minimum value is 1. ● The lower half is {1, 3, 4}, and the median of that half (the 25th percentile) is 3. ● The median (the 50th percentile) is 5. ● The upper half is {6, 7, 9}, and the median of that half (the 75th percentile) is 7. ● The maximum value is 9.
33. 33. Next class… The following topics will be discussed: Measures of variability/dispersion (Part II): ● The average absolute deviation ● The variance ● The standard deviation ● Coefficient of variation (CV)
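
The worked example can be reproduced with a short Python sketch. The function name `five_number_summary` is an illustrative assumption. It follows the slide's convention of taking the quartiles as medians of the lower and upper halves, leaving the middle value out of both halves when n is odd; other software may use different quartile conventions and give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    half = len(ordered) // 2   # for odd n the middle value is in neither half
    return {
        "min": ordered[0],
        "q1": median(ordered[:half]),
        "median": median(ordered),
        "q3": median(ordered[-half:]),
        "max": ordered[-1],
    }

print(five_number_summary([1, 3, 4, 5, 6, 7, 9]))
# {'min': 1, 'q1': 3, 'median': 5, 'q3': 7, 'max': 9}
```
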