
# Maths A - Chapter 10

Published in: Technology, Sports

### Maths A - Chapter 10

1. Chapter 10: Describing, exploring and comparing data

Syllabus reference
Strand: Statistics and probability
Core topic: Data collection and presentation

In this chapter
- 10A Calculating and interpreting the mean
- 10B Mean, from frequency distribution tables
- 10C Mean, from grouped data
- 10D Median and mode
- 10E Best summary statistics
- 10F Range and interquartile range
- 10G Standard deviation
- 10H Comparing sets of data

MQ Maths A Yr 11 - 10 Page 381 Wednesday, July 4, 2001 5:58 PM
2. Maths Quest Maths A Year 11 for Queensland

Introduction

Archie is an archaeologist. He is passionate about his job, which involves digging for buried artefacts, classifying his findings and piecing them together to unravel and record the history of past civilizations. Imagine his excitement when he uncovered a site of buried skulls in Egypt! Further investigation confirmed that these were male skulls which had originated from a race residing in Egypt. He was keen to place their existence in time.

Delving into existing records, he uncovered measurements of male Egyptian skulls recorded for two time periods: one around 4000 BC and the other around AD 150. These measurements confirmed a change in skull shape over the time period, and this was taken as evidence of interbreeding of the Egyptians with migrant populations over the years. If Archie compared the measurements on record with those he made on his recently excavated skulls, he could possibly identify a time in history when this race existed.

The measurements of male Egyptian skulls on record for 4000 BC and AD 150 were:

1. breadth of skull
2. height of skull
3. length of skull.

[Figure: skull diagram labelled Height, Breadth and Length]

The recorded data for the measurements (in mm) of 30 male Egyptian skulls are collated in the table on the following page.

Where should Archie start? Statistical techniques enable us to summarise sets of data, which can then be compared. If Archie can summarise these two data sets, he could then compare them with his own measurements.

In this chapter, we shall investigate the main methods available to describe data sets such as these. These methods employ measures of central tendency, in particular the mean, median and mode. We shall also examine the range and interquartile range, the standard deviation, and stem plots and boxplots. We shall then see how these measures can be used to compare sets of data.

In the previous chapter we investigated boxplots as a tool for comparing data sets. We now explore this tool further, endeavouring to place Archie's skull at some period in history. Combining this with other statistical tools may enable us to provide a solution for Archie.
3. Measurements (in mm) of 30 male Egyptian skulls

| 4000 BC Breadth | 4000 BC Height | 4000 BC Length | AD 150 Breadth | AD 150 Height | AD 150 Length |
|---|---|---|---|---|---|
| 131 | 138 | 89 | 137 | 123 | 91 |
| 125 | 131 | 92 | 136 | 131 | 95 |
| 131 | 132 | 99 | 128 | 126 | 91 |
| 119 | 132 | 96 | 130 | 134 | 92 |
| 136 | 143 | 100 | 138 | 127 | 86 |
| 138 | 137 | 89 | 126 | 138 | 101 |
| 139 | 130 | 108 | 136 | 138 | 97 |
| 125 | 136 | 93 | 126 | 126 | 92 |
| 131 | 134 | 102 | 132 | 132 | 99 |
| 134 | 134 | 99 | 139 | 135 | 92 |
| 129 | 138 | 95 | 143 | 120 | 95 |
| 134 | 121 | 95 | 141 | 136 | 101 |
| 126 | 129 | 109 | 135 | 135 | 95 |
| 132 | 136 | 100 | 137 | 134 | 93 |
| 141 | 140 | 100 | 142 | 135 | 96 |
| 131 | 134 | 97 | 139 | 134 | 95 |
| 135 | 137 | 103 | 138 | 125 | 99 |
| 132 | 133 | 93 | 137 | 135 | 96 |
| 129 | 136 | 96 | 133 | 125 | 92 |
| 132 | 131 | 101 | 145 | 129 | 89 |
| 126 | 133 | 102 | 138 | 136 | 92 |
| 135 | 135 | 103 | 131 | 129 | 97 |
| 134 | 124 | 93 | 143 | 126 | 88 |
| 128 | 134 | 103 | 134 | 124 | 91 |
| 130 | 130 | 104 | 132 | 127 | 97 |
| 138 | 135 | 100 | 137 | 125 | 85 |
| 128 | 132 | 93 | 129 | 128 | 81 |
| 127 | 129 | 106 | 140 | 135 | 103 |
| 131 | 136 | 114 | 147 | 129 | 87 |
| 124 | 138 | 101 | 136 | 133 | 97 |
4. You may not be familiar with some of the following statistical terms. We shall investigate them further in this chapter.

1. A set of test results is shown below.
   8, 3, 6, 4, 5, 4, 9, 7, 4, 6, 5
   - (a) Arrange the scores in ascending order.
   - (b) How many scores are in the set?
   - (c) In what position does the middle score lie?
   - (d) What is the value of the middle score (the median)?
   - (e) What is the range of the data?
   - (f) Calculate the average (mean).
   - (g) How many scores are below the mean? How many above?
   - (h) Give the most frequently occurring score (mode) of the set of data.
   - (i) Comment on any difference in value between the mean, median and mode.
   - (j) Determine values for the lower and upper quartiles.
2. The mean, median and mode are measures of 'central tendency'. Explain what the term 'central tendency' means.
3. The spread of the scores can be determined using a number of statistical measures. Name some measures of 'spread' with which you are familiar.
4. What is the relationship between the median and the quartiles?
5. In a boxplot, which of the following are true?
   - (a) The quartiles divide the data into four sections of equal length.
   - (b) The median is the score with an equal number of data values above it and below it.
   - (c) If the 'whiskers' are longer than the 'box', it means that there are more scores in the whiskers than there are in the box.
   - (d) The whole 'box' contains the same number of scores as the two 'whiskers' together.
   - (e) It is possible to calculate the mean of the set of data by observing the values in the boxplot.
6. For those statements in question 5 that are incorrect, explain why this is so. Adjust the statements to make them correct.

Calculating the mean

If you were to survey a group of people about what they believe is meant by the word 'average', you would find a variety of answers. When looking at a set of statistics we are often asked for the average. The average is a figure that describes a typical score. In statistics, the correct term for the average is the mean.

The mean is the first of three measures of central tendency that we shall be studying. The others are the median and the mode. The statistical symbol for the mean is $\bar{x}$. The formula for the mean is

$$\bar{x} = \frac{\sum x}{n}$$
5. In mathematics, the symbol Σ (sigma) means sum or total, and $x$ represents each individual score in a list, so $\sum x$ is the sum of the scores. The sum is divided by $n$, which represents the number of scores.

A graphics calculator can be used to calculate and display many statistical functions. There are several brands of graphics calculator, but the Texas Instruments TI-83 is the model referred to in illustrations. Other brands of calculator allow calculations and displays with similar instructions. Many of the exercises lend themselves to either manual working or graphics calculator use.

Worked example 1

Find the mean of the scores 17, 16, 13, 15, 16, 20, 10, 15.

| THINK | WRITE |
|---|---|
| 1. Find the total of all scores. | Total = 17 + 16 + 13 + 15 + 16 + 20 + 10 + 15, so $\sum x = 122$ |
| 2. Divide the total by 8 (the number of scores). | $\bar{x} = \dfrac{\sum x}{n} = \dfrac{122}{8} = 15.25$ |

Worked example 2

Calculate the mean of the set of data below, using a graphics calculator.

10, 12, 15, 16, 18, 19, 22, 25, 27, 29

| THINK | WRITE/DISPLAY |
|---|---|
| 1. Enter the data in L1. (Press STAT, select 1:Edit... and press ENTER to access the screen.) | |
| 2. Calculate the mean. (a) Press STAT. (b) Highlight CALC in the top line. (c) Highlight 1:1-Var Stats and press ENTER. (d) Type L1 and press ENTER. (e) A number of values are given. The top entry, $\bar{x} = 19.3$, gives us the mean. | $\bar{x} = 19.3$ |
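For readers working on a computer instead of a graphics calculator, the formula $\bar{x} = \sum x / n$ can be sketched in a few lines of Python. The function name `mean` and the variable names are illustrative choices, not part of the textbook; the data are the scores from worked examples 1 and 2.

```python
def mean(scores):
    """Return the mean: the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


# Scores from worked examples 1 and 2.
example_1 = [17, 16, 13, 15, 16, 20, 10, 15]
example_2 = [10, 12, 15, 16, 18, 19, 22, 25, 27, 29]

print(mean(example_1))  # 15.25
print(mean(example_2))  # 19.3
```

Both results agree with the values obtained by manual working and with the calculator display.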
6. Interpreting the mean

When we use the mean, we are attempting to represent the central value of the data. Let us investigate what affects its value.

Consider five scores: 1, 2, 3, 4 and 5. The value of the mean is the total (15) divided by the number of scores (5). The answer, 3, clearly lies in the centre. What would be the value of the mean if the last score had been 20 instead of 5? The answer of 6, where only one score lies above the mean and four lie below it, clearly demonstrates the influence of extreme values on the mean. Since the calculation takes into account the values of all scores, a check must be applied to determine whether the resulting value is a reasonable representation of the centre of the data.

Remember

1. The mean is the statistical term for 'average'.
2. The mean is calculated by adding all scores then dividing by the number of scores. That is, $\bar{x} = \dfrac{\sum x}{n}$.
3. As a measure of central tendency, the mean represents a value for the 'centre' of the scores.
4. Check to determine the number of scores above and below the mean.
5. The value of the mean is affected by extremes in scores.
6. Remember to include correct units in your final answer.

10A Calculating and interpreting the mean

Use a graphics calculator or manual working for the following.

1. Copy and complete the following:
   Another word commonly used for 'mean' is __________. The mean is calculated by finding the __________ of the scores, then dividing by the __________ of scores. The mean is a measure of __________ tendency. Two other measures are __________ and __________.
2. Calculate the mean of each of the following sets of scores.
   - (a) 4, 8, 3, 5, 5
   - (b) 16, 24, 30, 35, 23, 11, 45, 28
   - (c) 65, 92, 56, 84
   - (d) 9.2, 9.7, 8.8, 8.1, 5.6, 7.5, 8.5, 6.4, 7.0, 6.4
   - (e) 356, 457, 182, 316, 432, 611, 299, 355
3. Majid sits for five tests in mathematics. His percentages on the tests were 45%, 90%, 67%, 86% and 75%. Calculate Majid's mean percentage on the five tests. How many of his percentages were above the mean, and how many below?
7. (Exercise 10A continued)

4. An oil company surveys the price of petrol in eight Brisbane suburbs. The results are listed below.

   | Suburb | Price (c/L) |
   |---|---|
   | Manly | 76.9 |
   | Kenmore | 72.9 |
   | Bardon | 73.4 |
   | Nundah | 70.9 |
   | Springwood | 72.3 |
   | Mansfield | 75.8 |
   | Oxley | 73.9 |
   | Boondall | 71.1 |

   Based on these results, calculate the mean price of petrol in cents per litre in Brisbane. Is this mean a realistic representation of the central value? Explain.
5. The seven players on a netball team have the following heights: 1.65 m, 1.81 m, 1.75 m, 1.78 m, 1.88 m, 1.92 m and 1.86 m. Calculate the mean height of the players on this team, correct to 2 decimal places. How many of the players have heights above the mean height?
6. A golf ball manufacturer randomly tests the mass of 10 golf balls from a batch. The batch will be considered satisfactory if the average mass of the balls is between 44.8 g and 45.2 g. The masses, in grams, of those tested are: 45.19, 45.06, 45.35, 44.78, 45.47, 44.68, 44.95, 45.32, 44.60, 44.95.
   - (a) Will the batch be passed as satisfactory?
   - (b) Which ball has a mass that is furthest from the mean: the lightest one or the heaviest one?
7. Consider the five values 1, 2, 3, 4 and 5. The mean is calculated as 3.
   - (a) What happens to the value of the mean if 10 is added to each score?
   - (b) What effect does multiplying each score by 10 have on the mean's value?

Investigation: Means of skull measurements

Refer to the table of skull measurements for 4000 BC and AD 150 displayed earlier in the chapter.

1. Using the breadth, height and length measurements (in mm) for 4000 BC, calculate the mean for each set of data.
2. Draw up the table shown below and include the means calculated above. The means for the corresponding measurements for AD 150 have been included for comparison.

   | | 4000 BC Breadth | 4000 BC Height | 4000 BC Length | AD 150 Breadth | AD 150 Height | AD 150 Length |
   |---|---|---|---|---|---|---|
   | Mean | | | | 136.2 | 130.3 | 93.5 |

3. Note the difference between the means for the 4000 BC measurements and the corresponding ones for AD 150. Do you notice a trend?
4. Examine the breadth data for 4000 BC. How many scores are above the mean? How many scores are below the mean?
5. Examine the height and length data sets for 4000 BC and determine the number of scores above and below the mean in each set.
6. In your opinion, does the mean appear to represent a value close to the centre of each data set?
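The check recommended in 'Interpreting the mean' (count the scores above and below the mean) can be sketched in Python. The helper name `summarise` is hypothetical; the two data sets are the five scores discussed in that section, before and after the last score is changed from 5 to 20.

```python
def mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def summarise(scores):
    """Return (mean, number of scores above it, number of scores below it)."""
    centre = mean(scores)
    above = sum(1 for s in scores if s > centre)
    below = sum(1 for s in scores if s < centre)
    return centre, above, below


print(summarise([1, 2, 3, 4, 5]))   # (3.0, 2, 2)
print(summarise([1, 2, 3, 4, 20]))  # (6.0, 1, 4)
```

A single extreme score moves the mean from 3 to 6 and leaves four of the five scores below it, which is the warning sign the text describes. The same helper could be applied to each column of the skull table in the investigation.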
8. Frequency distribution tables

In the last section, we dealt with easily manageable quantities of data. However, more commonly we are confronted with the task of processing much larger data sets. Making sense of large quantities of data is best achieved by using a frequency distribution table.

The headings for this table are Score ($x$), Tally (optional), Frequency ($f$) and a fourth column, ($fx$), which contains the score ($x$) multiplied by the frequency ($f$). The total of this fourth column is the total of all the scores. The mean is then calculated by dividing this total of all scores by the sum of the frequency column (which represents the total number of scores). Written as a formula, this is:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

Worked example 3

Complete the frequency table below, then calculate the mean.

| Score ($x$) | Tally | Frequency ($f$) | $fx$ |
|---|---|---|---|
| 4 | \|\|\| | | |
| 5 | \|\|\|\|\| \|\| | | |
| 6 | \|\|\|\|\| \|\|\|\|\| \| | | |
| 7 | \|\|\|\|\| \|\|\|\|\| \|\|\| | | |
| 8 | \|\|\|\|\| \|\|\|\|\| | | |
| 9 | \|\|\|\|\| \| | | |
| | | $\sum f =$ | $\sum fx =$ |

THINK

1. Complete the frequency column from the tally column.
2. Complete the $fx$ column by multiplying each score by the frequency.
3. Sum the frequency column.
4. Sum the $fx$ column.
5. Use the formula to calculate the mean.

WRITE

| Score ($x$) | Frequency ($f$) | $fx$ |
|---|---|---|
| 4 | 3 | 12 |
| 5 | 7 | 35 |
| 6 | 11 | 66 |
| 7 | 13 | 91 |
| 8 | 10 | 80 |
| 9 | 6 | 54 |
| | $\sum f = 50$ | $\sum fx = 338$ |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{338}{50} = 6.76$$
9. Graphics calculator tip: Calculating the mean

To enlist the aid of a graphics calculator in determining the mean in worked example 3:

1. Enter the data.
   - (a) To clear any previous equations, press Y= and clear any functions.
   - (b) Press STAT, select 1:EDIT and press ENTER.
   - (c) Enter the scores in L1 and the frequencies in L2.
2. Set up the calculator to calculate the mean.
   - (a) Press STAT, select CALC, then the 1-Var Stats option. Type L1 and L2 separated by a comma.
   - (b) Press ENTER to display a number of statistical measures.
   - (c) Amongst other statistical data you can read off the number of scores, the sum of the scores and the mean.

Remember

1. The mean for a large number of scores is generally calculated from a frequency distribution table. A graphics calculator can also be used.
2. The formula for the mean is $\bar{x} = \dfrac{\sum fx}{\sum f}$.
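The frequency-table formula $\bar{x} = \sum fx / \sum f$ can also be sketched in Python. Storing the table as a dictionary that maps each score to its frequency is an assumption made for this illustration, as is the function name; the data are the completed table from worked example 3.

```python
def mean_from_frequency_table(table):
    """table maps each score x to its frequency f; returns sum(f*x) / sum(f)."""
    total_f = sum(table.values())  # sum of the frequency column
    if total_f == 0:
        raise ValueError("the table contains no scores")
    total_fx = sum(score * freq for score, freq in table.items())  # sum of the fx column
    return total_fx / total_f


# Completed frequency table from worked example 3: score -> frequency.
worked_example_3 = {4: 3, 5: 7, 6: 11, 7: 13, 8: 10, 9: 6}

print(sum(worked_example_3.values()))               # 50
print(mean_from_frequency_table(worked_example_3))  # 6.76
```

The two sums mirror the totals of the $f$ and $fx$ columns, so the intermediate values can be checked against a hand-drawn table.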
10. 10B Mean, from frequency distribution tables

1. Using our skull measurements for breadth for 4000 BC, draw up a frequency distribution table as shown below. The tallies for each score have been included. Copy and complete the frequency ($f$) column and the ($fx$) column, total the last two columns, then calculate the mean. Notice that its value is the same as that calculated before, using the individual scores.

   | Breadth ($x$) | Tally | Frequency ($f$) | $fx$ |
   |---|---|---|---|
   | 119 | \| | | |
   | 124 | \| | | |
   | 125 | \|\| | | |
   | 126 | \|\| | | |
   | 127 | \| | | |
   | 128 | \|\| | | |
   | 129 | \| | | |
   | 130 | \| | | |
   | 131 | \|\|\|\| | | |
   | 132 | \|\|\| | | |
   | 134 | \|\|\| | | |
   | 135 | \|\| | | |
   | 136 | \| | | |
   | 138 | \|\| | | |
   | 139 | \|\| | | |
   | 141 | \| | | |
   | | | $\sum f =$ | $\sum fx =$ |

   - (a) $\sum f$ has the same value as $n$; $\sum fx$ has the same value as the total of the scores.
   - (b) $\bar{x} = \dfrac{\sum fx}{\sum f}$ = __________?
11. (Exercise 10B continued)

2. A class's marks (out of 10) on a spelling test are recorded in the frequency table below.

   | Score ($x$) | Tally | Frequency ($f$) | $fx$ |
   |---|---|---|---|
   | 4 | \|\| | | |
   | 5 | \|\|\|\| | | |
   | 6 | \|\|\|\| | | |
   | 7 | \|\|\|\|\|\|\|\| | | |
   | 8 | \|\|\| | | |
   | 9 | \|\|\|\| | | |
   | 10 | \|\| | | |
   | | | $\sum f =$ | $\sum fx =$ |

   - (a) Copy and complete the table.
   - (b) Use the formula Mean $= \dfrac{\sum fx}{\sum f}$ to calculate the class's mean.
   - (c) How many scores are greater than the mean?
3. An electrical store records the number of television sets sold each week over a year. The results are shown in the table below.

   | No. of television sets sold ($x$) | No. of weeks ($f$) | $fx$ |
   |---|---|---|
   | 16 | 4 | |
   | 17 | 4 | |
   | 18 | 3 | |
   | 19 | 6 | |
   | 20 | 7 | |
   | 21 | 12 | |
   | 22 | 8 | |
   | 23 | 2 | |
   | 24 | 4 | |
   | 25 | 2 | |
   | | $\sum f =$ | $\sum fx =$ |

   - (a) Copy and complete the table.
   - (b) Calculate the mean number of television sets sold each week over the year. Give your answer correct to one decimal place.
12. (Exercise 10B continued)

4. In a soccer season a team played 50 matches. The number of goals scored in each match is shown in the table below.

   | No. of goals | 0 | 1 | 2 | 3 | 4 | 5 |
   |---|---|---|---|---|---|---|
   | No. of matches | 4 | 9 | 18 | 10 | 5 | 4 |

   - (a) Redraw this table in the form of a frequency distribution table.
   - (b) Use your table to calculate the mean number of goals scored each game.
   - (c) By calculating the number of scores below and above the mean, decide whether its value is suitable as a measure of central tendency. Justify your decision.
5. A clothing store records the dress sizes sold during a day. The results are shown below.

   12 14 10 12 8 12 16 10 8 12 10 12 18 10 12 14 16 10 12 12 12 14 18 10 14 12 12 14 14 10

   - (a) Present this information in a frequency table.
   - (b) Calculate the mean dress size sold this day.
   - (c) Comment on your answer.
6. (Multiple choice) There are eight players in a Rugby forward pack. The mean mass of the players is 104 kg. The total mass of the forward pack is:
   A 13 kg  B 104 kg  C 112 kg  D 832 kg
7. (Multiple choice) A small business employs five people on a mean wage of \$380 per week. A manager is then employed and receives \$500 per week. What is the mean wage of the six employees?
   A \$380  B \$400  C \$480  D \$2400
8. (Multiple choice) The mean height of five starting players in a basketball match is 1.82 m. During a time out, a player who is 1.78 m tall is replaced by a player 1.88 m tall. What is the mean height of the players after the replacement has been made?
   A 1.78 m  B 1.82 m  C 1.84 m  D 1.88 m

Grouping data and using grouped data

In some cases, the range of data values is so great that grouping the data into classes makes the data more manageable. For example, consider a data set of people with ages ranging from 25 to 49. We might group the ages in intervals of 5 in the form 25–29, 30–34 etc. This means that all the values (25, 26, 27, 28 and 29) would be grouped in one class. The centre of this class would be 27, and this is the value used as the score ($x$).

13. This class centre is then multiplied by the frequency ($f$). In this case, the value obtained for the mean is an estimate rather than an exact value. Sometimes the choice of the size of the class intervals also has an effect on the accuracy of the mean.

Worked example 4

Complete the frequency distribution table and use it to estimate the mean of the distribution.

| Class | Class centre ($x$) | Tally | Frequency ($f$) | $fx$ |
|---|---|---|---|---|
| 25–29 | | \|\|\|\| | | |
| 30–34 | | \|\|\|\|\| \|\|\|\| | | |
| 35–39 | | \|\|\|\|\| \|\|\|\|\| \|\|\| | | |
| 40–44 | | \|\|\|\|\| \|\|\|\|\| \|\| | | |
| 45–49 | | \|\|\|\|\| \|\| | | |
| | | | $\sum f =$ | $\sum fx =$ |

THINK

1. Calculate the class centres.
2. Complete the frequency column from the tally column.
3. Multiply each class centre by the frequency to complete the $fx$ column.
4. Sum the frequency column.
5. Sum the $fx$ column.
6. Use the formula to calculate the mean.

WRITE

| Class | Class centre ($x$) | Frequency ($f$) | $fx$ |
|---|---|---|---|
| 25–29 | 27 | 4 | 108 |
| 30–34 | 32 | 9 | 288 |
| 35–39 | 37 | 13 | 481 |
| 40–44 | 42 | 12 | 504 |
| 45–49 | 47 | 7 | 329 |
| | | $\sum f = 45$ | $\sum fx = 1710$ |

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1710}{45} = 38$$
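The grouped-data estimate can be sketched in the same way. The representation of each class as a `(lower, upper, frequency)` tuple and the function name `grouped_mean` are assumptions made for this illustration; the class centre is taken as the midpoint of the two class limits, as in worked example 4.

```python
def grouped_mean(classes):
    """Estimate the mean from (lower, upper, frequency) tuples using class centres."""
    total_f = 0
    total_fx = 0
    for lower, upper, freq in classes:
        centre = (lower + upper) / 2  # class centre, used as the score x
        total_f += freq
        total_fx += centre * freq
    if total_f == 0:
        raise ValueError("the table contains no scores")
    return total_fx / total_f


# Classes and frequencies from worked example 4.
worked_example_4 = [(25, 29, 4), (30, 34, 9), (35, 39, 13), (40, 44, 12), (45, 49, 7)]

print(grouped_mean(worked_example_4))  # 38.0
```

Because every score in a class is replaced by the class centre, the result is an estimate; regrouping the same raw data with a different class width would generally give a slightly different value.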
14. Graphics calculator tip: Calculating the mean from grouped data

A graphics calculator can be used to calculate the mean from a grouped data frequency distribution. In such cases, the class centre can be entered as L1 and the frequency as L2. Remember to set up the 1-Var Stats to recognise these two lists (Xlist: L1 and Freq: L2).

Remember

1. The mean is the statistical term for average.
2. The mean is calculated by adding all scores then dividing by the number of scores.
3. When calculating the mean from a frequency distribution table, a column for frequency × score ($fx$) is added. The mean is then calculated using the formula $\bar{x} = \dfrac{\sum fx}{\sum f}$.
4. If the frequency distribution uses grouped data, the $fx$ column is calculated using class centres for the $x$-value.
5. The mean can also be calculated using a graphics calculator.

10C Mean, from grouped data

1. - (a) Using our skull measurements for breadth for 4000 BC (shown previously as an ungrouped frequency distribution), draw up the table below, using class intervals 119–121, 122–124 etc. Complete the columns and calculate the mean.

   | Class | Class centre ($x$) | Tally | Frequency ($f$) | $fx$ |
   |---|---|---|---|---|
   | 119–121 | 120 | | | |
   | 122–124 | | | | |
   | 125–127 | | | | |
   | 128–130 | | | | |
   | 131–133 | | | | |
   | 134–136 | | | | |
   | 137–139 | | | | |
   | 140–142 | | | | |
   | | | | $\sum f =$ | $\sum fx =$ |

   Compare $\sum fx$ with $\sum fx$ from exercise 10B, question 1.

   $\bar{x} = \dfrac{\sum fx}{\sum f}$ = __________?

   - (b) Does the mean differ from the two previous calculations? Explain any difference.
15. (Exercise 10C continued)

2. The table below shows a set of class marks on a test out of 100.

   | Class | Class centre ($x$) | Tally | Frequency ($f$) | $fx$ |
   |---|---|---|---|---|
   | 31–40 | | \| | | |
   | 41–50 | | \|\|\| | | |
   | 51–60 | | \|\|\|\| | | |
   | 61–70 | | \|\|\|\|\|\| | | |
   | 71–80 | | \|\|\|\|\|\|\|\|\| | | |
   | 81–90 | | \|\| | | |
   | 91–100 | | \|\| | | |
   | | | | $\sum f =$ | $\sum fx =$ |

   - (a) Copy and complete the frequency distribution table.
   - (b) Use the table to calculate the mean class mark.
   - (c) In which class interval does the mean lie?
3. In the heats of the 100-m freestyle at a swimming meet, the times of the swimmers were recorded in the table below.

   | Time | Class centre ($x$) | No. of swimmers ($f$) | $fx$ |
   |---|---|---|---|
   | 50.01–51.00 | | 4 | |
   | 51.01–52.00 | | 12 | |
   | 52.01–53.00 | | 23 | |
   | 53.01–54.00 | | 38 | |
   | 54.01–55.00 | | 15 | |
   | 55.01–56.00 | | 3 | |
   | | | $\sum f =$ | $\sum fx =$ |

   - (a) Copy and complete the frequency distribution table.
   - (b) Use the table to calculate the mean time.
   - (c) How many swimmers swam faster than this mean time?
4. A cricketer played 50 innings in test cricket for the following scores.

   23 65 8 112 54 0 84 12 21 4
   25 105 74 40 1 15 33 45 21 47
   16 70 22 33 21 8 34 36 5 7
   69 104 57 78 158 0 51 16 6 16
   0 49 0 14 28 52 21 3 3 7

   - (a) Put the above information into a frequency distribution table using appropriate groupings.
   - (b) Use the table to estimate the batting average for this player.
   - (c) Repeat the exercise using a different size class interval. Compare your answers.
16. (Exercise 10C continued)

5. Use the statistics function on your calculator to find the mean of each of the following sets of scores, correct to 1 decimal place.
   - (a) 11, 15, 13, 12, 21, 19, 8, 14
   - (b) 2.8, 2.3, 3.6, 2.9, 4.5, 4.2
   - (c) 41, 41, 41, 42, 43, 45, 45, 45, 45, 46, 49, 50
6. Use your calculator to find the mean from each of the following tables.

   (a)

   | Score | Frequency |
   |---|---|
   | 3 | 7 |
   | 4 | 10 |
   | 5 | 18 |
   | 6 | 19 |
   | 7 | 38 |
   | 8 | 27 |
   | 9 | 10 |
   | 10 | 5 |

   (b)

   | Score | Frequency |
   |---|---|
   | 28 | 5 |
   | 29 | 18 |
   | 30 | 25 |
   | 31 | 25 |
   | 32 | 14 |
   | 33 | 10 |
   | 34 | 3 |

7. The table below shows the heights of a group of people. Calculate the mean of this distribution.

   | Height | Class centre | Frequency |
   |---|---|---|
   | 150–154 | 152 | 7 |
   | 155–159 | 157 | 14 |
   | 160–164 | 162 | 13 |
   | 165–169 | 167 | 23 |
   | 170–174 | 172 | 24 |
   | 175–179 | 177 | 12 |

8. Seventy students were timed on a 100-m sprint during their P.E. class. The results are shown in the table below.

   | Time (s) | 12 to <13 | 13 to <14 | 14 to <15 | 15 to <16 | 16 to <17 |
   |---|---|---|---|---|---|
   | Number | 13 | 17 | 25 | 15 | 10 |

   - (a) Calculate the class centre for each group in the distribution.
   - (b) Use your calculator to find the mean of the distribution.
17. (Exercise 10C continued)

9. A drink machine is installed near a quiet beach. The number of cans sold each day over the first 10 weeks after its installation is shown below.

   4 39 31 31 50 43 70 45 57 71
   18 26 3 52 51 59 33 51 27 62
   30 90 3 30 97 59 33 44 99 62
   72 6 42 83 19 49 11 6 63 4
   53 20 45 58 1 9 79 41 2 33
   97 71 52 97 69 83 39 84 92 43
   71 98 8 97 18 89 21 9 4 17

   - (a) Put this information into a frequency distribution table using the classes 1–10, 11–20, 21–30 etc.
   - (b) Calculate the mean number of cans sold per day over these 10 weeks.
   - (c) Using the raw data above, calculate the number of days on which the sales were greater than the mean.

Median and mode

So far we have used the mean as a measure of the typical score in a data set. Consider the case of someone who is analysing the typical house price in an area. On a particular day, five houses are sold in the area for the following prices:

\$175 000  \$149 000  \$160 000  \$211 000  \$850 000

For these five houses the mean price is \$309 000. The mean is much greater than the price of most of the houses in the data set. This is because there is one score which is much greater than all the others. For such data sets, we need to use a different measure of central tendency.

Median

The median is the middle score in a data set (of $n$ scores), when all scores are arranged in order. If the data set consists of an odd number of scores, there is one score which lies exactly in the middle. For a data set consisting of an even number of scores, the median will always occur half way between two scores.
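The house-price example can be checked with Python's standard `statistics` module, which provides `mean` and `median` functions. This is only a sketch for comparison with the manual methods in this chapter; the variable names are illustrative.

```python
from statistics import mean, median

# Sale prices (in dollars) of the five houses in the example.
prices = [175_000, 149_000, 160_000, 211_000, 850_000]

average = mean(prices)
middle = median(prices)
below_mean = sum(1 for p in prices if p < average)

print(f"mean   = ${average:,.0f}")  # mean   = $309,000
print(f"median = ${middle:,.0f}")   # median = $175,000
print(below_mean)                   # 4
```

Four of the five prices lie below the mean, while the median sits at one of the actual sale prices in the middle of the ordered list, which is why the median is the more typical value here.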
18. Using single scores

The position of the median can be found using the formula:

$$\text{Median position} = \frac{n+1}{2}\text{th score}$$

The median becomes more complicated when there is an even number of scores because there are two scores in the middle. When there is an even number of scores, the median is the average of the two middle scores.

Worked example 5

For the scores 3, 4, 8, 2, 2, 6, 9, 1, 6 calculate the median.

| THINK | WRITE |
|---|---|
| 1. Rewrite the scores in ascending order. There are 9 scores here. | 1, 2, 2, 3, 4, 6, 6, 8, 9 |
| 2. The median is the middle score, that is, the $\dfrac{n+1}{2}$th score. | Median = $\dfrac{9+1}{2}$th score = 5th score = 4 |

Worked example 6

Find the median of the scores 13, 13, 16, 12, 19, 18, 20, 18.

| THINK | WRITE |
|---|---|
| 1. Write the scores in ascending order. | 12, 13, 13, 16, 18, 18, 19, 20 |
| 2. There is an even number (8) of scores, so average the two middle scores. | Median = $\dfrac{n+1}{2}$th score = $\dfrac{8+1}{2}$th score = 4.5th score, that is, half way between the 4th and 5th scores. The 4th score is 16. The 5th score is 18. Median = $\dfrac{16+18}{2} = 17$ |

Median from a frequency distribution table of ungrouped data

The median can be calculated from a frequency distribution table if we extend the table by adding a cumulative frequency column. This column 'cumulates' or totals the frequencies as we descend the rows. It is then possible to determine which scores are in each position. Consider the frequency distribution table following.
19. 

| Score | Frequency | Cumulative frequency | |
|---|---|---|---|
| 4 | 1 | 1 | The 1st score is 4. |
| 5 | 6 | 7 | The 2nd–7th scores are 5. |
| 6 | 9 | 16 | The 8th–16th scores are 6. |
| 7 | 8 | 24 | The 17th–24th scores are 7. |
| 8 | 4 | 28 | The 25th–28th scores are 8. |
| 9 | 2 | 30 | The 29th and 30th scores are 9. |

There are 30 scores in this distribution and so the middle two scores will be the 15th and 16th scores. By looking down the cumulative frequency column we can see that these scores are both 6. Therefore, 6 is the median of this distribution.

Worked example 7

Find the median for the frequency distribution below.

| Score | Frequency |
|---|---|
| 34 | 3 |
| 35 | 8 |
| 36 | 12 |
| 37 | 9 |
| 38 | 8 |
| 39 | 5 |

THINK

1. Redraw the frequency table with a cumulative frequency column.
2. There are 45 scores and so the middle score is the 23rd score.
3. Look down the cumulative frequency column to see that the 23rd score is 36.

WRITE

| Score | Frequency | Cumulative frequency |
|---|---|---|
| 34 | 3 | 3 |
| 35 | 8 | (3 + 8) 11 |
| 36 | 12 | (11 + 12) 23 |
| 37 | 9 | (23 + 9) 32 |
| 38 | 8 | (32 + 8) 40 |
| 39 | 5 | (40 + 5) 45 |

Median = $\dfrac{n+1}{2}$th score = $\dfrac{45+1}{2}$th score = 23rd score

Median = 36
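Worked examples 5 to 7 can be reproduced with a short Python sketch. The function names and the score-to-frequency dictionary are assumptions made for illustration. `median` works on a list of single scores; `median_from_table` walks down a running (cumulative) frequency total to find the score at a given position.

```python
def median(scores):
    """Median of single scores: the middle score, or the average of the two middle scores."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the (n + 1) / 2 th score
    return (ordered[mid - 1] + ordered[mid]) / 2


def score_at_position(table, position):
    """Return the score in the given 1-based position, using cumulative frequencies."""
    cumulative = 0
    for score in sorted(table):
        cumulative += table[score]
        if position <= cumulative:
            return score
    raise ValueError("position is beyond the last score")


def median_from_table(table):
    """Median from a frequency table that maps each score to its frequency."""
    n = sum(table.values())
    if n == 0:
        raise ValueError("the table contains no scores")
    if n % 2 == 1:
        return score_at_position(table, (n + 1) // 2)
    lower = score_at_position(table, n // 2)
    upper = score_at_position(table, n // 2 + 1)
    return (lower + upper) / 2


print(median([3, 4, 8, 2, 2, 6, 9, 1, 6]))            # 4
print(median([13, 13, 16, 12, 19, 18, 20, 18]))       # 17.0
print(median_from_table({4: 1, 5: 6, 6: 9, 7: 8, 8: 4, 9: 2}))         # 6.0
print(median_from_table({34: 3, 35: 8, 36: 12, 37: 9, 38: 8, 39: 5}))  # 36
```

For an even number of scores the result is returned as a decimal (17.0, 6.0) because two scores are averaged; the values match the worked examples.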
20. Mode

There are many examples where neither the mean nor the median is the appropriate measure of the typical score in a data set.

Using single scores

Consider the case of a clothing store. It needs to re-order a supply of dresses. To know what sizes to order it looks at past sales of this particular style and gathers the following data:

8 12 14 12 16 10 12 14 16 18
14 12 14 12 12 8 18 16 12 14

For this data set the mean dress size is 13.2. Dresses are not sold in size 13.2, so this has very little meaning. The median is 13, which also has little meaning as dresses are sold only in even-numbered sizes. What is most important to the clothing store is the dress size that sells the most. In this case size 12 occurs most frequently. The score that has the highest frequency is called the mode.

When two scores share the highest frequency, that is, occur an equal number of times, both scores are given as the mode. In this situation the distribution is bimodal. If all scores occur an equal number of times, then the distribution has no mode.

Worked example 8

Find the mode of the scores below.

4, 5, 9, 4, 6, 8, 4, 8, 7, 6, 5, 4

| THINK | WRITE |
|---|---|
| The score 4 occurs most often and so it is the mode. | Mode = 4 |

Mode from a frequency distribution table

To find the mode from a frequency distribution table, we simply give the score that has the highest frequency.

Worked example 9

For the frequency distribution below, state the mode.

| Score | Frequency |
|---|---|
| 14 | 3 |
| 15 | 6 |
| 16 | 11 |
| 17 | 14 |
| 18 | 10 |
| 19 | 7 |

| THINK | WRITE |
|---|---|
| The highest frequency is 14, which belongs to the score 17, and so 17 is the mode. | Mode = 17 |
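The rules for the mode (one mode, two modes, or no mode when every score is equally frequent) can be sketched with `collections.Counter` from the Python standard library. The function name `modes` and the convention of returning an empty list for 'no mode' are assumptions made for this illustration; the last two data sets are made-up examples, not taken from the text.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the most frequent scores; [] if all scores are equally frequent."""
    counts = Counter(scores)
    if not counts:
        raise ValueError("the mode of an empty list is undefined")
    highest = max(counts.values())
    # Several different scores, all occurring equally often: no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(score for score, c in counts.items() if c == highest)


dress_sizes = [8, 12, 14, 12, 16, 10, 12, 14, 16, 18,
               14, 12, 14, 12, 12, 8, 18, 16, 12, 14]

print(modes(dress_sizes))                           # [12]
print(modes([4, 5, 9, 4, 6, 8, 4, 8, 7, 6, 5, 4]))  # [4]
print(modes([1, 1, 2, 2, 3]))                       # [1, 2]  (bimodal, illustrative data)
print(modes([1, 2, 3]))                             # []      (no mode, illustrative data)
```

Returning a list keeps the bimodal case explicit; a frequency table stored as a score-to-frequency dictionary could be handled the same way by picking the key or keys with the largest value.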
21. When a table is presented using grouped data, we do not have a single mode. In these cases, the class with the highest frequency is called the modal class.

Remember

1. The median is the middle score in a data set or the average of the two middle scores. The scores must be arranged in order.
2. The median can be found using the cumulative frequency column of a frequency table.
3. The mode is the score that occurs most often.
4. Remember to include units in the final answer.

10D Median and mode

1. Copy and complete the following:
   The median score is the __________ one, when the scores are __________ __________. The formula for the position of the median score is __________. For an even number of scores, it is the __________ of the two middle ones. When using a frequency distribution table, the median is obtained from the __________ __________ column.
2. The scores of seven people on a spelling test are given below.
   5 6 5 8 5 9 8
   Calculate the median of these marks.
3. Below are the scores of eight people who played a round of golf.
   75 80 81 76 84 83 81 82
   Calculate the median for this set of scores.
4. Find the median for each of the following sets of scores.
   - (a) 3, 4, 5, 5, 5, 6, 9
   - (b) 5.6, 5.2, 5.4, 5.3, 5.8, 5.4, 5.3, 5.4
   - (c) 45, 62, 39, 88, 75
   - (d) 102, 99, 106, 108, 101, 103, 102, 105, 102, 101
5. A factory has 80 employees. Over a two-week period the number of people absent from work each day was recorded and the results are shown below.
   3, 1, 5, 4, 3, 25, 4, 2, 4, 5
   - (a) Calculate the median number of people absent from work each day.
   - (b) Calculate the mean number of people absent from work each day.
   - (c) Does the mean or the median give a better measure of the typical number of people absent from work each day? Explain your answer.
22. 22. 6 The table below shows the number of cans of drink sold from a vending machine at a high school each day.

| Score | Frequency | Cumulative frequency |
|---|---|---|
| 17 | 4 | |
| 18 | 9 | |
| 19 | 6 | |
| 20 | 12 | |
| 21 | 8 | |
| 22 | 5 | |
| 23 | 4 | |
| 24 | 2 | |

a Copy and complete the frequency distribution table.
b Use the table to calculate the median number of cans of drink sold each day from the vending machine.

7 The table below shows the number of accidents a tow truck attends each day over a three-week period. Calculate the median number of accidents attended by the tow truck each day.

| No. of accidents | No. of days |
|---|---|
| 2 | 4 |
| 3 | 12 |
| 4 | 3 |
| 5 | 1 |
| 6 | 1 |

8 The table below shows the number of errors made by a machine each day over a 50-day period. Calculate the median number of errors made by the machine each day.

| No. of errors per day | Frequency |
|---|---|
| 0 | 9 |
| 1 | 18 |
| 2 | 13 |
| 3 | 6 |
| 4 | 3 |
| 5 | 1 |

9 (multiple choice) There are 25 scores in a distribution. The median score will be the:
A 12th score
B 12.5th score
C 13th score
D average of the 12th and 13th scores.
23. 23. 10 (multiple choice) For the scores 4, 5, 5, 6, 7, 7, 9, 10 the median is:
A 5 B 6 C 6.5 D 7

11 (multiple choice) Consider the frequency table below. The median of these scores is:
A 2 B 3 C 8 D 13

| Score | Frequency |
|---|---|
| 1 | 12 |
| 2 | 13 |
| 3 | 8 |
| 4 | 7 |
| 5 | 5 |

12 The table below shows the number of sick days taken by each worker in a small business.

| Days sickness | Frequency | Cumulative frequency |
|---|---|---|
| 0–4 | 10 | |
| 5–9 | 12 | |
| 10–14 | 7 | |
| 15–19 | 6 | |
| 20–24 | 5 | |
| 25–29 | 3 | |
| 30–34 | 2 | |

a Copy and complete the frequency distribution table.
b Calculate the median class for this distribution.

13 Copy and complete the following:
The mode is the __________ __________ score. If two scores occur most frequently an equal number of times, we have two modes, and this is termed a __________ distribution. In a frequency distribution table of grouped data, we generally do not attempt to find a single mode, but give the __________ __________.

14 For each of the following sets of scores find the mode.
a 2, 5, 3, 4, 5
b 8, 10, 7, 10, 9, 8, 8
c 11, 12, 11, 15, 14, 13
d 0.5, 0.4, 0.6, 0.3, 0.2, 0.4, 0.6, 0.9, 0.4
e 110, 113, 100, 112, 110, 113, 110

15 Find the mode for each of the following. (Hint: Some are bimodal and others have no mode.)
a 16, 17, 19, 15, 17, 19, 14, 16, 17
b 147, 151, 148, 150, 148, 152, 151
c 2, 3, 1, 9, 7, 6, 8
d 68, 72, 73, 72, 72, 71, 72, 68, 71, 68
e 2.6, 2.5, 2.9, 2.6, 2.4, 2.4, 2.3, 2.5, 2.6
24. 24. 16 Use the tables below to state the mode of the distribution.

a
| Score | Frequency |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 5 |
| 4 | 6 |
| 5 | 3 |

b
| Score | Frequency |
|---|---|
| 5 | 1 |
| 6 | 3 |
| 7 | 5 |
| 8 | 8 |
| 9 | 5 |
| 10 | 3 |

c
| Score | Frequency |
|---|---|
| 38 | 2 |
| 39 | 4 |
| 40 | 1 |
| 41 | 5 |
| 42 | 6 |
| 43 | 3 |
| 44 | 6 |
| 45 | 2 |

17 Use the frequency histogram below to state the mode of the distribution.
[Frequency histogram: Score (12 to 20) on the horizontal axis, Frequency (0 to 40) on the vertical axis]

18 For each of the following grouped distributions, state the modal class.

a
| Class | Frequency |
|---|---|
| 1–4 | 6 |
| 5–8 | 12 |
| 9–12 | 30 |
| 13–16 | 23 |
| 17–20 | 46 |
| 21–24 | 27 |
| 25–28 | 9 |

b
| Class | Frequency |
|---|---|
| 1–7 | 3 |
| 8–14 | 8 |
| 15–21 | 9 |
| 22–28 | 25 |
| 29–35 | 12 |
| 36–42 | 11 |
| 43–49 | 2 |

19 The weekly wage (in dollars) of 40 people is shown below.
376 592 299 501 375 366 204 359 382 274
223 295 232 325 311 513 348 235 329 203
556 419 226 494 205 307 417 204 528 487
543 532 435 415 540 260 318 593 592 393
a Use the classes \$200–\$249, \$250–\$299, \$300–\$349 etc. to display the information in a frequency distribution table.
b From your table, calculate the median class.
25. 25.
| Class | Class centre | Frequency | Cumulative frequency |
|---|---|---|---|
| 1–10 | | 5 | |
| 11–20 | | 15 | |
| 21–30 | | 29 | |
| 31–40 | | 37 | |
| 41–50 | | 11 | |

1 Copy the frequency table above and complete the class centre column.
2 Complete the cumulative frequency column.
3 How many scores in the data set were above 30?
4 How many scores in the data set were 40 or less?
5 Is the data set an example of grouped or ungrouped data?
6 Draw a frequency histogram for the data set.
7 On your histogram draw a frequency polygon for this data set.
8 Calculate the mean of the data.
9 In which class would the median lie?
10 Which is the modal class?
26. 26. Investigation: Summary statistics for skull measurements

Looking back at the data on Egyptian skulls, we are now in a position to summarise the measurements with respect to the mean, median and mode for each set.

1 Draw the table below. (The values for AD 150 have been included for comparison.)

| Statistic | Measurement | 4000 BC | AD 150 | Comment |
|---|---|---|---|---|
| Mean (x̄) | Breadth | | 136.2 | |
| | Height | | 130.3 | |
| | Length | | 93.5 | |
| Median (15.5th score) | Breadth | | 137 | |
| | Height | | 130 | |
| | Length | | 94 | |
| Mode | Breadth | | 137 | |
| | Height | | 135 | |
| | Length | | 92 | |

2 For the time period 4000 BC:
a enter the values for the means calculated previously
b calculate the median for each set
c determine the mode for each set.
3 Compare the figures you obtained for 4000 BC with the corresponding values for AD 150. Jot down comments in the final column.
4 Write a paragraph indicating what you feel has happened to the shape of the Egyptian skulls over the time period 4000 BC to AD 150.

Best summary statistics

Having now examined all three summary statistics, it is important to recognise when it is appropriate to use each one. In some circumstances, one summary statistic may be more appropriate than the others. For example, a shoe manufacturer notes that in a new style of sporting footwear:
mean size sold is 8.63
median size is 8.75
mode size is 9.
27. 27. In this case, the mode is the most useful measure as the manufacturer needs to know which size sells the most. The mean and median are of less use to the manufacturer.

The term average is often used indiscriminately, being interpreted sometimes as the mean, sometimes as the median and sometimes as the mode. The figure that best supports the cause of the author is the one which (unfortunately) tends to be promoted. We need to be aware of this, particularly when interpreting statistics. When we summarise and report statistical information, we need to act in a responsible manner and report figures that are not misleading.

For each of these examples you will need to think carefully about the relevance of each summary statistic in terms of the particular example.

Worked example 10
Below are the wages of ten employees in a small business.
\$220 \$230 \$290 \$275 \$265 \$250 \$1500 \$220 \$220 \$240
a Calculate the mean wage.
b Calculate the median wage.
c Calculate the mode wage.
d Does the mean, median or mode give the best measure of a typical wage in this business?

a
1 THINK: Total all the wages. WRITE: Total = \$3710
2 THINK: Divide the total by 10. WRITE: Mean = \$3710 ÷ 10 = \$371
b
1 THINK: Write the wages in ascending order. WRITE: \$220 \$220 \$220 \$230 \$240 \$250 \$265 \$275 \$290 \$1500
2 THINK: Average the 5th and 6th scores to find the median. WRITE: Median = (\$240 + \$250) ÷ 2 = \$245
c
THINK: \$220 is the score that occurs most often and so this is the mode. WRITE: Mode = \$220
d
THINK: The mean is larger than what is typical because of one very large wage, and the mode is the lowest wage and so this is not typical. Therefore, the median is the best measure.
WRITE: The median is the best measure of the typical wage as the mode is the lowest score, which is not typical, and the mean is inflated by the \$1500 wage.
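The effect of the single \$1500 wage on the mean can be checked with Python's standard `statistics` module. The following is a minimal sketch using the wages from worked example 10; the variable names are chosen here for illustration.

```python
import statistics

# Weekly wages (in dollars) of the ten employees in worked example 10.
wages = [220, 230, 290, 275, 265, 250, 1500, 220, 220, 240]

mean_wage = statistics.mean(wages)      # total divided by the number of wages
median_wage = statistics.median(wages)  # average of the 5th and 6th ordered wages
mode_wage = statistics.mode(wages)      # most frequent wage

print(mean_wage, median_wage, mode_wage)  # 371 245.0 220

# Leaving out the one extreme wage shows how strongly it inflates the mean.
without_extreme = [w for w in wages if w != 1500]
print(round(statistics.mean(without_extreme), 2))  # 245.56
```

With the extreme wage left out, the mean falls close to the median of \$245, which is why the median is the better measure of a typical wage here.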
28. 28. remember
1. The three summary statistics are:
mean: calculated by adding all scores, then dividing by the number of scores
median: the middle score or average of the two middle scores (when scores are arranged in order)
mode: the score with the highest frequency.
2. Be careful when using the mean. One or two extreme scores can greatly increase or decrease its value.
3. When the mean is not a good measure of central tendency, the median is used.
4. The mode is the best measure in some examples with discrete data, where the mean and median may have very little meaning.

10E Best summary statistics

1 There are ten houses in a street. A real estate agent values each house with the following results.
\$150 000 \$190 000 \$175 000 \$150 000 \$650 000
\$150 000 \$165 000 \$180 000 \$160 000 \$180 000
a Calculate the mean house valuation.
b Calculate the median house valuation.
c Calculate the mode house valuation.
d Which of the above is the best measure of central tendency?

2 The table below shows the number of shoes of each size that were sold over a week at a shoe store.

| Size | Frequency |
|---|---|
| 4 | 5 |
| 5 | 7 |
| 6 | 19 |
| 7 | 24 |
| 8 | 16 |
| 9 | 8 |
| 10 | 7 |

a Calculate the mean shoe size sold.
b Calculate the median shoe size sold.
c Calculate the mode of the data set.
d Which measure of central tendency has the most meaning to the shoe store proprietor?
29. 29. 3 The table below shows the crowds at football matches over a season.

| Crowd | Class centre | Frequency |
|---|---|---|
| 10 000 to <20 000 | 15 000 | 95 |
| 20 000 to <30 000 | 25 000 | 64 |
| 30 000 to <40 000 | 35 000 | 22 |
| 40 000 to <50 000 | 45 000 | 15 |
| 50 000 to <60 000 | 55 000 | 3 |
| 60 000 to <70 000 | 65 000 | 0 |
| 70 000 to <80 000 | 75 000 | 1 |

a Calculate the mean crowd over the season.
b Calculate the median class.
c Calculate the modal class.
d Which measure of central tendency would best describe the typical crowd at football matches over the season?

4 (multiple choice) Mr and Mrs Yousef research the typical price of a large family car. At one car yard they find six family cars. Five of the cars are priced between \$30 000 and \$40 000, while the sixth is priced at \$80 000. What would be the best measure of the price of a typical family car?
A Mean B Median C Mode D All are equally important.

5 Thirty men were asked to reveal the number of hours they spent doing housework each week. The results are given below.
1 5 2 12 2 6 2 8 14 18
0 1 1 8 20 25 3 0 1 2
7 10 12 1 5 1 18 0 2 2
a Represent the data in a frequency distribution table. (Use classes 0–4, 5–9, 10–14 etc.)
b Find the mean number of hours that the men spend doing housework.
c Find the median class for hours spent by the men at housework.
d Find the modal class for hours spent by the men at housework.

6 The resting pulse rates of 20 female athletes were measured. The results are shown below.
50 62 48 52 71 61 30 45 42 48
43 47 51 52 34 61 44 54 38 40
a Represent the data in a frequency distribution table using appropriate groupings.
b Find the mean of the data.
c Find the median class of the data.
d Find the modal class of the data.
e Comment on the similarities and differences between the three values.
30. 30. 7 The following data give the age of 25 patients admitted to the emergency ward of a hospital.
18 16 6 75 24 23 82 74 25 21 43 19 84
72 31 74 24 20 63 79 80 20 23 17 19
a Represent the data in a frequency distribution table. (Use classes 1–15, 16–30, 31–45, etc.)
b Find the mean age of patients admitted.
c Find the median class of age of patients admitted.
d Find the modal class for age of patients admitted.
e Do any of your statistics (mean, median or mode) give a clear representation of the typical age of an emergency ward patient?
f Give some reasons that could explain the pattern of the distribution of data in this question.

8 The batting scores for two cricket players over six innings are as follows:
Player A: 31, 34, 42, 28, 30, 41
Player B: 0, 0, 1, 0, 250, 0
a Find the mean score for each player.
b Which player appears to be better if the mean result is used?
c Find the median score for each player.
d Which player appears to be better when the decision is based on the median result?
e Which player do you think would be more useful to have in a cricket team and why? How can the mean result sometimes lead to a misleading conclusion?

9 The following frequency table gives the number of employees in different salary brackets for a small manufacturing plant.

| Position | Salary (\$) | No. of employees |
|---|---|---|
| Machine operator | 18 000 | 50 |
| Machine mechanic | 20 000 | 15 |
| Floor steward | 24 000 | 10 |
| Manager | 62 000 | 4 |
| Chief Executive Officer | 80 000 | 1 |

a Workers are arguing for a pay rise but the management of the factory claims that workers are well paid because the mean salary of the factory is \$22 100. Is this a sound argument?
b Suppose that you were representing the factory workers and had to write a short submission in support of the pay rise. How could you explain the management’s claim? Provide some other statistics to support your case.
31. 31. Investigation: Wage rise

The workers in an office are trying to obtain a wage rise. In the previous year, the ten people who work in the office received a 2% rise while the company CEO received a 42% rise.
1 What was the mean wage rise received in the office last year?
2 What was the median wage rise received in the office last year?
3 What was the modal wage rise received in the office last year?
4 The company is trying to avoid paying the rise. What statistic do you think they would quote about last year’s wage rises? Why?
5 What statistic do you think the trade union would quote about wage rises? Why?
6 Which statistic do you think is the most ‘honest’ reflection of last year’s wage rises? Explain your answer.

Investigation: Summary statistics for house prices

Quoting different averages can give different impressions about what is normal. Try the following task.
1 Visit a local real estate agent and study the properties for sale in the window. Alternatively, retrieve the for-sale ads for a real estate company from the newspaper.
2 Calculate the mean, median and mode price for houses in the area.
3 If you were a real estate agent and a person wanting to sell his/her home asked what the typical property sold for in the area, which figure would you quote?
4 Which figure would you quote to a person who wanted to buy a house in the area?

Investigation: Best summary statistics and comparison of samples

For this investigation, work in groups of 3 to 6 students. Examine each of the following statistics.
• The typical mark in maths among Year 11 students.
• The number of attempts taken by Years 11 and 12 students to get their driver’s licence.
• The typical number of days taken off school by Year 11 students so far this year.
1 For each of the above, gather your data by selecting a random sample.
2 Calculate the mean, median and mode for each topic.
3 Compare your results with the results of other students who will have selected their samples from the same population.
4 In each case, state the best summary statistic and explain your answer to the other groups in your class.
32. 32. Measures of dispersion or spread

Once a set of scores has been collected and tabulated, we are ready to make some conclusions about the data. Two key concepts are the range and the interquartile range, which are used to measure the spread of a set of scores.

Range

The range is the difference between the highest and the lowest score.

Range = highest score − lowest score

Range from single scores

Worked example 11
There are 17 players in the squad for a State of Origin match. The number of State of Origin matches played by each member of the squad is shown below.
2 6 12 8 1 4 8 9 24 4 5 11 14 6 11 15 10
What is the range of this distribution?
1 THINK: The lowest number of matches played is 1. WRITE: Lowest score = 1 match
2 THINK: The highest number of matches played is 24. WRITE: Highest score = 24 matches
3 THINK: Calculate the range by subtracting the lowest score from the highest score. WRITE: Range = 24 − 1 = 23 matches

A smaller range will usually represent a more consistent set of scores. Exceptions to this are when one or two scores are much higher or lower than most.

Graphics calculator tip: Calculating the range of a distribution
A graphics calculator can also be used to determine the range of a distribution. The 1-Var Stats display shows minX (lowest score) and maxX (highest score). The difference between these two values is the range of the data.

Range from a frequency distribution table

When we are calculating the range from a frequency distribution table, we find the highest and lowest scores from the score column. We do not use any information from the frequency column in calculating the range.

When the data are presented in grouped form, the range is found by taking the highest score from the highest class and the lowest score from the lowest class.
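As a rough illustration of both cases, the sketch below computes the range from a list of single scores and from a frequency table. The function names and the small frequency table are hypothetical, introduced only for this example. Skipping scores whose frequency is zero is an assumption added here; the text itself only says to read the score column.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

def range_from_table(table):
    """Range of a frequency table given as {score: frequency}.

    Only the score column matters; the frequency is consulted solely to
    ignore a score that never actually occurred (an assumption made here).
    """
    observed = [score for score, freq in table.items() if freq > 0]
    return max(observed) - min(observed)

# State of Origin matches played, from worked example 11.
matches_played = [2, 6, 12, 8, 1, 4, 8, 9, 24, 4, 5, 11, 14, 6, 11, 15, 10]
print(score_range(matches_played))            # 23

# Hypothetical frequency table: score -> frequency.
hypothetical_table = {3: 4, 4: 9, 5: 2, 6: 0}
print(range_from_table(hypothetical_table))   # 2
```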
33. 33. Worked example 12
The frequency distribution table below shows the heights (in cm) of boys competing for a place on a basketball team. Find the range of these data.

| Height | Frequency |
|---|---|
| 170 to <175 | 3 |
| 175 to <180 | 6 |
| 180 to <185 | 12 |
| 185 to <190 | 10 |
| 190 to <195 | 8 |
| 195 to <200 | 1 |

1 THINK: The lowest score is at the bottom of the 170 to <175 class. WRITE: Lowest score = 170 cm
2 THINK: The highest score is at the top of the 195 to <200 class. WRITE: Highest score = 200 cm
3 THINK: Range = highest score − lowest score. WRITE: Range = 200 − 170 = 30 cm

Interquartile range

In many cases, the range is not a good indicator of the overall spread of scores. Consider the two sets of scores below, showing the wages of people in two small businesses.
A: \$240, \$240, \$240, \$245, \$250, \$250, \$260, \$800
B: \$180, \$200, \$240, \$290, \$350, \$400, \$500, \$600
The range for business A = \$800 − \$240 = \$560
and for business B = \$600 − \$180 = \$420

While the range for business A is greater, by looking at the wages in the two businesses, we can see that the wages in business B are generally more spread. The range uses only two scores in its calculation. The interquartile range is usually a better measure of dispersion (spread). We looked at this in the previous chapter.

The quartiles are found by dividing the data into quarters. The lower quartile is the score below which the lowest 25% of scores lie; the upper quartile is the score above which the highest 25% of scores lie.

Before we can calculate an interquartile range we must be able to calculate the median. To calculate the median, we must first arrange the scores in ascending order. The median is the middle score if there is an odd number of scores, or the average of the two middle scores if there is an even number of scores. Remember that the median position is the $\frac{n + 1}{2}$th score.
34. 34. Worked example 13
Calculate the median of:
a 2, 5, 8, 8, 8, 11, 12
b 45, 69, 69, 87, 88, 92, 99, 100.

a THINK: These scores are already arranged in ascending order, so there is no need to reorder. There are 7 scores, so the median is the $\frac{7 + 1}{2}$ = 4th score. WRITE: Median = 8
b THINK: There are 8 scores, so the median is the $\frac{8 + 1}{2}$ = 4.5th score, that is, the average of the 4th score and the 5th score. WRITE: Median = $\frac{87 + 88}{2}$ = 87.5

The interquartile range is the difference between the upper quartile and the lower quartile. To find the lower and upper quartiles we arrange the scores in ascending order. The lower quartile is $\frac{1}{4}$ of the way through the distribution and the upper quartile is $\frac{3}{4}$ of the way through the distribution.

To find the interquartile range we follow the steps below.
1. Arrange the data in ascending order.
2. Divide the data into halves by finding the median.
(a) If there is an odd number of scores the median score should not be included in either half of the scores.
(b) If there is an even number of scores the middle will be half way between two scores and this will divide the data neatly into two sets.
3. The lower quartile will be the median of the lower half of the data.
4. The upper quartile will be the median of the upper half of the data.
5. The interquartile range will be the difference between the medians of the two halves of the data.

Worked example 14
Find the interquartile range of the following data, which show the number of home runs scored in a series of baseball matches.
12, 9, 4, 6, 5, 8, 9, 4, 10, 2
1 THINK: Write the data in ascending order. WRITE: 2, 4, 4, 5, 6, 8, 9, 9, 10, 12
2 THINK: Divide the data into two equal halves. WRITE: 2, 4, 4, 5, 6 and 8, 9, 9, 10, 12
3 THINK: The lower quartile will be the median of the lower half. WRITE: Lower quartile = 4 runs
4 THINK: The upper quartile will be the median of the upper half. WRITE: Upper quartile = 9 runs
5 THINK: The interquartile range will be the upper quartile minus the lower quartile. WRITE: Interquartile range = 9 − 4 = 5 runs
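The five steps for the interquartile range can be sketched in Python as follows. The function names `quartiles` and `interquartile_range` are assumptions made for this sketch. It follows the median-of-halves method described in the text, which other software may not use by default, so results elsewhere can differ slightly.

```python
import statistics

def quartiles(scores):
    """Return (lower quartile, upper quartile) by the median-of-halves method.

    Needs at least two scores.
    """
    ordered = sorted(scores)                 # step 1: ascending order
    half = len(ordered) // 2                 # size of each half
    lower_half = ordered[:half]              # step 2: for an odd count the
    upper_half = ordered[len(ordered) - half:]   # middle score is in neither half
    # steps 3 and 4: each quartile is the median of its half
    return statistics.median(lower_half), statistics.median(upper_half)

def interquartile_range(scores):
    lower, upper = quartiles(scores)
    return upper - lower                     # step 5

# Home runs from worked example 14.
home_runs = [12, 9, 4, 6, 5, 8, 9, 4, 10, 2]
print(quartiles(home_runs))            # (4, 9)
print(interquartile_range(home_runs))  # 5
```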
35. 35. The interquartile range can also be calculated using a graphics calculator.

Worked example 15
The data below give the amount spent (to the nearest whole dollar) by each child in a group that was taken on an excursion to the Brisbane Exhibition.
15 12 17 23 21 19 16 11 17 18 23
24 25 21 20 37 17 25 22 21 19
Calculate the interquartile range for these data.

THINK
1 Enter the data.
(a) Press STAT.
(b) Select 1:Edit by pressing ENTER.
(c) Enter the data in L1.
Note: There is no need to organise the data into ascending order first.
2 Obtain the values of the quartiles.
(a) Press STAT.
(b) Select CALC. Make sure that 1-Var Stats is set up as Xlist: L1 and Freq: 1.
(c) Select 1:1-Var Stats by pressing ENTER.
(d) Type L1. Press ENTER. A list of statistics appears.
3 Locate the first and third quartiles. Scroll down the screen using the down arrow key.
Q1 = 17 and Q3 = 23
So, IQR = \$23 − \$17 = \$6

remember
1. Measures of dispersion are used to measure the spread of a set of scores.
2. The range is calculated by subtracting the lowest score from the highest score.
3. A single outlying score can enlarge the range. The interquartile range is therefore a better measure of dispersion.
4. The interquartile range is found by subtracting the lower quartile from the upper quartile.
5. The lower and upper quartiles are found by dividing the scores into two equal halves. The median of the lower half is the lower quartile and the median of the upper half is the upper quartile.
6. Remember to show units in your final answer.
36. 36. 10F Range and interquartile range

1 Copy and complete the following:
The range is a measure of __________ or __________ of a set of scores. It can be calculated by subtracting the __________ score from the __________ score. The value of the range can be affected by a single __________ score. For this reason, the __________ range is sometimes a better measure of the spread of the scores. It can be calculated as the difference between the __________ quartile and the __________ quartile. The __________ divides the scores in half; the lower quartile represents a score below which lies __________ of the scores; the upper quartile represents a score above which __________ of the scores lie. The lower quartile, median and upper quartile divide the distribution into __________ equal parts. In each of these parts there is the same number of __________.

2 Find the range of each of the following sets of data.
a 2, 5, 4, 5, 7, 4, 3
b 103, 108, 111, 102, 111, 107, 110
c 2.5, 2.8, 3.4, 2.7, 2.6, 2.4, 2.9, 2.6, 2.5, 2.8
d 3.20, 3.90, 4.25, 7.29, 1.45, 2.77, 8.39
e 45, 23, 7, 47, 76, 89, 96, 48, 87, 76, 66

3 Use the frequency distribution tables below to find the range for each of the following sets of scores.

a
| Score | Frequency |
|---|---|
| 1 | 2 |
| 2 | 6 |
| 3 | 12 |
| 4 | 10 |
| 5 | 7 |

b
| Score | Frequency |
|---|---|
| 38 | 23 |
| 39 | 46 |
| 40 | 52 |
| 41 | 62 |
| 42 | 42 |
| 43 | 45 |

c
| Score | Frequency |
|---|---|
| 89 | 12 |
| 90 | 25 |
| 91 | 36 |
| 92 | 34 |
| 93 | 11 |
| 94 | 9 |
| 95 | 4 |
37. 37. 4 For the grouped distributions below, state the range.

a
| Class | Frequency |
|---|---|
| 51–60 | 2 |
| 61–70 | 8 |
| 71–80 | 15 |
| 81–90 | 7 |
| 91–100 | 1 |

b
| Class | Frequency |
|---|---|
| 150 to <155 | 12 |
| 155 to <160 | 25 |
| 160 to <165 | 38 |
| 165 to <170 | 47 |
| 170 to <175 | 39 |
| 175 to <180 | 20 |

c
| Class | Frequency |
|---|---|
| 40–43 | 48 |
| 44–47 | 112 |
| 48–51 | 254 |
| 52–55 | 297 |
| 56–59 | 199 |
| 60–63 | 84 |

5 The scores below show the number of points scored by two AFL teams over the first 10 games of the season.
Sydney: 110 95 74 136 48 168 120 85 99 65
Collingwood: 125 112 89 111 96 113 85 90 87 92
a Calculate the range of the scores for each team.
b Based on the results above, which team would you say is the more consistent?

6 Two machines are used to put approximately 100 Smarties into boxes. A check is made on the operation of the two machines. Ten boxes filled by each machine have the number of Smarties in them counted. The results are shown below.
Machine A: 100, 99, 99, 101, 100, 101, 100, 100, 101, 108
Machine B: 98, 104, 96, 97, 103, 96, 102, 100, 97, 104
a What is the range in the number of Smarties from the first machine?
b What is the range in the number of Smarties from the second machine?
c Ralph is the quality control officer and he argues that machine A is more consistent in its distribution of Smarties. Explain why.

7 Find the median for each of the data sets below.
a 3, 4, 4, 5, 7, 9, 10
b 17, 20, 19, 25, 29, 27, 28, 25, 29
c 52, 55, 53, 53, 54, 55, 52, 53, 54, 52
d 12, 14, 15, 12, 14, 19, 17, 15, 18, 20
e 56, 75, 83, 47, 93, 35, 84, 83, 73, 20, 66, 90
38. 38. 8 For each of the data sets in question 7, calculate the interquartile range.

9 (multiple choice) For the frequency table below, what is the range?
A 4 B 5 C 6 D 17

| Score | Frequency |
|---|---|
| 25 | 14 |
| 26 | 12 |
| 27 | 19 |
| 28 | 25 |
| 29 | 19 |

10 (multiple choice) Calculate the interquartile range of the following data.
17, 18, 18, 19, 20, 21, 21, 23, 25
A 3 B 4 C 5 D 8

11 (multiple choice) The interquartile range is considered to be a better measure of the variability of a set of scores than the range because it:
A takes into account more scores
B is the difference between the upper and lower quartiles
C is easier to calculate
D is not affected by extreme values.

12 (multiple choice) The distribution below shows the ranges in the heights of 25 members of a football squad.

| Height (cm) | Class centre | Frequency | Cumulative frequency |
|---|---|---|---|
| 140–149 | 144.5 | 2 | 2 |
| 150–159 | 154.5 | 5 | 7 |
| 160–169 | 164.5 | 10 | 17 |
| 170–179 | 174.5 | 7 | 24 |
| 180–189 | 184.5 | 1 | 25 |

Which of the statements below is correct?
A The range of the distribution is 40.
B The range of the distribution is 49.
C The range of the distribution is 9.
D The range can be estimated only by using the cumulative frequency.
39. 39. Standard deviation

We have already discussed using the range and the interquartile range as measures of the spread of a data set. However, the most commonly used measure of spread is the standard deviation. The standard deviation is a measure of how much a typical score in a data set differs from the mean.

Standard deviation from single scores

The standard deviation may be found by entering a set of scores into your calculator, just as you do when you are finding the mean. Your calculator will have a statistical function that gives the standard deviation. There are two standard deviation functions on your calculator. The first, σn, is the population standard deviation. This function is used when the statistical analysis is conducted on the entire population.

Worked example 16
Below are the scores out of 100 achieved by a class of 20 students on a science exam. Calculate the mean and the standard deviation.
87 69 95 73 88 47 95 63 91 66
59 70 67 83 71 57 82 65 84 69
1 THINK: Enter the data set into your calculator.
2 THINK: Retrieve the mean using the x̄ function. WRITE: x̄ = 74.05 marks
3 THINK: Retrieve the standard deviation using the σn function. WRITE: σn = 13.07 marks

When the statistical analysis is done using a sample of the population, a slightly different standard deviation function is used. Called the sample standard deviation, this value will be slightly higher than the population standard deviation. The sample standard deviation will be found on your calculator using the σn − 1 or the sn function.

Worked example 17
Ian surveys twenty Year 11 students and asks how much money they earn from part-time work each week. The results are given below.
\$65 \$82 \$47 \$78 \$108 \$94 \$60 \$79 \$88 \$91
\$50 \$73 \$68 \$95 \$83 \$76 \$79 \$72 \$69 \$97
Calculate the mean and standard deviation.
1 THINK: Enter the statistics into your calculator.
2 THINK: Retrieve the mean using the x̄ function. WRITE: x̄ = \$77.70
3 THINK: Retrieve the standard deviation using the σn − 1 function, as a sample has been used. WRITE: σn − 1 = \$15.56
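The text relies on the calculator's built-in functions and does not state the formulas behind them. As a hedged sketch of what the two functions compute: both average the squared distances of the scores from the mean and take the square root, but the population version divides by n while the sample version divides by n − 1. The function name `standard_deviation` and its `sample` flag are assumptions made for this illustration.

```python
import math

def standard_deviation(scores, sample=False):
    """Population standard deviation by default; sample version if sample=True."""
    n = len(scores)
    mean = sum(scores) / n
    # Sum of squared distances of every score from the mean.
    squared_distances = sum((x - mean) ** 2 for x in scores)
    divisor = n - 1 if sample else n     # n - 1 for a sample, n for a population
    return math.sqrt(squared_distances / divisor)

# Worked example 16: a whole class, so the population version is used.
exam_scores = [87, 69, 95, 73, 88, 47, 95, 63, 91, 66,
               59, 70, 67, 83, 71, 57, 82, 65, 84, 69]
print(round(standard_deviation(exam_scores), 2))                    # 13.07

# Worked example 17: a survey of twenty students, so the sample version is used.
weekly_earnings = [65, 82, 47, 78, 108, 94, 60, 79, 88, 91,
                   50, 73, 68, 95, 83, 76, 79, 72, 69, 97]
print(round(standard_deviation(weekly_earnings, sample=True), 2))   # 15.56
```

Dividing by the smaller number n − 1 is why the sample standard deviation always comes out slightly higher than the population value for the same data.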
40. 40. For most examples, you will need to read the question carefully to decide whether to use the population or the sample standard deviation.

Standard deviation using a graphics calculator

A graphics calculator can be used to determine a population or sample standard deviation. Using the TI–83, the population standard deviation is displayed under 1-Var Stats as σx and the sample standard deviation is shown as Sx.

Worked example 18
The price (in cents) per litre of petrol at a service station is recorded each Friday over a 15-week period and the data are given below.
76.2 80.1 79.8 84.3 80.7 78.3 82.4 81.3
80.5 78.2 79.5 80.1 81.3 84.2 83.4
Calculate the sample standard deviation for this set of data using a graphics calculator.

THINK
1 Enter the data as L1 in the graphics calculator.
2 Calculate the standard deviation.
(a) Press STAT.
(b) Highlight CALC.
(c) Select 1-Var Stats.
(d) Type L1 or the name of the list into which you have entered the data.
A list of statistics is produced with the mean at the top. The standard deviation is given by Sx = 2.257 959 467, so Sx = 2.26 (correct to 2 decimal places).
WRITE: s = 2.257 959 467, so s = 2.26 cents/L

As mentioned previously, the sample standard deviation is slightly higher than the population standard deviation (compare the values for Sx and σx in the above example).

Standard deviation from a frequency distribution table

The standard deviation can also be calculated when the data are presented in table form. This is done by entering the data in the same way as they were when calculating the mean earlier in this chapter. A graphics calculator can also be used.
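One way to handle a frequency distribution table without a calculator is to expand it back into single scores and then proceed as before. The sketch below does this with Python's standard `statistics` module, using the spelling-test table from worked example 19. Representing the table as a dictionary is an assumption made for illustration.

```python
import statistics

# Spelling-test table from worked example 19: score -> frequency.
spelling_table = {4: 1, 5: 2, 6: 4, 7: 9, 8: 6, 9: 7, 10: 1}

# Repeat each score as many times as its frequency says it occurred.
scores = [score
          for score, frequency in spelling_table.items()
          for _ in range(frequency)]

mean_score = statistics.mean(scores)
# The whole class is included, so the population standard deviation applies.
population_sd = statistics.pstdev(scores)

print(len(scores))               # 30
print(round(mean_score, 1))      # 7.4
print(round(population_sd, 1))   # 1.4
```

For a sample rather than a whole population, `statistics.stdev` would be used in place of `statistics.pstdev`.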
41. WORKED Example 19
The table below shows the scores of a class of thirty Year 3 students on a spelling test. Calculate the mean and standard deviation.

| Score | Frequency |
|-------|-----------|
| 4 | 1 |
| 5 | 2 |
| 6 | 4 |
| 7 | 9 |
| 8 | 6 |
| 9 | 7 |
| 10 | 1 |

1. THINK: Enter the data into your calculator using score × frequency.
2. THINK: Retrieve the mean by using the x̄ function. WRITE: x̄ = 7.4
3. THINK: Retrieve the standard deviation using the σn function, as the whole population is included in the statistics. WRITE: σn = 1.4

Graphics Calculator tip! Standard deviation
If you are using a graphics calculator to determine the mean and standard deviation in example 19, enter the scores as L1 and their corresponding frequencies as L2. Remember to set up 1-Var Stats as Xlist: L1 and Freq: L2. The mean will then be displayed as x̄ and the population standard deviation as σx.

Once we have calculated the standard deviation we can draw conclusions about the reliability and consistency of the data set. The lower the standard deviation, the less spread out the data set is. By using the standard deviation we can determine whether a set of scores is more or less consistent (or reliable) than another set. The standard deviation is the best measure of this because, unlike the range or the interquartile range, it considers the distance of every score from the mean. A higher standard deviation means that scores are less clustered around the mean and less dependable.

For example, consider the following two students’ results over a number of assessment pieces:
Student A: x̄ = 60, σn = 5
Student B: x̄ = 60, σn = 15
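The score × frequency method used in worked example 19 can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculator's internal procedure: the dictionary `frequency_table` and the other names are chosen here, and the population formula (dividing by the total frequency) is used because the whole class is included.

```python
import math

# Spelling test results from worked example 19: score -> frequency.
frequency_table = {4: 1, 5: 2, 6: 4, 7: 9, 8: 6, 9: 7, 10: 1}

# Total number of students is the sum of the frequencies.
n = sum(frequency_table.values())

# Mean: add up score x frequency, then divide by the number of students.
mean = sum(score * freq for score, freq in frequency_table.items()) / n

# Population variance: each squared distance from the mean is
# counted once for every student who achieved that score.
variance = sum(freq * (score - mean) ** 2
               for score, freq in frequency_table.items()) / n
sd = math.sqrt(variance)

print(f"n = {n}, mean = {mean:.1f}, population sd = {sd:.1f}")
```

Correct to 1 decimal place, the mean and standard deviation match the values written in the example.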
42. Both students have the same mean. However, student A has a standard deviation of 5 and student B has a standard deviation of 15. Student A is far more consistent and can confidently be expected to score around 60 in any future exam. Student B is less consistent but is probably capable of scoring a mark higher than student A’s.

WORKED Example 20
Two brands of light globe are tested to see how long they will burn (in hours).
Brand X: 850 950 1400 875 1200 1150 1000 900 850 825
Brand Y: 975 1100 1050 1000 975 950 1075 1025 950 900
Which of the two brands of light globe is more reliable?

1. THINK: Enter both sets of data into your calculator.
2. THINK: Choose the sample standard deviation because a sample of each light globe brand has been chosen.
3. THINK: Write down the sample standard deviation for each brand. WRITE: Brand X: sample standard deviation = 190.4 h; Brand Y: sample standard deviation = 62.4 h
4. THINK: The brand with the lower standard deviation is the more reliable. WRITE: Brand Y is the more reliable as it has a lower standard deviation.

remember
1. The standard deviation is a measure of the spread of a data set.
2. Standard deviation is found on your calculator by entering the data set using the calculator’s statistical mode.
3. The population standard deviation is used when an entire population is considered in the statistical analysis and can be found on the calculator using the σn function (or σx on the graphics calculator).
4. The sample standard deviation is used when a sample of the population is used in the analysis and can be found using the σn − 1 function (Sx on the graphics calculator).
5. A set of data is considered to be more consistent or reliable if it has a low standard deviation.
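The comparison in worked example 20 can be reproduced with a short Python sketch, assuming the standard library `statistics` module; the names `brand_x`, `brand_y` and `more_reliable` are illustrative only. Both brands have a mean of 1000 hours, so the decision rests entirely on the spread.

```python
import statistics

# Burn times in hours from worked example 20.
brand_x = [850, 950, 1400, 875, 1200, 1150, 1000, 900, 850, 825]
brand_y = [975, 1100, 1050, 1000, 975, 950, 1075, 1025, 950, 900]

# A sample of each brand was tested, so use the sample standard deviation.
sd_x = statistics.stdev(brand_x)
sd_y = statistics.stdev(brand_y)

# The brand with the lower standard deviation is the more consistent one.
more_reliable = "Brand X" if sd_x < sd_y else "Brand Y"

print(f"Brand X: mean = {statistics.mean(brand_x)} h, sd = {sd_x:.1f} h")
print(f"Brand Y: mean = {statistics.mean(brand_y)} h, sd = {sd_y:.1f} h")
print(f"More reliable: {more_reliable}")
```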
43. 43. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 423 Standard deviation 1 Copy and complete the following: Standard deviation is a measure of the __________ of the scores. It represents how much a typical score differs from the __________ of the data set. The statistical func- tion used on the calculator for a population standard deviation is __________, while for a sample of the population, the function used is __________. A low value for the standard deviation indicates that the set of scores is more __________ while a high value indicates __________ consistency or reliability. 2 For each of the sets of scores below, calculate the standard deviation. Assume that the scores represent an entire population and answer correct to 2 decimal places. a 3, 5, 8, 2, 7, 1, 6, 5 b 11, 8, 7, 12, 10, 11, 14 c 25, 15, 78, 35, 56, 41, 17, 24 d 5.2, 4.7, 5.1, 12.6, 4.8 e 114, 12, 3.6, 42.8, 0.5 3 For each of the sets of scores below, calculate the sample standard deviation, correct to 2 decimal places. a 25, 36, 75, 85, 6, 49, 77, 80, 37, 66 b 4.8, 9.3, 7.1, 9.9, 7.0, 4.1, 6.2 c 112, 25, 56, 81, 0, 5, 178, 99, 41 d 0.3, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 0.8, 0.8, 0.9, 1.0 e 56, 1, 258, 45, 23, 58, 48, 35, 246 4 For each of the following, state whether it is appropriate to use the population standard deviation or the sample standard deviation. a A quality control ofﬁcer tests the life of 50 batteries from a batch of 1000. b The weight of every bag of potatoes is checked and recorded before being sold. c The number of people who attend every football match over a season is analysed. d A survey of 100 homes records the number of cars in each household. e The score of every Year 11 student in mathematics is recorded. 5 The band ‘Aquatron’ is to release a new CD. The recording company needs to predict the number of copies that will be sold at various music stores throughout Australia. 
To do so, a sample of 10 music stores supplied information about the sales of the previous CD released by Aquatron, as shown below. 580 695 547 236 458 620 872 364 587 1207 a Calculate the mean number of sales at each store. b Should the population or sample standard deviation be used in this case? c What is the value of the appropriate standard deviation? 10G WORKED Example 16 GC pro gram UV statistics WORKED Example 17 WORKED Example 18 MQ Maths A Yr 11 - 10 Page 423 Wednesday, July 4, 2001 6:00 PM
44. 44. 424 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d 6 A supermarket chain is analysing its sales over a week. The chain has 15 stores and the sales for each store for the past week were (in \$million): 1.5 2.1 2.4 1.8 1.1 0.8 0.9 1.1 1.4 1.6 2.0 0.7 1.2 1.7 1.3 a Calculate the mean sales for the week. b Should the population or sample standard deviation be used in this case? c What is the value of the appropriate standard deviation? 7 Use the statistical function on your calculator to ﬁnd the mean and standard deviation (correct to 1 decimal place) for the information presented in the following tables. In each case, use the population standard deviation. 8 Copy and complete the class centre column for each of the following distributions and hence use your calculator to ﬁnd an estimate for the mean and standard deviation (correct to 2 decimal places). In each case use the population standard deviation. WORKED Example 18 a b c Score Frequency 3 12 4 24 5 47 6 21 7 7 Score Frequency 45 1 46 16 47 39 48 61 49 52 50 36 Score Frequency 75 22 76 17 77 8 78 10 79 12 80 21 81 29 WORKED Example 19 a c b Class Class centre Frequency 10–12 12 13–15 16 16–18 25 19–21 28 22–24 13 Class Class centre Frequency 31–40 15 41–50 28 51–60 36 61–70 19 71–80 8 81–90 7 91–100 2 Class Class centre Frequency 0–4 15 5–9 24 10–14 31 15–19 33 20–24 29 25–29 17 MQ Maths A Yr 11 - 10 Page 424 Wednesday, July 4, 2001 6:00 PM
45. 45. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 425 9 Below are the marks achieved by two students in ﬁve tests. Brianna: 75, 80, 70, 72, 78 Katie: 50, 95, 90, 80, 55 a Calculate the mean and standard deviation for each student. b Which of the two students is more consistent? Explain your answer. 10 From Year 11, 21 students are chosen to complete a test. The scores are shown in the table below. When preparing an analysis of the typical performance of Year 11 students on the test, the standard deviation used is: 11 The results below are Ian’s marks in four exams for each subject that he studies. English: 63 85 78 50 Maths: 69 71 32 97 Biology: 45 52 60 41 Geography: 65 78 59 61 In which subject does Ian achieve the most consistent results? 12 The following frequency distribution gives the prices paid by a car wrecking yard for a sample of 40 car wrecks. Find the mean and standard deviation of the price paid for these wrecks. Class Frequency 10 to <20 1 20 to <30 6 30 to <40 9 40 to <50 4 50 to <60 1 A 9.209 B 9.437 C 21 D 34.048 A English B Maths C Biology D Geography Price (\$) Frequency 0 to <500 2 500 to <1000 4 1000 to <1500 8 1500 to <2000 10 2000 to <2500 7 2500 to <3000 6 3000 to <3500 3 WORKED Example 20 mmultiple choiceultiple choice mmultiple choiceultiple choice MQ Maths A Yr 11 - 10 Page 425 Wednesday, July 4, 2001 6:00 PM
46. 46. 426 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d 13 The table below shows the life of a sample of 175 household light globes. a Find the range of the data. b Use the class centres to ﬁnd the mean and standard deviation in the lifetimes of this sample of light globes. 14 Crunch and Crinkle are two brands of potato crisps. Each is sold in packets nominally of the same size and for the same price. Upon investigation of a sample of packets of each, it is found that Crunch and Crinkle packets have the same mean mass (25 g). The standard deviation of the masses of Crunch packets is, however, 5 g and the standard deviation of the masses of Crinkle packets is 2 g. Which brand do you think represents better value for money under these circumstances? Why? Life (hours) Frequency 200 to <250 2 250 to <300 5 300 to <350 12 350 to <400 25 400 to <450 42 450 to <500 38 500 to <550 26 550 to <600 15 600 to <650 7 650 to <700 3 Work SHEET 10.2 MQ Maths A Yr 11 - 10 Page 426 Wednesday, July 4, 2001 6:00 PM
47. 47. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 427 For the set of scores 23, 45, 24, 19, 22, 16, 16, 27, 20, 21, ﬁnd: 1 the mean 2 the median 3 the mode 4 the range 5 the lower quartile 6 the upper quartile 7 the interquartile range 8 the population standard deviation. 9 Which measure of central tendency is the best measure of location in this data set? 10 Explain why the interquartile range is a better measure of spread than the range. Displaying statistical data and statistical graphs 1 As a class, collect information on: a the number of people that live in each student’s household b the number of pets in each student’s household. 2 Use your graphics calculator to enter the data as two separate lists, L1 and L2. 3 Use the statistics function on the calculator to ﬁnd the following information for each data set. a mean b median c minimum value d maximum value e lower quartile f upper quartile 4 Use the statistical plotting function on your calculator to draw a boxplot of the data you have entered. inv estigat ioninv estigat ion 2 MQ Maths A Yr 11 - 10 Page 427 Thursday, July 5, 2001 9:14 AM
48. 48. 428 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d Archie’s new skull site Having considered some of the statistics on skull measurements from 4000 BC and AD 150, we are now in a position to collate all the ﬁgures from these two time periods and compare them with the statistics from Archie’s new discovery. We must approach this task in a methodical manner. Step 1 Organise the statistics we already know. Copy the table below and ﬁll in any data already calculated for 4000 BC and AD 150. Step 2 Calculate and ﬁll in any statistics missing for 4000 BC and AD 150 (that is, standard deviation and range values). Step 3 Consider the data Archie has collated from measurements on the skulls from his new site. GCpr ogram UV statistics investigat ioninv estigat ion 4000 BC AD 150 New site Mean Breadth Height Length Median Breadth Height Length Mode Breadth Height Length Standard deviation Breadth Height Length Range Breadth Height Length MQ Maths A Yr 11 - 10 Page 428 Wednesday, July 4, 2001 6:00 PM
49. 49. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 429 Calculate the mean, median, mode, standard deviation and range for the data from the new site. Add these to the table. Step 4 Compare the values obtained for Archie’s new site with those for 4000 BC and AD 150. Step 5 Now comes the decision phase. To which era does the new site appear closer? Is there consistency here? Step 6 The ﬁnal phase requires a report of the ﬁndings. In doing so, you must remember that any conclusions drawn must be backed by providing substantial evidence. Write a paragraph advising Archie of the results of this study. Describe the changes you have observed in the shape of the skulls from 4000 BC to AD 150 and indicate which time period you feel his new data most closely matches. Back your recommendation with ample statistical evidence. These types of project are undertaken every day in a variety of situations. It is vital that we realise the importance of reporting statistical information accurately and in an unbiased manner. Math cad Summary statistics Breadth Height Length Breadth Height Length 124 138 101 131 128 98 133 134 97 138 129 107 138 134 98 123 131 101 148 129 104 130 129 105 126 124 95 134 130 93 135 136 98 137 136 106 132 145 100 126 131 100 133 130 102 135 136 97 131 134 96 129 126 91 133 125 94 134 139 101 133 136 103 131 134 90 131 139 98 132 130 104 131 136 99 130 132 93 138 134 98 135 132 98 130 136 104 130 128 101 MQ Maths A Yr 11 - 10 Page 429 Wednesday, July 4, 2001 6:00 PM
50. Comparing sets of data

Back-to-back stem-and-leaf plots

Some of the most useful and interesting statistical investigations involve the comparison of two sets of data. In the previous chapter, we drew stem-and-leaf plots of single sets of data. We shall now consider using such plots to compare two data sets.

Back-to-back stem-and-leaf plots are useful to compare the distribution of two similar sets of data. This is particularly useful in the situation of controlled experiments. The two sets of data use the same central stem. One set of leaves is set to the right of the stem and the other to the left. Care must be taken when arranging the data of the left set. Place the smallest numeral closest to the central stem, then work outwards as the values increase. The key generally relates to data which are presented on the right of the plot.

An example of such a plot is presented below. The data show the lifetimes of a sample of 40 batteries of each of two brands when fitted into a standard children’s toy. Some of the toys are fitted with an ordinary brand battery and some with Brand X. Which brand is better?

Key: 6 | 9 = 69 hours

| Ordinary brand (leaf) | Stem | Brand X (leaf) |
|----------------------:|:----:|:---------------|
| 8 6 2 0 0 | 6 | 9 |
| 9 9 9 8 8 6 4 0 | 7 | 3 5 |
| 8 8 7 5 3 1 1 1 0 | 8 | 2 4 8 |
| 9 6 6 4 2 2 2 0 0 | 9 | 0 1 4 5 5 9 |
| 8 7 5 3 1 1 1 | 10 | 0 0 2 5 8 8 9 9 |
| 4 2 | 11 | 0 0 1 1 3 3 6 7 9 |
| | 12 | 1 4 6 6 6 7 8 8 |
| | 13 | 3 5 |
| | 14 | 6 |

The spread of each set of data can be seen graphically from the stem-and-leaf plot. In this case it can be seen that, although Brand X showed a little more variability than the ordinary brand, it generally gave a longer lasting performance.

Side-by-side or parallel boxplots

Two or more sets of data may be compared by using side-by-side boxplots. The boxplots share a common scale. Numerical comparisons can be made between the sets of data based upon the size and position of the range, interquartile range and median. This is a strong feature of a boxplot.

In general, a histogram or stem-and-leaf plot is better than a boxplot at giving the reader information about the distribution of a set of scores (because boxplots do not show individual scores), but boxplots have greater scope for making quantitative comparisons. In the case of the battery test data above, the following side-by-side boxplot would result. (Quartiles and medians are found in the usual way.)
51. [Figure: side-by-side boxplots of Brand X and the ordinary brand on a common scale of 50 to 150 hours]

We can make the following comparisons between the sets of data:
1. Brand X showed more variability in its performance (that is, its lifetime) than the ordinary brand. (Brand X range = 77, ordinary brand range = 54; Brand X interquartile range = 27.5, ordinary brand interquartile range = 19.)
2. The longest lifetime recorded was that of a Brand X battery (146).
3. The shortest lifetime recorded was that of an ordinary brand battery (60).
4. Brand X batteries’ median lifetime was better than that of the ordinary brand (Brand X median = 109.5, ordinary brand median = 87.5).
5. Over one-quarter of Brand X batteries were better performers than the best ordinary brand battery; that is, had longer lifetimes than the longest of the ordinary brand batteries’ lifetimes. (Remember that the four sections of a boxplot each represent one-quarter of the scores.)

WORKED Example 21
The stem-and-leaf plot below shows the weights of two samples of chickens 3 months after hatching. One group of chickens (Group A) had been given a special growth hormone. The other group (Group B) was kept under identical conditions but was not given the hormone. Prepare side-by-side boxplots of the data and draw conclusions about the effectiveness of the growth hormone.

Key: 0* | 8 = 0.8 kg, 1 | 3 = 1.3 kg

| Group B (leaf) | Stem | Group A (leaf) |
|---------------:|:----:|:---------------|
| 4 4 4 | 0 | |
| 9 8 8 7 7 5 5 | 0* | 8 |
| 4 4 3 0 0 0 | 1 | 3 |
| | 1* | 5 7 7 9 |
| | 2 | 0 0 0 1 1 3 3 |
| | 2* | 5 8 8 |

1. THINK: First locate the medians of each group. There are 16 observations in each group. WRITE: The median of each group is the $\frac{16 + 1}{2}$th score, that is, the 8.5th score, or between the 8th and 9th scores.
2. THINK: The median divides each group into two halves; the quartiles are the medians of the upper and lower halves. There are 8 scores in each half. WRITE: The position of the quartiles is given by the $\frac{8 + 1}{2}$th score, that is, the 4.5th score, halfway between the 4th and 5th scores in each half.
52. WORKED Example 21 (continued)

3. THINK: Find the quartiles and medians on the stem-and-leaf plot. Be careful to count from the centre out with each set of data. WRITE: the plot below, with Q1, the median and Q3 marked for each group.

Key: 0* | 8 = 0.8 kg, 1 | 3 = 1.3 kg

| Group B (leaf) | Stem | Group A (leaf) |
|---------------:|:----:|:---------------|
| 4 4 4 | 0 | |
| 9 8 8 7 7 5 5 | 0* | 8 |
| 4 4 3 0 0 0 | 1 | 3 |
| | 1* | 5 7 7 9 |
| | 2 | 0 0 0 1 1 3 3 |
| | 2* | 5 8 8 |

4. THINK: Write a five-number summary for each group. WRITE: Group A: 0.8, 1.7, 2.0, 2.3, 2.8; Group B: 0.4, 0.5, 0.8, 1.0, 1.4
5. THINK: Draw the boxplots using a common scale. WRITE: [Figure: boxplots of Group A and Group B on a common scale of 0 to 3.0 kg]
6. THINK: Compare the data. Consider central score, highest and lowest scores, variability in scores, etc. WRITE:
   • The biggest of all chickens was from Group A (hormone group).
   • The smallest of all chickens was from Group B (no hormone).
   • The Group A data showed a little more variability (Group A interquartile range = 0.6, Group B interquartile range = 0.5).
   • The median weight of chickens in Group A was larger (Group A median = 2.0 kg, Group B median = 0.8 kg).
   • Over three-quarters of the Group A chickens were bigger than all of the Group B chickens!
   Conclusion: The growth hormone proved to be effective.

The graphics calculator can also be used to display parallel boxplots.
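The method used in worked example 21 (take the median, then take the quartiles as the medians of the lower and upper halves) can be sketched as a small Python function. This is a sketch under assumptions: the function name `five_number_summary` is chosen here, the weights are read off the stem-and-leaf plot, and for an odd number of scores the middle score is left out of both halves, which is one common textbook convention rather than something the example itself states.

```python
from statistics import median

def five_number_summary(scores):
    """Return (min, Q1, median, Q3, max) for a list of at least two scores.

    Quartiles are taken as the medians of the lower and upper halves.
    """
    data = sorted(scores)
    half = len(data) // 2
    lower = data[:half]     # scores below the median position
    upper = data[-half:]    # scores above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Chicken weights in kg, read from the stem-and-leaf plot in worked example 21.
group_a = [0.8, 1.3, 1.5, 1.7, 1.7, 1.9, 2.0, 2.0,
           2.0, 2.1, 2.1, 2.3, 2.3, 2.5, 2.8, 2.8]
group_b = [0.4, 0.4, 0.4, 0.5, 0.5, 0.7, 0.7, 0.8,
           0.8, 0.9, 1.0, 1.0, 1.0, 1.3, 1.4, 1.4]

summary_a = five_number_summary(group_a)
summary_b = five_number_summary(group_b)
print("Group A:", summary_a)
print("Group B:", summary_b)

# Interquartile range = Q3 - Q1, rounded to avoid floating-point noise.
print("IQR A:", round(summary_a[3] - summary_a[1], 1))
print("IQR B:", round(summary_b[3] - summary_b[1], 1))
```

Because the lower quartile of Group A (1.7 kg) is above the heaviest Group B chicken (1.4 kg), the summaries also support the statement that over three-quarters of Group A were bigger than all of Group B.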
53. WORKED Example 22
The four Year 11 Maths A classes at Western Secondary College complete the same end-of-year maths test. The marks, expressed as percentages for each of the students in the four classes, are given below. Display the data using a parallel boxplot and use this to describe any similarities or differences in the distributions of the marks among the four classes.

| 11A | 11B | 11C | 11D | 11A | 11B | 11C | 11D |
|-----|-----|-----|-----|-----|-----|-----|-----|
| 40 | 60 | 50 | 40 | 63 | 78 | 70 | 69 |
| 43 | 62 | 51 | 42 | 63 | 82 | 72 | 73 |
| 45 | 63 | 53 | 43 | 63 | 85 | 73 | 74 |
| 47 | 64 | 55 | 45 | 68 | 87 | 74 | 75 |
| 50 | 70 | 57 | 50 | 70 | 89 | 76 | 80 |
| 52 | 73 | 60 | 53 | 75 | 90 | 80 | 81 |
| 53 | 74 | 63 | 55 | 80 | 92 | 82 | 82 |
| 54 | 76 | 65 | 59 | 85 | 95 | 82 | 83 |
| 57 | 77 | 67 | 60 | 89 | 97 | 85 | 84 |
| 60 | 77 | 69 | 61 | 90 | 97 | 89 | 90 |

1. THINK: Create the first boxplot (for class 11A) on a graphics calculator using 2nd [STAT PLOT] and appropriate WINDOW settings.
2. THINK: Using TRACE to show key values, sketch the first boxplot using pen and paper, leaving room for three additional plots.
3. THINK: Repeat step 1 for the other three classes. All four boxplots share the common scale. WRITE/DISPLAY: [Figure: parallel boxplots for 11A, 11B, 11C and 11D on a common scale of maths mark (%), 30 to 100]
54. 54. 434 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d It is important to remember that, in a boxplot, each section represents one-quarter of the scores. If one section of a boxplot is longer compared with another section of the same boxplot, it can be interpreted that the scores are more spread out in that section — not that the longer section contains a greater number of scores. THINK WRITE Describe the similarities and differences between the four distributions. Class 11B had the highest median mark and the range of the distribution was only 37. The lowest mark in 11B was 60. We notice that the median of 11A’s marks is approximately 60. So, 50% of students in 11A received less than 60. This means that half of 11A had scores that were less than the lowest score in 11B. The range of marks in 11A was about the same as that of 11D with the highest scores in each about equal, and the lowest scores in each about equal. However, the median mark in 11D was higher than the median mark in 11A so, despite a similar range, more students in 11D received a higher mark than in 11A. While 11D had a top score that was higher than that of 11C, the median score in 11C was higher than that of 11D and the bottom 25% of scores in 11D were less than the lowest score in 11C. In summary, 11B did best, followed by 11C then 11D and ﬁnally 11A. 4 remember 1. Back-to-back stem plots are useful for comparing distributions of two similar sets of data. When completing back-to-back stem plots: (a) use a common stem (b) distinguish between the two sets of data by labelling them clearly (c) the key generally relates to data on the right-hand side of the central stem (d) when organising the data to the left of the central stem, the smallest piece of data goes closest to the central stem, then outwards as the data increases. 2. 
Side-by-side or parallel boxplots: (a) share a common scale (b) allow us to make quantitative comparisons between sets of data, based upon the size and position of the range, quartiles, interquartile range and medians (c) allow us to compare more than two sets of data. remember MQ Maths A Yr 11 - 10 Page 434 Wednesday, July 4, 2001 6:00 PM
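The comparison of the four classes in worked example 22 can be cross-checked numerically. The sketch below assumes the marks are read down each class column of the data table, first the left-hand block and then the right-hand block; the dictionary `marks` and the other names are illustrative. It ranks the classes by median mark and reports each range.

```python
from statistics import median

# Marks (%) for each class from worked example 22.
marks = {
    "11A": [40, 43, 45, 47, 50, 52, 53, 54, 57, 60,
            63, 63, 63, 68, 70, 75, 80, 85, 89, 90],
    "11B": [60, 62, 63, 64, 70, 73, 74, 76, 77, 77,
            78, 82, 85, 87, 89, 90, 92, 95, 97, 97],
    "11C": [50, 51, 53, 55, 57, 60, 63, 65, 67, 69,
            70, 72, 73, 74, 76, 80, 82, 82, 85, 89],
    "11D": [40, 42, 43, 45, 50, 53, 55, 59, 60, 61,
            69, 73, 74, 75, 80, 81, 82, 83, 84, 90],
}

# For each class, store (median, range).
summary = {name: (median(scores), max(scores) - min(scores))
           for name, scores in marks.items()}

# Rank the classes from highest to lowest median mark.
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)

for name in ranking:
    med, spread = summary[name]
    print(f"{name}: median = {med}, range = {spread}")
```

The ranking by median agrees with the summary in the example (11B, then 11C, then 11D, then 11A), and 11B's range of 37 matches the value quoted there.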
55. Comparing sets of data

1 The boxplots at right show the results of testing two brands of transistor by applying increasing voltage.
a Which brand had the transistor which could withstand the highest voltage?
b Which brand had the transistor which could withstand the least voltage?
c Which brand gave the better median performance?
d Which brand showed most variability in terms of range?
e Which brand showed most variability in terms of interquartile range?
f Which brand would you recommend to a manufacturer of electronic equipment if they requested a transistor that reliably worked at its expected voltage?
g Which brand would you recommend to a manufacturer of electronic equipment if they requested a transistor that was likely to withstand higher voltages?

2 The boxplots at right were drawn by a teacher who was trying to assess the effects of allowing students the privilege of an ‘open book’ exam. Plot A gives information about the results of students who were allowed to use their textbook in an end-of-unit test. Plot B gives information about the results of students who did the test without the aid of the textbook.
a Which group had the student who gained the best test result?
b Which group did best on median result?
c Compare the variability in the performance of each group.
d Does the use of the textbook lead to a better test performance? Explain.
e What other things need to be taken into account when drawing these conclusions?

3 Draw side-by-side boxplots for the following pair of five-number summaries.
Group X: 14, 18.5, 21.5, 27.5, 33
Group Y: 11, 17.5, 21, 26.5, 35

4 The following stem-and-leaf plots give the age at marriage of a group of 10 women and a group of 10 men.
Key: 1 | 8 = 18 years old
a Draw side-by-side boxplots of the data.
b Make comparisons about the distribution of the sets of data.
Men Women Leaf 8 7 9 8 7 5 1 6 3 0 Stem 1 2 3 4 Leaf 8 8 0 2 3 4 4 5 0 1 10H 5 10 15 20 25 30 Volts Brand A Brand B Math cad Interpreting boxplots 10 20 30 40 50 60 70 80 90 100 Test result Text No Text Plot A Plot B WORKED Example 21 MQ Maths A Yr 11 - 10 Page 435 Wednesday, July 4, 2001 6:00 PM
56. 56. 436 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d 5 The number of words in each of the ﬁrst 12 sentences is counted in each of 3 different types of book: a children’s book, a Year 12 geography text, and a major daily newspaper. The results are as follows: a Draw side-by-side boxplots of the data. b Make comparisons about the sentence length of each type of publication. Use stat- istics in your answer. 6 The stem-and-leaf plot below gives the batting scores of two cricket players — Smith and Jones — who share the responsibility of ‘opening the batting’ for their side. Key: 1 2 = 12 a Derive a ﬁve-number summary for each player. b Draw side-by-side boxplots of the data. c Make comparisons between the two sets of data. Use statistics in your answer. d Which player do you consider to be the best ‘opening bat’ and why? Questions 7 and 8 refer to the following stem-and-leaf plot. Key: 12 2 = 122 7 The lower quartile of Group B is: 8 Which of the following statements is untrue? A Data from Group A show less consistency than the data from Group B. B Data from Group B have a lower interquartile range. C Group B has a greater median. D Group A shows a greater amount of variability. E None of the above. (All of the statements are true.) Children’s book Geography text Newspaper 6 16 12 8 18 6 12 25 8 15 13 14 6 10 18 8 25 7 10 29 12 8 18 10 5 7 21 11 22 17 10 28 16 8 22 8 Jones Smith Leaf 3 8 7 4 2 9 9 8 7 7 5 8 4 4 2 0 5 2 0 1 Stem 0 1 2 3 4 5 6 7 8 Leaf 0 0 1 2 6 9 6 6 8 7 8 8 9 9 0 4 6 2 4 5 Group B Group A Leaf 6 8 5 4 2 2 8 5 5 3 0 0 7 4 4 1 0 1 1 Stem 12 13 14 15 16 17 18 Leaf 2 3 8 0 4 4 6 2 3 5 7 8 2 4 4 5 2 6 1 A 156.5 B 144 C 155 D 152 E none of the above. WORKED Example 22 mmultiple choiceultiple choice mmultiple choiceultiple choice MQ Maths A Yr 11 - 10 Page 436 Wednesday, July 4, 2001 6:00 PM
57. 57. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 437 Questions 9 and 10 refer to the boxplots at right. 9 Which of the following statements is a correct comparison of the data? A Group X has a higher median and shows more variability than Group Y. B Group X has a lower median and shows more variability than Group Y. C Group X has a higher median and shows less variability than Group Y. D Group X has a lower median and shows less variability than Group Y. E It is impossible to make comparisons like this without seeing the data displayed on a stem-and-leaf plot. 10 Which of the following statements is untrue of the boxplots? A One-quarter of all Group X data is greater than any of Group Y data. B The median of Group X is 25. C The interquartile range of Group X is 25. D The range of Group Y is 9. E None of the above. (All the statements are true.) 11 A packing machine is meant to pack sacks of ﬂour in 20.0 kg weights. A quality con- trol manager notices that the machine appears to be too generous in the amount that it is putting into each sack. After checking a sample of sacks for weight he adjusts the machine. After some time he selects a second sample of sacks and checks their weights. The results are detailed on the stem-and-leaf plot below. Key: 20 3 = 20.3 kg 20* 5 = 20.5 kg a Derive a ﬁve-number summary for each set of data. b Draw side-by-side boxplots of the data. c Compare the performance of the machine before and after it was adjusted. Use statistics in your answer. d Should the manager have adjusted the machine? Why (not)? 12 A new spray treatment has been developed to improve the budding of apple trees. Twenty-ﬁve trees are subjected to the spray treatment while 25 others are kept as a control. The number of apples that form on each tree is recorded below. 
Group A (sprayed) 35 52 71 21 34 42 76 45 48 32 29 85 73 28 34 59 52 56 27 29 33 38 54 42 51 After adjustment Before adjustment Leaf 4 3 9 8 8 7 7 6 5 5 4 3 1 1 0 0 9 8 6 4 3 0 6 6 2 Stem 19 19* 20 20* 21 21* 22 Leaf 3 4 4 5 5 7 8 8 8 9 0 0 1 1 2 3 3 3 4 5 5 6 7 7 8 15 20 25 30 35 40 Scale Group X Group Ymmultiple choiceultiple choice mmultiple choiceultiple choice MQ Maths A Yr 11 - 10 Page 437 Wednesday, July 4, 2001 6:00 PM
58. 58. 438 M a t h s Q u e s t M a t h s A Ye a r 1 1 f o r Q u e e n s l a n d Group B (unsprayed) 44 42 55 41 39 68 63 62 58 51 43 47 49 45 40 52 56 50 71 39 35 37 38 52 58 a Detail the data on a back-to-back stem-and-leaf plot. Use a class size of 10. b Prepare side-by-side boxplots of the data. c Make comparisons of the data. Use statistics in your answer. d Comment on the overall effect of the spray. Would you recommend that orchard- ists use the spray? 13 Twenty different ﬂashlight bulbs of each of two brands are tested until they burn out. The lifetime of each (in hours) is recorded below. Glow-worm 23 45 31 38 39 41 48 47 54 23 28 35 42 49 50 41 52 48 27 35 Starlet 28 16 24 36 47 18 59 32 64 68 72 35 46 72 54 31 29 36 55 43 a Detail the data on a back-to-back stem-and-leaf plot. Use a class size of 5. b Prepare side-by-side boxplots of the data. c Make comparisons of the data. Use statistics in your answer. d Which brand would you recommend as the better? Why? Drug test analysis A new drug for the relief of cold symptoms has been developed. To test the drug, 40 people were exposed to a cold virus. Twenty patients were then given a dose of the drug while another 20 patients were given a placebo. (In medical tests a control group is often given a ‘placebo’ drug. The subjects in this group believe that they have been given the real drug but in fact their dose contains no drug at all.) All participants were then asked to indicate the time when they ﬁrst felt relief of symptoms. The number of hours from the time the dose was administered to the time when the patients ﬁrst felt relief of symptoms are detailed below. 1 Detail the data on a back-to-back stem-and-leaf plot. 2 Prepare side-by-side boxplots of the data. 3 Make comparisons of the data. Use statistics in your answer. 4 Does the drug work? Justify your answer. 5 What other considerations should be taken into account when trying to draw conclusions from an experiment of this type? 
inv estigat ioninv estigat ion Group A (drug) 25 42 29 38 32 44 45 42 18 35 21 47 37 62 42 17 62 34 13 32 Group B (placebo) 25 34 17 32 35 25 42 18 35 22 28 28 20 21 32 24 38 32 35 36 MQ Maths A Yr 11 - 10 Page 438 Wednesday, July 4, 2001 6:00 PM
59. 59. C h a p t e r 1 0 D e s c r i b i n g , e x p l o r i n g a n d c o m p a r i n g d a t a 439 Using a spreadsheet or graphics calculator to obtain summary statistics Task 1 Using a spreadsheet Consider the data below for the average daily sales for the fast food outlets McDonald’s, KFC and Pizza Hut. 1 Set up the spreadsheet as indicated, entering the sales ﬁgures as numeric values, then formatting them to ‘currency’ with zero decimal places. 2 The Excel formula that calculates the mean is the average formula. Its format is =AVERAGE(range). In cell B12 enter the formula =AVERAGE(B4:B10) to calculate the mean daily sales for McDonald’s. 3 Copy this formula across to cells C12 and D12 to calculate the mean daily sales for KFC and Pizza Hut. 4 The formula for standard deviation is =STDEV(range). Enter the appropriate formula into cell B13, then copy it across to cells C13 and D13. 5 Consider the mean and standard deviation values for the three companies. Whose sales are the best? Which company experiences the most consistent sales throughout the week? inv estigat ioninv estigat ion MQ Maths A Yr 11 - 10 Page 439 Wednesday, July 4, 2001 6:00 PM
60. 60. 6 Select the range A4 to D10. Open the Sort facility of the Data option and sort the data in column B (McDonald’s data) into ascending order. From this sorted list, determine the five-number summary for McDonald’s.
7 Sort the data by KFC sales and determine the five-number summary for KFC.
8 Similarly, determine the five-number summary for Pizza Hut.
9 The spreadsheet does not offer the facility of graphing a boxplot. From the data collected above, draw parallel boxplots on the same scale, then compare the performances of the three companies.
10 Write a paragraph reporting your findings. Support your conclusions by specific reference to your spreadsheet and boxplots.
Task 2 Using a graphics calculator
1 Using the average daily sales figures for McDonald’s, KFC and Pizza Hut indicated in the spreadsheet in Task 1, enter these figures into your graphics calculator as three separate lists, L1, L2 and L3.
2 Use the statistical function on the calculator to determine the following for each set of data:
a mean
b standard deviation
c five-number summary.
3 Use the statistical plotting function on your calculator to draw parallel boxplots of the three sets of data.
4 Compare these results with those you obtained on the spreadsheet.
Investigation: Concluding the Egyptian skulls study
In a previous investigation, you made recommendations to Archie regarding the male Egyptian skulls he discovered at a new site. Your conclusions were based on numerical calculations of mean, median, mode, standard deviation and range. It is now appropriate to check whether the conclusions drawn at that stage are consistent with those that might be made on the basis of graphical comparisons. We will now compare five-number summaries of the new site data with those of the 4000 BC and AD 150 measurements. Values for the two known eras are shown in the following table.
The five-number values have also been included for the breadth of the skulls at the New site.
1 Consult the previous investigation, which tabulated the data from the New site. Copy and complete the table.