Maths A - Chapter 10

Syllabus reference
Strand:
Statistics and probability
Core topic:
Data collection and
presentation
In this chapter
10A Calculating and
interpreting the mean
10B Mean, from frequency
distribution tables
10C Mean, from grouped data
10D Median and mode
10E Best summary statistics
10F Range and interquartile
range
10G Standard deviation
10H Comparing sets of data
Describing,
exploring and
comparing data
MQ Maths A Yr 11 - 10 Page 381 Wednesday, July 4, 2001 5:58 PM

Maths Quest Maths A Year 11 for Queensland
Introduction
Archie is an archeologist. He is
passionate about his job, which
involves digging for buried artefacts,
classifying his findings and piecing
them together to unravel and record the
history of past civilizations. Imagine
his excitement when he uncovered a
site of buried skulls in Egypt!
Further investigation confirmed that
these were male skulls which had originated
from a race residing in Egypt.
He was keen to place their existence in
time. Delving into existing records, he
uncovered measurements on male
Egyptian skulls recorded for two time
periods – one around 4000 BC and the
other around AD 150.
These measurements confirmed a
change in skull shape over the time
period and this was taken as evidence
of interbreeding of the Egyptians with
migrant populations over the years. If
Archie compared the measurements on
record with those he made on his
recently excavated skulls, he could
possibly identify a time in history
when this race existed.
The measurements of male Egyptian
skulls on record for 4000 BC and AD
150 were:
1. breadth of skull
2. height of skull and
3. length of skull.
The recorded data for the measurements (in mm) of 30 male Egyptian skulls are
collated in the table on the following page.
Where should Archie start? Statistical techniques enable us to summarise sets of data,
which can then be compared. If Archie can summarise these two data sets, he could
then compare them with his own measurements.
In this chapter, we shall investigate the main methods available to describe data sets
such as these. These methods employ measures of central tendency, in particular the
mean, median and mode. We shall also examine the range and interquartile range, the
standard deviation, and stem plots and boxplots. We shall then see how these measures
can be used to compare sets of data.
In the previous chapter we investigated boxplots as a tool for comparing data sets.
We now explore this tool further, endeavouring to place Archie’s skull at some period
in history. Combining this with other statistical tools may enable us to provide a
solution for Archie.
[Figure: skull diagram labelling height, breadth and length]

4000 BC AD 150
Breadth Height Length Breadth Height Length
131 138 89 137 123 91
125 131 92 136 131 95
131 132 99 128 126 91
119 132 96 130 134 92
136 143 100 138 127 86
138 137 89 126 138 101
139 130 108 136 138 97
125 136 93 126 126 92
131 134 102 132 132 99
134 134 99 139 135 92
129 138 95 143 120 95
134 121 95 141 136 101
126 129 109 135 135 95
132 136 100 137 134 93
141 140 100 142 135 96
131 134 97 139 134 95
135 137 103 138 125 99
132 133 93 137 135 96
129 136 96 133 125 92
132 131 101 145 129 89
126 133 102 138 136 92
135 135 103 131 129 97
134 124 93 143 126 88
128 134 103 134 124 91
130 130 104 132 127 97
138 135 100 137 125 85
128 132 93 129 128 81
127 129 106 140 135 103
131 136 114 147 129 87
124 138 101 136 133 97

You may not be familiar with some of the following statistical terms. We shall investigate
them further in this chapter.
1 A set of test results is shown below.
8, 3, 6, 4, 5, 4, 9, 7, 4, 6, 5
a Arrange the scores in ascending order.
b How many scores are in the set?
c In what position does the middle score lie?
d What is the value of the middle score (the median)?
e What is the range of the data?
f Calculate the average (mean).
g How many scores are below the mean? How many above?
h Give the most frequently occurring score (mode) of the set of data.
i Comment on any difference in value between the mean, median and mode.
j Determine values for the lower and upper quartiles.
2 The mean, median and mode are measures of ‘central tendency’. Explain what this
term ‘central tendency’ means.
3 The spread of the scores can be determined using a number of statistical measures.
Name some measures of ‘spread’ with which you are familiar.
4 What is the relationship between the median and the quartiles?
5 In a boxplot, which of the following are true?
a The quartiles divide the data into four sections of equal length.
b The median is the score with an equal number of data values above it and below it.
c If the ‘whiskers’ are longer than the ‘box’, it means that there are more scores in the
whiskers than there are in the box.
d The whole ‘box’ contains the same number of scores as the two ‘whiskers’ together.
e It is possible to calculate the mean of the set of data by observing the values in the
boxplot.
6 For those statements in question 5 that are incorrect, explain why this is so. Adjust the
statements to make them correct.
Calculating the mean
If you were to survey a group of people about what they believe is meant by the word
‘average’, you would find a variety of answers.
When looking at a set of statistics we are often asked for the average. The average is a
figure that describes a typical score. In statistics, the correct term for the average is the
mean. The mean is the first of three measures of central tendency that we shall be
studying. The others are the median and the mode.
The statistical symbol for the mean is $\bar{x}$. The formula for the mean is
$$\bar{x} = \frac{\sum x}{n}$$

In mathematics, the symbol Σ (sigma) means sum or total, x represents each individual
score in a list and Σ x is therefore the sum of the scores. The sum is divided by n, which
represents the number of scores.
A graphics calculator can be used to calculate and display many statistical functions.
There are several brands of graphics calculator, but the Texas Instruments TI-83 will be
the model referred to in illustrations. Other brands of calculator allow calculations and
displays with similar instructions. Many of the exercises lend themselves to either
manual working or graphics calculator use.
WORKED Example 1
Find the mean of the scores 17, 16, 13, 15, 16, 20, 10, 15.
THINK 1: Find the total of all scores.
WRITE: Total = 17 + 16 + 13 + 15 + 16 + 20 + 10 + 15
       $\sum x = 122$
THINK 2: Divide the total by 8 (the number of scores).
WRITE: Mean $= \frac{\sum x}{n} = \frac{122}{8}$
       $\bar{x} = 15.25$
WORKED Example 2
Calculate the mean of the set of data below, using a graphics calculator.
10, 12, 15, 16, 18, 19, 22, 25, 27, 29
THINK 1: Enter the data in L1. (Press STAT, select 1:Edit... and press ENTER to access the screen.)
THINK 2: Calculate the mean.
(a) Press STAT.
(b) Highlight CALC in the top line.
(c) Highlight 1:1-Var Stats and press ENTER.
(d) Type L1 and press ENTER.
(e) A number of values are given. The top entry, $\bar{x} = 19.3$, gives us the mean.
WRITE/DISPLAY: $\bar{x} = 19.3$
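The two worked examples can also be checked with a few lines of Python. This is only a sketch: the function name mean_of_scores and the list names are our own choices for illustration, and the calculation is simply the total of the scores divided by the number of scores.

```python
def mean_of_scores(scores):
    """Return the mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


# Scores taken from worked examples 1 and 2.
example_1 = [17, 16, 13, 15, 16, 20, 10, 15]
example_2 = [10, 12, 15, 16, 18, 19, 22, 25, 27, 29]

for label, scores in (("Worked example 1", example_1),
                      ("Worked example 2", example_2)):
    total = sum(scores)  # the total of all scores (sigma x)
    n = len(scores)      # the number of scores
    print(label, "total =", total, "n =", n, "mean =", mean_of_scores(scores))
```

The printed means should agree with the values obtained by manual working and by the graphics calculator.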

Interpreting the mean
When we use the mean, we are attempting to represent the central value of the data.
Let us investigate what affects its value. Consider five scores: 1, 2, 3, 4 and 5. The
value of the mean is the total (15), divided by the number of scores (5). The answer, 3,
clearly lies in the centre. What would be the value of the mean if the last score had been
20 instead of 5? The answer of 6, where only one score lies above the mean and four lie
below it, clearly demonstrates the influence of extreme values on the mean. Since the
calculation takes into account the values of all scores, a check must be applied to determine
whether the resulting value is a reasonable representation of the centre of the data.
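The effect of an extreme value can be seen by counting the scores on either side of the mean. The sketch below uses the two sets of five scores discussed above; the function names are hypothetical and chosen only for this illustration.

```python
def mean_of_scores(scores):
    """Return the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def count_above_below(scores):
    """Return (number of scores above the mean, number of scores below the mean)."""
    m = mean_of_scores(scores)
    above = sum(1 for s in scores if s > m)
    below = sum(1 for s in scores if s < m)
    return above, below


original = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 20]  # last score changed from 5 to 20

for scores in (original, with_extreme):
    print(scores, "mean =", mean_of_scores(scores),
          "(above, below) =", count_above_below(scores))
```

When the counts above and below the mean are very unequal, as in the second set, the mean may not be a reasonable representation of the centre of the data.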
10A Calculating and interpreting the mean
Use a graphics calculator or manual working for the following.
1 Copy and complete the following:
Another word commonly used for ‘mean’ is __________. The mean is calculated by
finding the __________ of the scores, then dividing by the __________ of scores. The
mean is a measure of __________ tendency. Two other measures are __________ and
__________.
2 Calculate the mean of each of the following sets of scores.
a 4, 8, 3, 5, 5
b 16, 24, 30, 35, 23, 11, 45, 28
c 65, 92, 56, 84
d 9.2, 9.7, 8.8, 8.1, 5.6, 7.5, 8.5, 6.4, 7.0, 6.4
e 356, 457, 182, 316, 432, 611, 299, 355
3 Majid sits for five tests in mathematics. His percentages on the tests were 45%, 90%,
67%, 86% and 75%. Calculate Majid’s mean percentage on the five tests. How many of
his percentages were above the mean, and how many below?
remember
1. The mean is the statistical term for ‘average’.
2. The mean is calculated by adding all scores then dividing by the number of
scores. That is, $\bar{x} = \frac{\sum x}{n}$
3. As a measure of central tendency, the mean represents a value for the ‘centre’
of the scores.
4. Check to determine the number of scores above and below the mean.
5. The value of the mean is affected by extremes in scores.
6. Remember to include correct units in your final answer.

4 An oil company surveys the price of petrol in eight Brisbane suburbs. The results are
listed below.
Manly 76.9 c/L Kenmore 72.9 c/L
Bardon 73.4 c/L Nundah 70.9 c/L
Springwood 72.3 c/L Mansfield 75.8 c/L
Oxley 73.9 c/L Boondall 71.1 c/L
Based on these results, calculate the mean price of petrol in cents per litre in Brisbane.
Is this mean a realistic representation of the central value? Explain.
5 The seven players on a netball team have the following heights: 1.65 m, 1.81 m,
1.75 m, 1.78 m, 1.88 m, 1.92 m and 1.86 m. Calculate the mean height of the players
on this team, correct to 2 decimal places. How many of the players have heights above
the mean height?
6 A golf ball manufacturer randomly tests the mass of 10 golf balls from a batch. The
batch will be considered satisfactory if the average mass of the balls is between 44.8 g
and 45.2 g. The masses, in grams, of those tested are:
45.19, 45.06, 45.35, 44.78, 45.47, 44.68, 44.95, 45.32, 44.60, 44.95.
a Will the batch be passed as satisfactory?
b Which ball has a mass which is furthest from the mean — the lightest one or the
heaviest one?
7 Consider the five values, 1, 2, 3, 4 and 5. The mean is calculated as 3.
a What happens to the value of the mean if 10 is added to each score?
b What effect does multiplying each score by 10 have on the mean’s value?
Investigation: Means of skull measurements
Refer to the table of skull measurements for 4000 BC and AD 150 displayed earlier
in the chapter.
1 Using the breadth, height and length measurements (in mm) for 4000 BC,
calculate the mean for each set of data.
2 Draw up the table shown below and include the means calculated above. The
means for the corresponding measurements for AD 150 have been included for
comparison.
3 Note the difference between the means for the 4000 BC measurements and the
corresponding ones for AD 150. Do you notice a trend?
4 Examine the breadth data for 4000 BC. How many scores are above the mean?
How many scores are below the mean?
5 Examine the height and length data sets for 4000 BC and determine the number
of scores above and below the mean in each set.
6 In your opinion, does the mean appear to represent a value close to the centre of
each data set?
                 4000 BC                      AD 150
        Breadth  Height  Length      Breadth  Height  Length
Mean                                 136.2    130.3   93.5

Frequency distribution tables
In the last section, we dealt with easily manageable quantities of data. However, more
commonly we are confronted with the task of processing much larger data sets. Making
sense of large quantities of data is best achieved by using a frequency distribution
table. The headings for this table are Score (x), Tally (optional), Frequency (f) and a
fourth column, (fx), which contains the score (x) multiplied by the frequency (f). The
total of this fourth column indicates the total of all the scores. The mean is then calculated
by dividing this total of all scores by the sum of the frequency column (which
represents the total number of scores). Written as a formula, this is:
$$\bar{x} = \frac{\sum fx}{\sum f}$$
WORKED Example 3
Complete the frequency table below, then calculate the mean.
Score (x)   Tally              Frequency (f)   fx
4           |||
5           ||||/ ||
6           ||||/ ||||/ |
7           ||||/ ||||/ |||
8           ||||/ ||||/
9           ||||/ |
                               Σf =            Σfx =
THINK 1: Complete the frequency column from the tally column.
THINK 2: Complete the fx column by multiplying each score by the frequency.
THINK 3: Sum the frequency column.
THINK 4: Sum the fx column.
WRITE:
Score (x)   Tally              Frequency (f)   fx
4           |||                3               12
5           ||||/ ||           7               35
6           ||||/ ||||/ |      11              66
7           ||||/ ||||/ |||    13              91
8           ||||/ ||||/        10              80
9           ||||/ |            6               54
                               Σf = 50         Σfx = 338
THINK 5: Use the formula to calculate the mean.
WRITE: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{338}{50} = 6.76$
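The fx column and its total can be reproduced in Python as a check on worked example 3. This is a minimal sketch, assuming the table is stored as a dictionary that maps each score to its frequency; the names frequency_table and mean_from_frequency_table are hypothetical.

```python
def mean_from_frequency_table(table):
    """Return sum(fx) / sum(f) for a dictionary of score -> frequency."""
    total_f = sum(table.values())                       # sigma f
    total_fx = sum(x * f for x, f in table.items())     # sigma fx
    return total_fx / total_f


# Score (x) -> Frequency (f), from worked example 3.
frequency_table = {4: 3, 5: 7, 6: 11, 7: 13, 8: 10, 9: 6}

for x, f in frequency_table.items():
    print(x, f, x * f)  # one row of the table: score, frequency, fx

print("sum of f  =", sum(frequency_table.values()))
print("sum of fx =", sum(x * f for x, f in frequency_table.items()))
print("mean      =", mean_from_frequency_table(frequency_table))
```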

Graphics Calculator tip! Calculating the mean
To enlist the aid of a graphics calculator in determining the mean in worked example 3:
1. Enter the data.
(a) To clear any previous equations, press Y= and clear any functions.
(b) Press STAT, select 1:EDIT and press ENTER.
(c) Enter the scores in L1 and the frequencies in L2.
2. Set up the calculator to calculate the mean.
(a) Press STAT, select CALC, then the 1-Var Stats option. Type L1 and L2 separated by a comma.
(b) Press ENTER to display a number of statistical measures.
(c) Amongst other statistical data you can read off the number of scores, the sum
of the scores and the mean.
remember
1. The mean for a large number of scores is generally calculated from a frequency
distribution table. A graphics calculator can also be used.
2. The formula for the mean is $\bar{x} = \frac{\sum fx}{\sum f}$

10B Mean, from frequency distribution tables
1 Using our skull measurements for breadth for 4000 BC, draw up a frequency
distribution table as shown below. The tallies for each score have been included. Copy
and complete the frequency ( f ) column and the ( fx) column; total the last two
columns; then calculate the mean. Notice that its value is the same as that calculated
before, using the individual scores.
a
Breadth (x) Tally Frequency ( f ) fx
119 |
124 |
125 | |
126 | |
127 |
128 | |
129 |
130 |
131 | | | |
132 | | |
134 | | |
135 | |
136 |
138 | |
139 | |
141 |
Σf = (same value as n)   Σfx = (same value as total of scores)
b $\bar{x} = \frac{\sum fx}{\sum f} =$ ?

2 A class’s marks (out of 10) on a spelling test are recorded in the frequency table below.
a Copy and complete the table.
b Use the formula to calculate the class’s mean.
c How many scores are greater than the mean?
3 An electrical store records the number of television sets sold each week over a year.
The results are shown in the table below.
a Copy and complete the table.
b Calculate the mean number of television sets sold each week over the year. Give
your answer correct to one decimal place.
Score (x) Tally Frequency ( f ) fx
4 | |
5 | | | |
6 | | | |
7 | | | | | | | |
8 | | |
9 | | | |
10 | |
Σ f = Σ fx =
No. of television sets sold (x)   No. of weeks ( f )   fx
16 4
17 4
18 3
19 6
20 7
21 12
22 8
23 2
24 4
25 2
Σ f = Σ fx =
Mean $= \frac{\sum fx}{\sum f}$
4 In a soccer season a team played 50 matches. The number of goals scored in each
match is shown in the table below.
No. of goals     0  1  2   3   4  5
No. of matches   4  9  18  10  5  4
a Redraw this table in the form of a frequency distribution table.
b Use your table to calculate the mean number of goals scored each game.
c By calculating the number of scores below and above the mean, decide whether its
value is suitable as a measure of central tendency. Justify your decision.
5 A clothing store records the dress sizes sold during a day. The results are shown
below.
12 14 10 12 8 12 16 10 8 12
10 12 18 10 12 14 16 10 12 12
12 14 18 10 14 12 12 14 14 10
a Present this information in a frequency table.
b Calculate the mean dress size sold this day.
c Comment on your answer.
6 multiple choice
There are eight players in a Rugby forward pack. The mean mass of the players is
104 kg. The total mass of the forward pack is:
A 13 kg B 104 kg C 112 kg D 832 kg
7 multiple choice
A small business employs five people on a mean wage of $380 per week. A manager is
then employed and receives $500 per week. What is the mean wage of the six
employees?
A $380 B $400 C $480 D $2400
8 multiple choice
The mean height of five starting players in a basketball match is 1.82 m. During a time
out, a player who is 1.78 m tall is replaced by a player 1.88 m tall. What is the mean
height of the players after the replacement has been made?
A 1.78 m B 1.82 m C 1.84 m D 1.88 m
Grouping data and using grouped data
In some cases, the range of data values is so great that grouping the data into classes
makes the data more manageable. For example, consider the following data set of
people with ages ranging from 25 to 49. We might group the ages in intervals of 5 in
the form 25–29, 30–34 etc. This means that all the values (25, 26, 27, 28 and 29) would
be grouped in one class. The centre of this class would be 27, and this is the value used
as the score (x). This class centre is then multiplied by the frequency, (f). In this case,
the value obtained for the mean is an estimate rather than an exact value. Sometimes
the choice of the size of the class intervals also has an effect on the accuracy of the
mean.
WORKED Example 4
Complete the frequency distribution table and use it to estimate the mean of the
distribution.
Class    Class centre (x)   Tally              Frequency (f)   fx
25–29                       ||||
30–34                       ||||/ ||||
35–39                       ||||/ ||||/ |||
40–44                       ||||/ ||||/ ||
45–49                       ||||/ ||
                                               Σf =            Σfx =
THINK 1: Calculate the class centres.
THINK 2: Complete the frequency column from the tally column.
THINK 3: Multiply each class centre by the frequency to complete the fx column.
THINK 4: Sum the frequency column.
THINK 5: Sum the fx column.
WRITE:
Class    Class centre (x)   Tally              Frequency (f)   fx
25–29    27                 ||||               4               108
30–34    32                 ||||/ ||||         9               288
35–39    37                 ||||/ ||||/ |||    13              481
40–44    42                 ||||/ ||||/ ||     12              504
45–49    47                 ||||/ ||           7               329
                                               Σf = 45         Σfx = 1710
THINK 6: Use the formula to calculate the mean.
WRITE: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{1710}{45} = 38$
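The estimate in worked example 4 can be reproduced in Python by computing each class centre and then applying the same formula. This is a sketch only: the names class_centre, estimate_mean and ages are hypothetical, and each class is assumed to be stored as (lower limit, upper limit, frequency).

```python
def class_centre(lower, upper):
    """Return the centre of a class, half way between its two limits."""
    return (lower + upper) / 2


def estimate_mean(grouped):
    """Estimate the mean of grouped data using class centres as the scores."""
    total_f = sum(f for _, _, f in grouped)
    total_fx = sum(class_centre(lo, hi) * f for lo, hi, f in grouped)
    return total_fx / total_f


# (lower limit, upper limit, frequency) for each class in worked example 4.
ages = [(25, 29, 4), (30, 34, 9), (35, 39, 13), (40, 44, 12), (45, 49, 7)]

for lo, hi, f in ages:
    centre = class_centre(lo, hi)
    print(f"{lo}-{hi}", centre, f, centre * f)  # class, class centre, f, fx

print("estimated mean =", estimate_mean(ages))
```

Because every score in a class is replaced by the class centre, the result is an estimate of the mean rather than an exact value.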

Graphics Calculator tip! Calculating the mean from grouped data
A graphics calculator can be used to calculate the mean from a grouped data frequency
distribution. In such cases, the class centre can be entered as L1 and the frequency as L2.
Remember to set up the 1-Var Stats to recognise these two lists (Xlist: L1 and Freq: L2).
10C Mean, from grouped data
1 a Using our skull measurements for breadth for 4000 BC (shown previously as an
ungrouped frequency distribution), draw up the table below, using class intervals
119–121, 122–124 etc. Complete the columns and calculate the mean.
Class     Class centre (x)   Tally   Frequency (f)   fx
119–121   120
122–124
125–127
128–130
131–133
134–136
137–139
140–142
Σf = Σfx =
$\bar{x} = \frac{\sum fx}{\sum f} =$ __________?
b Does the mean differ from the two previous calculations? Explain any difference.
Compare with Σfx from exercise 10B, question 1.
remember
1. The mean is the statistical term for average.
2. The mean is calculated by adding all scores then dividing by the number of
scores.
3. When calculating the mean from a frequency distribution table, a column for
frequency × score (fx) is added. The mean is then calculated using the formula
$\bar{x} = \frac{\sum fx}{\sum f}$.
4. If the frequency distribution uses grouped data, the fx column is calculated
using class centres for the x-value.
5. The mean can also be calculated using a graphics calculator.

2 The table below shows a set of class marks on a test out of 100.
a Copy and complete the frequency distribution table.
b Use the table to calculate the mean class mark.
c In which class interval does the mean lie?
3 In the heats of the 100-m freestyle at a swimming meet, the times of the swimmers
were recorded in the table below.
b Use the table to calculate the mean time.
c How many swimmers swam faster than this mean time?
4 A cricketer played 50 innings in test cricket for
the following scores.
23 65 8 112 54 0 84 12 21 4
25 105 74 40 1 15 33 45 21 47
16 70 22 33 21 8 34 36 5 7
69 104 57 78 158 0 51 16 6 16
0 49 0 14 28 52 21 3 3 7
a Put the above information into a frequency
distribution table using appropriate groupings.
b Use the table to estimate the batting average
for this player.
c Repeat the exercise using a different size class interval. Compare your answers.
Class   Class centre (x)   Tally   Frequency ( f )   fx
31–40 |
41–50 | | |
51–60 | | | |
61–70 | | | | | |
71–80 | | | | | | | | |
81–90 | |
91–100 | |
Σ f = Σ fx =
Time Class centre (x) No. of swimmers ( f ) fx
50.01–51.00 4
51.01–52.00 12
52.01–53.00 23
53.01–54.00 38
54.01–55.00 15
55.01–56.00 3
Σ f = Σ fx =

5 Use the statistics function on your calculator to find the mean of each of the following
scores, correct to 1 decimal place.
a 11, 15, 13, 12, 21, 19, 8, 14
b 2.8, 2.3, 3.6, 2.9, 4.5, 4.2
c 41, 41, 41, 42, 43, 45, 45, 45, 45, 46, 49, 50
6 Use your calculator to find the mean from each of the following tables.
a Score Frequency
3 7
4 10
5 18
6 19
7 38
8 27
9 10
10 5
b Score Frequency
28 5
29 18
30 25
31 25
32 14
33 10
34 3
7 The table below shows the heights of a group of people.
Height Class centre Frequency
150–154 152 7
155–159 157 14
160–164 162 13
165–169 167 23
170–174 172 24
175–179 177 12
Calculate the mean of this distribution.
8 Seventy students were timed on a 100-m sprint during their P.E. class. The results are
shown in the table below.
Time (s) 12 to <13 13 to <14 14 to <15 15 to <16 16 to <17
Number 13 17 25 15 10
a Calculate the class centre for each group in the distribution.
b Use your calculator to find the mean of the distribution.

9 A drink machine is installed near a quiet beach. The number of cans sold over the first
10 weeks after its installation is shown below.
4 39 31 31 50 43 70 45 57 71 18 26 3 52
51 59 33 51 27 62 30 90 3 30 97 59 33 44
99 62 72 6 42 83 19 49 11 6 63 4 53 20
45 58 1 9 79 41 2 33 97 71 52 97 69 83
39 84 92 43 71 98 8 97 18 89 21 9 4 17
a Put this information into a frequency distribution table using the classes 1–10,
11–20, 21–30 etc.
b Calculate the mean number of cans sold per day over these 10 weeks.
c Using the raw data above, calculate the number of days on which the sales were
greater than the mean.
Median and mode
So far we have used the mean as a measure of the typical score in a data set. Consider
the case of someone who is analysing the typical house price in an area. On a particular
day, five houses are sold in the area for the following prices:
$175 000 $149 000 $160 000 $211 000 $850 000
For these five houses the mean price is $309 000. The mean is much greater than most
of the prices in the data set. This is because there is one score which is much greater
than all the others. For such data sets, we need to use a different measure of central
tendency.
Median
The median is the middle score in a data set (of n scores), when all scores are arranged
in order. If the data set consists of an odd number of scores, there is one score which
lies exactly in the middle. For a data set consisting of an even number of scores, the
median will always occur half way between two scores.
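To see how one extreme price pulls the mean away from the typical value while the middle score is unaffected, the five house prices can be checked in Python. This sketch uses the mean and median functions from Python's standard statistics module; the list name prices is our own.

```python
from statistics import mean, median

# Sale prices (in dollars) of the five houses.
prices = [175_000, 149_000, 160_000, 211_000, 850_000]

m = mean(prices)
below = sum(1 for p in prices if p < m)  # how many prices lie below the mean

print("mean price   =", m)
print("median price =", median(prices))
print("prices below the mean:", below, "out of", len(prices))
```

Four of the five prices lie below the mean, whereas the median has two prices on either side of it.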

Using single scores
The position of the median can be found using the formula:
Median position = $\frac{n + 1}{2}$th score
The median becomes more complicated when there is an even number of scores
because there are two scores in the middle. When there is an even number of scores, the
median is the average of the two middle scores.
Median from a frequency distribution table of ungrouped data
The median can be calculated from a frequency distribution table if we extend the table
by adding a cumulative frequency column. This column ‘cumulates’ or totals the frequencies
as we descend the rows. It is then possible to determine which scores are in
each position. Consider the frequency distribution table following.
WORKED Example 5
For the scores 3, 4, 8, 2, 2, 6, 9, 1, 6 calculate the median.
THINK 1: Rewrite the scores in ascending order. There are 9 scores here.
WRITE: 1, 2, 2, 3, 4, 6, 6, 8, 9
THINK 2: The median is the middle score, that is, the $\frac{n + 1}{2}$th score.
WRITE: Median = $\frac{9 + 1}{2}$th score
       Median = 5th score
       Median = 4
WORKED Example 6
Find the median of the scores 13, 13, 16, 12, 19, 18, 20, 18.
THINK 1: Write the scores in ascending order.
WRITE: 12, 13, 13, 16, 18, 18, 19, 20
THINK 2: There is an even number (8) of scores, so average the two middle scores.
WRITE: Median = $\frac{n + 1}{2}$th score
       Median = $\frac{8 + 1}{2}$th score
              = 4.5th score
       that is, half way between the 4th and 5th scores. The 4th score is 16. The 5th score is 18.
       Median = $\frac{16 + 18}{2}$
       Median = 17
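The steps in worked examples 5 and 6 can be sketched in Python. The function name median_of_scores is hypothetical; it sorts the scores, finds the median position (n + 1)/2 and, for an even number of scores, averages the two middle scores.

```python
def median_of_scores(scores):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(scores)          # arrange the scores in ascending order
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the median
    if n % 2 == 1:
        # Odd number of scores: one score lies exactly in the middle.
        return ordered[int(position) - 1]
    # Even number of scores: average the scores either side of the position.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


example_5 = [3, 4, 8, 2, 2, 6, 9, 1, 6]
example_6 = [13, 13, 16, 12, 19, 18, 20, 18]

print(sorted(example_5), "median =", median_of_scores(example_5))
print(sorted(example_6), "median =", median_of_scores(example_6))
```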

Score   Frequency   Cumulative frequency
4       1           1     The 1st score is 4.
5       6           7     The 2nd–7th scores are 5.
6       9           16    The 8th–16th scores are 6.
9       2           30    The 29th and 30th scores are 9.
There are 30 scores in this distribution and so the middle two scores will be the 15th
and 16th scores. By looking down the cumulative frequency column we can see that
these scores are both 6. Therefore, 6 is the median of this distribution.
WORKED Example 7
Find the median for the frequency distribution below.
Score   Frequency
34      3
35      8
36      12
37      9
38      8
39      5
THINK 1: Redraw the frequency table with a cumulative frequency column.
WRITE:
Score   Frequency   Cumulative frequency
34      3           3
35      8           (3 + 8) 11
36      12          (11 + 12) 23
37      9           (23 + 9) 32
38      8           (32 + 8) 40
39      5           (40 + 5) 45
THINK 2: There are 45 scores and so the middle score is the 23rd score.
WRITE: Median = $\frac{n + 1}{2}$th score
       Median = $\frac{45 + 1}{2}$th score
       Median = 23rd score
THINK 3: Look down the cumulative frequency column to see that the 23rd score is 36.
WRITE: Median = 36
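The cumulative frequency method of worked example 7 can also be sketched in Python. The names below are hypothetical, and the table is assumed to be a list of (score, frequency) pairs in ascending order of score. The running totals are built with accumulate from the standard itertools module.

```python
from itertools import accumulate


def median_from_frequency_table(rows):
    """Return the median of a list of (score, frequency) rows in ascending order."""
    scores = [score for score, _ in rows]
    cumulative = list(accumulate(f for _, f in rows))  # cumulative frequency column
    n = cumulative[-1]                                  # total number of scores

    def score_at(position):
        # The first row whose cumulative frequency reaches the position
        # holds the score in that position.
        for score, cum in zip(scores, cumulative):
            if position <= cum:
                return score

    if n % 2 == 1:
        return score_at((n + 1) // 2)
    # Even number of scores: average the two middle scores.
    return (score_at(n // 2) + score_at(n // 2 + 1)) / 2


table = [(34, 3), (35, 8), (36, 12), (37, 9), (38, 8), (39, 5)]

print("cumulative frequencies:", list(accumulate(f for _, f in table)))
print("median =", median_from_frequency_table(table))
```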

Mode
There are many examples where neither the mean nor the median is the appropriate
measure of the typical score in a data set.
Using single scores
Consider the case of a clothing store. It needs to re-order a supply of dresses. To know
what sizes to order it looks at past sales of this particular style and gathers the
following data:
8 12 14 12 16 10 12 14 16 18
14 12 14 12 12 8 18 16 12 14
For this data set the mean dress size is 13.2. Dresses are not sold in size 13.2, so this
has very little meaning. The median is 13, which also has little meaning as dresses are
sold only in even-numbered sizes.
What is most important to the clothing store is the dress size that sells the most. In
this case size 12 occurs most frequently. The score that has the highest frequency is
called the mode.
When two scores share the ‘highest’ frequency, that is, occur an equal number of times,
both scores are given as the mode. In this situation the scores are bimodal. If all scores
occur an equal number of times, then the distribution has no mode.
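The dress-size figures above, and the rules for bimodal data and data with no mode, can be checked with a short Python sketch. The function name modes is hypothetical; it returns a list so that two modes can be reported together, and an empty list when every score occurs equally often. The two short lists at the end are made-up data used only to show those cases.

```python
from collections import Counter
from statistics import median


def modes(scores):
    """Return the most frequent score(s); an empty list means no mode."""
    counts = Counter(scores)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # all scores occur an equal number of times
    return sorted(s for s, c in counts.items() if c == highest)


dress_sizes = [8, 12, 14, 12, 16, 10, 12, 14, 16, 18,
               14, 12, 14, 12, 12, 8, 18, 16, 12, 14]

print("mean   =", sum(dress_sizes) / len(dress_sizes))
print("median =", median(dress_sizes))
print("mode   =", modes(dress_sizes))

print(modes([1, 1, 2, 2, 3]))  # made-up bimodal data
print(modes([2, 3, 1]))        # made-up data with no mode
```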
Mode from a frequency distribution table
To find the mode from a frequency distribution table, we simply give the score that has
the highest frequency.
WORKED Example 8
Find the mode of the scores below.
4, 5, 9, 4, 6, 8, 4, 8, 7, 6, 5, 4
THINK: The score 4 occurs most often and so it is the mode.
WRITE: Mode = 4
WORKED Example 9
For the frequency distribution below, state the mode.
Score   Frequency
14      3
15      6
16      11
17      14
18      10
19      7
THINK: The highest frequency is 14, which belongs to the score 17, and so 17 is the mode.
WRITE: Mode = 17

When a table is presented using grouped data, we do not have a single mode. In these
cases, the class with the highest frequency is called the modal class.
10D Median and mode
1 The median score is the __________ one, when the scores are __________
__________. The formula for the position of the median score is __________. For an
even number of scores, it is the __________ of the two middle ones. When using a
frequency distribution table, the median is obtained from the __________
__________ column.
2 The scores of seven people on a spelling test are given below.
5 6 5 8 5 9 8
Calculate the median of these marks.
3 Below are the scores of eight people who played a round of golf.
75 80 81 76 84 83 81 82
Calculate the median for this set of scores.
4 Find the median for each of the following sets
of scores.
a 3, 4, 5, 5, 5, 6, 9
b 5.6, 5.2, 5.4, 5.3, 5.8, 5.4, 5.3, 5.4
c 45, 62, 39, 88, 75
d 102, 99, 106, 108, 101, 103, 102, 105, 102, 101
5 A factory has 80 employees. Over a two-week period
the number of people absent from work each day
was recorded and the results are shown below.
3, 1, 5, 4, 3, 25, 4, 2, 4, 5
a Calculate the median number of people
absent from work each day.
b Calculate the mean number of people
absent from work each day.
c Does the mean or the median give a better measure
of the typical number of people absent from work each day?
Explain your answer.
remember
1. The median is the middle score in a data set or the average of the two middle
scores. The scores must be arranged in order.
2. The median can be found using the cumulative frequency column of a
frequency table.
3. The mode is the score that occurs the most.
4. Remember to include units in the final answer.

6 The table below shows the number of cans of drink sold from a vending machine at a
high school each day.
Score   Frequency   Cumulative frequency
17      4
18      9
19      6
20      12
21      8
22      5
23      4
24      2
a Copy and complete the frequency distribution table.
b Use the table to calculate the median number of cans of drink sold each day from
the vending machine.
7 The table below shows the number of accidents a tow truck attends each day
over a three-week period.
No. of accidents   No. of days
2                  4
3                  12
4                  3
5                  1
6                  1
Calculate the median number of accidents attended by the tow truck each day.
8 The table below shows the number of errors made by a machine each day over a 50-day
period.
No. of errors per day   Frequency
0                       9
1                       18
2                       13
3                       6
4                       3
5                       1
Calculate the median number of errors made by the machine each day.
9 multiple choice
There are 25 scores in a distribution. The median score will be the:
A 12th score
B 12.5th score
C 13th score
D average of the 12th and 13th scores.

10 multiple choice
For the scores 4, 5, 5, 6, 7, 7, 9, 10 the median is:
A 5 B 6 C 6.5 D 7
11 multiple choice
Consider the frequency table below.
Score   Frequency
1       12
2       13
3       8
4       7
5       5
The median of these scores is:
A 2
B 3
C 8
D 13
12 The table below shows the number of sick days taken by each worker in a small business.
Days sickness   Frequency   Cumulative frequency
0–4             10
5–9             12
10–14           7
15–19           6
20–24           5
25–29           3
30–34           2
b Calculate the median class for this distribution.
13 The mode is the __________ __________ score. If two scores occur most frequently
an equal number of times, we have two modes, and this is termed a __________ distribution.
In a frequency distribution table of grouped data, we generally do not
attempt to find a single mode, but give the __________ __________.
14 For each of the following sets of scores find the mode.
a 2, 5, 3, 4, 5
b 8, 10, 7, 10, 9, 8, 8
c 11, 12, 11, 15, 14, 13
d 0.5, 0.4, 0.6, 0.3, 0.2, 0.4, 0.6, 0.9, 0.4
e 110, 113, 100, 112, 110, 113, 110
15 Find the mode for each of the following. (Hint: Some are bimodal and others have no
mode.)
a 16, 17, 19, 15, 17, 19, 14, 16, 17
b 147, 151, 148, 150, 148, 152, 151
c 2, 3, 1, 9, 7, 6, 8
d 68, 72, 73, 72, 72, 71, 72, 68, 71, 68
e 2.6, 2.5, 2.9, 2.6, 2.4, 2.4, 2.3, 2.5, 2.6

16 Use the tables below to state the mode of the distribution.
17 Use the frequency histogram below to state the mode of the distribution.
18 For each of the following grouped distributions, state the modal class.
19 The weekly wage (in dollars) of 40 people is shown below.
376 592 299 501 375 366 204 359 382 274
223 295 232 325 311 513 348 235 329 203
556 419 226 494 205 307 417 204 528 487
543 532 435 415 540 260 318 593 592 393
a Use the classes $200–$249, $250–$299, $300–$349 etc. to display the information
in a frequency distribution table.
b From your table, calculate the median class.
WORKED
Example
9
a Score Frequency
1 2
2 4
3 5
4 6
5 3
b Score Frequency
5 1
6 3
7 5
8 8
9 5
10 3
c Score Frequency
38 2
39 4
40 1
41 5
42 6
43 3
44 6
45 2
[Frequency histogram: horizontal axis Score (12 to 20), vertical axis Frequency (0 to 40 in steps of 5); bar heights not recoverable]
a Class Frequency
1–4 6
5–8 12
9–12 30
13–16 23
17–20 46
21–24 27
25–28 9
b Class Frequency
1–7 3
8–14 8
15–21 9
22–28 25
29–35 12
36–42 11
43–49 2

1 Copy the frequency table above
and complete the class centre column.
2 Complete the cumulative frequency
column.
3 How many scores in the
data set were above 30?
4 How many scores in the data set were
40 or less?
5 Is the data set an example of
grouped or ungrouped data?
6 Draw a frequency histogram
for the data set.
7 On your histogram draw
a frequency polygon for
this data set.
8 Calculate the mean of the data.
9 In which class would the median lie?
10 Which is the modal class?
Class Class centre Frequency
Cumulative
frequency
1–10 5
11–20 15
21–30 29
31–40 37
41–50 11
1

Best summary statistics
Having now examined all three summary statistics, it is important to recognise when
it is appropriate to use each one. In some circumstances, one summary statistic may be
more appropriate than the others. For example, a shoe manufacturer notes that in a new
style of sporting footwear:
mean size sold is 8.63
median size is 8.75
mode size is 9.
Summary statistics for
skull measurements
Looking back at the data on Egyptian skulls, we are now in a position to
summarise the measurements with respect to the mean, median and mode for each
set.
1 Draw the table below. (The values for AD 150 have been included for
comparison.)
2 For the time period 4000 BC:
a enter the values for the means calculated previously
b calculate the median for each set
c determine the mode for each set.
3 Compare the figures you obtained for 4000 BC with the corresponding values
for AD 150. Jot down comments in the final column.
4 Write a paragraph indicating what you feel has happened to the shape of the
Egyptian skulls over the time period 4000 BC to AD 150.
Statistic               Measurement   4000 BC   AD 150   Comment
Mean (x̄)                Breadth                 136.2
                        Height                  130.3
                        Length                   93.5
Median (15.5th score)   Breadth                 137
                        Height                  130
                        Length                   94
Mode                    Breadth                 137
                        Height                  135
                        Length                   92

In this case, the mode is the most useful measure as the manufacturer needs to
know which size sells the most. The mean and median are of less use to the
manufacturer.
The term average is often used indiscriminately, being interpreted sometimes as the
mean, sometimes as the median and sometimes as the mode. Unfortunately, the figure
that tends to be promoted is the one that best supports the author's cause. We
need to be aware of this, particularly when interpreting statistics. When we summarise
and report statistical information, we need to act in a responsible manner and report
figures that are not misleading.
For each of these examples you will need to think carefully about the relevance of each
summary statistic in terms of the particular example.
WORKED Example 10
Below are the wages of ten employees in a small business.
$220 $230 $290 $275 $265 $250 $1500 $220 $220 $240
a Calculate the mean wage.
b Calculate the median wage.
c Calculate the mode wage.
d Does the mean, median or mode give the best measure of a typical wage in this
business?
THINK / WRITE
a 1 THINK: Total all the wages.
    WRITE: Total = $3710
  2 THINK: Divide the total by 10.
    WRITE: Mean = $3710 ÷ 10 = $371
b 1 THINK: Write the wages in ascending order.
    WRITE: $220 $220 $220 $230 $240 $250 $265 $275 $290 $1500
  2 THINK: Average the 5th and 6th scores to find the median.
    WRITE: Median = ($240 + $250) ÷ 2 = $245
c   THINK: $220 is the score that occurs most often and so this is the mode.
    WRITE: Mode = $220
d   THINK: The mean is larger than what is typical because of one very large wage, and the
    mode is the lowest wage and so this is not typical. Therefore, the median is the best
    measure.
    WRITE: The median is the best measure of the typical wage as the mode is the lowest
    score, which is not typical, and the mean is inflated by the $1500 wage.
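The arithmetic in this worked example can also be checked with a short program. The following is a minimal sketch in Python, assuming the standard `statistics` module; the variable names are illustrative only and are not part of the original text.

```python
from statistics import mean, median, mode

# Wages of the ten employees from worked example 10 (in dollars).
wages = [220, 230, 290, 275, 265, 250, 1500, 220, 220, 240]

mean_wage = mean(wages)      # total of all wages divided by the number of wages
median_wage = median(wages)  # average of the 5th and 6th scores once sorted
mode_wage = mode(wages)      # the wage that occurs most often

print(f"Mean = ${mean_wage}, Median = ${median_wage}, Mode = ${mode_wage}")

# Leaving out the single $1500 wage shows how strongly it inflates the mean.
wages_without_outlier = [w for w in wages if w != 1500]
mean_without_outlier = mean(wages_without_outlier)
print(f"Mean without the $1500 wage = ${mean_without_outlier:.2f}")
```

With the $1500 wage removed, the mean falls close to the median, which supports the conclusion that the median is the better measure of a typical wage here.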

Best summary statistics
1 There are ten houses in a street. A real estate agent values each house with the
following results.
$150 000 $190 000 $175 000 $150 000 $650 000
$150 000 $165 000 $180 000 $160 000 $180 000
a Calculate the mean house valuation.
b Calculate the median house valuation.
c Calculate the mode house valuation.
d Which of the above is the best measure of central tendency?
2 The table below shows the number of shoes of each size that were sold over a week at
a shoe store.
a Calculate the mean shoe size sold.
b Calculate the median shoe size sold.
c Calculate the mode of the data set.
d Which measure of central tendency
has the most meaning to the shoe
store proprietor?
Size Frequency
4 5
5 7
6 19
7 24
8 16
9 8
10 7
remember
1. The three summary statistics are:
mean — calculated by adding all scores, then dividing by the number of scores
median — the middle score or average of the two middle scores (when scores
are arranged in order)
mode — the score with the highest frequency.
2. Be careful when using the mean. One or two extreme scores can greatly
increase or decrease its value.
3. When the mean is not a good measure of central tendency, the median is used.
4. The mode is the best measure in some examples where discrete data mean that
the mean and median may have very little meaning.
10E
WORKED
Example
10

3 The table below shows the crowds at football matches over a season.
a Calculate the mean crowd over the season.
b Calculate the median class.
c Calculate the modal class.
d Which measure of central tendency would best describe the typical crowd at football
matches over the season?
4
Mr and Mrs Yousef research the typical price of a large family car. At one car yard they
find six family cars. Five of the cars are priced between $30 000 and $40 000, while the
sixth is priced at $80 000. What would be the best measure of the price of a typical
family car?
5 Thirty men were asked to reveal the number of hours they spent doing housework each
week. The results are given below.
1 5 2 12 2 6 2 8 14 18
0 1 1 8 20 25 3 0 1 2
7 10 12 1 5 1 18 0 2 2
a Represent the data in a frequency distribution table. (Use classes 0–4, 5–9, 10–14 etc.)
b Find the mean number of hours that the men spend doing housework.
c Find the median class for hours spent by the men at housework.
d Find the modal class for hours spent by the men at housework.
6 The resting pulse rates of 20 female athletes were measured. The results are shown
below.
50 62 48 52 71
61 30 45 42 48
43 47 51 52 34
61 44 54 38 40
a Represent the data in a frequency distribution table using appropriate groupings.
b Find the mean of the data.
c Find the median class of the data.
d Find the modal class of the data.
e Comment on the similarities and differences between the three values.
Crowd Class centre Frequency
10 000 to <20 000 15 000 95
20 000 to <30 000 25 000 64
30 000 to <40 000 35 000 22
40 000 to <50 000 45 000 15
50 000 to <60 000 55 000 3
60 000 to <70 000 65 000 0
70 000 to <80 000 75 000 1
A Mean B Median C Mode D All are equally important.

7 The following data give the age of 25 patients admitted to the emergency ward of a
hospital.
18 16 6 75 24
23 82 74 25 21
43 19 84 72 31
74 24 20 63 79
80 20 23 17 19
a Represent the data in a frequency distribution table. (Use classes 1–15, 16–30,
31–45, etc.)
b Find the mean age of patients admitted.
c Find the median class of age of patients admitted.
d Find the modal class for age of patients admitted.
e Do any of your statistics (mean, median or mode) give a clear representation of the
typical age of an emergency ward patient?
f Give some reasons that could explain the pattern of the distribution of data in this
question.
8 The batting scores for two cricket players over six innings are as follows:
Player A 31, 34, 42, 28, 30, 41
Player B 0, 0, 1, 0, 250, 0
a Find the mean score for each player.
b Which player appears to be better if the mean result is used?
c Find the median score for each player.
d Which player appears to be better when the decision is based on the median result?
e Which player do you think would be more useful to have in a cricket team and
why? How can the mean result sometimes lead to a misleading conclusion?
9 The following frequency table gives the number of employees in different salary
brackets for a small manufacturing plant.
a Workers are arguing for a pay rise but the management of the factory claims that
workers are well paid because the mean salary of the factory is $22 100. Is this a
sound argument?
b Suppose that you were representing the factory workers and had to write a short
submission in support of the pay rise. How could you explain the management’s
claim? Provide some other statistics to support your case.
Position Salary ($) No. of employees
Machine operator 18 000 50
Machine mechanic 20 000 15
Floor steward 24 000 10
Manager 62 000 4
Chief Executive Officer 80 000 1

Wage rise
The workers in an office are trying to obtain a wage rise. In the previous year, the
ten people who work in the office received a 2% rise while the company CEO
received a 42% rise.
1 What was the mean wage rise received in the office last year?
2 What was the median wage rise received in the office last year?
3 What was the modal wage rise received in the office last year?
4 The company is trying to avoid paying the rise. What statistic do you think they
would quote about last year’s wage rises? Why?
5 What statistic do you think the trade union would quote about wage rises?
Why?
6 Which statistic do you think is the most ‘honest’ reflection of last year’s wage
rises? Explain your answer.
Summary statistics for house prices
Quoting different averages can give different impressions about what is normal.
Try the following task.
1 Visit a local real estate agent and study the properties for sale in the window.
Alternatively, retrieve the for-sale ads for a real estate company from the
newspaper.
2 Calculate the mean, median and mode price for houses in the area.
3 If you were a real estate agent and a person wanting to sell his/her home asked
what the typical property sold for in the area, which figure would you quote?
4 Which figure would you quote to a person who wanted to buy a house in the area?
Best summary statistics and
comparison of samples
For this investigation, work in groups of 3 to 6 students.
Examine each of the following statistics.
• The typical mark in maths among Year 11 students.
• The number of attempts taken by Years 11 and 12 students to get their driver’s
licence.
• The typical number of days taken off school by Year 11 students so far this year.
1 For each of the above, gather your data by selecting a random sample.
2 Calculate the mean, median and mode for each topic.
3 Compare your results with the results of other students who will have selected
their samples from the same population.
4 In each case, state the best summary statistic and explain your answer to the
other groups in your class.

Measures of dispersion or spread
Once a set of scores has been collected and tabulated, we are ready to draw some
conclusions about the data. Two key concepts are the range and the interquartile range,
which are used to measure the spread of a set of scores.
Range
The range is the difference between the highest and the lowest score.
Range = highest score − lowest score
Range from single scores
A smaller range will usually represent a more consistent set of scores. Exceptions to
this are when one or two scores are much higher or lower than most.
A graphics calculator can also be used to determine the range of a distribution.
The 1-Var Stats displays min X (lowest score) and max X (highest score). The difference
between these two values indicates the range of the data.
Range from a frequency distribution table
When we are calculating the range from a frequency distribution table, we find the
highest and lowest scores from the score column. We do not use any information from
the frequency column in calculating the range. When the data are presented in grouped
form, the range is found by taking the highest score from the highest class and the
lowest score from the lowest class.
WORKED Example 11
There are 17 players in the squad for a State of Origin match. The number of State of
Origin matches played by each member of the squad is shown below.
2 6 12 8 1 4 8 9 24
4 5 11 14 6 11 15 10
What is the range of this distribution?
THINK / WRITE
1 THINK: The lowest number of matches played is 1.
  WRITE: Lowest score = 1 match
2 THINK: The highest number of matches played is 24.
  WRITE: Highest score = 24 matches
3 THINK: Calculate the range by subtracting the lowest score from the highest score.
  WRITE: Range = 24 − 1 = 23 matches
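The same calculation can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `score_range` is an assumption introduced here and does not come from the original text.

```python
# Number of State of Origin matches played by each of the 17 squad members.
matches = [2, 6, 12, 8, 1, 4, 8, 9, 24,
           4, 5, 11, 14, 6, 11, 15, 10]

def score_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)

match_range = score_range(matches)
print(f"Range = {match_range} matches")
```

Only the two extreme scores are used, which is why one unusually high or low score can change the range so much.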
Graphics Calculator tip! Calculating the range of a distribution

Interquartile range
In many cases, the range is not a good indicator of the overall spread of scores.
Consider the two sets of scores below, showing the wages of people in two small
businesses.
A: $240, $240, $240, $245, $250, $250, $260, $800
B: $180, $200, $240, $290, $350, $400, $500, $600
The range for business A = $800 − $240 = $560 and for business B = $600 − $180 = $420.
While the range for business A is greater, by looking at the wages in the two businesses,
we can see that the wages in business B are generally more spread out. The range
uses only two scores in its calculation, so a single extreme score (such as the $800 wage)
can distort it. The interquartile range is usually a better
measure of dispersion (spread). We looked at this in the previous chapter.
The quartiles are found by dividing the ordered data into quarters. The lower quartile is the
score below which the lowest 25% of scores lie, and the upper quartile is the score above
which the highest 25% of scores lie.
Before we can calculate an interquartile range we must be able to calculate the
median. To calculate the median, we must first arrange the scores in ascending order.
The median is the middle score if there is an odd number of scores, or the average of
the two middle scores if there is an even number of scores. Remember that the median
position is the ((n + 1) ÷ 2)th score.
WORKED Example 12
The frequency distribution table below shows the heights (in cm) of boys competing
for a place on a basketball team.
Height Frequency
170 to <175 3
175 to <180 6
180 to <185 12
185 to <190 10
190 to <195 8
195 to <200 1
Find the range of these data.
THINK / WRITE
1 THINK: The lowest score is at the bottom of the 170 to <175 class.
  WRITE: Lowest score = 170 cm
2 THINK: The highest score is at the top of the 195 to <200 class.
  WRITE: Highest score = 200 cm
3 THINK: Range = highest score − lowest score.
  WRITE: Range = 200 − 170 = 30 cm

The interquartile range is the difference between the upper quartile and the lower
quartile. To find the lower and upper quartiles we arrange the scores in ascending order.
The lower quartile is 1/4 of the way through the distribution and the upper quartile is
3/4 of the way through the distribution.
To find the interquartile range we follow the steps below.
1. Arrange the data in ascending order.
2. Divide the data into halves by finding the median.
(a) If there is an odd number of scores the median score should not be included in
either half of the scores.
(b) If there is an even number of scores the middle will be half way between two
scores and this will divide the data neatly into two sets.
3. The lower quartile will be the median of the lower half of the data.
4. The upper quartile will be the median of the upper half of the data.
5. The interquartile range will be the difference between the medians of the two halves
of the data.
WORKED Example 13
Calculate the median of:
a 2, 5, 8, 8, 8, 11, 12 b 45, 69, 69, 87, 88, 92, 99, 100.
THINK / WRITE
a THINK: These scores are already arranged in ascending order, so there is no need to
  reorder. There are 7 scores, so the median is the (7 + 1) ÷ 2 = 4th score.
  WRITE: Median = 8
b THINK: There are 8 scores, so the median is the (8 + 1) ÷ 2 = 4.5th score, that is, the
  average of the 4th score and the 5th score.
  WRITE: Median = (87 + 88) ÷ 2 = 87.5
WORKED Example 14
Find the interquartile range of the following data which shows the number of home runs
scored in a series of baseball matches.
12, 9, 4, 6, 5, 8, 9, 4, 10, 2
THINK / WRITE
1 THINK: Write the data in ascending order.
  WRITE: 2, 4, 4, 5, 6, 8, 9, 9, 10, 12
2 THINK: Divide the data into two equal halves.
  WRITE: 2, 4, 4, 5, 6    8, 9, 9, 10, 12
3 THINK: The lower quartile will be the median of the lower half.
  WRITE: Lower quartile = 4 runs
4 THINK: The upper quartile will be the median of the upper half.
  WRITE: Upper quartile = 9 runs
5 THINK: The interquartile range will be the upper quartile minus the lower quartile.
  WRITE: Interquartile range = 9 − 4 = 5 runs
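The five steps for finding the interquartile range can be sketched as a short Python program. This is a minimal illustration, assuming at least two scores; the function names `quartiles` and `interquartile_range` are hypothetical names introduced here. It is applied to the home-run data above and to the wages of businesses A and B discussed earlier in this section.

```python
from statistics import median

def quartiles(scores):
    """Return (lower quartile, upper quartile) using the median-of-halves steps.

    With an odd number of scores, the median itself is left out of both halves.
    Assumes at least two scores.
    """
    ordered = sorted(scores)       # step 1: arrange in ascending order
    half = len(ordered) // 2       # step 2: size of each half
    lower_half = ordered[:half]
    upper_half = ordered[-half:]   # skips the middle score when n is odd
    return median(lower_half), median(upper_half)  # steps 3 and 4

def interquartile_range(scores):
    """Step 5: upper quartile minus lower quartile."""
    lower_q, upper_q = quartiles(scores)
    return upper_q - lower_q

home_runs = [12, 9, 4, 6, 5, 8, 9, 4, 10, 2]
print(quartiles(home_runs), interquartile_range(home_runs))

business_a = [240, 240, 240, 245, 250, 250, 260, 800]
business_b = [180, 200, 240, 290, 350, 400, 500, 600]
print(interquartile_range(business_a), interquartile_range(business_b))
```

For the two businesses, the interquartile range is much larger for business B than for business A, even though business A has the larger range. This agrees with the earlier observation that the wages in business B are more spread out.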

The interquartile range can also be calculated using a graphics calculator.
WORKED Example 15
The data below give the amount spent (to the nearest whole dollar) by each child in a
group that was taken on an excursion to the Brisbane Exhibition.
15 12 17 23 21 19 16 11 17 18 23 24 25 21 20 37 17 25 22 21 19
Calculate the interquartile range for these data.
THINK / DISPLAY
1 Enter the data.
(a) Press STAT.
(b) Select 1:Edit by pressing ENTER.
(c) Enter the data in L1.
Note: There is no need to organise the data into ascending order first.
2 Obtain the values of the quartiles.
(a) Press STAT.
(b) Select CALC. Make sure that 1-Var Stats is set up as Xlist: L1 and Freq: 1.
(c) Select 1:1-Var Stats by pressing ENTER.
(d) Type L1. Press ENTER.
3 A list of statistics appears. Locate the first and third quartiles.
Scroll down the screen using the down arrow key.
Q1 = 17 and Q3 = 23
So, IQR = $23 − $17 = $6
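The calculator result can be cross-checked in software. The sketch below uses `quantiles` from Python's standard `statistics` module with its default 'exclusive' method. That choice is an assumption made here: it agrees with the calculator for this data set, but different quartile conventions can give slightly different values for other data sets.

```python
from statistics import quantiles

# Amount spent (to the nearest dollar) by each of the 21 children.
spent = [15, 12, 17, 23, 21, 19, 16, 11, 17, 18, 23,
         24, 25, 21, 20, 37, 17, 25, 22, 21, 19]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(spent, n=4)
iqr = q3 - q1
print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = ${iqr}")
```

As on the calculator, the data do not need to be sorted before they are entered.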
remember
1. Measures of dispersion are used to measure the spread of a set of scores.
2. The range is calculated by subtracting the lowest score from the highest score.
3. A single outlying score can enlarge the range. The interquartile range is
therefore a better measure of dispersion.
4. The interquartile range is found by subtracting the lower quartile from the
upper quartile.
5. The lower and upper quartiles are found by dividing the scores into two equal
halves. The median of the lower half is the lower quartile and the median of the
upper half is the upper quartile.
6. Remember to show units in your final answer.

Range and interquartile
range
The range is a measure of __________ or __________ of a set of scores. It can be
calculated by subtracting the __________ score from the __________ score. The
value of the range can be affected by a single __________ score. For this reason, the
__________ range is sometimes a better measure of the spread of the scores. It can be
calculated as the difference between the __________ quartile and the __________
quartile. The __________ divides the scores in half; the lower quartile represents a
score below which lies __________ of the scores; the upper quartile represents a
score above which __________ of the scores lie. The lower quartile, median and
upper quartile divide the distribution into __________ equal parts. In each of these
parts there is the same number of __________.
2 Find the range of each of the following sets of data.
a 2, 5, 4, 5, 7, 4, 3
b 103, 108, 111, 102, 111, 107, 110
c 2.5, 2.8, 3.4, 2.7, 2.6, 2.4, 2.9, 2.6, 2.5, 2.8
d 3.20, 3.90, 4.25, 7.29, 1.45, 2.77, 8.39
e 45, 23, 7, 47, 76, 89, 96, 48, 87, 76, 66
3 Use the frequency distribution tables below to find the range for each of the following
sets of scores.
a Score Frequency
1 2
2 6
3 12
4 10
5 7
b Score Frequency
38 23
39 46
40 52
41 62
42 42
43 45
c Score Frequency
89 12
90 25
91 36
92 34
93 11
94 9
95 4
10F
WORKED
Example
11

4 For the grouped distributions below, state the range.
5 The scores below show the number of points scored by two AFL teams over the first
10 games of the season.
Sydney: 110 95 74 136 48 168 120 85 99 65
Collingwood: 125 112 89 111 96 113 85 90 87 92
a Calculate the range of the scores for each team.
b Based on the results above, which team would you say is the more consistent?
6 Two machines are used to put approximately 100 Smarties into boxes. A check is
made on the operation of the two machines. Ten boxes filled by each machine have
the number of Smarties in them counted. The results are shown below.
Machine A: 100, 99, 99, 101, 100, 101, 100, 100, 101, 108
Machine B: 98, 104, 96, 97, 103, 96, 102, 100, 97, 104
a What is the range in the number of Smarties from the first machine?
b What is the range in the number of Smarties from the second machine?
c Ralph is the quality control officer and he argues that machine A is more consistent
in its distribution of Smarties. Explain why.
7 Find the median for each of the data sets below.
a 3, 4, 4, 5, 7, 9, 10
b 17, 20, 19, 25, 29, 27, 28, 25, 29
c 52, 55, 53, 53, 54, 55, 52, 53, 54, 52
d 12, 14, 15, 12, 14, 19, 17, 15, 18, 20
e 56, 75, 83, 47, 93, 35, 84, 83, 73, 20, 66, 90
a Class Frequency
51–60 2
61–70 8
71–80 15
81–90 7
91–100 1
b Class Frequency
150 to <155 12
155 to <160 25
160 to <165 38
165 to <170 47
170 to <175 39
175 to <180 20
c Class Frequency
40–43 48
44–47 112
48–51 254
52–55 297
56–59 199
60–63 84
WORKED
Example
12
WORKED
Example
13

8 For each of the data sets in question 7, calculate the interquartile range.
9
For the frequency table below, what is the range?
10
Calculate the interquartile range of the following data.
17, 18, 18, 19, 20, 21, 21, 23, 25
11
The interquartile range is considered to be a better measure of the variability of a set
of scores than the range because it:
A takes into account more scores
B is the difference between the upper and lower quartiles
C is easier to calculate
D is not affected by extreme values.
12
The distribution below shows the ranges in the heights of 25 members of a football squad.
Which of the statements below is correct?
A The range of the distribution is 40.
B The range of the distribution is 49.
C The range of the distribution is 9.
D The range can be estimated only by using the cumulative frequency.
Score Frequency
25 14
26 12
27 19
28 25
29 19
A 4 B 5 C 6 D 17
A 3 B 4 C 5 D 8
Height (cm) Class centre Frequency
Cumulative
frequency
140–149 144.5 2 2
150–159 154.5 5 7
160–169 164.5 10 17
170–179 174.5 7 24
180–189 184.5 1 25
WORKED
Example
14,15

Standard deviation
We have already discussed using the range and the interquartile range as measures of
the spread of a data set. However, the most commonly used measure of spread is the
standard deviation.
The standard deviation is a measure of how much a typical score in a data set differs
from the mean.
Standard deviation from single scores
The standard deviation may be found by entering a set of scores into your calculator,
just as you do when you are finding the mean. Your calculator will have a statistical
function that gives the standard deviation.
There are two standard deviation functions on your calculator. The first, σn, is the
population standard deviation. This function is used when the statistical analysis is
conducted on the entire population.
When the statistical analysis is done using a sample of the population, a slightly
different standard deviation function is used. Called the sample standard deviation, this
value will be slightly higher than the population standard deviation.
The sample standard deviation will be found on your calculator using
the σn − 1 or the sn function.
WORKED Example 16
Below are the scores out of 100 achieved by a class of 20 students on a science exam.
Calculate the mean and the standard deviation.
87 69 95 73 88 47 95 63 91 66
59 70 67 83 71 57 82 65 84 69
THINK / WRITE
1 THINK: Enter the data set into your calculator.
2 THINK: Retrieve the mean using the x̄ function.
  WRITE: x̄ = 74.05 marks
3 THINK: Retrieve the standard deviation using the σn function.
  WRITE: σn = 13.07 marks
WORKED Example 17
Ian surveys twenty Year 11 students and asks how much money they earn from part-time
work each week. The results are given below.
$65 $82 $47 $78 $108 $94 $60 $79 $88 $91
$50 $73 $68 $95 $83 $76 $79 $72 $69 $97
Calculate the mean and standard deviation.
THINK / WRITE
1 THINK: Enter the statistics into your calculator.
2 THINK: Retrieve the mean using the x̄ function.
  WRITE: x̄ = $77.70
3 THINK: Retrieve the standard deviation using the σn − 1 function, as a sample has been used.
  WRITE: σn − 1 = $15.56
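The two calculator functions have counterparts in many programming languages. As a hedged illustration, the sketch below assumes Python's standard `statistics` module, where `pstdev` plays the role of σn (population) and `stdev` plays the role of σn − 1 (sample). The variable names are illustrative only.

```python
from statistics import mean, pstdev, stdev

# Worked example 16: the whole class, so the population standard deviation applies.
exam_marks = [87, 69, 95, 73, 88, 47, 95, 63, 91, 66,
              59, 70, 67, 83, 71, 57, 82, 65, 84, 69]
exam_mean = mean(exam_marks)
exam_sd = pstdev(exam_marks)       # divides by n

# Worked example 17: a survey of twenty students, so the sample standard deviation applies.
weekly_earnings = [65, 82, 47, 78, 108, 94, 60, 79, 88, 91,
                   50, 73, 68, 95, 83, 76, 79, 72, 69, 97]
earnings_mean = mean(weekly_earnings)
earnings_sd = stdev(weekly_earnings)   # divides by n - 1

print(f"Exam: mean = {exam_mean:.2f}, population SD = {exam_sd:.2f}")
print(f"Earnings: mean = ${earnings_mean:.2f}, sample SD = ${earnings_sd:.2f}")
```

Applying both functions to the same list shows that the sample value is always slightly higher than the population value, because the sample version divides by n − 1 rather than n.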

For most examples, you will need to read the question carefully to decide whether to
use the population or the sample standard deviation.
Standard deviation using a graphics calculator
A graphics calculator can be used to determine a population or sample standard deviation.
Using the TI–83, the population standard deviation is displayed under 1-Var Stats
as σx and the sample standard deviation is shown as Sx.
As mentioned previously, the sample standard deviation is slightly higher than the
population standard deviation (compare the values for Sx and σx in the above example).
Standard deviation from a frequency distribution table
The standard deviation can also be calculated when the data are presented in table
form. This is done by entering the data in the same way as they were when calculating
the mean earlier in this chapter. A graphics calculator can also be used.
WORKED Example 18
The price (in cents) per litre of petrol at a service station is recorded each
Friday over a 15-week period and the data are given below.
76.2 80.1 79.8 84.3 80.7 78.3 82.4 81.3
80.5 78.2 79.5 80.1 81.3 84.2 83.4
Calculate the sample standard deviation for this set of data using a graphics calculator.
THINK / WRITE/DISPLAY
1 THINK: Enter the data as L1 in the graphics calculator.
2 THINK: Calculate the standard deviation.
(a) Press STAT.
(b) Highlight CALC.
(c) Select 1–Var Stats.
(d) Type L1 or the name of the list into which you have entered the data. A list of
statistics is produced with the mean at the top. So, the standard deviation is given
by
Sx = 2.257 959 467
Sx = 2.26 (correct to 2 decimal places).
  WRITE: s = 2.257 959 467
  s = 2.26 cents/L

If you are using a graphics calculator to determine the mean and standard deviation in
example 19, enter the scores as L1 and their corresponding frequencies as L2.
Remember to set up 1-Var Stats as Xlist: L1 and Freq: L2. The mean will then be displayed
as x̄ and the population standard deviation as σx.
Once we have calculated the standard deviation we can make conclusions about the
reliability and consistency of the data set. The lower the standard deviation, the less spread
out the data set is. By using the standard deviation we can determine whether a set of
scores is more or less consistent (or reliable) than another set. The standard deviation is
the best measure of this because, unlike the range or interquartile range as a measure of
dispersion, the standard deviation considers the distance of every score from the mean.
A higher standard deviation means that scores are less clustered around the mean and
less dependable. For example, consider the following two students’ results over a
number of assessment pieces:
Student A: x̄ = 60, σn = 5
Student B: x̄ = 60, σn = 15
WORKED Example 19
The table below shows the scores of a class of thirty Year 3 students on a spelling test.
Calculate the mean and standard deviation.
Score Frequency
4 1
5 2
6 4
7 9
8 6
9 7
10 1
THINK / WRITE
1 THINK: Enter the data into your calculator using score × frequency.
2 THINK: Retrieve the mean by using the x̄ function.
  WRITE: x̄ = 7.4
3 THINK: Retrieve the standard deviation using the σn function, as the whole population
  is included in the statistics.
  WRITE: σn = 1.4
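To show what the calculator is doing with a frequency table, the sketch below works directly from the scores and their frequencies. It is a minimal Python illustration under the assumption that the table is stored as a dictionary; the name `freq_table` is hypothetical. Each squared distance from the mean is weighted by its frequency before dividing by the total number of scores.

```python
from math import sqrt

# Spelling test scores (keys) and the number of students with each score (values).
freq_table = {4: 1, 5: 2, 6: 4, 7: 9, 8: 6, 9: 7, 10: 1}

n = sum(freq_table.values())   # total number of students

# Mean: sum of (score x frequency) divided by the number of scores.
mean_score = sum(score * f for score, f in freq_table.items()) / n

# Population variance: frequency-weighted average of squared distances from the mean.
variance = sum(f * (score - mean_score) ** 2 for score, f in freq_table.items()) / n
sd = sqrt(variance)

print(f"n = {n}, mean = {mean_score:.1f}, population SD = {sd:.1f}")
```

For a sample rather than a whole population, the final division would be by n − 1 instead of n.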
Graphics Calculator tip! Standard deviation

Both students have the same mean. However, student A has a standard deviation of 5
and student B has a standard deviation of 15. Student A is far more consistent and can
confidently be expected to score around 60 in any future exam. Student B is more
inconsistent but is probably capable of scoring a mark higher than student A’s.
WORKED Example 20
Two brands of light globe are tested to see how long they will burn (in hours).
Brand X: 850 950 1400 875 1200
1150 1000 900 850 825
Brand Y: 975 1100 1050 1000 975
950 1075 1025 950 900
Which of the two brands of light globe is more reliable?
THINK / WRITE
1 THINK: Enter both sets of data into your calculator.
2 THINK: Choose the sample standard deviation because a sample of each light globe
  brand has been chosen.
3 THINK: Write down the sample standard deviation for each brand.
  WRITE: Brand X: sample standard deviation = 190.4 h
  Brand Y: sample standard deviation = 62.4 h
4 THINK: The brand with the lower standard deviation is the more reliable.
  WRITE: Brand Y is the more reliable as it has a lower standard deviation.
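The comparison of the two brands can be sketched in Python as follows. This is an illustration only, assuming the standard `statistics` module; the variable names are not from the original text.

```python
from statistics import mean, stdev

# Burning times in hours for a sample of ten globes of each brand.
brand_x = [850, 950, 1400, 875, 1200, 1150, 1000, 900, 850, 825]
brand_y = [975, 1100, 1050, 1000, 975, 950, 1075, 1025, 950, 900]

# Sample standard deviation, because only a sample of each brand was tested.
sd_x = stdev(brand_x)
sd_y = stdev(brand_y)

# The brand with the lower standard deviation is the more consistent.
more_reliable = "Brand Y" if sd_y < sd_x else "Brand X"

print(f"Brand X: mean = {mean(brand_x)} h, sample SD = {sd_x:.1f} h")
print(f"Brand Y: mean = {mean(brand_y)} h, sample SD = {sd_y:.1f} h")
print(f"More reliable: {more_reliable}")
```

Both samples have the same mean burning time, so the standard deviation is what separates the two brands.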
remember
1. The standard deviation is a measure of the spread of a data set.
2. Standard deviation is found on your calculator by entering the data set using
the calculator’s statistical mode.
3. The population standard deviation is used when an entire population is
considered in the statistical analysis and can be found on the calculator using
the σn function (or σx on the graphics calculator).
4. The sample standard deviation is used when a sample of the population is used
in the analysis and can be found using the σn − 1 function (Sx on the graphics
calculator).
5. A set of data is considered to be more consistent or reliable if it has a low
standard deviation.

Standard deviation
Standard deviation is a measure of the __________ of the scores. It represents how
much a typical score differs from the __________ of the data set. The statistical function
used on the calculator for a population standard deviation is __________, while
for a sample of the population, the function used is __________. A low value for the
standard deviation indicates that the set of scores is more __________ while a high
value indicates __________ consistency or reliability.
2 For each of the sets of scores below, calculate the standard deviation. Assume that the
scores represent an entire population and answer correct to 2 decimal places.
a 3, 5, 8, 2, 7, 1, 6, 5
b 11, 8, 7, 12, 10, 11, 14
c 25, 15, 78, 35, 56, 41, 17, 24
d 5.2, 4.7, 5.1, 12.6, 4.8
e 114, 12, 3.6, 42.8, 0.5
3 For each of the sets of scores below, calculate the sample standard deviation, correct
to 2 decimal places.
a 25, 36, 75, 85, 6, 49, 77, 80, 37, 66
b 4.8, 9.3, 7.1, 9.9, 7.0, 4.1, 6.2
c 112, 25, 56, 81, 0, 5, 178, 99, 41
d 0.3, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 0.8, 0.8, 0.9, 1.0
e 56, 1, 258, 45, 23, 58, 48, 35, 246
4 For each of the following, state whether it is
appropriate to use the population standard deviation
or the sample standard deviation.
a A quality control officer tests the life of
50 batteries from a batch of 1000.
b The weight of every bag of potatoes is checked
and recorded before being sold.
c The number of people who attend every football
match over a season is analysed.
d A survey of 100 homes records the number of
cars in each household.
e The score of every Year 11 student in
mathematics is recorded.
5 The band ‘Aquatron’ is to release a new CD. The recording company needs to predict
the number of copies that will be sold at various music stores throughout Australia. To
do so, a sample of 10 music stores supplied information about the sales of the
previous CD released by Aquatron, as shown below.
580 695 547 236 458 620 872 364 587 1207
a Calculate the mean number of sales at each store.
b Should the population or sample standard deviation be used in this case?
c What is the value of the appropriate standard deviation?
10G
WORKED
Example
16
WORKED
Example
17
WORKED
Example
18

6 A supermarket chain is analysing its sales over a week. The chain has 15 stores and
the sales for each store for the past week were (in $million):
1.5 2.1 2.4 1.8 1.1 0.8 0.9 1.1 1.4 1.6 2.0 0.7 1.2 1.7 1.3
a Calculate the mean sales for the week.
b Should the population or sample standard deviation be used in this case?
c What is the value of the appropriate standard deviation?
7 Use the statistical function on your calculator to ﬁnd the mean and standard deviation
(correct to 1 decimal place) for the information presented in the following tables. In
each case, use the population standard deviation.
a  Score  Frequency
   3      12
   4      24
   5      47
   6      21
   7      7

b  Score  Frequency
   45     1
   46     16
   47     39
   48     61
   49     52
   50     36

c  Score  Frequency
   75     22
   76     17
   77     8
   78     10
   79     12
   80     21
   81     29

8 Copy and complete the class centre column for each of the following distributions and
hence use your calculator to find an estimate for the mean and standard deviation
(correct to 2 decimal places). In each case use the population standard deviation.
a
c
b
Class     Class centre   Frequency
10–12                    12
13–15                    16
16–18                    25
19–21                    28
22–24                    13

Class     Class centre   Frequency
31–40                    15
41–50                    28
51–60                    36
61–70                    19
71–80                    8
81–90                    7
91–100                   2

Class     Class centre   Frequency
0–4                      15
5–9                      24
10–14                    31
15–19                    33
20–24                    29
25–29                    17

9 Below are the marks achieved by two students in ﬁve tests.
Brianna: 75, 80, 70, 72, 78
Katie: 50, 95, 90, 80, 55
a Calculate the mean and standard deviation for each student.
b Which of the two students is more consistent? Explain your answer.
10 From Year 11, 21 students are chosen to complete a test. The scores are shown in the
table below.

Class        Frequency
10 to <20    1
20 to <30    6
30 to <40    9
40 to <50    4
50 to <60    1

When preparing an analysis of the typical performance of Year 11 students on the test,
the standard deviation used is:
A 9.209 B 9.437 C 21 D 34.048

11 The results below are Ian’s marks in four exams for each subject that he studies.
English: 63 85 78 50
Maths: 69 71 32 97
Biology: 45 52 60 41
Geography: 65 78 59 61
In which subject does Ian achieve the most consistent results?
A English B Maths C Biology D Geography

12 The following frequency distribution gives the prices paid by a car wrecking yard for
a sample of 40 car wrecks.

Price ($)        Frequency
0 to <500        2
500 to <1000     4
1000 to <1500    8
1500 to <2000    10
2000 to <2500    7
2500 to <3000    6
3000 to <3500    3

Find the mean and standard deviation of the price paid for these wrecks.

13 The table below shows the life of a sample of 175 household light globes.

Life (hours)    Frequency
200 to <250     2
250 to <300     5
300 to <350     12
350 to <400     25
400 to <450     42
450 to <500     38
500 to <550     26
550 to <600     15
600 to <650     7
650 to <700     3

a Find the range of the data.
b Use the class centres to find the mean and standard deviation in the lifetimes of this
sample of light globes.
14 Crunch and Crinkle are two brands of potato crisps. Each is sold in packets nominally
of the same size and for the same price. Upon investigation of a sample of packets of
each, it is found that Crunch and Crinkle packets have the same mean mass (25 g). The
standard deviation of the masses of Crunch packets is, however, 5 g and the standard
deviation of the masses of Crinkle packets is 2 g.
Which brand do you think represents better value for money under these circumstances?
Why?

For the set of scores 23, 45, 24, 19, 22, 16, 16, 27, 20, 21, ﬁnd:
1 the mean
2 the median
3 the mode
4 the range
5 the lower quartile
6 the upper quartile
7 the interquartile range
8 the population standard deviation.
9 Which measure of central tendency is the best measure of location in this data set?
10 Explain why the interquartile range is a better measure of spread than the range.
Displaying statistical data and
statistical graphs
1 As a class, collect information on:
a the number of people that live
in each student’s household
b the number of pets in each
student’s household.
2 Use your graphics calculator to
enter the data as two separate
lists, L1 and L2.
3 Use the statistics function on the
calculator to ﬁnd the following
information for each data set.
a mean b median
c minimum value d maximum value
e lower quartile f upper quartile
4 Use the statistical plotting function on your calculator to draw a boxplot of the
data you have entered.

Archie’s new skull site
Having considered some of the statistics on skull measurements from 4000 BC and
AD 150, we are now in a position to collate all the figures from these two time
periods and compare them with the statistics from Archie’s new discovery. We
must approach this task in a methodical manner.
Step 1
Organise the statistics we already know. Copy the table below and fill in any data
already calculated for 4000 BC and AD 150.

Statistic            Measurement   4000 BC   AD 150   New site
Mean                 Breadth
                     Height
                     Length
Median               Breadth
                     Height
                     Length
Mode                 Breadth
                     Height
                     Length
Standard deviation   Breadth
                     Height
                     Length
Range                Breadth
                     Height
                     Length

Step 2
Calculate and fill in any statistics missing for 4000 BC and AD 150 (that is, standard
deviation and range values).
Step 3
Consider the data Archie has collated from measurements on the skulls from his
new site. Calculate the mean, median, mode, standard deviation and range for the data
from the new site. Add these to the table.
Step 4
Compare the values obtained for Archie’s new site with those for 4000 BC and
AD 150.
Step 5
Now comes the decision phase. To which era does the new site appear closer? Is
there consistency here?
Step 6
The ﬁnal phase requires a report of the ﬁndings. In doing so, you must remember
that any conclusions drawn must be backed by providing substantial evidence.
Write a paragraph advising Archie of the results of this study. Describe the
changes you have observed in the shape of the skulls from 4000 BC to AD 150 and
indicate which time period you feel his new data most closely matches. Back your
recommendation with ample statistical evidence.
These types of project are undertaken every day in a variety of situations. It is
vital that we realise the importance of reporting statistical information accurately
and in an unbiased manner.
Measurements (in mm) of the skulls from Archie’s new site

Breadth  Height  Length    Breadth  Height  Length
124      138     101       131      128     98
133      134     97        138      129     107
138      134     98        123      131     101
148      129     104       130      129     105
126      124     95        134      130     93
135      136     98        137      136     106
132      145     100       126      131     100
133      130     102       135      136     97
131      134     96        129      126     91
133      125     94        134      139     101
133      136     103       131      134     90
131      139     98        132      130     104
131      136     99        130      132     93
138      134     98        135      132     98
130      136     104       130      128     101

Comparing sets of data
Back-to-back stem-and-leaf plots
Some of the most useful and interesting statistical investigations involve the comparison
of two sets of data. In the previous chapter, we drew stem-and-leaf plots of
single sets of data. We shall now consider using such plots to compare two data sets.
Back-to-back stem-and-leaf plots are useful to compare the distribution of two
similar sets of data. This is particularly useful in the situation of controlled
experiments. The two sets of data use the same central stem. One set of leaves is set to
the right of the stem and the other to the left. Care must be taken when arranging the
data of the left set. Place the smallest numeral closest to the central margin, then range
outwards as the data size increases. The key generally relates to data which are
presented on the right of the plot.
An example of such a plot is presented below.
The data show the lifetimes of a sample of 40
batteries of each of two brands when ﬁtted into
a standard children’s toy. Some of the toys are
ﬁtted with an ordinary brand battery and some
with Brand X. Which brand is better?
Key: 6 | 9 = 69 hours
The spread of each set of data can be seen graphically from the stem-and-leaf plot. In
this case it can be seen that, although Brand X showed a little more variability than the
ordinary brand, it generally gave a longer lasting performance.
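The layout rules described above (a common stem, left-hand leaves arranged so that the smallest sits closest to the stem) can be sketched in Python. This is only an illustrative sketch: the function name `back_to_back` and the short data lists are hypothetical, and it assumes whole-number data where the last digit is the leaf.

```python
from collections import defaultdict


def back_to_back(left, right, left_label="Left", right_label="Right"):
    """Return the text lines of a back-to-back stem-and-leaf plot.

    Assumes whole-number data: the last digit is the leaf, the rest is the stem.
    """
    def split(values):
        groups = defaultdict(list)
        for v in sorted(values):
            groups[v // 10].append(v % 10)
        return groups

    l, r = split(left), split(right)
    stems = range(min(min(l), min(r)), max(max(l), max(r)) + 1)
    # Width of the left-hand column: longest row of leaves, or the label if longer.
    width = max(max(len(v) for v in l.values()) * 2 - 1, len(left_label))
    lines = [f"{left_label:>{width}} | Stem | {right_label}"]
    for s in stems:
        # Left leaves are reversed so the smallest leaf is closest to the stem.
        left_leaves = " ".join(str(d) for d in reversed(l.get(s, [])))
        right_leaves = " ".join(str(d) for d in r.get(s, []))
        lines.append(f"{left_leaves:>{width}} | {s:>4} | {right_leaves}".rstrip())
    return lines


# Hypothetical lifetimes (hours), for illustration only.
sample_left = [60, 62, 74, 78, 78, 81]
sample_right = [69, 73, 75, 92]
for line in back_to_back(sample_left, sample_right, "Ordinary", "Brand X"):
    print(line)
```

Every stem between the smallest and largest is printed, even when it has no leaves, so that the shape of each distribution is not distorted.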
Side-by-side or parallel boxplots
Two or more sets of data may be compared by using side-by-side boxplots. The
boxplots share a common scale. Numerical comparisons can be made between the sets
of data based upon the size and position of the range, interquartile range and median.
This is a strong feature of a boxplot.
In general, a histogram or stem-and-leaf plot is better than a boxplot at giving the
reader information about the distribution of a set of scores (because boxplots do not
show individual scores), but boxplots have greater scope for making quantitative comparisons.
In the case of the battery test data above, the following side-by-side boxplot
would result. (Quartiles and medians are found in the usual way.)
   Ordinary brand |      | Brand X
             Leaf | Stem | Leaf
        8 6 2 0 0 |    6 | 9
  9 9 9 8 8 6 4 0 |    7 | 3 5
8 8 7 5 3 1 1 1 0 |    8 | 2 4 8
9 6 6 4 2 2 2 0 0 |    9 | 0 1 4 5 5 9
    8 7 5 3 1 1 1 |   10 | 0 0 2 5 8 8 9 9
              4 2 |   11 | 0 0 1 1 3 3 6 7 9
                  |   12 | 1 4 6 6 6 7 8 8
                  |   13 | 3 5
                  |   14 | 6

We can make the following comparisons between the sets of data:
1. Brand X showed more
variability in its performance (that is, its lifetime) than the ordinary brand. (Brand X
range = 77, ordinary brand range = 54; Brand X interquartile range = 27.5, ordinary
brand interquartile range = 19.)
2. The longest lifetime recorded was that of a Brand X battery (146).
3. The shortest lifetime recorded was that of an ordinary brand battery (60).
4. Brand X batteries’ median lifetime was better than that of the ordinary brand (Brand
X median = 109.5, ordinary brand median = 87.5).
5. Over one-quarter of Brand X batteries were better performers than the best ordinary
brand battery; that is, had longer lifetimes than the longest of the ordinary brand
batteries’ lifetimes. (Remember that the four sections of a boxplot each represent
one-quarter of the scores.)
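The figures quoted in these comparisons can be checked with a short Python sketch. The lifetimes are read off the back-to-back stem-and-leaf plot of the battery data (stem, then a string of leaves); the names `ORDINARY`, `BRAND_X`, `expand` and `five_number_summary` are illustrative choices, and the quartiles are taken as the medians of the lower and upper halves, the method used in this chapter.

```python
from statistics import median

# Battery lifetimes (hours) read off the stem-and-leaf plot: stem -> leaves.
ORDINARY = {6: "00268", 7: "04688999", 8: "011135788",
            9: "002224669", 10: "1113578", 11: "24"}
BRAND_X = {6: "9", 7: "35", 8: "248", 9: "014559", 10: "00258899",
           11: "001133679", 12: "14666788", 13: "35", 14: "6"}


def expand(plot):
    """Turn a stem -> leaves mapping back into a sorted list of scores."""
    return sorted(stem * 10 + int(leaf)
                  for stem, leaves in plot.items() for leaf in leaves)


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    data = sorted(values)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


for name, plot in (("Ordinary brand", ORDINARY), ("Brand X", BRAND_X)):
    low, q1, med, q3, high = five_number_summary(expand(plot))
    print(f"{name}: min={low}, Q1={q1}, median={med}, Q3={q3}, max={high}, "
          f"range={high - low}, IQR={q3 - q1}")
```

Because Brand X's upper quartile lies above the ordinary brand's maximum, the same summary also supports comparison 5.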
[Figure: side-by-side boxplots of Brand X and the ordinary brand on a common scale from 50 to 150 hours]
WORKED Example 21
The stem-and-leaf plot below shows the weights of two samples of chickens 3 months after
hatching. One group of chickens (Group A) had been given a special growth hormone. The
other group (Group B) was kept under identical conditions but was not given the hormone.
Prepare side-by-side boxplots of the data and draw conclusions about the effectiveness of
the growth hormone.

Key: 0* | 8 = 0.8 kg
     1  | 3 = 1.3 kg

      Group B |      | Group A
         Leaf | Stem | Leaf
        4 4 4 |  0   |
9 8 8 7 7 5 5 |  0*  | 8
  4 4 3 0 0 0 |  1   | 3
              |  1*  | 5 7 7 9
              |  2   | 0 0 0 1 1 3 3
              |  2*  | 5 8 8

1 THINK: First locate the medians of each group. There are 16 observations in each group.
  WRITE: The median of each group is the $\frac{16 + 1}{2}$th score; that is, the 8.5th
  score, or between the 8th and 9th scores.
2 THINK: The median divides each group into two halves; the quartiles are the medians of
  the upper and lower halves. There are 8 scores in each half.
  WRITE: The position of the quartiles is given by the $\frac{8 + 1}{2}$th score; that is,
  the 4.5th score, halfway between the 4th and 5th scores in each half.

3 THINK: Find the quartiles and medians on the stem-and-leaf plot. Be careful to count
  from the centre out with each set of data.
  WRITE: [Stem-and-leaf plot of Group B and Group A repeated, with Q1, the median and Q3
  marked for each group]
4 THINK: Write a five-number summary for each group.
  WRITE: Group A: 0.8, 1.7, 2.0, 2.3, 2.8
         Group B: 0.4, 0.5, 0.8, 1.0, 1.4
5 THINK: Draw the boxplots using a common scale.
  WRITE: [Figure: side-by-side boxplots of Group A and Group B on a common scale from 0 to
  3.0 kg]
6 THINK: Compare the data. Consider central score, highest and lowest scores, variability
  in scores, etc.
  WRITE:
  • The biggest of all chickens was from Group A (hormone group).
  • The smallest of all chickens was from Group B (no hormone).
  • The Group A data showed a little more variability (Group A interquartile range = 0.6,
    Group B interquartile range = 0.5).
  • The median size of chickens in Group A was larger (Group A median = 2, Group B median
    = 0.8).
  • Over three-quarters of the Group A chickens were bigger than all of the Group B chickens!
  Conclusion:
  The growth hormone proved to be effective.

The graphics calculator can also be used to display parallel boxplots.

WORKED Example 22
The four Year 11 Maths A classes at Western Secondary College complete the
same end-of-year maths test. The marks, expressed as percentages for each of
the students in the four classes, are given below.

11A  11B  11C  11D
40   60   50   40
43   62   51   42
45   63   53   43
47   64   55   45
50   70   57   50
52   73   60   53
53   74   63   55
54   76   65   59
57   77   67   60
60   77   69   61
63   78   70   69
63   82   72   73
63   85   73   74
68   87   74   75
70   89   76   80
75   90   80   81
80   92   82   82
85   95   82   83
89   97   85   84
90   97   89   90

Display the data using a parallel boxplot and use this to describe any similarities or
differences in the distributions of the marks among the four classes.

1 THINK: Create the first boxplot (for class 11A) on a graphics calculator using
  [2nd] [STAT PLOT] and appropriate WINDOW settings. Using [TRACE] to show key
  values, sketch the first boxplot using pen and paper, leaving room for three
  additional plots.
2 THINK: Repeat step 1 for the other three classes.
3 THINK: All four boxplots share the common scale.
  WRITE/DISPLAY: [Figure: parallel boxplots of classes 11A, 11B, 11C and 11D on a common
  scale of Maths mark (%) from 30 to 100]

4 THINK: Describe the similarities and differences between the four distributions.
  WRITE: Class 11B had the highest median mark and the range of the distribution was
  only 37. The lowest mark in 11B was 60.
  We notice that the median of 11A’s marks is approximately 60. So, about 50% of students
  in 11A received less than 60. This means that about half of 11A had scores that were
  less than the lowest score in 11B.
  The range of marks in 11A was about the same as that of 11D with the highest scores in
  each about equal, and the lowest scores in each about equal. However, the median mark in
  11D was higher than the median mark in 11A so, despite a similar range, more students in
  11D received a higher mark than in 11A.
  While 11D had a top score that was higher than that of 11C, the median score in 11C was
  higher than that of 11D and the bottom 25% of scores in 11D were no higher than the
  lowest score in 11C. In summary, 11B did best, followed by 11C then 11D and finally 11A.

It is important to remember that, in a boxplot, each section represents one-quarter of
the scores. If one section of a boxplot is longer compared with another section of the
same boxplot, it can be interpreted that the scores are more spread out in that section,
not that the longer section contains a greater number of scores.
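The comparisons made in Worked example 22 can also be checked numerically. The sketch below is one possible way to do this in Python rather than on a graphics calculator; the marks are those listed for the four classes, the helper name `five_number_summary` is an illustrative choice, and quartiles are again taken as the medians of the lower and upper halves.

```python
from statistics import median

marks = {
    "11A": [40, 43, 45, 47, 50, 52, 53, 54, 57, 60,
            63, 63, 63, 68, 70, 75, 80, 85, 89, 90],
    "11B": [60, 62, 63, 64, 70, 73, 74, 76, 77, 77,
            78, 82, 85, 87, 89, 90, 92, 95, 97, 97],
    "11C": [50, 51, 53, 55, 57, 60, 63, 65, 67, 69,
            70, 72, 73, 74, 76, 80, 82, 82, 85, 89],
    "11D": [40, 42, 43, 45, 50, 53, 55, 59, 60, 61,
            69, 73, 74, 75, 80, 81, 82, 83, 84, 90],
}


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    data = sorted(scores)
    n = len(data)
    lower, upper = data[: n // 2], data[(n + 1) // 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


summaries = {name: five_number_summary(s) for name, s in marks.items()}
for name, (low, q1, med, q3, high) in summaries.items():
    print(f"{name}: min={low} Q1={q1} median={med} Q3={q3} max={high} "
          f"range={high - low}")

# Order the classes from highest to lowest median mark.
ranking = sorted(summaries, key=lambda name: summaries[name][2], reverse=True)
print("By median:", ranking)
```

Ranking by median alone is a simplification; the written comparison also weighs the range and the position of the quartiles.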
remember
1. Back-to-back stem plots are useful for comparing distributions of two similar
sets of data.
When completing back-to-back stem plots:
(a) use a common stem
(b) distinguish between the two sets of data by labelling them clearly
(c) the key generally relates to data on the right-hand side of the central stem
(d) when organising the data to the left of the central stem, the smallest piece
of data goes closest to the central stem, then outwards as the data increases.
2. Side-by-side or parallel boxplots:
(a) share a common scale
(b) allow us to make quantitative comparisons between sets of data, based
upon the size and position of the range, quartiles, interquartile range and
medians
(c) allow us to compare more than two sets of data.

Comparing sets of data
1 The boxplots at right show the results of testing
two brands of transistor by applying increasing
voltage.
a Which brand had the transistor which could
withstand the highest voltage?
b Which brand had the transistor which could withstand the least voltage?
c Which brand gave the better median performance?
d Which brand showed most variability in terms of range?
e Which brand showed most variability in terms of interquartile range?
f Which brand would you recommend to a manufacturer of electronic equipment if
they requested a transistor that reliably worked at its expected voltage?
g Which brand would you recommend to a manufacturer of electronic equipment if
they requested a transistor that was likely to withstand higher voltages?
2 The boxplots at right were drawn by a teacher who was trying to assess the effects of
allowing students the privilege of an ‘open book’ exam.
Plot A gives information about the results of students who were allowed to use their
textbook in an end-of-unit test. Plot B gives information about the results of students
who did the test without the aid of the textbook.
a Which group had the student who gained the best test result?
b Which group did best on median result?
c Compare the variability in the performance of each group.
d Does the use of the textbook get a better test performance? Explain.
e What other things need to be taken into account when drawing these conclusions?
3 Draw side-by-side boxplots for the following pair of five-number summaries.
Group X: 14, 18.5, 21.5, 27.5, 33
Group Y: 11, 17.5, 21, 26.5, 35
4 The following stem-and-leaf plots give the age at marriage of a group of 10 women
and a group of 10 men.
Key: 1 | 8 = 18 years old

      Men |      | Women
     Leaf | Stem | Leaf
      8 7 |    1 | 8 8
9 8 7 5 1 |    2 | 0 2 3 4 4 5
      6 3 |    3 | 0 1
        0 |    4 |

a Draw side-by-side boxplots of the data.
b Make comparisons about the distribution of the sets of data.
[Figure for question 1: boxplots of Brand A and Brand B on a common scale from 5 to 30 volts]
[Figure for question 2: boxplots of Plot A (Text) and Plot B (No Text) on a common scale of test result from 10 to 100]

5 The number of words in each of the ﬁrst 12 sentences is counted in each of 3 different
types of book: a children’s book, a Year 12 geography text, and a major daily
newspaper. The results are as follows:
a Draw side-by-side boxplots of the data.
b Make comparisons about the sentence length of each type of publication. Use
statistics in your answer.
6 The stem-and-leaf plot below gives the batting scores of two cricket players — Smith
and Jones — who share the responsibility of ‘opening the batting’ for their side.
Key: 1 2 = 12
a Derive a ﬁve-number summary for each player.
b Draw side-by-side boxplots of the data.
c Make comparisons between the two sets of data. Use statistics in your answer.
d Which player do you consider to be the best ‘opening bat’ and why?
Questions 7 and 8 refer to the following stem-and-leaf plot.
Key: 12 2 = 122
7
The lower quartile of Group B is:
8
Which of the following statements is untrue?
A Data from Group A show less consistency than the data from Group B.
B Data from Group B have a lower interquartile range.
C Group B has a greater median.
D Group A shows a greater amount of variability.
E None of the above. (All of the statements are true.)
Table for question 5 (number of words in each of the first 12 sentences):
Children’s book    6   8  12  15   6   8  10   8   5  11  10   8
Geography text    16  18  25  13  10  25  29  18   7  22  28  22
Newspaper         12   6   8  14  18   7  12  10  21  17  16   8
Jones Smith
Leaf
3
8 7 4 2
9 9 8 7 7 5
8 4 4 2 0
5 2 0
1
Stem
0
1
2
3
4
5
6
7
8
Leaf
0 0 1
2 6 9
6 6 8
7 8 8 9 9
0 4 6
2 4
5
Group B Group A
Leaf
6
8 5 4 2 2
8 5 5 3 0 0
7 4 4 1 0
1 1
Stem
12
13
14
15
16
17
18
Leaf
2
3 8
0 4 4 6
2 3 5 7 8
2 4 4 5
2 6
1
A 156.5 B 144 C 155 D 152 E none of the above.

Questions 9 and 10 refer to the
boxplots at right.
9
Which of the following statements is a correct comparison of the data?
A Group X has a higher median and shows more variability than Group Y.
B Group X has a lower median and shows more variability than Group Y.
C Group X has a higher median and shows less variability than Group Y.
D Group X has a lower median and shows less variability than Group Y.
E It is impossible to make comparisons like this without seeing the data displayed on
a stem-and-leaf plot.
10
Which of the following statements is untrue of the boxplots?
A One-quarter of all Group X data is greater than any of Group Y data.
B The median of Group X is 25.
C The interquartile range of Group X is 25.
D The range of Group Y is 9.
E None of the above. (All the statements are true.)
11 A packing machine is meant to pack sacks of flour in 20.0 kg weights. A quality control
manager notices that the machine appears to be too generous in the amount that it
is putting into each sack. After checking a sample of sacks for weight he adjusts the
machine. After some time he selects a second sample of sacks and checks their
weights. The results are detailed on the stem-and-leaf plot below.
Key: 20 3 = 20.3 kg
20* 5 = 20.5 kg
a Derive a five-number summary for each set of data.
b Draw side-by-side boxplots of the data.
c Compare the performance of the machine before and after it was adjusted. Use
statistics in your answer.
d Should the manager have adjusted the machine? Why (not)?
12 A new spray treatment has been developed to improve the budding of apple trees.
Twenty-five trees are subjected to the spray treatment while 25 others are kept as a
control. The number of apples that form on each tree is recorded below.
Group A (sprayed)
35 52 71 21 34 42 76 45 48 32
29 85 73 28 34 59 52 56 27 29
33 38 54 42 51
After adjustment Before adjustment
Leaf
4 3
9 8 8 7 7 6 5 5
4 3 1 1 0 0
9 8 6
4 3 0
6 6
2
Stem
19
19*
20
20*
21
21*
22
Leaf
3 4 4
5 5 7 8 8 8 9
0 0 1 1 2 3 3 3 4
5 5 6 7 7 8
[Figure for questions 9 and 10: boxplots of Group X and Group Y on a common scale from 15 to 40]

Group B (unsprayed)
44 42 55 41 39 68 63 62 58 51
43 47 49 45 40 52 56 50 71 39
35 37 38 52 58
a Detail the data on a back-to-back stem-and-leaf plot. Use a class size of 10.
b Prepare side-by-side boxplots of the data.
c Make comparisons of the data. Use statistics in your answer.
d Comment on the overall effect of the spray. Would you recommend that orchardists
use the spray?
13 Twenty different flashlight bulbs of each of two brands are tested until they burn out.
The lifetime of each (in hours) is recorded below.
Glow-worm
23 45 31 38 39 41 48 47 54 23
28 35 42 49 50 41 52 48 27 35
Starlet
28 16 24 36 47 18 59 32 64 68
72 35 46 72 54 31 29 36 55 43
a Detail the data on a back-to-back stem-and-leaf plot. Use a class size of 5.
b Prepare side-by-side boxplots of the data.
c Make comparisons of the data. Use statistics in your answer.
d Which brand would you recommend as the better? Why?
Drug test analysis
A new drug for the relief of cold symptoms has been
developed. To test the drug, 40 people were exposed to a cold
virus. Twenty patients were then given a dose of the drug
while another 20 patients were given a placebo. (In medical
tests a control group is often given a ‘placebo’ drug. The
subjects in this group believe that they have been given the
real drug but in fact their dose contains no drug at all.) All
participants were then asked to indicate the time when they
first felt relief of symptoms. The number of hours from the
time the dose was administered to the time when the patients
first felt relief of symptoms are detailed below.
1 Detail the data on a back-to-back stem-and-leaf plot.
2 Prepare side-by-side boxplots of the data.
3 Make comparisons of the data. Use statistics in your answer.
4 Does the drug work? Justify your answer.
5 What other considerations should be taken into account when trying to draw
conclusions from an experiment of this type?
Group A (drug)
25 42 29 38 32 44 45 42 18 35
21 47 37 62 42 17 62 34 13 32

Group B (placebo)
25 34 17 32 35 25 42 18 35 22
28 28 20 21 32 24 38 32 35 36

Using a spreadsheet or graphics
calculator to obtain summary
statistics
Task 1 Using a spreadsheet
Consider the data below for the average daily sales for the fast food outlets
McDonald’s, KFC and Pizza Hut.
1 Set up the spreadsheet as indicated, entering the sales ﬁgures as numeric
values, then formatting them to ‘currency’ with zero decimal places.
2 The Excel formula that calculates the mean is the average formula. Its format
is =AVERAGE(range). In cell B12 enter the formula =AVERAGE(B4:B10)
to calculate the mean daily sales for McDonald’s.
3 Copy this formula across to cells C12 and D12 to calculate the mean daily
sales for KFC and Pizza Hut.
4 The formula for standard deviation is =STDEV(range). Enter the appropriate
formula into cell B13, then copy it across to cells C13 and D13.
5 Consider the mean and standard deviation values for the three companies.
Whose sales are the best? Which company experiences the most consistent
sales throughout the week?

6 Select the range A4 to D10. Enter the Sort facility of the Data option and sort
the data in Column B (McDonald’s data) into ascending order. From this
sorted list, determine the five-number summary data for McDonald’s.
7 Sort the data in KFC sales and determine the five-number summary figures for
KFC.
8 Similarly, determine the five-number summary figures for Pizza Hut.
9 The spreadsheet does not offer the facility of graphing a boxplot. From your
data collected above, draw parallel boxplots on the same scale, then compare
the performances of the three companies.
10 Write a paragraph reporting the results of your findings. Support your
conclusions by specific reference to your spreadsheet and boxplots.
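For readers without a spreadsheet, the calculations in steps 2 to 4 can be sketched in Python. The sales figures below are HYPOTHETICAL placeholders (the spreadsheet's actual values are not reproduced here), and the function name `summarise` is an illustrative choice. Note that Excel's =STDEV(range) uses the sample formula (dividing by n − 1), which corresponds to `statistics.stdev`; the population version is `statistics.pstdev`.

```python
from statistics import mean, pstdev, stdev

# HYPOTHETICAL average daily sales ($) for seven days; not the textbook's figures.
daily_sales = {
    "McDonald's": [1000, 1200, 1100, 1300, 1500, 1900, 1800],
    "KFC": [900, 950, 1000, 1050, 1100, 1150, 1200],
    "Pizza Hut": [600, 700, 650, 800, 1200, 1500, 1400],
}


def summarise(values):
    """Return the mean, sample SD (n - 1 divisor) and population SD (n divisor)."""
    return {
        "mean": mean(values),
        "sample_sd": stdev(values),
        "population_sd": pstdev(values),
    }


for outlet, values in daily_sales.items():
    s = summarise(values)
    print(f"{outlet}: mean={s['mean']:.0f}, sample SD={s['sample_sd']:.2f}, "
          f"population SD={s['population_sd']:.2f}")
```

A smaller standard deviation indicates more consistent sales through the week, which is the comparison step 5 asks for.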
Task 2 Using a graphics calculator
1 Using the average daily sales figures for McDonald’s, KFC and Pizza Hut
indicated in the spreadsheet in Task 1, enter these figures into your graphics
calculator as three separate lists, L1, L2 and L3.
2 Use the statistical function on the calculator to determine the following for each
set of data:
a mean
b standard deviation
c five-number summary data.
3 Use the statistical plotting function on your calculator to draw parallel boxplots
of the three sets of data.
4 Compare these results with those you obtained on the spreadsheet.
Concluding the Egyptian skulls study
In a previous investigation, you made recommendations to Archie regarding the
male Egyptian skulls he discovered at a new site. Your conclusions were based on
numerical calculations of mean, median, mode, standard deviation and range. It is
now appropriate to check whether the conclusions drawn at that stage would be
consistent with those which might be made on the basis of graphical comparisons.
We will now compare five-number summaries of the new site data with those of
the 4000 BC and AD 150 measurements. Values for the two known eras are shown
in the following table. The five-number values have also been included for the
breadth of the skulls at the New site.
1 Consult the previous investigation, which tabled data from the New site. Copy
and complete the table.