1.
Maths A Yr 12 - Ch. 04 Page 167 Wednesday, September 11, 2002 4:07 PM
4
Populations,
samples,
statistics and
probability
syllabus reference
eference
Strand:
Statistics and probability
Core topic:
Exploring and understanding
data
In this chapter
chapter
4A
4B
4C
4D
4E
Populations and samples
Samples and sampling
Bias
Contingency tables
Applications of statistics and
probability
2.
Maths A Yr 12 - Ch. 04 Page 168 Friday, September 13, 2002 9:19 AM
168
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Populations and samples
Early population counts were musters, where community members were gathered
and counted. In 1828, the ﬁrst Australian census was conducted in New South
Wales. Each State conducted its own separate census until 1886, ﬁve years after
the ﬁrst simultaneous census of the British Empire. In 1901, a common census
was conducted throughout Australia; however, the results were not collated to
form a total for Australia.
The Census and Statistics Act of December 1905 provided that: ‘The
Census shall be taken in the year 1911, and in every tenth year thereafter.’
During the Depression and World War II, no census was taken. The ﬁrst
post-war census took place in Australia in 1947.
The types of questions have changed over time to reﬂect the changes in our
society. The time required to process the responses to the questions has been
reduced with the introduction of Optical Mark Reading machines (1991 census)
and Intelligent Character Recognition machines which can read handwritten words
in the 2001 census. Since 1961, a census has been held every ﬁve years, and the
fourteenth national Census of Housing and Population was held on 7 August 2001.
The 2001 census coincided with Australia’s Centenary of Federation. Participants
were given the opportunity to place their census forms in a time capsule (to be held by
the National Archives) for 99 years. Descendants would then have a glimpse into the
lives of their forebears.
Many of the skills required for this chapter were developed in Year 11 (chapters 9 and 10
of Maths Quest Maths A Year 11). Revise the methods by completing the following exercises.
1 Write each of the following as a decimal (correct to 3 decimal places).
1
------------a 3
b ----c 65
d 124
210
80
12
8
2 Convert each of the following to percentages.
-a 3
b 0.125
c
4
85
-------200
d 0.04
3 Use your calculator to generate a set of 10 random integers in the range:
a 1 to 20 inclusive
b 50 to 100 inclusive.
4 Round the following numbers to integers.
a 3.6
b 4.02
c 2.91
d 6.5
e 0.9
5 Find the unknown in each of the following.
1 2
3
b
2 5
a -- = -b -- = ----c -- = -4 a
7 21
9 c
5 2
d -- = -d 7
e
f
7
-- = -9 6
6 What types of features on a graph can cause it to be misleading?
Work
ET
SHE
4.1
7 For the following sets of scores x:
6, 9, 8, 7, 6, 5, 8, 11, 6, 7
Calculate:
a Σx
b –
x
d mode
e lower quartile
g range
h inter-quartile range.
c
f
median
upper quartile
3.
Maths A Yr 12 - Ch. 04 Page 169 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
es
in
ion v
t i gat
n inv
io
es
169
Australia’s population and
housing census
It is important that we understand the reason for recording statistical data accurately.
In our society, it is difﬁcult to imagine a world without statistics. Try to imagine a
State of Origin football match where no one kept the score! The excitement of the
game would probably hold our attention for a while, but if no score was recorded,
winning or losing would not be an issue, and we would soon lose interest.
A census is an example of information collected from the whole population. It
is not always possible or feasible to conduct a questionnaire on the whole
population, so when this opportunity arises, it is vital to ensure that the questions
are carefully worded and that relevant information is sought.
The Australian Bureau of Statistics (ABS) is the government department
responsible for administering the Australian census, then collating and analysing
the responses. Their website <www.jaconline.com.au/maths/weblinks> details
information about their role and it displays statistical data from many areas. Access
this site to conduct your research. Prepare a report providing responses to the
following:
1 What is a national housing and population census?
2 Who takes part?
3 Is it compulsory to take part?
4 What types of questions are asked in the census? How have they changed over
the years?
5 Why should we have a census?
6 Who has access to the information we provide?
7 How is the census conducted?
8 Conclude your report with an expression of your opinion (agreement/
disagreement) of the answers gathered from your research. Provide constructive
suggestions to improve any aspect of the gathering, collating and analysing of
the census data.
Populations
A census represents information or data collected from every member of the population. The term population does not necessarily represent a group of people; it is also
used to represent a group of objects with the same deﬁned characteristics. So, the population under study may be the wildlife in a national forest, the number of wattle trees in
a park, the soil in a farmer’s ﬁeld or the number of cars in a country town.
In some cases it may be possible to determine the exact extent of the population (the
number of wattle trees in the park or the number of cars in a country town); however, it
is often not possible to obtain an exact ﬁgure for the population (the extent of the wildlife in a forest) because circumstances are constantly changing.
Sometimes it is not physically possible to consider the whole population (all the soil
in a farmer’s ﬁeld), as it would not be practical. It is often very costly and time consuming to consider the whole population in a study. For these reasons, we need to
obtain information about the population by selecting a sample that can then be studied.
A census is conducted when we obtain information from the whole population;
however, a survey is conducted on a sample of the population.
t i gat
4.
Maths A Yr 12 - Ch. 04 Page 170 Wednesday, September 11, 2002 4:07 PM
170
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
The particular circumstances determine the status of the
body being studied (whether it represents the population or a
sample of the population). Consider, for example, your Mathematics A class. If we were to try to determine the number of
left-handed people in your school who studied Mathematics
A, and there was only one such class in your school, then
your class would be regarded as the whole population. If, on
the other hand, there were several Mathematics A classes in
your school, then your class would be considered a sample of
the population.
Samples
It is most important when selecting a sample from a population that the sample represents the population as closely as
possible. For this to occur, the characteristics of the sample
should occur in the same proportions as they do in the population. There is little point in selecting a sample where this is
not the case, for analysis of the sample would lead to misleading conclusions. We often see this occurring when polls
are conducted prior to an election. Quite frequently they predict a particular outcome while the election results in a different outcome.
WORKED Example 1
In each of the following, state if the information was obtained by census or survey.
a A school uses the roll to count the number of students absent each day.
b The television ratings, in which 2000 families complete a questionnaire on what they
watch over a one-week period.
c A light globe manufacturer tests every hundredth light globe off the production line.
d A teacher records the examination marks of her class.
THINK
WRITE
a Every student is counted at roll call each
morning.
b Not every family is asked to complete a
ratings questionnaire.
c Not every light globe is tested.
d The marks of every student are recorded.
a Census
b Survey
c Survey
d Census
remember
remember
1. Before beginning a statistical investigation it is important to identify the target
population.
2. The information can be obtained either by:
(a) Census — the entire target population is questioned, or
(b) Survey — a population sample is questioned such that those selected are
representative of the entire target population.
5.
Maths A Yr 12 - Ch. 04 Page 171 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
4A
WORKED
Example
1
171
Populations and samples
1 Copy and complete the following:
When we obtain data from the whole population, we conduct a _______________;
however, a survey obtains data from a _______________ of the population.
2 A school conducts an election for a new school captain.
Every teacher and student in the school votes. Is this an
example of a census or a survey? Explain your answer.
3 A questionnaire is conducted by a council to see what sporting
facilities the community needs. If 500 people who live in the
community are surveyed, is this an example of a census or a
survey?
4 For each of the following, state whether a census or a survey
has been used.
a Two hundred people in a shopping centre are asked to
nominate the supermarket where they do most of their grocery shopping.
b To ﬁnd the most popular new car on the road, 500 new car
buyers are asked what make and model car they purchased.
c To ﬁnd the most popular new car on the road, the make and
model of every new car registered are recorded.
d To ﬁnd the average mark in the mathematics half-yearly
examination, every student’s mark is recorded.
e To test the quality of tyres on a production line, every 100th
tyre is road tested.
5 For each of the following, recommend whether you would use
a census or a survey to ﬁnd:
a the most popular television program on Monday night at 7.30 pm
b the number of cars sold during a period of one year
c the number of cars that pass through the tollgates on the Brisbane Gateway Bridge
each day
d the percentage of defective computers produced by a company.
6 An opinion poll is conducted to try to predict the outcome of an election. Two thousand people are telephoned and asked about their voting intention. Is this an example
of a census or a survey?
Samples and sampling
When we select a sample from a population, if it has been chosen carefully, it should,
upon analysis of the data, yield the same (or very similar) results to those of the population.
A decision must be made regarding the size of the sample. In practice, the size chosen is
the smallest one that would be considered appropriate in those circumstances and the size
that would yield a proportion of the elements close to that occurring in the population.
6.
Maths A Yr 12 - Ch. 04 Page 172 Friday, September 13, 2002 9:19 AM
172
es
in
ion v
t i gat
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Sample size
The aim of this investigation is to observe how the composition of a sample is
affected by the sample size.
1 Take a large packet of mixed coloured jellybeans (200 or more) as the
population. (Coloured disks could be substituted.)
2 Place the jellybeans in a container and mix well. Without looking, draw out a
sample of 10 in such a way that each jellybean has an equal chance of being
selected. This can then be considered a random sample. Count the number of
red jellybeans in the sample of 10.
3 Return the sample of 10 jellybeans to the container, mixing them well with the
others. Select a random sample of 20 jellybeans, using the same method as
before; record the number of red ones.
4 Continue in this manner, returning each sample to the container, mixing them
well, then selecting a sample containing 10 more than the previous selection.
Record the number of red jellybeans in each of the samples.
5 Generate a table of the format below:
Sample size
Number of red jellybeans
10
Proportion of red jellybeans
(as a decimal)
20
30
…
200
Whole population
6 Enter the data in the ﬁrst and third columns (sample size and proportion) into
a spreadsheet or graphics calculator. Graph the sample size against the
proportion of red jellybeans. (Alternatively, this could be graphed on graph
paper.)
7 Knowing that the proportion of red jellybeans in the whole population (the
ﬁnal row in the table above) represents the true answer, comment on the effect
of the sample size on the composition of the sample.
8 For your particular experiment, what would be the minimum sample size
which closely resembles the composition of the population?
9 If a sample is used to predict the composition or characteristics of a
population, describe what you feel are the desirable qualities of the sample in
order to be a reliable predictor of the composition or characteristics of the
population.
10 Repeat the experiment. Comment on the similarities/differences in your
results.
Sampling methods
Several techniques can be employed to select a sample from a population. Some
common methods are random sampling, accessibility sampling, systematic
sampling, quota sampling, judgmental sampling, stratiﬁed sampling, cluster
sampling, and capture–recapture sampling.
t i gat
7.
Maths A Yr 12 - Ch. 04 Page 173 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
173
Random sampling
A simple random sample is one for which each element of the population has an equal
chance of being chosen. A way in which this can be achieved is by numbering each
element of the population then randomly selecting items for the sample by using
random digit tables, the random function on a calculator or numbers drawn from a
container.
es
in
ion v
t i gat
n inv
io
es
Random sampling
The aim of this investigation is to compare different random sampling techniques
as methods of selecting a sample that is representative of the population. Consider
selecting a random sample of ten (10) students from your mathematics A class.
(Your class is the population in this investigation. You may adjust the sample size if
you wish.)
1. Select a characteristic that is present in some of your class members such as
brown eyes, fair hair, height above 175 cm and so on.
2. Calculate and record the proportion of the population in your class with this
characteristic.
3. Have the students number off 1, 2, 3, … until all students have a number. This
number for our purposes may be regarded as the population number.
Task 1
Using random digit tables to select a sample
Tables of randomly generated digits are published. Below are samples of sets of
two-, three- and four-digit random number tables. These tables are generally much
larger than the extracts shown. For our purposes, this size will be sufﬁcient.
Two-digit random number table
16
79
43
59
41
16
39
29
11
12
13
54
24
09
46
24
93
53
28
82
25
56
61
15
97
82
65
77
94
82
85
41
99
74
09
05
98
89
72
10
71
51
35
29
52
52
89
02
92
96
02
81
92
89
17
08
04
63
43
03
84
67
19
23
43
11
05
17
08
07
36
36
72
21
86
99
28
41
24
22
23
04
78
05
33
01
66
06
04
57
80
22
99
14
89
15
65
19
06
25
t i gat
8.
Maths A Yr 12 - Ch. 04 Page 174 Wednesday, September 11, 2002 4:07 PM
174
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Three-digit random number table
382
093
530
260
651
344
157
738
522
592
452
981
272
886
907
683
894
946
831
521
557
374
900
425
461
145
098
792
793
388
694
914
642
153
901
642
100
851
365
840
435
104
419
685
626
383
326
376
246
586
851
474
369
272
566
488
420
696
272
547
869
681
282
129
194
236
467
014
699
196
895
662
376
612
435
080
818
396
572
809
282
274
363
903
771
370
799
277
636
313
464
680
859
249
093
848
370
303
661
495
Four-digit random number table
4070
8145
3435
0891
8504
6691
5329
3729
6800
6262
7368
6927
7980
6625
7301
0145
8729
8145
5299
3951
8859
8070
3664
1177
1821
3729
0064
3715
8166
2427
3065
6791
3344
9357
2928
3807
7301
9513
0058
4049
6776
6603
1700
5233
0925
5817
3709
3213
1282
8856
7977
8319
6074
7955
2059
4763
8885
8565
0755
5087
4843
0033
1948
2371
5640
9865
2105
5484
8890
6160
7678
3588
7213
5572
6939
2544
2461
3232
9394
0253
8521
9289
5756
9137
6540
5741
1777
2149
4079
5279
9895
0709
0323
7394
5003
2494
6829
4634
3586
6238
The following rules apply to the use of random digit tables.
Step 1 Begin at any position in the table (this position being chosen randomly).
Step 2 Move in any direction (vertically, horizontally) along a column or row.
Step 3 Continue moving in this direction, recording the numbers as you go.
Step 4 If you use a three-digit or four-digit table, and you require only one or
two-digit numbers in your selection, you may choose to use the digit/s on
the left, on the right, on the edges and so on. (Make some random choice,
and then be consistent.)
Step 5 If the same number is repeated, do not record the number a second time.
Step 6 Continue recording until the required total has been reached.
9.
Maths A Yr 12 - Ch. 04 Page 175 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
175
1 Use the two-digit random number table to select ten numbers within the range
of numbers in your class.
2 Determine the students in your class to whom these numbers refer.
3 Calculate and record the proportion of these students with the characteristic you
chose. How closely does it match the population proportion which you have
previously calculated?
4 Repeat the experiment using the three-digit random number table, calculating
and recording the proportion of students in this sample with your chosen
characteristic.
5 Repeat using the four-digit random number table, again calculating and
recording the proportion.
6 Compare the results obtained from your three samples with each other and with
the population proportion. What conclusion/s can you form?
Task 2
Using the random function on a calculator
1 Many scientiﬁc and graphing calculators can be set to generate random integers
in the range of your population number. (Your teacher will show you, if you are
unsure.)
2 Use your calculator to generate ten different random integers.
3 Relate these numbers to speciﬁc students in your class.
4 Calculate and record the proportion of students in your sample with your
chosen characteristic.
5 Compare this value with the population proportion.
Task 3
Using lot sampling
This type of sampling is used in drawing lotto winning numbers.
1 Write numbers (up to and including your population number) on small, equally
sized pieces of paper and place them in a container. Mix well.
2 Draw the numbers one at a time (without replacing them) until ten numbers
have been drawn.
3 Relate these numbers to the relevant students, as before.
4 Calculate and record the proportion of students with your chosen characteristic
in this sample.
5 Compare the value with the population proportion.
Conclusions
1 Draw up a table to display the results of all your experiments.
2 Compare the results obtained using the various techniques.
3 Did you ﬁnd one method better than any other?
4 How did the results using these three methods compare with the population
result?
10.
Maths A Yr 12 - Ch. 04 Page 176 Wednesday, September 11, 2002 4:07 PM
176
es
in
ion v
t i gat
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Generating random integers
using a spreadsheet
This activity creates a spreadsheet to generate random integers (whole numbers)
within a given range. Consider the spreadsheet below.
1 Enter the headings in cells A1, A3, A4, A6, A7, A8 and A10.
2 Leave cells B7 and B8 blank. You will enter values in these cells once you run
the spreadsheet.
3 In cell B11, enter the formula =INT(RAND()*($B$8-$B$7+1))+$B$7). This
formula will generate a random integer in the range of the value entered in cell
B7 to the value entered in cell B8 inclusive. (You will not ﬁnd a correct value
appears until you enter values in cells B7 and B8.)
4 Copy this formula to the region B11 to K20. This will generate 100 random
integers in this region.
5 The function F9 will recalculate different sets of random integers. Add this
instruction to cell A22.
6 Enter values in B7 and B8. Notice the set of integers produced. Press the F9
key to generate a different set. Continue to generate new sets, making sure that
the numbers generated are within the range of those entered in cells B7 and B8.
You will ﬁnd that if you generate large integers you may have to widen
columns B to K.
7 Save your spreadsheet and obtain a printout.
8 You may wish to use this spreadsheet for generating two-, three- and four-digit
random number tables for your own use.
t i gat
11.
Maths A Yr 12 - Ch. 04 Page 177 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
177
WORKED Example 2
Use the three-digit random number table on page 174 to select ten students from a
numbered class of 30 according to the following rules.
Rule 1 Start in the bottom left-hand corner.
Rule 2 Snake up and down the columns.
Rule 3 Select the two digits on the right as the student number
Note: Use the three-digit random number table from page 174.
THINK
1
2
3
WRITE
The selected numbers must be in the
range 1 to 30 inclusive.
Moving up the ﬁrst column on the left,
reading the last two digits, there are no
numbers in the range. Continue by
snaking down the second column and
so on, until 10 numbers have been
selected (ignore the second occurrence
of a number).
Give 10 student numbers.
Students selected have numbers 14, 4, 19, 30,
25, 29, 12, 3, 26 and 1.
Accessibility sampling
This method of sampling selects those items that are most accessible. Consider the
following:
1. The student body in a school is investigating extended hours for the library. A
survey, conducted on students who were using the library after hours, overwhelmingly supported the proposal for extended hours.
2. The same survey, conducted on students in the gymnasium after school, indicated no
need for extending the library hours.
As can be seen from this example, bias can be introduced into the results of a survey by
carefully selecting the sample to either support or refute a cause. A sample drawn only
from easily accessible items often does not represent the views of the population.
Systematic sampling
In this method, the sample items are selected using some system, such as every tenth
item, every item on the top left-hand corner of a page or every item in the ﬁfth position
of a set of lists.
Consider a survey to determine the most popular service provider for Internet
subscribers. The sample could be selected by ringing every:
• one hundredth name in the telephone directory
• last name on each page
• name at the top of each list of names on every page.
Using the telephone directory to obtain survey data has the obvious disadvantage of
excluding those who do not own a telephone and those with an unlisted telephone
number. When using a systematic method as a sampling technique, best results are
obtained if the population is ﬁrst arranged randomly.
12.
Maths A Yr 12 - Ch. 04 Page 178 Wednesday, September 11, 2002 4:07 PM
178
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Quota sampling
The quota technique speciﬁes a particular number of items to be surveyed. Consider
the following scenario:
A group of businessmen is considering establishing a grammar school (no religious
afﬁliations) in a town of approximately 50 000 people. They decide to conduct a market
survey on 1000 people. They specify the composition of the sample as follows.
• The group should have 500 males and 500 females.
• Within each of those groups, half should be school children and half adult.
• The groups must contain people of all religious denominations — not in equal proportions, as they do not occur equally in the community.
• From the 500 children selected, there should be 200 from non-government schools
and 300 from government schools.
Within these speciﬁed quotas, the person responsible for choosing the sample can
use any sampling strategy. This leaves the sample open to bias, depending on the integrity of those selecting the sample. It also enables substitutions to occur when those
originally selected for the sample are not readily accessed.
Bearing in mind the problems associated with this type of sampling, this method can
prove to be cost-effective and quite reliable in its predictions if the composition of the
sample is appropriate and the sample is selected in an unbiased manner.
Judgmental sampling
Using this method of obtaining a sample, the person conducting the survey must make
a judgment as to the composition of the sample. This obviously is reliant on the good
judgment of those selecting the sample. Consider, for instance, undertaking a survey on
the bus service(s) (or lack thereof) in a city or town. If a judgment was made to select
the sample from only those who used the service(s), this could result in an entirely different outcome from what might occur if there had been a balance from both bus-users
and non bus-users. Consequently, we should be wary of surveys conducted using this
technique (although we probably would not be aware that this technique was the
method used). It is timely to reinforce the fact that, when bombarded with statistical
facts (and this occurs in our lives daily) we should not accept these ﬁgures without
question.
Stratiﬁed sampling
When a sample is selected from a
population consisting of various
strata, or levels, it is important to have
the strata or levels in the sample occurring in the same proportions as they do
in the population.
Consider the situation where a student council body is to be formed from
Years 8 to 12 students in a school. It
would not seem fair to have an equal
number from each of the year levels if
there were, for instance, twice as many
Year 9 students as there were Year 12
students.
13.
Maths A Yr 12 - Ch. 04 Page 179 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
179
WORKED Example 3
The number of students in a school is
shown in the table.
A student council is to be formed, consisting
of 15 members of the student body. The
composition of the council must reﬂect the
proportions in the population. How many
from each year level should be chosen?
Year level
Number of students
2
11
10
180
165
200
WRITE
Find the total number of
students.
Determine the proportion in
each year level.
Total number of students = 750
Proportion of students:
85
Year 12s = -------750
Year 11s =
Year 10s =
Year 9s =
Year 8s =
3
120
8
1
85
9
THINK
12
Multiply these proportions by
the sample number and
construct a table.
Round if necessary.
Year
level
120
-------750
180
-------750
165
-------750
200
-------750
Number of
students
Number in
sample
6
Write the answer.
85
-------750
× 15 = 1.7; i.e. 2
11
120
120
-------750
× 15 = 2.4; i.e. 2
10
180
180
-------750
× 15 = 3.6; i.e. 4
165
165
-------750
× 15 = 3.3; i.e. 3
200
200
-------750
× 15 = 4;.0 i.e. 4
Total
Check to ensure that the total
sample number is 15.
85
8
5
12
9
4
750
15
Two students should be chosen from Year 12, two from
Year 11, four from Year 10, three from Year 9 and four
from Year 8.
Cluster sampling
This method involves selecting clusters within a population and selecting a sample
from within these clusters. The subgroups selected from the population should be identiﬁed. Consider the situation where a market survey is to be conducted on the cost of
tertiary education. If the chosen clusters included only ones situated in poorer areas of
the community, the results of the survey would differ vastly from those occurring from
cluster groups consisting of only ones from more afﬂuent areas. It is important,
14.
Maths A Yr 12 - Ch. 04 Page 180 Wednesday, September 11, 2002 4:07 PM
180
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
therefore, if using this method of sampling, that the clusters are chosen to be as closely
representative of the population as possible. If this is the case, the survey can yield
quite reliable results in a far shorter time and at a greatly reduced cost when compared
with collecting data from the whole population.
Capture–recapture sampling
Capture–recapture sampling is particularly useful for estimating populations of items
that are difﬁcult or impossible to count, such as plants and animals. It is often necessary
to monitor wildlife numbers to prevent the occurrence of plagues and the extinction of
species. The technique used is to capture a certain number of the species, tag them,
then release them. At a later date, another sample is caught and the number of tagged
specimens in the sample observed. From this information, the population of the species
can be estimated. This method is best illustrated with a practical example.
es
in
ion v
t i gat
n inv
io
es
Capture–recapture technique
Place a large number of different coloured disks in a container (without counting
the number of each colour or the total). The various colours can represent the
variety of wildlife (say, ﬁsh in a dam). We are interested in determining the number
of yellow-belly ﬁsh in the dam (represented by the red-coloured disks in the
container).
1 Mix the disks thoroughly.
2 Draw out two handfuls of disks; mark the red disks to identify them as being
tagged. Count the number of tagged red disks and let this number be t.
3 Replace all these disks in the container; mix well.
4 Draw out a handful of disks; count the number of red disks in the sample (rs);
count the number of tagged red disks in the sample (ts). Replace all the disks in
the container and mix well.
5 Repeat the process of drawing out a handful of disks, counting the number of
red disks and the number of tagged red disks in each sample. Continue until
data for ten (10) samples have been obtained.
6 Draw up the table below to collate the sample data.
Trial
Number red tagged (ts)
Number red (rs)
1
2
3
4
5
6
7
8
9
10
Total
Σ ts =
Σ rs =
t i gat
15.
Maths A Yr 12 - Ch. 04 Page 181 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
181
7 For a sufﬁciently large number of trials, we could say that these samples
represent the population in miniature. This means that the ratio of tagged red
disks in the samples to the number of red disks in the
samples should be close to the ratio of the total number
of red tagged disks in the population to the total number
of red disks (r).
∑ ts t
--------- = ∑ rs r
Substitute the three known values in the equation; solve
to determine the value of ‘r’. (For a capture–recapture of
tagged yellow-belly, r would then represent an estimate
for the number of yellow-belly ﬁsh in the dam.)
8 Count the number of red disks in the container. How
close was this number to your estimate, calculated
above?
This method of sampling and population estimation
obviously has its limitations. For the yellow-belly example,
we are assuming that the situation in the lake remains
relatively stable; that the types of ﬁsh are uniformly
distributed throughout the lake; that the number of births is
roughly equal to the number of deaths and that intensive
ﬁshing has not occurred in the lake over the time period
between tagging and recapture. Any calculations of
populations of species in the wild must be considered as
estimates, as we can not be certain of exact numbers.
WORKED Example 4
In estimating the number of ﬁsh in a lake, 500 ﬁsh are caught, tagged then released back
into the dam. A week later a batch of 80 ﬁsh are caught and 25 of them are found to be
tagged. Estimate the number of ﬁsh in the dam.
THINK
1
The proportion of tagged ﬁsh in
the population closely resembles
the proportion of tagged ﬁsh in
the sample caught later.
2
Form an equation.
3
Solve the equation.
4
Write the estimate.
WRITE
Number tagged fish Number tagged in sample
---------------------------------------------- = ------------------------------------------------------------Population of fish
Size of sample
500 25
-------- = ----p
80
25p = 500 × 80
500 × 80
p = -------------------25
= 1600
There are an estimated 1600 ﬁsh in the dam.
16.
Maths A Yr 12 - Ch. 04 Page 182 Wednesday, September 11, 2002 4:07 PM
182
es
in
ion v
t i gat
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
ABS interviewer survey
The ABS conducts a census every ﬁve years. To monitor changes that might occur
between these times, surveys are conducted on samples of the population. The ABS
selects a representative sample of the population and interviewers are allocated
particular households. It is important that no substitutes occur in the sampling. The
interviewer must persevere until the selected household supplies the information
requested. It is a legal requirement that selected households cooperate.
The following questionnaire is reproduced from the ABS website
<www.jaconline.com.au/maths/weblinks>. It illustrates the format and types of
questions asked by an interviewer collecting data regarding employment from a
sample.
MINIMUM SET OF QUESTIONS WHEN INTERVIEWER USED — Q1 to Q17
Q.1.
I WOULD LIKE TO ASK ABOUT LAST WEEK, THAT IS, THE WEEK STARTING
MONDAY THE … AND ENDING (LAST SUNDAY THE …/YESTERDAY).
Q.2.
LAST WEEK DID … DO ANY WORK AT ALL IN A JOB, BUSINESS OR FARM?
K Go to Q.5
Yes
K
No
K No More Questions
Permanently unable to work
K No More Questions
Permanently not intending to work
(if aged 65+ only)
Q.3.
LAST WEEK DID … DO ANY WORK WITHOUT PAY IN A FAMILY
BUSINESS?
K Go to Q.5
Yes
K
No
K No More Questions
Permanently not intending to work
(if aged 65+ only)
Q.4.
DID … HAVE A JOB, BUSINESS OR FARM THAT … WAS AWAY FROM
BECAUSE OF HOLIDAYS, SICKNESS OR ANY OTHER REASON?
K
Yes
K Go to Q.13
No
K No More Questions
Permanently not intending to work
(if aged 65+ only)
Q.5.
DID … HAVE MORE THAN ONE JOB OR BUSINESS LAST WEEK?
Yes
K
No
K Go to Q.7
Q.6.
THE NEXT FEW QUESTIONS ARE ABOUT THE JOB OR BUSINESS IN
WHICH … USUALLY WORKS THE MOST HOURS.
Q.7.
DOES … WORK FOR AN EMPLOYER, OR IN … OWN BUSINESS?
Employer
K
Own business
K Go to Q.10
Other/Uncertain
K Go to Q.9
Q.8.
IS … PAID A WAGE OR SALARY, OR SOME OTHER FORM OF PAYMENT?
Wage/Salary
K Go to Q.12
Other/Uncertain
K
t i gat
17.
Maths A Yr 12 - Ch. 04 Page 183 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
Q.9.
183
WHAT ARE … (WORKING/PAYMENT) ARRANGEMENTS?
K Go to Q.13
Unpaid voluntary work
K
Contractor/Subcontractor
K
Own business/Partnership
K
Commission only
K Go to Q.12
Commission with retainer
K Go to Q.12
In a family business without pay
K Go to Q.12
Payment in kind
K Go to Q.12
Paid by the piece/item produced
K Go to Q.12
Wage/salary earner
K Go to Q.12
Other
Q.10. DOES … HAVE EMPLOYEES (IN THAT BUSINESS)?
Yes
K
No
K
Q.11. IS THAT BUSINESS INCORPORATED?
Yes
No
K
K
Q.12. HOW MANY HOURS DOES … USUALLY WORK EACH WEEK IN (THAT JOB/
THAT BUSINESS/ALL … JOBS)?
1 hour or more
K No More Questions
Less than 1 hour/no hours
K
Insert occupation questions if required
Insert industry questions if required
Q.13. AT ANY TIME DURING THE LAST 4 WEEKS HAS … BEEN LOOKING FOR
FULL-TIME OR PART-TIME WORK?
Yes, full-time work
K
Yes, part-time work
K
No
K No More Questions
Q.14. AT ANY TIME IN THE LAST 4 WEEKS HAS …
Written, phoned or applied in person
to an employer for work?
Answered an advertisement for a job?
Looked in newspapers?
Yes
No
Checked factory notice boards, or
used the touchscreens at Centrelink ofﬁces?
K
K
K
K
K
AT ANY TIME IN THE LAST 4 WEEKS HAS …
Been registered with Centrelink as a jobseeker?K
Checked or registered with an employment
K
agency?
K
Done anything else to ﬁnd a job?
K
Advertised or tendered for work
K
Contacted friends/relatives
K No More Questions
Other
K No More Questions
Only looked in newspapers
K No More Questions
None of these
Q.15. IF … HAD FOUND A JOB COULD … HAVE STARTED WORK LAST WEEK?
Yes
K
No
K
Don’t know
K
18.
Maths A Yr 12 - Ch. 04 Page 184 Wednesday, September 11, 2002 4:07 PM
184
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Remaining questions are only required if Duration of
Unemployment is needed
for output or to derive the long term unemployed.
Q.16. WHEN DID … BEGIN LOOKING FOR WORK?
Enter Date
......./......./.......
Less than 2 years ago
DD MM YY
......./......./.......
2 years or more ago
DD MM YY
......./......./.......
DD MM YY
5 years or more ago
Did not look for work
K
Q.17. WHEN DID … LAST WORK FOR TWO WEEKS OR MORE?
Enter Date
......./......./.......
Less than 2 years ago
DD MM YY
......./......./.......
2 years or more ago
DD MM YY
......./......./.......
DD MM YY
5 years or more ago
Has never worked (for two weeks or more)
K No More Questions
Reading the questionnaire carefully you will note that, although the questions are
labelled 1 to 17, there are only ﬁfteen (15) questions requiring answers (two are
introductory statements to be read by the interviewer). Because of directions to
forward questions, no individual would be asked all ﬁfteen questions.
1 How many questions would be asked of those who have a job?
2 How many questions would unemployed individuals answer?
3 How many questions apply to those not in the labour force?
Choose a topic of interest to you and conduct a survey
1 Design an interview questionnaire of a similar format to the ABS survey, using
directions to forward questions.
2 Decide on a technique to select a representative sample of the students in your
class.
3 Administer your questionnaire to this sample.
4 Collate your results.
5 Draw conclusions from your results.
6 Prepare a report which details the:
a aim of your survey
b design of the survey
c sample selection technique
d results of the survey collated in table format
e conclusions.
19.
Maths A Yr 12 - Ch. 04 Page 185 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
185
remember
remember
There are many methods for selecting a sample. Some important methods include:
1. Random sample — chance is the only factor in deciding who is surveyed. This
is best done using a random number generator.
2. Stratiﬁed sample — those sampled are chosen in proportion to the entire
population.
3. Systematic sample — a system is used to choose those who are to be in the
sample.
4. Accessibility sample — those within easy access form the sample.
5. Quota sample — a quota is imposed on the number in the sample.
6. Judgmental sample — judgment is made regarding those to be sampled.
7. Cluster sample — those in the sample are chosen from clusters within the
population.
8. Capture–recapture sample — used to estimate wildlife populations.
4B
WORKED
Example
2
Samples and sampling
1 Use the two-digit random number table on page 173. Start at the bottom left-hand corner
then snake up and down the columns selecting 10 numbers in the range 50 to 99.
2 Use your calculator to generate 10 random integers in the range 50 to 99.
3 Use your calculator to generate a set of random two-digit integers in the range 01 to
99. Write these numbers in table format. Use your table (and some random selection
technique) to select 10 random integers in the range 50 to 99.
4 Compare your answers to questions 1, 2 and 3. Does it appear that three different sets
of random numbers resulted?
5 Describe the techniques employed to select samples in each of the following situations.
a Drawing student numbers from a hat to select those to attend the athletics carnival
b Choosing the best student in each class to form a student council body
c Interviewing the students at the school tuck-shop for an opinion regarding the
school uniform
d Selecting those students in a classroom sitting next to a window to form a debating
group
e Selecting one quarter of the students from each year level to represent the school
at a local function
6 For each of the following, state whether the sample used is an example of random,
stratiﬁed or systematic sampling.
a Every tenth tyre coming off a production line is tested for quality.
b A company employs 300 men and 450 women. The sample of employees chosen
for a survey contains 20 men and 30 women.
c The police breathalyse the driver of every red car.
d The names of the participants in a survey are drawn from a hat.
e Fans at a football match ﬁll in a questionnaire. The ground contains 8000 grandstand seats and 20 000 general admission seats. The questionnaire is then given to
40 people in the grandstand and 100 people who paid for a general admission seat.
20.
Maths A Yr 12 - Ch. 04 Page 186 Wednesday, September 11, 2002 4:07 PM
186
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
7 multiple choice
Which of the following is an example of a systematic sample?
A The ﬁrst 20 students who arrive at school each day participate in the survey.
B Twenty students to participate in the survey are chosen by a random number
generator.
C Twenty students to participate in the survey are selected in proportion to the
number of students in each school year.
D Ten boys and 10 girls are chosen to participate in the survey.
8 multiple choice
Which of the following statistical investigations would be practical to complete by
census?
A A newspaper wants to know public opinion on a political issue.
B A local council wants to know if a skateboard ramp would be popular with young
people in the area.
C An author wants a cricket player’s statistics for a book being written.
D An advertising agency wants to know the most watched program on television.
WORKED
Example
3
9 The table below shows the number of students in each year at a NSW secondary
school.
Year
7
8
9
10
11
12
Total
No. of students
90
110
90
80
70
60
500
If a survey is to be given to 50 students at the school, how many from each Year
should be chosen if a stratiﬁed sample is used?
10 A company employs 300 men and 200 women. If a survey of 60 employees using a
stratiﬁed sample is completed, how many people of each sex participated?
11 The table below shows the age and gender of the staff of a corporation.
Age
Male
Female
20–29
61
44
30–39
40
50
40–49
74
16
50–59
5
10
A survey of 50 employees is to be done. Using a stratiﬁed survey, suggest the
breakdown of people to participate in terms of age and sex.
4.1
12 The ﬁsh population of a river is to be estimated. A sample of 400 ﬁsh are caught,
tagged and released. The next day another sample of 400 ﬁsh are caught and 40 of
4
them have tags. Estimate the ﬁsh population of the river.
WORKED
Example
13 A colony of bats live near a school. Wildlife ofﬁcers try to estimate the bat population
by catching 60 bats and tagging them. These bats are then released and another 60 are
caught, 9 of which had tags. Estimate the size of the bat population living near the
school.
21.
Maths A Yr 12 - Ch. 04 Page 187 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
187
14 A river’s ﬁsh population is to be estimated. On one day 1000 ﬁsh are caught, tagged
and released. The next day another 1000 ﬁsh are caught. Estimate the population of
the river if in the second sample of ﬁsh:
a 100 had tags
b 40 had tags
c 273 had tags.
15 A certain ﬁsh population is said to be endangered if the population falls below 15 000.
A sample of 1000 ﬁsh are caught, tagged and released. The next day another sample
of 1200 ﬁsh are caught, 60 of which had tags. Is the ﬁsh population endangered?
16 To estimate the population of a lake, 300 ﬁsh were
caught. These 300 ﬁsh (150 trout, 100 bream and 50
perch) were tagged and released. A second sample of ﬁsh
were then caught. Of 100 trout, 24 had tags; of 100
bream, 20 had tags; and of 100 perch, 8 had tags.
a Estimate the number of trout in the lake.
b Estimate the number of bream in the lake.
c Estimate the number of perch in the lake.
17 The kangaroo population in a national park is to be estimated. On one day, 100 kangaroos were caught and tagged before being released. (Note: For each sample taken, the
kangaroos are released after the number with tags is counted.)
a The next day 100 were caught, 12 of which had tags. Estimate the population.
b The following day another estimate was done. This time 200 were caught and 20
had tags. Estimate the population again.
c A third estimate was done by catching 150 and this time 17 had tags. What will
the third estimate for the population be?
d For a report, the average of the three estimates is taken. Calculate this average.
1
For each of the following (1 to 3), state whether a census or survey has been used.
1 A school votes to elect a school captain.
2 Five hundred drivers complete a questionnaire on the state of a major highway.
3 All insurance customers complete a questionnaire when renewing their policies.
For each of the following (4 to 10), state the type of sample that has been taken.
4 A computer selects 500 phone numbers.
5 Every one thousandth person in the telephone book is selected.
6 Private and business telephone numbers are chosen in proportion to the
number of private and business listings.
7 Residents from three suburbs are selected from a town.
8 The best runners from each year level are selected.
9 Twenty students are chosen from a class.
10 All the student numbers are placed in a hat then a sample is chosen.
22.
Maths A Yr 12 - Ch. 04 Page 188 Wednesday, September 11, 2002 4:07 PM
188
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Bias
No doubt you have heard the comment, ‘There are lies, damned lies and statistics’. This
implies that we should be wary of statistical ﬁgures quoted. Indeed, we should always
make informed decisions of our own and not simply accept the mass of statistics that
bombards us through the media.
Bias can be introduced into statistics by:
1. questionnaire design
2. sampling bias
3. the interpretation of results.
Bias in questionnaire design
Consider a survey designed to collect
data on opinions relating to culling
kangaroo numbers in Australia.
The questions may be designed to be
emotive in nature. Respondents in these
situations feel obliged to show compassion. Posing a question in the form,
‘The kangaroo is identiﬁed as a native
Australian animal, not found anywhere
else in the world. Would you be in
favour of culling kangaroos in Australia?’,
would
almost
certainly
encourage a negative response.
Using a leading question (one which
leads the respondent to answer in a particular way) can cause bias to creep into
responses. Rephrasing the question in
the form, ‘As you know, kangaroos
cause massive damage on many farming
properties. You’d agree that their
numbers need culling, wouldn’t you?’,
would encourage a positive response.
Using terminology that is unfamiliar
to a large proportion of those being surveyed would certainly produce unreliable responses. ‘Do you think we need
to cull herbivorous marsupial mammals
in Australia?’, would cause most respondents to answer according to their understanding of the terms used. If the survey was conducted by an interviewer, the term
could be explained. In the case of a self-administered survey, there would be no indication of whether the question was understood or not.
Sampling bias
As discussed previously, an ideal sample should reﬂect the characteristics of the population. Statistical calculations performed on the sample would then be a reliable indication of the population’s features.
Selecting a sample using a non-random method, as discussed earlier, generally tends
to introduce an element of bias.
23.
Maths A Yr 12 - Ch. 04 Page 189 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
189
Particular responses can be selected from all those received. In collecting information on a local issue, an interviewer on a street corner may record responses from
many passers-by. From all the data collected, a sample could be chosen to support the
issue, or alternatively another sample could be chosen to refute the same issue.
A sample may be selected under abnormal conditions. Consider a survey to determine which lemonade was more popular – Kirks or Schweppes. Collecting data one
week when one of the brands was on special at half price would certainly produce misleading results.
Data are often collected by radio and television stations via telephone polls. A ‘Yes’
response is recorded on a given phone-in number, while the ‘No’ respondents are asked
to ring a different phone-in number. This type of sampling does not produce a representative sample of the population. Only those who are highly motivated tend to ring and
there is no monitoring of the number of times a person might call, recording multiple
votes.
When data are collected from mailing surveys, bias results if the non-response rate is
high (even if the selected sample was a random one). The responses received often represent only those with strong views on the subject, while those with more moderate
views tend to lack representation in their correct proportion.
Statistical interpretation bias
Once the data have been collected, collated and subjected to statistical calculations,
bias may still occur in the interpretation of the results.
Misleading graphs can be drawn leading to a biased interpretation of the data.
Graphical representations of a set of data can give a visual impression of ‘little change’
or ‘major change’ depending on the scales used on the axes (we learned about misleading graphs in Year 11).
The use of terms such as ‘majority’, ‘almost all’ and ‘most’ are open to interpretation. When we consider that 50.1% ‘for’ and 49.9% ‘against’ represents a ‘majority
for’ an issue, the true ﬁgures have been hidden behind words with very broad meanings. Although we would probably not learn the real facts, we should be wary of statistical issues quoted in such terms.
es
in
ion v
t i gat
n inv
io
es
Bias in statistics
The aim of this investigation is to study statistical data that you suspect to be
biased.
Conduct a search of newspapers, magazines or any printed material to collect
instances of quoted statistics that you believe to be biased. There are occasions
when television advertisements quote statistical ﬁgures as a result of questionable
sampling techniques. For each example, discuss:
1 the purpose of the survey
2 how the data might have been collected
3 the question(s) that may have been asked (try to pose the question(s) in a
variety of ways to inﬂuence different outcomes)
4 ways in which bias might be introduced
5 variations in interpretation of the data.
t i gat
24.
Maths A Yr 12 - Ch. 04 Page 190 Thursday, September 12, 2002 11:09 AM
190
t i gat
t i gat
es
in
ion v
t i gat
n inv
io
es
es
in
ion v
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Biased sampling
Discuss the problem that would be caused by each of the following biased samples.
1 A survey is to be conducted to decide the most popular sport in a local
community. A sample of 100 people was questioned at a local football match.
2 A music store situated in a shopping centre wants to know the type of music that
it should stock. A sample of 100 people was surveyed. The sample was taken
from people who passed by the store between 10 and 11 am on a Tuesday.
3 A newspaper conducting a Gallup poll on an election took a sample of 1000
people from the Gold Coast.
Spreadsheets creating
misleading graphs
t i gat
We looked at creating misleading graphs in the Year 11 text. Let us practise that
investigation again to reinforce the techniques used to produce misleading graphs.
Consider the data in this table.
Year
Wages
% increase in wages
Proﬁts
% increase in proﬁts
1985
1990
1995
2000
6
25
1
20
9
50
1·5
50
13
44
2·5
66
20
54
5
100
Graph 2
We shall use a spreadsheet to produce misleading graphs based on these data.
Graph 1
Graph 3
25.
Maths A Yr 12 - Ch. 04 Page 191 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
191
1 Enter the data as indicated in the spreadsheet (see page 190).
2 Graph the data using the Chart Wizard. You should obtain a graph similar to
Graph 1.
3 Copy and paste the graph twice within the spreadsheet.
4 Graph 2 gives the impression that the wages are a great deal higher than the
proﬁts. This effect was obtained by reducing the horizontal axis. Experiment with
shortening the horizontal length and lengthening the vertical axis.
5 In Graph 3 we get the impression that the wages and proﬁts are not very
different. This effect was obtained by lengthening the horizontal axis and
shortening the vertical axis. Experiment with various combinations.
6 Print out your three graphs and examine their differences.
Note that all three graphs have been drawn from the same data using valid scales.
A cursory glance leaves us with three different impressions. Clearly, it is important
to look carefully at the scales on the axes of graphs.
Another method which could be used to change the shape of a graph is to
change the scale of the axes.
7 Right click on the axis value, enter the Format axis option, click on the Scale
tab, then experiment with changing the scale values on both axes.
Techniques such as these are used to create different visual impressions of the
same data.
8 Use the data in the table to create a spreadsheet, then produce two graphs
depicting the percentage increase in both wages and proﬁts over the years
giving the impression that:
a the proﬁts of the company have not grown at the expense of wage increases
(the percentage increase in wages is similar to the percentage increase in proﬁts)
b the company appears to be exploiting its employees (the percentage
increase in proﬁts is greater than that for wages).
WORKED Example 5
Discuss why the following selected samples could provide bias in the statistics collected.
a In order to determine the extent of unemployment in a community, a committee phoned two
households (randomly selected) from each page of the local telephone book during the day.
b A newspaper ran a feature article on the use of animals to test cosmetics. A form
beneath the article invited responses to the article.
THINK
WRITE
a
a Phoning two randomly selected households per
page of the telephone directory is possibly a
representative sample.
However, those without a home phone and
those with unlisted numbers could not form part
of the sample.
An unanswered call during the day would not
necessarily imply that the resident was at work.
1
Consider phone book selection.
2
Consider those with no phone
contact.
3
Consider the hours of contact.
Continued over page
26.
Maths A Yr 12 - Ch. 04 Page 192 Wednesday, September 11, 2002 4:07 PM
192
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
THINK
WRITE
b
b Selecting a sample from a circulated newspaper excludes those who do not have access
to the paper.
In emotive issues such as these, only those
with strong views will bother to respond, so
the sample will represent extreme points of
view.
1
Consider the newspaper circulation.
2
Consider the urge to respond.
remember
remember
Bias can be introduced at each of the following stages:
1. questionnaire design
2. sampling bias
3. interpretation of results.
4C
Bias
1 Rewrite the following questions, removing any elements or words that might contribute
to bias in responses.
a The poor homeless people, through no fault of their own, experience great hardship
during the freezing winter months. Would you contribute to a fund to build a shelter
to house our homeless?
b Most people think that, since we’ve developed as a nation in our own right and
broken many ties with Great Britain, we should adopt our own national ﬂag. You’d
agree with this, wouldn’t you?
c You’d know that our Australian 50 cent coin is in the shape of a dodecagon,
wouldn’t you?
d Many in the workforce toil long hours for low wages. By comparison, politicians
seem to get life pretty easy when you take into account that they only work for part
of the year and they receive all those perks and allowances. You’d agree, wouldn’t
you?
2 Rewrite parts a to d in question 1 so that the expected response is reversed.
3 What forms of sampling bias can you identify in the following samples?
a Choosing a sample from students on a bus travelling to a sporting venue to answer
5
a questionnaire regarding sporting facilities at their school
b Sampling using ‘phone-in’ responses to an issue viewed on a television program.
c Promoting the results of a mail-response survey when fewer than half the selected
sample replied.
d Comparing the popularity of particular chocolate brands when one brand has a ‘two
for the price of one’ special offer.
e Choosing a Year 8 class and a Year 12 class to gather data relating to the use of the
athletics oval after school.
WORKED
Example
27.
Maths A Yr 12 - Ch. 04 Page 193 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
193
Australian currency
4 Why does this graph produce a biased visual impression?
Value of A$ compared with US $1
51c
50c
49c
9 May 11 May 12 May
Date
5 Comment on the following statement:
‘University tests have demonstrated that Double-White toothpaste is consistently used
by the majority of teenagers and is more effective than most other toothpastes.’
6 Surveys are conducted on samples to determine the characteristics of the population.
Discuss whether the samples selected would provide a reliable indication of the population’s characteristics.
Sample
Population
a Year 11 students
Student drivers
b Year 12 students
Students with part-time jobs
c Residents attending a
Residents of a suburb
neighbourhood watch meeting
d Students in the school choir
Music students in the school
e Cars in a shopping centre car park Models of Holden cars on the road
f Males at a football match
Popular TV programs
g Users of the local library
Popular teenage magazines
es
in
ion v
t i gat
n inv
io
es
Bias
It is important that a sample is chosen randomly to avoid bias.
Consider the following situation.
The government wants to improve sporting facilities in Brisbane. They decide to
survey 1000 people about what facilities they would like to see improved. To do
this, they choose the ﬁrst 1000 people through the gate at a football match at
Lang Park.
In this situation it is likely that the results will be biased towards improving
facilities for football. It is also unlikely that the survey will be representative of
the whole population in terms of equality between men and women, age of the
participants and ethnic backgrounds.
Questions can also create bias. Consider asking the question, ‘Is football your
favourite sport?’ The question invites the response that football is the favourite
sport rather than allowing a free choice from a variety of sports by the respondent.
Consider each of the following surveys and discuss:
a any advantages, disadvantages and possible causes of bias
b a way in which a truly representative sample could be obtained.
t i gat
28.
Maths A Yr 12 - Ch. 04 Page 194 Wednesday, September 11, 2002 4:07 PM
194
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
1 Surveying food product choices by interviewing customers of a large
supermarket chain as they emerge from the store between 9.00 am and 2.00 pm
on a Wednesday.
2 Researching the popularity of a government decision by stopping people at
random in a central city mall.
3 Using a telephone survey of 500 people selected at random from the phone book
to ﬁnd if all Australian States should have Daylight Saving Time in summer.
4 A bookseller uses a public library database to survey for the most popular
novels over the last three months.
5 An interview survey about violence in sport taken at a rugby league football
venue as spectators leave.
Contingency tables
Fair
11
25
9
45
Dark
19
51
28
98
Red
17
27
13
57
Total
47
103
50
200
Male
col
Total
air
Fair
le h
Dark
Hair colour of 200 couples
60
50
40
30
20
10
Fair
0
Red
Dark
Dark
Red
Fair
Female hair colour
Ma
Red
Frequency
Female
ou
r
When sample data are collected, it is often useful to break the data into categories. A
two-way frequency table or contingency table displays data that have been classiﬁed
into different types.
Consider, for example, data collected on the hair colour of 200 couples. It may be
represented in a table such as the one below.
These data could be represented as a 3-dimensional bar chart, as shown below.
29.
Maths A Yr 12 - Ch. 04 Page 195 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
195
Female
Red
Fair
Total
Fair
Male
Dark
24%
56%
20%
100%
Dark
19%
52%
29%
Hair colour of male
Although this graph displays the data so that comparisons are readily visible, the
chart is difﬁcult to read and ﬁgures can not be read accurately.
If we considered representing the data as a 2-dimensional segmented bar chart, this
could be done in two ways.
Splitting the data into categories based on the hair colour of the male and calculating
percentages in each category would yield the following ﬁgures and segmented bar
graph:
Hair colour of 200 couples
Fair
100%
Red
30%
47%
23%
100%
Dark
Red
0 20% 40% 60% 80%100%
Hair colour of female
Red
Dark
Fair
Splitting the data into categories based on the hair colour of the female and calculating percentages in each category would yield the following ﬁgures and segmented
bar graph:
100%
Female
Hair colour of 200 couples
80%
Red
Dark
Fair
Fair
23%
24%
18%
Dark
41%
50%
56%
Red
36%
26%
26%
Total
100%
100%
100%
60%
40%
20%
Male
It is obvious that the interpretation of the
data depends on the reference basis. We may
wish to interview those couples where the
male is fair haired and the female dark
haired. Note that this represents 25 couples.
What if we talk about percentages? Comparing the percentages in the two tables, it
can be seen that:
1. 56% of fair-haired males have female
partners with dark hair
2. 24% of dark-haired females have male
partners with fair hair.
These percentages have vastly different
values, yet they both describe the same set of
25 couples of fair-haired males and darkhaired females. It is important, particularly
when dealing with contingency tables, to
consider the reference basis for percentages.
0
Red Dark Fair
Hair colour of female
Hair colour of male
Red
Dark
Fair
30.
Maths A Yr 12 - Ch. 04 Page 196 Wednesday, September 11, 2002 4:07 PM
196
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
WORKED Example 6
A new test was designed to assess the reading ability of students
entering high school.
The results were used to determine if the students’ reading
level was adequate to cope with high school. The students’
results were then checked against existing records.
Of the 150 adequate readers who sat for the test, 147 of them
passed.
Of the 50 inadequate readers who sat for the test, 9 of them
passed.
Present this information in a contingency table.
THINK
WRITE
Draw up the table showing the number of
students whose reading was adequate and
the number of students for whom the
results of the new test were conﬁrmed.
Test results
Passed
Did not
pass
Total
147
3
150
9
41
50
156
44
Adequate
readers
Inadequate
readers
Total
When information on a test is presented in a contingency table, conclusions can be
made about the accuracy of the test.
WORKED Example 7
A batch of sniffer dogs is trained by customs to smell drugs in suitcases. Before they are
used at airports they must pass a test. The results of that test are shown in the contingency
table below.
Test results
Detected
Not detected
Total
No of bags with drugs
24
1
25
No. of bags without drugs
11
164
175
Total
35
165
a
b
c
d
How many bags did the sniffer dogs examine?
In how many bags did the dogs detect drugs?
In what percentage of bags without drugs did the dogs incorrectly detect drugs?
Based on the above results, what percentage of the time will the dogs not detect a bag
carrying drugs?
31.
Maths A Yr 12 - Ch. 04 Page 197 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
THINK
WRITE
a Add both total columns; they should
give the same result.
a 200 bags were examined.
b The total of the detected column.
b The dogs detected drugs in 35 bags.
c There were 175 bags without drugs but
dogs incorrectly detected them in
11 bags. Write this as a percentage.
c Percentage incorrectly detected
d Of 25 bags with drugs, 1 went
undetected. Write this as a percentage.
197
1
d Percentage not detected = ----- × 100%
25
Percentage not detected = 4%
=
11
-------175
× 100%
= 6.3%
As a result of studying a contingency table, we should also be able to make judgements about the information given in the tables. In the previous worked example
only one bag out of 25 with drugs went undetected. Although the dogs incorrectly
detected drugs in 11 bags that did not have drugs, they still have an overall accuracy of 94% as shown by the calculation [(24 + 164) ÷ 200] × 100%.
Many contingency tables will require you to make your own value judgements about
the conclusions established. For example, the 94% overall accuracy recorded may be
considered ‘very acceptable’.
WORKED Example 8
The contingency table at right shows the
Full-time
Part-time
composition of the employees of a small law
ﬁrm.
Female
4
11
a Extend the table to show totals in all
Male
30
5
categories and an overall total.
b Draw a table showing percentages with
respect to type of employment (full or part-time).
c Redraw the table showing percentages based on the gender of the employee.
d What percentage of females work full time?
e What percentage of full-time workers are female?
f Explain why, in the workforce in general, it would be easier to estimate an answer to
part d than it would to obtain an estimate for part e.
THINK
WRITE
a Add the numbers in the cells
for all the rows and columns
and enter the totals. Check
that the overall total is
consistent for the rows and
columns.
a
Full-time
Part-time
Total
4
11
15
Male
30
5
35
Total
34
16
50
Female
Continued over page
32.
Maths A Yr 12 - Ch. 04 Page 198 Wednesday, September 11, 2002 4:07 PM
198
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
THINK
WRITE
b Percentages are based on
totals in columns. The
totals in the columns are
on the denominator when
calculating percentages.
b
Full-time
Female
4
----34
× 100 = 12%
11
----16
× 100 = 69%
Male
30
----34
× 100 = 88%
5
----16
× 100 = 31%
Total
100%
c Percentages are based on
totals in rows. The totals
in the rows are on the
denominator when
calculating percentages.
c
d
Full time
d ----------------------------- × 100 =
Female total
This is based on
female totals in table c.
2
Write the answer.
1
This is based on fulltime totals in table b.
2
e
1
Write the answer.
f An estimate is easier if the
required sample is
smaller.
t i gat
Locality
by
longitude
es
in
ion v
t i gat
n inv
io
es
Part-time
100%
Full-time
Part-time
Total
Female
4
----15
× 100 = 27%
11
----15
× 100 = 73%
100%
Male
30
----35
× 100 = 86%
5
----35
× 100 = 14%
100%
4
----15
× 100 = 27%
Percentage of females who work full time = 27%.
Female
e --------------------------------- × 100 =
Full-time total
4
----34
× 100 = 12%
Percentage of full-time workers who are female = 12%.
f It would be easier to obtain an estimate for the percentage
of females who work full time because the number of
females is fewer than the number of full-time workers. This
means that the sample size would be smaller.
Climatic inﬂuences in Queensland
For this activity we will investigate relationships between geographical features
that inﬂuence our weather. We could pose questions such as:
What effect does latitude have on temperature?
What factor has the main inﬂuence on day length?
What part does elevation play in inﬂuencing temperature?
This investigation should be conducted using a spreadsheet. Data on Queensland
towns from the Bureau of Meteorology’s website have been collated and shown in
two spreadsheet tables which follow. Graphs have been provided for stimulus when
investigating relationships between the variables in the spreadsheet.
1 Retrieve the two Excel ﬁles from the CD provided with this book (the longitude
ﬁle does not contain the graphs displayed here).
Locality
by
latitude
2 Experiment by graphing pairs of variables to determine whether a relationship
exists between the pair. You may wish to sort the spreadsheet using a different
classiﬁcation.
33.
Maths A Yr 12 - Ch. 04 Page 199 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
199
3 Write a report on the geographical factors inﬂuencing daily temperatures and
sunlight hours. Support your conclusions by providing graphical evidence.
4 Sites on the World Wide
Web provide weather
conditions for many
places throughout the
world. Conduct a search
to collate data from
locations around the
globe. Investigate the
geographical features
which might have an
inﬂuence on their
weather.
34.
Maths A Yr 12 - Ch. 04 Page 200 Wednesday, September 11, 2002 4:07 PM
200
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
We are constantly bombarded with statistics, some of which are a valid interpretation of
the data, and some of which are not. On occasions, the misuse of statistics may be
unintentional or through ignorance, but there are occasions when misleading ﬁgures are
quoted intentionally. If the raw data are available, it is wise to check the validity of any
claims.
WORKED Example
9
The ABS data from the 1996 Census for the Chapel Hill area in Brisbane are shown here.
Note: Income ﬁgures are weekly income expressed in AUD.
Australian Bureau of Statistics
1996 Census of Population and Housing
Chapel Hill (Statistical Local Area) — Queensland
B01 Selected Characteristics — Chapel Hill
Total persons (a)
Aged 15 years and over (a)
Aboriginal
Torres Strait Islander
Both Aboriginal and Torres Strait Islander (b)
Australian born
Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA
Born overseas: Other country (d)
Born overseas: Total
Speaks English only and aged 5 years and over
Speaks language other than English (e) and aged 5 years and over
Australian citizen
Australian citizen aged 18 years and over
Unemployed
Employed
In the labour force
Not in the labour force
Enumerated in private dwelling (a)
Enumerated in non-private dwelling (a)
Persons enumerated same address 5 years ago
Persons enumerated different address 5 years ago
Overseas visitor
Chapel Hill
Median age
Median individual income
Median household income
Average household size
Male
4 824
3 761
3
0
0
3 405
696
587
1 283
3 936
459
4 239
3 027
141
2 677
2 818
854
4 815
9
2 217
2 163
54
Female
5 112
4 070
3
0
0
3 704
637
605
1 242
4 230
491
4 515
3 315
132
2 427
2 559
1 393
5 085
27
2 381
2 324
75
Persons
9 936
7 831
6
0
0
7 109
1 333
1 192
2 525
8 166
950
8 754
6 342
273
5 104
5 377
2 247
9 900
36
4 598
4 487
129
34
415
1 209
3.1
When discussing the probability of unemployment in this area, a resident proudly said
that only 5% of the unemployed in the area were male.
a Construct a contingency table displaying the employment/unemployment status of the
residents in this area.
b Use your contingency table to discuss the validity of the claim.
THINK
WRITE
a
a
1
2
Extract the employment and
unemployment ﬁgures for
males and females from the
table.
Form a contingency table
adding totals to rows and
columns.
Male
Female
Total
141
132
273
Employed
2677
2427
5104
Total
2818
2559
5377
Unemployed
35.
Maths A Yr 12 - Ch. 04 Page 201 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
201
THINK
WRITE
b
-------b P(unemployed being male) = 141 × 100
273
P(unemployed being male) = 52%
52% of the unemployed are males.
3
141
P(male being unemployed) = ----------- × 100
2818
P(male being unemployed) = 5%
5% of the males are unemployed.
The statement is not correct. The resident
should have said that only 5% of the
males were unemployed.
extension
es
in
ion v
t i gat
n inv
io
es
Measures of
location and
spread
Comparison
extension
extension of data sets
Contingency tables from census data
The table below displays data collected from the 1996 census. It shows the
numbers of males and females in various forms of employment in the 15–19 years
age bracket and the totals of all ages for each category.
Industry
Agriculture, Forestry and Fishing
Mining
Manufacturing
Electricity, Gas and Water Supply
Construction
Wholesale Trade
Retail Trade
Accommodation, Cafes and Restaurants
Transport and Storage
Communication Services
Finance and Insurance
Property and Business Services
Government Administration and Defence
Education
Health and Community Services
Cultural and Recreational Services
Personal and Other Services
Non-classiﬁable economic units
Not stated
Total
Australian bureau of statistics
15–19 years
Male
Female
9 986
2 685
1 349
339
33 831
10 168
676
191
21 162
1 664
11 441
5 367
91 818
127 466
17 917
25 019
4 135
2 560
1 213
869
2 001
4 981
11 164
13 930
5 877
2 999
3 685
4 071
3 059
13 388
6 658
7 016
4 161
11 212
3 808
2 000
9 133
8 175
243 074
244 100
Total
Male
225 679
75 497
695 007
49 427
419 394
306 456
500 105
157 519
250 385
102 016
127 364
410 414
225 316
184 287
161 489
93 066
143 942
63 045
81 643
4 272 051
Female
98 651
10 764
270 029
9 272
64 690
140 089
536 543
197 768
81 693
48 172
169 092
339 781
148 111
365 776
563 689
85 989
133 966
40 097
70 096
3 364 268
1996 Census of Population and Housing Australia 7688965.464 sq kms
Persons
324 330
86 261
965 036
58 699
484 084
446 545
1 036 648
355 287
332 078
150 188
296 456
750 195
373 427
540 063
725 178
179 055
277 908
103 142
151 739
7 636 319
D-
C
extension
AC
ER T
IVE
An error frequently occurs when statistics of this kind are quoted. The reference basis
for the probability percentage should be carefully noted.
When using the Maths Quest Maths A Year 12 CD-ROM, click here for more about
statistical measures.
M
2
Calculate the probability of an
unemployed person being a male; that is,
number of unemployed males
---------------------------------------------------------------------- × 100.
total number of unemployed
Calculate the probability of a male
being unemployed; that is,
number of unemployed males
---------------------------------------------------------------------- × 100.
total number of males
Compare these probability ﬁgures with
the statement and make a decision.
INT
1
RO
t i gat
36.
Maths A Yr 12 - Ch. 04 Page 202 Wednesday, September 11, 2002 4:07 PM
202
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Using these data, we could form a contingency table to compare the proportion of
15–19 year old males and females in, for example, the retail trade. (Conﬁrm the
ﬁgures in the table below.)
Male
Female
Total
91 818
127 466
219 284
Non-retail trade
151 256
116 634
267 890
Total
243 074
244 100
487 174
Retail trade
1 Use this table to:
a determine the percentage of male workers who are in the retail trade
b calculate the percentage of retail workers who are male
c explain why these two percentages are different
d plan a strategy to survey the workforce for an estimate of the number of
males in the retail trade.
2 Choose another category of the workforce from the census data. Construct a
contingency table, then answer questions similar to those above.
3 Reports from early recordings of census data showed that more than 50% of
Australians lived and worked on the land, providing food and clothing for our
population. Most recent reports indicate that only 4% of Australians now work
the land, providing for the remaining 96%. Use the data in the table to conﬁrm
that this is indeed true.
4 It is important for future planning that these changes are recorded and made
known. Search the World Wide Web or reference books to obtain industry data
from the 2001 census. Examine the ﬁgures, noting changing trends in industry
employment. Report on your ﬁndings.
remember
remember
1. Contingency tables can be used to display data that have been classiﬁed into
different types.
2. The table displays 2 variables which have been split into categories in a
horizontal and a vertical direction.
3. Calculations can be made with regard to a variety of reference bases.
4D
WORKED
Example
6
Contingency tables
1 A test is developed to test for infection with the ﬂu virus. To test the accuracy, the
following 500 people are tested.
• Of the 100 people who are known to have the ﬂu who are tested, the test returns 98
positive results.
• Of the 400 people who are known not to be infected with the virus who are tested,
12 false positives are returned.
37.
Maths A Yr 12 - Ch. 04 Page 203 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
203
Display this information in the contingency table below.
Test results
Accurate
Not accurate
Total
With virus
Without virus
Total
2 One thousand people take a lie detector test. Of 800 people known to be telling the
truth, the lie detector indicates that 23 are lying. Of 200 people known to be lying, the
lie detector indicates that 156 are lying.
Present this information in a contingency table.
WORKED
Example
7
3 The contingency table shown below displays the information gained from a medical
test screening for a virus. A positive test indicates that the patient has the virus.
Test results
Accurate
Not accurate
45
3
48
Without virus
922
30
952
Total
967
33
1000
With virus
Total
a How many patients were screened for the virus?
b How many positive tests were recorded? (that is, in how many tests was the virus
detected?)
c What percentage of test results were accurate?
d Based on the medical results, if a positive test is recorded what is the percentage
chance that you actually have the virus?
4 The contingency table below indicates the results of a radar surveillance system. If the
system detects an intruder, an alarm is activated.
Test results
Alarm activated
Intruders
No intruders
Total
Not activated
Total
40
8
48
4
148
152
44
156
200
a Over how many nights was the system tested?
b On how many occasions was the alarm activated?
c If the alarm is activated, what is the percentage chance that there actually is an
intruder?
d If the alarm was not activated, what is the percentage chance that there was an
intruder?
e What was the percentage of accurate results over the test period?
f Comment on the overall performance of the radar detection system.
4.2
4.3
38.
Maths A Yr 12 - Ch. 04 Page 204 Wednesday, September 11, 2002 4:07 PM
204
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
The information below is to be used in questions 5 to 7.
A test for a medical disease does not always produce the correct result. A positive test
indicates that the patient has the condition. The table indicates the results of a trial on a
number of patients who were known to either have the disease or known not to have the
disease.
Test results
Accurate
Not accurate
Total
57
3
60
Without disease
486
54
540
Total
543
57
600
With disease
5 multiple choice
The overall accuracy of the test is:
A 90%
B 90.5%
C 92.5%
D 95%
6 multiple choice
Based on the table, what is the probability that a patient who has the disease has it
detected by the test?
A 90%
B 90.5%
C 92.5%
D 95%
7 multiple choice
Which of the following statements is correct?
A The test has a greater accuracy with positive tests than with negative tests.
B The test has a greater accuracy with negative tests than with positive tests.
C The test is equally accurate with positive and negative test results.
D There is insufﬁcient information to compare positive and negative test results.
8 Airport scanning equipment is tested by scanning 200 pieces of luggage. Prohibited
items were placed in 50 bags and the scanning equipment detected 48 of them. The
equipment detected prohibited items in ﬁve bags that did not have any forbidden items
in them.
a Use the above information to complete the contingency table below.
Test results
Accurate
Not accurate
Total
Bags with prohibited items
Bags with no prohibited items
Total
b Use the table to answer the following:
i What percentage of bags with prohibited items were detected?
ii What was the percentage of ‘false positives’ among the bags that had no
prohibited items?
iii What percentage of prohibited items pass through the scanning equipment
undetected?
iv What is the overall percentage accuracy of the scanning equipment?
39.
Maths A Yr 12 - Ch. 04 Page 205 Wednesday, September 11, 2002 4:07 PM
205
Chapter 4 Populations, samples, statistics and probability
9 In some cases it is easier to count numbers in a particular category by considering a
different population. In each of the following pairs of proportions, which one would
be easier to determine?
a ii Proportion of males who are left-handed.
ii Proportion of left-handers who are males.
b ii Proportion of mathematics A students in your school who are over 16.
ii Proportion of over 16 year olds in your school who study mathematics A.
c ii Proportion of state school students who live in Queensland.
ii Proportion of Queensland school students who attend a state school.
10 Refer to the 1996 census data on industry on page 201.
a Draw up a contingency table showing the 15–19 year old males and females
8
employed in education compared with those of this age group employed in other
industries.
b Extend your table to show totals in all categories as well as an overall total.
c Draw up a table showing percentages with respect to gender.
d Redraw your table showing percentages based on industry.
e What percentage of females are employed in education?
f What percentage of those employed in education are female?
g At some period in between census times, if it were necessary to obtain an estimate
of the number of females employed in education by surveying a sample, what
approach would you recommend?
WORKED
Example
11 Repeat question 10 using the ‘totals’ data. Comment on any differences or similarities
in your answers.
Use the following data collected from the 1996 census for questions 12 and 13.
Note: Income ﬁgures are weekly income in AUD.
9
WORKED
Example
Australian Bureau of Statistics 1996 Census of Population and Housing
B01 Selected Characteristics — Inala
Total persons (a)
Aged 15 years and over (a)
Aboriginal
Torres Strait Islander
Both Aboriginal and Torres Strait Islander (b)
Australian born
Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA
Born overseas: Other country (d)
Born overseas: Total
Speaks English only and aged 5 years and over
Speaks language other than English (e) and aged 5 years and over
Australian citizen
Australian citizen aged 18 years and over
Unemployed
Employed
In the labour force
Not in the labour force
Enumerated in private dwelling (a)
Enumerated in non-private dwelling (a)
Persons enumerated same address 5 years ago
Persons enumerated different address 5 years ago
Overseas visitor
Inala
Median age
Median individual income
Median household income
Average household size
Inala (Statistical Local Area) — Queensland
Male
6 401
4 516
398
42
8
4 066
665
1 396
2 061
4 000
1 454
5 424
3 521
605
2 036
2 641
1 728
6 397
4
2 986
2 319
13
Female
6 886
5 083
482
51
12
4 468
694
1 462
2 156
4 483
1 494
5 807
4 001
349
1 403
1 752
3 144
6 886
0
3 343
2 493
15
30
187
412
2.8
Persons
13 287
9 599
880
93
20
8 534
1 359
2 858
4 217
8 483
2 948
11 231
7 522
954
3 439
4 393
4 872
13 283
4
6 329
4 812
28
40.
Maths A Yr 12 - Ch. 04 Page 206 Wednesday, September 11, 2002 4:07 PM
206
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
12 a Construct a contingency table displaying males and females ‘In the labour force’
and ‘Not in the labour force’, showing all totals.
b From your contingency table calculate:
i the percentage of females in the labour force
ii the percentage of those in the labour force who are female.
c Would it be correct to say that more than 39% of the females are in the labour
force? Explain.
4.2
13 a Construct a contingency table displaying the number of ‘Australian born’ and
‘Overseas born’ males and females in the community. Show all totals.
b Is it correct to claim that almost half the males in the community were born overseas? Explain.
Applications of statistics and
probability
By exploring data collected from samples (provided the samples have been chosen
carefully) we are able to estimate characteristics of the population. We can determine
past trends and speculate on future trends. Through a series of investigations we will
explore the application of statistics and probability to life-related situations.
Using histograms to estimate probabilities
Discrete data (the type where the scores can take only set values) can be represented
as a frequency histogram.
Continuous data (the type where the scores may take any value, usually within a
certain range) can also be represented in the form of a frequency or probability histogram. Let us construct a frequency histogram of continuous data from which we can
then estimate probabilities.
WORKED Example 10
A battery company tested a random sample of a batch of their batteries to determine their
lifetime. The results are shown below.
Lifetime (hours)
Frequency
20–<25
25–<30
30–<35
35–<40
40–<45
45–<50
6
25
70
61
30
8
a Represent the data as a frequency histogram.
b If you chose a battery from this batch, estimate the probability that the battery would
last:
ii at least 25 hours
ii less than 40 hours.
c In an advertising campaign, the battery manufacturer claims that they will replace the
battery if it does not last at least 30 hours. Based on these results, what is the
probability they will have to replace a battery?
41.
Maths A Yr 12 - Ch. 04 Page 207 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
THINK
WRITE
a Construct a frequency histogram with lifetime
on the x-axis and frequency on the y-axis.
a
b
207
b Total number of scores
= 6 + 25 + 70 + 61 + 30 + 8
= 200
Frequency
Frequency histogram
Find the total number of scores.
The total area under the curve is 1, so each
2
class interval represents a fraction of 1 in
terms of area (and probability).
ii 1 Find the total of frequencies with a
score of at least 25 hours.
1
2
3
ii
Estimated probability
total of frequencies at least 25 h
= --------------------------------------------------------------------------- × 1
total number of scores
Write the answer.
Find the total frequencies with a score
of less than 40 hours.
2
Apply the same rule as in part i.
3
c
1
Write the answer.
1
Find the total frequency for those batteries
lasting less than 30 hours.
2
Apply the probability rule.
Write the answer.
3
70
60
50
40
30
20
10
0
20 25 30 35 40 45 50
Lifetime (hours)
ii Total frequency at least 25 hours
= 25 + 70 + 61 + 30 + 8
= 194
-------P(≥25 h) = 194 × 1
200
P(≥25 h) = 0.97
The probability that the battery
would last for at least 25 hours is
0.97.
ii Total frequency less than 40 hours
= 6 + 25 + 70 + 61
= 162
-------P(<40 h) = 162 × 1
200
P(<40 h) = 0.81
The probability that the battery
would last less than 40 hours is
0.81.
c Total frequency less than 30 hours
= 6 + 25
= 31
31
P(<30 h) = -------- × 1
200
= 0.155
P(replacing battery) = 0.155
The probability that the manufacturer
will have to replace the battery is
0.155.
It should be noted that, if we are not given a table of results (as we were in the previous
worked example), but simply a frequency histogram, we would have to estimate frequencies from the histogram. In this case, the probability answers obtained would be
estimates rather than exact values.
42.
Maths A Yr 12 - Ch. 04 Page 208 Wednesday, September 11, 2002 4:07 PM
208
Interpreting histograms
The aim of this investigation is to highlight the pitfalls in interpreting the shape of
histograms. The activity is more readily conducted using a graphics calculator.
1 Consider the percentages received by a class of 36 students in their end-ofsemester test.
67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77, 56, 66, 63,
70, 49, 56, 71, 67, 58, 60, 72, 67, 57, 60, 90, 63, 88, 78, 46, 64, 81.
2 Enter the data as a list into a graphics calculator.
TI: Press STAT and select 1:Edit
then enter the data into L1 as
shown.
Casio: Enter STAT from the MENU
then enter the data into List 1.
3 Set the window for the percentage range 40 to 100 using a class interval of 10.
TI: Press WINDOW and enter
values as shown.
Casio: Press SHIFT F3
(V-WIN), then enter the values
shown.
4 Set the data to graph as a histogram.
TI: Press 2nd [STAT PLOT] and
select 1:Plot1. Set Plot 1 as shown.
Casio: Press F1 (GRPH) and
F6 (SET). Then set StatGraph1
as shown. (Press F6
to
scroll right to ﬁnd Hist.)
M
es
in
ion v
t i gat
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
5 Set all other graph plots off.
TI: Press 2nd [STAT PLOT] and
set other plots off.
Casio: Press F4 (SEL) and set
other StatGraphs off.
t i gat
43.
Maths A Yr 12 - Ch. 04 Page 209 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
209
6 Draw the histogram.
TI: Press GRAPH .
Casio: Press F6 (DRAW) and
enter the values shown.
Press F6 (DRAW).
7 On your calculator, change the range of the score to accommodate percentages
46 to 94, with a class interval of 4.
TI: Press WINDOW and enter the
new values shown.
Casio: Press EXIT then press
F1 (GPH1) and enter the new
values.
8 Draw the resulting histogram.
TI: Press GRAPH .
Casio: Press F6 (DRAW).
While the ﬁrst histogram appeared to have one modal class, this one appears
multimodal.
9 Use your calculator to investigate changing the class interval and the range of
the percentages. What do you observe?
M
D-
C
For more on probability, scatterplots, histograms and skewness, click on the icon when
using the Maths Quest Maths A year 12 CD-ROM.
AC
ER T
IVE
INT
10 All these histograms are graphical representations of the same data. While
they all indicate distributions with higher frequencies towards the middle,
some suggest bimodal or multimodal distributions. What do you conclude
from this investigation?
RO
Probability, scatterplots,
histograms and skewness
44.
Maths A Yr 12 - Ch. 04 Page 210 Wednesday, September 11, 2002 4:07 PM
210
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Using scatterplots to consider relationships between
data sets
WORKED Example 11
Are tall mothers likely to produce tall sons?
The table below details the heights of 12 mothers and their adult sons.
Height of mother (cm)
185 152 168 166 173 172 159 154 168 148 162 171
Height of son (cm)
188 162 168 172 179 182 160 148 178 152 184 180
a
b
c
d
Construct a scatterplot of the data.
Draw the line of best ﬁt.
Estimate the height of a son born to a 180-cm tall mother.
Discuss the relationship between the heights of mothers and their sons as shown by
these data.
The solution to this problem will be shown using three methods.
1. Pen and paper
2. Graphics calculator
3. Spreadsheet
It should be noted that, when a line of best ﬁt is drawn by eye, variations in answers will
occur for those dependent on the position of the line.
a
190
185
180
175
170
165
160
155
150
145
14
5
15
0
15
5
16
0
16
5
17
0
17
5
18
0
18
5
19
0
Method 1. Using pen and paper
a Plot points on a graph with height of
mother on x-axis (the independent
variable) and height of son on y-axis
(the dependent variable). This results
in a scatterplot.
WRITE/DRAW
Height of son (cm)
THINK
Height of mother (cm)
ne
Li
14
5
15
0
15
5
16
0
16
5
17
0
17
5
18
0
18
5
19
0
190
185
180
175
170
165
160
155
150
145
of
be
st
fit
b
Height of son (cm)
b Draw in the line of best ﬁt. Balance
an equal number of points either side
of the line and as close to the line as
possible.
Height of mother (cm)
45.
Maths A Yr 12 - Ch. 04 Page 211 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
WRITE/DISPLAY
c Draw a vertical line from the 180 cm
point on the x-axis to the line of best
ﬁt. From this point on the line, draw
a horizontal line to the y-axis. Read
this y-value.
c
be
of
y = 190
Li
ne
190
185
180
175
170
165
160
155
150
145
x = 180
14
5
15
0
15
5
16
0
16
5
17
0
17
5
18
0
18
5
19
0
Height of son (cm)
st
fit
THINK
211
Height of mother (cm)
From the graph
when x = 180
y = 190
So, a 180-cm tall mother could produce a son
approximately 190 cm tall.
d Look at the slope of the line and the
proximity of the points to the line.
d The slope of the line of best ﬁt is positive, indicating that, as one variable increases, the other
also increases. The points lie fairly close to the
line, so this indicates a fairly strong positive
relationship between the two variables. This
seems to support the view that tall mothers are
likely to produce tall sons.
Method 2. Using a graphics calculator
These instructions apply to the TI-83 and Casio CFX-9850 PLUS graphics calculators.
TI
Casio
a 1 Enter mother’s height and son’s
a
height into two lists.
TI: Press STAT , select 1:Edit
and enter the data.
Casio: From the MENU enter the
STAT sector and enter the data.
2 Set up the window to graph xvalues in the range 145–190, yvalues in the same range and a
scale of 5 for each.
TI: Press WINDOW then enter
the values shown.
Casio: Press SHIFT F3
(V-WIN) and enter values.
Continued over page
46.
Maths A Yr 12 - Ch. 04 Page 212 Wednesday, September 11, 2002 4:07 PM
212
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
THINK
3
4
WRITE/DISPLAY
TI
Set one graph plot on and select
scatterplot type, x as list 1 and y
as list 2.
TI: Press 2nd [STAT PLOT]
and set up Plot 1 as shown.
Casio: Press F1 (GRPH),
F6 (SET) and set StatGraph1
as shown.
Turn off all other plots.
TI: Use 2nd [STAT PLOT].
Casio: Press F4 (SEL).
5
b
Graph the relationship.
TI: Press GRAPH .
Casio: Press F6 (DRAW).
1
Enter the function that calculates
the equation of the line of best ﬁt.
TI: Press STAT , arrow across to
the CALC menu, and select
4:LinReg.
Casio: Press F1 (x).
Copy this equation into Y=.
TI: Press VARS, select
5:Statistics, arrow across to the
EQ menu, and select 1: RegEQ.
Casio: Press F5 (COPY) and
EXE to store.
Graph the line of best ﬁt on the
scatterplot.
TI: Press GRAPH .
Casio: Press F6 (DRAW).
2
3
b
M
c Use the calculator’s function to determine c
a value for y when the x-value is 180.
TI: Press 2nd [CALC] and select
1:Value.
Casio: Press MENU , select GRAPH, and
press F6 (DRAW). Then press SHIFT
F5 (SLV), F6
and F1 (Y-CAL)
and enter 180 value for x and press
EXE .
Casio
47.
Maths A Yr 12 - Ch. 04 Page 213 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
213
THINK
WRITE/DISPLAY
d Look at the angle of the straight line
and the proximity of the points to the
line.
d The line of best ﬁt predicts that a 180-cm tall
mother could produce an adult son
approximately 187 cm tall. The slope of the line
of best ﬁt is upwards, indicating that as one
variable increases the other also increases. Most
of the points lie close to the line, so it is
reasonable to assume that the relationship
between mother and son heights is quite strong.
This supports the proposal that tall mothers are
likely to produce tall sons.
Method 3. Using a spreadsheet
a 1 Open up a spreadsheet and enter
the data for the mother’s and
son’s heights in columns under
headings.
2 Use the chart wizard to graph the
data as a scatterplot.
3 Label the axes and provide a title
for the graph.
4 Adjust the range and scale on the
x- and y-axes to more appropriate
values if necessary (suggest 145
to 190 range with a scale of 5).
5 Print out a copy of the scatterplot.
a
b Draw in the line of best ﬁt. Balance
an equal number of points either side
of the line and as close to the line as
possible.
b From the scatterplot of the data above, the line
of best ﬁt is shown on the scatterplot.
c From the graph, read the
corresponding y-value for x = 180
cm.
c When x = 180, y = 187.
So a 180-cm tall mother would produce an adult
son approximately 187 cm tall.
d Look at the slope of the line and the
proximity of the points to the line.
d The slope of the line of best ﬁt is positive, indicating that, as one variable increases, the other
also increases. The points lie fairly close to the
line, so this indicates a fairly strong positive
relationship between the two variables. This
seems to support the view that tall mothers are
likely to produce tall sons.
48.
Maths A Yr 12 - Ch. 04 Page 214 Wednesday, September 11, 2002 4:07 PM
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
remember
remember
1. Frequency histograms can be used to estimate probabilities in data sets.
2. Scatterplots display the relationship between two variables.
3. Scatterplots enable past and future trends to be considered.
3 The table on the right shows the number of
goals scored by a hockey team throughout a
season.
a Show this information in a frequency
histogram.
b Are the data symmetrical?
c What is the mode(s)?
d Can the mean and median be seen for this
distribution and, if so, what are their
values?
e The probability that the team will score 5
goals is the same as their probability of
scoring what other number of goals?
12
10
8
6
4
2
0
1 2 3 4 5
Score
Frequency
2 For the distribution shown on the right:
a are the data symmetrical?
b what is the modal class(es)?
c can the mean and median be seen from the
graph and, if so, what are their values?
d which classes have the same probability of
occurring?
e which class has the least probability of
occurring?
Frequency
1 In the distribution on the right:
a is the graph symmetrical?
b what is the modal class(es)?
c can the mean and median be seen from the
graph and, if so, what are their values?
d which score has the greatest probability of
occurring?
7
6
5
4
3
2
1
0
Score
No. of goals
Frequency
0
6
1
4
2
4
3
4
4
4
5
6
4 For the distribution shown on the right:
a what is the modal score(s)?
b which score has the greatest probability of occurring?
c which score has the least probability of occurring?
Frequency
4E
Applications of statistics and
probability
0–
4
5–
10 9
–
15 14
–1
20 9
–
25 24
–2
9
214
12
10
8
6
4
2
0
1 2 3 4 5
Score
49.
Maths A Yr 12 - Ch. 04 Page 215 Wednesday, September 11, 2002 4:07 PM
215
Chapter 4 Populations, samples, statistics and probability
No. of goals
3
21–30
6
31–40
7
41–50
23
51–60
6 multiple choice
Frequency
11–20
21
8
6
4
2
0
1 23 4 5
Score
5
4
3
2
1
0
12345
Score
5
4
3
2
1
0
1 23 4 5
Score
Frequency
Which of the distributions below has the smallest standard deviation?
A 10
B 6
C 6
D
Frequency
10
5 The table at right shows the number of
goals scored by a basketball team
throughout a season.
a Draw a frequency histogram of the
data.
b What is the probability that the team
will score more than 40 goals?
Frequency
Example
Frequency
WORKED
8
7
6
5
4
3
2
1
0
1 2 3 4 5
Score
7 A movie is shown at a cinema 30 times
during the week. The number of people
attending each session of the movie is
shown in the table at right.
a Present the data in a frequency histogram.
b Are the data symmetrical?
c What is the modal class(es)?
d What is the probability of more than
150 people attending a session?
e What is the probability of having up to
100 people at a session?
No. of people
Frequency
1–50
2
51–100
3
101–150
5
151–200
10
201–250
10
8 Year 12 at Wallarwella High School sit exams in chemistry and mathematics. The
results are shown in the table below.
Mark
Chemistry
Maths
31–40
2
3
41–50
9
4
51–60
7
6
61–70
4
7
71–80
7
9
81–90
9
7
91–100
2
4
a Is either distribution symmetrical?
b State the mode of each distribution.
c In which subject is the standard deviation greater? Explain your answer.
50.
Maths A Yr 12 - Ch. 04 Page 216 Wednesday, September 11, 2002 4:07 PM
216
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
d The students were told that the probability of achieving more than 80% in chemistry was the same as it was in mathematics. Is this true? Explain.
e Is the probability of obtaining more than a pass mark (50%) greater in chemistry
or mathematics?
f What is the propability of achieving over 90% in each subject?
Note that some answers in the following questions may vary depending on the position of
the line of best ﬁt. The use of a graphics calculator is recommended, if available.
WORKED
Example
11
Interpolation/
Extrapolation
9 A drug company wishes to test the effectiveness of a drug to increase red blood cell
counts in people who have a low count. The following data were collected.
Day of experiment
Red blood cell count
4
5
6
7
8
9
210
240
230
260
260
290
Construct a scatterplot, then draw in the line of best ﬁt to ﬁnd the red blood cell count
at the beginning of the experiment (that is, on day 0).
10 A wildlife exhibition is held over 6 weekends and features still and live displays. The
number of live animals that are being exhibited varies each weekend. The number of
animals participating, together with the number of visitors to the exhibition each
weekend, is shown in the table which follows.
Number of animals
6
4
8
5
7
6
Number of visitors
311
220
413
280
379
334
Construct a scatterplot, then draw in the line of best ﬁt to ﬁnd the predicted number of
visitors if there are no live animals.
SkillS
HEET
4.4
11 A study of the dining-out habits of various income groups in a particular suburb
produces the results shown in the table below.
Weekly income ($)
100
200
300
400
500
600
700
800
900
1000
Number of restaurant
visits per year
5.8
2.6
1.4
1.2
6
4.8
11.6
4.4
12.2
9
Use the data to predict:
a the number of visits per year by a person on a weekly income of $680
b the number of visits per year by a person on a weekly income of $2000.
12 The following table represents the costs for transporting a consignment of shoes from
Brisbane factories. The cost is given in terms of distance from Brisbane. There are
two factories which can be used. The data are summarised below.
Distance from Brisbane (km)
10
20
30
40
50
60
70
80
Factory 1 cost ($)
70
70
90
100
110
120
150
180
Factory 2 cost ($)
70
75
80
100
100
115
125
135
a Draw the line of best ﬁt for each factory.
b Which factory is likely to have the lowest cost to transport to a shop in Brisbane?
c Which factory is likely to have the lowest cost to transport to Mytown, 115 kilometres from Brisbane?
d Which factory has the most ‘linear’ transport rates?
51.
Maths A Yr 12 - Ch. 04 Page 217 Wednesday, September 11, 2002 4:07 PM
217
Chapter 4 Populations, samples, statistics and probability
Year 2000 Summer Olympic Games
It has been claimed that the Summer
Olympic Games, held in Sydney in the
year 2000, were planned for September,
because past records indicated that the
chance of rain in Sydney during that
month was low. (We recall the rain that
fell at the time of one of Cathy Freeman’s
400 m races.)
Rainfall and temperature records for
Sydney are available from the table below
and graph at right.
Monthly temperatures Sydney
50
40
Temperature °c
es
in
ion v
t i gat
n inv
io
es
30
20
10
0
– 10
– 20
Jan Feb MarAprMayJun Jul AugSep Oct NovDec
Month of year
Highest Maximum Lowest Minimum
Average Maximum Average Minimum
The table shows the number of rainy days for Sydney for each month of the year
for the years 1980 to 1990.
Sydney rainfall (days of rain per month)
Year
Jan
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
1980
14
12
8
7
15
12
7
4
3
10
11
12
1981
15
15
5
8
14
7
6
7
6
12
17
13
1982
11
10
19
5
2
11
12
3
9
11
3
12
1983
9
12
9
12
13
8
8
12
9
16
8
18
1984
15
20
11
10
8
6
15
6
10
8
15
10
1985
9
10
8
19
13
6
7
10
12
18
19
16
1986
15
10
11
12
11
3
9
9
10
11
18
9
1987
14
11
10
7
9
11
11
16
4
17
8
16
1988
11
15
14
18
12
10
6
10
10
3
14
19
1989
21
11
22
23
22
20
12
9
7
6
13
15
1990
14
21
22
19
17
10
12
9
13
13
8
13
1 Identify the type of statistical calculation(s) that
would be appropriate to perform on the data.
Decide also on appropriate graphical
representations.
2 Produce the graphs and perform your calculations.
3 Analyse your calculations and graphs. Comment
on the claim made in the opening paragraph.
Would an alternative month have been more
appropriate, as far as the rainfall factor is
concerned?
t i gat
52.
Maths A Yr 12 - Ch. 04 Page 218 Wednesday, September 11, 2002 4:07 PM
218
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
4 Does the month of September appear to be a suitable one for summer sporting
events, when temperature is considered? Monthly temperature ﬁgures for
Sydney are shown in the graph.
5 Describe the temperature conditions in Sydney during the month of September.
6 Compare rainfall and temperature ﬁgures for September with those for other
months of the year.
7 Assume you are responsible for a submission to the Australian Olympic
Committee recommending a suitable month in which to hold the summer
Olympics. Prepare your report, remembering to support your recommendations
by referring to the available data, calculations and graphs.
es
in
ion v
t i gat
n inv
io
es
Sampling text to predict population
characteristics
Research shows that the letter ‘e’ is the most frequently used letter in the English
language. This activity aims to determine the frequency of its occurrence on a page
of English text using a sample selected from the page.
1 Choose a book with an extended section of continuous prose. Select 20 full
lines of text from a page (ignore incomplete lines).
2 Draw up the table below.
Line number
Number of e’s
Cumulated total
1
2
3
…
…
20
3 Count the number of e’s per line and complete the second column.
4 Using the ﬁgures in the second column, complete the ‘cumulated total’ column.
5 Use a graphics calculator, spreadsheet or graph paper to draw a scatterplot of
the number of lines against the accumulated total.
6 Draw the line of best ﬁt on your scatterplot.
7 Count the number of lines of text on your page. Extrapolate your scatterplot to
estimate the number of e’s on the page you have chosen.
8 Compare your answer with those obtained from different texts by other
members of your class. Comment on the similarities and variations in your
answers. Factors that must be taken into account when making comparisons
between the number of e’s per page of printed material include page size, font
size, line length, line spacing and so on.
Aside: The novel, A void by George Perec (Harper Collins 1994) is written entirely
without the letter ‘e’.
t i gat
53.
Maths A Yr 12 - Ch. 04 Page 219 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
es
in
ion v
t i gat
n inv
io
es
219
Comparing population characteristics
The following tables display data collated from the 1996 ABS Census relating to
the populations in two Brisbane suburbs. Note: Income ﬁgures are weekly income,
expressed in AUD.
Chapel Hill
Total persons (a)
Aged 15 years and over (a)
Aboriginal
Torres Strait Islander
Both Aboriginal and Torres Strait Islander (b)
Australian born
Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA
Born overseas: Other country (d)
Born overseas: Total
Speaks English only and aged 5 years and over
Speaks language other than English (e) and aged 5 years and over
Australian citizen
Australian citizen aged 18 years and over
Unemployed
Employed
In the labour force
Not in the labour force
Enumerated in private dwelling (a)
Enumerated in non-private dwelling (a)
Persons enumerated same address 5 years ago
Persons enumerated different address 5 years ago
Overseas visitor
Chapel Hill
Median age
Median individual income
Median household income
Average household size
Inala
Total persons (a)
Aged 15 years and over (a)
Aboriginal
Torres Strait Islander
Both Aboriginal and Torres Strait Islander (b)
Australian born
Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA
Born overseas: Other country (d)
Born overseas: Total
Speaks English only and aged 5 years and over
Speaks language other than English (e) and aged 5 years and over
Australian citizen
Australian citizen aged 18 years and over
Unemployed
Employed
In the labour force
Not in the labour force
Enumerated in private dwelling (a)
Enumerated in non-private dwelling (a)
Persons enumerated same address 5 years ago
Persons enumerated different address 5 years ago
Overseas visitor
Inala
Median age
Median individual income
Median household income
Average household size
4 824
3 761
3
0
0
3 405
696
587
1 283
3 936
459
4 239
3 027
141
2 677
2 818
854
4 815
9
2 217
2 163
54
5 112
4 070
3
0
0
3 704
637
605
1 242
4 230
491
4 515
3 315
132
2 427
2 559
1 393
5 085
27
2 381
2 324
75
9 936
7 831
6
0
0
7 109
1 333
1 192
2 525
8 166
950
8 754
6 342
273
5 104
5 377
2 247
9 900
36
4 598
4 487
129
6 886
5 083
482
51
12
4 468
694
1 462
2 156
4 483
1 494
5 807
4 001
349
1 403
1 752
3 144
6 886
0
3 343
2 493
15
13 287
9 599
880
93
20
8 534
1 359
2 858
4 217
8 483
2 948
11 231
7 522
954
3 439
4 393
4 872
13 283
4
6 329
4 812
28
34
415
1 209
3.1
6 401
4 516
398
42
8
4 066
665
1 396
2 061
4 000
1 454
5 424
3 521
605
2 036
2 641
1 728
6 397
4
2 986
2 319
13
30
187
412
2.8
1 Examine the data carefully.
2 Prepare a report comparing the residents of the two suburbs. Support your
statements by reference to ﬁgures.
t i gat
54.
Maths A Yr 12 - Ch. 04 Page 220 Wednesday, September 11, 2002 4:07 PM
220
es
in
ion v
t i gat
n inv
io
es
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
Modelling Olympic Games times
The running time for the men’s 100-m event
in the Olympic Games broke the 10-second
barrier for the ﬁrst time in 1968 when Jim
Hines of the United States clocked a time of
9.95 seconds. Knowing that records are
broken over time, is it possible to predict a
year when a runner could break the
9.5-second barrier?
We could model this situation by looking at
past times for the event. These are shown in
the table at right.
Note: The times have not decreased
consistently over the years. In fact, after 1968,
the 10-second barrier was not broken again
until 1984 when Carl Lewis of the United
States won with a time of 9.99 s. For this
reason, along with other factors which
accompany feats of human endurance, in
modelling situations such as this, any
resulting predictions can only be considered
as estimates.
1 Using a graphics calculator, spreadsheet or
graph paper, draw a scatterplot of ‘year’
(on the x-axis) against ‘time’ (on the
y-axis). This could be viewed as a time
series as the years are at equal
intervals.
2 Draw the line of best ﬁt for your scatterplot.
3 Use your line of best ﬁt to ﬁnd the year in
which the 9.5-second barrier could be
broken in the 100-m sprint. This represents
a situation when we are extrapolating data
(that is, determining a value outside the
range of data plotted). Note that the
Olympic Games are only held every four
years and that your answer may not be one
of those years. It remains to be seen how
accurate your prediction is!
4 Use some resources available to you (World
Wide Web, reference books, almanacs and
so on) to collect data on other sporting
events (such as high jump heights, shot put
distances and swimming times). Model the
situation, enabling a prediction to be
proposed.
Year
Time
(seconds)
1948
10.30
1952
10.40
1956
10.50
1960
10.20
1964
10.00
1968
9.95
1972
10.14
1976
10.06
1980
10.23
1984
9.99
1988
9.92
1992
9.96
1996
9.84
2000
9.87
t i gat
55.
Maths A Yr 12 - Ch. 04 Page 221 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
t i gat
es
in
ion v
n inv
io
es
221
Predicting test results
Over the year Sally had sat for nine mathematics tests, but had been sick at the
time of the tenth test. She had achieved above average marks for each of her nine
tests, so her teacher did not want to give her the class average for her last test. As
sometimes happens, the class as a whole found this last test more difﬁcult than the
previous ones, so generally all marks were depressed. It would not then be fair to
give Sally the average of her previous tests for this last one. How could her teacher
give Sally an estimated mark based on her past performance compared with that of
her fellow students?
Below is a table displaying the class average percentage for each test and Sally’s
percentage on the tests.
t i gat
Test
1
2
3
4
5
6
7
8
9
10
Class %
64
59
60
55
62
66
58
63
65
49
Sally’s %
72
70
77
65
75
80
71
75
75
Because Sally’s marks were consistently above the class average by 10% or more,
we could explore a relationship between the class average and Sally’s marks.
1 Using a graphics calculator, or otherwise, construct a scatterplot of the ﬁrst nine
tests, plotting the class average percentage on the x-axis and Sally’s percentage
on the y-axis.
2 Draw in the line of best ﬁt.
3 Use your line of best ﬁt and a value of 49 for the class average percentage to
determine an estimate for Sally’s performance had she sat for the tenth test.
4 Do you think this is a fair mark to award to Sally? Justify your answer.
This method can be used in a variety of situations to estimate values for missing
results. The predictions become less accurate if there is a great deal of
inconsistency in past performance.
es
in
ion v
t i gat
n inv
io
es
The door game
Imagine you are a contestant in a game on television. The conditions and rules are
as follows:
1. Three doors (door 1, door 2, and door 3) stand before you.
2. Behind one of these doors which face you and the audience lies the prize of
your dreams (a trip to Wimbledon or a Ferrari car). (The organisers know where
the prize lies.)
3. Behind each of the other two doors there is nothing.
4. You see the three closed doors before you and you have no hints.
5. With the beneﬁt (or distraction) of audience participation, you are asked to
select the door behind which you think the prize lies.
6. You tell the compere and audience which door you choose.
7. Before opening the door you have chosen, the compere tells you that ﬁrst, one
of the doors, different from the one you have chosen, will be opened; it will be
a door that does not have the prize behind it.
t i gat
56.
Maths A Yr 12 - Ch. 04 Page 222 Wednesday, September 11, 2002 4:07 PM
222
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
8. There are two doors now unopened and you are given an offer to change your
choice.
9. Of the two unopened doors, one is your original choice and the other not your
choice.
10. Using the theory of probability, would you have a better chance of winning if
you stayed with your original choice, or made the switch?
This was a game which was actually played on television for a lengthy period
many years ago. Regular viewers of the program formed their own opinions
regarding the better option by analysing the results of contestants’ choices. The
background to the game can be researched by a web search of the name Monty
Hall. You may be interested to conduct a search at this stage, or leave your search
until you have had time to formulate a logical reason for your choice.
Part I
This activity provides practical experience for the door game. The simulation could
be undertaken as a class activity, or in pairs.
1 Simulate three doors using books lying on a ﬂat surface instead of doors.
2 Underneath one of them place a small piece of paper. (The contestant is to be
unaware of the location of the paper.)
3 Ask the contestant to select one of the three books underneath which he or she
thinks this piece of paper lies.
4 Turn over one book other than one chosen by the contestant and underneath
which the object does not lie.
5 Ask the contestant whether he/she would like to choose a different book.
6 Turn over the book under which the paper lies.
7 Repeat the simulation ten times. Copy and complete the table below, recording
your results after each turn shown in the table.
Game
1
Stay with choice
Change mind
✔
Win/Lose
✔
2
✔
✖
3
✔
✔
4
5
6
7
8
9
10
57.
Maths A Yr 12 - Ch. 04 Page 223 Wednesday, September 11, 2002 4:07 PM
223
Chapter 4 Populations, samples, statistics and probability
8 Combine your results with those of other members of your class so that your
combined set consists of a large number of simulations.
9 What are your conclusions from your experiment? Is it wise to stay with your
original choice or should you change your mind?
If you conduct a web search, you will ﬁnd sites which simulate the door game
using an interactive mode.
One such site can be found at www.jaconline.com.au/maths/weblinks.
Part II
Let us consider the probabilities of the choices in the ‘door game’.
1. Since there are three doors, the probability of the prize
-being behind any of these doors is 1 .
3
This enables us to start a tree diagram as shown at right.
Prize door
1
–
3
1
2
1
–
3
3
1
–
3
2. As you have three doors from which to
select, the probability that your choice
-will be the correct selection is 1 .
3
This realisation enables us to extend the
branches of the tree as at right.
Your choice 1
2
3
1
1
–
3
1
–
3
1
–
3
–
3
Prize door
1
2
1
–
3
1
–
3
1
–
3
1
2
3
1
–
3
3
1
–
3
3. Multiplying the probabilities along the
branches enables us to ﬁll in the
probability for each selection (shown
at right).
Prize door
1
2
1
–
3
1
2
3
Your choice 1
2
3
1
1
–
3
1
–
3
1
–
9
1
–
9
1
–
9
–
3
1
–
3
1
–
9
1
2
3
1
–
9
1
–
9
3
1
–
3
1
2
3
1
–
9
1
–
9
1
–
9
58.
Maths A Yr 12 - Ch. 04 Page 224 Wednesday, September 11, 2002 4:07 PM
224
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
4. The next stage is the crucial factor. Depending on which door is opened, will
you win if you stay with your choice, or are you more likely to win if you
change your mind? Follow the results completed for door 1. The combinations
of opened doors have been completed below for the remainder of the tree.
Complete the blank spaces.
Prize door
Your choice
1
2
3
Door opened
2 or 3
1
–
9
1
–
9
1
–
9
3
No
Yes (P =
1
-9
)
2
No
Yes (P =
1
-9
)
1
–
9
3
____
____
1
–
9
1 or 3
____
____
1
–
9
1
–
3
1
____
____
1
–
9
2
____
____
1
____
____
1 or 2
____
____
1
2
Win if
Win if I
I stay?
change my mind?
-Yes (P = 1 )
No
9
1
–
3
1
2
3
3
1
–
3
1
2
3
1
–
9
1
–
9
5. Add the probabilities in the ‘Win if I stay?’ column. This gives the overall
probability of your winning if you stay with your original choice.
6. Similarly, add the probabilities in the ‘Win if I change mind?’ column. This
ﬁgure represents the overall probability of your winning if you change your
mind and choose the other unopened door.
1 Based on these probabilities, which choice should you make? How does this
agree with your experimental results from your practical simulation?
Work
ET
SHE
4.3
If you have not yet visited websites by searching for Monty Hall, it would now be
appropriate to do so. This game has created much discussion among great
mathematicians and non-mathematicians over many years.
2
For the set of scores 23, 45, 24, 19, 22, 16, 16, 27, 20, 21, use your calculator to ﬁnd:
1 the mean
2 the median
3 the mode
4 the range
5 the interquartile range
6 the standard deviation.
7 In the data set, is the distribution symmetrical?
8 Does the data set have an outlier (a score which is not typical of the data set)?
9 Which measure of central tendency is the best measure of location in this data set?
10 Explain why the interquartile range is a better measure of spread than the range.
59.
Maths A Yr 12 - Ch. 04 Page 225 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
225
summary
Data collection
• A statistical investigation can be done by either census or sample.
• A census is when an entire population takes part in the investigation.
• A sample is when a small group takes part in the investigation and the results are
taken to be representative of the whole group.
• There are many types of sample.
1. Random sample — chance is the only factor in deciding who participates.
2. Stratiﬁed sample — the sample taken is chosen so that it has the same
characteristics as the whole population.
3. Systematic sample — there is a method for deciding who participates in the
sample.
4. Accessibility sample — those easily accessed form the sample.
5. Quota sample — the number in the sample is limited by the quota.
6. Judgemental sample — a judgement is made when deciding the sample.
7. Cluster sample — clusters within the population form the sample.
Bias
If the sample is poorly chosen the results of the investigation will be biased. This
means that the results will be skewed towards one section of the population.
Estimating populations
Populations that can’t be accurately counted with ease are estimated by using the
capture–recapture technique.
Contingency tables
• Display horizontal and vertical categories of two variables of a set of data
• Calculations with regard to a variety of reference bases can be made.
Applications of statistics and probability
• Frequency histograms can be used to estimate probabilities in data sets.
• Scatterplots display the relationship between two variables and allow predictions
to be made.
Measures of central tendency and spread
•
•
•
•
•
•
•
•
The mean, median and mode are measures of central tendency in a data set.
The mean is calculated by adding the scores then dividing by the number of scores.
The median is the middle score or average of two middle scores.
The mode is the most frequently occurring score.
The range, interquartile range and standard deviation are measures of spread.
The range is the difference between the highest and lowest scores.
The interquartile range is the difference between the upper and lower quartiles.
The standard deviation is found using the population or sample functions of a
calculator.
• An outlier is a score that is much less or much greater than the other scores.
60.
Maths A Yr 12 - Ch. 04 Page 226 Wednesday, September 11, 2002 4:07 PM
226
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
CHAPTER
review
4A
1 For each of the following statistical investigations, state whether a census or a survey has
been used.
a The average price of petrol in Townsville was estimated by averaging the price at 40
petrol stations.
b The Australian Bureau of Statistics has every household in Australia complete an
information form once every ﬁve years.
c The performance of a cricketer is measured by looking at his performance in every
match he has played.
d Public opinion on an issue is sought by a telephone poll of 2000 homes.
4A
2 multiple choice
4B
4B
3 Name and describe 5 different methods for selecting a sample.
4B
4B
5 Use your random number generator to select 10 numbers between 1 and 1000.
Which of the following is an example of a census?
A A newspaper conducts an opinion poll of 2000 people.
B A product survey of 1000 homes to determine what brand of washing powder is used
C Every 200th jar of Vegemite is tested to see if it is the correct mass.
D A federal election
4 Which method of sampling has been used for each of the following?
a The quality-control department of a tyre manufacturing company road tests every 50th
tyre that comes off the production line.
b To select the students to participate in a survey, a spreadsheet random number generator
selects the roll numbers of 50 students.
c An equal number of men and women are chosen to participate in a survey on fashion.
6 The table at right shows the number of students
in each year of school.
In a survey of the school population, how many
students from each year should be chosen, if a
sample of 60 is selected using a stratiﬁed
sample?
No. of students
7
212
8
200
9
189
10
175
11
133
12
4B
Year
124
7 To estimate the ﬁsh population of a lake, 100 ﬁsh are caught, tagged and released. The next
day another 100 are caught and it is noted that 5 have tags. Estimate the population of the
lake.
61.
Maths A Yr 12 - Ch. 04 Page 227 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
227
8 Kimberley has a worm farm. To estimate the population of her farm, she catches 150 worms
and tags them before releasing them. The next day, she catches 120 worms and ﬁnds that 24
of them have tags. Estimate the population of the worm farm.
4B
9 A sample of 200 ﬁsh are caught, tagged and released back into the population. Later Barry,
Viet and Mustafa each catch a sample of ﬁsh.
Barry caught 40 ﬁsh and 3 had tags.
Viet caught 75 ﬁsh and 9 had tags.
Mustafa caught 55 ﬁsh and 7 had tags.
a Find the estimate of the population that each would have calculated.
b Give an estimate for the population, based on all three samples.
4B
10 multiple choice
Which of the following is an example of a random sample?
A The ﬁrst 50 students to arrive at school take a survey.
B Fifty students’ names are drawn from a hat and those drawn take the survey.
C Ten students from each year of the school are asked to complete a survey.
D One class in the school is asked to complete the survey.
4B
11 Carolyn is a marine biologist. She spends the day on a boat and 500 ﬁsh are netted. Carolyn
notes the types of ﬁsh netted. There are 173 blackﬁsh, 219 drummer and 108 mullet. The
ﬁsh are tagged and released back into the school from which they were caught. Another 250
are then caught and it is noted that 63 have tags. Estimate the population of the school.
4B
12 Bias can be introduced into statistics through:
a questionnaire design
b sample selection
c interpretation of statistical results.
Discuss how bias could be a result of techniques in the above three areas.
4C
13 A medical test screens 200 people for a virus. A positive test result indicates that the patient
has the virus.
1. Of 50 people known to have the virus, the test produced 48 positive results.
2. Of the remainder who were known not to have the virus, the test produced one positive
result.
Use the above information to complete the table below.
4D
Test results
Accurate
Not accurate
Total
With virus
Without virus
Total
14 The results of a lie detector test are given below.
1. Of 80 people who are known to be telling the truth, the lie detector indicates that three
are lying.
2. Of 20 people known to be lying, the lie detector indicates that 17 are lying.
Display this information in a contingency table.
4D
62.
Maths A Yr 12 - Ch. 04 Page 228 Wednesday, September 11, 2002 4:07 PM
228
4D
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
15 Below are the results of a test screening for a disease. A positive test indicates that the
patient has the disease.
Test results
Accurate
Not accurate
Total
18
2
20
Without disease
108
12
120
Total
126
14
With disease
a How many people were tested for the disease?
b How many positive test results were recorded?
c What percentage of those people with the disease were correctly diagnosed by
the test?
d If a person without the disease is chosen at random, what percentage returned a positive
test?
4D
16 A reading test for people with dyslexia is given and the results are shown in the contingency
table below.
Test results
Accurate
Not accurate
Total
With dyslexia
39
1
40
Without dyslexia
85
5
90
124
6
Total
a How many people were tested?
b What percentage of people tested positive to dyslexia?
c Based on the above results, if a person with dyslexia takes the test, what is the
percentage chance that they will be accurately diagnosed?
4D
17 multiple choice
The contingency table below shows the results of a trial on new metal detectors for aircraft.
The metal detector scans a piece of hand luggage and lights up if metal is found.
Test results
Accurate
Not accurate
Total
9
1
10
Without metal
87
3
90
Total
96
4
With metal
Based on the above results, the chance of metal going undetected in a piece of hand luggage
is:
A 10%
B 25%
C 75%
D 90%
63.
Maths A Yr 12 - Ch. 04 Page 229 Wednesday, September 11, 2002 4:07 PM
Chapter 4 Populations, samples, statistics and probability
229
18 A medical test for a disease does not always give the correct result. A positive test indicates
that the patient has the disease. The contingency table below shows the results of a new
screening test for the disease. It was tested on a group of people, some of whom were
known to be suffering from the disease, some of whom were not.
4D
Test results
Accurate
Not accurate
Total
28
2
30
Without disease
164
6
170
Total
192
8
With disease
a
b
c
d
e
How many people were tested for the disease?
What percentage of the results were accurate?
How many patients tested positive to the disease?
What percentage of patients with the disease were correctly diagnosed by the new test?
Based on the above results, what is the probability that a patient with the disease will
have the disease detected by this test?
19 The contingency table below compares the number of men and women who are right- and
left-handed.
Men
Women
Totals
158
172
330
17
15
32
175
187
4D
362
Right-handed
Left-handed
Totals
a What percentage of males are left-handed?
b What percentage of females are left-handed?
c Based on the above data, is there any signiﬁcant difference between the percentage of
male and female left-handers?
20 multiple choice
The contingency table below shows the number of men and women who work in excess of
45 hours per week.
Men
Women
Totals
≤45 hours
132
128
260
>45 hours
69
34
103
Totals
201
162
363
The percentage of men who work greater than 45 hours per week is closest to:
A 28%
B 34%
C 51%
D 67%
4D
64.
Maths A Yr 12 - Ch. 04 Page 230 Wednesday, September 11, 2002 4:07 PM
230
M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d
21 Sixty-seven primary and 47 secondary school
students were asked their attitude to the number of
school holidays which should be given. They were
asked whether there should be more, fewer or the
same number. Five primary students and 2 secondary
students wanted fewer holidays, 29 primary and 9
secondary students thought that they had enough
holidays (that is, they chose the same number) and
the rest thought that they needed to be given more
holidays.
a Form a contingency table with reference to
primary and secondary school percentages.
b Use your table to compare the opinions of
primary and secondary school students.
4E
22 Consider the data set represented by the frequency histogram on
the right.
a Are the data symmetrical?
b Can the mean and median of the data be seen?
c What is the mode of the data?
d Which score has the highest probability of occurring?
4E
Frequency
4D
8
7
6
5
4
3
2
1
0
15 16 17 18 19 20
Score
23 The table below shows the number of attempts that 20 members of
a Year 12 class took to obtain a driver’s licence.
Number of attempts
Frequency
1
11
2
4
3
2
4
2
5
0
6
1
a Show these data in a frequency histogram.
b Are the data symmetrical?
c What is the probability of a student of the class taking more than three attempts to obtain
a driver’s licence?
24 Consider this data set which measures the sales ﬁgures for a new salesperson.
Day (d)
CHAPTER
test
yourself
4
1
2
3
4
5
6
7
8
Units sold (s)
1
2
4
9
20
44
84
124
a Draw the line of best ﬁt.
b Use your line to predict the sales ﬁgures for the tenth day.
Be the first to comment