Year 12 Maths A Textbook - Chapter 4
Upcoming SlideShare
Loading in...5
×
 

Year 12 Maths A Textbook - Chapter 4

on

  • 866 views

 

Statistics

Views

Total Views
866
Views on SlideShare
866
Embed Views
0

Actions

Likes
0
Downloads
37
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Year 12 Maths A Textbook - Chapter 4 Year 12 Maths A Textbook - Chapter 4 Document Transcript

  • Maths A Yr 12 - Ch. 04 Page 167 Wednesday, September 11, 2002 4:07 PM 4 Populations, samples, statistics and probability syllabus reference eference Strand: Statistics and probability Core topic: Exploring and understanding data In this chapter chapter 4A 4B 4C 4D 4E Populations and samples Samples and sampling Bias Contingency tables Applications of statistics and probability
  • Maths A Yr 12 - Ch. 04 Page 168 Friday, September 13, 2002 9:19 AM 168 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Populations and samples Early population counts were musters, where community members were gathered and counted. In 1828, the first Australian census was conducted in New South Wales. Each State conducted its own separate census until 1886, five years after the first simultaneous census of the British Empire. In 1901, a common census was conducted throughout Australia; however, the results were not collated to form a total for Australia. The Census and Statistics Act of December 1905 provided that: ‘The Census shall be taken in the year 1911, and in every tenth year thereafter.’ During the Depression and World War II, no census was taken. The first post-war census took place in Australia in 1947. The types of questions have changed over time to reflect the changes in our society. The time required to process the responses to the questions has been reduced with the introduction of Optical Mark Reading machines (1991 census) and Intelligent Character Recognition machines which can read handwritten words in the 2001 census. Since 1961, a census has been held every five years, and the fourteenth national Census of Housing and Population was held on 7 August 2001. The 2001 census coincided with Australia’s Centenary of Federation. Participants were given the opportunity to place their census forms in a time capsule (to be held by the National Archives) for 99 years. Descendants would then have a glimpse into the lives of their forebears. Many of the skills required for this chapter were developed in Year 11 (chapters 9 and 10 of Maths Quest Maths A Year 11). Revise the methods by completing the following exercises. 1 Write each of the following as a decimal (correct to 3 decimal places). 1 ------------a 3 b ----c 65 d 124 210 80 12 8 2 Convert each of the following to percentages. -a 3 b 0.125 c 4 85 -------200 d 0.04 3 Use your calculator to generate a set of 10 random integers in the range: a 1 to 20 inclusive b 50 to 100 inclusive. 4 Round the following numbers to integers. a 3.6 b 4.02 c 2.91 d 6.5 e 0.9 5 Find the unknown in each of the following. 1 2 3 b 2 5 a -- = -b -- = ----c -- = -4 a 7 21 9 c 5 2 d -- = -d 7 e f 7 -- = -9 6 6 What types of features on a graph can cause it to be misleading? Work ET SHE 4.1 7 For the following sets of scores x: 6, 9, 8, 7, 6, 5, 8, 11, 6, 7 Calculate: a Σx b – x d mode e lower quartile g range h inter-quartile range. c f median upper quartile
  • Maths A Yr 12 - Ch. 04 Page 169 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability es in ion v t i gat n inv io es 169 Australia’s population and housing census It is important that we understand the reason for recording statistical data accurately. In our society, it is difficult to imagine a world without statistics. Try to imagine a State of Origin football match where no one kept the score! The excitement of the game would probably hold our attention for a while, but if no score was recorded, winning or losing would not be an issue, and we would soon lose interest. A census is an example of information collected from the whole population. It is not always possible or feasible to conduct a questionnaire on the whole population, so when this opportunity arises, it is vital to ensure that the questions are carefully worded and that relevant information is sought. The Australian Bureau of Statistics (ABS) is the government department responsible for administering the Australian census, then collating and analysing the responses. Their website <www.jaconline.com.au/maths/weblinks> details information about their role and it displays statistical data from many areas. Access this site to conduct your research. Prepare a report providing responses to the following: 1 What is a national housing and population census? 2 Who takes part? 3 Is it compulsory to take part? 4 What types of questions are asked in the census? How have they changed over the years? 5 Why should we have a census? 6 Who has access to the information we provide? 7 How is the census conducted? 8 Conclude your report with an expression of your opinion (agreement/ disagreement) of the answers gathered from your research. Provide constructive suggestions to improve any aspect of the gathering, collating and analysing of the census data. Populations A census represents information or data collected from every member of the population. The term population does not necessarily represent a group of people; it is also used to represent a group of objects with the same defined characteristics. So, the population under study may be the wildlife in a national forest, the number of wattle trees in a park, the soil in a farmer’s field or the number of cars in a country town. In some cases it may be possible to determine the exact extent of the population (the number of wattle trees in the park or the number of cars in a country town); however, it is often not possible to obtain an exact figure for the population (the extent of the wildlife in a forest) because circumstances are constantly changing. Sometimes it is not physically possible to consider the whole population (all the soil in a farmer’s field), as it would not be practical. It is often very costly and time consuming to consider the whole population in a study. For these reasons, we need to obtain information about the population by selecting a sample that can then be studied. A census is conducted when we obtain information from the whole population; however, a survey is conducted on a sample of the population. t i gat
  • Maths A Yr 12 - Ch. 04 Page 170 Wednesday, September 11, 2002 4:07 PM 170 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d The particular circumstances determine the status of the body being studied (whether it represents the population or a sample of the population). Consider, for example, your Mathematics A class. If we were to try to determine the number of left-handed people in your school who studied Mathematics A, and there was only one such class in your school, then your class would be regarded as the whole population. If, on the other hand, there were several Mathematics A classes in your school, then your class would be considered a sample of the population. Samples It is most important when selecting a sample from a population that the sample represents the population as closely as possible. For this to occur, the characteristics of the sample should occur in the same proportions as they do in the population. There is little point in selecting a sample where this is not the case, for analysis of the sample would lead to misleading conclusions. We often see this occurring when polls are conducted prior to an election. Quite frequently they predict a particular outcome while the election results in a different outcome. WORKED Example 1 In each of the following, state if the information was obtained by census or survey. a A school uses the roll to count the number of students absent each day. b The television ratings, in which 2000 families complete a questionnaire on what they watch over a one-week period. c A light globe manufacturer tests every hundredth light globe off the production line. d A teacher records the examination marks of her class. THINK WRITE a Every student is counted at roll call each morning. b Not every family is asked to complete a ratings questionnaire. c Not every light globe is tested. d The marks of every student are recorded. a Census b Survey c Survey d Census remember remember 1. Before beginning a statistical investigation it is important to identify the target population. 2. The information can be obtained either by: (a) Census — the entire target population is questioned, or (b) Survey — a population sample is questioned such that those selected are representative of the entire target population.
  • Maths A Yr 12 - Ch. 04 Page 171 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 4A WORKED Example 1 171 Populations and samples 1 Copy and complete the following: When we obtain data from the whole population, we conduct a _______________; however, a survey obtains data from a _______________ of the population. 2 A school conducts an election for a new school captain. Every teacher and student in the school votes. Is this an example of a census or a survey? Explain your answer. 3 A questionnaire is conducted by a council to see what sporting facilities the community needs. If 500 people who live in the community are surveyed, is this an example of a census or a survey? 4 For each of the following, state whether a census or a survey has been used. a Two hundred people in a shopping centre are asked to nominate the supermarket where they do most of their grocery shopping. b To find the most popular new car on the road, 500 new car buyers are asked what make and model car they purchased. c To find the most popular new car on the road, the make and model of every new car registered are recorded. d To find the average mark in the mathematics half-yearly examination, every student’s mark is recorded. e To test the quality of tyres on a production line, every 100th tyre is road tested. 5 For each of the following, recommend whether you would use a census or a survey to find: a the most popular television program on Monday night at 7.30 pm b the number of cars sold during a period of one year c the number of cars that pass through the tollgates on the Brisbane Gateway Bridge each day d the percentage of defective computers produced by a company. 6 An opinion poll is conducted to try to predict the outcome of an election. Two thousand people are telephoned and asked about their voting intention. Is this an example of a census or a survey? Samples and sampling When we select a sample from a population, if it has been chosen carefully, it should, upon analysis of the data, yield the same (or very similar) results to those of the population. A decision must be made regarding the size of the sample. In practice, the size chosen is the smallest one that would be considered appropriate in those circumstances and the size that would yield a proportion of the elements close to that occurring in the population.
  • Maths A Yr 12 - Ch. 04 Page 172 Friday, September 13, 2002 9:19 AM 172 es in ion v t i gat n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Sample size The aim of this investigation is to observe how the composition of a sample is affected by the sample size. 1 Take a large packet of mixed coloured jellybeans (200 or more) as the population. (Coloured disks could be substituted.) 2 Place the jellybeans in a container and mix well. Without looking, draw out a sample of 10 in such a way that each jellybean has an equal chance of being selected. This can then be considered a random sample. Count the number of red jellybeans in the sample of 10. 3 Return the sample of 10 jellybeans to the container, mixing them well with the others. Select a random sample of 20 jellybeans, using the same method as before; record the number of red ones. 4 Continue in this manner, returning each sample to the container, mixing them well, then selecting a sample containing 10 more than the previous selection. Record the number of red jellybeans in each of the samples. 5 Generate a table of the format below: Sample size Number of red jellybeans 10 Proportion of red jellybeans (as a decimal) 20 30 … 200 Whole population 6 Enter the data in the first and third columns (sample size and proportion) into a spreadsheet or graphics calculator. Graph the sample size against the proportion of red jellybeans. (Alternatively, this could be graphed on graph paper.) 7 Knowing that the proportion of red jellybeans in the whole population (the final row in the table above) represents the true answer, comment on the effect of the sample size on the composition of the sample. 8 For your particular experiment, what would be the minimum sample size which closely resembles the composition of the population? 9 If a sample is used to predict the composition or characteristics of a population, describe what you feel are the desirable qualities of the sample in order to be a reliable predictor of the composition or characteristics of the population. 10 Repeat the experiment. Comment on the similarities/differences in your results. Sampling methods Several techniques can be employed to select a sample from a population. Some common methods are random sampling, accessibility sampling, systematic sampling, quota sampling, judgmental sampling, stratified sampling, cluster sampling, and capture–recapture sampling. t i gat
  • Maths A Yr 12 - Ch. 04 Page 173 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 173 Random sampling A simple random sample is one for which each element of the population has an equal chance of being chosen. A way in which this can be achieved is by numbering each element of the population then randomly selecting items for the sample by using random digit tables, the random function on a calculator or numbers drawn from a container. es in ion v t i gat n inv io es Random sampling The aim of this investigation is to compare different random sampling techniques as methods of selecting a sample that is representative of the population. Consider selecting a random sample of ten (10) students from your mathematics A class. (Your class is the population in this investigation. You may adjust the sample size if you wish.) 1. Select a characteristic that is present in some of your class members such as brown eyes, fair hair, height above 175 cm and so on. 2. Calculate and record the proportion of the population in your class with this characteristic. 3. Have the students number off 1, 2, 3, … until all students have a number. This number for our purposes may be regarded as the population number. Task 1 Using random digit tables to select a sample Tables of randomly generated digits are published. Below are samples of sets of two-, three- and four-digit random number tables. These tables are generally much larger than the extracts shown. For our purposes, this size will be sufficient. Two-digit random number table 16 79 43 59 41 16 39 29 11 12 13 54 24 09 46 24 93 53 28 82 25 56 61 15 97 82 65 77 94 82 85 41 99 74 09 05 98 89 72 10 71 51 35 29 52 52 89 02 92 96 02 81 92 89 17 08 04 63 43 03 84 67 19 23 43 11 05 17 08 07 36 36 72 21 86 99 28 41 24 22 23 04 78 05 33 01 66 06 04 57 80 22 99 14 89 15 65 19 06 25 t i gat
  • Maths A Yr 12 - Ch. 04 Page 174 Wednesday, September 11, 2002 4:07 PM 174 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Three-digit random number table 382 093 530 260 651 344 157 738 522 592 452 981 272 886 907 683 894 946 831 521 557 374 900 425 461 145 098 792 793 388 694 914 642 153 901 642 100 851 365 840 435 104 419 685 626 383 326 376 246 586 851 474 369 272 566 488 420 696 272 547 869 681 282 129 194 236 467 014 699 196 895 662 376 612 435 080 818 396 572 809 282 274 363 903 771 370 799 277 636 313 464 680 859 249 093 848 370 303 661 495 Four-digit random number table 4070 8145 3435 0891 8504 6691 5329 3729 6800 6262 7368 6927 7980 6625 7301 0145 8729 8145 5299 3951 8859 8070 3664 1177 1821 3729 0064 3715 8166 2427 3065 6791 3344 9357 2928 3807 7301 9513 0058 4049 6776 6603 1700 5233 0925 5817 3709 3213 1282 8856 7977 8319 6074 7955 2059 4763 8885 8565 0755 5087 4843 0033 1948 2371 5640 9865 2105 5484 8890 6160 7678 3588 7213 5572 6939 2544 2461 3232 9394 0253 8521 9289 5756 9137 6540 5741 1777 2149 4079 5279 9895 0709 0323 7394 5003 2494 6829 4634 3586 6238 The following rules apply to the use of random digit tables. Step 1 Begin at any position in the table (this position being chosen randomly). Step 2 Move in any direction (vertically, horizontally) along a column or row. Step 3 Continue moving in this direction, recording the numbers as you go. Step 4 If you use a three-digit or four-digit table, and you require only one or two-digit numbers in your selection, you may choose to use the digit/s on the left, on the right, on the edges and so on. (Make some random choice, and then be consistent.) Step 5 If the same number is repeated, do not record the number a second time. Step 6 Continue recording until the required total has been reached.
  • Maths A Yr 12 - Ch. 04 Page 175 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 175 1 Use the two-digit random number table to select ten numbers within the range of numbers in your class. 2 Determine the students in your class to whom these numbers refer. 3 Calculate and record the proportion of these students with the characteristic you chose. How closely does it match the population proportion which you have previously calculated? 4 Repeat the experiment using the three-digit random number table, calculating and recording the proportion of students in this sample with your chosen characteristic. 5 Repeat using the four-digit random number table, again calculating and recording the proportion. 6 Compare the results obtained from your three samples with each other and with the population proportion. What conclusion/s can you form? Task 2 Using the random function on a calculator 1 Many scientific and graphing calculators can be set to generate random integers in the range of your population number. (Your teacher will show you, if you are unsure.) 2 Use your calculator to generate ten different random integers. 3 Relate these numbers to specific students in your class. 4 Calculate and record the proportion of students in your sample with your chosen characteristic. 5 Compare this value with the population proportion. Task 3 Using lot sampling This type of sampling is used in drawing lotto winning numbers. 1 Write numbers (up to and including your population number) on small, equally sized pieces of paper and place them in a container. Mix well. 2 Draw the numbers one at a time (without replacing them) until ten numbers have been drawn. 3 Relate these numbers to the relevant students, as before. 4 Calculate and record the proportion of students with your chosen characteristic in this sample. 5 Compare the value with the population proportion. Conclusions 1 Draw up a table to display the results of all your experiments. 2 Compare the results obtained using the various techniques. 3 Did you find one method better than any other? 4 How did the results using these three methods compare with the population result?
  • Maths A Yr 12 - Ch. 04 Page 176 Wednesday, September 11, 2002 4:07 PM 176 es in ion v t i gat n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Generating random integers using a spreadsheet This activity creates a spreadsheet to generate random integers (whole numbers) within a given range. Consider the spreadsheet below. 1 Enter the headings in cells A1, A3, A4, A6, A7, A8 and A10. 2 Leave cells B7 and B8 blank. You will enter values in these cells once you run the spreadsheet. 3 In cell B11, enter the formula =INT(RAND()*($B$8-$B$7+1))+$B$7). This formula will generate a random integer in the range of the value entered in cell B7 to the value entered in cell B8 inclusive. (You will not find a correct value appears until you enter values in cells B7 and B8.) 4 Copy this formula to the region B11 to K20. This will generate 100 random integers in this region. 5 The function F9 will recalculate different sets of random integers. Add this instruction to cell A22. 6 Enter values in B7 and B8. Notice the set of integers produced. Press the F9 key to generate a different set. Continue to generate new sets, making sure that the numbers generated are within the range of those entered in cells B7 and B8. You will find that if you generate large integers you may have to widen columns B to K. 7 Save your spreadsheet and obtain a printout. 8 You may wish to use this spreadsheet for generating two-, three- and four-digit random number tables for your own use. t i gat
  • Maths A Yr 12 - Ch. 04 Page 177 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 177 WORKED Example 2 Use the three-digit random number table on page 174 to select ten students from a numbered class of 30 according to the following rules. Rule 1 Start in the bottom left-hand corner. Rule 2 Snake up and down the columns. Rule 3 Select the two digits on the right as the student number Note: Use the three-digit random number table from page 174. THINK 1 2 3 WRITE The selected numbers must be in the range 1 to 30 inclusive. Moving up the first column on the left, reading the last two digits, there are no numbers in the range. Continue by snaking down the second column and so on, until 10 numbers have been selected (ignore the second occurrence of a number). Give 10 student numbers. Students selected have numbers 14, 4, 19, 30, 25, 29, 12, 3, 26 and 1. Accessibility sampling This method of sampling selects those items that are most accessible. Consider the following: 1. The student body in a school is investigating extended hours for the library. A survey, conducted on students who were using the library after hours, overwhelmingly supported the proposal for extended hours. 2. The same survey, conducted on students in the gymnasium after school, indicated no need for extending the library hours. As can be seen from this example, bias can be introduced into the results of a survey by carefully selecting the sample to either support or refute a cause. A sample drawn only from easily accessible items often does not represent the views of the population. Systematic sampling In this method, the sample items are selected using some system, such as every tenth item, every item on the top left-hand corner of a page or every item in the fifth position of a set of lists. Consider a survey to determine the most popular service provider for Internet subscribers. The sample could be selected by ringing every: • one hundredth name in the telephone directory • last name on each page • name at the top of each list of names on every page. Using the telephone directory to obtain survey data has the obvious disadvantage of excluding those who do not own a telephone and those with an unlisted telephone number. When using a systematic method as a sampling technique, best results are obtained if the population is first arranged randomly.
  • Maths A Yr 12 - Ch. 04 Page 178 Wednesday, September 11, 2002 4:07 PM 178 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Quota sampling The quota technique specifies a particular number of items to be surveyed. Consider the following scenario: A group of businessmen is considering establishing a grammar school (no religious affiliations) in a town of approximately 50 000 people. They decide to conduct a market survey on 1000 people. They specify the composition of the sample as follows. • The group should have 500 males and 500 females. • Within each of those groups, half should be school children and half adult. • The groups must contain people of all religious denominations — not in equal proportions, as they do not occur equally in the community. • From the 500 children selected, there should be 200 from non-government schools and 300 from government schools. Within these specified quotas, the person responsible for choosing the sample can use any sampling strategy. This leaves the sample open to bias, depending on the integrity of those selecting the sample. It also enables substitutions to occur when those originally selected for the sample are not readily accessed. Bearing in mind the problems associated with this type of sampling, this method can prove to be cost-effective and quite reliable in its predictions if the composition of the sample is appropriate and the sample is selected in an unbiased manner. Judgmental sampling Using this method of obtaining a sample, the person conducting the survey must make a judgment as to the composition of the sample. This obviously is reliant on the good judgment of those selecting the sample. Consider, for instance, undertaking a survey on the bus service(s) (or lack thereof) in a city or town. If a judgment was made to select the sample from only those who used the service(s), this could result in an entirely different outcome from what might occur if there had been a balance from both bus-users and non bus-users. Consequently, we should be wary of surveys conducted using this technique (although we probably would not be aware that this technique was the method used). It is timely to reinforce the fact that, when bombarded with statistical facts (and this occurs in our lives daily) we should not accept these figures without question. Stratified sampling When a sample is selected from a population consisting of various strata, or levels, it is important to have the strata or levels in the sample occurring in the same proportions as they do in the population. Consider the situation where a student council body is to be formed from Years 8 to 12 students in a school. It would not seem fair to have an equal number from each of the year levels if there were, for instance, twice as many Year 9 students as there were Year 12 students.
  • Maths A Yr 12 - Ch. 04 Page 179 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 179 WORKED Example 3 The number of students in a school is shown in the table. A student council is to be formed, consisting of 15 members of the student body. The composition of the council must reflect the proportions in the population. How many from each year level should be chosen? Year level Number of students 2 11 10 180 165 200 WRITE Find the total number of students. Determine the proportion in each year level. Total number of students = 750 Proportion of students: 85 Year 12s = -------750 Year 11s = Year 10s = Year 9s = Year 8s = 3 120 8 1 85 9 THINK 12 Multiply these proportions by the sample number and construct a table. Round if necessary. Year level 120 -------750 180 -------750 165 -------750 200 -------750 Number of students Number in sample 6 Write the answer. 85 -------750 × 15 = 1.7; i.e. 2 11 120 120 -------750 × 15 = 2.4; i.e. 2 10 180 180 -------750 × 15 = 3.6; i.e. 4 165 165 -------750 × 15 = 3.3; i.e. 3 200 200 -------750 × 15 = 4;.0 i.e. 4 Total Check to ensure that the total sample number is 15. 85 8 5 12 9 4 750 15 Two students should be chosen from Year 12, two from Year 11, four from Year 10, three from Year 9 and four from Year 8. Cluster sampling This method involves selecting clusters within a population and selecting a sample from within these clusters. The subgroups selected from the population should be identified. Consider the situation where a market survey is to be conducted on the cost of tertiary education. If the chosen clusters included only ones situated in poorer areas of the community, the results of the survey would differ vastly from those occurring from cluster groups consisting of only ones from more affluent areas. It is important,
  • Maths A Yr 12 - Ch. 04 Page 180 Wednesday, September 11, 2002 4:07 PM 180 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d therefore, if using this method of sampling, that the clusters are chosen to be as closely representative of the population as possible. If this is the case, the survey can yield quite reliable results in a far shorter time and at a greatly reduced cost when compared with collecting data from the whole population. Capture–recapture sampling Capture–recapture sampling is particularly useful for estimating populations of items that are difficult or impossible to count, such as plants and animals. It is often necessary to monitor wildlife numbers to prevent the occurrence of plagues and the extinction of species. The technique used is to capture a certain number of the species, tag them, then release them. At a later date, another sample is caught and the number of tagged specimens in the sample observed. From this information, the population of the species can be estimated. This method is best illustrated with a practical example. es in ion v t i gat n inv io es Capture–recapture technique Place a large number of different coloured disks in a container (without counting the number of each colour or the total). The various colours can represent the variety of wildlife (say, fish in a dam). We are interested in determining the number of yellow-belly fish in the dam (represented by the red-coloured disks in the container). 1 Mix the disks thoroughly. 2 Draw out two handfuls of disks; mark the red disks to identify them as being tagged. Count the number of tagged red disks and let this number be t. 3 Replace all these disks in the container; mix well. 4 Draw out a handful of disks; count the number of red disks in the sample (rs); count the number of tagged red disks in the sample (ts). Replace all the disks in the container and mix well. 5 Repeat the process of drawing out a handful of disks, counting the number of red disks and the number of tagged red disks in each sample. Continue until data for ten (10) samples have been obtained. 6 Draw up the table below to collate the sample data. Trial Number red tagged (ts) Number red (rs) 1 2 3 4 5 6 7 8 9 10 Total Σ ts = Σ rs = t i gat
  • Maths A Yr 12 - Ch. 04 Page 181 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 181 7 For a sufficiently large number of trials, we could say that these samples represent the population in miniature. This means that the ratio of tagged red disks in the samples to the number of red disks in the samples should be close to the ratio of the total number of red tagged disks in the population to the total number of red disks (r). ∑ ts t --------- = ∑ rs r Substitute the three known values in the equation; solve to determine the value of ‘r’. (For a capture–recapture of tagged yellow-belly, r would then represent an estimate for the number of yellow-belly fish in the dam.) 8 Count the number of red disks in the container. How close was this number to your estimate, calculated above? This method of sampling and population estimation obviously has its limitations. For the yellow-belly example, we are assuming that the situation in the lake remains relatively stable; that the types of fish are uniformly distributed throughout the lake; that the number of births is roughly equal to the number of deaths and that intensive fishing has not occurred in the lake over the time period between tagging and recapture. Any calculations of populations of species in the wild must be considered as estimates, as we can not be certain of exact numbers. WORKED Example 4 In estimating the number of fish in a lake, 500 fish are caught, tagged then released back into the dam. A week later a batch of 80 fish are caught and 25 of them are found to be tagged. Estimate the number of fish in the dam. THINK 1 The proportion of tagged fish in the population closely resembles the proportion of tagged fish in the sample caught later. 2 Form an equation. 3 Solve the equation. 4 Write the estimate. WRITE Number tagged fish Number tagged in sample ---------------------------------------------- = ------------------------------------------------------------Population of fish Size of sample 500 25 -------- = ----p 80 25p = 500 × 80 500 × 80 p = -------------------25 = 1600 There are an estimated 1600 fish in the dam.
  • Maths A Yr 12 - Ch. 04 Page 182 Wednesday, September 11, 2002 4:07 PM 182 es in ion v t i gat n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d ABS interviewer survey The ABS conducts a census every five years. To monitor changes that might occur between these times, surveys are conducted on samples of the population. The ABS selects a representative sample of the population and interviewers are allocated particular households. It is important that no substitutes occur in the sampling. The interviewer must persevere until the selected household supplies the information requested. It is a legal requirement that selected households cooperate. The following questionnaire is reproduced from the ABS website <www.jaconline.com.au/maths/weblinks>. It illustrates the format and types of questions asked by an interviewer collecting data regarding employment from a sample. MINIMUM SET OF QUESTIONS WHEN INTERVIEWER USED — Q1 to Q17 Q.1. I WOULD LIKE TO ASK ABOUT LAST WEEK, THAT IS, THE WEEK STARTING MONDAY THE … AND ENDING (LAST SUNDAY THE …/YESTERDAY). Q.2. LAST WEEK DID … DO ANY WORK AT ALL IN A JOB, BUSINESS OR FARM? K Go to Q.5 Yes K No K No More Questions Permanently unable to work K No More Questions Permanently not intending to work (if aged 65+ only) Q.3. LAST WEEK DID … DO ANY WORK WITHOUT PAY IN A FAMILY BUSINESS? K Go to Q.5 Yes K No K No More Questions Permanently not intending to work (if aged 65+ only) Q.4. DID … HAVE A JOB, BUSINESS OR FARM THAT … WAS AWAY FROM BECAUSE OF HOLIDAYS, SICKNESS OR ANY OTHER REASON? K Yes K Go to Q.13 No K No More Questions Permanently not intending to work (if aged 65+ only) Q.5. DID … HAVE MORE THAN ONE JOB OR BUSINESS LAST WEEK? Yes K No K Go to Q.7 Q.6. THE NEXT FEW QUESTIONS ARE ABOUT THE JOB OR BUSINESS IN WHICH … USUALLY WORKS THE MOST HOURS. Q.7. DOES … WORK FOR AN EMPLOYER, OR IN … OWN BUSINESS? Employer K Own business K Go to Q.10 Other/Uncertain K Go to Q.9 Q.8. IS … PAID A WAGE OR SALARY, OR SOME OTHER FORM OF PAYMENT? Wage/Salary K Go to Q.12 Other/Uncertain K t i gat
  • Maths A Yr 12 - Ch. 04 Page 183 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability Q.9. 183 WHAT ARE … (WORKING/PAYMENT) ARRANGEMENTS? K Go to Q.13 Unpaid voluntary work K Contractor/Subcontractor K Own business/Partnership K Commission only K Go to Q.12 Commission with retainer K Go to Q.12 In a family business without pay K Go to Q.12 Payment in kind K Go to Q.12 Paid by the piece/item produced K Go to Q.12 Wage/salary earner K Go to Q.12 Other Q.10. DOES … HAVE EMPLOYEES (IN THAT BUSINESS)? Yes K No K Q.11. IS THAT BUSINESS INCORPORATED? Yes No K K Q.12. HOW MANY HOURS DOES … USUALLY WORK EACH WEEK IN (THAT JOB/ THAT BUSINESS/ALL … JOBS)? 1 hour or more K No More Questions Less than 1 hour/no hours K Insert occupation questions if required Insert industry questions if required Q.13. AT ANY TIME DURING THE LAST 4 WEEKS HAS … BEEN LOOKING FOR FULL-TIME OR PART-TIME WORK? Yes, full-time work K Yes, part-time work K No K No More Questions Q.14. AT ANY TIME IN THE LAST 4 WEEKS HAS … Written, phoned or applied in person to an employer for work? Answered an advertisement for a job? Looked in newspapers? Yes No Checked factory notice boards, or used the touchscreens at Centrelink offices? K K K K K AT ANY TIME IN THE LAST 4 WEEKS HAS … Been registered with Centrelink as a jobseeker?K Checked or registered with an employment K agency? K Done anything else to find a job? K Advertised or tendered for work K Contacted friends/relatives K No More Questions Other K No More Questions Only looked in newspapers K No More Questions None of these Q.15. IF … HAD FOUND A JOB COULD … HAVE STARTED WORK LAST WEEK? Yes K No K Don’t know K
  • Maths A Yr 12 - Ch. 04 Page 184 Wednesday, September 11, 2002 4:07 PM 184 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Remaining questions are only required if Duration of Unemployment is needed for output or to derive the long term unemployed. Q.16. WHEN DID … BEGIN LOOKING FOR WORK? Enter Date ......./......./....... Less than 2 years ago DD MM YY ......./......./....... 2 years or more ago DD MM YY ......./......./....... DD MM YY 5 years or more ago Did not look for work K Q.17. WHEN DID … LAST WORK FOR TWO WEEKS OR MORE? Enter Date ......./......./....... Less than 2 years ago DD MM YY ......./......./....... 2 years or more ago DD MM YY ......./......./....... DD MM YY 5 years or more ago Has never worked (for two weeks or more) K No More Questions Reading the questionnaire carefully you will note that, although the questions are labelled 1 to 17, there are only fifteen (15) questions requiring answers (two are introductory statements to be read by the interviewer). Because of directions to forward questions, no individual would be asked all fifteen questions. 1 How many questions would be asked of those who have a job? 2 How many questions would unemployed individuals answer? 3 How many questions apply to those not in the labour force? Choose a topic of interest to you and conduct a survey 1 Design an interview questionnaire of a similar format to the ABS survey, using directions to forward questions. 2 Decide on a technique to select a representative sample of the students in your class. 3 Administer your questionnaire to this sample. 4 Collate your results. 5 Draw conclusions from your results. 6 Prepare a report which details the: a aim of your survey b design of the survey c sample selection technique d results of the survey collated in table format e conclusions.
  • Maths A Yr 12 - Ch. 04 Page 185 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 185 remember remember There are many methods for selecting a sample. Some important methods include: 1. Random sample — chance is the only factor in deciding who is surveyed. This is best done using a random number generator. 2. Stratified sample — those sampled are chosen in proportion to the entire population. 3. Systematic sample — a system is used to choose those who are to be in the sample. 4. Accessibility sample — those within easy access form the sample. 5. Quota sample — a quota is imposed on the number in the sample. 6. Judgmental sample — judgment is made regarding those to be sampled. 7. Cluster sample — those in the sample are chosen from clusters within the population. 8. Capture–recapture sample — used to estimate wildlife populations. 4B WORKED Example 2 Samples and sampling 1 Use the two-digit random number table on page 173. Start at the bottom left-hand corner then snake up and down the columns selecting 10 numbers in the range 50 to 99. 2 Use your calculator to generate 10 random integers in the range 50 to 99. 3 Use your calculator to generate a set of random two-digit integers in the range 01 to 99. Write these numbers in table format. Use your table (and some random selection technique) to select 10 random integers in the range 50 to 99. 4 Compare your answers to questions 1, 2 and 3. Does it appear that three different sets of random numbers resulted? 5 Describe the techniques employed to select samples in each of the following situations. a Drawing student numbers from a hat to select those to attend the athletics carnival b Choosing the best student in each class to form a student council body c Interviewing the students at the school tuck-shop for an opinion regarding the school uniform d Selecting those students in a classroom sitting next to a window to form a debating group e Selecting one quarter of the students from each year level to represent the school at a local function 6 For each of the following, state whether the sample used is an example of random, stratified or systematic sampling. a Every tenth tyre coming off a production line is tested for quality. b A company employs 300 men and 450 women. The sample of employees chosen for a survey contains 20 men and 30 women. c The police breathalyse the driver of every red car. d The names of the participants in a survey are drawn from a hat. e Fans at a football match fill in a questionnaire. The ground contains 8000 grandstand seats and 20 000 general admission seats. The questionnaire is then given to 40 people in the grandstand and 100 people who paid for a general admission seat.
  • Maths A Yr 12 - Ch. 04 Page 186 Wednesday, September 11, 2002 4:07 PM 186 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 7 multiple choice Which of the following is an example of a systematic sample? A The first 20 students who arrive at school each day participate in the survey. B Twenty students to participate in the survey are chosen by a random number generator. C Twenty students to participate in the survey are selected in proportion to the number of students in each school year. D Ten boys and 10 girls are chosen to participate in the survey. 8 multiple choice Which of the following statistical investigations would be practical to complete by census? A A newspaper wants to know public opinion on a political issue. B A local council wants to know if a skateboard ramp would be popular with young people in the area. C An author wants a cricket player’s statistics for a book being written. D An advertising agency wants to know the most watched program on television. WORKED Example 3 9 The table below shows the number of students in each year at a NSW secondary school. Year 7 8 9 10 11 12 Total No. of students 90 110 90 80 70 60 500 If a survey is to be given to 50 students at the school, how many from each Year should be chosen if a stratified sample is used? 10 A company employs 300 men and 200 women. If a survey of 60 employees using a stratified sample is completed, how many people of each sex participated? 11 The table below shows the age and gender of the staff of a corporation. Age Male Female 20–29 61 44 30–39 40 50 40–49 74 16 50–59 5 10 A survey of 50 employees is to be done. Using a stratified survey, suggest the breakdown of people to participate in terms of age and sex. 4.1 12 The fish population of a river is to be estimated. A sample of 400 fish are caught, tagged and released. The next day another sample of 400 fish are caught and 40 of 4 them have tags. Estimate the fish population of the river. WORKED Example 13 A colony of bats live near a school. Wildlife officers try to estimate the bat population by catching 60 bats and tagging them. These bats are then released and another 60 are caught, 9 of which had tags. Estimate the size of the bat population living near the school.
  • Maths A Yr 12 - Ch. 04 Page 187 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 187 14 A river’s fish population is to be estimated. On one day 1000 fish are caught, tagged and released. The next day another 1000 fish are caught. Estimate the population of the river if in the second sample of fish: a 100 had tags b 40 had tags c 273 had tags. 15 A certain fish population is said to be endangered if the population falls below 15 000. A sample of 1000 fish are caught, tagged and released. The next day another sample of 1200 fish are caught, 60 of which had tags. Is the fish population endangered? 16 To estimate the population of a lake, 300 fish were caught. These 300 fish (150 trout, 100 bream and 50 perch) were tagged and released. A second sample of fish were then caught. Of 100 trout, 24 had tags; of 100 bream, 20 had tags; and of 100 perch, 8 had tags. a Estimate the number of trout in the lake. b Estimate the number of bream in the lake. c Estimate the number of perch in the lake. 17 The kangaroo population in a national park is to be estimated. On one day, 100 kangaroos were caught and tagged before being released. (Note: For each sample taken, the kangaroos are released after the number with tags is counted.) a The next day 100 were caught, 12 of which had tags. Estimate the population. b The following day another estimate was done. This time 200 were caught and 20 had tags. Estimate the population again. c A third estimate was done by catching 150 and this time 17 had tags. What will the third estimate for the population be? d For a report, the average of the three estimates is taken. Calculate this average. 1 For each of the following (1 to 3), state whether a census or survey has been used. 1 A school votes to elect a school captain. 2 Five hundred drivers complete a questionnaire on the state of a major highway. 3 All insurance customers complete a questionnaire when renewing their policies. For each of the following (4 to 10), state the type of sample that has been taken. 4 A computer selects 500 phone numbers. 5 Every one thousandth person in the telephone book is selected. 6 Private and business telephone numbers are chosen in proportion to the number of private and business listings. 7 Residents from three suburbs are selected from a town. 8 The best runners from each year level are selected. 9 Twenty students are chosen from a class. 10 All the student numbers are placed in a hat then a sample is chosen.
  • Maths A Yr 12 - Ch. 04 Page 188 Wednesday, September 11, 2002 4:07 PM 188 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Bias No doubt you have heard the comment, ‘There are lies, damned lies and statistics’. This implies that we should be wary of statistical figures quoted. Indeed, we should always make informed decisions of our own and not simply accept the mass of statistics that bombards us through the media. Bias can be introduced into statistics by: 1. questionnaire design 2. sampling bias 3. the interpretation of results. Bias in questionnaire design Consider a survey designed to collect data on opinions relating to culling kangaroo numbers in Australia. The questions may be designed to be emotive in nature. Respondents in these situations feel obliged to show compassion. Posing a question in the form, ‘The kangaroo is identified as a native Australian animal, not found anywhere else in the world. Would you be in favour of culling kangaroos in Australia?’, would almost certainly encourage a negative response. Using a leading question (one which leads the respondent to answer in a particular way) can cause bias to creep into responses. Rephrasing the question in the form, ‘As you know, kangaroos cause massive damage on many farming properties. You’d agree that their numbers need culling, wouldn’t you?’, would encourage a positive response. Using terminology that is unfamiliar to a large proportion of those being surveyed would certainly produce unreliable responses. ‘Do you think we need to cull herbivorous marsupial mammals in Australia?’, would cause most respondents to answer according to their understanding of the terms used. If the survey was conducted by an interviewer, the term could be explained. In the case of a self-administered survey, there would be no indication of whether the question was understood or not. Sampling bias As discussed previously, an ideal sample should reflect the characteristics of the population. Statistical calculations performed on the sample would then be a reliable indication of the population’s features. Selecting a sample using a non-random method, as discussed earlier, generally tends to introduce an element of bias.
  • Maths A Yr 12 - Ch. 04 Page 189 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 189 Particular responses can be selected from all those received. In collecting information on a local issue, an interviewer on a street corner may record responses from many passers-by. From all the data collected, a sample could be chosen to support the issue, or alternatively another sample could be chosen to refute the same issue. A sample may be selected under abnormal conditions. Consider a survey to determine which lemonade was more popular – Kirks or Schweppes. Collecting data one week when one of the brands was on special at half price would certainly produce misleading results. Data are often collected by radio and television stations via telephone polls. A ‘Yes’ response is recorded on a given phone-in number, while the ‘No’ respondents are asked to ring a different phone-in number. This type of sampling does not produce a representative sample of the population. Only those who are highly motivated tend to ring and there is no monitoring of the number of times a person might call, recording multiple votes. When data are collected from mailing surveys, bias results if the non-response rate is high (even if the selected sample was a random one). The responses received often represent only those with strong views on the subject, while those with more moderate views tend to lack representation in their correct proportion. Statistical interpretation bias Once the data have been collected, collated and subjected to statistical calculations, bias may still occur in the interpretation of the results. Misleading graphs can be drawn leading to a biased interpretation of the data. Graphical representations of a set of data can give a visual impression of ‘little change’ or ‘major change’ depending on the scales used on the axes (we learned about misleading graphs in Year 11). The use of terms such as ‘majority’, ‘almost all’ and ‘most’ are open to interpretation. When we consider that 50.1% ‘for’ and 49.9% ‘against’ represents a ‘majority for’ an issue, the true figures have been hidden behind words with very broad meanings. Although we would probably not learn the real facts, we should be wary of statistical issues quoted in such terms. es in ion v t i gat n inv io es Bias in statistics The aim of this investigation is to study statistical data that you suspect to be biased. Conduct a search of newspapers, magazines or any printed material to collect instances of quoted statistics that you believe to be biased. There are occasions when television advertisements quote statistical figures as a result of questionable sampling techniques. For each example, discuss: 1 the purpose of the survey 2 how the data might have been collected 3 the question(s) that may have been asked (try to pose the question(s) in a variety of ways to influence different outcomes) 4 ways in which bias might be introduced 5 variations in interpretation of the data. t i gat
  • Maths A Yr 12 - Ch. 04 Page 190 Thursday, September 12, 2002 11:09 AM 190 t i gat t i gat es in ion v t i gat n inv io es es in ion v n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Biased sampling Discuss the problem that would be caused by each of the following biased samples. 1 A survey is to be conducted to decide the most popular sport in a local community. A sample of 100 people was questioned at a local football match. 2 A music store situated in a shopping centre wants to know the type of music that it should stock. A sample of 100 people was surveyed. The sample was taken from people who passed by the store between 10 and 11 am on a Tuesday. 3 A newspaper conducting a Gallup poll on an election took a sample of 1000 people from the Gold Coast. Spreadsheets creating misleading graphs t i gat We looked at creating misleading graphs in the Year 11 text. Let us practise that investigation again to reinforce the techniques used to produce misleading graphs. Consider the data in this table. Year Wages % increase in wages Profits % increase in profits 1985 1990 1995 2000 6 25 1 20 9 50 1·5 50 13 44 2·5 66 20 54 5 100 Graph 2 We shall use a spreadsheet to produce misleading graphs based on these data. Graph 1 Graph 3
  • Maths A Yr 12 - Ch. 04 Page 191 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 191 1 Enter the data as indicated in the spreadsheet (see page 190). 2 Graph the data using the Chart Wizard. You should obtain a graph similar to Graph 1. 3 Copy and paste the graph twice within the spreadsheet. 4 Graph 2 gives the impression that the wages are a great deal higher than the profits. This effect was obtained by reducing the horizontal axis. Experiment with shortening the horizontal length and lengthening the vertical axis. 5 In Graph 3 we get the impression that the wages and profits are not very different. This effect was obtained by lengthening the horizontal axis and shortening the vertical axis. Experiment with various combinations. 6 Print out your three graphs and examine their differences. Note that all three graphs have been drawn from the same data using valid scales. A cursory glance leaves us with three different impressions. Clearly, it is important to look carefully at the scales on the axes of graphs. Another method which could be used to change the shape of a graph is to change the scale of the axes. 7 Right click on the axis value, enter the Format axis option, click on the Scale tab, then experiment with changing the scale values on both axes. Techniques such as these are used to create different visual impressions of the same data. 8 Use the data in the table to create a spreadsheet, then produce two graphs depicting the percentage increase in both wages and profits over the years giving the impression that: a the profits of the company have not grown at the expense of wage increases (the percentage increase in wages is similar to the percentage increase in profits) b the company appears to be exploiting its employees (the percentage increase in profits is greater than that for wages). WORKED Example 5 Discuss why the following selected samples could provide bias in the statistics collected. a In order to determine the extent of unemployment in a community, a committee phoned two households (randomly selected) from each page of the local telephone book during the day. b A newspaper ran a feature article on the use of animals to test cosmetics. A form beneath the article invited responses to the article. THINK WRITE a a Phoning two randomly selected households per page of the telephone directory is possibly a representative sample. However, those without a home phone and those with unlisted numbers could not form part of the sample. An unanswered call during the day would not necessarily imply that the resident was at work. 1 Consider phone book selection. 2 Consider those with no phone contact. 3 Consider the hours of contact. Continued over page
  • Maths A Yr 12 - Ch. 04 Page 192 Wednesday, September 11, 2002 4:07 PM 192 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d THINK WRITE b b Selecting a sample from a circulated newspaper excludes those who do not have access to the paper. In emotive issues such as these, only those with strong views will bother to respond, so the sample will represent extreme points of view. 1 Consider the newspaper circulation. 2 Consider the urge to respond. remember remember Bias can be introduced at each of the following stages: 1. questionnaire design 2. sampling bias 3. interpretation of results. 4C Bias 1 Rewrite the following questions, removing any elements or words that might contribute to bias in responses. a The poor homeless people, through no fault of their own, experience great hardship during the freezing winter months. Would you contribute to a fund to build a shelter to house our homeless? b Most people think that, since we’ve developed as a nation in our own right and broken many ties with Great Britain, we should adopt our own national flag. You’d agree with this, wouldn’t you? c You’d know that our Australian 50 cent coin is in the shape of a dodecagon, wouldn’t you? d Many in the workforce toil long hours for low wages. By comparison, politicians seem to get life pretty easy when you take into account that they only work for part of the year and they receive all those perks and allowances. You’d agree, wouldn’t you? 2 Rewrite parts a to d in question 1 so that the expected response is reversed. 3 What forms of sampling bias can you identify in the following samples? a Choosing a sample from students on a bus travelling to a sporting venue to answer 5 a questionnaire regarding sporting facilities at their school b Sampling using ‘phone-in’ responses to an issue viewed on a television program. c Promoting the results of a mail-response survey when fewer than half the selected sample replied. d Comparing the popularity of particular chocolate brands when one brand has a ‘two for the price of one’ special offer. e Choosing a Year 8 class and a Year 12 class to gather data relating to the use of the athletics oval after school. WORKED Example
  • Maths A Yr 12 - Ch. 04 Page 193 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 193 Australian currency 4 Why does this graph produce a biased visual impression? Value of A$ compared with US $1 51c 50c 49c 9 May 11 May 12 May Date 5 Comment on the following statement: ‘University tests have demonstrated that Double-White toothpaste is consistently used by the majority of teenagers and is more effective than most other toothpastes.’ 6 Surveys are conducted on samples to determine the characteristics of the population. Discuss whether the samples selected would provide a reliable indication of the population’s characteristics. Sample Population a Year 11 students Student drivers b Year 12 students Students with part-time jobs c Residents attending a Residents of a suburb neighbourhood watch meeting d Students in the school choir Music students in the school e Cars in a shopping centre car park Models of Holden cars on the road f Males at a football match Popular TV programs g Users of the local library Popular teenage magazines es in ion v t i gat n inv io es Bias It is important that a sample is chosen randomly to avoid bias. Consider the following situation. The government wants to improve sporting facilities in Brisbane. They decide to survey 1000 people about what facilities they would like to see improved. To do this, they choose the first 1000 people through the gate at a football match at Lang Park. In this situation it is likely that the results will be biased towards improving facilities for football. It is also unlikely that the survey will be representative of the whole population in terms of equality between men and women, age of the participants and ethnic backgrounds. Questions can also create bias. Consider asking the question, ‘Is football your favourite sport?’ The question invites the response that football is the favourite sport rather than allowing a free choice from a variety of sports by the respondent. Consider each of the following surveys and discuss: a any advantages, disadvantages and possible causes of bias b a way in which a truly representative sample could be obtained. t i gat
  • Maths A Yr 12 - Ch. 04 Page 194 Wednesday, September 11, 2002 4:07 PM 194 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 1 Surveying food product choices by interviewing customers of a large supermarket chain as they emerge from the store between 9.00 am and 2.00 pm on a Wednesday. 2 Researching the popularity of a government decision by stopping people at random in a central city mall. 3 Using a telephone survey of 500 people selected at random from the phone book to find if all Australian States should have Daylight Saving Time in summer. 4 A bookseller uses a public library database to survey for the most popular novels over the last three months. 5 An interview survey about violence in sport taken at a rugby league football venue as spectators leave. Contingency tables Fair 11 25 9 45 Dark 19 51 28 98 Red 17 27 13 57 Total 47 103 50 200 Male col Total air Fair le h Dark Hair colour of 200 couples 60 50 40 30 20 10 Fair 0 Red Dark Dark Red Fair Female hair colour Ma Red Frequency Female ou r When sample data are collected, it is often useful to break the data into categories. A two-way frequency table or contingency table displays data that have been classified into different types. Consider, for example, data collected on the hair colour of 200 couples. It may be represented in a table such as the one below. These data could be represented as a 3-dimensional bar chart, as shown below.
  • Maths A Yr 12 - Ch. 04 Page 195 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 195 Female Red Fair Total Fair Male Dark 24% 56% 20% 100% Dark 19% 52% 29% Hair colour of male Although this graph displays the data so that comparisons are readily visible, the chart is difficult to read and figures can not be read accurately. If we considered representing the data as a 2-dimensional segmented bar chart, this could be done in two ways. Splitting the data into categories based on the hair colour of the male and calculating percentages in each category would yield the following figures and segmented bar graph: Hair colour of 200 couples Fair 100% Red 30% 47% 23% 100% Dark Red 0 20% 40% 60% 80%100% Hair colour of female Red Dark Fair Splitting the data into categories based on the hair colour of the female and calculating percentages in each category would yield the following figures and segmented bar graph: 100% Female Hair colour of 200 couples 80% Red Dark Fair Fair 23% 24% 18% Dark 41% 50% 56% Red 36% 26% 26% Total 100% 100% 100% 60% 40% 20% Male It is obvious that the interpretation of the data depends on the reference basis. We may wish to interview those couples where the male is fair haired and the female dark haired. Note that this represents 25 couples. What if we talk about percentages? Comparing the percentages in the two tables, it can be seen that: 1. 56% of fair-haired males have female partners with dark hair 2. 24% of dark-haired females have male partners with fair hair. These percentages have vastly different values, yet they both describe the same set of 25 couples of fair-haired males and darkhaired females. It is important, particularly when dealing with contingency tables, to consider the reference basis for percentages. 0 Red Dark Fair Hair colour of female Hair colour of male Red Dark Fair
  • Maths A Yr 12 - Ch. 04 Page 196 Wednesday, September 11, 2002 4:07 PM 196 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d WORKED Example 6 A new test was designed to assess the reading ability of students entering high school. The results were used to determine if the students’ reading level was adequate to cope with high school. The students’ results were then checked against existing records. Of the 150 adequate readers who sat for the test, 147 of them passed. Of the 50 inadequate readers who sat for the test, 9 of them passed. Present this information in a contingency table. THINK WRITE Draw up the table showing the number of students whose reading was adequate and the number of students for whom the results of the new test were confirmed. Test results Passed Did not pass Total 147 3 150 9 41 50 156 44 Adequate readers Inadequate readers Total When information on a test is presented in a contingency table, conclusions can be made about the accuracy of the test. WORKED Example 7 A batch of sniffer dogs is trained by customs to smell drugs in suitcases. Before they are used at airports they must pass a test. The results of that test are shown in the contingency table below. Test results Detected Not detected Total No of bags with drugs 24 1 25 No. of bags without drugs 11 164 175 Total 35 165 a b c d How many bags did the sniffer dogs examine? In how many bags did the dogs detect drugs? In what percentage of bags without drugs did the dogs incorrectly detect drugs? Based on the above results, what percentage of the time will the dogs not detect a bag carrying drugs?
  • Maths A Yr 12 - Ch. 04 Page 197 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability THINK WRITE a Add both total columns; they should give the same result. a 200 bags were examined. b The total of the detected column. b The dogs detected drugs in 35 bags. c There were 175 bags without drugs but dogs incorrectly detected them in 11 bags. Write this as a percentage. c Percentage incorrectly detected d Of 25 bags with drugs, 1 went undetected. Write this as a percentage. 197 1 d Percentage not detected = ----- × 100% 25 Percentage not detected = 4% = 11 -------175 × 100% = 6.3% As a result of studying a contingency table, we should also be able to make judgements about the information given in the tables. In the previous worked example only one bag out of 25 with drugs went undetected. Although the dogs incorrectly detected drugs in 11 bags that did not have drugs, they still have an overall accuracy of 94% as shown by the calculation [(24 + 164) ÷ 200] × 100%. Many contingency tables will require you to make your own value judgements about the conclusions established. For example, the 94% overall accuracy recorded may be considered ‘very acceptable’. WORKED Example 8 The contingency table at right shows the Full-time Part-time composition of the employees of a small law firm. Female 4 11 a Extend the table to show totals in all Male 30 5 categories and an overall total. b Draw a table showing percentages with respect to type of employment (full or part-time). c Redraw the table showing percentages based on the gender of the employee. d What percentage of females work full time? e What percentage of full-time workers are female? f Explain why, in the workforce in general, it would be easier to estimate an answer to part d than it would to obtain an estimate for part e. THINK WRITE a Add the numbers in the cells for all the rows and columns and enter the totals. Check that the overall total is consistent for the rows and columns. a Full-time Part-time Total 4 11 15 Male 30 5 35 Total 34 16 50 Female Continued over page
  • Maths A Yr 12 - Ch. 04 Page 198 Wednesday, September 11, 2002 4:07 PM 198 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d THINK WRITE b Percentages are based on totals in columns. The totals in the columns are on the denominator when calculating percentages. b Full-time Female 4 ----34 × 100 = 12% 11 ----16 × 100 = 69% Male 30 ----34 × 100 = 88% 5 ----16 × 100 = 31% Total 100% c Percentages are based on totals in rows. The totals in the rows are on the denominator when calculating percentages. c d Full time d ----------------------------- × 100 = Female total This is based on female totals in table c. 2 Write the answer. 1 This is based on fulltime totals in table b. 2 e 1 Write the answer. f An estimate is easier if the required sample is smaller. t i gat Locality by longitude es in ion v t i gat n inv io es Part-time 100% Full-time Part-time Total Female 4 ----15 × 100 = 27% 11 ----15 × 100 = 73% 100% Male 30 ----35 × 100 = 86% 5 ----35 × 100 = 14% 100% 4 ----15 × 100 = 27% Percentage of females who work full time = 27%. Female e --------------------------------- × 100 = Full-time total 4 ----34 × 100 = 12% Percentage of full-time workers who are female = 12%. f It would be easier to obtain an estimate for the percentage of females who work full time because the number of females is fewer than the number of full-time workers. This means that the sample size would be smaller. Climatic influences in Queensland For this activity we will investigate relationships between geographical features that influence our weather. We could pose questions such as: What effect does latitude have on temperature? What factor has the main influence on day length? What part does elevation play in influencing temperature? This investigation should be conducted using a spreadsheet. Data on Queensland towns from the Bureau of Meteorology’s website have been collated and shown in two spreadsheet tables which follow. Graphs have been provided for stimulus when investigating relationships between the variables in the spreadsheet. 1 Retrieve the two Excel files from the CD provided with this book (the longitude file does not contain the graphs displayed here). Locality by latitude 2 Experiment by graphing pairs of variables to determine whether a relationship exists between the pair. You may wish to sort the spreadsheet using a different classification.
  • Maths A Yr 12 - Ch. 04 Page 199 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 199 3 Write a report on the geographical factors influencing daily temperatures and sunlight hours. Support your conclusions by providing graphical evidence. 4 Sites on the World Wide Web provide weather conditions for many places throughout the world. Conduct a search to collate data from locations around the globe. Investigate the geographical features which might have an influence on their weather.
  • Maths A Yr 12 - Ch. 04 Page 200 Wednesday, September 11, 2002 4:07 PM 200 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d We are constantly bombarded with statistics, some of which are a valid interpretation of the data, and some of which are not. On occasions, the misuse of statistics may be unintentional or through ignorance, but there are occasions when misleading figures are quoted intentionally. If the raw data are available, it is wise to check the validity of any claims. WORKED Example 9 The ABS data from the 1996 Census for the Chapel Hill area in Brisbane are shown here. Note: Income figures are weekly income expressed in AUD. Australian Bureau of Statistics 1996 Census of Population and Housing Chapel Hill (Statistical Local Area) — Queensland B01 Selected Characteristics — Chapel Hill Total persons (a) Aged 15 years and over (a) Aboriginal Torres Strait Islander Both Aboriginal and Torres Strait Islander (b) Australian born Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA Born overseas: Other country (d) Born overseas: Total Speaks English only and aged 5 years and over Speaks language other than English (e) and aged 5 years and over Australian citizen Australian citizen aged 18 years and over Unemployed Employed In the labour force Not in the labour force Enumerated in private dwelling (a) Enumerated in non-private dwelling (a) Persons enumerated same address 5 years ago Persons enumerated different address 5 years ago Overseas visitor Chapel Hill Median age Median individual income Median household income Average household size Male 4 824 3 761 3 0 0 3 405 696 587 1 283 3 936 459 4 239 3 027 141 2 677 2 818 854 4 815 9 2 217 2 163 54 Female 5 112 4 070 3 0 0 3 704 637 605 1 242 4 230 491 4 515 3 315 132 2 427 2 559 1 393 5 085 27 2 381 2 324 75 Persons 9 936 7 831 6 0 0 7 109 1 333 1 192 2 525 8 166 950 8 754 6 342 273 5 104 5 377 2 247 9 900 36 4 598 4 487 129 34 415 1 209 3.1 When discussing the probability of unemployment in this area, a resident proudly said that only 5% of the unemployed in the area were male. a Construct a contingency table displaying the employment/unemployment status of the residents in this area. b Use your contingency table to discuss the validity of the claim. THINK WRITE a a 1 2 Extract the employment and unemployment figures for males and females from the table. Form a contingency table adding totals to rows and columns. Male Female Total 141 132 273 Employed 2677 2427 5104 Total 2818 2559 5377 Unemployed
  • Maths A Yr 12 - Ch. 04 Page 201 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 201 THINK WRITE b -------b P(unemployed being male) = 141 × 100 273 P(unemployed being male) = 52% 52% of the unemployed are males. 3 141 P(male being unemployed) = ----------- × 100 2818 P(male being unemployed) = 5% 5% of the males are unemployed. The statement is not correct. The resident should have said that only 5% of the males were unemployed. extension es in ion v t i gat n inv io es Measures of location and spread Comparison extension extension of data sets Contingency tables from census data The table below displays data collected from the 1996 census. It shows the numbers of males and females in various forms of employment in the 15–19 years age bracket and the totals of all ages for each category. Industry Agriculture, Forestry and Fishing Mining Manufacturing Electricity, Gas and Water Supply Construction Wholesale Trade Retail Trade Accommodation, Cafes and Restaurants Transport and Storage Communication Services Finance and Insurance Property and Business Services Government Administration and Defence Education Health and Community Services Cultural and Recreational Services Personal and Other Services Non-classifiable economic units Not stated Total Australian bureau of statistics 15–19 years Male Female 9 986 2 685 1 349 339 33 831 10 168 676 191 21 162 1 664 11 441 5 367 91 818 127 466 17 917 25 019 4 135 2 560 1 213 869 2 001 4 981 11 164 13 930 5 877 2 999 3 685 4 071 3 059 13 388 6 658 7 016 4 161 11 212 3 808 2 000 9 133 8 175 243 074 244 100 Total Male 225 679 75 497 695 007 49 427 419 394 306 456 500 105 157 519 250 385 102 016 127 364 410 414 225 316 184 287 161 489 93 066 143 942 63 045 81 643 4 272 051 Female 98 651 10 764 270 029 9 272 64 690 140 089 536 543 197 768 81 693 48 172 169 092 339 781 148 111 365 776 563 689 85 989 133 966 40 097 70 096 3 364 268 1996 Census of Population and Housing Australia 7688965.464 sq kms Persons 324 330 86 261 965 036 58 699 484 084 446 545 1 036 648 355 287 332 078 150 188 296 456 750 195 373 427 540 063 725 178 179 055 277 908 103 142 151 739 7 636 319 D- C extension AC ER T IVE An error frequently occurs when statistics of this kind are quoted. The reference basis for the probability percentage should be carefully noted. When using the Maths Quest Maths A Year 12 CD-ROM, click here for more about statistical measures. M 2 Calculate the probability of an unemployed person being a male; that is, number of unemployed males ---------------------------------------------------------------------- × 100. total number of unemployed Calculate the probability of a male being unemployed; that is, number of unemployed males ---------------------------------------------------------------------- × 100. total number of males Compare these probability figures with the statement and make a decision. INT 1 RO t i gat
  • Maths A Yr 12 - Ch. 04 Page 202 Wednesday, September 11, 2002 4:07 PM 202 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Using these data, we could form a contingency table to compare the proportion of 15–19 year old males and females in, for example, the retail trade. (Confirm the figures in the table below.) Male Female Total 91 818 127 466 219 284 Non-retail trade 151 256 116 634 267 890 Total 243 074 244 100 487 174 Retail trade 1 Use this table to: a determine the percentage of male workers who are in the retail trade b calculate the percentage of retail workers who are male c explain why these two percentages are different d plan a strategy to survey the workforce for an estimate of the number of males in the retail trade. 2 Choose another category of the workforce from the census data. Construct a contingency table, then answer questions similar to those above. 3 Reports from early recordings of census data showed that more than 50% of Australians lived and worked on the land, providing food and clothing for our population. Most recent reports indicate that only 4% of Australians now work the land, providing for the remaining 96%. Use the data in the table to confirm that this is indeed true. 4 It is important for future planning that these changes are recorded and made known. Search the World Wide Web or reference books to obtain industry data from the 2001 census. Examine the figures, noting changing trends in industry employment. Report on your findings. remember remember 1. Contingency tables can be used to display data that have been classified into different types. 2. The table displays 2 variables which have been split into categories in a horizontal and a vertical direction. 3. Calculations can be made with regard to a variety of reference bases. 4D WORKED Example 6 Contingency tables 1 A test is developed to test for infection with the flu virus. To test the accuracy, the following 500 people are tested. • Of the 100 people who are known to have the flu who are tested, the test returns 98 positive results. • Of the 400 people who are known not to be infected with the virus who are tested, 12 false positives are returned.
  • Maths A Yr 12 - Ch. 04 Page 203 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 203 Display this information in the contingency table below. Test results Accurate Not accurate Total With virus Without virus Total 2 One thousand people take a lie detector test. Of 800 people known to be telling the truth, the lie detector indicates that 23 are lying. Of 200 people known to be lying, the lie detector indicates that 156 are lying. Present this information in a contingency table. WORKED Example 7 3 The contingency table shown below displays the information gained from a medical test screening for a virus. A positive test indicates that the patient has the virus. Test results Accurate Not accurate 45 3 48 Without virus 922 30 952 Total 967 33 1000 With virus Total a How many patients were screened for the virus? b How many positive tests were recorded? (that is, in how many tests was the virus detected?) c What percentage of test results were accurate? d Based on the medical results, if a positive test is recorded what is the percentage chance that you actually have the virus? 4 The contingency table below indicates the results of a radar surveillance system. If the system detects an intruder, an alarm is activated. Test results Alarm activated Intruders No intruders Total Not activated Total 40 8 48 4 148 152 44 156 200 a Over how many nights was the system tested? b On how many occasions was the alarm activated? c If the alarm is activated, what is the percentage chance that there actually is an intruder? d If the alarm was not activated, what is the percentage chance that there was an intruder? e What was the percentage of accurate results over the test period? f Comment on the overall performance of the radar detection system. 4.2 4.3
  • Maths A Yr 12 - Ch. 04 Page 204 Wednesday, September 11, 2002 4:07 PM 204 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d The information below is to be used in questions 5 to 7. A test for a medical disease does not always produce the correct result. A positive test indicates that the patient has the condition. The table indicates the results of a trial on a number of patients who were known to either have the disease or known not to have the disease. Test results Accurate Not accurate Total 57 3 60 Without disease 486 54 540 Total 543 57 600 With disease 5 multiple choice The overall accuracy of the test is: A 90% B 90.5% C 92.5% D 95% 6 multiple choice Based on the table, what is the probability that a patient who has the disease has it detected by the test? A 90% B 90.5% C 92.5% D 95% 7 multiple choice Which of the following statements is correct? A The test has a greater accuracy with positive tests than with negative tests. B The test has a greater accuracy with negative tests than with positive tests. C The test is equally accurate with positive and negative test results. D There is insufficient information to compare positive and negative test results. 8 Airport scanning equipment is tested by scanning 200 pieces of luggage. Prohibited items were placed in 50 bags and the scanning equipment detected 48 of them. The equipment detected prohibited items in five bags that did not have any forbidden items in them. a Use the above information to complete the contingency table below. Test results Accurate Not accurate Total Bags with prohibited items Bags with no prohibited items Total b Use the table to answer the following: i What percentage of bags with prohibited items were detected? ii What was the percentage of ‘false positives’ among the bags that had no prohibited items? iii What percentage of prohibited items pass through the scanning equipment undetected? iv What is the overall percentage accuracy of the scanning equipment?
  • Maths A Yr 12 - Ch. 04 Page 205 Wednesday, September 11, 2002 4:07 PM 205 Chapter 4 Populations, samples, statistics and probability 9 In some cases it is easier to count numbers in a particular category by considering a different population. In each of the following pairs of proportions, which one would be easier to determine? a ii Proportion of males who are left-handed. ii Proportion of left-handers who are males. b ii Proportion of mathematics A students in your school who are over 16. ii Proportion of over 16 year olds in your school who study mathematics A. c ii Proportion of state school students who live in Queensland. ii Proportion of Queensland school students who attend a state school. 10 Refer to the 1996 census data on industry on page 201. a Draw up a contingency table showing the 15–19 year old males and females 8 employed in education compared with those of this age group employed in other industries. b Extend your table to show totals in all categories as well as an overall total. c Draw up a table showing percentages with respect to gender. d Redraw your table showing percentages based on industry. e What percentage of females are employed in education? f What percentage of those employed in education are female? g At some period in between census times, if it were necessary to obtain an estimate of the number of females employed in education by surveying a sample, what approach would you recommend? WORKED Example 11 Repeat question 10 using the ‘totals’ data. Comment on any differences or similarities in your answers. Use the following data collected from the 1996 census for questions 12 and 13. Note: Income figures are weekly income in AUD. 9 WORKED Example Australian Bureau of Statistics 1996 Census of Population and Housing B01 Selected Characteristics — Inala Total persons (a) Aged 15 years and over (a) Aboriginal Torres Strait Islander Both Aboriginal and Torres Strait Islander (b) Australian born Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA Born overseas: Other country (d) Born overseas: Total Speaks English only and aged 5 years and over Speaks language other than English (e) and aged 5 years and over Australian citizen Australian citizen aged 18 years and over Unemployed Employed In the labour force Not in the labour force Enumerated in private dwelling (a) Enumerated in non-private dwelling (a) Persons enumerated same address 5 years ago Persons enumerated different address 5 years ago Overseas visitor Inala Median age Median individual income Median household income Average household size Inala (Statistical Local Area) — Queensland Male 6 401 4 516 398 42 8 4 066 665 1 396 2 061 4 000 1 454 5 424 3 521 605 2 036 2 641 1 728 6 397 4 2 986 2 319 13 Female 6 886 5 083 482 51 12 4 468 694 1 462 2 156 4 483 1 494 5 807 4 001 349 1 403 1 752 3 144 6 886 0 3 343 2 493 15 30 187 412 2.8 Persons 13 287 9 599 880 93 20 8 534 1 359 2 858 4 217 8 483 2 948 11 231 7 522 954 3 439 4 393 4 872 13 283 4 6 329 4 812 28
  • Maths A Yr 12 - Ch. 04 Page 206 Wednesday, September 11, 2002 4:07 PM 206 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 12 a Construct a contingency table displaying males and females ‘In the labour force’ and ‘Not in the labour force’, showing all totals. b From your contingency table calculate: i the percentage of females in the labour force ii the percentage of those in the labour force who are female. c Would it be correct to say that more than 39% of the females are in the labour force? Explain. 4.2 13 a Construct a contingency table displaying the number of ‘Australian born’ and ‘Overseas born’ males and females in the community. Show all totals. b Is it correct to claim that almost half the males in the community were born overseas? Explain. Applications of statistics and probability By exploring data collected from samples (provided the samples have been chosen carefully) we are able to estimate characteristics of the population. We can determine past trends and speculate on future trends. Through a series of investigations we will explore the application of statistics and probability to life-related situations. Using histograms to estimate probabilities Discrete data (the type where the scores can take only set values) can be represented as a frequency histogram. Continuous data (the type where the scores may take any value, usually within a certain range) can also be represented in the form of a frequency or probability histogram. Let us construct a frequency histogram of continuous data from which we can then estimate probabilities. WORKED Example 10 A battery company tested a random sample of a batch of their batteries to determine their lifetime. The results are shown below. Lifetime (hours) Frequency 20–<25 25–<30 30–<35 35–<40 40–<45 45–<50 6 25 70 61 30 8 a Represent the data as a frequency histogram. b If you chose a battery from this batch, estimate the probability that the battery would last: ii at least 25 hours ii less than 40 hours. c In an advertising campaign, the battery manufacturer claims that they will replace the battery if it does not last at least 30 hours. Based on these results, what is the probability they will have to replace a battery?
  • Maths A Yr 12 - Ch. 04 Page 207 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability THINK WRITE a Construct a frequency histogram with lifetime on the x-axis and frequency on the y-axis. a b 207 b Total number of scores = 6 + 25 + 70 + 61 + 30 + 8 = 200 Frequency Frequency histogram Find the total number of scores. The total area under the curve is 1, so each 2 class interval represents a fraction of 1 in terms of area (and probability). ii 1 Find the total of frequencies with a score of at least 25 hours. 1 2 3 ii Estimated probability total of frequencies at least 25 h = --------------------------------------------------------------------------- × 1 total number of scores Write the answer. Find the total frequencies with a score of less than 40 hours. 2 Apply the same rule as in part i. 3 c 1 Write the answer. 1 Find the total frequency for those batteries lasting less than 30 hours. 2 Apply the probability rule. Write the answer. 3 70 60 50 40 30 20 10 0 20 25 30 35 40 45 50 Lifetime (hours) ii Total frequency at least 25 hours = 25 + 70 + 61 + 30 + 8 = 194 -------P(≥25 h) = 194 × 1 200 P(≥25 h) = 0.97 The probability that the battery would last for at least 25 hours is 0.97. ii Total frequency less than 40 hours = 6 + 25 + 70 + 61 = 162 -------P(<40 h) = 162 × 1 200 P(<40 h) = 0.81 The probability that the battery would last less than 40 hours is 0.81. c Total frequency less than 30 hours = 6 + 25 = 31 31 P(<30 h) = -------- × 1 200 = 0.155 P(replacing battery) = 0.155 The probability that the manufacturer will have to replace the battery is 0.155. It should be noted that, if we are not given a table of results (as we were in the previous worked example), but simply a frequency histogram, we would have to estimate frequencies from the histogram. In this case, the probability answers obtained would be estimates rather than exact values.
  • Maths A Yr 12 - Ch. 04 Page 208 Wednesday, September 11, 2002 4:07 PM 208 Interpreting histograms The aim of this investigation is to highlight the pitfalls in interpreting the shape of histograms. The activity is more readily conducted using a graphics calculator. 1 Consider the percentages received by a class of 36 students in their end-ofsemester test. 67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77, 56, 66, 63, 70, 49, 56, 71, 67, 58, 60, 72, 67, 57, 60, 90, 63, 88, 78, 46, 64, 81. 2 Enter the data as a list into a graphics calculator. TI: Press STAT and select 1:Edit then enter the data into L1 as shown. Casio: Enter STAT from the MENU then enter the data into List 1. 3 Set the window for the percentage range 40 to 100 using a class interval of 10. TI: Press WINDOW and enter values as shown. Casio: Press SHIFT F3 (V-WIN), then enter the values shown. 4 Set the data to graph as a histogram. TI: Press 2nd [STAT PLOT] and select 1:Plot1. Set Plot 1 as shown. Casio: Press F1 (GRPH) and F6 (SET). Then set StatGraph1 as shown. (Press F6 to scroll right to find Hist.) M es in ion v t i gat n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 5 Set all other graph plots off. TI: Press 2nd [STAT PLOT] and set other plots off. Casio: Press F4 (SEL) and set other StatGraphs off. t i gat
  • Maths A Yr 12 - Ch. 04 Page 209 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 209 6 Draw the histogram. TI: Press GRAPH . Casio: Press F6 (DRAW) and enter the values shown. Press F6 (DRAW). 7 On your calculator, change the range of the score to accommodate percentages 46 to 94, with a class interval of 4. TI: Press WINDOW and enter the new values shown. Casio: Press EXIT then press F1 (GPH1) and enter the new values. 8 Draw the resulting histogram. TI: Press GRAPH . Casio: Press F6 (DRAW). While the first histogram appeared to have one modal class, this one appears multimodal. 9 Use your calculator to investigate changing the class interval and the range of the percentages. What do you observe? M D- C For more on probability, scatterplots, histograms and skewness, click on the icon when using the Maths Quest Maths A year 12 CD-ROM. AC ER T IVE INT 10 All these histograms are graphical representations of the same data. While they all indicate distributions with higher frequencies towards the middle, some suggest bimodal or multimodal distributions. What do you conclude from this investigation? RO Probability, scatterplots, histograms and skewness
  • Maths A Yr 12 - Ch. 04 Page 210 Wednesday, September 11, 2002 4:07 PM 210 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Using scatterplots to consider relationships between data sets WORKED Example 11 Are tall mothers likely to produce tall sons? The table below details the heights of 12 mothers and their adult sons. Height of mother (cm) 185 152 168 166 173 172 159 154 168 148 162 171 Height of son (cm) 188 162 168 172 179 182 160 148 178 152 184 180 a b c d Construct a scatterplot of the data. Draw the line of best fit. Estimate the height of a son born to a 180-cm tall mother. Discuss the relationship between the heights of mothers and their sons as shown by these data. The solution to this problem will be shown using three methods. 1. Pen and paper 2. Graphics calculator 3. Spreadsheet It should be noted that, when a line of best fit is drawn by eye, variations in answers will occur for those dependent on the position of the line. a 190 185 180 175 170 165 160 155 150 145 14 5 15 0 15 5 16 0 16 5 17 0 17 5 18 0 18 5 19 0 Method 1. Using pen and paper a Plot points on a graph with height of mother on x-axis (the independent variable) and height of son on y-axis (the dependent variable). This results in a scatterplot. WRITE/DRAW Height of son (cm) THINK Height of mother (cm) ne Li 14 5 15 0 15 5 16 0 16 5 17 0 17 5 18 0 18 5 19 0 190 185 180 175 170 165 160 155 150 145 of be st fit b Height of son (cm) b Draw in the line of best fit. Balance an equal number of points either side of the line and as close to the line as possible. Height of mother (cm)
  • Maths A Yr 12 - Ch. 04 Page 211 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability WRITE/DISPLAY c Draw a vertical line from the 180 cm point on the x-axis to the line of best fit. From this point on the line, draw a horizontal line to the y-axis. Read this y-value. c be of y = 190 Li ne 190 185 180 175 170 165 160 155 150 145 x = 180 14 5 15 0 15 5 16 0 16 5 17 0 17 5 18 0 18 5 19 0 Height of son (cm) st fit THINK 211 Height of mother (cm) From the graph when x = 180 y = 190 So, a 180-cm tall mother could produce a son approximately 190 cm tall. d Look at the slope of the line and the proximity of the points to the line. d The slope of the line of best fit is positive, indicating that, as one variable increases, the other also increases. The points lie fairly close to the line, so this indicates a fairly strong positive relationship between the two variables. This seems to support the view that tall mothers are likely to produce tall sons. Method 2. Using a graphics calculator These instructions apply to the TI-83 and Casio CFX-9850 PLUS graphics calculators. TI Casio a 1 Enter mother’s height and son’s a height into two lists. TI: Press STAT , select 1:Edit and enter the data. Casio: From the MENU enter the STAT sector and enter the data. 2 Set up the window to graph xvalues in the range 145–190, yvalues in the same range and a scale of 5 for each. TI: Press WINDOW then enter the values shown. Casio: Press SHIFT F3 (V-WIN) and enter values. Continued over page
  • Maths A Yr 12 - Ch. 04 Page 212 Wednesday, September 11, 2002 4:07 PM 212 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d THINK 3 4 WRITE/DISPLAY TI Set one graph plot on and select scatterplot type, x as list 1 and y as list 2. TI: Press 2nd [STAT PLOT] and set up Plot 1 as shown. Casio: Press F1 (GRPH), F6 (SET) and set StatGraph1 as shown. Turn off all other plots. TI: Use 2nd [STAT PLOT]. Casio: Press F4 (SEL). 5 b Graph the relationship. TI: Press GRAPH . Casio: Press F6 (DRAW). 1 Enter the function that calculates the equation of the line of best fit. TI: Press STAT , arrow across to the CALC menu, and select 4:LinReg. Casio: Press F1 (x). Copy this equation into Y=. TI: Press VARS, select 5:Statistics, arrow across to the EQ menu, and select 1: RegEQ. Casio: Press F5 (COPY) and EXE to store. Graph the line of best fit on the scatterplot. TI: Press GRAPH . Casio: Press F6 (DRAW). 2 3 b M c Use the calculator’s function to determine c a value for y when the x-value is 180. TI: Press 2nd [CALC] and select 1:Value. Casio: Press MENU , select GRAPH, and press F6 (DRAW). Then press SHIFT F5 (SLV), F6 and F1 (Y-CAL) and enter 180 value for x and press EXE . Casio
  • Maths A Yr 12 - Ch. 04 Page 213 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 213 THINK WRITE/DISPLAY d Look at the angle of the straight line and the proximity of the points to the line. d The line of best fit predicts that a 180-cm tall mother could produce an adult son approximately 187 cm tall. The slope of the line of best fit is upwards, indicating that as one variable increases the other also increases. Most of the points lie close to the line, so it is reasonable to assume that the relationship between mother and son heights is quite strong. This supports the proposal that tall mothers are likely to produce tall sons. Method 3. Using a spreadsheet a 1 Open up a spreadsheet and enter the data for the mother’s and son’s heights in columns under headings. 2 Use the chart wizard to graph the data as a scatterplot. 3 Label the axes and provide a title for the graph. 4 Adjust the range and scale on the x- and y-axes to more appropriate values if necessary (suggest 145 to 190 range with a scale of 5). 5 Print out a copy of the scatterplot. a b Draw in the line of best fit. Balance an equal number of points either side of the line and as close to the line as possible. b From the scatterplot of the data above, the line of best fit is shown on the scatterplot. c From the graph, read the corresponding y-value for x = 180 cm. c When x = 180, y = 187. So a 180-cm tall mother would produce an adult son approximately 187 cm tall. d Look at the slope of the line and the proximity of the points to the line. d The slope of the line of best fit is positive, indicating that, as one variable increases, the other also increases. The points lie fairly close to the line, so this indicates a fairly strong positive relationship between the two variables. This seems to support the view that tall mothers are likely to produce tall sons.
  • Maths A Yr 12 - Ch. 04 Page 214 Wednesday, September 11, 2002 4:07 PM M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d remember remember 1. Frequency histograms can be used to estimate probabilities in data sets. 2. Scatterplots display the relationship between two variables. 3. Scatterplots enable past and future trends to be considered. 3 The table on the right shows the number of goals scored by a hockey team throughout a season. a Show this information in a frequency histogram. b Are the data symmetrical? c What is the mode(s)? d Can the mean and median be seen for this distribution and, if so, what are their values? e The probability that the team will score 5 goals is the same as their probability of scoring what other number of goals? 12 10 8 6 4 2 0 1 2 3 4 5 Score Frequency 2 For the distribution shown on the right: a are the data symmetrical? b what is the modal class(es)? c can the mean and median be seen from the graph and, if so, what are their values? d which classes have the same probability of occurring? e which class has the least probability of occurring? Frequency 1 In the distribution on the right: a is the graph symmetrical? b what is the modal class(es)? c can the mean and median be seen from the graph and, if so, what are their values? d which score has the greatest probability of occurring? 7 6 5 4 3 2 1 0 Score No. of goals Frequency 0 6 1 4 2 4 3 4 4 4 5 6 4 For the distribution shown on the right: a what is the modal score(s)? b which score has the greatest probability of occurring? c which score has the least probability of occurring? Frequency 4E Applications of statistics and probability 0– 4 5– 10 9 – 15 14 –1 20 9 – 25 24 –2 9 214 12 10 8 6 4 2 0 1 2 3 4 5 Score
  • Maths A Yr 12 - Ch. 04 Page 215 Wednesday, September 11, 2002 4:07 PM 215 Chapter 4 Populations, samples, statistics and probability No. of goals 3 21–30 6 31–40 7 41–50 23 51–60 6 multiple choice Frequency 11–20 21 8 6 4 2 0 1 23 4 5 Score 5 4 3 2 1 0 12345 Score 5 4 3 2 1 0 1 23 4 5 Score Frequency Which of the distributions below has the smallest standard deviation? A 10 B 6 C 6 D Frequency 10 5 The table at right shows the number of goals scored by a basketball team throughout a season. a Draw a frequency histogram of the data. b What is the probability that the team will score more than 40 goals? Frequency Example Frequency WORKED 8 7 6 5 4 3 2 1 0 1 2 3 4 5 Score 7 A movie is shown at a cinema 30 times during the week. The number of people attending each session of the movie is shown in the table at right. a Present the data in a frequency histogram. b Are the data symmetrical? c What is the modal class(es)? d What is the probability of more than 150 people attending a session? e What is the probability of having up to 100 people at a session? No. of people Frequency 1–50 2 51–100 3 101–150 5 151–200 10 201–250 10 8 Year 12 at Wallarwella High School sit exams in chemistry and mathematics. The results are shown in the table below. Mark Chemistry Maths 31–40 2 3 41–50 9 4 51–60 7 6 61–70 4 7 71–80 7 9 81–90 9 7 91–100 2 4 a Is either distribution symmetrical? b State the mode of each distribution. c In which subject is the standard deviation greater? Explain your answer.
  • Maths A Yr 12 - Ch. 04 Page 216 Wednesday, September 11, 2002 4:07 PM 216 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d d The students were told that the probability of achieving more than 80% in chemistry was the same as it was in mathematics. Is this true? Explain. e Is the probability of obtaining more than a pass mark (50%) greater in chemistry or mathematics? f What is the propability of achieving over 90% in each subject? Note that some answers in the following questions may vary depending on the position of the line of best fit. The use of a graphics calculator is recommended, if available. WORKED Example 11 Interpolation/ Extrapolation 9 A drug company wishes to test the effectiveness of a drug to increase red blood cell counts in people who have a low count. The following data were collected. Day of experiment Red blood cell count 4 5 6 7 8 9 210 240 230 260 260 290 Construct a scatterplot, then draw in the line of best fit to find the red blood cell count at the beginning of the experiment (that is, on day 0). 10 A wildlife exhibition is held over 6 weekends and features still and live displays. The number of live animals that are being exhibited varies each weekend. The number of animals participating, together with the number of visitors to the exhibition each weekend, is shown in the table which follows. Number of animals 6 4 8 5 7 6 Number of visitors 311 220 413 280 379 334 Construct a scatterplot, then draw in the line of best fit to find the predicted number of visitors if there are no live animals. SkillS HEET 4.4 11 A study of the dining-out habits of various income groups in a particular suburb produces the results shown in the table below. Weekly income ($) 100 200 300 400 500 600 700 800 900 1000 Number of restaurant visits per year 5.8 2.6 1.4 1.2 6 4.8 11.6 4.4 12.2 9 Use the data to predict: a the number of visits per year by a person on a weekly income of $680 b the number of visits per year by a person on a weekly income of $2000. 12 The following table represents the costs for transporting a consignment of shoes from Brisbane factories. The cost is given in terms of distance from Brisbane. There are two factories which can be used. The data are summarised below. Distance from Brisbane (km) 10 20 30 40 50 60 70 80 Factory 1 cost ($) 70 70 90 100 110 120 150 180 Factory 2 cost ($) 70 75 80 100 100 115 125 135 a Draw the line of best fit for each factory. b Which factory is likely to have the lowest cost to transport to a shop in Brisbane? c Which factory is likely to have the lowest cost to transport to Mytown, 115 kilometres from Brisbane? d Which factory has the most ‘linear’ transport rates?
  • Maths A Yr 12 - Ch. 04 Page 217 Wednesday, September 11, 2002 4:07 PM 217 Chapter 4 Populations, samples, statistics and probability Year 2000 Summer Olympic Games It has been claimed that the Summer Olympic Games, held in Sydney in the year 2000, were planned for September, because past records indicated that the chance of rain in Sydney during that month was low. (We recall the rain that fell at the time of one of Cathy Freeman’s 400 m races.) Rainfall and temperature records for Sydney are available from the table below and graph at right. Monthly temperatures Sydney 50 40 Temperature °c es in ion v t i gat n inv io es 30 20 10 0 – 10 – 20 Jan Feb MarAprMayJun Jul AugSep Oct NovDec Month of year Highest Maximum Lowest Minimum Average Maximum Average Minimum The table shows the number of rainy days for Sydney for each month of the year for the years 1980 to 1990. Sydney rainfall (days of rain per month) Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1980 14 12 8 7 15 12 7 4 3 10 11 12 1981 15 15 5 8 14 7 6 7 6 12 17 13 1982 11 10 19 5 2 11 12 3 9 11 3 12 1983 9 12 9 12 13 8 8 12 9 16 8 18 1984 15 20 11 10 8 6 15 6 10 8 15 10 1985 9 10 8 19 13 6 7 10 12 18 19 16 1986 15 10 11 12 11 3 9 9 10 11 18 9 1987 14 11 10 7 9 11 11 16 4 17 8 16 1988 11 15 14 18 12 10 6 10 10 3 14 19 1989 21 11 22 23 22 20 12 9 7 6 13 15 1990 14 21 22 19 17 10 12 9 13 13 8 13 1 Identify the type of statistical calculation(s) that would be appropriate to perform on the data. Decide also on appropriate graphical representations. 2 Produce the graphs and perform your calculations. 3 Analyse your calculations and graphs. Comment on the claim made in the opening paragraph. Would an alternative month have been more appropriate, as far as the rainfall factor is concerned? t i gat
  • Maths A Yr 12 - Ch. 04 Page 218 Wednesday, September 11, 2002 4:07 PM 218 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 4 Does the month of September appear to be a suitable one for summer sporting events, when temperature is considered? Monthly temperature figures for Sydney are shown in the graph. 5 Describe the temperature conditions in Sydney during the month of September. 6 Compare rainfall and temperature figures for September with those for other months of the year. 7 Assume you are responsible for a submission to the Australian Olympic Committee recommending a suitable month in which to hold the summer Olympics. Prepare your report, remembering to support your recommendations by referring to the available data, calculations and graphs. es in ion v t i gat n inv io es Sampling text to predict population characteristics Research shows that the letter ‘e’ is the most frequently used letter in the English language. This activity aims to determine the frequency of its occurrence on a page of English text using a sample selected from the page. 1 Choose a book with an extended section of continuous prose. Select 20 full lines of text from a page (ignore incomplete lines). 2 Draw up the table below. Line number Number of e’s Cumulated total 1 2 3 … … 20 3 Count the number of e’s per line and complete the second column. 4 Using the figures in the second column, complete the ‘cumulated total’ column. 5 Use a graphics calculator, spreadsheet or graph paper to draw a scatterplot of the number of lines against the accumulated total. 6 Draw the line of best fit on your scatterplot. 7 Count the number of lines of text on your page. Extrapolate your scatterplot to estimate the number of e’s on the page you have chosen. 8 Compare your answer with those obtained from different texts by other members of your class. Comment on the similarities and variations in your answers. Factors that must be taken into account when making comparisons between the number of e’s per page of printed material include page size, font size, line length, line spacing and so on. Aside: The novel, A void by George Perec (Harper Collins 1994) is written entirely without the letter ‘e’. t i gat
  • Maths A Yr 12 - Ch. 04 Page 219 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability es in ion v t i gat n inv io es 219 Comparing population characteristics The following tables display data collated from the 1996 ABS Census relating to the populations in two Brisbane suburbs. Note: Income figures are weekly income, expressed in AUD. Chapel Hill Total persons (a) Aged 15 years and over (a) Aboriginal Torres Strait Islander Both Aboriginal and Torres Strait Islander (b) Australian born Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA Born overseas: Other country (d) Born overseas: Total Speaks English only and aged 5 years and over Speaks language other than English (e) and aged 5 years and over Australian citizen Australian citizen aged 18 years and over Unemployed Employed In the labour force Not in the labour force Enumerated in private dwelling (a) Enumerated in non-private dwelling (a) Persons enumerated same address 5 years ago Persons enumerated different address 5 years ago Overseas visitor Chapel Hill Median age Median individual income Median household income Average household size Inala Total persons (a) Aged 15 years and over (a) Aboriginal Torres Strait Islander Both Aboriginal and Torres Strait Islander (b) Australian born Born overseas: Canada, Ireland, NZ, South Africa, UK (c) and USA Born overseas: Other country (d) Born overseas: Total Speaks English only and aged 5 years and over Speaks language other than English (e) and aged 5 years and over Australian citizen Australian citizen aged 18 years and over Unemployed Employed In the labour force Not in the labour force Enumerated in private dwelling (a) Enumerated in non-private dwelling (a) Persons enumerated same address 5 years ago Persons enumerated different address 5 years ago Overseas visitor Inala Median age Median individual income Median household income Average household size 4 824 3 761 3 0 0 3 405 696 587 1 283 3 936 459 4 239 3 027 141 2 677 2 818 854 4 815 9 2 217 2 163 54 5 112 4 070 3 0 0 3 704 637 605 1 242 4 230 491 4 515 3 315 132 2 427 2 559 1 393 5 085 27 2 381 2 324 75 9 936 7 831 6 0 0 7 109 1 333 1 192 2 525 8 166 950 8 754 6 342 273 5 104 5 377 2 247 9 900 36 4 598 4 487 129 6 886 5 083 482 51 12 4 468 694 1 462 2 156 4 483 1 494 5 807 4 001 349 1 403 1 752 3 144 6 886 0 3 343 2 493 15 13 287 9 599 880 93 20 8 534 1 359 2 858 4 217 8 483 2 948 11 231 7 522 954 3 439 4 393 4 872 13 283 4 6 329 4 812 28 34 415 1 209 3.1 6 401 4 516 398 42 8 4 066 665 1 396 2 061 4 000 1 454 5 424 3 521 605 2 036 2 641 1 728 6 397 4 2 986 2 319 13 30 187 412 2.8 1 Examine the data carefully. 2 Prepare a report comparing the residents of the two suburbs. Support your statements by reference to figures. t i gat
  • Maths A Yr 12 - Ch. 04 Page 220 Wednesday, September 11, 2002 4:07 PM 220 es in ion v t i gat n inv io es M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d Modelling Olympic Games times The running time for the men’s 100-m event in the Olympic Games broke the 10-second barrier for the first time in 1968 when Jim Hines of the United States clocked a time of 9.95 seconds. Knowing that records are broken over time, is it possible to predict a year when a runner could break the 9.5-second barrier? We could model this situation by looking at past times for the event. These are shown in the table at right. Note: The times have not decreased consistently over the years. In fact, after 1968, the 10-second barrier was not broken again until 1984 when Carl Lewis of the United States won with a time of 9.99 s. For this reason, along with other factors which accompany feats of human endurance, in modelling situations such as this, any resulting predictions can only be considered as estimates. 1 Using a graphics calculator, spreadsheet or graph paper, draw a scatterplot of ‘year’ (on the x-axis) against ‘time’ (on the y-axis). This could be viewed as a time series as the years are at equal intervals. 2 Draw the line of best fit for your scatterplot. 3 Use your line of best fit to find the year in which the 9.5-second barrier could be broken in the 100-m sprint. This represents a situation when we are extrapolating data (that is, determining a value outside the range of data plotted). Note that the Olympic Games are only held every four years and that your answer may not be one of those years. It remains to be seen how accurate your prediction is! 4 Use some resources available to you (World Wide Web, reference books, almanacs and so on) to collect data on other sporting events (such as high jump heights, shot put distances and swimming times). Model the situation, enabling a prediction to be proposed. Year Time (seconds) 1948 10.30 1952 10.40 1956 10.50 1960 10.20 1964 10.00 1968 9.95 1972 10.14 1976 10.06 1980 10.23 1984 9.99 1988 9.92 1992 9.96 1996 9.84 2000 9.87 t i gat
  • Maths A Yr 12 - Ch. 04 Page 221 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability t i gat es in ion v n inv io es 221 Predicting test results Over the year Sally had sat for nine mathematics tests, but had been sick at the time of the tenth test. She had achieved above average marks for each of her nine tests, so her teacher did not want to give her the class average for her last test. As sometimes happens, the class as a whole found this last test more difficult than the previous ones, so generally all marks were depressed. It would not then be fair to give Sally the average of her previous tests for this last one. How could her teacher give Sally an estimated mark based on her past performance compared with that of her fellow students? Below is a table displaying the class average percentage for each test and Sally’s percentage on the tests. t i gat Test 1 2 3 4 5 6 7 8 9 10 Class % 64 59 60 55 62 66 58 63 65 49 Sally’s % 72 70 77 65 75 80 71 75 75 Because Sally’s marks were consistently above the class average by 10% or more, we could explore a relationship between the class average and Sally’s marks. 1 Using a graphics calculator, or otherwise, construct a scatterplot of the first nine tests, plotting the class average percentage on the x-axis and Sally’s percentage on the y-axis. 2 Draw in the line of best fit. 3 Use your line of best fit and a value of 49 for the class average percentage to determine an estimate for Sally’s performance had she sat for the tenth test. 4 Do you think this is a fair mark to award to Sally? Justify your answer. This method can be used in a variety of situations to estimate values for missing results. The predictions become less accurate if there is a great deal of inconsistency in past performance. es in ion v t i gat n inv io es The door game Imagine you are a contestant in a game on television. The conditions and rules are as follows: 1. Three doors (door 1, door 2, and door 3) stand before you. 2. Behind one of these doors which face you and the audience lies the prize of your dreams (a trip to Wimbledon or a Ferrari car). (The organisers know where the prize lies.) 3. Behind each of the other two doors there is nothing. 4. You see the three closed doors before you and you have no hints. 5. With the benefit (or distraction) of audience participation, you are asked to select the door behind which you think the prize lies. 6. You tell the compere and audience which door you choose. 7. Before opening the door you have chosen, the compere tells you that first, one of the doors, different from the one you have chosen, will be opened; it will be a door that does not have the prize behind it. t i gat
  • Maths A Yr 12 - Ch. 04 Page 222 Wednesday, September 11, 2002 4:07 PM 222 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 8. There are two doors now unopened and you are given an offer to change your choice. 9. Of the two unopened doors, one is your original choice and the other not your choice. 10. Using the theory of probability, would you have a better chance of winning if you stayed with your original choice, or made the switch? This was a game which was actually played on television for a lengthy period many years ago. Regular viewers of the program formed their own opinions regarding the better option by analysing the results of contestants’ choices. The background to the game can be researched by a web search of the name Monty Hall. You may be interested to conduct a search at this stage, or leave your search until you have had time to formulate a logical reason for your choice. Part I This activity provides practical experience for the door game. The simulation could be undertaken as a class activity, or in pairs. 1 Simulate three doors using books lying on a flat surface instead of doors. 2 Underneath one of them place a small piece of paper. (The contestant is to be unaware of the location of the paper.) 3 Ask the contestant to select one of the three books underneath which he or she thinks this piece of paper lies. 4 Turn over one book other than one chosen by the contestant and underneath which the object does not lie. 5 Ask the contestant whether he/she would like to choose a different book. 6 Turn over the book under which the paper lies. 7 Repeat the simulation ten times. Copy and complete the table below, recording your results after each turn shown in the table. Game 1 Stay with choice Change mind ✔ Win/Lose ✔ 2 ✔ ✖ 3 ✔ ✔ 4 5 6 7 8 9 10
  • Maths A Yr 12 - Ch. 04 Page 223 Wednesday, September 11, 2002 4:07 PM 223 Chapter 4 Populations, samples, statistics and probability 8 Combine your results with those of other members of your class so that your combined set consists of a large number of simulations. 9 What are your conclusions from your experiment? Is it wise to stay with your original choice or should you change your mind? If you conduct a web search, you will find sites which simulate the door game using an interactive mode. One such site can be found at www.jaconline.com.au/maths/weblinks. Part II Let us consider the probabilities of the choices in the ‘door game’. 1. Since there are three doors, the probability of the prize -being behind any of these doors is 1 . 3 This enables us to start a tree diagram as shown at right. Prize door 1 – 3 1 2 1 – 3 3 1 – 3 2. As you have three doors from which to select, the probability that your choice -will be the correct selection is 1 . 3 This realisation enables us to extend the branches of the tree as at right. Your choice 1 2 3 1 1 – 3 1 – 3 1 – 3 – 3 Prize door 1 2 1 – 3 1 – 3 1 – 3 1 2 3 1 – 3 3 1 – 3 3. Multiplying the probabilities along the branches enables us to fill in the probability for each selection (shown at right). Prize door 1 2 1 – 3 1 2 3 Your choice 1 2 3 1 1 – 3 1 – 3 1 – 9 1 – 9 1 – 9 – 3 1 – 3 1 – 9 1 2 3 1 – 9 1 – 9 3 1 – 3 1 2 3 1 – 9 1 – 9 1 – 9
  • Maths A Yr 12 - Ch. 04 Page 224 Wednesday, September 11, 2002 4:07 PM 224 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 4. The next stage is the crucial factor. Depending on which door is opened, will you win if you stay with your choice, or are you more likely to win if you change your mind? Follow the results completed for door 1. The combinations of opened doors have been completed below for the remainder of the tree. Complete the blank spaces. Prize door Your choice 1 2 3 Door opened 2 or 3 1 – 9 1 – 9 1 – 9 3 No Yes (P = 1 -9 ) 2 No Yes (P = 1 -9 ) 1 – 9 3 ____ ____ 1 – 9 1 or 3 ____ ____ 1 – 9 1 – 3 1 ____ ____ 1 – 9 2 ____ ____ 1 ____ ____ 1 or 2 ____ ____ 1 2 Win if Win if I I stay? change my mind? -Yes (P = 1 ) No 9 1 – 3 1 2 3 3 1 – 3 1 2 3 1 – 9 1 – 9 5. Add the probabilities in the ‘Win if I stay?’ column. This gives the overall probability of your winning if you stay with your original choice. 6. Similarly, add the probabilities in the ‘Win if I change mind?’ column. This figure represents the overall probability of your winning if you change your mind and choose the other unopened door. 1 Based on these probabilities, which choice should you make? How does this agree with your experimental results from your practical simulation? Work ET SHE 4.3 If you have not yet visited websites by searching for Monty Hall, it would now be appropriate to do so. This game has created much discussion among great mathematicians and non-mathematicians over many years. 2 For the set of scores 23, 45, 24, 19, 22, 16, 16, 27, 20, 21, use your calculator to find: 1 the mean 2 the median 3 the mode 4 the range 5 the interquartile range 6 the standard deviation. 7 In the data set, is the distribution symmetrical? 8 Does the data set have an outlier (a score which is not typical of the data set)? 9 Which measure of central tendency is the best measure of location in this data set? 10 Explain why the interquartile range is a better measure of spread than the range.
  • Maths A Yr 12 - Ch. 04 Page 225 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 225 summary Data collection • A statistical investigation can be done by either census or sample. • A census is when an entire population takes part in the investigation. • A sample is when a small group takes part in the investigation and the results are taken to be representative of the whole group. • There are many types of sample. 1. Random sample — chance is the only factor in deciding who participates. 2. Stratified sample — the sample taken is chosen so that it has the same characteristics as the whole population. 3. Systematic sample — there is a method for deciding who participates in the sample. 4. Accessibility sample — those easily accessed form the sample. 5. Quota sample — the number in the sample is limited by the quota. 6. Judgemental sample — a judgement is made when deciding the sample. 7. Cluster sample — clusters within the population form the sample. Bias If the sample is poorly chosen the results of the investigation will be biased. This means that the results will be skewed towards one section of the population. Estimating populations Populations that can’t be accurately counted with ease are estimated by using the capture–recapture technique. Contingency tables • Display horizontal and vertical categories of two variables of a set of data • Calculations with regard to a variety of reference bases can be made. Applications of statistics and probability • Frequency histograms can be used to estimate probabilities in data sets. • Scatterplots display the relationship between two variables and allow predictions to be made. Measures of central tendency and spread • • • • • • • • The mean, median and mode are measures of central tendency in a data set. The mean is calculated by adding the scores then dividing by the number of scores. The median is the middle score or average of two middle scores. The mode is the most frequently occurring score. The range, interquartile range and standard deviation are measures of spread. The range is the difference between the highest and lowest scores. The interquartile range is the difference between the upper and lower quartiles. The standard deviation is found using the population or sample functions of a calculator. • An outlier is a score that is much less or much greater than the other scores.
  • Maths A Yr 12 - Ch. 04 Page 226 Wednesday, September 11, 2002 4:07 PM 226 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d CHAPTER review 4A 1 For each of the following statistical investigations, state whether a census or a survey has been used. a The average price of petrol in Townsville was estimated by averaging the price at 40 petrol stations. b The Australian Bureau of Statistics has every household in Australia complete an information form once every five years. c The performance of a cricketer is measured by looking at his performance in every match he has played. d Public opinion on an issue is sought by a telephone poll of 2000 homes. 4A 2 multiple choice 4B 4B 3 Name and describe 5 different methods for selecting a sample. 4B 4B 5 Use your random number generator to select 10 numbers between 1 and 1000. Which of the following is an example of a census? A A newspaper conducts an opinion poll of 2000 people. B A product survey of 1000 homes to determine what brand of washing powder is used C Every 200th jar of Vegemite is tested to see if it is the correct mass. D A federal election 4 Which method of sampling has been used for each of the following? a The quality-control department of a tyre manufacturing company road tests every 50th tyre that comes off the production line. b To select the students to participate in a survey, a spreadsheet random number generator selects the roll numbers of 50 students. c An equal number of men and women are chosen to participate in a survey on fashion. 6 The table at right shows the number of students in each year of school. In a survey of the school population, how many students from each year should be chosen, if a sample of 60 is selected using a stratified sample? No. of students 7 212 8 200 9 189 10 175 11 133 12 4B Year 124 7 To estimate the fish population of a lake, 100 fish are caught, tagged and released. The next day another 100 are caught and it is noted that 5 have tags. Estimate the population of the lake.
  • Maths A Yr 12 - Ch. 04 Page 227 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 227 8 Kimberley has a worm farm. To estimate the population of her farm, she catches 150 worms and tags them before releasing them. The next day, she catches 120 worms and finds that 24 of them have tags. Estimate the population of the worm farm. 4B 9 A sample of 200 fish are caught, tagged and released back into the population. Later Barry, Viet and Mustafa each catch a sample of fish. Barry caught 40 fish and 3 had tags. Viet caught 75 fish and 9 had tags. Mustafa caught 55 fish and 7 had tags. a Find the estimate of the population that each would have calculated. b Give an estimate for the population, based on all three samples. 4B 10 multiple choice Which of the following is an example of a random sample? A The first 50 students to arrive at school take a survey. B Fifty students’ names are drawn from a hat and those drawn take the survey. C Ten students from each year of the school are asked to complete a survey. D One class in the school is asked to complete the survey. 4B 11 Carolyn is a marine biologist. She spends the day on a boat and 500 fish are netted. Carolyn notes the types of fish netted. There are 173 blackfish, 219 drummer and 108 mullet. The fish are tagged and released back into the school from which they were caught. Another 250 are then caught and it is noted that 63 have tags. Estimate the population of the school. 4B 12 Bias can be introduced into statistics through: a questionnaire design b sample selection c interpretation of statistical results. Discuss how bias could be a result of techniques in the above three areas. 4C 13 A medical test screens 200 people for a virus. A positive test result indicates that the patient has the virus. 1. Of 50 people known to have the virus, the test produced 48 positive results. 2. Of the remainder who were known not to have the virus, the test produced one positive result. Use the above information to complete the table below. 4D Test results Accurate Not accurate Total With virus Without virus Total 14 The results of a lie detector test are given below. 1. Of 80 people who are known to be telling the truth, the lie detector indicates that three are lying. 2. Of 20 people known to be lying, the lie detector indicates that 17 are lying. Display this information in a contingency table. 4D
  • Maths A Yr 12 - Ch. 04 Page 228 Wednesday, September 11, 2002 4:07 PM 228 4D M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 15 Below are the results of a test screening for a disease. A positive test indicates that the patient has the disease. Test results Accurate Not accurate Total 18 2 20 Without disease 108 12 120 Total 126 14 With disease a How many people were tested for the disease? b How many positive test results were recorded? c What percentage of those people with the disease were correctly diagnosed by the test? d If a person without the disease is chosen at random, what percentage returned a positive test? 4D 16 A reading test for people with dyslexia is given and the results are shown in the contingency table below. Test results Accurate Not accurate Total With dyslexia 39 1 40 Without dyslexia 85 5 90 124 6 Total a How many people were tested? b What percentage of people tested positive to dyslexia? c Based on the above results, if a person with dyslexia takes the test, what is the percentage chance that they will be accurately diagnosed? 4D 17 multiple choice The contingency table below shows the results of a trial on new metal detectors for aircraft. The metal detector scans a piece of hand luggage and lights up if metal is found. Test results Accurate Not accurate Total 9 1 10 Without metal 87 3 90 Total 96 4 With metal Based on the above results, the chance of metal going undetected in a piece of hand luggage is: A 10% B 25% C 75% D 90%
  • Maths A Yr 12 - Ch. 04 Page 229 Wednesday, September 11, 2002 4:07 PM Chapter 4 Populations, samples, statistics and probability 229 18 A medical test for a disease does not always give the correct result. A positive test indicates that the patient has the disease. The contingency table below shows the results of a new screening test for the disease. It was tested on a group of people, some of whom were known to be suffering from the disease, some of whom were not. 4D Test results Accurate Not accurate Total 28 2 30 Without disease 164 6 170 Total 192 8 With disease a b c d e How many people were tested for the disease? What percentage of the results were accurate? How many patients tested positive to the disease? What percentage of patients with the disease were correctly diagnosed by the new test? Based on the above results, what is the probability that a patient with the disease will have the disease detected by this test? 19 The contingency table below compares the number of men and women who are right- and left-handed. Men Women Totals 158 172 330 17 15 32 175 187 4D 362 Right-handed Left-handed Totals a What percentage of males are left-handed? b What percentage of females are left-handed? c Based on the above data, is there any significant difference between the percentage of male and female left-handers? 20 multiple choice The contingency table below shows the number of men and women who work in excess of 45 hours per week. Men Women Totals ≤45 hours 132 128 260 >45 hours 69 34 103 Totals 201 162 363 The percentage of men who work greater than 45 hours per week is closest to: A 28% B 34% C 51% D 67% 4D
  • Maths A Yr 12 - Ch. 04 Page 230 Wednesday, September 11, 2002 4:07 PM 230 M a t h s Q u e s t M a t h s A Ye a r 1 2 f o r Q u e e n s l a n d 21 Sixty-seven primary and 47 secondary school students were asked their attitude to the number of school holidays which should be given. They were asked whether there should be more, fewer or the same number. Five primary students and 2 secondary students wanted fewer holidays, 29 primary and 9 secondary students thought that they had enough holidays (that is, they chose the same number) and the rest thought that they needed to be given more holidays. a Form a contingency table with reference to primary and secondary school percentages. b Use your table to compare the opinions of primary and secondary school students. 4E 22 Consider the data set represented by the frequency histogram on the right. a Are the data symmetrical? b Can the mean and median of the data be seen? c What is the mode of the data? d Which score has the highest probability of occurring? 4E Frequency 4D 8 7 6 5 4 3 2 1 0 15 16 17 18 19 20 Score 23 The table below shows the number of attempts that 20 members of a Year 12 class took to obtain a driver’s licence. Number of attempts Frequency 1 11 2 4 3 2 4 2 5 0 6 1 a Show these data in a frequency histogram. b Are the data symmetrical? c What is the probability of a student of the class taking more than three attempts to obtain a driver’s licence? 24 Consider this data set which measures the sales figures for a new salesperson. Day (d) CHAPTER test yourself 4 1 2 3 4 5 6 7 8 Units sold (s) 1 2 4 9 20 44 84 124 a Draw the line of best fit. b Use your line to predict the sales figures for the tenth day.