2.
Organising and graphing quantitative data in a frequency distribution table
• A frequency table consists of a number of classes; each observation is counted and recorded in the frequency of the class it falls in.
• If n observations need to be classified into a frequency table, determine:
  – Number of classes: $c = 1 + 3,3 \log n$
  – Class width: $\dfrac{x_{\max} - x_{\min}}{c}$
3.
Organising and graphing quantitative data in a frequency distribution table
Example: The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
8 11 12 20 18 10 14 18 16 9 5 7
11 12 15 14 16 9 17 11 6 18 9 15
13 12 11 6 10 8 11 13 22 11 11 14
11 10 9 19 14 17 9 3 3 16 8 2
14.
Histograms
[Figure: histogram of the number of telephone calls per hour at a municipal call centre. Horizontal axis: number of calls (class boundaries 2, 5, 8, 11, 14, 17, 20, 23). Vertical axis: number of hours (0 to 14).]
15.
Definitions
Frequency polygon
A line graph of a frequency distribution that offers a useful alternative to a histogram. A frequency polygon is useful in conveying the shape of the distribution.
Ogive
A graphic representation of the cumulative frequency distribution. Used for approximating the number of values less than or equal to a specified value.
17.
Frequency polygons
[Figure: frequency polygon of the number of telephone calls per hour at a municipal call centre. Horizontal axis: number of calls, plotted at the class mid-points (x) 3,5; 6,5; 9,5; 12,5; 15,5; 18,5; 21,5. Vertical axis: number of hours (0 to 14). The arbitrary mid-points 0,5 and 24,5 are added to close the polygon.]
19.
Ogives
[Figure: ogive of the number of calls received per hour at a call centre. Horizontal axis: number of calls (2 to 23). Vertical axis: cumulative % of hours (0 to 100).]
None of the hours had fewer than 2 calls.
20.
Ogives
[Figure: ogive of the number of calls received per hour at a call centre, annotated with the readings below.]
• 50% of the hours had fewer than 12 calls per hour.
• 80% of the hours had fewer than 17 calls per hour.
• 20% of the hours had more than 17 calls per hour.
21.
Exam question 2
A garbage removal company would like to start charging by the weight of a customer's bin rather than by the number of bins put out. They select a sample of 25 customers and weigh their garbage bins. The weights in kg are given below:
14.5  5.2 16.0 14.7 15.6 18.9 13.5 24.6 24.5  7.4
13.2 23.4 13.9 12.0 22.5 31.4 16.1 10.9 25.1 22.1
14.8 15.1  4.9 17.0 10.3
1. Construct a frequency table to describe the data. Include a frequency and a relative (%) frequency column. (Hint: start the class intervals with the whole number just smaller than the lowest value in the dataset.)
22.
Procedure
1. Calculate the range of the dataset
2. Calculate the number of classes
3. Calculate the class width
4. Construct a table showing the intervals calculated in steps 1 to 3
5. Put in the tally for each interval and then show it as a frequency
6. Calculate the relative (%) frequency
13 marks
23.
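Range
$x_{\max} - x_{\min} = 31.4 - 4.9 = 26.5$
No. of classes
$K$ or $c = 1 + 3.3 \log n$, with $n = 25$
$K$ or $c = 1 + 3.3 \log(25) = 5.61 \approx 6$
Class width
$\dfrac{x_{\max} - x_{\min}}{c} = \dfrac{26.5}{6} = 4.42 \approx 5$ (rounded up)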
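The procedure above can be sketched in Python as a quick check. This is a minimal illustration, not part of the original slides: the variable names are made up for the example, and rounding both the number of classes and the class width up to a whole number is an assumption chosen to match the values 6 and 5 on the slide.

```python
import math

# Bin weights (kg) from the exam question.
weights = [14.5, 5.2, 16.0, 14.7, 15.6, 18.9, 13.5, 24.6, 24.5, 7.4,
           13.2, 23.4, 13.9, 12.0, 22.5, 31.4, 16.1, 10.9, 25.1, 22.1,
           14.8, 15.1, 4.9, 17.0, 10.3]

n = len(weights)
data_range = max(weights) - min(weights)            # 31.4 - 4.9
num_classes = math.ceil(1 + 3.3 * math.log10(n))    # assumption: round up
class_width = math.ceil(data_range / num_classes)   # assumption: round up
start = math.floor(min(weights))                    # whole number just below the minimum

# Count the observations in each class [lower, upper).
frequencies = [0] * num_classes
for w in weights:
    frequencies[int((w - start) // class_width)] += 1

for i, f in enumerate(frequencies):
    lower = start + i * class_width
    print(f"{lower} - < {lower + class_width}: f = {f}, relative = {100 * f / n:.0f}%")
```

The resulting classes start at 4 kg and are 5 kg wide, so the last class is 29 - < 34 kg.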
25.
Exam question 2
2. Comment on the interval containing the lowest percentage.
   Answer: 4% of bins weighed between 29 & 34 kg.
3. In which interval do the data tend to cluster? Which descriptive statistics measure, can we assume, would be found in this interval?
   Answer: The largest no. of bins weighed between 14 & 19 kg. We assume the mode will fall in this interval (highest frequency).
4. Comment on the shape of the distribution without drawing a graph. Give reasons.
   Answer: +ve skewed, as more values are located in the lower intervals.
7 MARKS
29.
• QUARTILES
  – Order data in ascending order.
  – Divide data set into four quarters.
  Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max
30.
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q1 for the sample of nine measurements:
• Order the measurements
  Position: 1 2 3 4 5 6 7 8 9
  Value: −4 −3 2 2 5 5 5 6 8
• Q1 is the $(n + 1)\frac{1}{4} = (9 + 1)\frac{1}{4} = 2,5$th value
• Find the difference between the 2nd and 3rd values, 2 − (−3) = 5, and multiply by the decimal portion of the position: 5 × 0,5 = 2,5
• Add to the smaller value: −3 + 2,5, so Q1 = −0,5
31.
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine Q3 for the sample of nine measurements:
  Position: 1 2 3 4 5 6 7 8 9
  Value: −4 −3 2 2 5 5 5 6 8
Q3 is the $(n + 1)\frac{3}{4} = (9 + 1)\frac{3}{4} = 7,5$th value
Q3 = 5 + 0,5(6 − 5) = 5,5
32.
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Interquartile range = Q3 – Q1
Q3 = 5,5
Q1 = −0,5
Interquartile range = 5,5 – (−0,5) = 6
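The quartile method used in these examples (find the (n + 1)p-th position, then interpolate using the decimal part of the position) can be sketched as follows. The function name `value_at_position` is hypothetical, chosen only for this illustration, and the handling of positions outside the data is an added assumption.

```python
def value_at_position(data, fraction):
    """Value at position (n + 1) * fraction in the ordered data,
    interpolating linearly between neighbouring values."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * fraction
    k = int(position)           # whole part of the position (1-based)
    decimal = position - k      # decimal portion of the position
    if k < 1:                   # assumption: clamp to the smallest value
        return ordered[0]
    if k >= n:                  # assumption: clamp to the largest value
        return ordered[-1]
    return ordered[k - 1] + decimal * (ordered[k] - ordered[k - 1])

data = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = value_at_position(data, 0.25)   # 2.5th value
q3 = value_at_position(data, 0.75)   # 7.5th value
iqr = q3 - q1
print(q1, q3, iqr)
```

For the nine measurements this gives Q1 = −0,5, Q3 = 5,5 and an interquartile range of 6.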
33.
INTERQUARTILE RANGE (IQR)
• Difference between the third and first quartiles
• Indicates how far apart the first and third quartiles are
  IQR = Q3 – Q1
34.
BOX & WHISKER PLOT
• Provides a graphical summary of data based on 5 summary measures or values
  – First quartile, median, third quartile, lower limit, upper limit
• Box and whisker plot detects outliers in a data set
  LL = Q1 – 1,5 (IQR)
  UL = Q3 + 1,5 (IQR)
35.
BOX-AND-WHISKER PLOT
Me = 12,38
Q1 = 9,36
Q3 = 15,67
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
[Figure: box-and-whisker plot on a scale from 0 to 28, showing the IQR box with a distance of 1,5(IQR) marked on either side.]
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
36.
Exam question 3
The Tubeka brothers spent the following amounts in Rand on groceries over the last 8 weeks:
54 56 89 67 74 57 43 51
1. Calculate a five number summary table
2. Construct a box and whisker plot for the data
3. Determine whether there are any outliers. Show calculations
20 MARKS
PROCEDURE
1. Reorder the data set
2. Identify maximum and minimum values in dataset
3. Calculate median
4. Calculate Q1 & Q3
5. Construct plot
6. Calculate upper & lower limits for dataset to determine if outliers present
37.
Ordered data: 43 51 54 56 57 67 74 89
xmin = 43, xmax = 89, median = (56 + 57)/2 = 56.5, Q1 = 51.75, Q3 = 72.25
Position of Q1 = (n + 1)(1/4) = (8 + 1) × ¼ = 2.25th value, i.e. between 51 & 54
54 − 51 = 3; multiply by the decimal portion of the position, 3 × 0.25 = 0.75, and add the lower value
Q1 = 51 + 0.75 = 51.75
Position of Q3 = (n + 1)(¾) = (8 + 1) × ¾ = 6.75th value, i.e. between 67 & 74
74 − 67 = 7; multiply by the decimal portion of the position, 7 × 0.75 = 5.25, and add the lower value
Q3 = 67 + 5.25 = 72.25
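The remaining steps of the procedure (upper and lower limits, then the outlier check) can be sketched as below, using LL = Q1 − 1.5(IQR) and UL = Q3 + 1.5(IQR) from the box-and-whisker slide. The helper `value_at_position` is a hypothetical name for the (n + 1)p position method; the limits and the outlier result are computed by this sketch rather than quoted from the slides.

```python
def value_at_position(data, fraction):
    """Value at position (n + 1) * fraction in the ordered data (linear interpolation)."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * fraction
    k = int(position)
    decimal = position - k
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + decimal * (ordered[k] - ordered[k - 1])

amounts = [54, 56, 89, 67, 74, 57, 43, 51]   # Rand spent per week

five_number_summary = {
    "min": min(amounts),
    "Q1": value_at_position(amounts, 0.25),
    "median": value_at_position(amounts, 0.50),
    "Q3": value_at_position(amounts, 0.75),
    "max": max(amounts),
}

iqr = five_number_summary["Q3"] - five_number_summary["Q1"]
lower_limit = five_number_summary["Q1"] - 1.5 * iqr
upper_limit = five_number_summary["Q3"] + 1.5 * iqr

# Any amount outside [lower_limit, upper_limit] is flagged as an outlier.
outliers = [x for x in amounts if x < lower_limit or x > upper_limit]
print(five_number_summary, lower_limit, upper_limit, outliers)
```

With IQR = 20.5 the limits come out at 21 and 103, so none of the eight amounts is flagged as an outlier.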
40.
• ARITHMETIC MEAN
  – Data is given in a frequency table
  – Only an approximate value of the mean
  $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
  where $f_i$ = frequency of the $i$th class interval
        $x_i$ = class midpoint of the $i$th class interval
41.
• MEDIAN
  – Data is given in a frequency table.
  – First cumulative frequency ≥ n/2 will indicate the median class interval.
  – Median can also be determined from the ogive.
  $M_e = l_i + \dfrac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$
  where $l_i$ = lower boundary of the median interval
        $u_i$ = upper boundary of the median interval
        $F_{i-1}$ = cumulative frequency of the interval preceding the median interval
        $f_i$ = frequency of the median interval
42.
• MODE
  – Class interval that has the largest frequency value will contain the mode.
  – Mode is the class midpoint of this class.
  – Mode must be determined from the histogram.
43.
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints:
Number of calls     Number of hours (fi)     Midpoint (xi)
[2–under 5)         3                        3,5
[5–under 8)         4                        6,5
[8–under 11)        11                       9,5
[11–under 14)       13                       12,5
[14–under 17)       9                        15,5
[17–under 20)       6                        18,5
[20–under 23)       2                        21,5
                    n = 48
44.
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls     Number of hours (fi)     Midpoint (xi)
[2–under 5)         3                        3,5
[5–under 8)         4                        6,5
[8–under 11)        11                       9,5
[11–under 14)       13                       12,5
[14–under 17)       9                        15,5
[17–under 20)       6                        18,5
[20–under 23)       2                        21,5
                    n = 48
$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{597}{48} = 12,44$
Average number of calls per hour is 12,44.
45.
Exam question 3
The number of overtime hours worked by 40 part-time employees of a security company in 1 week is shown in the following frequency distribution:
Hours per week     Frequency (f)
2.1 - < 2.8        12
2.8 - < 3.5        13
3.5 - < 4.2        7
4.2 - < 4.9        5
4.9 - < 5.6        2
5.6 - < 6.3        1
1. Estimate the mean number of overtime hours worked
2. What % of employees worked at least 4.2 hours overtime?
8 marks
46.
Exam question 3
Procedure
1. Calculate the midpoint x for each interval: (lower limit + upper limit)/2
2. Multiply f by the midpoint x
3. Total the fx and f columns
4. Divide ∑fx by ∑f
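The four steps above can be sketched in Python. The function name `grouped_mean` is hypothetical, and the overtime results (a mean of about 3.41 hours, and 20% of employees at 4.2 hours or more) are computed by this sketch rather than quoted from the slides. The call-centre table is included only as a check against the 12,44 calls per hour worked out earlier.

```python
def grouped_mean(classes, frequencies):
    """Approximate mean of grouped data: sum(f * midpoint) / sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)

overtime_classes = [(2.1, 2.8), (2.8, 3.5), (3.5, 4.2),
                    (4.2, 4.9), (4.9, 5.6), (5.6, 6.3)]
overtime_freq = [12, 13, 7, 5, 2, 1]

mean_overtime = grouped_mean(overtime_classes, overtime_freq)

# Part 2: employees in classes whose lower boundary is at least 4.2 hours.
at_least = sum(f for (lower, _), f in zip(overtime_classes, overtime_freq)
               if lower >= 4.2)
percent_at_least = 100 * at_least / sum(overtime_freq)

# Check against the call-centre example (597 / 48).
calls_mean = grouped_mean(
    [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)],
    [3, 4, 11, 13, 9, 6, 2],
)
print(round(mean_overtime, 2), percent_at_least, round(calls_mean, 2))
```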
49.
• PERCENTILES
  – Order data in ascending order.
  – Divide data set into hundred parts.
  Min | 10% | P10 | 90% | Max
  Min | 80% | P80 | 20% | Max
  Min | 50% | P50 = Q2 | 50% | Max
50.
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:
  Position: 1 2 3 4 5 6 7 8 9
  Value: −4 −3 2 2 5 5 5 6 8
P20 is the $(n + 1)\frac{p}{100} = (9 + 1)\frac{20}{100} = 2$nd value
P20 = −3
51.
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls     Number of hours (fi)     Cumulative F
[2–under 5)         3                        3
[5–under 8)         4                        7
[8–under 11)        11                       18
[11–under 14)       13                       31
[14–under 17)       9                        40
[17–under 20)       6                        46
[20–under 23)       2                        48
                    n = 48
P60: np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).
52.
Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
Number of calls     Number of hours (fi)     Cumulative F
[2–under 5)         3                        3
[5–under 8)         4                        7
[8–under 11)        11                       18
[11–under 14)       13                       31
[14–under 17)       9                        40
[17–under 20)       6                        46
[20–under 23)       2                        48
                    n = 48
$P_p = l_p + \dfrac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p}$
$P_{60} = 11 + \dfrac{(14 - 11)(28,8 - 18)}{13} = 13,49$
60% of the time there are fewer than 13,49 calls per hour, or 40% of the time more than 13,49 calls per hour.
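The grouped-data percentile formula can be sketched as a small function. The name `grouped_percentile` is hypothetical and the sketch assumes every class has a non-zero frequency. With p = 50 the same expression reduces to the grouped median formula given on the median slide.

```python
def grouped_percentile(classes, frequencies, p):
    """Percentile from a frequency table:
    l_p + (u_p - l_p) * (n*p/100 - F_{p-1}) / f_p."""
    n = sum(frequencies)
    target = n * p / 100
    cumulative_before = 0       # F_{p-1}: cumulative frequency before the class
    for (lower, upper), f in zip(classes, frequencies):
        # The first class whose cumulative frequency reaches the target
        # contains the percentile.
        if cumulative_before + f >= target:
            return lower + (upper - lower) * (target - cumulative_before) / f
        cumulative_before += f
    raise ValueError("p must lie between 0 and 100")

call_classes = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)]
hours = [3, 4, 11, 13, 9, 6, 2]

p60 = grouped_percentile(call_classes, hours, 60)
p50 = grouped_percentile(call_classes, hours, 50)   # grouped median
print(round(p60, 2), round(p50, 2))
```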
53.
Exam question 3
1. John, one of the part-time workers, was told he falls on the 70th percentile. Calculate the value and explain what it means.
PROCEDURE
1. Calculate the cumulative frequencies
2. Calculate which class the required percentile falls into by using P = np/100
3. Once you have identified the class, use the percentile formula given in the tables book to calculate the value. Take CARE to order the calculation correctly.
4 MARKS
54.
Exam question 3
Hours per week     Frequency (f)     Cumulative F
2.1 - < 2.8        12                12
2.8 - < 3.5        13                25
3.5 - < 4.2        7                 32
4.2 - < 4.9        5                 37
4.9 - < 5.6        2                 39
5.6 - < 6.3        1                 40
                   40
P = np/100 = 40 × 70/100 = 28
P70 = 3.5 + [(4.2 − 3.5)(28 − 25)]/7
    = 3.5 + 0.3
    = 3.8
70% of the workers worked fewer overtime hours than John, i.e. 70% of the workers worked fewer than 3.8 hrs. 30% of the workers worked more overtime hours than John, i.e. 30% of the employees worked more than 3.8 hrs.
56.
Confidence interval
  – An interval is calculated around the sample statistic
[Figure: a confidence interval drawn around the sample statistic, with the population parameter included in the interval.]
57.
Confidence interval
  – An upper and lower limit within which the population parameter is expected to lie
  – Limits will vary from sample to sample
  – Specify the probability that the interval will include the parameter
  – Typically used: 90%, 95%, 99%
  – Probability denoted by
    • (1 – α), known as the level of confidence
    • α is the significance level
Example: Meaning of a 90% confidence interval: 90% of all possible samples taken from the population will produce an interval that will include the population parameter.
58.
• An interval estimate consists of a range of values with an upper & lower limit
• The population parameter is expected to lie within this interval with a certain level of confidence
• Limits of an interval vary from sample to sample, therefore we must also specify the probability that an interval will contain the parameter
• Ideally the probability should be as high as possible
59.
SO REMEMBER
• We can choose the probability
• Probability is denoted by (1 − α)
• Typical values are 0.9 (90%), 0.95 (95%) and 0.99 (99%)
• The probability is known as the LEVEL OF CONFIDENCE
• α is known as the SIGNIFICANCE LEVEL
• α corresponds to an area under a curve
• Since we take the confidence level into account when we estimate an interval, the interval is called a CONFIDENCE INTERVAL
60.
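Confidence interval for Population Mean, n ≥ 30
- population need not be normally distributed
- sampling distribution of the sample mean will be approximately normal
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if σ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if σ is not known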
61.
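Example: 90% confidence interval
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{\sigma}{\sqrt{n}}$, if σ is known
$CI(\mu)_{1-\alpha} = \bar{x} \pm Z_{1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$, if σ is not known
Confidence level: 1 – α = 0,90, so α = 1 – 0,90 = 0,10 and α/2 = 0,05
[Figure: sampling distribution centred on x̄ with the lower and upper confidence limits marked. Central area = 1 – α = 0,90; each tail area = α/2 = 0,05.]
• 90% of all sample means fall in the central area.
• The 2 tail areas added together = α, i.e. 10%.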
63.
• Confidence interval for Population Mean, n < 30
  – For a small sample from a normal population where σ is known, the normal distribution can be used.
  – If σ is unknown we use s to estimate σ
  – We need to replace the normal distribution with the t-distribution
  $CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
  [Figure: the standard normal curve compared with the t-distribution.]
65.
• Example
  – The manager of a small departmental store is concerned about the decline of his weekly sales.
  – He calculated the average and standard deviation of his sales for the past 12 weeks: x̄ = R12 400 and s = R1 346
  – Estimate with 99% confidence the population mean sales of the departmental store.
  $t_{11;0.995} = 3,106$
  $\bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}} = 12400 \pm 3,106 \cdot \dfrac{1346}{\sqrt{12}} = 12400 \pm 1206,86$
  Interval: (11193,14 ; 13606,86)
  We are 99% confident that the mean weekly sales will be between R11 193,14 and R13 606,86.
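The arithmetic in this example can be checked with a short sketch. The function name is hypothetical, and the critical value 3.106 is simply the table value quoted on the slide; the Python standard library has no t-distribution, so the sketch does not compute it.

```python
import math

def t_confidence_interval(sample_mean, s, n, t_critical):
    """Interval x_bar +/- t * s / sqrt(n) for a small sample with unknown sigma."""
    margin = t_critical * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# t_{11; 0.995} = 3.106 is read from a t-table (value taken from the slide).
lower, upper = t_confidence_interval(12400, 1346, 12, 3.106)
print(f"{lower:.2f} ; {upper:.2f}")
```

Passing the critical value in as an argument keeps the sketch usable for any sample size and confidence level once the table value has been looked up.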
66.
• Confidence interval for Population proportion
  – Each element in the population can be classified as a success or failure
  – Sample proportion: $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
  – Proportion always between 0 and 1
  – For large samples the sample proportion $\hat{p}$ is approximately normal
  $CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
67.
Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.
2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.
15 MARKS
68.
Exam question 7
PROCEDURE
1. Determine what measure you are looking at: mean, proportion or standard deviation
2. Select the appropriate formula based on 1. and the sample size (t for small sample sizes < 30; z for larger sample sizes)
3. Put the numbers into the formula and calculate the confidence interval
69.
Exam question 7
1. In a sample of 200 residents of Johannesburg, 120 reported they believed the property taxes were too high. Develop a 95% confidence interval for the proportion of the residents who believe the tax rate is too high. Interpret your answer.
Sample proportion: $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
$CI(p)_{1-\alpha} = \hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
$\hat{p}$ = 120/200 = 0.6
$Z_{1-\frac{\alpha}{2}}$ = 1.96
CI = 0.6 ± 1.96 √((0.6)(0.4)/200)
CI = 0.6 ± 0.07
0.53 < p < 0.67
At a confidence level of 95%, between 53% and 67% of residents believe the tax rate is too high.
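As a check on the arithmetic, the proportion interval can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf` supplies the z value instead of a table. The function name `proportion_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(120, 200, 0.95)
print(f"{lower:.2f} < p < {upper:.2f}")
```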
70.
Exam question 7
2. The time it takes a mechanic to tune an engine in a sample of 20 tune ups is known to be normally distributed with a sample mean of 45 minutes and a sample standard deviation of 14 minutes. Develop a 95% confidence interval estimate for the mean time it will take the mechanic for all engine tune ups. Interpret your answer.
$CI(\mu)_{1-\alpha} = \bar{x} \pm t_{n-1;1-\frac{\alpha}{2}} \dfrac{s}{\sqrt{n}}$
= 45 ± 2.093 × 14/√20
= 45 ± 6.55
38.45 < µ < 51.55
At a confidence level of 95%, the population average time to complete a tune up is between 38.45 and 51.55 minutes.
72.
STEPS OF A HYPOTHESIS TEST
Step 1 • State the null and alternative hypotheses
Step 2 • State the value of α
Step 3 • Calculate the value of the test statistic
Step 4 • Determine the critical value
Step 5 • Make a decision using decision rule or graph
Step 6 • Draw a conclusion
73.
• Hypothesis test for Population Mean, n < 30
  – If σ is unknown we use s to estimate σ
  – We need to replace the normal distribution with the t-distribution with (n − 1) degrees of freedom
Testing H0: μ = μ0 for n < 30
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Alternative hypothesis     Decision rule: Reject H0 if
H1: μ ≠ μ0                 |t| ≥ $t_{n-1;1-\alpha/2}$
H1: μ > μ0                 t ≥ $t_{n-1;1-\alpha}$
H1: μ < μ0                 t ≤ −$t_{n-1;1-\alpha}$
74.
• Hypothesis testing for Population proportion
  – Sample proportion: $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n}$
  – Proportion always between 0 and 1
Testing H0: p = p0 for n ≥ 30
Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}}$
Alternative hypothesis     Decision rule: Reject H0 if
H1: p ≠ p0                 |z| ≥ $Z_{1-\alpha/2}$
H1: p > p0                 z ≥ $Z_{1-\alpha}$
H1: p < p0                 z ≤ −$Z_{1-\alpha}$
75.
Exam question 8
1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at the 10% level of significance and interpret.
2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use the 0.05 significance level.
16 MARKS
76.
Exam question 8
Procedure
1. State H0 and Ha
2. Determine the critical value from the appropriate test table using α and n
3. Compute the test statistic (t or z value?)
4. Draw conclusion
77.
Exam question 8
1. Oliver Tambo airport wants to test the claim that on average cars remain in the short term car park area longer than 42.5 minutes. The research team drew a random sample of 24 cars and found that the average time that these cars remained in the short term parking area was 40 minutes with a sample standard deviation of 2 minutes. Test the claim at the 10% level of significance and interpret.
State hypothesis
H0: µ = 42.5
Ha: µ > 42.5
Determine critical value
$t_{n-1;1-\alpha} = t_{23;0.9} = 1.319$
Reject H0 if the test statistic is > 1.319
Calculate test statistic
$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{40 - 42.5}{2 / \sqrt{24}} = -6.12$
Do not reject H0, since −6.12 is not greater than 1.319.
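The test statistic and decision for this question can be reproduced with a short sketch. The function name is hypothetical, and the critical value 1.319 is the table value from the slide, not something the code computes.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """Test statistic t = (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

t = t_statistic(40, 42.5, 2, 24)
t_critical = 1.319              # t_{23; 0.90}, read from a t-table (slide value)

# Right-tailed decision rule for Ha: mu > mu0.
reject_h0 = t >= t_critical
print(round(t, 2), reject_h0)
```

Because the sample mean (40) lies below the hypothesised 42.5, the statistic is negative and can never reach a positive right-tail critical value.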
78.
Exam question 8
2. The Gautrain Authority adds a bus route if more than 55% of commuters indicate they would use the route. A sample of 70 commuters revealed that 42 would use a route from Sandton to Auckland Park. Does this route meet the Gautrain criteria? Use the 0.05 significance level.
State hypothesis
H0: p = 0.55
Ha: p > 0.55
Determine critical value
α = 0.05, Z = 1.64
Reject H0 if Z test > 1.64
Calculate test statistic
Sample proportion: $\hat{p} = \dfrac{\text{number of successes}}{\text{sample size}} = \dfrac{x}{n} = \dfrac{42}{70} = 0.6$
$z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \dfrac{0.6 - 0.55}{\sqrt{(0.55)(0.45)/70}} = 0.84$
Do not reject H0, since 0.84 is not greater than 1.64.
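A sketch of the same calculation using only the standard library; here the critical value comes from `statistics.NormalDist().inv_cdf` rather than a table, which gives about 1.645 instead of the rounded 1.64. The function name `proportion_z_statistic` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_z_statistic(successes, n, p0):
    """Test statistic z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

alpha = 0.05
z = proportion_z_statistic(42, 70, 0.55)
z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value

reject_h0 = z >= z_critical                    # rule for Ha: p > p0
print(round(z, 2), round(z_critical, 3), reject_h0)
```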
80.
Coefficient of correlation
• The coefficient of correlation is used to measure the strength of association between two variables.
• The coefficient values range between −1 and 1.
  – If r = −1 (negative association) or r = +1 (positive association), every point falls on the regression line.
  – If r = 0 there is no linear pattern.
• The coefficient can be used to test for a linear relationship between two variables.
81.
[Figure: six scatter plots of Y against X illustrating perfect positive (r = +1), high positive (r = +0,9), low positive (r = +0,3), perfect negative (r = −1), high negative (r = −0,8) and no correlation (r = 0).]
82.
Exam question 10
The cost of repairing cars that were involved in accidents is one reason that insurance premiums are so high. In an experiment 5 cars were driven into a wall. The speeds (X) were varied between 20 km/h and 80 km/h. The costs of repair (Y) were estimated and are listed below:
SPEED (km/h) (X)     COST OF REPAIR (R'000) (Y)
20                   3
30                   5
40                   8
60                   24
80                   34
1. Use your calculator to calculate the coefficient of correlation. Interpret your answer.
2. Calculate and interpret the coefficient of determination for this data.
3. Use your calculator to construct the regression line equation and predict the repair cost at 50 km/h.
10 MARKS
83.
Exam question 10
1. Put data into calculator
2. Select regression function and select r
3. Calculate coefficient of determination = r² × 100%
4. Interpret results
5. Using Y = A + BX, select regression function on calculator and determine values for A & B
6. Put X = 50 into formula and calculate result
84.
Exam question 10
1. r = 0.98
There is a very strong positive relationship between the repair cost and speed.
2. r² × 100% = 0.98² × 100 = 96%
96% of the variation in the cost of repair is explained by the variation in the speed at which the car crashed.
3. Y = −10.7 + 0.55X
For X = 50: Y = 16.8 (R'000)
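For readers without a regression function on their calculator, the same quantities can be computed from the least-squares formulas. This is a sketch with hypothetical variable names. Note that it works with unrounded values, so it gives r² of about 0.97 and a prediction of about 17.0, whereas the slide rounds r, A and B first and arrives at 96% and 16.8.

```python
import math

speed = [20, 30, 40, 60, 80]     # X, km/h
cost = [3, 5, 8, 24, 34]         # Y, R'000

n = len(speed)
mean_x = sum(speed) / n
mean_y = sum(cost) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in speed)
syy = sum((y - mean_y) ** 2 for y in cost)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(speed, cost))

r = sxy / math.sqrt(sxx * syy)   # coefficient of correlation
r_squared = r ** 2               # coefficient of determination
b = sxy / sxx                    # slope B
a = mean_y - b * mean_x          # intercept A

predicted_cost_at_50 = a + b * 50
print(round(r, 2), round(a, 1), round(b, 2), round(predicted_cost_at_50, 1))
```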