
# Statistics lecture 4 (ch3)


Measures of Location & Dispersion

Published in: Education, Technology


2. 2. NEXT LECTURE
    • Please bring a scientific calculator and the calculator manual.
3. 3. NUBE Test
    • You will not be asked to draw graphs, only to interpret them.
    • The following topics will be included in test 1:
        – Discounts, percentages and commissions
        – Multiple choice questions
        – Graphs
        – Mean, median, mode and standard deviations
        – Dispersion
        – Box and whisker plot
        – Probabilities
        – Probability distributions
        – Sampling distributions
4. 4. NUBE Test• REMEMBER - YOU WILL NOT BE GIVEN ALL OF THE FORMULAE IN THE TEST, YOU MUST REMEMBER THE ONES THAT ARE NOT IN THE PRINT OUT GIVEN TO YOU:-• Attached please find the NUBE6112 FORMULAE AND TABLES which students will need for all tests and exams. Please can you print these back to back and have them laminated because these are to be used year on year. It has also been confirmed by the IIE that any formulas that aren’t in the sheets, students are expected to know from their lecturers, so could you please pass this info on to your lecturers. The IIE were only given permission to print what is on the sheet, which is also at the back of the textbook. 4
5. 5. • Properties to describe numerical data: – Central tendency – Dispersion – Shape• Measures calculated for: – Sample data • Statistics – Entire population • Parameters 5
6. 6. Measures of location• Arithmetic mean• Median• Mode 6
7. 7. UNGROUPED or raw data refers to data as they were collected, that is, before they are summarised or organised in any way or form.
    GROUPED data refers to data summarised in a frequency table.
8. 8. ARITHMETIC MEAN – This is the most commonly used measure and is also called the mean.
    Sample mean = (sum of sample observations) / (number of sample observations):
    $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
    where $n$ = sample size.
9. 9. ARITHMETIC MEAN – This is the most commonly used measure and is also called the mean.
    Population mean = (sum of observations) / (number of observations):
    $$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
    where $x_i$ = observations of the population, ∑ = “the sum of”, and $N$ = population size.
10. 10. MEDIAN
    – Half the values in the data set are smaller than the median.
    – Half the values in the data set are larger than the median.
    – Order the data from small to large.
    • Position of the median
        – If n is odd: the median is the (n+1)/2 th observation.
        – If n is even: calculate (n+1)/2; the median is the average of the values immediately before and after position (n+1)/2.
11. 11. MODE
    – Is the observation in the data set that occurs the most frequently.
    – Order the data from small to large.
    – If no observation repeats, there is no mode.
    – If one observation occurs more frequently than all others: unimodal.
    – If two or more observations occur the same (highest) number of times: multimodal.
    – Used for nominal scaled variables.
12. 12. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    The mean of the sample of nine measurements is given by:
    $$\bar{x} = \frac{\sum_{i=1}^{9} x_i}{n} = \frac{x_1 + x_2 + \dots + x_9}{9} = \frac{2 + 5 + 8 + (-3) + 5 + 2 + 6 + 5 + (-4)}{9} = \frac{26}{9} \approx 2{,}89$$
13. 13. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    The median of the sample of nine measurements is given by:
    • Order the measurements (odd number of values):
      −4 −3 2 2 5 5 5 6 8 (positions 1 to 9)
    • (n+1)/2 = (9+1)/2 = 5th measurement
    • Median = 5
14. 14. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4 3
    Determine the median of the sample of ten measurements.
    • Order the measurements (even number of values):
      −4 −3 2 2 3 5 5 5 6 8 (positions 1 to 10)
    • (n+1)/2 = (10+1)/2 = 5,5th measurement
    • Median = (3+5)/2 = 4
15. 15. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    Determine the mode of the sample of nine measurements.
    • Order the measurements: −4 −3 2 2 5 5 5 6 8
    • Mode = 5
    • Unimodal
16. 16. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4 2
    Determine the mode of the sample of ten measurements.
    • Order the measurements: −4 −3 2 2 2 5 5 5 6 8
    • Mode = 2 and 5
    • Multimodal
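The ungrouped examples above can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the lecture.

```python
import statistics

# The nine measurements used in the worked examples.
nine = [2, 5, 8, -3, 5, 2, 6, 5, -4]

sample_mean = statistics.mean(nine)        # sum of values / number of values
sample_median = statistics.median(nine)    # middle value of the ordered data
sample_modes = statistics.multimode(nine)  # every value tied for the highest count

# Ten measurements: the extra 3 is used in the median example,
# the extra 2 is used in the mode example.
median_of_ten = statistics.median(nine + [3])
modes_of_ten = statistics.multimode(nine + [2])

print(round(sample_mean, 2), sample_median, sample_modes)
print(median_of_ten, sorted(modes_of_ten))
```

`multimode` returns a list, so a multimodal data set simply yields more than one entry.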
17. 17. Concept questions 1–12, p 64 – Elementary Statistics for Business & Economics
18. 18. ARITHMETIC MEAN
    – Data is given in a frequency table.
    – Only an approximate value of the mean can be calculated.
    $$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
    where $f_i$ = frequency of the $i$th class interval and $x_i$ = class midpoint of the $i$th class interval.
19. 19. MEDIAN
    – Data is given in a frequency table.
    – The first cumulative frequency ≥ n/2 indicates the median class interval.
    – The median can also be determined from the ogive.
    $$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$$
    where
    – $l_i$ = lower boundary of the median interval
    – $u_i$ = upper boundary of the median interval
    – $F_{i-1}$ = cumulative frequency of the interval preceding the median interval
    – $f_i$ = frequency of the median interval
20. 20. • MODE – Class interval that has the largest frequency value will contain the mode. – Mode is the class midpoint of this class. – Mode must be determined from the histogram. 20
21. 21. Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour. To calculate the mean for the sample of the 48 hours, determine the class midpoints $x_i$:

    | Number of calls | Number of hours $f_i$ | Class midpoint $x_i$ |
    |---|---|---|
    | [2–under 5) | 3 | 3,5 |
    | [5–under 8) | 4 | 6,5 |
    | [8–under 11) | 11 | 9,5 |
    | [11–under 14) | 13 | 12,5 |
    | [14–under 17) | 9 | 15,5 |
    | [17–under 20) | 6 | 18,5 |
    | [20–under 23) | 2 | 21,5 |
    | Total | n = 48 | |

22. 22. Example (continued) – Using the frequencies and class midpoints of the call-centre data:
    $$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} = 12{,}44$$
    The average number of calls per hour is 12,44.
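The grouped mean can be reproduced with a few lines of Python. This is a minimal sketch: the `classes` list and the function name `grouped_mean` are illustrative choices, and each class is represented by its midpoint exactly as on the slide.

```python
# Each class is (lower boundary, upper boundary, frequency).
classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

def grouped_mean(classes):
    """Approximate mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    n = sum(f for _, _, f in classes)
    return total_fx / n

print(round(grouped_mean(classes), 2))
```

Because every observation in a class is replaced by the class midpoint, the result is only an approximation of the mean of the raw data.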
23. 23. Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour. To calculate the median for the sample of the 48 hours, determine the cumulative frequencies F:

    | Number of calls | Number of hours $f_i$ | Cumulative frequency F |
    |---|---|---|
    | [2–under 5) | 3 | 3 |
    | [5–under 8) | 4 | 7 |
    | [8–under 11) | 11 | 18 |
    | [11–under 14) | 13 | 31 |
    | [14–under 17) | 9 | 40 |
    | [17–under 20) | 6 | 46 |
    | [20–under 23) | 2 | 48 |
    | Total | n = 48 | |

    n/2 = 48/2 = 24. The first cumulative frequency ≥ 24 is 31, so the median class is [11–under 14).
24. 24. Example (continued) – Median of the call-centre data:
    $$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i} = 11 + \frac{(14 - 11)(24 - 18)}{13} = 12{,}38$$
    50% of the time there are fewer than 12,38 calls per hour, or 50% of the time more than 12,38 calls per hour.
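As a cross-check of the median calculation, the interpolation formula can be sketched in Python. The function name `grouped_median` and the `classes` layout are assumptions made for this illustration only.

```python
# Each class is (lower boundary, upper boundary, frequency).
classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

def grouped_median(classes):
    """Median of grouped data by linear interpolation in the median class."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        # The median class is the first whose cumulative frequency reaches n/2.
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")

print(round(grouped_median(classes), 2))
```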
25. 25. Example (continued) – Call-centre data (n = 48 hours). The median can also be determined from the ogive: n/2 = 48/2 = 24; read off the number of calls at A, which gives Median = 12,4.
    [Figure: ogive “Number of calls at a call centre”; x-axis: number of calls (2 to 23); y-axis: cumulative number of hours (0 to 48); point A marked between 11 and 14.]
26. 26. Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour. To calculate the mode for the sample of the 48 hours, draw the histogram:

    | Number of calls | Number of hours $f_i$ | Class midpoint $x_i$ |
    |---|---|---|
    | [2–under 5) | 3 | 3,5 |
    | [5–under 8) | 4 | 6,5 |
    | [8–under 11) | 11 | 9,5 |
    | [11–under 14) | 13 | 12,5 |
    | [14–under 17) | 9 | 15,5 |
    | [17–under 20) | 6 | 18,5 |
    | [20–under 23) | 2 | 21,5 |
    | Total | n = 48 | |

27. 27. Example (continued) – Mode = 12,3, read at A on the histogram.
    [Figure: histogram “Number of calls at a call centre”; x-axis: number of calls (2 to 23); y-axis: number of hours (0 to 14); point A marked between 11 and 14.]
28. 28. Relationship between mean, median, and mode
    • If a distribution is symmetrical:
        – the mean, median and mode are the same and lie at the centre of the distribution.
    • If a distribution is non-symmetrical:
        – it is skewed to the left or to the right;
        – the three measures differ.
    [Figure: three curves. Symmetrical: mean, median and mode coincide. Positively skewed (skewed to the right): mode, median, mean from left to right. Negatively skewed (skewed to the left): mean, median, mode from left to right.]
29. 29. Choosing a measure of location
    • MEAN – very affected by outliers (values that are very small or very large relative to the majority of the values in a dataset). Therefore the MEAN is not the best measure where outliers exist.
    • MEDIAN – not affected by outliers, so it is better to use this than the mean when they exist. Its disadvantage is that its calculation does not include all the values in a dataset.
    • MODE – not affected by outliers. Its disadvantage is that it only includes the values with the highest frequency in its calculation.
    • When a distribution is skewed, the median may provide a better description of the data.
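A small experiment illustrates the point about outliers. The numbers below are hypothetical and chosen only for this sketch; they do not come from the lecture.

```python
import statistics

# Hypothetical values, not taken from the slides.
typical = [10, 12, 13, 15, 16]
with_outlier = typical + [200]  # one very large value added

# How far each measure moves when the outlier is included.
mean_shift = statistics.mean(with_outlier) - statistics.mean(typical)
median_shift = statistics.median(with_outlier) - statistics.median(typical)

print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")
```

In this made-up data set a single extreme value moves the mean by more than 30 units, while the median moves by only 1.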
30. 30. Group Classwork• Get into groups of 4• Read p75 – 76 Module Manual – Choosing between the mean, median & mode• Read p 67 – 69 - Elementary Statistics for Business Economics – Relationship between mean, median and mode & When to use the mean, median & mode• Complete Izimvo Exchange 1 p 83 Module Manual 30
31. 31. Concept questions 13–25, p 69 – Elementary Statistics for Business and Economics
32. 32. Measures of dispersion• Range• Variance• Standard deviation• Coefficient of variation 32
33. 33. • Range – The range of a set of measurements is the difference between the largest and smallest values in the data set. – Its major advantage is the ease with which it can be computed. – Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points. 33
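34. 34. Variance and standard deviation
    Determine how far the observations are from their mean.
    $$\text{Sample variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
    $$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
    Where:
    – $\bar{x}$ = sample mean
    – $x$ = values of the sample
    – $n$ = sample size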
35. 35. Variance and standard deviation
    Determine how far the observations are from their mean.
    $$\text{Population variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
    $$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
    Where:
    – $\mu$ = population mean
    – $x$ = values of the population
    – $N$ = population size
36. 36. Coefficient of variation
    – Measures the standard deviation relative to the mean.
    – It is expressed as a percentage.
    – Used to compare samples that are measured in different units.
    $$CV = \frac{s}{\bar{x}} \times 100$$
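37. 37. Example – Given the following data sets:
    1st: −4 −3 2 2 5 5 5 6 8
    2nd: 0 1 2 3 3 4 5 5
    The means are approximately the same ($\bar{x} = 26/9 \approx 2{,}9$ for the first and $\bar{x} = 23/8 \approx 2{,}9$ for the second), but the dispersion of data set 1 is much larger than the dispersion of data set 2.
    [Figure: both data sets plotted on a number line from −5 to 9.]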
38. 38. Example – Given the following data sets:
    1st: −4 −3 2 2 5 5 5 6 8
    2nd: 0 1 2 3 3 4 5 5
    The range of the measurements is given by largest value – smallest value:
    1st: 8 – (−4) = 12
    2nd: 5 − 0 = 5
39. 39. Example – Given the following data sets:
    1st: −4 −3 2 2 5 5 5 6 8
    2nd: 0 1 2 3 3 4 5 5
    The variance of the measurements is given by:
40. 40. Example – Given the following data sets:
    1st: −4 −3 2 2 5 5 5 6 8
    2nd: 0 1 2 3 3 4 5 5
    The standard deviation of the measurements is given by:
41. 41. Example – Given the following data sets:
    1st: −4 −3 2 2 5 5 5 6 8
    2nd: 0 1 2 3 3 4 5 5
    The coefficient of variation of the measurements is given by:
    1st: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{4{,}08}{2{,}9} \times 100 = 140{,}69\%$
    2nd: $CV = \frac{s}{\bar{x}} \times 100\% = \frac{1{,}81}{2{,}9} \times 100 = 62{,}41\%$
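The dispersion measures for the two data sets can be computed as follows. This is a sketch using Python's `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) divisor; the helper name `describe` is illustrative. Note that the slide rounds s and the mean before dividing, so an unrounded coefficient of variation differs slightly from the slide values.

```python
import statistics

set_1 = [-4, -3, 2, 2, 5, 5, 5, 6, 8]
set_2 = [0, 1, 2, 3, 3, 4, 5, 5]

def describe(data):
    """Range, sample variance, sample std dev and coefficient of variation."""
    mean = statistics.mean(data)
    std_dev = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 divisor)
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,     # unrounded inputs
    }

summary_1 = describe(set_1)
summary_2 = describe(set_2)
for name, summary in (("1st", summary_1), ("2nd", summary_2)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```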
42. 42. P75 Elementary Statistics for Business and Economics
    By applying the value of the std dev in combination with the value of the mean, we are able to define where the majority of the data values are clustered using CHEBYCHEFF’S THEOREM:
    • At least 75% of the values in any sample will be within k = 2 std dev of the sample mean.
    • At least 89% of the values in any sample will be within k = 3 std dev of the sample mean.
    • At least 94% of the values in any sample will be within k = 4 std dev of the sample mean.
    NOTE: k = the number of std dev distances to either side of the mean.
43. 43. EXAMPLE
    Assume a data set has a mean of 50 and a std dev of 5. Then at least 75% of the values in the data set occur in the interval:
    Mean ± 2 std dev = 50 ± 2(5) = 50 ± 10 = from 40 to 60
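The percentages quoted for the theorem follow from the standard bound 1 − 1/k², which is not written out on the slide. A brief sketch, with illustrative function names:

```python
def chebyshev_bound(k):
    """Minimum share of values within k std dev of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def chebyshev_interval(mean, std_dev, k):
    """Interval from mean - k*std_dev to mean + k*std_dev."""
    return mean - k * std_dev, mean + k * std_dev

for k in (2, 3, 4):
    print(k, f"{chebyshev_bound(k):.2%}")

# The example above: mean 50, std dev 5, k = 2.
print(chebyshev_interval(50, 5, 2))
```

For k = 3 and k = 4 the bound gives about 88,9% and 93,75%, which the slide rounds to 89% and 94%.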
44. 44. Classwork/Homework• Concept questions 26 -35 , p 76 – Elementary Statistics for Business and Economics 44
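45. 45. Variance and standard deviation (grouped data)
    $$\text{Sample variance} = s^2 = \frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n - 1}$$
    $$\text{Sample standard deviation} = s = \sqrt{\frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n - 1}}$$
    Where:
    – $f$ = frequencies of class intervals
    – $x$ = class midpoints of class intervals
    – $n$ = sample size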
46. 46. Variance and standard deviation (grouped data)
    $$\text{Population variance} = \sigma^2 = \frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}$$
    $$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}}$$
    Where:
    – $f$ = frequencies of class intervals
    – $x$ = class midpoints of class intervals
    – $N$ = population size
47. 47. Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

    | Number of calls | Number of hours $f_i$ | Class midpoint $x_i$ |
    |---|---|---|
    | [2–under 5) | 3 | 3,5 |
    | [5–under 8) | 4 | 6,5 |
    | [8–under 11) | 11 | 9,5 |
    | [11–under 14) | 13 | 12,5 |
    | [14–under 17) | 9 | 15,5 |
    | [17–under 20) | 6 | 18,5 |
    | [20–under 23) | 2 | 21,5 |
    | Total | n = 48 | |
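The worked calculation for this table did not survive in the transcript, so the following is only a sketch of how the grouped variance and standard deviation could be computed. It assumes the sample form of the computational formula (divisor n − 1), by analogy with the population formula on the previous slide; the function name is illustrative.

```python
import math

# Each class is (lower boundary, upper boundary, frequency).
classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

def grouped_sample_variance(classes):
    """Assumed sample formula: (sum(f*x^2) - (sum(f*x))^2 / n) / (n - 1)."""
    n = sum(f for _, _, f in classes)
    # x is the class midpoint of each interval.
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    sum_fx2 = sum(f * ((lower + upper) / 2) ** 2 for lower, upper, f in classes)
    return (sum_fx2 - sum_fx**2 / n) / (n - 1)

variance = grouped_sample_variance(classes)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

Replacing the final divisor `n - 1` with `n` would give the population version shown on the slide.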
49. 49. Concept Questions 36–41, p 80 – Elementary Statistics for Business & Economics
50. 50. • Quartiles• Percentiles• Interquartile range 50
51. 51. QUARTILES
    – Order data in ascending order.
    – Divide data set into four quarters.
    [Figure: ordered data split as Min | 25% | Q1 | 25% | Q2 | 25% | Q3 | 25% | Max.]
52. 52. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    Determine Q1 for the sample of nine measurements:
    • Order the measurements: −4 −3 2 2 5 5 5 6 8 (positions 1 to 9)
    • Q1 is the $(n + 1)\left(\frac{1}{4}\right) = (9 + 1)\left(\frac{1}{4}\right)$ = 2,5th value
    • Q1 = −3 + 0,5(2 − (−3)) = −0,5
53. 53. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    Determine Q3 for the sample of nine measurements:
    • Order the measurements: −4 −3 2 2 5 5 5 6 8 (positions 1 to 9)
    • Q3 is the $(n + 1)\left(\frac{3}{4}\right) = (9 + 1)\left(\frac{3}{4}\right)$ = 7,5th value
    • Q3 = 5 + 0,5(6 − 5) = 5,5
54. 54. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    Interquartile range = Q3 – Q1
    Q3 = 5,5
    Q1 = −0,5
    Interquartile range = 5,5 – (−0,5) = 6
55. 55. PERCENTILES
    – Order data in ascending order.
    – Divide data set into hundred parts.
    [Figure: ordered data from Min to Max. P10 has 10% of the values below it and 90% above; P80 has 80% below and 20% above; P50 = Q2 has 50% below and 50% above.]
56. 56. Example – Given the following data set:
    2 5 8 −3 5 2 6 5 −4
    Determine P20 for the sample of nine measurements:
    • Order the measurements: −4 −3 2 2 5 5 5 6 8 (positions 1 to 9)
    • P20 is the $(n + 1)\left(\frac{p}{100}\right) = (9 + 1)\left(\frac{20}{100}\right)$ = 2nd value
    • P20 = −3
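The position rule used in the quartile and percentile examples, (n + 1)(p/100) with interpolation between neighbouring values, can be sketched as a small function. The name `percentile_by_position` is an illustrative choice; other textbooks and software may use different position rules and give slightly different answers.

```python
import math

def percentile_by_position(data, p):
    """p-th percentile using position (n + 1) * p / 100 in the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100  # 1-based position, may be fractional
    if position < 1 or position > n:
        raise ValueError("position falls outside the data for this p")
    whole = math.floor(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    # Interpolate between the values just before and after the position.
    before, after = ordered[whole - 1], ordered[whole]
    return before + fraction * (after - before)

data = [2, 5, 8, -3, 5, 2, 6, 5, -4]
q1 = percentile_by_position(data, 25)
q3 = percentile_by_position(data, 75)
p20 = percentile_by_position(data, 20)
print(q1, q3, q3 - q1, p20)
```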
57. 57. Example – The following data represents the number of telephone calls received for two days at a municipal call centre. The data was measured per hour. To calculate Q1 for the sample of the 48 hours, use the cumulative frequencies F:

    | Number of calls | Number of hours $f_i$ | Cumulative frequency F |
    |---|---|---|
    | [2–under 5) | 3 | 3 |
    | [5–under 8) | 4 | 7 |
    | [8–under 11) | 11 | 18 |
    | [11–under 14) | 13 | 31 |
    | [14–under 17) | 9 | 40 |
    | [17–under 20) | 6 | 46 |
    | [20–under 23) | 2 | 48 |
    | Total | n = 48 | |

    n/4 = 48/4 = 12. The first cumulative frequency ≥ 12 is 18, so Q1 lies in the class [8–under 11).
58. 58. Example (continued) – Q1 of the call-centre data:
    $$Q_1 = l_{Q_1} + \frac{(u_{Q_1} - l_{Q_1})\left(\frac{n}{4} - F_{Q_1 - 1}\right)}{f_{Q_1}} = 8 + \frac{(11 - 8)(12 - 7)}{11} = 9{,}36$$
    25% of the time there are fewer than 9,36 calls per hour, or 75% of the time more than 9,36 calls per hour.
59. 59. Example (continued) – Q3: 3n/4 = 3(48)/4 = 36. The first cumulative frequency ≥ 36 is 40, so Q3 lies in the class [14–under 17).
60. 60. Example (continued) – Q3 of the call-centre data:
    $$Q_3 = l_{Q_3} + \frac{(u_{Q_3} - l_{Q_3})\left(\frac{3n}{4} - F_{Q_3 - 1}\right)}{f_{Q_3}} = 14 + \frac{(17 - 14)(36 - 31)}{9} = 15{,}67$$
    75% of the time there are fewer than 15,67 calls per hour, or 25% of the time more than 15,67 calls per hour.
61. 61. Example (continued) – Interquartile range:
    Q3 = 15,67
    Q1 = 9,36
    IQR = 15,67 – 9,36 = 6,31
62. 62. Example (continued) – P60: np/100 = 48(60)/100 = 28,8. The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).
63. 63. Example (continued) – P60 of the call-centre data:
    $$P_{60} = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p - 1}\right)}{f_p} = 11 + \frac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$$
    60% of the time there are fewer than 13,49 calls per hour, or 40% of the time more than 13,49 calls per hour.
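Q1, Q3 and P60 for the grouped data all use the same interpolation, with only the target np/100 changing. A minimal sketch, where `grouped_percentile` and the `classes` layout are names assumed for this illustration:

```python
# Each class is (lower boundary, upper boundary, frequency).
classes = [
    (2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
    (14, 17, 9), (17, 20, 6), (20, 23, 2),
]

def grouped_percentile(classes, p):
    """p-th percentile of grouped data by interpolation within its class."""
    n = sum(f for _, _, f in classes)
    target = n * p / 100
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        # The percentile class is the first whose cumulative frequency reaches the target.
        if f > 0 and cumulative + f >= target:
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("frequency table is empty")

q1 = grouped_percentile(classes, 25)
q3 = grouped_percentile(classes, 75)
p60 = grouped_percentile(classes, 60)
print(round(q1, 2), round(q3, 2), round(p60, 2))
```

Subtracting the unrounded quartiles gives an interquartile range of about 6,30; the slide value of 6,31 comes from subtracting the already rounded 15,67 and 9,36.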
64. 64. Classwork/Homework• Concept Questions 42 – 53, p88 – Elementary Statistics for Business & Economics 64
65. 65. BOX-AND-WHISKER PLOT
    Me = 12,38; Q1 = 9,36; Q3 = 15,67; IQR = 6,31
    LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
    UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
    [Figure: box-and-whisker plot on a scale from 0 to 28, showing the IQR box with a distance of 1,5(IQR) marked on either side.]
    • Any value smaller than −0,11 will be an outlier.
    • Any value larger than 25,14 will be an outlier.
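The lower and upper limits can be computed directly from the quartiles. This sketch uses the rounded quartiles from the slide as inputs; the function name `outlier_limits` is illustrative.

```python
def outlier_limits(q1, q3):
    """Lower and upper limits: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """True if the value lies outside the limits."""
    lower_limit, upper_limit = outlier_limits(q1, q3)
    return value < lower_limit or value > upper_limit

lower_limit, upper_limit = outlier_limits(9.36, 15.67)
print(lower_limit, upper_limit)
```

Since the number of calls cannot be negative, only the upper limit can actually flag an outlier in the call-centre data.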
66. 66. NORMAL CURVE
    • Bell shaped, single peaked and symmetric.
    • The mean is located at the centre of a normal curve.
    • The total area under a normal curve = 1, with half of this area on the left side of the mean and half on the right side.
    • The mean, median and mode are equal.
    • The two tails extend indefinitely to the left and to the right of the mean as they approach the horizontal axis.
    • The two tails never touch the horizontal axis.
    • Completely described by its mean and its standard deviation: the mean specifies the position of the curve on the horizontal axis, and the standard deviation specifies the shape of the curve.
    • The smaller the std dev, the less spread out and more sharply peaked the curve.
67. 67. NORMAL CURVE & Empirical Rule
    • Chebycheff’s Theorem applies to any dataset, irrespective of the underlying distribution.
    • The Empirical Rule applies specifically to data that follow a normal curve.
    • The Empirical Rule states that for a normal curve, approximately:
        – 68% of observations lie within 1 std dev of the mean
        – 95% of observations lie within 2 std dev of the mean
        – 99,7% of observations lie within 3 std dev of the mean
    NOTE: FOR A NORMAL CURVE, ANY VALUE THAT IS NOT WITHIN 3 STD DEV OF THE MEAN IS A SUSPECT OUTLIER.
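The Empirical Rule percentages can be verified against the normal distribution itself. A brief sketch using `statistics.NormalDist` from the Python standard library (available from Python 3.8); the function name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k std dev of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{share_within(k):.2%}")
```

These shares are far higher than the minimums guaranteed by Chebycheff’s Theorem (75% for k = 2, 89% for k = 3), because the theorem has to hold for every distribution, not only the normal curve.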
68. 68. Classwork/Homework• Concept Questions 61-70, p95 – Elementary Statistics for Business & Economics 68
69. 69. Classwork/Homework
    1. Activity 1 & 2 – Module Manual p 85–86
    2. Revision Exercises 1, 2, 3, p 87–93 – Module Manual
    3. Supplementary Exercises questions 1–12, p 100 – Elementary Statistics for Business & Economics
    4. Self Review Test, p 96 – Elementary Statistics for Business & Economics
    5. Read Chapter 4 – Basic Probability, p 105–150 – Elementary Statistics for Business & Economics