Transcript

  • 1. Slide 1.1 Analysing Data
  • 2. Slide 1.2 Descriptive Statistics
  • 3. Slide 1.3 Descriptive statistics  Descriptive statistics provides an objective way of describing and summarising data
  • 4. Slide 1.4 Data description
  • 5. Slide 1.5 Two key measures of data description • Location – to show where the centre of the data is, giving some kind of typical or average value • Dispersion (spread) – to show how spread out the data is around this centre, giving an idea of the range of values.
  • 6. Slide 1.6 Measures of location • Three basic measures of location used: arithmetic mean (the average value), median (the middle value), mode (the most frequent value) • Three data structures: untabulated (raw data), tabulated (ungrouped), tabulated (grouped). For use with Curwin & Slater, Quantitative Methods for Business Decisions, 6th Edition, ISBN: 9781844805747
  • 7. Slide 1.7 Mean - Untabulated (raw data) The mean for untabulated data is obtained by dividing the sum of all values by the number of values in the data set. Thus, mean for population data: $\mu = \frac{\sum x}{N}$; mean for sample data: $\bar{x} = \frac{\sum x}{n}$
  • 8. Slide 1.8 Example 1 The following are the ages of all eight employees of a small company: 53 32 61 27 39 44 49 57 Find the mean age of these employees.
  • 9. Slide 1.9 Solution 1 $\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25$ years. Thus, the mean age of all eight employees of this company is 45.25 years, or 45 years and 3 months.
  • 10. Slide 1.10 Mean - tabulated (ungrouped data) Sample mean of data: $\bar{x} = \frac{\sum fx}{n}$, where x is the value of the observation and f is the frequency of the observation.
  • 11. Slide 1.11 Example The number of working days lost by employees in the last quarter (calculate the average number of working days lost):
      Number of days (x)    Number of employees (f)
      0                     410
      1                     430
      2                     290
      3                     180
      4                     110
      5                      20
      Total                1440
  • 12. Slide 1.12
      x          f       fx
      0        410        0
      1        430      430
      2        290      580
      3        180      540
      4        110      440
      5         20      100
      Total   1440     2090
      $\bar{x} = \frac{\sum fx}{n} = \frac{2090}{1440} = 1.451$ days lost
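The arithmetic in Solution 1 can be checked with a minimal Python sketch. The variable names (`ages`, `total`, `mu`) are illustrative choices, not part of the slides.

```python
# Ages of all eight employees from Example 1 (population data).
ages = [53, 32, 61, 27, 39, 44, 49, 57]

total = sum(ages)   # sum of all values, the numerator of the mean
N = len(ages)       # number of values in the population
mu = total / N      # population mean

print(total, N, mu)  # 362 8 45.25
```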
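The frequency-weighted mean from the table above can be reproduced as follows. This is a sketch only; the list names `days` and `employees` are assumptions introduced for illustration.

```python
# Working days lost (x) and number of employees (f) from the example table.
days = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]

n = sum(employees)                                   # total number of observations
sum_fx = sum(f * x for x, f in zip(days, employees)) # sum of f times x
mean_days = sum_fx / n                               # frequency-weighted mean

print(n, sum_fx, round(mean_days, 3))  # 1440 2090 1.451
```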
  • 13. Slide 1.13 Mean • Mean can be affected by outliers
  • 14. Slide 1.14 Outliers  Definition  Values that are very small or very large relative to the majority of the values in a data set are called outliers or extreme values.
  • 15. Slide 1.15 Example 3 Table 2 lists the 2000 populations (in thousands) of the five Pacific states.
      Table 2
      State         Population (thousands)
      Washington      5894
      Oregon          3421
      Alaska           627
      Hawaii          1212
      California    33,872   (an outlier)
  • 16. Slide 1.16 Solution 3 Now, to see the impact of the outlier on the value of the mean, we include the population of California and find the mean population of all five Pacific states. This mean is $\text{Mean} = \frac{5894 + 3421 + 627 + 1212 + 33{,}872}{5} = 9005.2$ thousand
  • 17. Slide 1.17 Example 3 Notice that the population of California is very large compared to the populations of the other four states. Hence, it is an outlier. Show how the inclusion of this outlier affects the value of the mean.
  • 18. Slide 1.18 Solution 3 If we do not include the population of California (the outlier), the mean population of the remaining four states (Washington, Oregon, Alaska, and Hawaii) is $\text{Mean} = \frac{5894 + 3421 + 627 + 1212}{4} = 2788.5$ thousand
  • 19. Slide 1.19 Mean - tabulated (grouped data) Mean for population data: $\mu = \frac{\sum fx}{N}$; mean for sample data: $\bar{x} = \frac{\sum fx}{n}$, where x is the midpoint and f is the frequency of a class.
  • 20. Slide 1.20 Calculate the mean of the grouped data below
      Weight (oz)    Class midpoint (x)    Frequency (f)    fx
      19.2-19.4      19.3                  1                 19.3
      19.5-19.7      19.6                  2                 39.2
      19.8-20.0      19.9                  8                159.2
      20.1-20.3      20.2                  4                 80.8
      20.4-20.6      20.5                  3                 61.5
      20.7-20.9      20.8                  2                 41.6
      Total                                ∑f = n = 20      ∑fx = 401.6
  • 21. Slide 1.21 Mean • n = 20 • ∑fx = 401.6 • $\bar{x} = \frac{\sum fx}{n} = \frac{401.6}{20} = 20.08$ oz
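The effect of the outlier on the mean can be seen side by side in a short sketch. The dictionary name `populations` and the helper variables are illustrative assumptions; the figures are those of Table 2.

```python
# 2000 populations (in thousands) of the five Pacific states, from Table 2.
populations = {
    "Washington": 5894,
    "Oregon": 3421,
    "Alaska": 627,
    "Hawaii": 1212,
    "California": 33872,  # the outlier
}

with_outlier = list(populations.values())
without_outlier = [p for state, p in populations.items() if state != "California"]

mean_with = sum(with_outlier) / len(with_outlier)
mean_without = sum(without_outlier) / len(without_outlier)

print(mean_with, mean_without)  # 9005.2 2788.5
```

Including California more than triples the mean, even though four of the five states have populations below 6000 thousand.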
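The grouped-data mean can be sketched in Python by computing each class midpoint from its limits. The tuple layout `(lower, upper, frequency)` is an assumption made for this illustration.

```python
# (lower limit, upper limit, frequency) for each weight class, in ounces.
classes = [
    (19.2, 19.4, 1),
    (19.5, 19.7, 2),
    (19.8, 20.0, 8),
    (20.1, 20.3, 4),
    (20.4, 20.6, 3),
    (20.7, 20.9, 2),
]

n = sum(f for _, _, f in classes)
# The midpoint (lo + hi) / 2 stands in for every observation in the class.
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
grouped_mean = sum_fx / n

print(n, round(sum_fx, 1), round(grouped_mean, 2))  # 20 401.6 20.08
```

Because the midpoints only approximate the observations within each class, the result is an estimate of the mean of the underlying raw data.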
  • 22. Slide 1.22 Median  Definition  The median is the value of the middle term in a data set that has been ranked in increasing order.
  • 23. Slide 1.23 Median cont.  The calculation of the median consists of the following two steps: 1. Rank the data set in increasing order 2. Find the middle term in a data set with n values. The value of this term is the median.
  • 24. Slide 1.24 Median cont. Value of Median for Ungrouped Data: Median = value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set
  • 25. Slide 1.25 Example 6 The following data give the weight lost (in pounds) by a sample of five members of a health club at the end of two months of membership: 10 5 19 8 3 Find the median.
  • 26. Slide 1.26 Solution 6 First, we rank the given data in increasing order as follows: 3 5 8 10 19 There are five observations in the data set. Consequently, n = 5 and position of the middle term $= \frac{n+1}{2} = \frac{5+1}{2} = 3$
  • 27. Slide 1.27 Solution 6 Therefore, the median is the value of the third term in the ranked data: 3 5 8 10 19, where the third term, 8, is the median. The median weight loss for this sample of five members of this health club is 8 pounds.
  • 28. Slide 1.28 Example 7  Table 8 lists the total revenue for the 12 top-grossing North American concert tours of all time.  Find the median revenue for these data.
  • 29. Slide 1.29 Table 8
      Tour                          Artist                   Total revenue (millions of dollars)
      Steel Wheels, 1989            The Rolling Stones        98.0
      Magic Summer, 1990            New Kids on the Block     74.1
      Voodoo Lounge, 1994           The Rolling Stones       121.2
      The Division Bell, 1994       Pink Floyd               103.5
      Hell Freezes Over, 1994       The Eagles                79.4
      Bridges to Babylon, 1997      The Rolling Stones        89.3
      Popmart, 1997                 U2                        79.9
      Twenty-Four Seven, 2000       Tina Turner               80.2
      No Strings Attached, 2000     ‘N-Sync                   76.4
      Elevation, 2001               U2                       109.7
      Popodyssey, 2001              ‘N-Sync                   86.8
      Black and Blue, 2001          The Backstreet Boys       82.1
  • 30. Slide 1.30 Solution 7 First we rank the given data in increasing order, as follows: 74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2 There are 12 values in this data set. Hence, n = 12 and position of the middle term $= \frac{n+1}{2} = \frac{12+1}{2} = 6.5$
  • 31. Slide 1.31 Solution 7 Therefore, the median is given by the mean of the sixth and the seventh values in the ranked data: 74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2 $\text{Median} = \frac{82.1 + 86.8}{2} = 84.45$, that is, $84.45 million. Thus the median revenue for the 12 top-grossing North American concert tours of all time is $84.45 million.
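The two-step procedure (rank, then take the term at position (n + 1)/2) can be sketched as a small function. The name `median_by_position` is hypothetical; Python's standard `statistics.median` gives the same results.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on ranked data."""
    ranked = sorted(values)       # step 1: rank in increasing order
    n = len(ranked)
    position = (n + 1) / 2        # step 2: 1-based position of the middle term
    if position.is_integer():     # odd n: a single middle term
        return ranked[int(position) - 1]
    # even n: average the two terms on either side of the position
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2

weight_loss = [10, 5, 19, 8, 3]                        # Example 6
revenues = [98.0, 74.1, 121.2, 103.5, 79.4, 89.3,
            79.9, 80.2, 76.4, 109.7, 86.8, 82.1]       # Example 7, Table 8

print(median_by_position(weight_loss))                 # 8
print(round(median_by_position(revenues), 2))          # 84.45
```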
  • 32. Slide 1.32 Median - tabulated (ungrouped data) Steps: • Order the observations • Calculate the cumulative frequency Note: • Cumulative frequency is the number of items with a given value or less
  • 33. Slide 1.33 Example The number of working days lost by employees in the last quarter (calculate the median number of working days lost):
      Number of days (x)    Number of employees (f)
      0                     410
      1                     430
      2                     290
      3                     180
      4                     110
      5                      20
      Total                1440
  • 34. Slide 1.34
      x          f       Cumulative frequency
      0        410        410
      1        430        840 (= 410 + 430)
      2        290       1130 (= 840 + 290)
      3        180       1310 (= 1130 + 180)
      4        110       1420 (= 1310 + 110)
      5         20       1440 (= 1420 + 20)
      Total   1440
      The position of the median is (n + 1)/2 = (1440 + 1)/2 = 720.5, i.e. between the 720th and 721st observations. Both fall in the group with cumulative frequency 840 (x = 1), so the median is one day lost.
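A minimal sketch of the cumulative-frequency lookup follows. The helper `value_at_rank` is a hypothetical name; it returns the first x whose cumulative frequency reaches a given rank.

```python
from bisect import bisect_left
from itertools import accumulate

days = [0, 1, 2, 3, 4, 5]
employees = [410, 430, 290, 180, 110, 20]

cumulative = list(accumulate(employees))  # [410, 840, 1130, 1310, 1420, 1440]
n = cumulative[-1]

def value_at_rank(rank):
    """Value of the rank-th observation (1-based) in the ordered data."""
    return days[bisect_left(cumulative, rank)]

# For even n the median position (n + 1) / 2 lies between two ranks.
lower_rank = (n + 1) // 2   # 720
upper_rank = (n + 2) // 2   # 721
median_days = (value_at_rank(lower_rank) + value_at_rank(upper_rank)) / 2

print(cumulative, lower_rank, upper_rank, median_days)
```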
  • 35. Slide 1.35 Advantages of using median The advantage of using the median as a measure of central tendency is that it is not influenced by outliers. Consequently, the median is preferred over the mean as a measure of central tendency for data sets that contain outliers.
  • 36. Slide 1.36 Median for grouped data • The median for grouped data is given by: $\text{median} = L + \left(\frac{n/2 - F}{f_m}\right) c$ • L ≡ lower limit of the median class • n ≡ number of observations • F ≡ sum of the frequencies of all classes before (excluding) the median class • $f_m$ ≡ frequency of the median class • c ≡ width of the class
  • 37. Slide 1.37 Calculate the median of the grouped data below
      Weight (oz)    Frequency (f)
      19.2-19.4      1
      19.5-19.7      2
      19.8-20.0      8
      20.1-20.3      4
      20.4-20.6      3
      20.7-20.9      2
      Total          ∑f = n = 20
  • 38. Slide 1.38 Median • L ≡ 19.8, n ≡ 20, F ≡ 3, $f_m$ ≡ 8, c ≡ 0.3 • $\text{median} = L + \left(\frac{n/2 - F}{f_m}\right) c = 19.8 + \left(\frac{20/2 - 3}{8}\right) 0.3 = 19.8 + \left(\frac{7}{8}\right) 0.3 = 19.8 + 0.2625 \approx 20.06$ oz
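The grouped-median formula can be sketched as a function that locates the median class and then interpolates within it. The function name and argument layout are assumptions; the class width c = 0.3 and the use of the lower class limit for L follow the slides.

```python
def grouped_median(lower_limits, frequencies, width):
    """median = L + ((n/2 - F) / f_m) * c for grouped data."""
    n = sum(frequencies)
    half = n / 2
    F = 0  # cumulative frequency of the classes before the current one
    for L, f_m in zip(lower_limits, frequencies):
        if F + f_m >= half:  # first class whose cumulative frequency reaches n/2
            return L + (half - F) / f_m * width
        F += f_m
    raise ValueError("frequencies must contain at least one observation")

lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]
frequencies = [1, 2, 8, 4, 3, 2]

print(round(grouped_median(lower_limits, frequencies, 0.3), 4))  # 20.0625
```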
  • 39. Slide 1.39 Mode  Definition The mode is the value that occurs with the highest frequency in a data set.
  • 40. Slide 1.40 Example 8  The following data give the speeds (in miles per hour) of eight cars that were stopped for speeding violations. 77 69 74 81 71 68 74 73 Find the mode.
  • 41. Slide 1.41 Solution 8  In this data set, 74 occurs twice and each of the remaining values occurs only once. Because 74 occurs with the highest frequency, it is the mode. Therefore, Mode = 74 miles per hour
  • 42. Slide 1.42 Mode cont. • A data set may have none or many modes, whereas it will have only one mean and only one median. – The data set with only one mode is called unimodal. – The data set with two modes is called bimodal. – The data set with more than two modes is called multimodal.
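The mode and the unimodal, bimodal, and multimodal labels can be sketched with a frequency count. The functions `modes` and `classify` are hypothetical helpers, and treating a data set in which no value repeats as having no mode is a convention assumed here.

```python
from collections import Counter

def modes(values):
    """All values that share the highest frequency (empty if nothing repeats)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:  # assumed convention: no repeated value means no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

def classify(values):
    """Label a data set by its number of modes."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")

speeds = [77, 69, 74, 81, 71, 68, 74, 73]  # Example 8
print(modes(speeds), classify(speeds))     # [74] unimodal
```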
  • 43. Slide 1.43 Different patterns for the mode
  • 44. Slide 1.44 Different patterns for the mode
  • 45. Slide 1.45 Different patterns for the mode
  • 46. Slide 1.46 Mode - tabulated (ungrouped data) The number of working days lost by employees in the last quarter (calculate the modal number of working days lost):
      Number of days (x)    Number of employees (f)
      0                     410
      1                     430
      2                     290
      3                     180
      4                     110
      5                      20
      Total                1440
  • 47. Slide 1.47
      Number of days (x)    Number of employees (f)
      0                     410
      1                     430
      2                     290
      3                     180
      4                     110
      5                      20
      Total                1440
      The mode is the value with the highest frequency, which is one day lost (430 employees).
  • 48. Slide 1.48 Advantage of using the mode One advantage of the mode is that it can be calculated for both quantitative and qualitative kinds of data, whereas the mean and median can be calculated for only quantitative data.
  • 49. Slide 1.49 Example 12 The statuses of five students who are members of the student senate at a college are senior, sophomore, senior, junior, senior. Find the mode.
  • 50. Slide 1.50 Solution 12  Because senior occurs more frequently than the other categories, it is the mode for this data set.  We cannot calculate the mean and median for this data set.
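For qualitative data the mode is simply the most common category, which a frequency count finds directly. This is a sketch; the variable names are illustrative.

```python
from collections import Counter

# Class standing of the five student senate members from Example 12.
statuses = ["senior", "sophomore", "senior", "junior", "senior"]

# most_common(1) returns the single (category, count) pair with the highest count.
category, count = Counter(statuses).most_common(1)[0]

print(category, count)  # senior 3
```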
  • 51. Slide 1.51 Mode for tabulated grouped data • For grouped data, the mode is given by: $\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c$ • L ≡ lower limit of the modal class • $d_1$ ≡ frequency of the modal class minus the frequency of the previous class • $d_2$ ≡ frequency of the modal class minus the frequency of the following class • c ≡ width of the class
  • 52. Slide 1.52 Calculate the mode of the grouped data below
      Weight (oz)    Frequency (f)
      19.2-19.4      1
      19.5-19.7      2
      19.8-20.0      8
      20.1-20.3      4
      20.4-20.6      3
      20.7-20.9      2
      Total          ∑f = n = 20
  • 53. Slide 1.53 Mode • L ≡ 19.8, $d_1$ ≡ 6, $d_2$ ≡ 4, c ≡ 0.3 • $\text{mode} = L + \left(\frac{d_1}{d_1 + d_2}\right) c = 19.8 + \left(\frac{6}{6 + 4}\right) 0.3 = 19.8 + \frac{1.8}{10} = 19.8 + 0.18 = 19.98$ oz
  • 54. Slide 1.54 Relationships among the Mean, Median, and Mode 1. The relationship among the three measures depends on the shape (skewness) of the frequency distribution. For the symmetric distribution in Figure 1, the values of the mean, median, and mode are identical, and they lie at the center of the distribution.
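The grouped-mode formula can be sketched as below. The function name is hypothetical, and treating a missing neighbouring class (when the modal class is first or last) as having frequency 0 is an assumption that the slides do not cover.

```python
def grouped_mode(lower_limits, frequencies, width):
    """mode = L + (d1 / (d1 + d2)) * c for grouped data."""
    # Index of the modal class (the first one, if several tie).
    i = max(range(len(frequencies)), key=frequencies.__getitem__)
    # Assumption: a missing neighbouring class counts as frequency 0.
    prev_f = frequencies[i - 1] if i > 0 else 0
    next_f = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - prev_f   # modal class minus previous class
    d2 = frequencies[i] - next_f   # modal class minus following class
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [19.2, 19.5, 19.8, 20.1, 20.4, 20.7]
frequencies = [1, 2, 8, 4, 3, 2]

print(round(grouped_mode(lower_limits, frequencies, 0.3), 2))  # 19.98
```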
  • 55. Slide 1.55 Figure 1 Zero Skewed (Symmetrical)
  • 56. Slide 1.56 Figure 2 Positively skewed
  • 57. Slide 1.57 Positively Skewed 2. A histogram or frequency curve is positively skewed if the right tail is longer (Figure 2). In that case, the value of the mean > median > mode. Notice that the mode always occurs at the peak point. The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail. Outliers in the right tail pull the mean to the right.
  • 58. Slide 1.58 Figure 3 Negatively skewed
  • 59. Slide 1.59 Negatively Skewed 3. A histogram or frequency curve is negatively skewed if the left tail is longer (Figure 3). In that case, the value of the mode > median > mean. Here the outliers in the left tail pull the mean to the left.
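The ordering of the three measures under skewness can be illustrated with a small sketch. The data set `right_skewed` is hypothetical, invented for this illustration and not taken from the slides; `left_skewed` is its mirror image. The ordering shown is the typical pattern for a single-peaked distribution with one long tail, not a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (values 9 and 14).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same data, giving a long left tail.
left_skewed = [15 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, mean(data), median(data), mode(data))

# right: mean 4.5  > median 3.0  > mode 2
# left:  mode 13   > median 12.0 > mean 10.5
```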