Statistik Chapter 2
 

Usage Rights

© All Rights Reserved


    • QQS1013
      ELEMENTARY STATISTICS
      CHAPTER 2
      DESCRIPTIVE STATISTICS
      2.1 Introduction
      2.2 Organizing and Graphing Qualitative Data
      2.3 Organizing and Graphing Quantitative Data
      2.4 Central Tendency Measurement
      2.5 Dispersion Measurement
      2.6 Mean, Variance and Standard Deviation for Grouped Data
      2.7 Measure of Skewness
      OBJECTIVES
      After completing this chapter, students should be able to:
      Create and interpret graphical displays involving qualitative and quantitative data.
      Describe the difference between grouped and ungrouped frequency distribution, frequency and relative frequency, relative frequency and cumulative relative frequency.
      Identify and describe the parts of a frequency distribution: class boundaries, class width, and class midpoint.
      Identify the shapes of distributions.
      Compute, describe, compare and interpret the three measures of central tendency: mean, median, and mode for ungrouped and grouped data.
      Compute, describe, compare and interpret the two measures of dispersion: range, and standard deviation (variance) for ungrouped and grouped data.
      Compute, describe, and interpret the two measures of position: quartiles and interquartile range for ungrouped and grouped data.
      Compute, describe and interpret the measure of skewness: Pearson Coefficient of Skewness.
      Introduction
      Raw data - Data recorded in the sequence in which they are collected and before they are processed or ranked.
      Array data - Raw data that is arranged in ascending or descending order.
      Example 1
      Here is a list of questions asked in a large statistics class and the “raw data” given by one of the students:
      What is your sex (m=male, f=female)?
      Answer (raw data): m
      How many hours did you sleep last night?
      Answer: 5 hours
      Randomly pick a letter – S or Q.
      Answer: S
      What is your height in inches?
      Answer: 67 inches
      What’s the fastest you’ve ever driven a car (mph)?
      Answer: 110 mph
      Example 2
      Quantitative raw data
      Qualitative raw data
      These data are also called ungrouped data.
      2.2 Organizing and Graphing Qualitative Data
      2.2.1 Frequency Distributions / Table
      2.2.2 Relative Frequency and Percentage Distribution
      2.2.3 Graphical Presentation of Qualitative Data
      2.2.1 Frequency Distributions / Table
      A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.
      It exhibits how the frequencies are distributed over the various categories.
      Also called a frequency distribution table or simply a frequency table.
      The number of elements (for example, students) that belong to a certain category is called the frequency of that category.
      Relative Frequency and Percentage Distribution
      A relative frequency distribution is a listing of all categories along with their relative frequencies (given as proportions or percentages).
      It is commonplace to give the frequency and relative frequency distribution together.
      Calculating relative frequency and percentage of a category
      $\text{Relative frequency of a category} = \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$
      $\text{Percentage} = (\text{Relative frequency}) \times 100$
      Example 3
      A sample of UUM staff-owned vehicles produced by Proton was identified and the make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj = Waja, St = Satria, P = Perdana, Sv = Savvy):
      W  W  P  Is Is P  Is W  St Wj
      Is W  W  Wj Is W  W  Is W  Wj
      Wj Is Wj Sv W  W  W  Wj St W
      Wj Sv W  Is P  Sv Wj Wj W  W
      St W  W  W  W  St St P  Wj Sv
      Construct a frequency distribution table for these data with their relative frequency and percentage.
      Solution:
      Category   Frequency   Relative Frequency   Percentage (%)
      Wira       19          19/50 = 0.38         0.38*100 = 38
      Iswara      8          0.16                 16
      Perdana     4          0.08                  8
      Waja       10          0.20                 20
      Satria      5          0.10                 10
      Savvy       4          0.08                  8
      Total      50          1.00                100
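      The table in Example 3 can be checked with a short script. This is a minimal sketch in Python; the function name frequency_table is illustrative and not part of the notes.

```python
from collections import Counter

# The 50 observations of Example 3 (W = Wira, Is = Iswara, Wj = Waja,
# St = Satria, P = Perdana, Sv = Savvy).
sample = (
    "W W P Is Is P Is W St Wj Is W W Wj Is W W Is W Wj "
    "Wj Is Wj Sv W W W Wj St W Wj Sv W Is P Sv Wj Wj W W "
    "St W W W W St St P Wj Sv"
).split()


def frequency_table(data):
    """Return {category: (frequency, relative frequency, percentage)}."""
    counts = Counter(data)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(sample)
for cat, (f, rel, pct) in table.items():
    print(f"{cat:<3} {f:>3} {rel:>6.2f} {pct:>6.1f}")
```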
      Graphical Presentation of Qualitative Data
      Bar Graphs
      A graph made of bars whose heights represent the frequencies of respective categories.
      Such a graph is most helpful when you have many categories to represent.
      Notice that a gap is inserted between each of the bars.
      Types of bar graph:
      => simple / vertical bar chart
      => horizontal bar chart
      => component bar chart
      => multiple bar chart
      Simple/ Vertical Bar Chart
      To construct a vertical bar chart, mark the various categories on the horizontal axis and mark the frequencies on the vertical axis.
      Refer to Figure 2.1 and Figure 2.2.
      Figure 2.1 Figure 2.2
      Horizontal Bar Chart
      To construct a horizontal bar chart, mark the various categories on the vertical axis and mark the frequencies on the horizontal axis.
      Example 4: Refer to Example 3.
      Figure 2.3
      Another example of a horizontal bar chart: Figure 2.4
      Figure 2.4: Number of students at Diversity College who are immigrants, by last country of permanent residence
      Component Bar Chart
      To construct a component bar chart, draw one bar for each group and divide every bar into components, one for each category.
      The height of each component should tally with the frequency it represents.
      Example 5
      Suppose we want to illustrate the information below, representing the number of people participating in the activities offered by an outdoor pursuits centre during June of three consecutive years.
                  2004   2005   2006
      Climbing      21     34     36
      Caving        10     12     21
      Walking       75     85    100
      Sailing       36     36     40
      Total        142    167    191
      Solution:
      Figure 2.5
      Multiple Bar Chart
      To construct a multiple bar chart, the bars that represent the categories are gathered in groups.
      The height of each bar represents the frequency of its category.
      Useful for making comparisons (two or more values).
      Example 6: Refer to Example 5.
      Figure 2.6
      Another example of a horizontal bar chart: Figure 2.7
      Figure 2.7: Preferred snack choices of students at UUM
      The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the class frequencies.
      Pie Chart
      A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories.
      An alternative to the bar chart and useful for summarizing a single categorical variable if there are not too many categories.
      The chart makes it easy to compare relative sizes of each class/category.
      The whole pie represents the total sample or population. The pie is divided into different portions that represent the different categories.
      To construct a pie chart, we multiply 360° by the relative frequency for each category to obtain the degree measure or size of the angle for the corresponding category.
      Example 7 (Table 2.6 and Figure 2.8):
      Table 2.6 Figure 2.8
      Example 8 (Table 2.7 and Figure 2.9):
      Movie Genres      Frequency   Relative Frequency   Angle Size
      Comedy             54         0.27                 360*0.27 = 97.2°
      Action             36         0.18                 360*0.18 = 64.8°
      Romance            28         0.14                 360*0.14 = 50.4°
      Drama              28         0.14                 360*0.14 = 50.4°
      Horror             22         0.11                 360*0.11 = 39.6°
      Foreign            16         0.08                 360*0.08 = 28.8°
      Science Fiction    16         0.08                 360*0.08 = 28.8°
      Total             200         1.00                 360°
      Figure 2.9
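      The angle sizes of Example 8 follow directly from the frequencies. A minimal sketch in Python, where pie_angles is a hypothetical helper name:

```python
# Frequencies from Example 8 (movie genres).
frequencies = {
    "Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
    "Horror": 22, "Foreign": 16, "Science Fiction": 16,
}


def pie_angles(freqs):
    """Angle (in degrees) = 360 * relative frequency of each category."""
    total = sum(freqs.values())
    return {cat: 360 * f / total for cat, f in freqs.items()}


angles = pie_angles(frequencies)
for cat, angle in angles.items():
    print(f"{cat:<16} {angle:6.1f}")
```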
      Line Graph/Time Series Graph
      A graph that represents data that occur over a specific period of time.
      Line graphs are widely used because their visual characteristics reveal data trends clearly and these graphs are easy to create.
      When analyzing the graph, look for a trend or pattern that occurs over the time period.
      For example, is the line ascending (indicating an increase over time) or descending (indicating a decrease over time)?
      Another thing to look for is the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.
      Two data sets can be compared on the same graph (called a compound time series graph) if two lines are used.
      Data collected on the same element for the same variable at different points in time or for different periods of time are called time series data.
      A line graph is a visual comparison of how two variables—shown on the x- and y-axes—are related or vary with each other. It shows related information by drawing a continuous line between all the points on a grid.
      Line graphs compare two variables: one is plotted along the x-axis (horizontal) and the other along the y-axis (vertical).
      The y-axis in a line graph usually indicates quantity (e.g., RM, number of sales, litres) or percentage, while the horizontal x-axis often measures units of time. As a result, the line graph is often viewed as a time series graph.
      Example 9
      A transit manager wishes to use the following data for a presentation showing how Port Authority Transit ridership has changed over the years. Draw a time series graph for the data and summarize the findings.
      Year   Ridership (in millions)
      1990   88.0
      1991   85.0
      1992   75.7
      1993   76.6
      1994   75.4

      Solution:
      The graph shows a decline in ridership through 1992 and then leveling off for the years 1993 and 1994.
      Exercise 1
      The following data show the method of payment by 16 customers in a supermarket checkout line. Here, C = cash, CK = check, CC = credit card, D = debit and O = other.
      C  CK CK C  CC D  O  C
      CK CC D  C  CC CK CK CC
      Construct a frequency distribution table.
      Calculate the relative frequencies and percentages for all categories.
      Draw a pie chart for the percentage distribution.
      The frequency distribution table represents the sale of certain product in ZeeZee Company. Each of the products was given the frequency of the sales in certain period. Find the relative frequency and the percentage of each product. Then, construct a pie chart using the obtained information.
      Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
      A                 13
      B                 12
      C                  5
      D                  9
      E                 11
      Draw a time series graph to represent the data for the number of worldwide airline fatalities for the given years.
      Year   No. of fatalities
      1990    440
      1991    510
      1992    990
      1993    801
      1994    732
      1995    557
      1996   1132
      A questionnaire about how people get news resulted in the following information from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).
      N N R T T R N T M R M M N R N T R M N M T R R N N
      Construct a frequency distribution for the data.
      Construct a bar graph for the data.
      The given information shows the export and import trade in million RM for four months of sales in certain year. Using the provided information, present this data in component bar graph.
      Month       Export   Import
      September   28       20
      October     30       28
      November    32       17
      December    24       14
      The following information represents the maximum rain fall in millimeter (mm) in each state in Malaysia. You are supposed to help a meteorologist in your place to make an analysis. Based on your knowledge, present this information using the most appropriate chart and give your comment.
      State                                 Quantity (mm)
      Perlis                                 435
      Kedah                                  512
      Pulau Pinang                           163
      Perak                                  721
      Selangor                               664
      Wilayah Persekutuan Kuala Lumpur      1003
      Negeri Sembilan                        390
      Melaka                                 223
      Johor                                  876
      Pahang                                1050
      Terengganu                            1255
      Kelantan                               986
      Sarawak                                878
      Sabah                                  456
      2.3 Organizing and Graphing Quantitative Data
      2.3.1 Stem and Leaf Display
      2.3.2 Frequency Distribution
      2.3.3 Relative Frequency and Percentage Distributions
      2.3.4 Graphing Grouped Data
      2.3.5 Shapes of Histogram
      2.3.6 Cumulative Frequency Distributions
      Stem-and-Leaf Display
      In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. Then the leaves for each stem are shown separately in a display.
      It shows the pattern of the data.
      It reveals which values are repeated frequently.
      Example 10
      25 12  9 10  5 12 23  7
      36 13 11 12 31 28 37  6
      14 41 38 44 13 22 18 19
      Solution:
      0 | 9 5 7 6
      1 | 2 0 2 3 1 2 4 3 8 9
      2 | 5 3 8 2
      3 | 6 1 7 8
      4 | 1 4
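      A stem-and-leaf display can be built by splitting each value into its tens digit (stem) and units digit (leaf). The sketch below is one possible Python version, using the 24 values that appear as leaves in the display of Example 10; the function name stem_and_leaf is illustrative.

```python
from collections import defaultdict

# The 24 two-digit (or smaller) values shown in the display of Example 10.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        36, 13, 11, 12, 31, 28, 37, 6,
        14, 41, 38, 44, 13, 22, 18, 19]


def stem_and_leaf(values):
    """Group the units digits (leaves) under their tens digits (stems)."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)  # leaves stay in data order
    return dict(sorted(display.items()))


display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(stem, "|", " ".join(map(str, leaves)))
```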
      Frequency Distributions
      A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.
      Data presented in form of frequency distribution are called grouped data.
      The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. It is also called the real class limit.
      To find the midpoint of the upper limit of the first class and the lower limit of the second class, we divide the sum of these two limits by 2.
      e.g.: class boundary $= \dfrac{600 + 601}{2} = 600.5$
      Class Width (class size)
      Class width = Upper boundary – Lower boundary
      e.g.:
      Width of the first class = 600.5 – 400.5 = 200
      Class Midpoint or Mark
      $\text{Class midpoint} = \dfrac{\text{Lower limit} + \text{Upper limit}}{2}$
      e.g.: midpoint of the first class $= \dfrac{401 + 600}{2} = 500.5$
      Constructing Frequency Distribution Tables
      1. To decide the number of classes, we use Sturges' formula, which is
      c = 1 + 3.3 log n
      where c is the no. of classes
      n is the no. of observations in the data set.
      2. Class width,
      $i \approx \dfrac{\text{Largest value} - \text{Smallest value}}{c}$
      This class width is rounded to a convenient number.
      3.Lower Limit of the First Class or the Starting Point
      Use the smallest value in the data set.
      Example 11
      The following data give the total home runs hit by all players of each of the 30 Major League Baseball teams during 2004 season
      Solution:
      Number of classes, c = 1 + 3.3 log 30
      = 1 + 3.3(1.48)
      = 5.88 ≈ 6 classes
      Class width, $i \approx \dfrac{\text{Largest value} - \text{Smallest value}}{6}$, rounded to 18
      Starting Point = 135
      Table 2.10 Frequency Distribution for Data of Table 2.9
      Total Home Runs   Tally          f
      135 – 152         |||| ||||     10
      153 – 170         ||             2
      171 – 188         ||||           5
      189 – 206         |||| |         6
      207 – 224         |||            3
      225 – 242         ||||           4
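      The class limits of Table 2.10 can be generated from the number of observations, the starting point and the class width. This is a sketch under stated assumptions: the helper names are hypothetical, the number of classes is rounded up to the next whole number, and the data are taken to be whole numbers so that consecutive limits differ by 1.

```python
import math


def sturges_classes(n):
    """Number of classes c = 1 + 3.3 log10(n), rounded up (assumed rounding rule)."""
    return math.ceil(1 + 3.3 * math.log10(n))


def class_limits(start, width, count):
    """Lower and upper class limits for whole-number data."""
    return [(start + k * width, start + (k + 1) * width - 1) for k in range(count)]


c = sturges_classes(30)            # 30 teams in Example 11
limits = class_limits(135, 18, c)  # starting point 135, class width 18
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo} - {hi}   boundaries {lb} to {ub}")
```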
      Relative Frequency and Percentage Distributions
      Example 12 (Refer example 11)
      Table 2.11: Relative Frequency and Percentage Distributions
      Total Home Runs   Class Boundaries         Relative Frequency   %
      135 – 152         134.5 less than 152.5    0.3333               33.33
      153 – 170         152.5 less than 170.5    0.0667                6.67
      171 – 188         170.5 less than 188.5    0.1667               16.67
      189 – 206         188.5 less than 206.5    0.2                  20
      207 – 224         206.5 less than 224.5    0.1                  10
      225 – 242         224.5 less than 242.5    0.1333               13.33
      Sum                                        1.0                  100%
      Graphing Grouped Data
      Histograms
      A histogram is a graph in which the class boundaries are marked on the horizontal axis and either the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies or percentages are represented by the heights of the bars.
      In a histogram, the bars are drawn adjacent to each other and there is a space between the y-axis and the first bar.
      Example 13 (Refer to Example 11)
      Figure 2.10: Frequency histogram for Table 2.10
      Polygon
      A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
      Example 13
      Figure 2.11: Frequency polygon for Table 2.10
      For a very large data set, as the number of classes is increased (and the width of classes is decreased), the frequency polygon eventually becomes a smooth curve called a frequency distribution curve or simply a frequency curve.
      Figure 2.12: Frequency distribution curve
      Shape of Histogram
      The shape of a histogram is described in the same way as the shape of its frequency polygon or frequency curve.
      The most common shapes are:
      (i) Symmetric
      Figure 2.13 & 2.14: Symmetric histograms
      (ii) Right skewed and (iii) Left skewed
      Figure 2.15 & 2.16: Right skewed and Left skewed
      Describing data using graphs helps us gain insight into the main characteristics of the data.
      When interpreting a graph, we should be very cautious. We should observe carefully whether the frequency axis has been truncated or whether any axis has been unnecessarily shortened or stretched.
      Cumulative Frequency Distributions
      A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
      Example 14: Using the frequency distribution of Table 2.11,
      Total Home Runs   Class Boundaries         Cumulative Frequency
      135 – 152         134.5 less than 152.5    10
      153 – 170         152.5 less than 170.5    10+2 = 12
      171 – 188         170.5 less than 188.5    10+2+5 = 17
      189 – 206         188.5 less than 206.5    10+2+5+6 = 23
      207 – 224         206.5 less than 224.5    10+2+5+6+3 = 26
      225 – 242         224.5 less than 242.5    10+2+5+6+3+4 = 30
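      Cumulative frequencies are running totals of the class frequencies. A minimal Python sketch using the frequencies of Example 14:

```python
from itertools import accumulate

# Class frequencies and upper class boundaries from Example 14.
frequencies = [10, 2, 5, 6, 3, 4]
upper_boundaries = [152.5, 170.5, 188.5, 206.5, 224.5, 242.5]

# Running total: number of values below each upper boundary.
cumulative = list(accumulate(frequencies))

for ub, cf in zip(upper_boundaries, cumulative):
    print(f"less than {ub}: {cf}")
```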
      Ogive
      An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of respective classes.
      Two types of ogive:
      (i) ogive less than
      (ii) ogive greater than
      First, build a table of cumulative frequency.
      Example 15 (Ogive Less Than)
      Earnings (RM)   Number of students (f)
      30 – 39          5
      40 – 49          6
      50 – 59          6
      60 – 69          3
      70 – 79          3
      80 – 89          7
      Total           30

      Earnings (RM)     Cumulative Frequency (F)
      Less than 29.5     0
      Less than 39.5     5
      Less than 49.5    11
      Less than 59.5    17
      Less than 69.5    20
      Less than 79.5    23
      Less than 89.5    30
      [Ogive: cumulative frequency (vertical axis) plotted against earnings (horizontal axis)]
      Figure 2.17
      Example 16 (Ogive Greater Than)
      Earnings (RM)   Number of students (f)
      30 – 39          5
      40 – 49          6
      50 – 59          6
      60 – 69          3
      70 – 79          3
      80 – 89          7
      Total           30

      Earnings (RM)     Cumulative Frequency (F)
      More than 29.5    30
      More than 39.5    25
      More than 49.5    19
      More than 59.5    13
      More than 69.5    10
      More than 79.5     7
      More than 89.5     0
      [Ogive: cumulative frequency (vertical axis) plotted against earnings (horizontal axis)]
      Figure 2.18
      2.3.7 Box-Plot
      Describes the data graphically using five measures: smallest value, first quartile (K1), second quartile (median or K2), third quartile (K3) and largest value.
      [Figure: box plots for symmetric, left-skewed and right-skewed data, each marking the smallest value, K1, median, K3 and largest value]
      2.4 Measures of Central Tendency
      2.4.1 Ungrouped Data
      (1) Mean
      (2) Weighted mean
      (3) Median
      (4) Mode
      2.4.2 Grouped Data
      (1) Mean
      (2) Median
      (3) Mode
      2.4.3 Relationship among mean, median & mode
      2.4.1 Ungrouped Data
      Mean
      Mean for population data: $\mu = \dfrac{\sum x}{N}$
      Mean for sample data: $\bar{x} = \dfrac{\sum x}{n}$
      where: $\sum x$ = the sum of all values
      N = the population size
      n = the sample size,
      µ = the population mean
      $\bar{x}$ = the sample mean
      Example 17
      The following data give the prices (rounded to thousand RM) of five homes sold recently in Sekayang.
      158 189 265 127 191
      Find the mean sale price for these homes.
      Solution:
      $\bar{x} = \dfrac{\sum x}{n} = \dfrac{158 + 189 + 265 + 127 + 191}{5} = \dfrac{930}{5} = 186$
      Thus, these five homes were sold for an average price of RM186 thousand, that is RM186 000.
      The mean has the advantage that its calculation includes each value of the data set.
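      As a quick check of Example 17, the sample mean is the sum of the values divided by their count. A minimal Python sketch (sample_mean is an illustrative name):

```python
prices = [158, 189, 265, 127, 191]  # thousand RM, from Example 17


def sample_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


mean_price = sample_mean(prices)
print(mean_price)
```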
      Weighted Mean
      Used when the values in a data set carry different weights (levels of importance).
      Weighted mean:
      $\bar{x}_w = \dfrac{\sum wx}{\sum w}$
      where w is a weight.
      Example 18
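      Consider the data on electrical components purchased from a factory in the table below:
      Type    Number of components (w)   Cost/unit (x)
      1       1200                       RM3.00
      2        500                       RM3.40
      3       2500                       RM2.80
      4       1000                       RM2.90
      5        800                       RM3.25
      Total   6000
      Solution:
      $\bar{x}_w = \dfrac{\sum wx}{\sum w} = \dfrac{1200(3.00) + 500(3.40) + 2500(2.80) + 1000(2.90) + 800(3.25)}{6000} = \dfrac{17800}{6000} = 2.97$
      The mean cost of a unit of the component is RM2.97.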
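      The weighted mean of Example 18 can be verified as follows. This is a small Python sketch; weighted_mean is an illustrative name.

```python
weights = [1200, 500, 2500, 1000, 800]   # number of components (w)
costs = [3.00, 3.40, 2.80, 2.90, 3.25]   # cost per unit in RM (x)


def weighted_mean(values, weights):
    """Sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


mean_cost = weighted_mean(costs, weights)
print(round(mean_cost, 2))
```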
      Median
      Median is the value of the middle term in a data set that has been ranked in increasing order.
      Procedure for finding the Median
      Step 1: Rank the data set in increasing order.
      Step 2: Determine the depth (position or location) of the median.
      Step 3: Determine the value of the Median.
      Example 19
      Find the median for the following data:
      10 5 19 8 3
      Solution:
      (1) Rank the data in increasing order
      3 5 8 10 19
      (2) Determine the depth of the median
      $\text{Depth of median} = \dfrac{n + 1}{2} = \dfrac{5 + 1}{2} = 3$
      Therefore the median is located in the third position of the data set.
      (3) Determine the value of the median
      3 5 8 10 19
      Hence, the median for the above data = 8
      Example 20
      Find the median for the following data:
      10 5 19 8 3 15
      Solution:
      (1) Rank the data in increasing order
      3 5 8 10 15 19
      (2) Determine the depth of the median
      $\text{Depth of median} = \dfrac{n + 1}{2} = \dfrac{6 + 1}{2} = 3.5$
      Therefore the median is located midway between the 3rd position and the 4th position of the data set.
      (3) Determine the value of the median
      $\text{Median} = \dfrac{8 + 10}{2} = 9$
      Hence, the median for the above data = 9
      The median gives the center of a histogram, with half of the data values to the left of (or, less than) the median and half to the right of (or, more than) the median.
      The advantage of using the median is that it is not influenced by outliers.
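      The three-step procedure (rank, find the depth, read off the value) can be sketched in Python as below. The function name median_by_depth is illustrative; the depth is taken as (n + 1)/2, as in Examples 19 and 20.

```python
def median_by_depth(values):
    """Median via the depth (n + 1) / 2 of the ranked data."""
    ranked = sorted(values)                 # Step 1: rank the data
    depth = (len(ranked) + 1) / 2           # Step 2: depth of the median
    lower = ranked[int(depth) - 1]          # value at the whole-number position
    if depth.is_integer():                  # odd n: the middle value itself
        return lower
    return (lower + ranked[int(depth)]) / 2  # even n: average of the two middle values


median_odd = median_by_depth([10, 5, 19, 8, 3])        # Example 19
median_even = median_by_depth([10, 5, 19, 8, 3, 15])   # Example 20
print(median_odd, median_even)
```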
      Mode
      Mode is the value that occurs with the highest frequency in a data set.
      Example 21
      1. What is the mode for given data?
      77 69 74 81 71 68 74 73
      2. What is the mode for given data?
      77 69 68 74 81 71 68 74 73
      Solution:
      1. Mode = 74 (this number occurs twice): Unimodal
      2. Mode = 68 and 74: Bimodal
      A major shortcoming of the mode is that a data set may have none or may have more than one mode.
      One advantage of the mode is that it can be calculated for both kinds of data, quantitative and qualitative.
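      Python's standard library can list every mode of a data set, which also covers the bimodal case and qualitative data. A short sketch using statistics.multimode (available from Python 3.8):

```python
from statistics import multimode

data_1 = [77, 69, 74, 81, 71, 68, 74, 73]      # Example 21, question 1
data_2 = [77, 69, 68, 74, 81, 71, 68, 74, 73]  # Example 21, question 2

modes_1 = multimode(data_1)   # every value with the highest frequency
modes_2 = multimode(data_2)

# The mode also works for qualitative data (illustrative values).
modes_q = multimode(["m", "f", "m", "m", "f"])

print(modes_1, modes_2, modes_q)
```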
      2.4.2 Grouped Data
      Mean
      Mean for population data: $\mu = \dfrac{\sum fx}{N}$
      Mean for sample data: $\bar{x} = \dfrac{\sum fx}{n}$
      where x is the midpoint and f is the frequency of a class.
      Example 22
      The following table gives the frequency distribution of the number of orders received each day during the past 50 days at the office of a mail-order company. Calculate the mean.
      Number of orders   f
      10 – 12             4
      13 – 15            12
      16 – 18            20
      19 – 21            14
                         n = 50
      Solution:
      Because the data set includes only 50 days, it represents a sample. The value of $\sum fx$ is calculated in the following table:
      Number of orders   f    x    fx
      10 – 12             4   11    44
      13 – 15            12   14   168
      16 – 18            20   17   340
      19 – 21            14   20   280
                         n = 50    $\sum fx$ = 832
      The value of the sample mean is:
      $\bar{x} = \dfrac{\sum fx}{n} = \dfrac{832}{50} = 16.64$
      Thus, this mail-order company received an average of 16.64 orders per day during these 50 days.
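      The grouped mean of Example 22 uses class midpoints in place of the individual values. A minimal Python sketch, with grouped_mean as an illustrative name:

```python
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]  # class limits
freqs = [4, 12, 20, 14]                             # frequencies f


def grouped_mean(classes, freqs):
    """Sum of f*x divided by n, where x is the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return sum_fx / sum(freqs)


mean_orders = grouped_mean(classes, freqs)
print(mean_orders)
```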
      Median
      Step 1: Construct the cumulative frequency distribution.
      Step 2: Decide the class that contains the median.
      The class median is the first class whose cumulative frequency is at least n/2.
      Step 3: Find the median by using the following formula:
      $\text{Median} = L_m + \left(\dfrac{\frac{n}{2} - F}{f_m}\right) i$
      Where:
      n = the total frequency
      F = the cumulative frequency before the class median
      i = the class width
      $L_m$ = the lower boundary of the class median
      $f_m$ = the frequency of the class median
      Example 23
      Based on the grouped data below, find the median:
      Time to travel to work   Frequency
      1 – 10                    8
      11 – 20                  14
      21 – 30                  12
      31 – 40                   9
      41 – 50                   7
      Solution:
      1st Step: Construct the cumulative frequency distribution
      Time to travel to work   Frequency   Cumulative Frequency
      1 – 10                    8           8
      11 – 20                  14          22
      21 – 30                  12          34
      31 – 40                   9          43
      41 – 50                   7          50
      n/2 = 25, so the class median is the 3rd class (21 – 30).
      So, F = 22, $f_m$ = 12, $L_m$ = 20.5 and i = 10
      Therefore, $\text{Median} = 20.5 + \left(\dfrac{25 - 22}{12}\right)(10) = 23$
      Thus, 25 persons take less than 23 minutes to travel to work and another 25 persons take more than 23 minutes to travel to work.
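      The grouped-median steps can be sketched in Python as follows. The function name grouped_median is illustrative, and the sketch assumes equal-width classes whose boundaries lie midway between adjacent limits, so the lower boundary of the class 21 – 30 is 20.5.

```python
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]  # minutes
freqs = [8, 14, 12, 9, 7]


def grouped_median(classes, freqs):
    """Median = L + ((n/2 - F) / f) * i for the class containing the median."""
    n = sum(freqs)
    gap = classes[1][0] - classes[0][1]      # distance between adjacent limits
    cum_before = 0                           # F: cumulative frequency before the class
    for (lo, hi), f in zip(classes, freqs):
        if cum_before + f >= n / 2:          # first class reaching n/2
            lower_boundary = lo - gap / 2
            width = (hi + gap / 2) - lower_boundary
            return lower_boundary + (n / 2 - cum_before) / f * width
        cum_before += f


median_time = grouped_median(classes, freqs)
print(median_time)
```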
      Mode
      Mode is the value that has the highest frequency in a data set.
      For grouped data, class mode (or, modal class) is the class with the highest frequency.
      To find the mode for grouped data, use the following formula:
      $\text{Mode} = L_{mo} + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
      Where:
      $L_{mo}$ is the lower boundary of the class mode
      $\Delta_1$ is the difference between the frequency of the class mode and the frequency of the class before the class mode
      $\Delta_2$ is the difference between the frequency of the class mode and the frequency of the class after the class mode
      i is the class width
      Example 24
      Based on the grouped data below, find the mode
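      Time to travel to work   Frequency
      1 – 10                    8
      11 – 20                  14
      21 – 30                  12
      31 – 40                   9
      41 – 50                   7
      Solution:
      Based on the table, the class mode is 11 – 20, so
      $L_{mo}$ = 10.5, $\Delta_1$ = (14 – 8) = 6, $\Delta_2$ = (14 – 12) = 2 and i = 10
      $\text{Mode} = 10.5 + \left(\dfrac{6}{6 + 2}\right)(10) = 18$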
      We can also obtain the mode by using the histogram;
      Figure 2.19
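      The grouped-mode formula can be checked with the sketch below. It is written in Python under the same assumptions as the notes (equal class widths, a single class with the highest frequency); the function name grouped_mode is illustrative, and a missing neighbouring class is treated as having frequency 0.

```python
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
freqs = [8, 14, 12, 9, 7]


def grouped_mode(classes, freqs):
    """Mode = L + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    k = max(range(len(freqs)), key=freqs.__getitem__)    # index of the class mode
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1, d2 = freqs[k] - before, freqs[k] - after
    gap = classes[1][0] - classes[0][1]
    lower_boundary = classes[k][0] - gap / 2
    width = classes[k][1] - classes[k][0] + gap
    return lower_boundary + d1 / (d1 + d2) * width


mode_time = grouped_mode(classes, freqs)
print(mode_time)
```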
      2.4.3 Relationship among mean, median & mode
      As discussed in the previous topic, a histogram or a frequency distribution curve can assume either a skewed shape or a symmetrical shape.
      Knowing the values of the mean, median and mode can give us some idea about the shape of the frequency curve.
      For a symmetrical histogram and frequency curve with one peak, the values of the mean, median and mode are identical and they lie at the center of the distribution. (Figure 2.20)
      Figure 2.20: Mean, median, and mode for a symmetric histogram and frequency distribution curve
      For a histogram and a frequency curve skewed to the right, the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between these two.
      Figure 2.21: Mean, median, and mode for a histogram and frequency distribution curve skewed to the right
      For a histogram and a frequency curve skewed to the left, the value of the mean is the smallest, that of the mode is the largest, and the value of the median lies between these two.
      Figure 2.22: Mean, median, and mode for a histogram and frequency distribution curve skewed to the left
      2.5 Dispersion Measurement
      The measures of central tendency such as mean, median and mode do not reveal the whole picture of the distribution of a data set.
      Two data sets with the same mean may have completely different spreads.
      The variation among the values of observations for one data set may be much larger or smaller than for the other data set.
      2.5.1 Ungrouped data
      (1) Range
      (2) Standard Deviation
      2.5.2 Grouped data
      (1) Range
      (2) Standard deviation
      2.5.3 Relative Dispersion Measurement
      Ungrouped Data
      Range
      RANGE = Largest value – Smallest value
      Example 25:
      Find the range of production for this data set,
      Solution:
      Range = Largest value – Smallest value
      = 267 277 – 49 651
      = 217 626
      Disadvantages:
      It is influenced by outliers.
      It is based on two values only. All other values in a data set are ignored.
      Variance and Standard Deviation
      Standard deviation is the most used measure of dispersion.
      The value of the standard deviation tells how closely the values of a data set are clustered around the mean.
      A lower value of the standard deviation indicates that the values of the data set are spread over a relatively smaller range around the mean.
      A larger value of the standard deviation indicates that the values of the data set are spread over a relatively larger range around the mean (far from the mean).
      Standard deviation is obtained by taking the positive square root of the variance:
                   Variance                                                    Standard Deviation
      Population   $\sigma^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$     $\sigma = \sqrt{\sigma^2}$
      Sample       $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$      $s = \sqrt{s^2}$
      Example 26
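      Let x denote the total production (in units) of a company.
      Company   Production
      A          62
      B          93
      C         126
      D          75
      E          34
      Find the variance and standard deviation.
      Solution:
      Company   Production (x)   x²
      A          62               3844
      B          93               8649
      C         126              15876
      D          75               5625
      E          34               1156
      Total     390              35150
      $s^2 = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \dfrac{35150 - \frac{(390)^2}{5}}{5 - 1} = \dfrac{35150 - 30420}{4} = 1182.50$
      Since $s^2 = 1182.50$,
      Therefore, $s = \sqrt{1182.50} = 34.39$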
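      Example 26 can be verified with the shortcut formula for the sample variance. The sketch below is in Python; sample_variance is an illustrative name, and the result is cross-checked against statistics.variance from the standard library.

```python
import math
import statistics

production = [62, 93, 126, 75, 34]  # companies A to E


def sample_variance(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s2 = sample_variance(production)
s = math.sqrt(s2)                      # standard deviation: positive root
print(s2, round(s, 2))
print(statistics.variance(production))  # library cross-check
```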
      The properties of variance and standard deviation:
      (1) The standard deviation is a measure of variation of all values from the mean.
      (2) The value of the variance and the standard deviation are never negative. Also, larger values of variance or standard deviation indicate greater amounts of variation.
      (3) The value of s can increase dramatically with the inclusion of one or more outliers.
      (4) The measurement units of variance are always the square of the measurement units of the original data while the units of standard deviation are the same as the units of the original data values.
      Grouped Data
      Range
      Range = Upper bound of last class – Lower bound of first class
      Class      Frequency
      41 – 50     1
      51 – 60     3
      61 – 70     7
      71 – 80    13
      81 – 90    10
      91 – 100    6
      Total      40
      Upper bound of last class = 100.5
      Lower bound of first class = 40.5
      Range = 100.5 – 40.5 = 60
      Variance and Standard Deviation
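                   Variance                                                      Standard Deviation
      Population   $\sigma^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$     $\sigma = \sqrt{\sigma^2}$
      Sample       $s^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$      $s = \sqrt{s^2}$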
      Example 27
      Find the variance and standard deviation for the following data:
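      No. of orders   f
      10 – 12          4
      13 – 15         12
      16 – 18         20
      19 – 21         14
      Total           n = 50
      Solution:
      No. of orders   f        x    fx    fx²
      10 – 12          4       11    44    484
      13 – 15         12       14   168   2352
      16 – 18         20       17   340   5780
      19 – 21         14       20   280   5600
      Total           n = 50        832   14216
      Variance, $s^2 = \dfrac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \dfrac{14216 - \frac{(832)^2}{50}}{50 - 1} = \dfrac{371.52}{49} = 7.5820$
      Standard deviation, $s = \sqrt{7.5820} = 2.75$
      Thus, the standard deviation of the number of orders received at the office of this mail-order company during the past 50 days is 2.75.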
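      The grouped variance of Example 27 can be checked in the same way, using class midpoints. A minimal Python sketch (grouped_sample_variance is an illustrative name):

```python
import math

classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]


def grouped_sample_variance(classes, freqs):
    """s^2 = (sum(f*x^2) - (sum(f*x))^2 / n) / (n - 1), x = class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


s2_grouped = grouped_sample_variance(classes, freqs)
s_grouped = math.sqrt(s2_grouped)
print(round(s2_grouped, 4), round(s_grouped, 2))
```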
      2.5.3 Relative Dispersion Measurement
      Used to compare the dispersion of two or more distributions that have different units, or
      to compare two or more distributions that have the same unit but a big difference in the values of their means.
      Also called the modified coefficient or coefficient of variation, CV.
      $CV = \dfrac{s}{\bar{x}} \times 100\%$
      Example 28
      The mean and standard deviation of the monthly salary for two groups of workers in ABC company are given as follows: Group 1: 700 & 20 and Group 2: 1070 & 20. Find the CV for each group and determine which group is more dispersed.
      Solution:
      $CV_1 = \dfrac{20}{700} \times 100\% = 2.86\%$
      $CV_2 = \dfrac{20}{1070} \times 100\% = 1.87\%$
      The monthly salary for group 1 workers is more dispersed compared to group 2.
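      The comparison in Example 28 can be reproduced with a few lines of Python. The function name coefficient_of_variation is illustrative.

```python
def coefficient_of_variation(mean, sd):
    """CV = (standard deviation / mean) * 100, expressed in percent."""
    return sd / mean * 100


cv_group1 = coefficient_of_variation(700, 20)
cv_group2 = coefficient_of_variation(1070, 20)

# The group with the larger CV is relatively more dispersed.
print(round(cv_group1, 2), round(cv_group2, 2))
```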
      Measure of Position
      Determines the position of a single value in relation to other values in a sample or a population data set.
      Ungrouped Data
      Quartiles
      Interquartile Range
      Grouped Data
      Quartile
      Interquartile Range
      Quartiles
      Quartiles are three summary measures that divide a ranked data set into four equal parts.
      The 1st quartile – denoted as Q1
      The 2nd quartile – the median of a data set or Q2
      The 3rd quartile – denoted as Q3
      Example 29
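      The table below lists the total revenue for the 11 top tourism companies in Malaysia.
      109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8
      Solution:
      Step 1: Arrange the data in increasing order
      76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
      Step 2: Determine the depth for Q1 and Q3
      $\text{Depth of } Q_1 = \dfrac{n + 1}{4} = \dfrac{11 + 1}{4} = 3$ and $\text{Depth of } Q_3 = \dfrac{3(n + 1)}{4} = \dfrac{3(11 + 1)}{4} = 9$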
      Step 3: Determine the Q1 and Q3
      76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
      Q1 = 79.9
      Q3 = 103.5
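      The table below lists the total revenue for the 12 top tourism companies in Malaysia.
      109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8
      Solution:
      Step 1: Arrange the data in increasing order
      74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
      Step 2: Determine the depth for Q1 and Q3
      $\text{Depth of } Q_1 = \dfrac{n + 1}{4} = \dfrac{12 + 1}{4} = 3.25$ and $\text{Depth of } Q_3 = \dfrac{3(n + 1)}{4} = \dfrac{3(12 + 1)}{4} = 9.75$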
      Step 3: Determine the Q1 and Q3
      74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
      Q1 = 79.4 + 0.25 (79.9 – 79.4) = 79.525
      Q3 = 98.0 + 0.75 (103.5 – 98.0) = 102.125
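      The depth method used in Example 29 can be sketched in Python as follows. The depth formula k(n + 1)/4 is inferred from the worked answers above, and the function name quartile_by_depth is illustrative only.

```python
import math


def quartile_by_depth(data, k):
    """Return the k-th quartile (k = 1, 2 or 3) using depth = k(n + 1)/4.

    The whole part of the depth picks a position in the ordered data and
    the fractional part interpolates towards the next value.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n < 3:
        raise ValueError("this sketch needs at least three observations")
    depth = k * (n + 1) / 4        # 1-based position in the ordered data
    lower = math.floor(depth)      # position of the value just below the depth
    fraction = depth - lower       # how far to move towards the next value
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


# Revenue data from Example 29 (11 companies, then 12 companies).
eleven_companies = [109.7, 79.9, 121.2, 76.4, 80.2, 82.1,
                    79.4, 89.3, 98.0, 103.5, 86.8]
twelve_companies = eleven_companies + [74.1]

for label, data in (("n = 11", eleven_companies), ("n = 12", twelve_companies)):
    q1 = quartile_by_depth(data, 1)
    q3 = quartile_by_depth(data, 3)
    print(f"{label}: Q1 = {q1:.3f}, Q3 = {q3:.3f}")
```

      With 11 values the depths are whole numbers, so the quartiles are observed values. With 12 values the depths are 3.25 and 9.75, so each quartile lies between two neighbouring observations.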
      Interquartile Range
      The difference between the third quartile and the first quartile for a data set.
      IQR = Q3 – Q1
      Example 30
      Referring to Example 29 (the data for 12 companies), calculate the IQR.
      Solution:
      IQR = Q3 – Q1 = 102.125 – 79.525 = 22.6
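      As a cross-check, Python's standard library can compute the same quartiles. This sketch assumes Python 3.8 or later, where statistics.quantiles is available; its "exclusive" method places the quartiles at positions k(n + 1)/4, which matches the depths used in Example 29.

```python
import statistics

# Revenue for the 12 companies in Example 29.
revenue = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2,
           82.1, 79.4, 89.3, 98.0, 103.5, 86.8]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(revenue, n=4, method="exclusive")
iqr = q3 - q1

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
```

      Other software may use a different interpolation rule by default, so quartiles reported elsewhere can differ slightly from the hand calculation.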
      2.6.2 Grouped Data
      Quartiles
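      From the median formula, the equations for Q1 and Q3 are obtained as follows:

      Q1 = L + ((n/4 – F) / f) × i ; Q3 = L + ((3n/4 – F) / f) × i
      where L is the lower boundary of the quartile class, n is the total frequency, F is the cumulative frequency before the quartile class, f is the frequency of the quartile class and i is the class width.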
      Example 31
      Refer to Example 23. Find Q1 and Q3.
      Solution:
      1st Step: Construct the cumulative frequency distribution
      Time to travel to work    Frequency    Cumulative Frequency
      1 – 10                    8            8
      11 – 20                   14           22
      21 – 30                   12           34
      31 – 40                   9            43
      41 – 50                   7            50
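      2nd Step: Determine Q1 and Q3
      Position of Q1 = n/4 = 50/4 = 12.5, so the Q1 class is the 2nd class (11 – 20)
      Therefore, Q1 = 10.5 + ((12.5 – 8) / 14) × 10 = 13.7143
      Position of Q3 = 3n/4 = 3(50)/4 = 37.5, so the Q3 class is the 4th class (31 – 40)
      Therefore, Q3 = 30.5 + ((37.5 – 34) / 9) × 10 = 34.3889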
      Interquartile Range
      IQR = Q3 – Q1
      Example 32:
      Refer to Example 31. Calculate the IQR.
      Solution:
      IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
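      The grouped-data calculation in Examples 31 and 32 can be reproduced with the sketch below. It assumes the interpolation formula quartile = lower boundary + ((kn/4 – F) / f) × class width, which agrees with the answers above. The function name grouped_quartile and the gap parameter (the distance between one class's upper limit and the next class's lower limit) are illustrative.

```python
def grouped_quartile(classes, k, gap=1.0):
    """Estimate the k-th quartile (k = 1, 2 or 3) from grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    gap: distance between consecutive class limits (1 for 1-10, 11-20, ...).
    """
    n = sum(freq for _, _, freq in classes)
    target = k * n / 4                 # position of the quartile
    cumulative = 0                     # cumulative frequency before the class
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            lower_boundary = lower - gap / 2
            width = (upper - lower) + gap
            return lower_boundary + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("quartile position lies beyond the last class")


# Frequency distribution of travel time to work used in Example 31.
travel_time = [(1, 10, 8), (11, 20, 14), (21, 30, 12), (31, 40, 9), (41, 50, 7)]

q1 = grouped_quartile(travel_time, 1)
q3 = grouped_quartile(travel_time, 3)
iqr = q3 - q1
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
```

      The result is an estimate, because the formula treats the observations as evenly spread within the quartile class.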
      Measure of Skewness
      To determine the skewness of data (symmetric, left skewed or right skewed).
      Also called the Skewness Coefficient or Pearson Coefficient of Skewness, Sk.
      If Sk is positive, the distribution is right skewed.
      If Sk is negative, the distribution is left skewed.
      If Sk = 0, the distribution is symmetric.
      If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999), the distribution is approximately symmetric.
      Example 33
      The length of stay of cancer patients warded in Hospital Seberang Jaya is recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days.
      What is the type of distribution?
      Find the skewness coefficient
      Solution:
      This distribution is right skewed because the mean is the largest value (mean > median > mode).
      So, since the Sk value is positive, this distribution is right skewed.
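      The formula used for Sk is not shown in this transcript. As a hedged illustration, the sketch below evaluates the two common forms of Pearson's coefficient of skewness for the figures in Example 33: (mean – mode) / s and 3(mean – median) / s. Which form the original notes use is an assumption either way; the function names are illustrative.

```python
def pearson_mode_skewness(mean, mode, std_dev):
    """Pearson's first coefficient of skewness: (mean - mode) / s."""
    return (mean - mode) / std_dev


def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


# Figures from Example 33 (days warded).
mean, median, mode, std_dev = 28, 25, 23, 4.2

sk_mode = pearson_mode_skewness(mean, mode, std_dev)
sk_median = pearson_median_skewness(mean, median, std_dev)

for label, sk in (("mode form", sk_mode), ("median form", sk_median)):
    direction = "right" if sk > 0 else "left" if sk < 0 else "not"
    print(f"Sk ({label}) = {sk:.4f} -> {direction} skewed")
```

      Both forms give a positive coefficient here, so the conclusion that the distribution is right skewed does not depend on which form is used.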
      Exercise 2:
      1. A survey research company asks 100 people how many times they have been to the dentist in the last five years. Their grouped responses appear below.
      Number of Visits    Number of Responses
      0 – 4               16
      5 – 9               25
      10 – 14             48
      15 – 19             11
      What are the mean and variance of the data?
      2. A researcher asked 25 consumers: “How much would you pay for a television adapter that provides Internet access?” Their grouped responses are as follows:
      Amount ($)    Number of Responses
      0 – 99        2
      100 – 199     2
      200 – 249     3
      250 – 299     3
      300 – 349     6
      350 – 399     3
      400 – 499     4
      500 – 999     2
      Calculate the mean, variance, and standard deviation.
      3. The following data give the pairs of shoes sold per day by a particular shoe store in the last 20 days.
      85 90 89 70 79 80 83 83 75 76
      89 86 71 76 77 89 70 65 90 86
      Calculate the
      mean and interpret the value.
      median and interpret the value.
      mode and interpret the value.
      standard deviation.
      4. The following data show the serving time (in minutes) for 40 customers in a post office:
      2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8 3.2 2.9
      4.0 3.0 3.8 2.5 2.3 3.5 2.1 3.1 3.6 4.3
      4.7 2.6 4.1 3.1 4.6 2.8 5.1 2.7 2.6 4.4
      3.5 3.0 2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4
      Construct a frequency distribution table with a class width of 0.5.
      Construct a histogram.
      Calculate the mode and median of the data.
      Find the mean of serving time.
      Determine the skewness of the data.
      Find the first and third quartiles of the data.
      Determine the value of the interquartile range.
      5. In a survey of a class of final-semester students, the following data were obtained on the number of textbooks owned.

      Number of studentsNumber of text book owned1291115108553210

      Find the average number of textbooks for the class. Use the weighted mean.
      6. The following data represent the ages of 15 people buying lift tickets at a ski area.
      15 25 26 17 38 16 60 21
      30 53 28 40 20 35 31
      Calculate the quartiles and the interquartile range.
      7. A student scores 60 on a mathematics test that has a mean of 54 and a standard deviation of 3, and she scores 80 on a history test with a mean of 75 and a standard deviation of 2. On which test did she perform better?
      8. The following table gives the distribution of the share price of ABC Company, which was listed in BSKL in 2005.
      Price (RM)    Frequency
      12 – 14       5
      15 – 17       14
      18 – 20       25
      21 – 23       7
      24 – 26       6
      27 – 29       3
      Find the mean, median and mode for these data.