08 measures of dispersion

    Document Transcript

    • Research Methodology. Dr. Nimit Chowdhary, Professor. Saturday, April 14, 2012. © Dr. Nimit Chowdhary, Research Methodology Workshop.
      Objective: to be able to compute four common measures of variability:
        • Range
        • Interquartile range
        • Standard deviation
        • Variance
    • The range is the difference between the largest and the smallest values in a set of values.
      Example: 2 4 9 5 7 3 (largest = 9, smallest = 2). Range = Largest - Smallest = 9 - 2 = 7.
        • (+) Easy to calculate.
        • (-) Relies on only two values.
        • (-) Ignores the variability of all middle values.
      Data set A: 1 2 3 4 5 6 7 8 9. Range = 9 - 1 = 8.
      Data set B: 1 1 1 1 1 1 1 1 9. Range = 9 - 1 = 8.
      The two data sets have the same range even though their middle values are spread very differently.
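The range calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `value_range` is hypothetical, and Data sets A and B are assumed to be the integers 1 to 9 and eight 1s followed by a 9.

```python
def value_range(values):
    """Return the range: largest value minus smallest value."""
    return max(values) - min(values)

example = [2, 4, 9, 5, 7, 3]
# Assumed reading of the slide's two data sets.
data_set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_set_b = [1, 1, 1, 1, 1, 1, 1, 1, 9]

print(value_range(example))     # 7
print(value_range(data_set_a))  # 8
print(value_range(data_set_b))  # 8
```

Both data sets give a range of 8, which shows how the range says nothing about the values between the two extremes.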
    • The interquartile range is a measure of variability based on dividing the data set into quartiles. Quartiles divide an ordered data set into four equal parts. The values that divide each part are called the first, second and third quartiles, denoted by Q1, Q2 and Q3 respectively.
      To compute it:
        • Arrange the data set in numerical order.
        • Define the quartiles: the second quartile Q2 is the median of the entire data set, Q1 is the median of the data below Q2, and Q3 is the median of the data above Q2.
        • The interquartile range is IQR = Q3 - Q1.
    • Ordered data set: 0 1 2 3 4 5 6 7 8 9
      The median is Q2: Q2 = (4 + 5)/2 = 4.5.
      Q1 = 2 (median of 0 1 2 3 4) and Q3 = 7 (median of 5 6 7 8 9).
      Interquartile range = Q3 - Q1 = 7 - 2 = 5.
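The median-of-halves procedure from the slides can be sketched as follows. The helper name `quartiles` is hypothetical, and the sketch assumes at least two values. Library routines may use other quartile conventions and can give slightly different results.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # data below Q2
    upper = ordered[-half:]  # data above Q2 (middle value skipped if count is odd)
    return median(lower), median(ordered), median(upper)

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # 2 4.5 7
print(q3 - q1)     # 5
```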
    • IQR ignores outliers!
      Ordered data set: 0 1 2 3 4 5 6 7 8 999
      The median is Q2: Q2 = (4 + 5)/2 = 4.5. Q1 = 2 and Q3 = 7.
      Interquartile range = Q3 - Q1 = 7 - 2 = 5.
      While the range is strongly influenced by outliers, the IQR is not.
    • Variance is the average squared deviation from the mean:
      $\sigma^2 = \sum (X_i - \mu)^2 / N$
      where $\sigma^2$ = variance, $\sum$ = summation symbol, $X_i$ = element i from the data set, $\mu$ = mean of the data set, and $N$ = number of elements in the data set.
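To illustrate the outlier claim, the sketch below compares the range and the IQR before and after the largest value is replaced by 999. The helper names `value_range` and `iqr` are hypothetical, and the quartiles follow the median-of-halves method described in the slides.

```python
from statistics import median

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range via the median-of-halves method (needs 2+ values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])

clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 999]

print(value_range(clean), value_range(with_outlier))  # 9 999
print(iqr(clean), iqr(with_outlier))                  # 5 5
```

The range jumps from 9 to 999 while the IQR stays at 5, because the outlier never becomes the median of the upper half.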
    • Find the variance of the following: 0, 1, 5, 6
        • Number of entries: $N = 4$
        • Mean: $\mu = \sum X / N$
        • Deviation sum of squares: $SS = \sum (X - \mu)^2$
        • Variance: $\sigma^2 = \sum (X - \mu)^2 / N$
    • Find the variance of the following: 0, 1, 5, 6
        • Mean: $\mu = \sum X / N = (0 + 1 + 5 + 6)/4 = 12/4 = 3$
        • Deviation sum of squares: $SS = \sum (X - \mu)^2 = (0-3)^2 + (1-3)^2 + (5-3)^2 + (6-3)^2 = 9 + 4 + 4 + 9 = 26$
        • Variance: $\sigma^2 = \sum (X_i - \mu)^2 / N = 26/4 = 6.5$
    • The standard deviation is the square root of the variance:
      $\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
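The worked example above can be checked step by step. This is a minimal sketch that follows the slide's population formulas (dividing by N); the variable names are illustrative only.

```python
import math

data = [0, 1, 5, 6]
n = len(data)

mean = sum(data) / n                            # (0 + 1 + 5 + 6) / 4 = 3.0
squared_devs = [(x - mean) ** 2 for x in data]  # [9.0, 4.0, 4.0, 9.0]
ss = sum(squared_devs)                          # deviation sum of squares = 26.0
variance = ss / n                               # 26 / 4 = 6.5
std_dev = math.sqrt(variance)                   # square root of the variance

print(mean, ss, variance)  # 3.0 26.0 6.5
print(round(std_dev, 2))   # 2.55
```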
    • What happens to variability when you add a constant to each value in the data set? All measures of variability (range, interquartile range, variance, and standard deviation) stay the same.
    • The variance and standard deviation are the most common and useful measures of variability. These two measures provide information about how the data vary about the mean.
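The effect of adding a constant can be checked numerically. In this sketch the data set and the constant 10 are arbitrary choices, `value_range` is a hypothetical helper, and `pvariance` and `pstdev` are the population versions from Python's standard `statistics` module. The IQR is left out for brevity; it behaves the same way because every quartile shifts by the same constant.

```python
from statistics import pstdev, pvariance

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

data = [0, 1, 5, 6]
shifted = [x + 10 for x in data]  # add the same constant to every value

for name, measure in [("range", value_range),
                      ("variance", pvariance),
                      ("std dev", pstdev)]:
    # Each measure is computed on the original and on the shifted data.
    print(name, measure(data), measure(shifted))
```

Each line prints the same value twice: shifting the data moves the mean but leaves every deviation from the mean unchanged.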
    • When the data are clustered about the mean, the variance and standard deviation will be relatively small. When the data are widely scattered about the mean, the variance and standard deviation will be relatively large.
    • The sample variance is an approximate average of the squared deviations of the data values from the sample mean. It is denoted by $s^2$ and computed from the following formula:
      $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
      where $\bar{x}$ is the sample mean and $n$ is the number of sample values.
      What is the variance for the following sample values? 3 8 6 14 0 11
      NOTE: Do not let the formula intimidate you. We will build a table to help with the computations.
    • We will build a table to help in the computations.
      NOTE: The mean = 7, and the squared deviations from the mean sum to 132, so $s^2 = 132/(6 - 1) = 132/5 = 26.4$.
      In the previous example, observe that the variance is large relative to the size of the data values. This can be observed from the plot, which shows that the data values are very much spread out about the mean value of 7.
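The computation table mentioned above is not reproduced in this transcript, but a sketch like the following rebuilds it from the six sample values. The column layout is an assumption; the totals match the slide's result of 132/5 = 26.4.

```python
sample = [3, 8, 6, 14, 0, 11]
n = len(sample)
mean = sum(sample) / n  # 42 / 6 = 7.0

# One row per value: the value, its deviation, and the squared deviation.
print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
for x in sample:
    dev = x - mean
    print(f"{x:>4} {dev:>9.1f} {dev ** 2:>13.1f}")

ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
sample_variance = ss / (n - 1)             # divide by n - 1 for a sample

print("sum of squares:", ss)               # 132.0
print("sample variance:", sample_variance)
```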
    • The sample standard deviation is the positive square root of the sample variance.
      NOTE: the standard deviation has the same unit as the variable.
      Example: The sample standard deviation for the previous example is $s = \sqrt{26.4} \approx 5.14$.
    • If all of the observations have the same value, the sample variance (standard deviation) will be zero. That is, there is no variability in the data set.
      The variance (standard deviation) is influenced by outliers in the data set.
      Because the standard deviation is in the same unit as the raw data, it is preferred over the variance as the measure of variability.
    • The population variance is the average of the squared deviations of the data values from the population mean. It is denoted by $\sigma^2$ and computed from the following formula:
      $\sigma^2 = \sum (x_i - \mu)^2 / N$
      The population standard deviation is the positive square root of the population variance. It is denoted by $\sigma$ and computed from the following formula:
      $\sigma = \sqrt{\sum (x_i - \mu)^2 / N}$
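The difference between the sample and population formulas can be seen with Python's standard `statistics` module, where `variance` and `stdev` divide by n - 1 and `pvariance` and `pstdev` divide by N. Reusing the six values from the earlier sample example is only for illustration; the slides do not treat them as a population.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [3, 8, 6, 14, 0, 11]

# Sample versions: sum of squared deviations divided by n - 1.
print(variance(values))          # 132 / 5
print(round(stdev(values), 2))   # about 5.14

# Population versions: sum of squared deviations divided by N.
print(pvariance(values))         # 132 / 6
print(round(pstdev(values), 2))  # about 4.69
```

The sample variance is always the larger of the two for the same data, because it divides the same sum of squares by a smaller number.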
    • The coefficient of variation (CV) allows us to compare the variation of two (or more) different variables.
      Sample coefficient of variation: the sample standard deviation divided by the sample mean of the data set. Usually, the result is expressed as a percentage:
      $CV = (s / \bar{x}) \times 100\%$
      NOTE: The sample coefficient of variation standardizes the variation by dividing it by the sample mean.
    • The coefficient of variation has no units, since the standard deviation and the mean have the same units and thus cancel each other out. Because of this property, we can use this measure to compare the variations of different variables with different units.
      Example: The mean number of tourists arriving at a monument over a four-month period was 90, and the standard deviation was 5. The average expenditure made at the site was Rs. 5,400, and the standard deviation was Rs. 775. Compare the variations of the two variables.
    • CV (tourists) = 5/90 ≈ 5.6%; CV (expenditure) = 775/5,400 ≈ 14.4%. Since the CV is larger for the expenditure, there is more variability in the recorded expenditure than in the number of tourists.
    • Population coefficient of variation: the population standard deviation divided by the population mean of the data set.
      NOTE: The population CV has the same properties as the sample CV.
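The tourist and expenditure comparison can be reproduced with a short sketch. The function name `coefficient_of_variation` is hypothetical, and the inputs are the means and standard deviations stated in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100

cv_tourists = coefficient_of_variation(5, 90)         # number of tourists
cv_expenditure = coefficient_of_variation(775, 5400)  # expenditure in Rs.

print(round(cv_tourists, 2))     # 5.56
print(round(cv_expenditure, 2))  # 14.35
print(cv_expenditure > cv_tourists)  # True
```

Because both results are unit-free percentages, a count of tourists and an amount in rupees can be compared directly.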
    • Summary: different measures of dispersion
        • Range
        • Interquartile range
        • Variance
        • Standard deviation
        • Concept of the coefficient of variation