08 measures of dispersion


  1. Research Methodology. Dr. Nimit Chowdhary, Professor. Saturday, April 14, 2012. © Dr. Nimit Chowdhary, Research Methodology Workshop.
     Objective: to be able to compute four common measures of variability:
     • Range
     • Interquartile range
     • Standard deviation
     • Variance
  2. The range is the difference between the largest and the smallest values in a set of values.
     Example: 2 4 9 5 7 3 (smallest = 2, largest = 9)
     Range = Largest – Smallest = 9 – 2 = 7
     • (+) Easy to calculate
     • (-) Relies on only two values
     • (-) Ignores the variability of all middle values
     Data set A: 1 2 3 4 5 6 7 8 9, Range = 9 – 1 = 8
     Data set B: 1 1 1 1 1 1 1 1 9, Range = 9 – 1 = 8
     The two data sets have the same range even though their middle values are spread very differently.
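The range calculation on this slide can be checked with a short Python sketch. The function name `value_range` is an illustrative choice, not something from the slides.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

example = [2, 4, 9, 5, 7, 3]
data_set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
data_set_b = [1, 1, 1, 1, 1, 1, 1, 1, 9]

print(value_range(example))     # 9 - 2 = 7
print(value_range(data_set_a))  # 9 - 1 = 8
print(value_range(data_set_b))  # 9 - 1 = 8, despite a very different spread
```

Only the two extreme values enter the calculation, which is why data sets A and B give the same result.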
  3. The interquartile range is a measure of variability based on dividing the data set into quartiles.
     • Quartiles divide an ordered data set into four equal parts.
     • The values that separate the parts are called the first, second and third quartiles.
     • The first, second and third quartiles are denoted by Q1, Q2 and Q3 respectively.
     To compute the interquartile range:
     • Arrange the data set in numerical order.
     • Define the quartiles: the second quartile Q2 is the median of the entire data set.
     • Q1 is the median of the data below Q2.
     • Q3 is the median of the data above Q2.
     • The interquartile range is IQR = Q3 – Q1.
  4. Ordered data set: 0 1 2 3 4 5 6 7 8 9
     The median is Q2: Q2 = (4 + 5)/2 = 4.5
     Q1 = 2 (median of 0 1 2 3 4)
     Q3 = 7 (median of 5 6 7 8 9)
     Interquartile range = Q3 – Q1 = 7 – 2 = 5
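The median-of-halves procedure from the slides can be sketched in Python as follows. The helper name `quartiles` is hypothetical, and excluding the median from both halves when the number of values is odd is an assumption; the slides only show an even-sized example.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: for an odd number of values, the middle value is
    left out of both the lower and the upper half.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # data below Q2
    upper = ordered[(n + 1) // 2 :]  # data above Q2
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q2, q3)  # 2 4.5 7
print(q3 - q1)     # IQR = 5
```

Other quartile conventions exist (statistical packages often interpolate), so library functions may return slightly different Q1 and Q3 for the same data.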
  5. IQR ignores outliers!
     Ordered data set: 0 1 2 3 4 5 6 7 8 999
     The median is Q2: Q2 = (4 + 5)/2 = 4.5
     Q1 = 2, Q3 = 7
     Interquartile range = Q3 – Q1 = 7 – 2 = 5
     While the range is strongly influenced by outliers, the IQR is not.
     Variance is the average squared deviation from the mean:
     $\sigma^2 = \sum (X_i - \mu)^2 / N$
     • $\sigma^2$ = variance
     • $\sum$ = summation symbol
     • $X_i$ = element i from the data set
     • $\mu$ = mean of the data set
     • $N$ = number of elements in the data set
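To illustrate the outlier comparison on this slide, the sketch below computes the range and the IQR for the data set with and without the value 999. The helper names are illustrative, and the IQR uses the same median-of-halves convention as the slides.

```python
from statistics import median

def value_range(values):
    return max(values) - min(values)

def iqr(values):
    """IQR via median of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return median(upper) - median(lower)

without_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 999]

print(value_range(without_outlier), iqr(without_outlier))  # 9 5
print(value_range(with_outlier), iqr(with_outlier))        # 999 5
```

Replacing 9 with 999 changes the range from 9 to 999 while the IQR stays at 5.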
  6. Find the variance of the following: 0, 1, 5, 6
     • Number of entries: $N = 4$
     • Mean: $\mu = \sum X / N$
     • Deviation sum of squares: $SS = \sum (X - \mu)^2$
     • Variance: $\sigma^2 = \sum (X - \mu)^2 / N$
  7. Find the variance of the following: 0, 1, 5, 6
     • Mean: $\mu = \sum X / N = (0 + 1 + 5 + 6)/4 = 12/4 = 3$
     • Deviation sum of squares: $SS = \sum (X - \mu)^2 = (0-3)^2 + (1-3)^2 + (5-3)^2 + (6-3)^2 = 9 + 4 + 4 + 9 = 26$
     • Variance: $\sigma^2 = \sum (X_i - \mu)^2 / N = 26/4 = 6.5$
     The standard deviation is the square root of the variance:
     $\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
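The worked example above can be reproduced step by step in Python. This is a minimal sketch using the population formula from the slides (dividing by N); the variable names are illustrative.

```python
from math import sqrt

data = [0, 1, 5, 6]
n = len(data)

mean = sum(data) / n                    # 12 / 4 = 3.0
ss = sum((x - mean) ** 2 for x in data) # 9 + 4 + 4 + 9 = 26.0
variance = ss / n                       # 26 / 4 = 6.5
std_dev = sqrt(variance)                # square root of 6.5, about 2.55

print(mean, ss, variance, round(std_dev, 2))
```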
  8. What happens to variability when you add a constant to each value in the data set?
     • All measures of variability (range, interquartile range, variance, and standard deviation) stay the same.
     The variance and standard deviation are the most common and useful measures of variability.
     • These two measures provide information about how the data vary about the mean.
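The claim about adding a constant can be checked numerically. The sketch below uses Python's standard `statistics` module; the data set and the constant 10 are arbitrary choices for illustration. Note that `statistics.quantiles` interpolates between values, so its quartiles follow a different convention from the median-of-halves method in the slides, but the shift invariance holds either way.

```python
from math import isclose
from statistics import pstdev, pvariance, quantiles

def spread_measures(values):
    """Return (range, IQR, population variance, population std deviation)."""
    q1, _, q3 = quantiles(values, n=4)  # interpolated quartiles
    return (max(values) - min(values), q3 - q1,
            pvariance(values), pstdev(values))

original = [0, 1, 5, 6]
shifted = [x + 10 for x in original]  # add a constant to every value

before = spread_measures(original)
after = spread_measures(shifted)

# Every measure of spread is unchanged by the shift.
print(all(isclose(a, b) for a, b in zip(before, after)))  # True
```

Adding a constant moves the mean by the same amount, so every deviation from the mean is unchanged.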
  9. When the data are clustered about the mean, the variance and standard deviation will be relatively small. When the data are widely scattered about the mean, the variance and standard deviation will be relatively large.
  10. The sample variance is an approximate average of the squared deviations of the data values from the sample mean (the sum of squared deviations is divided by n – 1 rather than n).
     • The sample variance is denoted by $s^2$ and is computed from the following formula: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
     • What is the variance for the following sample values? 3 8 6 14 0 11
     • NOTE: Do not let the formula intimidate you. We will build a table to help with the computations.
  11. We will build a table to help in the computations.
     NOTE: The mean = 7, and the sum of the squared deviations from the mean is 132.
     $s^2 = 132/(6 - 1) = 132/5 = 26.4$
     • In the previous example, observe that the variance is large relative to the size of the data values.
     • This can be observed from the plot, which shows that the data values are very much spread out about the mean value of 7.
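The computation table mentioned on this slide can be rebuilt with a short Python sketch. The column layout is an assumption, since the original table is not reproduced in the transcript; the numbers follow directly from the sample values.

```python
from math import sqrt

sample = [3, 8, 6, 14, 0, 11]
n = len(sample)
mean = sum(sample) / n  # 42 / 6 = 7.0

# One row per value: x, deviation from the mean, squared deviation.
print(f"{'x':>4} {'x - mean':>9} {'(x - mean)^2':>13}")
for x in sample:
    dev = x - mean
    print(f"{x:>4} {dev:>9.0f} {dev ** 2:>13.0f}")

ss = sum((x - mean) ** 2 for x in sample)  # 16 + 1 + 1 + 49 + 49 + 16 = 132
s2 = ss / (n - 1)                          # 132 / 5 = 26.4
s = sqrt(s2)                               # about 5.14

print(f"sum of squares = {ss:.0f}, s^2 = {s2:.1f}, s = {s:.2f}")
```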
  12. The sample standard deviation is the positive square root of the sample variance.
     • NOTE: The standard deviation has the same unit as the variable.
     • Example: The sample standard deviation for the previous example is $s = \sqrt{26.4} \approx 5.14$.
     • If all of the observations have the same value, the sample variance (and standard deviation) will be zero. That is, there is no variability in the data set.
     • The variance (and standard deviation) is influenced by outliers in the data set.
     • The unit for the standard deviation is the same as that for the raw data.
     • Thus the standard deviation is preferred to the variance as the measure of variability.
  13. The population variance is the average of the squared deviations of the data values from the population mean.
     • The population variance is denoted by $\sigma^2$ and is computed from the following formula: $\sigma^2 = \sum (X_i - \mu)^2 / N$
     • The population standard deviation is the positive square root of the population variance.
     • The population standard deviation is denoted by $\sigma$ and is computed from the following formula: $\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
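The difference between the sample and population formulas is only the divisor. As a hedged illustration, Python's standard `statistics` module provides both: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n – 1. Reusing the earlier sample values here is a choice made for comparison, not something the slides do.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [3, 8, 6, 14, 0, 11]  # sum of squared deviations = 132

pop_var = pvariance(values)  # 132 / 6 = 22 (divides by N)
samp_var = variance(values)  # 132 / 5 = 26.4 (divides by n - 1)

print(pop_var, samp_var)
print(round(pstdev(values), 2), round(stdev(values), 2))
```

Treating the same six values as a whole population gives a smaller variance than treating them as a sample.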
  14. The coefficient of variation (CV) allows us to compare the variation of two (or more) different variables.
     • Sample coefficient of variation: the sample standard deviation divided by the sample mean of the data set.
     • Usually, the result is expressed as a percentage.
     • NOTE: The sample coefficient of variation standardizes the variation by dividing it by the sample mean.
  15. The coefficient of variation has no units, since the standard deviation and the mean have the same units and thus cancel each other out.
     • Because of this property, we can use this measure to compare the variations of different variables with different units.
     Example: The mean number of tourists arriving at a monument over a four-month period was 90, and the standard deviation was 5. The average expenditure made at the site was Rs. 5,400, and the standard deviation was Rs. 775. Compare the variations of the two variables.
  16. Since the CV is larger for the expenditure (775/5,400 ≈ 14.4%) than for the number of tourists (5/90 ≈ 5.6%), there is more variability in the recorded expenditure than in the number of tourists arriving.
     • Population coefficient of variation: the population standard deviation divided by the population mean of the data set.
     • NOTE: The population CV has the same properties as the sample CV.
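The tourist example can be worked through in a few lines of Python. The function name `coefficient_of_variation` is illustrative; the inputs are the means and standard deviations stated in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation divided by mean, times 100."""
    return std_dev / mean * 100

cv_tourists = coefficient_of_variation(5, 90)         # about 5.6 %
cv_expenditure = coefficient_of_variation(775, 5400)  # about 14.4 %

print(f"Tourists:    {cv_tourists:.1f}%")
print(f"Expenditure: {cv_expenditure:.1f}%")
print("More variable:",
      "expenditure" if cv_expenditure > cv_tourists else "tourists")
```

Because the units cancel, a count of people and an amount in rupees can be compared directly.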
  17. Summary: different measures of dispersion
     • Range
     • Interquartile range
     • Variance
     • Standard deviation
     • Concept of the coefficient of variation
