2. Introduction
• An average, or measure of central tendency, tells us where the center of the data
lies but not how the items of the set are distributed around that center.
• Two sets may have the same average, yet the items in one may scatter widely around
it while the items in the other all lie close to it.
• Example: Consider the minimum temperatures recorded during one winter week in two
cities, A and B.
City A: 10°, 12°, 8°, 9°, 6°, 4°, 8°
City B: 0°, 12°, 8°, 14°, 11°, 4°, 8°
The average of both data sets is 8.14°, i.e. the average minimum temperature
during the week is the same in both cities. In city B, however, the
values lie farther from the average of 8.14° and are more scattered around it.
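The comparison of the two cities can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption, and the range and standard deviation it reports are the measures defined later in these notes.

```python
from statistics import mean, pstdev

# Minimum temperatures (degrees) for the week, as given in the example.
city_a = [10, 12, 8, 9, 6, 4, 8]
city_b = [0, 12, 8, 14, 11, 4, 8]

def summarize(temps):
    """Return (mean, range, population standard deviation) of the readings."""
    return mean(temps), max(temps) - min(temps), pstdev(temps)

mean_a, range_a, sd_a = summarize(city_a)
mean_b, range_b, sd_b = summarize(city_b)

# Both means are 8.14, but city B has the larger range and standard deviation.
print(f"City A: mean={mean_a:.2f}, range={range_a}, sd={sd_a:.2f}")
print(f"City B: mean={mean_b:.2f}, range={range_b}, sd={sd_b:.2f}")
```

The two means agree while both spread measures are larger for city B, which is the point the example makes in words.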
3. What Is Dispersion?
• Thus the average alone does not give a full picture of a set of
observations. A further description of the degree of scatter is necessary
to describe the data adequately.
• The meaning of dispersion is “scatteredness.”
• The degree to which numerical data tend to spread around an average value is
called the variation or dispersion of the data.
[Figure: two panels, “No Variability in Cash Flow” and “Variability in Cash Flow”, each with its mean marked]
4. Same measures of central tendency but different spread (variability)
6. • Measures of dispersion (or variability), coupled with measures of
central tendency, give a fairly good (but not complete) idea of the
nature of a distribution.
• For a complete picture of the distribution, we also need measures of
skewness and kurtosis.
• Dispersion is the spread or scatter of values around a measure of central
tendency.
• A measure of dispersion states the extent to which individual
observations (or items) vary from their average.
• Only the amount of variation, not its direction, is taken into account.
• It is measured as an average deviation about a central value.
7. Need to study dispersion
• The value of dispersion indicates how reliable a measure of central tendency is.
• Usually, a small value of dispersion indicates that the measure of central tendency
is more reliable, and vice versa.
• Many powerful analytical tools in statistics, such as correlation analysis,
hypothesis testing, analysis of variance, statistical quality control and
regression analysis, are based on measures of variation of one kind or another.
• The degree of spread also helps in analyzing the importance of different
components of a system. For a financial analyst, for example, it is important to
know the dispersion of a firm's earnings: if the earnings are highly dispersed, i.e.
varying from extremely low to very high, this indicates a higher risk to the
creditor or stockholder. Similarly, for a quality control expert, a drug that is
average in purity but ranges from very pure to highly impure may endanger
lives.
• Dispersion is also used to compare uniformity of different data like income,
temperature, rainfall, weight, height, etc.
8. • It is useful to determine the nature and cause of the variation in order
to control the variation itself
• Health – variations in body temperature , pulse beat and blood
pressures are basic guides to diagnosis.
• A greater amount of dispersion means a lack of uniformity or
consistency in the data. In such a case no average will reliably
represent the series.
• It helps us to determine whether the central tendency truly represents the series.
9. Importance of Dispersion
• Conclusions drawn from central tendency alone carry little meaning without
knowing the variation of the items of the series from the average.
• Inequalities in distribution of wealth and income can be measured by
dispersion
• Dispersion is used to compare and measure the concentration of
economic power and monopoly in a country
11. Classification of Measures of Dispersion
A measure of dispersion is always a non-negative real number. If all
observations are identical to the central value, the dispersion is zero;
as the deviations of the observations from the central value increase, the dispersion also
increases, but it never becomes negative.
There are two types of measures of dispersion:
1. Absolute measures of dispersion: these are
expressed in the same units as the data.
2. Relative measures of dispersion: these are useful
for comparing two sets of data that have different units of measurement.
Relative measures of dispersion are pure, unitless numbers and are generally
called coefficients of dispersion.
12. Absolute vs Relative Measures of Dispersion
• Absolute measures are expressed in the units of measurement, while a relative measure is
a ratio and is independent of the units of measurement
• Absolute measures cannot compare variability of 2 distributions
expressed in different units
• Comparison of distributions with regard to their variability from
central value is done by relative measures of dispersion
13. Methods of Measuring Dispersion
The following are some of the important and widely used methods of
measuring dispersion:
1. Range
2. Interquartile range and quartile deviation
3. Mean deviation or average deviation
4. Variance
5. Standard deviation
14. Properties of a Good Measure of Dispersion
1. Like a good measure of central tendency, a good measure of
dispersion should have the following characteristics.
2. It should be clearly defined, leaving
no scope for subjectivity in its computation or
interpretation.
3. It should be easy to compute, understand and interpret.
Further, all individual observations should be used in computing it,
and it should be free from bias and not unduly affected by
extreme values.
4. Since dispersion is also used to estimate many more complex statistical
properties of data, the measure should lend itself to
algebraic operations.
5. Finally, the measure should be least affected by
sampling fluctuations, i.e. have a high degree of sampling stability.
15. Range:
For raw data, the range is defined as the difference between the largest and
the smallest values in a distribution.
Symbolically, R = L − S
where L is the largest observation, S the smallest observation, and R the
range.
The range is an absolute measure of dispersion. The corresponding relative measure
is called the coefficient of range and is calculated by
the following formula:
$\text{Coefficient of Range} = \frac{L - S}{L + S}$
Thus, in the coefficient of range, the range L − S is standardized by L + S.
16. Range:
For grouped data, the range is estimated as the
difference between the upper limit of the highest class interval and the lower limit of the
lowest class interval.
Symbolically, R = ULI − LFI
where ULI is the upper limit of the last class interval, LFI is the lower limit of the
first class interval, and R is the range.
To make it free of units, the relative measure of range is defined as
$\text{Coefficient of Range} = \frac{ULI - LFI}{ULI + LFI}$
17. Illustration 1
The following are the wages of 8 workers of a factory. Find the range and the
coefficient of range.
Wages (in dollars): 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440.
18. Illustration 1 Solution
• Here the largest value is L = 1575 and the smallest value is S = 1380.
Range = L − S = 1575 − 1380 = 195
$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{1575 - 1380}{1575 + 1380} = \frac{195}{2955} \approx 0.066$
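The arithmetic of Illustration 1 can be reproduced in a few lines. This is a minimal sketch in plain Python; the function name `range_and_coefficient` is an illustrative assumption, not part of the original slides.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for raw, ungrouped data."""
    largest, smallest = max(values), min(values)
    value_range = largest - smallest                  # R = L - S
    coefficient = value_range / (largest + smallest)  # (L - S) / (L + S)
    return value_range, coefficient

# Wages (in dollars) of the 8 workers in Illustration 1.
wages = [1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440]

wage_range, wage_coefficient = range_and_coefficient(wages)
print(wage_range, round(wage_coefficient, 3))  # prints: 195 0.066
```

Since 195 / 2955 is about 0.066, the coefficient of range here is roughly 6.6% of the sum of the extremes.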
19. Illustration 2
Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25
marks, and Set B contains the marks of the same students in English out of 100 marks.
Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
Calculate the range and the coefficient of range for each set.
20. Illustration 2 Solution
                        Range            Coefficient of Range
Set A (Mathematics)     20 − 10 = 10     (20 − 10)/(20 + 10) = 0.33
Set B (English)         50 − 30 = 20     (50 − 30)/(50 + 30) = 0.25
21. Illustration 2 Solution
• In set A the range is 10 and in set B the range is 20. At first sight it
seems that there is greater dispersion in set B, but this is not true.
The range of 20 in set B is for large observations and the range of 10
in set A is for small observations, so 20 and 10 cannot be compared
directly: their base is not the same. Marks in Mathematics are out of
25 and marks in English are out of 100. When we convert the two ranges into
coefficients of range, we see that the coefficient of range for set A (0.33) is
greater than that for set B (0.25). Thus there is greater dispersion or variation
in set A. The marks of the students in English are more stable than their
marks in Mathematics.
22. Illustration 3
Find the range and coefficient of range of the weights of the
students of a university.
Weights (Kg)          60−62   63−65   66−68   69−71   72−74
Number of Students    55      18      42      27      8
23. Solution – Method I
Weights (Kg)   Class Boundaries   Mid Value   No. of Students
60−62          59.5−62.5          61          55
63−65          62.5−65.5          64          18
66−68          65.5−68.5          67          42
69−71          68.5−71.5          70          27
72−74          71.5−74.5          73          8
Here the upper class boundary of the last class = ULI = 74.5
and the lower class boundary of the first class = LFI = 59.5
Range = ULI − LFI = 74.5 − 59.5 = 15 kilograms
$\text{Coefficient of Range} = \frac{ULI - LFI}{ULI + LFI} = \frac{74.5 - 59.5}{74.5 + 59.5} = \frac{15}{134} \approx 0.1119$
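The step from class limits to class boundaries, and then to the range, can be sketched as follows. This is an illustrative sketch only; the variable names are assumptions, and it assumes equally spaced classes so that every boundary lies half the gap beyond the stated limit.

```python
# Class limits (kg) from Illustration 3.
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

# Gap between one class's upper limit and the next class's lower limit (here 1).
gap = class_limits[1][0] - class_limits[0][1]
half_gap = gap / 2

# Class boundaries: widen every class by half the gap on both sides.
boundaries = [(low - half_gap, high + half_gap) for low, high in class_limits]

lfi = boundaries[0][0]    # lower boundary of the first class: 59.5
uli = boundaries[-1][1]   # upper boundary of the last class: 74.5

grouped_range = uli - lfi
coefficient = grouped_range / (uli + lfi)
print(grouped_range, round(coefficient, 4))  # prints: 15.0 0.1119
```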
24. Illustration 4
The following distribution gives the number of persons per house and the number
of houses. Calculate the range and coefficient of range.
Number of Persons   1    2    3    4    5    6    7    8    9    10
Number of Houses    26   113  120  95   60   42   21   14   55   44
25. Illustration 4 Solution
Here the largest value is L = 10 and the smallest value is S = 1.
Range = L − S = 10 − 1 = 9
$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{10 - 1}{10 + 1} = \frac{9}{11} \approx 0.818$
26. Application of range
(i) Quality Control:
In quality control of manufactured products, range is used to study the variation in the quality of the
units manufactured. Even with the most modern mechanical equipment there may be a small,
almost insignificant, difference in the different units of a commodity manufactured. Thus, if a
company is manufacturing bottles of a particular type, there may be a slight variation in the size or
shape of the bottles manufactured. In such cases a range is usually determined, and all the units
which fall within these limits are accepted while those which fall outside the limits are rejected.
(ii) Variation in Money Rates, Share values, Exchange Rates and Gold prices, etc:
Variations in money rates, share values, gold prices and exchange rates are commonly studied
through range because the fluctuations in them are not very large. In fact range as a measure of
dispersion should be generally used only when variations in the value of the variable are not much.
(iii) Weather forecasting:
Range gives an idea of the variation between maximum and minimum levels of temperature. From
day to day the range would not vary much and it is helpful in studying the vagaries of nature if
variations suddenly rise or fall.
27. Interquartile range and Quartile deviation
Interquartile range is another positional, absolute measure of dispersion.
It tries to reduce the weakness of the range by avoiding the
extreme values and instead using the difference between the first quartile Q1 and the third quartile Q3
as the measure of dispersion.
This measure ignores fifty per cent of the observations (the first 25 per cent and the last 25 per cent).
Interquartile range (IQR) = Q3 − Q1
Half the distance between Q1 and Q3 is called the semi-interquartile range or quartile
deviation (QD). Thus
$QD = \frac{Q_3 - Q_1}{2}$
NOTE: For a symmetrical distribution, the span Median ± QD contains 50% of the
data. For an approximately normal distribution, QD also provides a short-cut way to estimate the standard deviation through the approximate relation
6 Q.D. = 5 M.D. = 4 S.D.
28. Interquartile range and Quartile deviation: Important Remarks
1. The interquartile range describes an interval, so you should always
report both numbers, Q1 and Q3, not just the difference between them.
You can then explain it by saying that half the sample readings were
between these two values, a quarter were smaller than the lower
quartile, and a quarter higher than the upper quartile.
2. The median is not necessarily midway between Q1 and Q3; in a
symmetrical distribution, however, it lies in the middle of the two quartiles.
3. The median and quartiles divide the data into equal numbers of values
but do not necessarily divide the data into equally wide intervals.
29. Coefficient of Quartile Deviation
A relative measure of dispersion based on the quartile deviation is
called the coefficient of quartile deviation. It is defined as
$\text{Coefficient of Quartile Deviation} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$
It is a pure number, free of any units of measurement. It can be used
for comparing the dispersion in two or more sets of data.
30. Illustration 5
The following are the responses of 55 students to a question about how much
money (in Rs.) they spend every day. Calculate the range and interquartile range and interpret
your result.
55 60 80 80 80 85 85 85 90 90 90
90 92 94 95 95 95 95 100 100 100 100
100 100 105 105 105 105 109 110 110 110 110
112 115 115 115 115 115 120 120 120 120 120
124 125 125 125 130 130 140 140 140 145 150
31. Solution
Range = Largest observation − Smallest observation = 150 − 55 = 95
Q1 = value of the $\frac{n+1}{4}$th item = value of the $\frac{55+1}{4}$th item = value of the 14th item = 94
Q3 = value of the $\frac{3(n+1)}{4}$th item = value of the $\frac{3(55+1)}{4}$th item = value of the 42nd item = 120
Interquartile range (IQR) = Q3 − Q1 = 120 − 94 = 26
Interpretation
A. Among all the students there is a variability of Rs.95 in their daily spending.
B. 25% of the values lie below Q1 = 94. The variation from the smallest value to Q1 is 94 − 55 = Rs.39. Thus the lower
25% of students have a variability of Rs.39 in their daily spending.
C. 25% of the values lie above Q3 = 120. The variation from Q3 to the largest value is 150 − 120 = Rs.30. Thus the upper
25% of students have a variability of Rs.30 in their daily spending.
D. 50% of the values lie between Q1 and Q3. The interquartile range is Rs.26. Thus the middle 50% of students
have a variability of Rs.26 in their daily spending.
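The positional rule used in this solution can be traced in code. The sketch below is illustrative only: it relies on the fact that with n = 55 the positions (n + 1)/4 and 3(n + 1)/4 are whole numbers, so no interpolation is needed, and the dictionary labels are assumptions made for readability.

```python
# Daily spending (Rs.) of the 55 students in Illustration 5.
spending = [
    55, 60, 80, 80, 80, 85, 85, 85, 90, 90, 90,
    90, 92, 94, 95, 95, 95, 95, 100, 100, 100, 100,
    100, 100, 105, 105, 105, 105, 109, 110, 110, 110, 110,
    112, 115, 115, 115, 115, 115, 120, 120, 120, 120, 120,
    124, 125, 125, 125, 130, 130, 140, 140, 140, 145, 150,
]

data = sorted(spending)
n = len(data)

# This sketch only handles the case where (n + 1) / 4 is a whole number.
assert (n + 1) % 4 == 0, "positions are not whole numbers; interpolation needed"
q1_position = (n + 1) // 4        # 14th item (1-based)
q3_position = 3 * (n + 1) // 4    # 42nd item (1-based)

q1 = data[q1_position - 1]
q3 = data[q3_position - 1]

spreads = {
    "all students (range)": data[-1] - data[0],
    "lower 25%": q1 - data[0],
    "middle 50% (IQR)": q3 - q1,
    "upper 25%": data[-1] - q3,
}
for label, spread in spreads.items():
    print(f"{label}: Rs.{spread}")
```

The four printed spreads are 95, 39, 26 and 30, matching interpretations A to D.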
39. Illustration 9
• The wheat production (in Kg) of 20 acres is given as: 1120, 1240,
1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960,
1880, 1755, 1720, 1600, 1470, 1750, and 1885. Find the quartile
deviation and coefficient of quartile deviation.
40. Solution 9
After arranging the observations in ascending order, we get 1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360,
1440, 1470, 1600, 1680, 1720, 1730, 1750, 1755, 1785, 1880, 1885, 1960.
Q1 = value of the $\frac{n+1}{4}$th item = value of the $\frac{20+1}{4}$th item = value of the (5.25)th item
= 5th item + 0.25 (6th item − 5th item)
= 1240 + 0.25 (1320 − 1240) = 1240 + 20 = 1260
Q3 = value of the $\frac{3(n+1)}{4}$th item = value of the $\frac{3(20+1)}{4}$th item = value of the (15.75)th item
= 15th item + 0.75 (16th item − 15th item)
= 1750 + 0.75 (1755 − 1750) = 1753.75
$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{1753.75 - 1260}{2} = \frac{493.75}{2} = 246.875$
$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{1753.75 - 1260}{1753.75 + 1260} = \frac{493.75}{3013.75} \approx 0.164$
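When the quartile position is fractional, as it is here, the interpolation step can be written as a small function. This is a minimal sketch of the (n + 1)/4 positional rule used in the solution; the function name `quartile` and its clamping of out-of-range positions are illustrative assumptions.

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k(n + 1)/4,
    using linear interpolation between neighbouring items."""
    n = len(sorted_values)
    position = k * (n + 1) / 4
    whole = int(position)           # e.g. 5 for position 5.25
    fraction = position - whole     # e.g. 0.25
    if whole < 1:                   # position before the first item
        return sorted_values[0]
    if whole >= n:                  # position at or beyond the last item
        return sorted_values[-1]
    below = sorted_values[whole - 1]
    above = sorted_values[whole]
    return below + fraction * (above - below)

# Wheat production (kg) of the 20 acres in Illustration 9.
production = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
              1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
ordered = sorted(production)

q1 = quartile(ordered, 1)
q3 = quartile(ordered, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, quartile_deviation, round(coefficient_qd, 3))
# prints: 1260.0 1753.75 246.875 0.164
```

Other quartile conventions exist and can give slightly different values on small samples, so results from statistical software may not match this positional rule exactly.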
41. Illustration 10
Calculate the quartile deviation and coefficient of quartile deviation
from the data given below:
Maximum Load (short-tons)   Number of Cables
9.3−9.7                     2
9.8−10.2                    5
10.3−10.7                   12
10.8−11.2                   17
11.3−11.7                   14
11.8−12.2                   6
12.3−12.7                   3
12.8−13.2                   1
43. Mean deviation or average deviation
The mean deviation, or average deviation, is defined as the mean of the absolute
deviations of the observations from some suitable average, which may be the arithmetic
mean, the median or the mode. The difference (X − average) is called a deviation;
when we ignore its sign, the deviation is written |X − average| and is read
as the mod (absolute) deviation. The mean of these absolute deviations is called the mean
deviation or the mean absolute deviation. Thus for sample data of N observations in which the suitable
average is the arithmetic mean $\bar{X}$, the mean deviation M.D. is given by the relation:
$M.D. = \frac{\sum |X - \bar{X}|}{N}$
46. Illustration 11
Calculate the mean deviation from (1) the arithmetic mean, (2) the median and (3) the mode for
the marks obtained by nine students given below, and show that the mean deviation from the
median is the minimum.
Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7
48. Solution 11 (Contd.)
From the above calculations, it is clear that the
mean deviation from the median has the least
value.
Further, it may be interpreted that the average
absolute discrepancy of the marks from their
median is 2.33.
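The three mean deviations asked for in Illustration 11 can be recomputed directly from the marks. This is a minimal sketch using the standard `statistics` module; the helper name `mean_deviation` is an illustrative assumption.

```python
from statistics import mean, median, mode

# Marks (out of 25) of the nine students in Illustration 11.
marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]

def mean_deviation(values, center):
    """Mean of the absolute deviations of the values from the given center."""
    return sum(abs(x - center) for x in values) / len(values)

centers = {"mean": mean(marks), "median": median(marks), "mode": mode(marks)}
deviations = {name: mean_deviation(marks, c) for name, c in centers.items()}

for name, c in centers.items():
    print(f"M.D. about the {name} ({c:.2f}) = {deviations[name]:.2f}")
# M.D. about the mean (8.89) = 2.35
# M.D. about the median (9.00) = 2.33
# M.D. about the mode (7.00) = 2.56
```

The value about the median is the smallest of the three, in line with the conclusion drawn on the slide.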
50. Solution 12
It may be interpreted that the average
absolute discrepancy in the size of the items
from their mean is 0.915.
51. Illustration 13
The following data represent two income groups of five and seven
workers working in two different branches of a firm. Determine the
average absolute discrepancy in each group. In which group is this
discrepancy smaller?
Branch I         Branch II
Income (Rs.)     Income (Rs.)
4000             3000
4200             4000
4400             4200
4600             4400
4800             4600
                 4800
                 5800
52. Illustration 13 (Cont.)
Branch I (Median = 4400)            Branch II (Median = 4400)
Income (Rs.)   |X − Median|         Income (Rs.)   |X − Median|
4000           400                  3000           1400
4200           200                  4000           400
4400           0                    4200           200
4600           200                  4400           0
4800           400                  4600           200
                                    4800           400
                                    5800           1400
N = 5          ∑|X − Median| = 1200 N = 7          ∑|X − Median| = 4000
Branch I
$MAD = \frac{\sum |X - \text{Median}|}{N} = \frac{1200}{5} = 240$
$\text{Coefficient of MAD} = \frac{MAD}{\text{Median}} = \frac{240}{4400} \approx 0.055$
Branch II
$MAD = \frac{\sum |X - \text{Median}|}{N} = \frac{4000}{7} \approx 571.43$
$\text{Coefficient of MAD} = \frac{MAD}{\text{Median}} = \frac{571.43}{4400} \approx 0.13$
1. The average absolute discrepancy from the median for branch I is Rs.240.
2. The average absolute discrepancy from the median for branch II is Rs.571.43.
3. To compare the two branches we use a relative measure of dispersion, i.e. the
coefficient of mean absolute deviation (about the median), and observe that it is smaller (0.055) in branch I than (0.13) in
branch II.
Thus one may interpret that there is more uniformity of income in branch I than in branch II.
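The branch comparison can be reproduced with a short script. This is a sketch under the same definitions as the solution (mean absolute deviation about the median, divided by the median for the relative measure); the function name `mad_about_median` is an illustrative assumption.

```python
from statistics import median

# Incomes (Rs.) of the workers in the two branches of Illustration 13.
branch_1 = [4000, 4200, 4400, 4600, 4800]
branch_2 = [3000, 4000, 4200, 4400, 4600, 4800, 5800]

def mad_about_median(incomes):
    """Return (median, mean absolute deviation about it, coefficient of MAD)."""
    center = median(incomes)
    mad = sum(abs(x - center) for x in incomes) / len(incomes)
    return center, mad, mad / center

median_1, mad_1, coefficient_1 = mad_about_median(branch_1)
median_2, mad_2, coefficient_2 = mad_about_median(branch_2)

print(f"Branch I : MAD = {mad_1:.2f}, coefficient = {coefficient_1:.3f}")
print(f"Branch II: MAD = {mad_2:.2f}, coefficient = {coefficient_2:.3f}")
```

Because both medians equal 4400 here, the absolute and relative measures rank the branches the same way; the coefficient matters more when the central values differ.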
55. Illustration 15: Calculate the mean deviation from the
mean for the following weight distribution.
Use the short-cut method, with assumed mean = 12.5 and
step deviation = 5.
58. Variance
• Absolute deviations from the A.M. are used so that negative signs are ignored.
• Another way to remove the negative signs is to square the deviations.
• The population variance is the average of the squared deviations of each observation from the
arithmetic mean, taken over the set of all observations.
• Population variance is denoted by σ² (sigma squared):
$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
where $\bar{X}$ is the population mean and N is the size of the population.
59. Standard Deviation
The variance is difficult to interpret because it is expressed in squared
units: the deviations from the mean are squared, so its magnitude is not
on the scale of the original data.
This problem can be solved by working with the square root of the
variance, which is called the standard deviation.
$\text{Population standard deviation } \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$
62. Illustration 16
The wholesale prices of a commodity for seven consecutive days in a
month are as follows:
Day                       1    2    3    4    5    6    7
Commodity price/quintal   240  260  270  245  255  286  264
Calculate the variance and standard deviation.
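The variance and standard deviation of these seven prices can be worked through step by step. The sketch below assumes the population formula from the Variance slide (division by N); a sample formula dividing by N − 1 would give a different answer.

```python
from math import sqrt

# Wholesale price per quintal on the seven days of Illustration 16.
prices = [240, 260, 270, 245, 255, 286, 264]
n = len(prices)

mean_price = sum(prices) / n                              # 1820 / 7 = 260
squared_deviations = [(p - mean_price) ** 2 for p in prices]

variance = sum(squared_deviations) / n                    # 1442 / 7 = 206
std_dev = sqrt(variance)

print(mean_price, variance, round(std_dev, 2))  # prints: 260.0 206.0 14.35
```

The standard deviation of about 14.35 is in the same units as the prices, which is what makes it easier to interpret than the variance of 206.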
64. Illustration 17
• Find the mean deviation from the mean and the standard deviation for the
following sample observations on the weight (in grams) of a certain product:
• 19, 22, 20, 21, 18, 23, 21, 22, 20, 21, 21, 22, 21, 18, 21, 26
70. Standard Deviation by Short Cut
Method Formula
$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2}$
where $d_i$ = (mid point − A)/h, A = assumed mean and
h = width of the class interval
71. Illustration 19: Short Cut Method Formula
A study of 100 engineering companies gives the following information:
Profit (Rs. in crore)   0-10   10-20   20-30   30-40   40-50   50-60
No. of companies        8      12      20      30      20      10
Calculate the standard deviation of the profit earned by the short-cut
method, taking the assumed mean A = 35.
Standard deviation by the short-cut method:
$\sigma = h \times \sqrt{\frac{\sum f_i d_i^2}{\sum f_i} - \left(\frac{\sum f_i d_i}{\sum f_i}\right)^2}$
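The short-cut computation for this frequency table can be sketched as follows. The code assumes the class width h = 10 read off the table and the assumed mean A = 35 stated in the problem; the variable names are illustrative, and the direct calculation at the end is included only as a cross-check.

```python
from math import sqrt

# Profit classes (Rs. in crore) and number of companies from the table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
frequencies = [8, 12, 20, 30, 20, 10]

assumed_mean = 35   # A, as given in the problem
width = 10          # h, the class width read off the table

midpoints = [(low + high) / 2 for low, high in classes]
step_deviations = [(m - assumed_mean) / width for m in midpoints]  # d = (m - A) / h

n = sum(frequencies)
sum_fd = sum(f * d for f, d in zip(frequencies, step_deviations))        # -28
sum_fd2 = sum(f * d ** 2 for f, d in zip(frequencies, step_deviations))  # 200

# Short-cut formula: sigma = h * sqrt(sum(f d^2)/n - (sum(f d)/n)^2)
sigma = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Cross-check against the direct definition using the grouped mean.
grouped_mean = assumed_mean + width * sum_fd / n                         # 32.2
direct_sigma = sqrt(
    sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints)) / n
)

print(round(sigma, 2), round(direct_sigma, 2))  # prints: 13.86 13.86
```

Choosing A equal to a class midpoint keeps the step deviations as small whole numbers, which is the practical reason for the short-cut method in hand calculation.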
75. Coefficient of Variation
• Example: to compare the variability of the heights and weights of the
students in a class, which are measured in different units (ft, Kg)
• It is a relative measure of dispersion and has no units
• Usually expressed as a percentage
• Used to compare the variability of two or more distributions
• The distribution with the higher coefficient of variation is less stable, less
uniform, less consistent, less homogeneous and less equitable, but
more variable
77. Illustration 19
In a small business firm, two typists are employed, typist A and typist B.
Typist A types, on average, 30 pages per day with a standard
deviation of 6. Typist B types, on average, 45 pages per day with a
standard deviation of 10. Which typist shows greater consistency in his
output?
78. Illustration 19-Solution
Coefficient of variation of typist A: $CV_A = \frac{\sigma}{\bar{X}} \times 100 = \frac{6}{30} \times 100 = 20\%$
Coefficient of variation of typist B: $CV_B = \frac{\sigma}{\bar{X}} \times 100 = \frac{10}{45} \times 100 = 22.2\%$
Thus, although typist B types more pages, there is greater
variation in his output than in that of typist A. We can say
this in a different way: though typist A's daily output is much lower, he
is more consistent than typist B.
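The two coefficients of variation can be computed with a one-line helper. This is a minimal sketch; the function name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: 100 * sigma / mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(6, 30)    # typist A
cv_b = coefficient_of_variation(10, 45)   # typist B

print(f"CV of typist A = {cv_a:.1f}%")    # 20.0%
print(f"CV of typist B = {cv_b:.1f}%")    # 22.2%

more_consistent = "A" if cv_a < cv_b else "B"
print(f"Typist {more_consistent} is more consistent")
```

The lower coefficient identifies the more consistent typist even though the two typists have different average outputs.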
79. Illustration 20: Which firm has greater
variability, and what is the average wage?
90. Standard Deviation
• Rigidly defined
• Based on all observations
• Variance and SD will be zero when all values are equal
• SD is independent of a change of origin: if the same value is added to or
subtracted from all values, the variance and SD remain unchanged
• SD depends on a change of scale: if all values are multiplied by the same
quantity, the SD is multiplied by the absolute value of that quantity
• When a number of samples are drawn from the same population, the SD
varies less from sample to sample than other measures
of dispersion
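The origin and scale properties listed above are easy to verify numerically. The sketch below reuses the seven prices from Illustration 16 as sample data; the shift of 100 and the factor of 3 are arbitrary values chosen for illustration.

```python
from math import isclose
from statistics import pstdev

prices = [240, 260, 270, 245, 255, 286, 264]
base_sd = pstdev(prices)

shifted = [p + 100 for p in prices]   # change of origin: add a constant
scaled = [3 * p for p in prices]      # change of scale: multiply by a constant
constant = [260] * 7                  # all values equal

print(isclose(pstdev(shifted), base_sd))      # True: SD unchanged by a shift
print(isclose(pstdev(scaled), 3 * base_sd))   # True: SD scales by the factor
print(pstdev(constant) == 0)                  # True: no spread, SD is zero
```

Floating-point results are compared with `isclose` rather than `==` because the shifted and scaled values may differ from the expected result in the last decimal places.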