IQR ignores outliers!
0 1 2 3 4 5 6 7 8 999
Q2 = (4 + 5)/2 = 4.5
Q1 = 2, Q3 = 7
Interquartile range = Q3 – Q1
Interquartile range = 7 – 2 = 5
While the range is strongly influenced by
outliers, the IQR is not.
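The quartile example can be checked with a short Python sketch. It assumes the median-of-halves convention for quartiles (the one that gives Q1 = 2 and Q3 = 7 for this data); other conventions, including those built into some libraries, can give slightly different quartiles. The helper name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 999]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                        # unaffected by the value 999
data_range = max(data) - min(data)   # dominated by the value 999
print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, range={data_range}")
```

The outlier 999 drives the range but leaves the IQR unchanged.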
Variance is the average squared deviation
from the mean
$\sigma^2 = \sum (X_i - \mu)^2 / N$
$\sigma^2$ = variance
$\sum$ = summation symbol
$X_i$ = element i from the data set
$\mu$ = mean of the data set
N = number of elements in the data set
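Find the variance of
0, 1, 5, 6
Number of entries: N = 4
Mean: $\mu = \sum X / N = 12/4 = 3$
Sum of squared deviations:
SS = $\sum (X - \mu)^2$ = 9 + 4 + 4 + 9 = 26
Variance: $\sigma^2$ = SS / N = 26/4 = 6.5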
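The same steps (count, mean, sum of squared deviations, divide by N) can be sketched in Python. This treats 0, 1, 5, 6 as a whole population, as the divide-by-N definition implies; the variable names are illustrative only.

```python
from statistics import pvariance

data = [0, 1, 5, 6]
n = len(data)                             # number of entries
mean = sum(data) / n                      # mean of the data set
ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
variance = ss / n                         # average squared deviation
print(n, mean, ss, variance)

# The standard library's population variance should agree.
print(pvariance(data))
```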
When the data are clustered about the mean, the
variance and standard deviation will be somewhat small.
When the data are widely scattered about the
mean, the variance and standard deviation will
be somewhat large.
The sample variance is an approximate
average of the squared deviations of the
data values from the sample mean.
The sample variance is computed from the
following formula and is denoted by $s^2$:
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$
What is the sample variance of the following data?
3 8 6 14 0 11
NOTE: Do not let the formula intimidate you. We will build
a table to help with the computations.
NOTE: The mean = 7.
$s^2$ = 132/(6 – 1) = 26.4
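The table of deviations for this example can be rebuilt with a short Python sketch. The column layout is an assumption about what such a table would hold (value, deviation from the mean, squared deviation), not a copy of the original table.

```python
data = [3, 8, 6, 14, 0, 11]
n = len(data)
mean = sum(data) / n

# One row per value: x, deviation from the mean, squared deviation.
rows = [(x, x - mean, (x - mean) ** 2) for x in data]

print(f"{'x':>4} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, dev, sq in rows:
    print(f"{x:>4} {dev:>10.0f} {sq:>14.0f}")

ss = sum(sq for _, _, sq in rows)   # sum of squared deviations
s2 = ss / (n - 1)                   # sample variance divides by n - 1
print(f"sum of squares = {ss:.0f}, s^2 = {s2:.1f}")
```

The squared deviations add up to 132, the numerator used in the computation of the sample variance.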
In the previous example, observe that the
variance is large relative to the size of the data values.
This can be observed from the plot which
shows that the data values are very much
spread out about the mean value of 7.
The sample standard deviation is the
positive square root of the variance.
$s = \sqrt{s^2}$
NOTE: the standard deviation has the same
unit as the variable.
Example: The sample standard deviation for
the previous example is
$s = \sqrt{132/(6 - 1)} = \sqrt{26.4} \approx 5.14$
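As a quick check, the sample standard deviation for the data 3, 8, 6, 14, 0, 11 can be computed in Python. This sketch uses the standard library's `statistics.variance` and `statistics.stdev`, both of which divide by n - 1.

```python
import math
from statistics import stdev, variance

data = [3, 8, 6, 14, 0, 11]
s2 = variance(data)    # sample variance, divides by n - 1
s = math.sqrt(s2)      # positive square root of the variance
print(f"s^2 = {s2:.1f}, s = {s:.2f}")

# stdev() computes the same quantity directly.
print(f"stdev = {stdev(data):.2f}")
```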
If all of the observations have the same value, the
sample variance (standard deviation) will be zero.
That is, there is no variability in the data set.
The variance (standard deviation) is influenced by
outliers in the data set.
The unit for the standard deviation is the same as
that for the raw data.
Thus it is preferred to use the standard deviation
rather than the variance as the measure of variability.
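The first two properties can be illustrated with a small Python sketch. The data sets are illustrative assumptions: a constant list, the earlier list containing the outlier 999, and a comparison list in which 999 is replaced by 9.

```python
from statistics import stdev, variance

constant = [5, 5, 5, 5]                            # all observations equal
with_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 999]
without_outlier = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical: 999 replaced by 9

s_const = stdev(constant)          # no variability
s_with = stdev(with_outlier)       # inflated by the outlier
s_without = stdev(without_outlier)

print(f"constant data: variance = {variance(constant)}, sd = {s_const}")
print(f"sd without outlier = {s_without:.2f}, with outlier = {s_with:.2f}")
```

A single extreme value changes the standard deviation by roughly two orders of magnitude here.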
The population variance is the average of the
squared deviations of the data values from
the population mean.
The population variance is computed from
the following formula and is denoted by $\sigma^2$:
$\sigma^2 = \sum (X_i - \mu)^2 / N$
The population standard deviation is the
positive square root of the population
variance.
The population standard deviation is
computed from the following formula and
is denoted by $\sigma$:
$\sigma = \sqrt{\sum (X_i - \mu)^2 / N}$
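To see how the population and sample versions differ, the sketch below applies both to the same six values. Treating 3, 8, 6, 14, 0, 11 as a complete population is an assumption made only for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [3, 8, 6, 14, 0, 11]

sigma2 = pvariance(data)   # population variance: divides by N
s2 = variance(data)        # sample variance: divides by n - 1

print(f"population: variance = {sigma2}, sd = {pstdev(data):.3f}")
print(f"sample:     variance = {s2:.1f}, sd = {stdev(data):.3f}")
```

Because the sample version divides by n - 1 rather than N, it is always at least as large as the population version on the same values.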
The coefficient of variation (CV) allows us to
compare the variation of two (or more) different
variables.
Explanation of the term – sample coefficient of
variation: the sample coefficient of variation is
defined as the sample standard deviation divided
by the sample mean of the data set.
Usually, the result is expressed as a percentage:
$CV = (s / \bar{x}) \times 100\%$
NOTE: The sample coefficient of variation
standardizes the variation by dividing it by
the sample mean.
The coefficient of variation has no units
since the standard deviation and the mean
have the same units, and thus cancel out.
Because of this property, we can use this
measure to compare the variations for
different variables with different units.
The mean number of tourists arriving at a
monument over a four-month period was 90,
and the standard deviation was 5. The
average expenditure made at the site was
Rs.5,400, and the standard deviation was Rs.
775. Compare the variations of the two
variables.
CV (tourists) = 5/90 × 100% ≈ 5.6%
CV (expenditure) = 775/5400 × 100% ≈ 14.4%
Since the CV is larger for the expenditure, there is
more variability in the recorded expenditure than in the
number of tourists.
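The comparison can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is hypothetical; the inputs are the means and standard deviations stated in the example (90 and 5 for tourist arrivals, Rs. 5,400 and Rs. 775 for expenditure).

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a percentage: standard deviation divided by mean, times 100."""
    return std_dev / mean * 100

cv_tourists = coefficient_of_variation(5, 90)
cv_expenditure = coefficient_of_variation(775, 5400)

print(f"CV (tourists)    = {cv_tourists:.1f}%")
print(f"CV (expenditure) = {cv_expenditure:.1f}%")
```

Since both results are unit-free percentages, they can be compared directly even though one variable is a count and the other is in rupees.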
Explanation of the term – population
coefficient of variation: the population
coefficient of variation is defined as the
population standard deviation divided by the
population mean of the data set.
NOTE: The population CV has the same
properties as the sample CV.