CJ 301 – Measures of Dispersion/Variability
Think back to the description of measures of central tendency, which presented these statistics as measures of how the data in a distribution are clustered: around what summary measure are most of the data points clustered?
But when it comes to descriptive statistics and describing the characteristics of a distribution, averages are only half the story. The other half is measures of variability.
In the simplest terms, variability reflects how scores differ from one another. For example, the following set of scores shows some variability:
7, 6, 3, 3, 1
The following set of scores has the same mean (4) and has less variability than the previous set:
3, 4, 4, 5, 4
The next set has no variability at all – the scores do not differ from one another – but it also has the same mean as the other two sets we just showed you.
4, 4, 4, 4, 4
Variability (also called spread or dispersion) can be thought of as a measure of how different scores are from one another. It is even more accurate (and maybe even easier) to think of variability as how different scores are from one particular score. And what “score” do you think that might be? Well, instead of comparing each score to every other score in a distribution, the one score that could be used as a comparison is (that's right) the mean. So, variability becomes a measure of how much each score in a group of scores differs from the mean.
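To make this concrete, here is a minimal Python sketch (not part of the original notes) that takes the three example sets above, finds each mean, and lists how far every score sits from it. The helper name `deviations_from_mean` is a hypothetical choice for illustration.

```python
def deviations_from_mean(scores):
    """Return the mean of the scores and each score's distance from that mean."""
    mean = sum(scores) / len(scores)
    return mean, [x - mean for x in scores]


# The three example sets from the text; each has a mean of 4.
example_sets = {
    "some variability": [7, 6, 3, 3, 1],
    "less variability": [3, 4, 4, 5, 4],
    "no variability": [4, 4, 4, 4, 4],
}

results = {}
for label, scores in example_sets.items():
    mean, deviations = deviations_from_mean(scores)
    results[label] = deviations
    print(f"{label}: mean = {mean}, deviations = {deviations}")
```

All three sets report a mean of 4.0, but the deviations shrink as the sets become less variable, and in the last set every deviation is zero.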
Remember what you already know about computing averages: an average (whether it is the mean, the median or the mode) is a representative score in a set of scores. Now, add your new knowledge about variability: it reflects how different scores are from one another. Each is an important descriptive statistic. Together, these two (average and variability) can be used to describe the characteristics of a distribution and show how distributions differ from one another.
Measures of dispersion/variability describe how the data in a distribution are scattered or dispersed around, or from, the central point represented by the measure of central tendency.
We will discuss four different measures of dispersion: the range, the mean deviation, the variance, and the standard deviation.
RANGE
The range is a very simple measure of dispersion to calculate and interpret. The range is simply the difference between the highest score and the lowest score in a distribution.
Consider the following distribution that measures the “Age” of a random sample of eight police officers in a small rural jurisdiction.
Officer   X = Age
1 41
2 20
3 35
4 25
5 23
6 30
7 21
8 32
First, let’s calculate the mean as our measure of central tendency by adding the individual ages of each officer and dividing by the number of officers. The calculation is 227/8 = 28.375 years.
In general, the formula for the range is:
R=h-l
Where:
· R is the range
· h is the highest score in the data set
· l is the lowest score in the data set
The range of this distribution is the difference between 41 and 20, or 21 years. We can say that this sample has a mean age of 28.375 years and a range of 21 years, with ages running from 20 to 41 years.
Although the range was quite easy to calculate here using the “eyeball” technique, that technique would be much harder to use with a sample of 1,000 officers from Phoenix, Arizona. In that case we would ask the computer for the range, which it would compute and report quickly and accurately, giving us a quick sense of how the data in the variable Age are dispersed.
Take the following set of scores, for example:
98, 86, 77, 56, 48
In this example, 98 – 48 = 50, so the range is 50. In a set of 500 numbers where the largest is 98 and the smallest is 37, the range would be 61.
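This is the kind of calculation we hand off to the computer. A minimal Python sketch, assuming the scores are held in a plain list; `score_range` is a hypothetical helper name (chosen so it does not shadow Python's built-in `range`).

```python
def score_range(scores):
    """Range R = h - l: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Ages of the eight sampled officers, from the table in the text.
ages = [41, 20, 35, 25, 23, 30, 21, 32]
mean_age = sum(ages) / len(ages)

print(f"mean = {mean_age} years, range = {score_range(ages)} years "
      f"({min(ages)} to {max(ages)})")
print("range of 98, 86, 77, 56, 48:", score_range([98, 86, 77, 56, 48]))
```

For the officers this reports a mean of 28.375 years and a range of 21 years (20 to 41), and for the second set a range of 50, matching the hand calculations. The same two calls work unchanged on a list of 1,000 ages.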
The range is used almost exclusively to get a very general estimate of how widely scores differ from one another; that is, the range shows how much spread there is from the lowest to the highest point in a distribution.
STANDARD DEVIATION
Now we get to the most frequently used measure of variability, the standard deviation. Just think about what the term implies: it's a deviation from something (guess what?) that is standard. Actually, the standard deviation (sd) represents the average amount of variability in a set of scores. In practical terms, you can think of it as roughly the average distance of the scores from the mean. The larger the standard deviation, the larger the average distance each data point is from the mean of the distribution.
LET’S LEARN HOW TO CALCULATE THE STANDARD DEVIATION:
1. First, you need to determine the mean. The mean of a list of numbers is the sum of those numbers divided by the number of items in the list (read: add all the numbers up and divide by how many there are).
2. Then, subtract the mean from every number to get the list of deviations. It's OK to get negative numbers here.
3. Next, square each deviation (read: multiply it by itself) to get the squared deviations.
4. Add up all of the squared deviations to get their total sum.
5. Divide this sum by one less than the number of items in the list; the result is the variance. Take the square root of the variance to get the standard deviation.
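The five steps translate almost line for line into code. This is a minimal Python sketch, not the notes' own method; `sample_standard_deviation` is a hypothetical function name, and it assumes at least two scores (otherwise the N – 1 division is impossible).

```python
import math


def sample_standard_deviation(scores):
    """Follow the five hand-calculation steps from the text."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores to divide by n - 1")
    # Step 1: the mean.
    mean = sum(scores) / n
    # Step 2: deviations from the mean (negative values are fine).
    deviations = [x - mean for x in scores]
    # Step 3: square each deviation.
    squared = [d ** 2 for d in deviations]
    # Step 4: add up the squared deviations.
    total = sum(squared)
    # Step 5: divide by n - 1 (this is the variance), then take the square root.
    variance = total / (n - 1)
    return math.sqrt(variance)


print(round(sample_standard_deviation([1, 3, 4, 6, 9, 19]), 2))
```

With the numbers from Example 1 this prints 6.48, the same answer as the hand calculation.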
I know this sounds confusing, but work through the examples below and practice each one, and you will be able to calculate the standard deviation easily:
Example 1:
Your list of numbers: 1, 3, 4, 6, 9, 19
X     (X – Mean)          (X – Mean)²
1     (1 – 7) = -6        36
3     (3 – 7) = -4        16
4     (4 – 7) = -3        9
6     (6 – 7) = -1        1
9     (9 – 7) = 2         4
19    (19 – 7) = 12       144
      Σ = 0               Σ = 210
Explanations:
1. Mean: (1 + 3 + 4 + 6 + 9 + 19) / 6 = 42 / 6 = 7
2. List of deviations: -6, -4, -3, -1, 2, 12
3. Squares of deviations: 36, 16, 9, 1, 4, 144
4. Sum of squared deviations: 36 + 16 + 9 + 1 + 4 + 144 = 210
5. Standard deviation: divide by one less than the number of items in the list (210 / 5 = 42), then take the square root: $s = \sqrt{210/5} = \sqrt{42} \approx 6.48$
LET’S GO BACK TO OUR FIRST EXAMPLE (WHEN WE CALCULATED THE RANGE):
Consider the following distribution that measures the “Age” of a random sample of eight police officers in a small rural jurisdiction.
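Officer   X = Age   (X – Mean) = Deviation   (X – Mean)² = Squared Deviation
1         41        (41 – 28.375) = 12.625   159.391
2         20        (20 – 28.375) = -8.375   70.141
3         35        (35 – 28.375) = 6.625    43.891
4         25        (25 – 28.375) = -3.375   11.391
5         23        (23 – 28.375) = -5.375   28.891
6         30        (30 – 28.375) = 1.625    2.641
7         21        (21 – 28.375) = -7.375   54.391
8         32        (32 – 28.375) = 3.625    13.141
          Σ = 227   Σ = 0.000                Σ = 383.878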
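The deviation table can be reproduced in a few lines. This Python sketch (an illustration, not part of the original notes) assumes the ages are stored in a list in officer order and prints each officer's deviation and squared deviation to three decimals.

```python
# Ages of the eight sampled officers (same data as the table above).
ages = [41, 20, 35, 25, 23, 30, 21, 32]
mean = sum(ages) / len(ages)  # 227 / 8 = 28.375

squared_deviations = []
print("Officer  Age  Deviation  Squared")
for officer, age in enumerate(ages, start=1):
    deviation = age - mean
    squared_deviations.append(deviation ** 2)
    print(f"{officer:>7}  {age:>3}  {deviation:>9.3f}  {deviation ** 2:>7.3f}")

print("sum of ages:", sum(ages))
print("sum of squared deviations:", sum(squared_deviations))
```

Adding the unrounded squares gives 383.875; the table's 383.878 comes from adding squares that were each rounded to three decimals first. The difference is far too small to change the standard deviation.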
As before, we calculate the mean as our measure of central tendency by adding the individual ages of each officer and dividing by the number of officers: 227/8 = 28.375 years.
It is important to note from this calculation that the sum of the deviations of each score from the mean is equal to zero. When doing hand calculations of the mean deviation, variance, and standard deviation, this is an excellent place to check your math. If the sum of the deviations of each score from the mean does not equal zero (or a number very, very close to zero when you are rounding decimal places), then you have made a mathematical error either in your subtractions or in your calculation of the mean.
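The zero-sum check is easy to automate. In this minimal Python sketch, `deviations_sum_to_zero` and its `tolerance` parameter are hypothetical names; the tolerance stands in for "very, very close to zero" and is an assumption you would widen when the mean has been rounded.

```python
import math


def deviations_sum_to_zero(scores, claimed_mean, tolerance=1e-9):
    """Return True if the deviations from claimed_mean add up to (about) zero."""
    total = sum(x - claimed_mean for x in scores)
    return math.isclose(total, 0.0, abs_tol=tolerance)


ages = [41, 20, 35, 25, 23, 30, 21, 32]
print(deviations_sum_to_zero(ages, 28.375))                 # True: correct mean
print(deviations_sum_to_zero(ages, 28.0))                   # False: wrong mean
print(deviations_sum_to_zero(ages, 28.38, tolerance=0.05))  # True: rounded mean
```

A wrong mean of 28.0 leaves the deviations summing to 3, which flags the error, while a mean rounded to 28.38 leaves only about -0.04, which passes once the tolerance allows for rounding.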
Then, subtract the mean from every number to get the list of deviations. It's OK to get negative numbers here.
Next, square each deviation (read: multiply it by itself) to get the squared deviations.
Add up all of the squared deviations to get their total sum.
To get the standard deviation, divide that sum by one less than the number of items in the list, and take the square root of the result.
It is important for you to note that the last step in the calculation of the variance that I have described requires you to reduce the sample size by 1. This is done because we are using a sample rather than the entire population of officers. If our data were the entire population, we would not subtract 1 from N; we would simply divide by the size of the entire population. In this example, the eight observations of the age of police officers are a sample from the total population of police officers in this jurisdiction. By subtracting 1 from the sample size (N – 1), we adjust the final value of the variance (s²), resulting in a value that is larger than if we were to divide by N. Dividing a sample's sum of squared deviations by N tends to underestimate the population variance, so when using sample data it is better to overstate the measure of dispersion than to understate it.
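Python's standard `statistics` module happens to offer both versions, which makes the difference easy to see: `statistics.variance` divides by N – 1 and `statistics.pvariance` divides by N. A short sketch using the officer ages:

```python
import statistics

ages = [41, 20, 35, 25, 23, 30, 21, 32]

# Divide the sum of squared deviations by N - 1: the sample variance s².
sample_variance = statistics.variance(ages)
# Divide by N instead: appropriate only if these 8 officers were the whole population.
population_variance = statistics.pvariance(ages)

print(f"divide by N - 1: {sample_variance:.2f}")
print(f"divide by N:     {population_variance:.2f}")
```

Dividing by N – 1 gives about 54.84, while dividing by N gives about 47.98, so the sample formula produces the larger, more cautious value described above.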
STANDARD DEVIATION - s
The standard deviation (s) is very simple to calculate: it is the square root of the variance.
In our example with the sample of 8 police officers and their ages, the standard deviation is:
$s = \sqrt{383.878/7} = \sqrt{54.84} \approx 7.41$ years
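As a check on the arithmetic, this Python sketch (illustrative only) computes s two ways: from the rounded table total of 383.878, and with `statistics.stdev`, which applies the same N – 1 formula to the raw ages.

```python
import math
import statistics

ages = [41, 20, 35, 25, 23, 30, 21, 32]

# Hand route: sum of squared deviations from the table, divided by N - 1, then square-rooted.
s_from_table = math.sqrt(383.878 / (len(ages) - 1))
# Library route: statistics.stdev applies the same N - 1 formula to the raw ages.
s_library = statistics.stdev(ages)

print(round(s_from_table, 2), round(s_library, 2))
```

Both routes round to 7.41 years.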
The standard deviation has another very important advantage over the other measures of dispersion: when a variable is approximately normally distributed, we can use the standard deviation to estimate the proportion of its values that fall within certain areas under the normal curve.
Using the Standard Deviation
I have posted a PDF file under the “notes” link that outlines the areas falling under/within the normal curve/bell curve (or normal distribution). Please refer to that graphical display for the remainder of this discussion.
Please read:
Our calculated mean is 28.375 years. When using the normal curve/bell curve (or normal distribution) to represent our variable, we would place the mean, 28.375 years, at the center of the distribution, above X bar.
The numbers that correspond to the “-1s” and “+1s” are 20.965 (mean – standard deviation) and 35.785 (mean + standard deviation), respectively. These numbers are calculated by subtracting one standard deviation unit (7.41 years) from the mean of 28.375 years and adding one standard deviation unit (7.41 years) to it. Assuming officers' ages are approximately normally distributed, this represents the range of ages within which we would expect to find approximately 68.26% of the total population of police officers in this jurisdiction. We would expect approximately 34.13% to have an age between 20.965 years and 28.375 years. Similarly, we would expect approximately 34.13% to have an age between 28.375 years and 35.785 years.
The numbers that correspond to the “-2s” and “+2s” are 13.555 and 43.195, respectively. These numbers are calculated by subtracting two standard deviation units (7.41 years x 2 = 14.82 years) from the mean of 28.375 years and adding two standard deviation units (14.82 years) to it. This represents the range of ages within which we would expect to find approximately 95.44% of the total population of police officers in this jurisdiction. We would expect approximately 47.72% to have an age between 13.555 years and 28.375 years. Similarly, we would expect approximately 47.72% to have an age between 28.375 years and 43.195 years.
The numbers that correspond to the “-3s” and “+3s” are 6.145 and 50.605, respectively. These numbers are calculated by subtracting three standard deviation units (7.41 years x 3 = 22.23 years) from the mean of 28.375 years and adding three standard deviation units (22.23 years) to it. This represents the range of ages within which we would expect to find approximately 99.74% of the total population of police officers in this jurisdiction. We would expect approximately 49.87% to have an age between 6.145 and 28.375 years. Similarly, we would expect approximately 49.87% to have an age between 28.375 years and 50.605 years.
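The band limits and their normal-curve percentages can also be computed directly. This sketch uses `statistics.NormalDist` (Python 3.8 and later) and, like the discussion above, assumes the ages are approximately normally distributed; it takes the mean and the rounded s of 7.41 years from the text.

```python
from statistics import NormalDist

# Values from the text; s is rounded to two decimals as in the notes.
mean, s = 28.375, 7.41

standard_normal = NormalDist()  # mean 0, standard deviation 1
bands = {}
for k in (1, 2, 3):
    low, high = mean - k * s, mean + k * s
    # Share of a normal curve lying between -k and +k standard deviations.
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    bands[k] = (low, high, share)
    print(f"±{k}s: {low:.3f} to {high:.3f} years, about {share:.2%} of cases")
```

The limits match the ones above (20.965 to 35.785, 13.555 to 43.195, 6.145 to 50.605). The computed areas are 68.27%, 95.45% and 99.73%; the slightly different figures in the text come from doubling the rounded half-areas 34.13%, 47.72% and 49.87%.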
More examples for practice:
STEP 1: Find the Mean for the distribution.
X
9
8
6
4
2
1
ΣX = 30
Mean = ΣX/N = 30/6 = 5
STEP 2: Subtract the Mean from each raw score to get the DEVIATION.
X   (X – Mean) = Deviation
9   (9 – 5) = +4
8   (8 – 5) = +3
6   (6 – 5) = +1
4   (4 – 5) = -1
2   (2 – 5) = -3
1   (1 – 5) = -4
STEP 3: Square each deviation before adding the SQUARED DEVIATIONS TOGETHER.
X   (X – Mean) = Deviation   (X – Mean)² = Squared Deviation
9   (9 – 5) = +4             16
8   (8 – 5) = +3             9
6   (6 – 5) = +1             1
4   (4 – 5) = -1             1
2   (2 – 5) = -3             9
1   (1 – 5) = -4             16
Σ(X – Mean)² = 52
STEP 4: Divide by N – 1 and get the SQUARE ROOT OF THE RESULT FOR THE STANDARD DEVIATION.
$s = \sqrt{52/5} = \sqrt{10.4} \approx 3.22$
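To check a practice answer, Python's standard `statistics` module can do the whole calculation in two calls; this is a convenience sketch, not a replacement for working the steps by hand.

```python
import statistics

scores = [9, 8, 6, 4, 2, 1]

variance = statistics.variance(scores)  # 52 / (6 - 1), the N - 1 division in Step 4
s = statistics.stdev(scores)            # square root of that variance

print(variance, round(s, 2))
```

It reports a variance of 10.4 and a standard deviation of 3.22, agreeing with Steps 1 through 4.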
More examples:
The following data represent the number of crime calls at “Hot
Spots” in a year. Calculate and interpret the standard deviation
of crime calls at these hot spots.
Hot Spot Number   # of Calls   (X – Mean) = Deviation   (X – Mean)² = Squared Deviation
1                 2            -19.5                    380.25
2                 9            -12.5                    156.25
3                 11           -10.5                    110.25
4                 13           -8.5                     72.25