This document discusses measures of variability and dispersion in descriptive statistics. It defines variability as how scores differ from each other or from the mean. Four measures of dispersion are discussed: range, mean deviation, variance, and standard deviation. Standard deviation is described as the average distance from the mean and the most commonly used measure. Examples are provided to demonstrate how to calculate standard deviation step-by-step. The standard deviation is then used to estimate what percentage of values fall within certain ranges from the mean based on the normal distribution curve.
CJ 301: Measures of Dispersion/Variability
Think back to the description of measures of central tendency.
Those statistics describe how the data in a distribution are
clustered, that is, the summary value around which most of the
data points fall.
But when it comes to descriptive statistics and describing the
characteristics of a distribution, averages are only half the
story. The other half is measures of variability.
In the most simple of terms, variability reflects how scores
differ from one another. For example, the following set of
scores shows some variability:
7, 6, 3, 3, 1
The following set of scores has the same mean (4) and has less
variability than the previous set:
3, 4, 4, 5, 4
The next set has no variability at all (the scores do not differ
from one another), but it also has the same mean as the other
two sets we just showed you.
4, 4, 4, 4, 4
Variability (also called spread or dispersion) can be thought of
as a measure of how different scores are from one another. It is
even more accurate (and maybe even easier) to think of
variability as how different scores are from one particular score.
And what “score” do you think that might be? Well, instead of
comparing each score to every other score in a distribution, the
one score that could be used as a comparison is (that is right)
the mean. So, variability becomes a measure of how much each
score in a group of scores differs from the mean.
Remember what you already know about computing averages: that
an average (whether it is the mean, the median, or the mode)
is a representative score in a set of scores. Now, add your new
knowledge about variability: that it reflects how different
scores are from one another. Each is an important descriptive
statistic. Together, these two (average and variability) can be
used to describe the characteristics of a distribution and show
how distributions differ from one another.
Measures of dispersion/variability describe how the data in a
distribution are scattered or dispersed around, or from, the
central point represented by the measure of central tendency.
We will discuss four different measures of dispersion: the range,
the mean deviation, the variance, and the standard deviation.
RANGE
The range is a very simple measure of dispersion to calculate and
interpret. The range is simply the difference between the highest
score and the lowest score in a distribution.
Consider the following distribution that measures the “Age” of
a random sample of eight police officers in a small rural
jurisdiction.
Officer    X = Age
1          41
2          20
3          35
4          25
5          23
6          30
7          21
8          32
First, let’s calculate the mean as our measure of central
tendency by adding the individual ages of each officer and
dividing by the number of officers.
The calculation is 227/8 = 28.375 years.
In general, the formula for the range is:
r = h - l
Where:
r is the range
h is the highest score in the data set
l is the lowest score in the data set
The range of this distribution is the difference between 41 and
20, or 21 years. The variable age has a range of 21 years.
We can say that this sample has a mean age of 28.375 years with
a range of 21 years, from 20 to 41 years. The range was quite
easy to calculate here using the “eyeball” technique. With a
sample of 1,000 officers from Phoenix, Arizona, the eyeball
technique would be more difficult to use, and we would ask the
computer for the range instead. It would be computed quickly and
accurately, giving us a quick sense of how the data in the
variable Age are dispersed.
Take the following set of scores, for example:
98, 86, 77, 56, 48
In this example, 98 - 48 = 50. The range is 50. In a set of 500
numbers where the largest is 98 and the smallest is 37, the
range would be 61.
The range is used almost exclusively to get a very general
estimate of how wide or different scores are from one another –
that is, the range shows how much spread there is from the
lowest to the highest point in a distribution.
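If you would like the computer to do this for you, here is a
minimal sketch in Python. The function name value_range is my own
choice for illustration, not a built-in; the ages are the eight
officer ages from the example above.

```python
# Ages of the eight sampled officers from the example above.
ages = [41, 20, 35, 25, 23, 30, 21, 32]


def value_range(scores):
    """Return the range: highest score minus lowest score (r = h - l)."""
    return max(scores) - min(scores)


mean_age = sum(ages) / len(ages)

print(mean_age)                           # 28.375
print(value_range(ages))                  # 21
print(value_range([98, 86, 77, 56, 48]))  # 50
```

The same function works unchanged on a list of 1,000 ages, which
is where the eyeball technique stops being practical.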
STANDARD DEVIATION
Now we get to the most frequently used measure of variability,
the standard deviation. Just think about what the term implies;
it’s a deviation from something (guess what?) that is standard.
Actually, the standard deviation (sd) represents the average
amount of variability in a set of scores. In practical terms, it’s
the average distance from the mean. The larger the standard
deviation, the larger the average distance each data point is
from the mean of the distribution.
LET’S LEARN HOW TO CALCULATE THE STANDARD DEVIATION:
1. First, you need to determine the mean. The mean of a list of
numbers is the sum of those numbers divided by the quantity of
items in the list (read: add all the numbers up and divide by how
many there are).
2. Then, subtract the mean from every number to get the list of
deviations. Create a list of these numbers. It's OK to get
negative numbers here.
3. Next, square the resulting list of numbers (read: multiply
each number by itself) to get the squared deviations.
4. Add up all of the resulting squares to get their total sum.
5. To get the standard deviation, divide your result by one less
than the number of items in the list and take the square root of
the resulting number.
I know this sounds confusing, but check out the examples below
and practice each of them, and you will be able to calculate the
standard deviation easily:
Example 1:
Your list of numbers: 1, 3, 4, 6, 9, 19
X     (X - Mean)       (X - Mean)²
1     (1 - 7) = -6     36
3     (3 - 7) = -4     16
4     (4 - 7) = -3     9
6     (6 - 7) = -1     1
9     (9 - 7) = 2      4
19    (19 - 7) = 12    144
                       Σ = 210
Explanations below:
1. Mean: (1+3+4+6+9+19) / 6 = 42 / 6 = 7
2. List of deviations: -6, -4, -3, -1, 2, 12
3. Squares of deviations: 36, 16, 9, 1, 4, 144
4. Sum of squared deviations: 36+16+9+1+4+144 = 210
5. Standard deviation: s = √(210 / 5) = √42 = 6.48
Explanation:
Divide by one less than the number of items in the list: 210 / 5
= 42
Square root of this number: square root (42) = about 6.48
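The five steps can also be written out as a short Python sketch.
This is only one way to do it, using the numbers from Example 1;
the function name sample_standard_deviation is an illustrative
choice of mine.

```python
import math


def sample_standard_deviation(scores):
    """Follow the five hand-calculation steps for a sample."""
    mean = sum(scores) / len(scores)              # step 1: the mean
    deviations = [x - mean for x in scores]       # step 2: X - Mean
    squared = [d ** 2 for d in deviations]        # step 3: (X - Mean)²
    sum_of_squares = sum(squared)                 # step 4: add them up
    variance = sum_of_squares / (len(scores) - 1) # step 5a: divide by N - 1
    return math.sqrt(variance)                    # step 5b: square root


scores = [1, 3, 4, 6, 9, 19]
s = sample_standard_deviation(scores)
print(round(s, 2))  # 6.48
```

Each line of the function matches one numbered step, so you can
print the intermediate lists and compare them with your hand
calculations.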
LET’S GO BACK TO OUR FIRST EXAMPLE (WHEN WE
CALCULATED THE RANGE):
Consider the following distribution that measures the “Age” of
a random sample of eight police officers in a small rural
jurisdiction.
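Officer   X = Age   (X - Mean) Deviation      (X - Mean)² Squared Deviation
1         41        (41 - 28.375) = 12.625    159.391
2         20        (20 - 28.375) = -8.375    70.141
3         35        (35 - 28.375) = 6.625     43.891
4         25        (25 - 28.375) = -3.375    11.391
5         23        (23 - 28.375) = -5.375    28.891
6         30        (30 - 28.375) = 1.625     2.641
7         21        (21 - 28.375) = -7.375    54.391
8         32        (32 - 28.375) = 3.625     13.141
                    Σ = 0.000                 Σ = 383.878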
Let’s calculate the mean as our
measure of central tendency by adding the individual ages of
each officer and dividing by the number of officers.
The calculation is 227/8 = 28.375 years.
It is important to note from this calculation that the sum of the
deviations of each score from the mean is equal to zero. When
doing hand calculations of the mean deviation, variance, and
standard deviation, this is an excellent place to check your math.
If the sum of the deviations of each score from the mean does
not equal zero (or a number very, very close to zero in
situations when you are rounding decimal places) then you have
made a mathematical error either in your subtractions or your
calculation of the mean.
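The same check is easy to run on a computer. The sketch below
uses the eight officer ages and allows for a tiny rounding
tolerance; the tolerance value is an assumption of mine, not a
fixed rule.

```python
import math

# Ages of the eight sampled officers.
ages = [41, 20, 35, 25, 23, 30, 21, 32]

mean_age = sum(ages) / len(ages)
deviations = [age - mean_age for age in ages]

# The deviations from the mean should add up to zero
# (or something extremely close to it after rounding).
total = sum(deviations)
print(deviations)
print(math.isclose(total, 0.0, abs_tol=1e-9))  # True
```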
Then, subtract the mean from every number to get the list of
deviations. Create a list of these numbers. It's OK to get
negative numbers here.
Next, square the resulting list of numbers (read: multiply each
number by itself) to get the squared deviations.
Add up all of the resulting squares to get their total sum.
To get the standard deviation, divide your result by one less
than the number of items in the list and take the square root of
the resulting number.
It is important for you to note that the last step in the
calculation of the variance that I have described requires you to
reduce the sample size by 1. This is done because we are using a
sample rather than the entire population of officers. If our data
were the actual entire population, we would not subtract 1 from
N. We would simply divide by the size of the entire population.
In this example the eight observations of the age of police
officers are a sample from the total population of police
officers in this jurisdiction. By subtracting 1 from the sample
size (N - 1) we are adjusting the final value of the variance
(s²), resulting in a value that is larger than if we were to
divide by N. When using sample data it is better to overstate the
measure of dispersion than to understate it.
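To see the effect of dividing by N - 1 instead of N, here is a
small sketch using Python's standard statistics module, where
variance divides by N - 1 (sample) and pvariance divides by N
(population). The variable names are mine.

```python
import statistics

# Ages of the eight sampled officers.
ages = [41, 20, 35, 25, 23, 30, 21, 32]

sample_variance = statistics.variance(ages)       # divides by N - 1
population_variance = statistics.pvariance(ages)  # divides by N
sample_sd = statistics.stdev(ages)                # square root of sample variance

print(round(sample_variance, 2))      # 54.84
print(round(population_variance, 2))  # 47.98
print(round(sample_sd, 2))            # 7.41
```

The sample variance comes out larger than the population
variance, which is exactly the adjustment described above.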
STANDARD DEVIATION - s
The standard deviation (s) is very simple to calculate: it is the
square root of the variance. In our example with the sample of 8
police officers and their Age, the standard deviation is:
s = √(383.878 / 7)
s = √54.84
s = 7.41 years
The standard deviation has another very important advantage over
the other measures of dispersion: we are able to use it to
estimate the proportion of values that fall within certain areas
under the curve that represents those values.
Using the Standard Deviation
I have posted a pdf file under the “notes” link that outlines the
areas falling under/within the
normal curve/bell curve (or normal distribution).
Please refer to that graphical display for the remainder of this
discussion.
Please read:
Our calculated mean is 28.375 years. When using the normal
curve/bell curve (or normal distribution) to represent our
variable we would place the mean, 28.375 years, at the center of
the distribution above X bar.
The numbers that correspond to the “-1s” and “+1s” are 20.965
(mean - standard deviation) and 35.785 (mean + standard
deviation) respectively. These numbers are calculated by adding
one standard deviation unit (7.41 years) to the mean of 28.375
years and subtracting one standard deviation unit (7.41 years)
from the mean of 28.375 years.
This represents the range of ages between which, if ages follow
the normal curve, we would expect to find approximately 68.26% of
the total population of police officers in this jurisdiction.
We would expect approximately 34.13% to have an age between
20.965 years and 28.375 years.
Similarly, we would expect approximately 34.13% to have an
age between 28.375 years and 35.785 years.
The numbers that correspond to the “-2s” and “+2s” are 13.555 and
43.195 respectively. These numbers are calculated by adding two
standard deviation units (7.41 years x 2 = 14.82 years) to the
mean of 28.375 years and subtracting two standard deviation units
(7.41 years x 2 = 14.82 years) from the mean of 28.375 years.
This represents the range of ages between which we would
expect to find approximately 95.44% of the total population of
police officers in this jurisdiction.
We would expect approximately 47.72% to have an age between
13.555 years and 28.375 years.
Similarly, we would expect approximately 47.72% to have an
age between 28.375 years and 43.195 years.
The numbers that correspond to the “-3s” and “+3s” are 6.145 and
50.605 respectively. These numbers are calculated by adding three
standard deviation units (7.41 years x 3 = 22.23 years) to the
mean of 28.375 years and subtracting three standard deviation
units (7.41 years x 3 = 22.23 years) from the mean of 28.375
years.
This represents the range of ages between which we would
expect to find approximately 99.74% of the total population of
police officers in this jurisdiction.
We would expect approximately 49.87% to have an age between
6.145 years and 28.375 years.
Similarly, we would expect approximately 49.87% to have an age
between 28.375 years and 50.605 years.
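The three pairs of cut-off ages can be reproduced with a few
lines of Python. This is a sketch that simply reuses the rounded
mean and standard deviation from these notes; the percentages are
the normal-curve figures quoted above and only apply to the
extent that ages really follow a normal distribution. The
function name sd_interval is my own.

```python
# Mean and standard deviation as rounded in these notes.
MEAN_AGE = 28.375
SD_AGE = 7.41

# Normal-curve percentages quoted in the notes for 1, 2 and 3
# standard deviations on either side of the mean.
NORMAL_CURVE_PERCENT = {1: 68.26, 2: 95.44, 3: 99.74}


def sd_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd), rounded to three decimals."""
    return round(mean - k * sd, 3), round(mean + k * sd, 3)


for k, percent in NORMAL_CURVE_PERCENT.items():
    low, high = sd_interval(MEAN_AGE, SD_AGE, k)
    print(f"within {k}s: {low} to {high} years (about {percent}%)")
```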
More examples for practice:
STEP 1: Find the Mean for the distribution.
X
9
8
6
4
2
1
ΣX = 30
Mean = ΣX/N = 30/6 = 5
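STEP 2: Subtract the Mean from each raw score to get the
DEVIATION.
X     (X - Mean) Deviation
9     (9 - 5) = 4
8     (8 - 5) = 3
6     (6 - 5) = 1
4     (4 - 5) = -1
2     (2 - 5) = -3
1     (1 - 5) = -4
STEP 3: Square each deviation and add up the squares.
X     (X - Mean) Deviation    (X - Mean)²
9     4                       16
8     3                       9
6     1                       1
4     -1                      1
2     -3                      9
1     -4                      16
                              Σ (X - Mean)² = 52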
STEP 4: Divide by N-1 and get the SQUARE ROOT OF THE RESULT FOR
THE STANDARD DEVIATION.
s = √(52 / 5)
s = √10.4
s = 3.22
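Once you have worked a practice problem by hand, you can check
your answer against the computer. The sketch below repeats the
hand steps for this example and compares the result with
statistics.stdev from Python's standard library, which also
divides by N - 1.

```python
import math
import statistics

scores = [9, 8, 6, 4, 2, 1]

mean = sum(scores) / len(scores)                      # 30 / 6
sum_of_squares = sum((x - mean) ** 2 for x in scores) # squared deviations, added up
by_hand = math.sqrt(sum_of_squares / (len(scores) - 1))

by_library = statistics.stdev(scores)  # sample standard deviation

print(mean, sum_of_squares)   # 5.0 52.0
print(round(by_hand, 2))      # 3.22
print(math.isclose(by_hand, by_library))  # True
```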
More examples:
The following data represent the number of crime calls at “Hot
Spots” in a year. Calculate and interpret the standard deviation
of crime calls at these hot spots.
Hot Spot Number   # of Calls   (X - Mean) Deviation   (X - Mean)² Squared Deviation
1
2