Summarizing distributions of univariate data
1. Measuring center: median, mean
2. Measuring spread: range, interquartile range, standard deviation
3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
4. Using boxplots
5. The effect of changing units on summary measures
When describing the “center” of a set of
data, we can use the mean or the median.
Mean: “Average” value
Median: “Center” value (Q2)
Where is the Center of the Distribution?
If you had to pick a single number to describe all the data, what would you pick?
It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle.
On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode.
To find the mean of a set of numbers, add their values and divide by the number of values:
$\bar{x} = \dfrac{\sum x_i}{n}$
Find the mean of:
2 3 4 6 8 12
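The add-then-divide rule can be checked with a short Python sketch. It is only an illustration: `data` holds the six values listed above, and `statistics.mean` from Python's standard library is used as a cross-check.

```python
import statistics

# The six values from the example above.
data = [2, 3, 4, 6, 8, 12]

total = sum(data)   # add the values
n = len(data)       # count the values
mean = total / n    # divide the sum by the count

print(f"sum = {total}, n = {n}, mean = {mean:.3f}")  # sum = 35, n = 6, mean = 5.833

# Cross-check against the standard library's mean.
print(abs(statistics.mean(data) - mean) < 1e-12)  # True
```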
Although the mean is the most popular measure of center, it is not always the most appropriate one.
The mean is very sensitive to extreme values (outliers).
Because outliers affect the mean, we say that the mean is NOT a resistant measure of center.
So if the mean is not a resistant measure of center, what is? The median.
The median is the value with exactly half the data values below it and half above it. It is the middle data value (once the data values have been ordered) that divides the histogram into two equal areas. It has the same units as the data.
The median is not influenced by extreme observations, so we say that the median is a resistant measure of center.
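A minimal Python sketch of what "resistant" means, using the standard library's `statistics` module. The value 120 is a made-up outlier substituted for 12 in the earlier example data, purely for illustration.

```python
import statistics

data = [2, 3, 4, 6, 8, 12]
# Hypothetical outlier: the largest value (12) replaced by 120.
with_outlier = [2, 3, 4, 6, 8, 120]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)      # pulled toward the extreme value
    median = statistics.median(values)  # depends only on the middle values
    print(f"{label:>12}: mean = {mean:.3f}, median = {median}")
```

The median stays at 5.0 in both cases, while the mean rises from about 5.83 to about 23.83.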
Finding the Median
First sort the values (arrange them in order),
then follow one of these:
1. If the number of data values is even, the
median is found by computing the mean of
the two middle numbers.
2. If the number of data values is odd, the
median is the number located in the exact
middle of the list.
Example (even number of values):
5.40 1.10 0.42 0.73 0.48 1.10
In order: 0.42 0.48 0.73 1.10 1.10 5.40
There is no single middle value; the middle is shared by two numbers, 0.73 and 1.10.
MEDIAN = (0.73 + 1.10) / 2 = 0.915
Example (odd number of values):
5.40 1.10 0.42 0.73 0.48 1.10 0.66
In order: 0.42 0.48 0.66 0.73 1.10 1.10 5.40
The exact middle (4th) value is 0.73, so the MEDIAN is 0.73.
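The two rules for finding the median can be sketched in Python as follows. The function name `median` is chosen here for illustration; Python's standard library offers `statistics.median`, which follows the same rules.

```python
def median(values):
    """Sort the values, then take the exact middle value (odd count)
    or the mean of the two middle values (even count)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values


even_example = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_example = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]

print(round(median(even_example), 3))  # 0.915
print(median(odd_example))             # 0.73
```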
Mean vs Median
Mean | Median
Average value of the variable | Typical value of the variable
Not resistant to outliers | Resistant to outliers
A good measure when the data is symmetric | A reliable measure regardless of the shape of the distribution
Farther out in the long tail than the median when the data is skewed | Close to the center even when the data is skewed
Easy to find | Less prone to mistakes
Range
Distance between the largest and smallest values.
Range = Maximum − Minimum
Range is useful only if there are no outliers, because a single extreme value changes it.
Interquartile Range (IQR)
How to find the IQR:
1. Find the median.
2. Find the median of each half of the data:
the lower median is the 1st quartile (Q1);
the upper median is the 3rd quartile (Q3).
3. Subtract the two quartiles: IQR = Q3 − Q1.
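The IQR steps can be sketched in Python as below, using the example values 2 3 4 6 8 12 from earlier. This assumes one common convention: when n is odd, the overall median is left out of both halves. Other conventions exist, so calculators and software may report slightly different quartiles. The function names are chosen here for illustration.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half method.

    Assumption: when n is odd, the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to split the data into halves")
    half = n // 2
    lower = ordered[:half]                                   # values below the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:]  # values above the median
    return median(lower), median(ordered), median(upper)


data = [2, 3, 4, 6, 8, 12]
q1, m, q3 = quartiles(data)
print("range =", max(data) - min(data))       # range = 10
print("Q1 =", q1, "median =", m, "Q3 =", q3)  # Q1 = 3 median = 5.0 Q3 = 8
print("IQR =", q3 - q1)                       # IQR = 5
```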
One general rule of thumb for identifying outliers is to flag any data points that lie:
More than 1.5 × IQR below Q1 (that is, below Q1 − 1.5 × IQR)
More than 1.5 × IQR above Q3 (that is, above Q3 + 1.5 × IQR)
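Here is a minimal Python sketch of the 1.5 × IQR rule. It assumes the median-of-each-half convention for quartiles (overall median excluded when n is odd), and the data set is hypothetical: the earlier example values with 12 replaced by 120. The names `quartiles`, `outliers`, and the multiplier parameter `k` are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """(Q1, Q3) by the median-of-each-half method (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(ordered[:half]), median(upper)


def outliers(values, k=1.5):
    """Return (lower fence, upper fence, flagged values) for the k × IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return low, high, [x for x in values if x < low or x > high]


# Hypothetical data: the earlier example with 12 replaced by 120.
data = [2, 3, 4, 6, 8, 120]
low, high, flagged = outliers(data)
print(low, high, flagged)  # -4.5 15.5 [120]
```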
Check For Understanding
• The “Descriptive Statistics” of test grades for a certain
class are listed below.
Mean = 74.71
Median = 76
Standard Deviation = 12.61
Minimum = 35
Maximum = 94
Q1 = 68
Q3 = 84
• (a) Determine the IQR for this data.
• (b) Using the answer from part (a), determine whether
the lowest and highest values in the data are outliers.
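The standard deviation measures the typical distance of the data values from the mean. For a sample, the squared deviations from the mean are summed, divided by n − 1, and then square-rooted:
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$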
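The formula translates directly into a few lines of Python. This sketch assumes the data are a sample (hence n − 1); `sample_sd` is a name chosen here for illustration, and `statistics.stdev` from the standard library, which also divides by n − 1, serves as a cross-check.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: square root of the summed squared deviations / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]  # (x_i - x̄)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))


data = [2, 3, 4, 6, 8, 12]
print(f"{sample_sd(data):.4f}")
print(f"{statistics.stdev(data):.4f}")  # same value from the standard library
```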
If the data is uniform or symmetric, use:
Center: mean; Spread: standard deviation
If the data is skewed, use:
Center: median; Spread: five-number summary, range, IQR
Distributions with Outliers
Since outliers affect the mean and standard deviation, it is usually better to use the median and IQR.
However, if the distribution is unimodal, you can report both the mean and the median and simply note the outliers.
If you find a simple reason for an outlier, you can remove it; then, if the remaining distribution is symmetric, use the mean and standard deviation.
• We can use either z-scores or percentiles to describe the location of an observation in a distribution.
• z-scores use the mean and standard deviation.
• Percentiles use an observation’s position relative to the rest of the ordered data.
• $P_k$ is the notation for the kth percentile.
• $Q_n$ is the notation for the nth quartile.
$P_{25} = Q_1$
$P_{50} = Q_2 = \text{median}$
$P_{75} = Q_3$
If you are trying to find the percentile corresponding to a certain score x:
$\text{Percentile} = \dfrac{\text{number of scores less than } x}{\text{total number of scores}} \times 100$
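The percentile formula can be sketched in Python as below. It follows the "strictly less than x" count used in the formula; some textbooks and software count scores "at or below" x instead, which gives slightly different percentiles. The function name and the example scores are illustrative.

```python
def percentile_rank(scores, x):
    """Percent of scores that are strictly less than x."""
    if not scores:
        raise ValueError("need at least one score")
    below = sum(1 for s in scores if s < x)  # number of scores less than x
    return below / len(scores) * 100


scores = [2, 3, 4, 6, 8, 12]
print(f"{percentile_rank(scores, 8):.1f}")  # 66.7: four of the six scores are below 8
```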
• Percentiles are often used when reporting academic scores such as SAT scores. Say you get a 620 on the math portion of the SAT. Your score report might also indicate that you are in the “78th percentile.” That means that you scored better than 78% of all students taking that particular SAT.
Measuring Relative Standing With Standardized Values (z-Scores)
• One way to compare an individual to the whole distribution is to describe its location in the distribution relative to the mean.
• Let’s do this by describing how many standard deviations an individual is away from the mean.
• We call this the “standardized value,” or the “z-score”:
$z = \dfrac{x - \bar{x}}{s}$
Here is how to interpret z-scores:
A z-score less than 0 represents an element less than the mean.
A z-score greater than 0 represents an element greater than the mean.
A z-score equal to 0 represents an element equal to the mean.
A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
A z-score equal to −1 represents an element that is 1 standard deviation less than the mean; a z-score equal to −2, 2 standard deviations less than the mean; etc.
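A minimal Python sketch of the z-score calculation. For illustration it reuses the summary values (mean 74.71, standard deviation 12.61) from the test-grade "Descriptive Statistics" example; the function name `z_score` is chosen here.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Summary values from the test-grade "Descriptive Statistics" example.
mean, sd = 74.71, 12.61

for grade in (94, 35):
    print(grade, round(z_score(grade, mean, sd), 2))
# 94 1.53
# 35 -3.15
```

A grade of 94 is about 1.53 standard deviations above the mean, and a grade of 35 is about 3.15 standard deviations below it.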
The five-number summary of a distribution
consists of the smallest observation, the first
quartile, the median, the third quartile, and the
largest observation, written in order from
smallest to largest.
Minimum Q1 Median Q3 Maximum
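The five-number summary can be computed with a short Python sketch. It assumes quartiles are found as medians of the lower and upper halves, with the overall median excluded when n is odd; other quartile conventions can give slightly different Q1 and Q3. The function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(Minimum, Q1, Median, Q3, Maximum), quartiles as medians of the halves.

    Assumption: when n is odd, the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


print(five_number_summary([2, 3, 4, 6, 8, 12]))  # (2, 3, 5.0, 8, 12)
```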
The five-number summary divides the distribution roughly into quarters. This leads to a new way to display quantitative data, the boxplot.
How to make a boxplot:
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Mark the median M with a line inside the box.
4. Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
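The numbers needed to draw a boxplot can be sketched in Python as follows. This assumes the median-of-each-half quartile convention and the 1.5 × IQR outlier rule; the data set is hypothetical (the earlier example with 12 replaced by 120), and `boxplot_parts` is a name chosen for illustration. Actual drawing is left to graph paper or a plotting tool.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_parts(values, k=1.5):
    """Box edges, median line, whisker ends, and outliers for a boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers end at the smallest and largest values that are not outliers.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "box": (q1, q3),
        "median": median(ordered),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


parts = boxplot_parts([2, 3, 4, 6, 8, 120])
print(parts)
# {'box': (3, 8), 'median': 5.0, 'whiskers': (2, 8), 'outliers': [120]}
```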
Effect of Changing Units
If you add a constant to every value, the mean and median increase by the same constant.
Suppose you have a set of
scores with a mean equal to 5
and a median equal to 6. If
you add 10 to every score,
the new mean will be 5 + 10 =
15; and the new median will
be 6 + 10 = 16.
If you multiply every value by a constant, the mean and the median will also be multiplied by that constant.
Assume that a set of scores
has a mean of 5 and a
median of 6. If you multiply
each of these scores by 10,
the new mean will be 5 * 10
= 50; and the new median
will be 6 * 10 = 60.
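Sometimes researchers change units (minutes to hours, feet to meters, etc.). A change of units multiplies every value by a constant (for example, by 1/60 when converting minutes to hours), so the mean and median are multiplied by that same constant, as described above.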
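A minimal Python sketch of these effects, using the standard library's `statistics` module. The scores are hypothetical, chosen only so that the mean is 5 and the median is 6 as in the examples above; the minute values are also made up. The sketch also prints the sample standard deviation, which is not changed by adding a constant but is multiplied by a positive multiplying constant.

```python
import statistics


def summary(values):
    """Mean, median, and sample standard deviation of the values."""
    return statistics.mean(values), statistics.median(values), statistics.stdev(values)


# Hypothetical scores chosen to have mean 5 and median 6.
scores = [1, 4, 6, 7, 7]
shifted = [x + 10 for x in scores]  # add a constant to every value
scaled = [x * 10 for x in scores]   # multiply every value by a constant

for label, values in [("original", scores), ("+ 10", shifted), ("* 10", scaled)]:
    mean, median, sd = summary(values)
    print(f"{label:>8}: mean = {mean}, median = {median}, sd = {sd:.3f}")

# Hypothetical unit change: minutes to hours multiplies every value by 1/60.
minutes = [30, 45, 60, 90]
hours = [m / 60 for m in minutes]
print(statistics.mean(minutes) / 60, statistics.mean(hours))  # 0.9375 0.9375
```

Adding 10 moves the mean from 5 to 15 and the median from 6 to 16; multiplying by 10 gives a mean of 50 and a median of 60, matching the worked examples.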
Check For Understanding
The average score on a test is 150 with a standard deviation of 15. Each score is then increased by 25. What are the new mean and standard deviation?
Check For Understanding
The test grades from a college statistics class are shown below:
85 72 64 65 98 78 75 76 82 80 61 92 72 58 65 74 92 85 74 76 77 77
62 68 68 54 62 76 73 85 88 91 99 82 80 74 76 77 70 60
(a) Construct two different graphs of these data.
(b) Calculate the five-number summary and the mean and standard deviation of the data.
(c) Describe the distribution of the data, citing both the graphs and the summary statistics found in questions (a) and (b).