2. Chapter 3:
Describing, Exploring, and Comparing Data
3.1 Measures of Center
3.2 Measures of Variation
3.3 Measures of Relative Standing and Boxplots
Objectives:
1. Summarize data, using measures of central tendency, such as the mean, median, mode,
and midrange.
2. Describe data, using measures of variation, such as the range, variance, and standard
deviation.
3. Identify the position of a data value in a data set, using various measures of position,
such as percentiles, deciles, and quartiles.
4. Use the techniques of exploratory data analysis, including boxplots and five-number
summaries, to discover various aspects of data.
3. Recall: 3.1 Measures of Center
Measure of Center (Central Tendency)
A measure of center is a value at the center or
middle of a data set.
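1. Mean: $\bar{x} = \frac{\sum x}{n}$ (sample), $\mu = \frac{\sum x}{N}$ (population), $\bar{x} = \frac{\sum (f \cdot x_m)}{n}$ (grouped data, where $x_m$ is a class midpoint)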
2. Median: The middle value of ranked data
3. Mode: The value(s) that occur(s) with the
greatest frequency.
4. Midrange: $MR = \frac{\text{Min} + \text{Max}}{2}$
5. Weighted Mean: $\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
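The measures of center recalled above can be computed in a few lines. This is a minimal sketch in Python, not part of the original slides: the data set, the grade values, and the helper names `midrange` and `weighted_mean` are all hypothetical, chosen only for illustration.

```python
from statistics import mean, median, multimode

def midrange(values):
    """Midrange: halfway between the minimum and the maximum."""
    return (min(values) + max(values)) / 2

def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical data set, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]

center = {
    "mean": mean(data),        # sum of the values divided by n
    "median": median(data),    # middle value of the ranked data
    "modes": multimode(data),  # value(s) with the greatest frequency
    "midrange": midrange(data),
}

# Hypothetical grade points (4, 3, 2) weighted by credit hours (3, 4, 3).
gpa = weighted_mean([4, 3, 2], [3, 4, 3])

print(center)
print(gpa)
```

`multimode` returns a list because a data set can have more than one mode.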
4. Key Concept: Variation is the single most important topic in statistics.
This section presents three important measures of variation: range, standard
deviation, and variance.
3.2 Measures of Variation
1. Range = Max − Min
2. Variance
3. Standard Deviation
4. Coefficient of Variation: $CVAR = \frac{s}{\bar{x}} \cdot 100\%$. Use CVAR to compare variability when the units are different.
5. Chebyshev's Theorem: 1 − 1/k²
6. Empirical Rule (Normal)
7. Range Rule of Thumb for Understanding Standard Deviation: $s \approx \frac{\text{Range}}{4}$ and µ ± 2σ
5. Example 1: Two brands of outdoor paint are tested to see how long each
will last before fading. The results (in months) for a sample of 6 cans of
each brand are shown.
a. Find the mean and range of each group.
b. Which brand would you buy?
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25
Brand A: $\bar{x} = \frac{\sum x}{n} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$
Brand B: $\bar{x} = \frac{\sum x}{n} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$
The average for both brands is the same, but the range
for Brand A is much greater than the range for Brand B.
Which brand would you buy?
Formulas: $R = \text{Max} - \text{Min}$, $\bar{x} = \frac{\sum x}{n}$, $\mu = \frac{\sum x}{N}$, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Range = Maximum data value − Minimum data value
The range uses only the maximum and the minimum data values, so it is very sensitive
to extreme values. The range is therefore not resistant: it does not take every value
into account and does not truly reflect the variation among all of the data values.
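The mean and range from Example 1 can be checked with a short Python sketch. The helper name `data_range` and the added value of 200 are assumptions made for illustration; the extra value is not in the original data and is there only to show how a single extreme value moves the range.

```python
from statistics import mean

# Paint-durability data (months) from Example 1.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def data_range(values):
    """Range = maximum data value - minimum data value."""
    return max(values) - min(values)

mean_a, range_a = mean(brand_a), data_range(brand_a)
mean_b, range_b = mean(brand_b), data_range(brand_b)
print(f"Brand A: mean={mean_a}, range={range_a}")
print(f"Brand B: mean={mean_b}, range={range_b}")

# Hypothetical illustration of non-resistance: one extreme value
# (NOT in the original data) changes the range drastically.
range_b_with_outlier = data_range(brand_b + [200])
print(f"Brand B with an added value of 200: range={range_b_with_outlier}")
```

Both brands have the same mean, so the mean alone cannot distinguish them; the range does, but only by looking at two values per brand.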
6. Variance & Standard Deviation
The variance is the average of the squares of the distance each value is from the mean.
The standard deviation is the square root of the variance.
The standard deviation measures how spread out the data are, that is, how much the data values deviate
from the mean.
Notation
s = sample standard deviation
σ = population standard deviation
Usage & properties:
1. To determine the spread of the data.
2. To determine the consistency of a variable.
3. To determine the number of data values that fall within a specified interval in a distribution (Chebyshev's
Theorem).
4. Used in inferential statistics.
5. The value of the standard deviation s is never negative. It is zero only when all of the data values are exactly the
same.
6. Larger values of s indicate greater amounts of variation.
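As a rough sketch of these definitions, the sample variance and sample standard deviation can be computed for the paint data from Example 1. The function name `sample_variance` is an assumption for illustration; it divides by n − 1, which is the sample formula, and the result is cross-checked against Python's `statistics` module.

```python
from math import sqrt
from statistics import stdev, variance

# Paint-durability data (months) from Example 1.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

var_a = sample_variance(brand_a)
var_b = sample_variance(brand_b)
s_a, s_b = sqrt(var_a), sqrt(var_b)

print(f"Brand A: s^2={var_a}, s={s_a:.2f}")
print(f"Brand B: s^2={var_b}, s={s_b:.2f}")

# Cross-check against the standard library (also divides by n - 1).
print(variance(brand_a), stdev(brand_a))
```

Both brands have the same mean, but Brand A's larger standard deviation reflects its greater variation.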
11. Range Rule of Thumb for Understanding Standard Deviation
The range rule of thumb is a crude but simple tool for understanding and
interpreting standard deviation. The vast majority (such as 95%) of sample
values lie within 2 standard deviations of the mean.
Unusual:
Significantly low values are µ − 2σ or lower.
Significantly high values are µ + 2σ or higher.
Usual:
Values that are not significant lie between (µ − 2σ) and (µ + 2σ).
Range Rule of Thumb for Estimating a Value of the Standard Deviation
To roughly estimate the standard deviation from a collection of known sample data
(when the distribution is unimodal and approximately symmetric), use: $s \approx \frac{\text{Range}}{4}$
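Both uses of the range rule of thumb can be sketched in Python. The function names and the numbers (a mean of 10 and a range of 12) are illustrative assumptions, and the results are crude estimates, as the rule itself is.

```python
def estimate_std_from_range(data_range):
    """Range rule of thumb: s is roughly range / 4."""
    return data_range / 4

def usual_limits(center, std):
    """Limits separating usual values from significantly low/high ones."""
    return center - 2 * std, center + 2 * std

def is_significant(value, center, std):
    """True if the value is at least 2 standard deviations from the center."""
    low, high = usual_limits(center, std)
    return value <= low or value >= high

# Illustrative values only: mean 10 and range 12.
s_est = estimate_std_from_range(12)
low, high = usual_limits(10, s_est)
print(s_est, low, high)

print(is_significant(17, 10, s_est))  # beyond the upper limit
print(is_significant(9, 10, s_est))   # inside the usual interval
```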
12. The Empirical Rule
The empirical rule states that for
data sets having a distribution that
is approximately bell-shaped, the
following properties apply.
• About 68% of all values fall within 1
standard deviation of the mean.
• About 95% of all values fall within 2
standard deviations of the mean.
• About 99.7% of all values fall within 3
standard deviations of the mean.
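The three percentages can be reproduced under the assumption that "bell-shaped" means exactly normal, which is an idealization not stated on the slide. For a normal distribution, the proportion of values within k standard deviations of the mean is erf(k/√2); the function name below is hypothetical.

```python
from math import erf, sqrt

def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations
    of the mean (assumes an exactly normal distribution)."""
    return erf(k / sqrt(2))

proportions = {k: normal_proportion_within(k) for k in (1, 2, 3)}
for k, p in proportions.items():
    print(f"within {k} standard deviation(s): {100 * p:.2f}%")
```

The exact normal values are slightly different from the rounded 68%, 95%, and 99.7% figures, which is why the rule says "about".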
13. Example 5
IQ scores have a bell-shaped distribution with a mean of 100 and a
standard deviation of 15. What percentage of IQ scores are between 70 and
130?
130 − 100 = 30 and 100 − 70 = 30, so both endpoints are $\frac{30}{\sigma} = \frac{30}{15} = 2$ standard deviations from the mean.
The empirical rule: About 95% of all IQ scores are between 70 and 130.
Example 6
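Use the range rule of thumb to approximate the lowest value and the highest value in a
data set where $\bar{x} = 10$ and $R = 12$.
Usual values lie within µ ± 2σ, estimated here by $\bar{x} \pm 2s$.
$s \approx \frac{R}{4} = \frac{12}{4} = 3$
$\bar{x} \pm 2s = 10 \pm 2(3)$
Low = 4 and High = 16
Formulas: $R = \text{Max} - \text{Min}$, $\bar{x} = \frac{\sum x}{n}$, $\mu = \frac{\sum x}{N}$, $s \approx \frac{R}{4}$, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$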
14. Chebyshev's Theorem
The proportion of values from any data set that fall within k standard deviations of the
mean will be at least 1 − 1/k², where k is a number greater than 1 (k is not necessarily
an integer).
Number of standard deviations, k | Minimum proportion within k standard deviations | Minimum percentage within k standard deviations
2 | 1 − 1/4 = 3/4 | 75%
3 | 1 − 1/9 = 8/9 | 88.89%
4 | 1 − 1/16 = 15/16 | 93.75%
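The table entries follow directly from the formula. A minimal Python sketch, with a hypothetical function name, that also handles a non-integer k:

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the
    mean, for any data set (Chebyshev's theorem). Requires k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

bounds = {k: chebyshev_min_proportion(k) for k in (1.5, 2, 3, 4)}
for k, p in bounds.items():
    print(f"k = {k}: at least {100 * p:.2f}%")
```

Unlike the empirical rule, these bounds hold for every distribution, which is why they are weaker: at least 75% within 2 standard deviations, compared with about 95% for bell-shaped data.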
16. Comparing Variation in Different Samples or Populations
Coefficient of Variation
The coefficient of variation (or CV) for a set of nonnegative sample or population
data, expressed as a percent, describes the standard deviation relative to the mean:
it is the standard deviation divided by the mean, expressed as a percentage.
Sample: $CV = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
Use the CV (also written CVAR or CVar) to compare standard deviations when the units are different.
17. Example 9
The mean of the number of sales of cars over a 3-month period is 87, and
the standard deviation is 5. The mean of the commissions is $5225, and
the standard deviation is $773. Compare the variations of the two.
Sales: $CVar = \frac{5}{87} \cdot 100\% \approx 5.7\%$
Commissions: $CVar = \frac{773}{5225} \cdot 100\% \approx 14.8\%$
Commissions are more variable than sales.
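The comparison in Example 9 can be checked with a few lines of Python. The function name `coefficient_of_variation` is an assumption for illustration; it takes a standard deviation and a mean in the same units and returns a unit-free percentage.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return std / mean * 100

# Values from Example 9.
sales_cv = coefficient_of_variation(5, 87)              # cars sold
commissions_cv = coefficient_of_variation(773, 5225)    # dollars

print(f"Sales: {sales_cv:.1f}%")
print(f"Commissions: {commissions_cv:.1f}%")
print("More variable:", "commissions" if commissions_cv > sales_cv else "sales")
```

Because the units cancel, the two percentages can be compared directly even though one variable is a count of cars and the other is in dollars.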
18. Properties of Variance
The units of the variance are the squares of the units of the original data values.
The value of the variance can increase dramatically with the inclusion of
outliers. (The variance is not resistant.)
The value of the variance is never negative. It is zero only when all of the data
values are the same number.
The sample variance s² is an unbiased estimator of the population variance σ².
Why Divide by (n − 1)?
There are only n − 1 values that can be assigned without constraint. With a
given mean, we can use any numbers for the first n − 1 values, but the last value
will then be automatically determined.
With division by n − 1, sample variances s² tend to center around the value of
the population variance σ²; with division by n, sample variances s² tend to
underestimate the value of the population variance σ².
19. Biased and Unbiased Estimators
The sample standard deviation s is a biased estimator of the population standard
deviation σ, which means that values of the sample standard deviation s do not
tend to center around the value of the population standard deviation σ.
The sample variance s² is an unbiased estimator of the population variance σ²,
which means that values of s² tend to center around the value of σ² instead of
systematically tending to overestimate or underestimate σ².
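These claims can be illustrated by listing every possible sample from a tiny population. The sketch below makes several assumptions not in the slides: a hypothetical population of three values, samples of size 2, and sampling with replacement. It averages each estimator over all nine equally likely samples.

```python
from itertools import product
from math import sqrt

# Hypothetical population, chosen only for illustration.
population = [2, 3, 10]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
# All equally likely samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_var_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_var_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_std = sum(sqrt(sum_sq_dev(s) / (n - 1)) for s in samples) / len(samples)

print(f"population variance:          {sigma2:.4f}")
print(f"mean of s^2 (divide by n-1):  {avg_var_n_minus_1:.4f}")
print(f"mean of variances (divide n): {avg_var_n:.4f}")
print(f"population std deviation:     {sqrt(sigma2):.4f}")
print(f"mean of s:                    {avg_std:.4f}")
```

In this enumeration the average of s² matches σ², the divide-by-n version falls below it, and the average of s falls below σ, in line with the statements above.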
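Recall for grouped data: m is the midpoint of a class, f is the class frequency, and $N = \sum f$.
$\sigma^2 = \frac{\sum (f \cdot m^2)}{N} - \left(\frac{\sum (f \cdot m)}{N}\right)^2$, $\sigma = \sqrt{\frac{\sum (f \cdot m^2)}{N} - \left(\frac{\sum (f \cdot m)}{N}\right)^2}$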
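A minimal sketch of a grouped-data calculation, assuming the standard shortcut formula for the population variance, σ² = Σ(f·m²)/N − (Σ(f·m)/N)², with N = Σf. The class midpoints and frequencies are hypothetical. The result is checked against the definition by expanding the table into individual values, each placed at its class midpoint.

```python
from math import sqrt

# Hypothetical frequency table: class midpoints m and frequencies f.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

N = sum(frequencies)
sum_fm = sum(f * m for f, m in zip(frequencies, midpoints))
sum_fm2 = sum(f * m**2 for f, m in zip(frequencies, midpoints))

mu = sum_fm / N                    # grouped mean
sigma2 = sum_fm2 / N - mu**2       # shortcut formula (assumed form)
sigma = sqrt(sigma2)
print(mu, sigma2, sigma)

# Check against the definition: repeat each midpoint f times.
expanded = [m for f, m in zip(frequencies, midpoints) for _ in range(f)]
sigma2_direct = sum((x - mu) ** 2 for x in expanded) / N
print(sigma2_direct)
```

Grouped-data results are approximations of the ungrouped statistics, since every value in a class is treated as if it equaled the class midpoint.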