๏บPlease Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 3: Describing, Exploring, and Comparing Data
3.1: Measures of Center
2. Chapter 3:
Describing, Exploring, and Comparing Data
3.1 Measures of Center
3.2 Measures of Variation
3.3 Measures of Relative Standing and Boxplots
2
Objectives:
1. Summarize data, using measures of central tendency, such as the mean, median, mode,
and midrange.
2. Describe data, using measures of variation, such as the range, variance, and standard
deviation.
3. Identify the position of a data value in a data set, using various measures of position,
such as percentiles, deciles, and quartiles.
4. Use the techniques of exploratory data analysis, including boxplots and five-number
summaries, to discover various aspects of data
3. Key Concept
Obtain a value that measures the center of a data set.
Present and interpret measures of center, including mean and median.
3.1 Measures of Center
Measure of Center (Central Tendency)
A measure of center is a value at the center or middle of a data set.
1. Mean
2. Median
3. Mode
4. Midrange
5. Weighted Mean 3
4. A measure of center is a value at the center or middle of a data set.
The Mean (or Arithmetic Mean) of a set of data is the measure of center found by
adding all of the data values and dividing the total by the number of data values.
3.1 Measures of Center
4
Important Properties of the Mean
โข Sample means drawn from the same population tend to vary less than other measures
of center.
โข The mean of a data set uses every data value.
โข A disadvantage of the mean is that just one extreme value (outlier) can change the
value of the mean substantially. (Using the following definition, we say that the mean
is not resistant.)
Resistant:
๏ A statistic is resistant if the presence of extreme values (outliers) does not cause it
to change very much.
5. Notation
โ denotes the sum of a set of data values.
x is the variable usually used to represent the individual data values.
n represents the number of data values in a sample.
N represents the number of data values in a population.
3.1 Measures of Center
ยต is pronounced โmuโ and is the mean of all values in a population.
The symbol ๐ฅ (x-bar) is used for sample mean.
๐ฅ =
๐ฅ
๐
=
๐1 + ๐2 + ๐3 + โฏ + ๐๐
๐
๐ =
๐ฅ
๐
=
๐1 + ๐2 + ๐3 + โฏ + ๐๐
๐
5
6. The data represent the number of days off per year for a sample of
individuals selected from nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
3.1 Measures of Center
๐ =
20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30
9
=
276
9
= 30.7๐๐๐ฆ๐
6
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
๐ฅ =
๐ฅ
๐
โ
Example 1
7. Median
Median (MD): ๐ (x-Tilde): The median of a data set is the measure of
center that is the middle value when the original data values are arranged in
order of increasing (or decreasing) magnitude. In other words, the median is the
midpoint of the data array.
3.1 Measures of Center
To find the median: Firs sort the data in ascending order:
1. If the number of data values is odd, the median is the number located
in the exact middle of the sorted list.
2. If the number of data values is even, the median is found by
computing the mean of the two middle numbers in the sorted list. 7
Properties: The median does not change by large amounts when we include
just a few extreme values, so the median is a resistant measure of center.
The median does not directly use every data value. (For example, if the largest
value is changed to a much larger value, the median does not change.)
8. Given the data speeds: 38.5, 55.6, 22.4, 14.1, and
23.1 (all in megabits per second, or Mbps), find
a. The mean
b. The median
๐ฅ =
38.5 + 55.6 + 22.4 + 14.1 + 23.1
5
=
153.7
5
= 30.74๐๐๐๐
The median is: 23.1 Mbps.
๐. Sort: 14.1,22.4,23.1,38.5,55.6
8
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
Example 2
9. Mode
The mode (sometimes called the most typical case) of a data set is the value(s) that occur(s) with the greatest
frequency.
The mode can be found with qualitative data. When no data value is repeated, we say that there is no mode. There
may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal).
Mode, Midrange
3.1 Measures of Center
9
Midrange
The midrange of a data set is the measure of center that is the value midway between the maximum and minimum
values in the original data set. ๐ด๐ =
๐ด๐๐+๐ด๐๐
๐
Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes so the
midrange is not resistant.
In practice, the midrange is rarely used, but it has three features:
1. The midrange is very easy to compute.
2. The midrange helps reinforce the very important point that there are several different ways to define the
center of a data set.
3. The value of the midrange is sometimes used incorrectly for the median, so confusion can be reduced by
clearly defining the midrange along with the median.
10. Given the data speeds: 38.5, 55.6, 22.4, 14.1, 23.1, 24.5
(all in megabits per second, or Mbps), find
a. The mean
b. The median
c. The mode
d. The midrange
10
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
, ๐๐ =
๐๐๐+๐๐๐ฅ
2
๐ฅ =
38.5 + 55.6 + 22.4 + 14.1 + 23.1 + 24.5
6
=
178.2
6
= 29.7๐๐๐๐
Sort:
14.1,22.4,23.1,24.5,38.5,55.6 ๐ฅ =
23.1 + 24.5
2
= 23.80
There is no mode, why?
๐๐ =
๐๐๐ + ๐๐๐ฅ
2
=
14.1 + 55.6
2
= 34.85๐๐๐๐
Example 3
11. a. Find the mode of these data speeds (in Mbps):
The mode is 0.3 Mbps, because it is the data speed occurring most often
(three times).
11
b. Mode? 0.3, 0.3, 0.6, 4.0, 4.0
c. Mode? 0.3, 1.1, 2.4, 4.0, 5.0
Two modes: 0.3
Mbps and 4.0 Mbps.
No mode because no
value is repeated.
0.2, 0.3, 0.3, 0.3, 0.6, 0.6, 0.7, 1.2
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
, ๐๐ =
๐๐๐+๐๐๐ฅ
2
Example 4
12. Rounding Rule:
3.1 Measures of Center
The mean, median, and midrange should be rounded to one more
decimal place than occurs in the raw data.
The mean, in most cases, is not an actual data value.
For the mode, leave the value as is without rounding (because values of
the mode are the same as some of the original data values).
12
Caution
Never use the term average when referring to a measure of center. The word
average is often used for the mean, but it is sometimes used for other
measures of center.
The term average is not used by statisticians, the statistics community, or
professional journals.
13. Use the frequency distribution to find the mean.
Time (seconds) Frequency f Class Midpoint x f ยท x
75 โ 124 11
125 โ 174 24
175 โ 224 10
225 โ 274 3
275 โ 324 2
Totals: โf = 50
13
Calculating the Mean from a Frequency Distribution
99.5
149.5
199.5
249.5
299.5
1094.5
3588.0
1995.0
748.5
599.0
โ(f ยท x) = 8025.0
๐ฅ =
๐ โ ๐๐
๐
=
8025
50
= 160.5
The result of ๐ฅ = 160.5 is an
approximation because it is based
on the use of class midpoint values
instead of the original data.
Example 5
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
, ๐ฅ =
๐โ ๐๐
๐
Mean from a Frequency Distribution: Multiply each frequency by its corresponding
class midpoint, add the products, and divide by sample size.
Mean from a Frequency Distribution:
Calculations is made by pretending that
all sample values in each class are equal
to the class midpoint.
14. Example 6
14
Class
Boundaries
Frequency
5.5 - 10.5
10.5 - 15.5
15.5 - 20.5
20.5 - 25.5
25.5 - 30.5
30.5 - 35.5
35.5 - 40.5
1
2
3
5
4
3
2
n = ๏f = 20
Midpoint
Xm
8
13
18
23
28
33
38
8
26
54
115
112
99
76
f ยทXm
๏ f ยทXm = 490
๐ฅ =
๐ โ ๐๐
๐
=
490
20
Calculating the Mean from a Frequency Distribution
= 24.5
๐ฅ =
๐ฅ
๐
, ๐ =
๐ฅ
๐
, ๐ฅ =
๐โ ๐๐
๐
Use the frequency distribution to find the
mean, the modal class, and the mode.
The modal class is: 20.5 โ 25.5
The mode is the midpoint
of the modal class:
(20.5 + 25.5) / 2 = 23
15. Example 7: Grade-Point Average
Find a GPA for a college student who took 5 courses and received: A (3 credits), A (4
credits), B (3 credits), C (3 credits), and F (1 credit). The grading system assigns quality
points to letter grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0.
When different x data values are assigned different weights w, we can compute a
weighted mean.
To find the Weighted Mean of a variable:
Multiply each value by its corresponding weight and divide the sum of the
products by the sum of the weights.
15
Weighted Mean
3.1 Measures of Center
๐ฅ =
๐ค1๐1 + ๐ค2๐2 + โฏ + ๐ค๐๐๐
๐ค1 + ๐ค2 + โฏ + ๐ค๐
=
๐ค๐
๐ค
๐ฅ =
3(4) + 4(4) + 3(3) + 3(2) + 1(0)
3 + 4 + 3 + 3 + 1
Use the numbers of credits as
weights: w = 3, 4, 3, 3, 1.
Replace the letter grades of A,
A, B, C, and F with the
corresponding value quality
points: x = 4, 4, 3, 2, 0.
=
43
14
= 3.07
W X
3
4
3
3
1
4
4
3
2
0
16. 16
Weighted Mean
3.1 Measures of Center
Grade-Point Average
A student received the following grades. Find the corresponding GPA.
Course Credits, w Grade, X
English Composition 3 A (4 points)
Introduction to Psychology 3 C (2 points)
Biology 4 B (3 points)
Physical Education 2 D (1 point)
๐ฅ =
๐ค๐
๐ค
=
3 โ 4 + 3 โ 2 + 4 โ 3 + 2 โ 1
3 + 3 + 4 + 2
=
32
12
= 2.67
Example 8 ๐ฅ =
๐ค1๐1 + ๐ค2๐2 + โฏ + ๐ค๐๐๐
๐ค1 + ๐ค2 + โฏ + ๐ค๐
=
๐ค๐
๐ค
18. Properties of the Mean
๏Uses all data values.
๏Varies less than the median or mode
๏Used in computing other statistics, such as
the variance
๏Unique, usually not one of the data values
๏Cannot be used with open-ended classes
(A Class with no boundary, Examples: 20
or Less, 50 or more )
๏Affected by extremely high or low values,
called outliers
3.1 Measures of Center (Time)
18
Properties of the
Midrange
๏Easy to compute.
๏Gives the midpoint.
๏Affected by extremely
high or low values in a
data set
19. Properties of the Median
๏Gives the midpoint
๏Used when it is necessary to find
out whether the data values fall into
the upper half or lower half of the
distribution.
๏Can be used for an open-ended
distribution.
๏Affected less than the mean by
extremely high or extremely low
values.
3.1 Measures of Center
19
Properties of the Mode
๏Used when the most
typical case is desired
๏Easiest average to
compute
๏Can be used with
nominal data
๏Not always unique or
may not exist