It is a compromise in that if you drop none of the data you get an ordinary mean, while if you drop all the data but one value, you get the median.
1.
Introduction to Statistics for Built
Environment
Course Code: AED 1222
Compiled by
DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED)
CENTRE FOR FOUNDATION STUDIES (CFS)
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
2.
Lecture 7
Measures of central tendency
Today’s Lecture:
Measures of central tendency for grouped and
ungrouped data:
The arithmetic mean/trimmed mean
The median
The mode
Summary of comparative characteristics
3.
What is/are Measures of Central Tendency?
●Usually called the average; its purpose is to
summarize in a single value the typical size, middle
property, or central location of a set of values.
Measures of Central Tendency
●A measure of central tendency is a single value
situated at the centre of a data set that can be taken
as a summary value for that data set.
●The three most common measures of central
tendency are the mean, median and mode.
5.
The arithmetic mean
●When people use the word average, they are usually
referring to the arithmetic mean.
●The arithmetic mean is the most commonly used measure
of central tendency.
●The mean is the sum of all scores/data divided by the
number of scores/data.
●It is often regarded as the best single number to describe
a group of scores.
●Denoted µ (mu) for the population mean and x̄ (x-bar) for
the sample mean.
What is/are Mean?
µ, x̄
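The definition above (sum of all scores divided by the number of scores) can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` and the scores are made up and do not come from the lecture.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical scores, invented for illustration only.
scores = [4, 8, 6, 5, 7]
mean_score = arithmetic_mean(scores)
print(mean_score)  # 6.0
```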
9.
●The same process applies in principle to grouped data.
●However, since the compression of data in a
frequency table results in the loss of actual values of
the observations in each class, it becomes necessary
to make an assumption about these values.
The assumption is that every observation in a class
has a value equal to the class midpoint.
• Computing the Mean for grouped data
The arithmetic mean cont.
10.
The arithmetic mean cont.
• Computing the Mean for grouped data
Example:
No. of Liters sold         No. of sales staff (f)   Class midpoint (m)   fm
80 and less than 90         2                        85                   170
90 and less than 100        6                        95                   570
100 and less than 110      10                       105                  1050
110 and less than 120      14                       115                  1610
120 and less than 130       9                       125                  1125
130 and less than 140       7                       135                   945
140 and less than 150       2                       145                   290
Total                      Σf = 50                                       Σfm = 5760
Formula:
$\bar{x} = \dfrac{\sum fm}{\sum f} = \dfrac{5760}{50} = 115.2$ Liters sold
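As a check on the worked example, the grouped-data mean can be sketched in Python under the stated assumption that every observation in a class equals the class midpoint. The tuple layout `(lower limit, upper limit, frequency)` and the name `grouped_mean` are choices made for this sketch, not part of the lecture.

```python
# Classes from the liters-sold table: (lower limit, upper limit, frequency f).
classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]


def grouped_mean(classes):
    """Estimate the mean as (sum of f * midpoint) / (sum of f)."""
    total_f = sum(f for _, _, f in classes)
    total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fm / total_f


print(grouped_mean(classes))  # 115.2
```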
11.
●The mean is a good measure for roughly symmetric
distributions.
●Can be misleading in skewed distributions since it can
be greatly influenced by extreme values (outliers), and
thus it is not the most appropriate measure of central
tendency for very skewed distributions.
●This problem associated with the calculation of the
arithmetic mean can be overcome by relying on a slightly
modified measure of central tendency: the trimmed
mean.
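A small sketch of how one extreme value can pull the mean while leaving the median almost unchanged. The rent figures are hypothetical and chosen only to make the effect visible; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical monthly rents; the appended 5000 is a deliberate outlier.
rents = [500, 520, 540, 560, 580]
rents_with_outlier = rents + [5000]

# Without the outlier the mean and median agree.
print(mean(rents), median(rents))

# With the outlier the mean more than doubles, the median barely moves.
print(mean(rents_with_outlier), median(rents_with_outlier))
```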
The arithmetic mean cont.
12.
The trimmed mean
●The trimmed mean is calculated by “trimming” or dropping the
smallest and largest numbers from the data set and calculating
the mean of the remaining numbers.
There is no rule determining the number of values to be
trimmed. This rather depends on the data available.
For example, a 5% trimmed mean would be calculated by
dropping the smallest 5% and the largest 5% of the data set and
computing the mean for the remaining 90% of the original data.
●The trimmed mean is a compromise between the arithmetic
(ordinary) mean and the median. Why?
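The trimming procedure described above can be sketched as follows. The data are hypothetical, the name `trimmed_mean` is invented for this sketch, and rounding the number of dropped values down is an assumption, since the lecture gives no rule for it.

```python
def trimmed_mean(values, proportion):
    """Drop `proportion` of the values from each end, then average the rest."""
    if not values:
        raise ValueError("the trimmed mean of an empty data set is undefined")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be at least 0 and less than 0.5")
    ordered = sorted(values)
    # Number of values dropped from each end (rounded down: an assumption).
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 98]
print(trimmed_mean(data, 0.0))   # 16.0, the ordinary mean
print(trimmed_mean(data, 0.10))  # 7.5, smallest and largest value dropped
print(trimmed_mean(data, 0.40))  # 7.5, only the two middle values remain
```

With nothing trimmed the result is the ordinary mean; with everything trimmed except the middle of the data, the result coincides with the median, which is why the trimmed mean sits between the two.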
13.
The median
●The median is a measure of central tendency that
occupies the middle position in an ordered array of values.
Half (50%) the data items fall below the median,
and another half (50%) are above that value.
●The median position (not the median value) can be
found using the formula: i=(n+1)/2 or i=(1/2)n
where ‘n’ is the number of observations or values in a
data set.
What is/are Median?
17.
●If the formula gives a non-integer position, we take
the average of the two values on either side of that
position.
For example:
n=18,
based on the formula, the median position is:
i=(18+1)/2=9.5,
in this case we take the average of the 9th and 10th
values as the median of the data set.
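The position rule can be sketched in Python as below. The function name `median_by_position` and both data sets are hypothetical; the second data set simply has n = 18 so that the position comes out as 9.5.

```python
def median_by_position(values):
    """Median via the position formula i = (n + 1) / 2 (positions start at 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the value at the middle position is the median.
        return ordered[int(position) - 1]
    # Even n: average the values just below and just above the position.
    below = ordered[int(position) - 1]
    above = ordered[int(position)]
    return (below + above) / 2


odd_data = [7, 3, 9, 5, 1]        # hypothetical, n = 5
even_data = list(range(1, 19))    # hypothetical, n = 18
print(median_by_position(odd_data))   # 5
print(median_by_position(even_data))  # 9.5
```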
The median cont.
More Example:
19.
• Computing the Median for grouped data
The median cont.
●Since the actual values of a data set are lost
when a distribution is constructed, it is only
possible to approximate the median value for
grouped data.
●The median for grouped data can be estimated
using the following formula:
20.
Computing the median for grouped data cont.
Formula:
$\text{Med} = B_l + \left(\dfrac{\frac{n}{2} - cf_p}{f_m}\right) i$
Where:
Bl = lower boundary of class containing median
n = sample size
cfp = cumulative frequency of classes preceding
class containing the median
fm = number of observations in class containing
the median
i = width of the interval containing the median
21.
Computing the median for grouped data cont.
No. of Liters sold         No. of sales staff (f)   Cumulative frequency (cf)
80 and less than 90         2                        2
90 and less than 100        6                        8
100 and less than 110      10                       18
110 and less than 120      14                       32
120 and less than 130       9                       41
130 and less than 140       7                       48
140 and less than 150       2                       50
Compute the median for the above data set.
22.
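Answer:
$\text{Med} = B_l + \left(\dfrac{\frac{n}{2} - cf_p}{f_m}\right) i$
$= 110 + \left(\dfrac{\frac{50}{2} - 18}{14}\right) 10$
$= 110 + \left(\dfrac{7}{14}\right) 10$
$= 110 + (0.5)(10)$
$= 110 + 5$
$= 115$ Liters sold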
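The same calculation can be sketched in Python. The median class is taken here as the first class whose cumulative frequency reaches n/2, which is an assumption of this sketch; the tuple layout and the name `grouped_median` are likewise invented for illustration.

```python
# Classes from the liters-sold table: (lower boundary, upper boundary, frequency f).
classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]


def grouped_median(classes):
    """Estimate the median as Bl + ((n/2 - cfp) / fm) * i."""
    n = sum(f for _, _, f in classes)
    cf_preceding = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf_preceding + f >= n / 2:
            width = upper - lower
            return lower + (n / 2 - cf_preceding) / f * width
        cf_preceding += f
    raise ValueError("no classes given")


print(grouped_median(classes))  # 115.0
```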
23.
The mode
●The mode is the most commonly occurring value
in a data set.
A distribution may have one mode, two modes
(bimodal) or more modes (multimodal). It is also
possible for a distribution to have no mode.
●The mode may be an important measure to a
clothing manufacturer who must decide how many
dresses of each size to make. What is the most
commonly purchased size?
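A minimal sketch of finding the mode of ungrouped data with `multimode` from Python's standard `statistics` module (available from Python 3.8). The dress sizes are hypothetical. Note that `multimode` returns every value tied for the highest count, so when all values occur equally often it returns all of them; treating that case as "no mode" is a convention the reader would have to apply separately.

```python
from statistics import multimode

# Hypothetical dress sizes purchased; size 12 occurs most often.
sizes = [10, 12, 12, 14, 12, 10, 16, 14, 12]
print(multimode(sizes))  # [12]

# Hypothetical bimodal data: 1 and 2 both occur twice.
bimodal = [1, 1, 2, 2, 3]
print(multimode(bimodal))  # [1, 2]
```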
What is/are Mode?
26.
Estimating the mode for grouped data
●When actual data values are unknown, the class
in a distribution with the largest frequency is often
referred to as the modal class.
●The mode may then be defined to be the midpoint
of that class.
●If two or more classes share the distinction of
having the largest frequency, then there are two or
more midpoint values representing two or more
modes.
27.
Computing the mode for grouped data cont.
Formula:
$\text{Mode} = L + \left(\dfrac{f_0 - f_1}{(f_0 - f_1) + (f_0 - f_2)}\right) c$
Where:
L = lower boundary of class containing the mode
f0 = frequency of class containing the mode
f1 = frequency of class preceding the class containing
the mode
f2 = frequency of class after the class containing the
mode
c = size of the class containing the mode
28.
Estimating the mode for grouped data cont.
No. of Liters sold         No. of sales staff (f)   Cumulative frequency (cf)
80 and less than 90         2                        2
90 and less than 100        6                        8
100 and less than 110      10                       18
110 and less than 120      14                       32
120 and less than 130       9                       41
130 and less than 140       7                       48
140 and less than 150       2                       50
Estimate the mode for the above data
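One way to work this exercise is sketched below, using the grouped-data formula Mode = L + ((f0 - f1) / ((f0 - f1) + (f0 - f2))) × c. The result is computed here from the table and is not an answer taken from the lecture. Two choices are assumptions of this sketch: a missing neighbouring class is given frequency 0, and only the first class with the largest frequency is used.

```python
# Classes from the liters-sold table: (lower boundary, upper boundary, frequency f).
classes = [
    (80, 90, 2), (90, 100, 6), (100, 110, 10), (110, 120, 14),
    (120, 130, 9), (130, 140, 7), (140, 150, 2),
]


def grouped_mode(classes):
    """Estimate the mode as L + ((f0 - f1) / ((f0 - f1) + (f0 - f2))) * c."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # first class with the largest frequency
    lower, upper, f0 = classes[k]
    f1 = freqs[k - 1] if k > 0 else 0               # preceding class (0 if none)
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # following class (0 if none)
    return lower + (f0 - f1) / ((f0 - f1) + (f0 - f2)) * (upper - lower)


modal_lower, modal_upper, _ = max(classes, key=lambda c: c[2])
print((modal_lower + modal_upper) / 2)  # 115.0, midpoint of the modal class
print(round(grouped_mode(classes), 2))  # 114.44, from the formula
```

The modal class is 110 and less than 120, with f0 = 14, f1 = 10, f2 = 9 and c = 10, so the formula gives 110 + (4/9)(10), slightly below the class midpoint of 115.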
32.
Which measure to use?
●Not all measures are appropriate for all kinds of variables.
●Nominal data (e.g. gender, race): the mode is the only valid
measure.
●Ordinal data (e.g. salary categories): the mode and median can be
used.
• When to use the arithmetic mean?
– Generally the best measure for continuous data that are roughly symmetric.
• When to use the median?
– When you know that a distribution is skewed.
– When you have a small number of subjects.
• When to use the mode?
– Only when describing discrete categorical data.
34.
Summary of comparative characteristics
The arithmetic mean:
1. It is the most familiar and most widely used measure.
2. It is a measure that is affected by the value of every
observation in the data set.
3. Its value may be distorted by a relatively small number of
extreme values (outliers), and thus it can lose its
representative quality in badly skewed data. The
trimmed mean can help overcome this problem.
4. It cannot be computed from a frequency distribution
with an open-ended class.
35.
The median:
1. It is easy to define and easy to understand.
2. It is affected by the number of observations but not by
the magnitude of extreme observations. Thus extremely high
or low values (outliers) do not distort the median.
3. It is frequently used in badly skewed distributions.
4. It may be computed in an open-ended distribution,
since the median value is located in the median class
interval, which is highly unlikely to be an open-ended
interval.
Summary of comparative characteristics
36.
The mode:
1. It is generally a less widely used measure than the
mean and median.
2. It may not exist in some sets of data, or there may be
more than one mode in other data sets.
3. It is not affected by extreme values (outliers) in a
distribution.
Summary of comparative characteristics
37.
Next class…
The following topics will be discussed:
Measures of variability / dispersion (Part I):
The range
Quartiles & the Interquartile range
Percentiles
The five number summary