2. Measures of Central Tendency
While distributions provide an overall picture of
a data set, it is sometimes desirable to
represent some property of the entire data set
using a single statistic.
The first descriptive statistics we will discuss are
those used to indicate where the ‘center’ of the
distribution lies.
The expected value
There are different measures of central
tendency, each with its own advantages and
disadvantages. We will limit the discussion to the
mode.
3. The Mode
The mode is simply the value of the relevant variable
that occurs most often (i.e., has the highest frequency) in
the data sample.
Note that if you have drawn a frequency histogram, you
can often identify the mode simply by finding the value
with the highest bar (since it has the highest
frequency).
The mode is the only measure of central tendency that is
also suitable for nominal data.
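The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper name `modes` and the sample values are assumptions made here, not part of the slides. It counts how often each value occurs and returns every value that reaches the highest frequency, so it works for nominal data (such as colour names) as well as for numbers.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency.

    `modes` is a hypothetical helper written for this illustration.
    """
    counts = Counter(values)  # frequency of each distinct value
    if not counts:
        return []  # an empty sample has no mode
    highest = max(counts.values())
    # Keep every value whose frequency equals the highest one,
    # in order of first appearance.
    return [value for value, count in counts.items() if count == highest]


# Nominal data: categories cannot be averaged, but they have a mode.
print(modes(["red", "blue", "red", "green"]))

# Numeric data with a tie: both values are reported.
print(modes([1, 2, 2, 3, 3]))
```

Returning a list rather than a single value keeps ties visible instead of silently picking one of them.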
4. Properties
If the random variable (or each value from the sample) is
subjected to the linear or affine transformation which
replaces X by aX+b, then the mean, median and
mode are transformed in the same way.
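A small check of this property, using Python's standard `statistics` module. The sample and the constants a = 3, b = 5 are arbitrary values chosen for illustration.

```python
import statistics

# Illustrative sample and transformation constants (assumptions).
data = [1, 2, 2, 3, 12]
a, b = 3, 5

# Apply X -> aX + b to every observation.
transformed = [a * x + b for x in data]

for name, stat in [("mean", statistics.mean),
                   ("median", statistics.median),
                   ("mode", statistics.mode)]:
    before = stat(data)
    after = stat(transformed)
    # The statistic of the transformed data equals a * (statistic) + b.
    print(name, before, after, after == a * before + b)
```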
In continuous unimodal distributions the median lies, as
a rule of thumb, between the mean and the mode, about
one third of the way going from mean to mode. In a
formula, median ≈ (2 × mean + mode)/3. This rule, due
to Karl Pearson, is useful but not always true;
in general the three statistics can appear in any order.
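To see how the rule of thumb behaves, here is a rough sketch using a log-normal distribution, for which all three statistics have closed forms: mean = exp(μ + σ²/2), median = exp(μ), mode = exp(μ − σ²). The choice of distribution and the parameters μ = 0, σ = 0.5 are assumptions made for this illustration.

```python
import math

# Illustrative log-normal parameters (assumptions).
mu, sigma = 0.0, 0.5

mean = math.exp(mu + sigma ** 2 / 2)
median = math.exp(mu)
mode = math.exp(mu - sigma ** 2)

# Pearson's rule of thumb: median is roughly (2 * mean + mode) / 3.
pearson_estimate = (2 * mean + mode) / 3

print(f"mode={mode:.4f} median={median:.4f} mean={mean:.4f}")
print(f"estimate={pearson_estimate:.4f} error={pearson_estimate - median:.4f}")
```

For this mildly skewed case the estimate lands close to the true median. The approximation is not guaranteed and can degrade for other distributions.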
For unimodal distributions, the mode is within √3 standard
deviations of the mean, and the root mean square
deviation about the mode is between the standard
deviation and twice the standard deviation.
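These bounds can be checked on a concrete unimodal case. The sketch below uses the exponential distribution with rate 1 (mean 1, standard deviation 1, mode 0), chosen here only as an example. It relies on the identity E[(X − mode)²] = σ² + (mean − mode)², which also shows why a gap of at most √3 standard deviations caps the root mean square deviation at twice the standard deviation.

```python
import math

# Exponential distribution with rate 1 (illustrative assumption).
mean, sd, mode = 1.0, 1.0, 0.0

# Distance between mean and mode, measured in standard deviations.
gap_in_sd = abs(mean - mode) / sd

# Root mean square deviation about the mode, from
# E[(X - mode)^2] = sd^2 + (mean - mode)^2.
rms_about_mode = math.sqrt(sd ** 2 + (mean - mode) ** 2)

print(gap_in_sd <= math.sqrt(3))          # mode within sqrt(3) sd of the mean
print(sd <= rms_about_mode <= 2 * sd)     # RMS deviation between sd and 2 sd
```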
5. Van Zwet condition
Van Zwet derived a sufficient condition for the
following inequality to hold.
The inequality Mode ≤ Median ≤ Mean
holds if F( Median - x ) + F( Median + x ) ≤ 1 for all x, where
F() is the cumulative distribution function of the distribution.
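A minimal numerical sketch for the exponential distribution with rate 1, chosen here as an illustration and not taken from the slides. It evaluates F( Median - x ) + F( Median + x ) on a grid of positive x and compares the result with the ordering of mode, median and mean. The grid and the helper name `exp_cdf` are assumptions.

```python
import math


def exp_cdf(x: float) -> float:
    """CDF of the exponential distribution with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


# Known values for this distribution.
mode = 0.0
median = math.log(2.0)
mean = 1.0

# Evaluate F(median - x) + F(median + x) on a grid of x in (0, 10].
xs = [i * 0.01 for i in range(1, 1001)]
sums = [exp_cdf(median - x) + exp_cdf(median + x) for x in xs]

# For this right-skewed distribution the sum never exceeds 1 on the grid...
sum_at_most_one = all(s <= 1.0 for s in sums)
# ...and the three statistics are ordered mode <= median <= mean.
ordered = mode <= median <= mean

print(sum_at_most_one, ordered)
```

A grid check like this only illustrates the condition for one distribution; it is not a proof.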
6. Mode
Advantages
Very quick and easy to determine
Is an actual value of the data
Not affected by extreme scores
Disadvantages
Sometimes not very informative (e.g. cigarettes
smoked in a day)
Can change dramatically from sample to sample
Might be more than one (e.g. bimodal distributions).
Which of the modes is more representative cannot
be ascertained.
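Two of these points, insensitivity to extreme scores and the possibility of several modes, can be illustrated with the standard `statistics` module (`multimode` requires Python 3.8 or later). The scores below are made-up values for illustration only.

```python
import statistics

# Made-up sample with two equally frequent values (bimodal).
scores = [2, 3, 3, 4, 4, 5]
# The same sample with one extreme score added.
with_outlier = scores + [500]

# The mean shifts strongly when the extreme score is added...
print(statistics.mean(scores), statistics.mean(with_outlier))

# ...while the modes stay the same, and both are reported.
print(statistics.multimode(scores))
print(statistics.multimode(with_outlier))
```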