2. WHAT IS A DISTRIBUTION?
Piece Distribution
• Location (typical value)
• Spread (span)
• Shape (symmetrical or skewed)
3. NUMERICAL MEASURES
Measures of central tendency – mean,
median and mode
Measures of dispersion / variability / spread –
range and standard deviation
Skewness and Kurtosis
4. Population: the collection of all
elements we wish to study
Sample: a part of the population
The measures in Descriptive Statistics are
defined for the population distribution,
but they are typically computed on a
sample distribution to estimate their
values for the population
5. MEASURES OF CENTRAL
TENDENCY
Mean- the ‘arithmetic’ centre
Median – the ‘positional’ centre
Mode – the ‘most frequently occurring’
centre
6. The Arithmetic Mean
The AM or simply the Mean is the arithmetic average
of all the data values in the distribution
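Mean for the population is denoted by ‘μ’
Mean for the sample is denoted by ‘x̄’
For ungrouped data:
μ = ΣX / N ; x̄ = ΣX / n
where N is the size of the population and n is the size of the sample
Grouped data:
μ = ΣfX / N ; x̄ = ΣfX / n
where f is the frequency of each class and X is its mid-point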
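The two formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names and the data values are made up, and the grouped version assumes X is the mid-point of each class.

```python
def mean_ungrouped(values):
    """Arithmetic mean of raw values: sum of X divided by the count."""
    return sum(values) / len(values)


def mean_grouped(midpoints, frequencies):
    """Mean of grouped data: sum of f*X divided by the total frequency."""
    total_frequency = sum(frequencies)
    weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_sum / total_frequency


# Made-up sample values, for illustration only.
sample = [4, 8, 15, 16, 23, 42]
print(mean_ungrouped(sample))  # 18.0

# Hypothetical classes 0-10, 10-20, 20-30 with mid-points 5, 15, 25.
print(mean_grouped([5, 15, 25], [2, 5, 3]))  # 16.0
```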
7. Advantages and Disadvantages of
the Mean
The pluses:
It is easily understood
It is defined for all distributions and is uniquely
defined
It considers all the data values in the distribution
Means of several groups can be combined, because the
mean is computed arithmetically
Among the measures of central tendency, it has the
least sampling variability
The minuses:
It cannot be used for qualitative data, and for discrete data it may
give a value that cannot actually occur (eg 2.4 children)
It is affected by extreme values
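Two of the points above, combining means and sensitivity to extreme values, can be illustrated with a short sketch. The helper name `combined_mean` and all data values are hypothetical; the combination rule used is the usual size-weighted average of the group means.

```python
from statistics import mean, median


def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of two groups pooled together, from their means and sizes."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


# Made-up groups: combining their means matches the mean of the pooled data.
group_a = [10, 12, 14]
group_b = [20, 30]
pooled = combined_mean(mean(group_a), len(group_a), mean(group_b), len(group_b))
print(pooled, mean(group_a + group_b))

# Made-up salaries: one extreme value pulls the mean far more than the median.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]
print(mean(salaries), median(salaries))
print(mean(with_outlier), median(with_outlier))
```

With the extreme value added, the mean rises from 35 to roughly 95.8, while the median only moves from 35 to 36.5.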
8. Points to note
The mean is in the same units as the
data
The mean can be a fraction
The mean does not denote the “most
likely” value; it denotes that if the
process continues in the same way, the
total will be the same as if every data
value were equal to the mean
9. The Median
The Median is the positional centre of the distribution;
it is called the “positional average”
It is the value of the middle item after arranging the
data in ascending or descending order
For ungrouped data:
Median = the ((n+1)/2)th item in the ordered array; when n is even,
this is the average of the two middle items
For grouped data:
Median= L + (n/2 – pcf)* w/f,
where L is the lower limit of the median class, w is class
width, pcf is the cumulative frequency of the previous
class and f is the frequency of the median class
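A minimal Python sketch of both median calculations follows. The function names and data are illustrative assumptions; the grouped version assumes equal class widths and takes n as the total frequency, and it locates the median class as the first class whose cumulative frequency reaches n/2.

```python
def median_ungrouped(values):
    """Middle item of the ordered data; average of the two middle items if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_grouped(lower_limits, frequencies, width):
    """Median = L + (n/2 - pcf) * w / f, using the median class."""
    n = sum(frequencies)
    pcf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and pcf + f >= n / 2:
            return lower + (n / 2 - pcf) * width / f
        pcf += f
    return None  # only reached for empty data


print(median_ungrouped([7, 3, 9, 1, 5]))  # 5
print(median_ungrouped([7, 3, 9, 1]))     # 5.0

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
print(median_grouped([0, 10, 20, 30], [5, 8, 12, 5], 10))
```

In the grouped example n = 30, so the median class is 20-30 (pcf = 13, f = 12) and the result is 20 + (15 - 13) × 10/12, about 21.67.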
10. Advantages and Disadvantages of
the Median
The pluses:
It is defined for all distributions and is rigidly defined
It is not affected by extreme values
It can be used for qualitative (ordinal) data
It is preferred to the mean when we are interested in
the positional centre and not the arithmetic accuracy
which the mean gives
The minuses:
Median is not arithmetically defined, hence medians
cannot be combined without re-ordering the data
It has a higher sampling variability than the mean
11. Points to note
The median is in the same units as the
data
It denotes the middlemost value: 50% of
the data values are below the median
and 50% of the data values are above
the median
12. The Mode
The mode is the “most frequently occurring” data
point in the distribution. It is the “most popular” centre
of the distribution.
It is the data value with the maximum likelihood.
For ungrouped data,
Mode = the most repeated value
For grouped data,
Mode = L + d1/(d1 + d2) * w,
where L is the lower limit of the modal class (the class with
the highest frequency),
d1 = frequency of modal class – frequency of previous
class, d2 = frequency of modal class – frequency of
next class, w = class width
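The mode calculations can be sketched as below. The function names and data are hypothetical. Two assumptions are not in the slides: a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, and when several classes tie for the highest frequency the first one is used.

```python
from collections import Counter


def modes_ungrouped(values):
    """All values that share the highest count (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def mode_grouped(lower_limits, frequencies, width):
    """Mode = L + d1 / (d1 + d2) * w, using the modal class."""
    i = frequencies.index(max(frequencies))  # first class with the top frequency
    f_prev = frequencies[i - 1] if i > 0 else 0                      # assumption: 0 if absent
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # assumption: 0 if absent
    d1 = frequencies[i] - f_prev
    d2 = frequencies[i] - f_next
    return lower_limits[i] + d1 / (d1 + d2) * width


print(modes_ungrouped([2, 3, 3, 5, 5, 7]))      # [3, 5]
print(modes_ungrouped(["red", "blue", "red"]))  # ['red']

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
print(mode_grouped([0, 10, 20, 30], [5, 8, 12, 5], 10))
```

In the grouped example the modal class is 20-30, d1 = 4 and d2 = 7, giving 20 + 4/11 × 10, about 23.64. The first example also shows a distribution with two modes.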
13. Advantages and Disadvantages of
the Mode
The pluses:
It can be used for qualitative (both nominal and
ordinal scales) as well as quantitative data
It is used to find the most likely/ most popular data
(eg in the commercial sense)
It is not affected by extreme values in the distribution
The minuses:
It is not rigidly defined for a distribution: a distribution
may not have a mode or may have more than one
mode
The mode is not arithmetically defined, hence modes
cannot be combined
The sampling variability is higher than that of the mean or
the median
14. Points to note
The mode is in the same units as the
data
It denotes the most likely value – the
value with the highest probability of
occurring
15. Relationship between the Mean,
Median and Mode
The values of Mean, Median and Mode can tell us the shape of
the distribution:
In a symmetrical (single-peaked) distribution, Mean = Median = Mode
In a positively skewed distribution, typically Mode < Median < Mean
In a negatively skewed distribution, typically Mean < Median < Mode
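The ordering of the three measures can be checked on a small data set. The sketch below uses made-up values and a hypothetical helper, `shape_hint`, that compares only the mean and the median. It is a rough rule of thumb, not a formal test of skewness, and exact equality of mean and median is rare in real data.

```python
from statistics import mean, median, mode


def shape_hint(values):
    """Rough shape guess from the mean-median ordering (rule of thumb only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "symmetrical"


# Made-up data with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
print(mode(right_skewed), median(right_skewed), mean(right_skewed))
print(shape_hint(right_skewed))     # positively skewed
print(shape_hint([1, 2, 3, 4, 5]))  # symmetrical
```

For the right-skewed sample the mode is 2, the median 3 and the mean 4.6, matching Mode < Median < Mean.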
16. Other Measures of Central
Tendency
Weighted Mean: is used when the data values differ in their
relative importance (weight) in the total
Weighted mean = Σ(wi × Xi) / Σwi
Geometric Mean: is used to calculate the average
rate of change over several periods (for growth rates, also called CAGR)
GM = nth root of the product of the n growth factors (1 + rate);
the average rate of change is GM – 1
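Both measures can be sketched in a few lines of Python. The function names and figures are illustrative assumptions; the geometric mean version assumes each period's change is given as a rate (0.10 for +10%) and is converted to a growth factor 1 + rate before multiplying.

```python
import math


def weighted_mean(values, weights):
    """Sum of w*X divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def average_growth_rate(rates):
    """Geometric mean of the growth factors (1 + rate), minus 1."""
    factors = [1 + r for r in rates]
    return math.prod(factors) ** (1 / len(factors)) - 1


# Made-up scores 80, 90, 70 with weights 2, 3, 5.
print(weighted_mean([80, 90, 70], [2, 3, 5]))  # 78.0

# +100% followed by -50% leaves the value unchanged, so the average
# rate of change is 0, although the arithmetic mean of the rates is 25%.
print(average_growth_rate([1.0, -0.5]))  # 0.0
```

The second example shows why the geometric mean, rather than the arithmetic mean, is the appropriate average for compounded rates.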
17. Mid-range: is used to quickly estimate the mean
value when a large number of data points lie in a
small interval. It is typically used for calculating
average daily share price
Mid-range = (Min value + Max value)/2
It is affected by outliers
Mid-hinge: is used to estimate the mean by
considering the middle half of the data
Mid-hinge = (Q1 + Q3)/2, where Q1 is the first quartile
and Q3 is the third quartile
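A short sketch of both measures, using made-up data. The quartiles here come from Python's `statistics.quantiles` with its default "exclusive" method, which places Q1 and Q3 at positions (n+1)/4 and 3(n+1)/4 in the ordered data; other quartile conventions would give slightly different values.

```python
from statistics import quantiles


def mid_range(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


def mid_hinge(values):
    """Average of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # default method is 'exclusive'
    return (q1 + q3) / 2


data = [2, 4, 6, 8, 10, 12, 14]
print(mid_range(data), mid_hinge(data))  # 8.0 8.0

# Replacing the largest value with an outlier moves the mid-range only.
with_outlier = [2, 4, 6, 8, 10, 12, 140]
print(mid_range(with_outlier), mid_hinge(with_outlier))  # 71.0 8.0
```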
18. Other Measures of Location-
Fractiles
Fractiles are positional values dividing the ordered distribution into
specified fractions. Fractiles are calculated similarly to the median
The first Quartile Q1 divides the ordered data in the ratio 25:75.
The third Quartile Q3 divides the ordered data in the ratio 75:25
Q1 is the ((n+1)/4)th value in the ordered data. In grouped data,
Q1 = L + (n/4 – pcf)*w/f, where L is the lower limit of the Q1 class,
pcf is the cumulative frequency of the class previous to the Q1 class,
w is the class width and f is the frequency of the Q1 class
Q3 is the (3(n+1)/4)th value in the ordered data. In grouped data,
Q3 = L + (3n/4 – pcf)*w/f, where L, pcf, w and f mean the same with
reference to the Q3 class
Deciles and percentiles are calculated similarly
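Because quartiles, deciles and percentiles share one pattern, a single function per data type can cover them all. The sketch below is illustrative: the function names and data are made up, the grouped version assumes equal class widths, and the ungrouped version uses linear interpolation when the position is not a whole number, which the slides do not specify.

```python
def fractile_ungrouped(values, fraction):
    """Value at position fraction*(n+1) in the ordered data (1-based)."""
    ordered = sorted(values)
    position = fraction * (len(ordered) + 1)
    lower = int(position)
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    weight = position - lower  # assumption: interpolate between neighbours
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])


def fractile_grouped(lower_limits, frequencies, width, fraction):
    """L + (fraction*n - pcf) * w / f, using the class containing the fractile."""
    n = sum(frequencies)
    target = fraction * n
    pcf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and pcf + f >= target:
            return lower + (target - pcf) * width / f
        pcf += f
    return None  # only reached for empty data


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(fractile_ungrouped(data, 0.25), fractile_ungrouped(data, 0.75))  # 6.0 16.0

# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5.
limits, freqs = [0, 10, 20, 30], [5, 8, 12, 5]
print(fractile_grouped(limits, freqs, 10, 0.25))  # 13.125
print(fractile_grouped(limits, freqs, 10, 0.75))
```

Passing 0.5 reproduces the median, and fractions such as 0.1 or 0.9 give deciles in the same way.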