Descriptive Analysis 
Group C 
MBM 1st Semester 
NCC
Agenda
• Measures of central tendency
• Measures of dispersion
• Skewness and kurtosis
• Five-number summary
• Box-and-whisker plot
Measures of Central Tendency 
Introduction
• A single value that represents a whole set of data
• It lies towards the middle of the distribution.
Various Measures of Central Tendency 
• Mean or Average
– In its simplest (arithmetic) form, the mean is the sum of all the observations divided by the number of observations. Other forms of the mean are the geometric mean (G.M.), harmonic mean (H.M.), combined mean (C.M.) and weighted mean (W.M.).
• Median
– The median is the positional average: the middle value of a series of n observations arranged in ascending or descending order of magnitude.
• Mode
– The variate value that occurs most frequently is known as the mode. It is denoted by Mo.
Arithmetic Mean (or simply mean)
Individual Series
Direct Method: X̅ = ΣX/n
n = number of observations
ΣX = sum of all observations
Short Cut Method: X̅ = a + Σd/n
a = assumed mean
d = deviation from the assumed mean = X − a
n = total number of observations
Discrete Series
Direct Method: X̅ = ΣfX/N
Short Cut Method: X̅ = a + Σfd/N
where f = frequency of X, N = Σf = total frequency, d = X − a
• Continuous Series: The formula for continuous 
series is the same as for discrete series, the only 
difference is that the middle value of a class is to 
be taken as X in case of continuous series.
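The mean formulas for the three types of series can be checked with a short Python sketch. The individual-series values and the class intervals below are illustrative assumptions, not data from the slides; the discrete series reuses the X and f columns of Set A from the skewness example.

```python
# Individual series: direct vs. short-cut method (illustrative data)
xs = [12, 15, 18, 20, 25]
n = len(xs)
direct = sum(xs) / n                        # X̅ = ΣX/n

a = 15                                      # assumed mean (any convenient value)
shortcut = a + sum(x - a for x in xs) / n   # X̅ = a + Σd/n, with d = X - a

# Discrete series: X̅ = ΣfX/N, with N = Σf
values = [10, 15, 20, 25, 30, 35]
freqs = [5, 15, 30, 30, 15, 5]
N = sum(freqs)
discrete_mean = sum(f * x for x, f in zip(values, freqs)) / N

# Continuous series: same formula, with the class mid-value taken as X
classes = [(0, 10), (10, 20), (20, 30)]     # hypothetical class intervals
class_freqs = [2, 5, 3]
mids = [(lo + hi) / 2 for lo, hi in classes]
continuous_mean = sum(f * m for m, f in zip(mids, class_freqs)) / sum(class_freqs)

print(direct, shortcut, discrete_mean, continuous_mean)
```

Both methods give the same mean for the individual series, whichever assumed mean a is chosen.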
Median
• The observation at the centre of an ordered data set
• Suitable for averaging qualitative data that can be ranked
• Appropriate for data classified into open-ended classes.
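Method of calculation
Individual Series
• Arrange the data in ascending or descending order
• Md = value of the ((n+1)/2)th item
Discrete Series
• Calculate the cumulative frequencies (c.f.)
• Md = value of the ((N+1)/2)th item, i.e. the variate whose c.f. first reaches (N+1)/2, where N = Σf
Continuous Series
• Classes should be exclusive
• Median class = the class containing the (N/2)th item
• Md = L + ((N/2 − c.f.)/f) × h
where L = lower limit of the median class, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class, h = width of the median class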
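The three median rules can be sketched in Python as follows. The function names and the discrete and continuous data are hypothetical, chosen only to illustrate the steps. The individual series is the one used later in the box-whisker example.

```python
from statistics import median

# Individual series: middle item of the ordered data
xs = [17, 18, 19, 21, 24, 26, 27]
md_individual = median(xs)

def discrete_median(values, freqs):
    """Value whose cumulative frequency first reaches (N + 1) / 2."""
    N = sum(freqs)
    target = (N + 1) / 2
    cf = 0
    for x, f in sorted(zip(values, freqs)):
        cf += f
        if cf >= target:
            return x

def grouped_median(classes, freqs):
    """Md = L + (N/2 - c.f.) / f * h for exclusive classes given as (lower, upper)."""
    N = sum(freqs)
    cf = 0                                   # c.f. of the classes before the current one
    for (lo, hi), f in zip(classes, freqs):
        if cf + f >= N / 2:                  # this is the median class
            return lo + (N / 2 - cf) / f * (hi - lo)
        cf += f

md_discrete = discrete_median([1, 2, 3, 4], [3, 5, 4, 1])
md_grouped = grouped_median([(0, 10), (10, 20), (20, 30), (30, 40)], [5, 15, 20, 10])
print(md_individual, md_discrete, md_grouped)
```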
Mode
• Dictionary meaning: “most used”
• The value that occurs most often, i.e. with the greatest frequency
Mode by type of series
• Individual: the value which has the maximum repetition.
• Discrete: the value of the variate which has the maximum frequency.
• Continuous: the mode lies in the modal class, i.e. the class with the maximum frequency.
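A minimal Python sketch of the mode for individual and discrete series. The individual-series values are an illustrative assumption; the discrete series reuses the X and f columns of Set B from the skewness example.

```python
from statistics import multimode

# Individual series: the value repeated most often
xs = [2, 3, 3, 5, 3, 7, 5]
modes = multimode(xs)            # list of every value tied for the highest count

# Discrete series: the variate with the maximum frequency
values = [10, 15, 20, 25, 30, 35]
freqs = [5, 20, 15, 45, 10, 5]
discrete_mode = values[freqs.index(max(freqs))]

print(modes, discrete_mode)
```

multimode returns a list so that a series with more than one mode is reported honestly rather than reduced to a single value.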
What are measures of dispersion?
• Measures of central tendency do not reveal the variability present in the data.
• Dispersion is the scatter of the data series around its average.
• Dispersion is the extent to which the values in a distribution differ from the average of the distribution.
Why do we need measures of dispersion? (Significance)
• To determine the reliability of an average
• To serve as a basis for the control of variability
• To compare the variability of two or more series
• To facilitate the use of other statistical measures.
Methods of Measuring Dispersion
1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation
These are called absolute measures of dispersion. Absolute measures are expressed in the units in which the data are collected.
• Range 
The range is the simplest possible measure of 
dispersion and is defined as the difference between 
the largest and smallest values of the variable. 
In symbols, 
Range = L – S. 
Where, L = Largest value. 
S = Smallest value. 
Usually used in combination with other measures of 
dispersion.
Relative measure of dispersion is the ratio of a 
measure of dispersion to an appropriate average 
from which deviations were measured. 
The important relative measures of dispersion are 
• Coefficient of Range 
• Coefficient of Quartile Deviation 
• Coefficient of Mean Deviation 
• Coefficient of Standard Deviation
• In individual observations and discrete series, L and S are easily identified. In continuous series, either of the following two methods is used:
Method 1:
L = Upper boundary of the highest class
S = Lower boundary of the lowest class.
Method 2:
L = Mid value of the highest class.
S = Mid value of the lowest class.
Coefficient of Range:
• Range is an absolute measure, so it cannot be used to compare two distributions with different units.
• For comparing such distributions, the coefficient of range is used.
• In symbols, Coefficient of Range = (L − S) / (L + S)
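As a quick check of the two formulas, here is a small Python sketch for an individual series. The data are the seven values used later in the box-whisker example; the variable names are illustrative.

```python
xs = [17, 18, 19, 21, 24, 26, 27]

L = max(xs)                          # largest value
S = min(xs)                          # smallest value

data_range = L - S                   # absolute measure, in the units of the data
coeff_range = (L - S) / (L + S)      # relative measure, unit-free

print(data_range, round(coeff_range, 4))
```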
• Quartile Deviation
Quartile deviation is half of the difference between the third and first quartiles. Hence, it is also called the semi-interquartile range.
In symbols, among the quartiles Q1, Q2 and Q3, the range Q3 − Q1 is called the interquartile range and (Q3 − Q1)/2 is the quartile deviation or semi-interquartile range.
Coefficient of Quartile Deviation: The relative measure based on the lower and upper quartiles is known as the coefficient of Q.D.
Coefficient of Q.D. = (Q3 − Q1)/(Q3 + Q1)
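A minimal Python sketch of the quartile deviation and its coefficient. It assumes the quartiles are located at the ((n+1)/4)th and (3(n+1)/4)th items, which is what the "exclusive" method of statistics.quantiles does; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

data = [17, 18, 19, 21, 24, 26, 27]

# Q1, Q2 (median) and Q3 at the i(n+1)/4-th positions of the ordered data
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

iqr = q3 - q1                        # interquartile range
qd = (q3 - q1) / 2                   # quartile deviation (semi-interquartile range)
coeff_qd = (q3 - q1) / (q3 + q1)     # coefficient of quartile deviation

print(q1, q3, iqr, qd, round(coeff_qd, 4))
```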
Mean Deviation
Measures the ‘average’ distance of the observations from a central value of the data.
The deviations can be taken from the A.M., the median or the mode.
Generally more sensitive than the range or interquartile range, since a change in any value will affect it.
• Formulas for calculating Mean Deviation
~ Mean Deviation from Mean = Σf|X − X̅|/N
~ Mean Deviation from Median = Σf|X − Median|/N
~ Mean Deviation from Mode = Σf|X − Mode|/N
where N = Σf.
Note: In an individual series there are no frequencies, so f = 1 for every observation and N = n.
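The mean deviation formulas share one pattern, which the following Python sketch makes explicit. The helper name mean_deviation is hypothetical; the frequency table is Set A from the skewness example and the individual series is an illustrative assumption.

```python
def mean_deviation(values, freqs, center):
    """Σ f·|X - center| / N, where center is the mean, median or mode."""
    N = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / N

# Discrete series (Set A), deviations taken from the mean
values = [10, 15, 20, 25, 30, 35]
freqs = [5, 15, 30, 30, 15, 5]
mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
md_from_mean = mean_deviation(values, freqs, mean)

# Individual series: every frequency is 1, so N = n
xs = [2, 4, 6, 8, 10]
md_individual = mean_deviation(xs, [1] * len(xs), sum(xs) / len(xs))

print(md_from_mean, md_individual)
```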
Standard Deviation
• The positive square root of the arithmetic mean of the squares of the deviations taken from the A.M.
• The most commonly used measure of dispersion, and generally regarded as the best
• Takes every observation into account
Basic Formula of Standard Deviation
SD (σ) = √( Σ(X − X̅)² / N )
• The square of the standard deviation is called the variance.
Coefficient of variation
– Compares the variability of two sets of data
– Expressed as a percentage rather than in the units of the particular data
Formula for coefficient of variation (CV):
CV = (σ / X̅) × 100
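The standard deviation, variance and coefficient of variation can be sketched in Python as below. The data are an illustrative assumption. statistics.pstdev is used because it divides by N, matching the basic formula; statistics.stdev would divide by N − 1 instead.

```python
from statistics import fmean, pstdev

xs = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative individual series

mean = fmean(xs)                     # arithmetic mean
sd = pstdev(xs)                      # square root of Σ(X - mean)² / N
variance = sd ** 2                   # the square of the standard deviation
cv = sd / mean * 100                 # coefficient of variation, in percent

print(mean, sd, variance, cv)
```

When two series are compared, the one with the smaller CV is the more consistent (less variable) of the two.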
Skewness
• Lack of symmetry.
• Skewness describes the shape of the curve drawn from a frequency distribution: whether it is symmetrical or has a longer tail on one side.
• It relates to the shape of the curve.
For example
Set A                               Set B
Variable (X)    Frequency (f)       Variable (X)    Frequency (f)
10              5                   10              5
15              15                  15              20
20              30                  20              15
25              30                  25              45
30              15                  30              10
35              5                   35              5
Total           100                 Total           100
• Sets A and B in the above example have the same mean (X̅ = 22.5) and the same standard deviation (σ = 6.02), yet the curves drawn for the two sets have different shapes.
[Figure: frequency curve of Set A] The curve of Set A is non-skewed (symmetrical).
[Figure: frequency curve of Set B] The shape of the frequency distribution is skewed.
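The claim that both sets share the same mean and standard deviation can be verified with a short Python sketch. The helper name mean_and_sd is hypothetical; the frequencies are taken from the Set A and Set B table.

```python
from math import sqrt

values = [10, 15, 20, 25, 30, 35]
set_a = [5, 15, 30, 30, 15, 5]       # symmetrical frequencies
set_b = [5, 20, 15, 45, 10, 5]       # asymmetrical frequencies

def mean_and_sd(values, freqs):
    """Mean ΣfX/N and standard deviation sqrt(Σf(X - mean)²/N)."""
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / N
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / N
    return mean, sqrt(var)

mean_a, sd_a = mean_and_sd(values, set_a)
mean_b, sd_b = mean_and_sd(values, set_b)

print(mean_a, round(sd_a, 2))        # 22.5 6.02
print(mean_b, round(sd_b, 2))        # 22.5 6.02
```

The averages and the spread agree, so a further measure (skewness) is needed to tell the two shapes apart.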
When is a distribution said to be skewed?
A distribution of data is said to be skewed if:
• Arithmetic mean ≠ median ≠ mode
• The quartiles are not equidistant from the median
• The curve drawn from the frequency distribution is not a symmetrical bell shape.
Types of Skewness
According to the direction in which the tail of the frequency curve is elongated, the types of skewness are as follows:
• No skewness (symmetry)
• Positive skewness
• Negative skewness
No Skewness
• A distribution is said to have no skewness if the curve drawn from the data is elongated neither more to the left nor more to the right.
• The curve is equally elongated to the right and to the left.
• Mean = Median = Mode
Positive Skewness
• A distribution is said to have positive skewness, or to be right-skewed, if the curve drawn from the data is more elongated to the right side.
• Mean > Median > Mode
Negative Skewness
• A distribution is said to have negative skewness, or to be left-skewed, if the curve drawn from the data is more elongated to the left side.
• Mean < Median < Mode
Measures of Skewness
• Absolute measure: expressed in the original units of the data, so it is not appropriate for comparing distributions.
• Relative measure: a unit-free coefficient, so it can be used to compare distributions.
Relative methods of measuring 
Skewness 
• Karl Pearson’s measure of Skewness 
• Bowley’s measure of Skewness 
• Kelly’s measure of Skewness
Pearson’s measure of Skewness
The absolute measure of skewness is not widely used, since it is expressed in the original units of the data.
a) Skewness = Mean − Mode
b) Skewness = Mean − Median
The relative measure, the coefficient of skewness, is the one frequently used.
If the mode is well defined: Sk(P) = (Mean − Mode) / S.D.
If the mode is ill defined: Sk(P) = 3(Mean − Median) / S.D.
Pearson’s coefficient of skewness generally lies between −3 and +3.
Bowley's measure of Skewness
• Absolute measure of skewness: Skewness = Q3 + Q1 − 2Md
• Also known as the quartile measure of skewness
• Bowley's coefficient of skewness: Sk(B) = (Q3 + Q1 − 2Md) / (Q3 − Q1)
• Lies between −1 and +1
• Particularly useful for distributions with open-ended classes, an ill-defined mode or extreme observations.
Interpretation of results of Pearson’s measures:- 
• If Sk(P)= 0 distribution is symmetrical ( non-skewed) 
• If Sk(P)>0 distribution is positively skewed. 
• If Sk(P)<0 distribution is negatively skewed. 
Interpretation of results of Bowley’s measures:- 
• Sk(B)= 0, distribution is symmetrical. 
• Sk (B)>0, distribution is positively skewed. 
• Sk(B)<0, distribution is negatively skewed.
Kelly’s measure of Skewness
Kelly’s absolute measure of skewness
• Skewness = P90 + P10 − 2P50
• Skewness = D9 + D1 − 2D5
Kelly’s coefficient of skewness
• Sk(Kelly) = (P90 + P10 − 2P50) / (P90 − P10)
• Sk(Kelly) = (D9 + D1 − 2D5) / (D9 − D1)
• Also known as the percentile measure of skewness
• Seldom used in practice.
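The three coefficients of skewness can be sketched as small Python functions. The function names are hypothetical. The data and the quartiles (LQ = 82, median = 88, UQ = 90) are those of the ten-value box-whisker example; the percentile values passed to kelly_sk are an illustrative assumption.

```python
from statistics import fmean, median, pstdev

def pearson_sk(mean, med, sd):
    """Pearson's coefficient for an ill-defined mode: 3(mean - median) / S.D."""
    return 3 * (mean - med) / sd

def bowley_sk(q1, md, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2Md) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * md) / (q3 - q1)

def kelly_sk(p10, p50, p90):
    """Kelly's coefficient: (P90 + P10 - 2P50) / (P90 - P10)."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)

data = [76, 78, 82, 87, 88, 88, 89, 90, 91, 95]
sk_p = pearson_sk(fmean(data), median(data), pstdev(data))
sk_b = bowley_sk(82, 88, 90)
sk_k = kelly_sk(10, 50, 90)          # symmetric percentiles give zero skewness

print(round(sk_p, 3), sk_b, sk_k)
```

Both sk_p and sk_b come out negative for this data set, so the two measures agree that it is negatively skewed.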
Kurtosis
• Besides central tendency, dispersion and skewness, kurtosis is another measure by which a frequency distribution can be described and compared.
• Kurtosis describes the peakedness of the frequency distribution in comparison with the normal distribution.
• A measure of kurtosis gives the extent to which the distribution is more peaked or more flat-topped than the normal curve.
Types of kurtosis 
• Mesokurtic 
• Leptokurtic 
• Platykurtic
Measures of kurtosis
• Kurtosis can be measured with the help of quartiles and percentiles.
• The measure of kurtosis based on quartiles and percentiles is known as the percentile coefficient of kurtosis.
• It is denoted by k and calculated as:
k = ½(Q3 − Q1) / (P90 − P10)
where Q3 = upper quartile, Q1 = lower quartile, P90 = 90th percentile, P10 = 10th percentile
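Conditions for testing the Kurtosis
I. If k = 0.263, the distribution is mesokurtic (this is the value for the normal curve).
II. If k < 0.263, the distribution is leptokurtic: the middle 50% of the data is narrow relative to the span from P10 to P90.
III. If k > 0.263, the distribution is platykurtic.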
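The reference value 0.263 is the percentile coefficient of kurtosis of the normal curve itself. The Python sketch below checks this using the quartiles and percentiles of the standard normal distribution; the function name percentile_kurtosis is hypothetical.

```python
from statistics import NormalDist

def percentile_kurtosis(q1, q3, p10, p90):
    """k = (1/2)(Q3 - Q1) / (P90 - P10)."""
    return 0.5 * (q3 - q1) / (p90 - p10)

z = NormalDist()                     # standard normal: mean 0, S.D. 1
k_normal = percentile_kurtosis(
    z.inv_cdf(0.25),                 # Q1
    z.inv_cdf(0.75),                 # Q3
    z.inv_cdf(0.10),                 # P10
    z.inv_cdf(0.90),                 # P90
)

print(round(k_normal, 3))            # 0.263
```

The same function can be applied to the quartiles and percentiles of any observed distribution and the result compared with 0.263.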
Five-number summary
• The five-number summary is a descriptive tool
• It provides information about a set of observations
• It gives a concise summary of the distribution of the observations
• It helps in recognizing the shape of the data set.
• It consists of 5 important items:
– the sample minimum (smallest observation)
– the lower quartile or first quartile
– the median (middle value)
– the upper quartile or third quartile
– the sample maximum (largest observation)
• The five-number summary gives information about
– the location (from the median),
– the spread (from the quartiles) and
– the range (from the sample minimum and maximum) of the observations
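A minimal Python sketch of the five-number summary. It assumes the quartiles are taken as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd); other quartile conventions can give slightly different results. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # skip the middle value when n is odd
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

small = five_number_summary([17, 18, 19, 21, 24, 26, 27])
large = five_number_summary([76, 78, 82, 87, 88, 88, 89, 90, 91, 95])
print(small)
print(large)
```

These five values are exactly what a box-and-whisker plot draws: the box spans the two quartiles, the line inside marks the median, and the whiskers reach the minimum and maximum.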
Box-Whisker Plot 
• Shows how the data is distributed using the 
following components 
– Median 
– Upper quartiles 
– Lower quartiles 
– Maximum and Minimum Values
Example: 17, 18, 19, 21, 24, 26, 27
The median is 21.
The lower quartile (LQ) is the median of the lower half of the data (17, 18, 19). The LQ is 18.
The upper quartile (UQ) is the median of the upper half of the data (24, 26, 27). The UQ is 26.
[Figure: box-and-whisker plot of the data on a number line from 16 to 28]
Make a Box & Whisker Plot
76, 78, 82, 87, 88, 88, 89, 90, 91, 95
Median = 88 (the mean of the two middle values, 88 and 88)
Find the median of the lower half (76, 78, 82, 87, 88): LQ = 82
Find the median of the upper half (88, 89, 90, 91, 95): UQ = 90
76, 78, 82, 87, 88, 88, 89, 90, 91, 95
Least value = 76
Lower quartile (LQ) = 82
Middle quartile (median) = 88
Upper quartile (UQ) = 90
Greatest value = 95
[Figure: box-and-whisker plot of the data on a number line from 65 to 105]
What number represents 25% of the data? LQ = 82
What number represents 50% of the data? Median = 88
What number represents 75% of the data? UQ = 90
Box-and-Whisker Plot
• Displays a large set of data.
• Gives a general idea of how the data cluster.
• The graph includes:
- Title
- Labeled intervals
- Box between lower and upper quartiles
- Whiskers from quartiles to extremes
- Median, quartiles and whiskers labeled
Summary
• Measures of central tendency give a central, representative value of the data.
• Measures of dispersion depict the variation of the data.
• Measures of skewness reveal the shape (symmetry) of the curve drawn from the distribution of the data.
• Kurtosis measures the peakedness of the curve.
• A box-and-whisker plot displays a large set of data and gives a general idea of how the data cluster.
Thank you very much 
My Respected Guru & 
My Dear Friends
