Dr. Aarati vijaykumar
1st year M.D (K.C)
Definition of statistics: It is the ‘science of collecting,
classifying, presenting & interpreting data’ relating to any
sphere of enquiry.
Having learnt the methods of collection & presentation
of data, we have to understand & grasp the application of
mathematical techniques involved in analysis & interpretation
of the data.
As medicos, we should learn to apply the formulae
straight to our problems without worrying how they have
been deduced. Application of methods for analysis is quite
easy & we should become familiar with them so as to verify
our preconceived ideas or to remove doubts which might arise
at the first look of figures collected.
“If a man will begin with certainties, he shall end in
doubts; but if he will be content to begin with doubts, he
shall end in certainties.”
- Francis Bacon
The characteristics of a frequency distribution are of two kinds:
1. Measures of central tendency ( Location, Position, Average )
2. Measures of dispersion ( Scatteredness, Variability, Spread )
A measure of central tendency is a single central number or value that
condenses the mass of data & gives us an idea
about the whole or entire data.
1. Arithmetic Mean
2. Median Q2
3. The mode Z
The arithmetic mean is the most commonly used measure of central
tendency. It is also called the ‘Average’.
It is defined as the sum of all
individual observations divided by the total number of
observations.
Types of series
1. Ungrouped series ( Ungrouped data, Unclassified data, Raw
data ) : Includes individual observations without frequency.
2. Grouped series ( Classified data ) : Includes individual
observations with frequency & class frequency.
1. Direct method
2. Indirect method
Merits of Arithmetic Mean
1. Easy to understand & to calculate.
2. It is correctly or rigidly defined.
3. It is based on each & every observation.
4. Every set of data has one & only one A.M.
5. Used for further mathematical calculations like standard
deviation.
Demerits of Arithmetic Mean
1. Affected by extreme values ( either low or high)
2. It cannot be obtained if even a single value is missing.
The median is called Q2 because it denotes the 2nd quartile; it is a positional
average. It is the 2nd measure of central tendency.
There are 3 quartiles, Q1, Q2 & Q3, which divide
the distribution into 4 equal parts.
A Q1 Q2 Q3 B
The median divides the distribution into two equal parts i.e.
50% of the distribution is below the median & 50% is above it.
Position of Q1 = n/4, position of Q3 = 3 x n/4
Whether ‘n’ is odd or even, arrange the observations in ascending or
descending order & calculate the position of the median by the formula:
Q2 = ( n + 1 ) / 2
When ‘n’ is even, this position falls between two observations & the
median is the average of those two middle values.
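The positional rule for the median can be sketched in Python as follows. This is a minimal illustration, assuming the ( n + 1 ) / 2 position rule with the two middle values averaged when n is even; the function name median_by_position & the three-value sample are hypothetical & not part of the original notes.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 positional rule."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: (n + 1) / 2 is a whole number, so take that observation.
        position = (n + 1) // 2       # 1-based position
        return ordered[position - 1]
    # Even n: (n + 1) / 2 falls between two observations,
    # so average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


# Hypothetical odd-sized sample, for illustration only
print(median_by_position([7, 3, 5]))
# Even-sized sample (six values)
print(median_by_position([110, 100, 150, 140, 140, 120]))
```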
The dictionary meaning of mode is common or fashionable.
The mode is the value which occurs most frequently in a given set
of observations.
There are 3 types
Ex: Selection of mode : Observation having the highest
Mode = 10
Type 2 :
Selection of mode: Observation containing highest frequency.
Ex: Number of children per family.
No.of children/Family No.of families
25 is highest frequency so ‘2’ is mode.
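Picking the mode from a frequency table can be sketched as below. Only the highest frequency ( 25 families, for 2 children ) is taken from the example; every other count in the table is a made-up placeholder, used purely so that the code can run.

```python
# No. of children per family -> No. of families.
# Only the entry 2 -> 25 comes from the example; the other
# frequencies are hypothetical placeholders.
families_by_children = {0: 5, 1: 15, 2: 25, 3: 10, 4: 3}

# The mode is the observation that carries the highest frequency.
mode_children = max(families_by_children, key=families_by_children.get)
highest_frequency = families_by_children[mode_children]

print(f"Mode = {mode_children} children (frequency {highest_frequency})")
```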
Type 3 :
Selection of mode: Class containing the highest frequency.
Merits of Mode:
1. Easy to calculate & understand.
2. Not affected by extreme value.
3. Mode can be found by both qualitative & quantitative data.
Demerits of Mode:
1. Sometimes there is no mode, or more than one mode, in a given set
2. Not used for further mathematical calculation.
3. Not commonly used.
Examples of Ungrouped series :
1. Direct method
x = Individual observation
n = Number of observations
Ex: Systolic BP of 6 patients is given below; calculate the mean, mode & median.
1. 110mmHg x1
2. 100mmHg x2
3. 150mmHg x3
4. 140mmHg x4
5. 140mmHg x5
6. 120mmHg x6
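Mean ( Average ) : x̄ = ∑x/n
∑ = Summation
n = Number of observations
x = Individual observation.
∑x = x1 + x2 + x3 + x4 + x5 + x6
= 110 + 100 + 150 + 140 + 140 + 120 = 760
Mean = 760/6 = 126.7mmHg
Mode : Most repeated number in the data = 140mmHg
Median : 100, 110, 120, 140, 140, 150 ( arranged in ascending order )
= ( 120 + 140 ) / 2 = 260/2 = 130mmHg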
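The same three averages can be checked with Python's standard statistics module. This is only a cross-check of the hand calculation on the six BP readings; the variable names are chosen for illustration.

```python
from statistics import mean, median, mode

# Systolic BP readings (mmHg) from the example
bp = [110, 100, 150, 140, 140, 120]

bp_mean = mean(bp)      # sum of observations / number of observations
bp_median = median(bp)  # average of the two middle values, since n is even
bp_mode = mode(bp)      # most frequently occurring value

print(f"Mean   = {bp_mean:.1f} mmHg")
print(f"Median = {bp_median} mmHg")
print(f"Mode   = {bp_mode} mmHg")
```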
Step deviation method of calculating the mean :
Ex: Heights of school children are given below; find out the
mean.
1. 148cm x1
2. 143cm x2
3. 160cm x3
4. 152cm x4
5. 157cm x5
6. 150cm x6
7. 155cm x7
Working origin ( w ) = 150cm
Mean of the deviations = ∑ ( x – w ) / n
148 – 150 = –2
143 – 150 = –7
160 – 150 = 10
152 – 150 = 2
157 – 150 = 7
150 – 150 = 0
155 – 150 = 5
∑ ( x – w ) = 15
∑ ( x – w ) / n = 15/7 = 2.1
Mean = w + ∑ ( x – w ) / n
= 150 + 2.1
= 152.1cm
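The working-origin calculation can be sketched in Python & compared with the direct method. The working origin of 150cm is the one used in the example; any other convenient value gives the same mean.

```python
# Heights (cm) of the seven school children in the example
heights = [148, 143, 160, 152, 157, 150, 155]
w = 150  # working origin (assumed mean), as chosen in the example

# Deviation of each observation from the working origin
deviations = [x - w for x in heights]

# Mean of the deviations, added back to the working origin
mean_deviation_from_w = sum(deviations) / len(heights)
mean_height = w + mean_deviation_from_w

# Direct method, for comparison
direct_mean = sum(heights) / len(heights)

print(f"Sum of deviations = {sum(deviations)}")
print(f"Mean height = {mean_height:.1f} cm")
```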
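Find the mean days of confinement after delivery in the following.
Mean = ∑fx/n , ∑f = n
Days of             No. of              Total days of
confinement ( x )   patients ( f )      confinement ( fx )
6                   5                   30
7                   4                   28
8                   4                   32
9                   3                   27
10                  2                   20
Total               ∑f = 18             ∑fx = 137
Mean = 137/18 = 7.6 days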
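The grouped-series mean can be sketched in Python as below, assuming that the three columns of the table are days of confinement ( x ), number of patients ( f ) & their product ( fx ).

```python
# Days of confinement (x) and number of patients (f) from the table
days = [6, 7, 8, 9, 10]
patients = [5, 4, 4, 3, 2]

n = sum(patients)                                        # n = sum of f
total_days = sum(x * f for x, f in zip(days, patients))  # sum of f * x
mean_days = total_days / n

print(f"n = {n}, sum of fx = {total_days}")
print(f"Mean days of confinement = {mean_days:.1f}")
```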
Measures of variability describe the spread or scatteredness
of the individual observations around the central tendency.
1. Gives complete idea/picture of data
2. Helps in comparison of distribution.
3. Useful for further calculations
4. Gives idea about the reliability of average value.
Methods of dispersion
1. Range ( R )
2. Inter quartile range ( IQR )
3. Quartile deviation / Semi inter quartile range
4. Mean deviation / Average deviation (MD)
5. Standard deviation (SD)
The range is defined as the difference between the highest & lowest values
in a set of data.
R = H – L
Ex: Weight of an adult person 50 – 100kg, so R = 100 – 50 = 50kg
Merits of Range:
Easy to calculate & understand
Has a well defined formula
Gives first hand information about variation
Demerits of Range:
It is not based on all the values
Affected by extreme values
The inter quartile range is the interval between the upper
quartile ( Q3, the value above which 25% of the observations
fall ) & the lower quartile ( Q1, the value below which 25% of
the observations fall ).
IQR = Q3 – Q1
So this measure gives us the range of the middle
50% of the observations & it is very helpful when the
observations are not homogenous & contain extreme
values. It is a superior measure to the range
in such conditions.
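Range, inter quartile range & quartile deviation can be sketched in Python as below, reusing the seven heights from the earlier example. Note the assumption: statistics.quantiles with its default "exclusive" method places Q1 at the ( n + 1 ) / 4 th observation, which is one common convention & may differ slightly from a hand calculation that uses another positional rule.

```python
from statistics import quantiles

# Heights (cm) reused from the earlier worked example
heights = [148, 143, 160, 152, 157, 150, 155]

# Range: highest value minus lowest value
data_range = max(heights) - min(heights)

# Quartiles: n=4 splits the data into four equal parts (Q1, Q2, Q3)
q1, q2, q3 = quantiles(heights, n=4)

iqr = q3 - q1                 # spread of the middle 50% of observations
quartile_deviation = iqr / 2  # semi inter quartile range

print(f"Range = {data_range} cm")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"IQR = {iqr} cm, quartile deviation = {quartile_deviation} cm")
```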
Merits of IQR:
Easy & simple to understand
Easy to calculate
Not affected by extreme values
Demerits of IQR :
It is a positional value which is based on two
values only ( the first & last quartiles, Q1 & Q3 )
Mean deviation is the average amount of scatter of the
items in a distribution from any measure of
central tendency, ignoring the
mathematical signs.
Formula: M.D = ∑ |x – x̄| / n
Example: Average marks obtained in 5 internals by a
student.
∑x = 110, x̄ = 110/5 = 22
x        x – x̄
25       25 – 22 = 3
15       15 – 22 = –7
25       25 – 22 = 3
25       25 – 22 = 3
20       20 – 22 = –2
∑ |x – x̄| = 3 + 7 + 3 + 3 + 2 = 18
M.D = 18/5 = 3.6
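The mean deviation of the five marks can be sketched in Python as below; the deviations are taken from the arithmetic mean & their signs are ignored by using abs().

```python
# Marks obtained in 5 internals, from the example
marks = [25, 15, 25, 25, 20]

n = len(marks)
mean_marks = sum(marks) / n  # arithmetic mean

# Absolute deviations from the mean (mathematical signs ignored)
abs_deviations = [abs(x - mean_marks) for x in marks]

mean_deviation = sum(abs_deviations) / n

print(f"Mean = {mean_marks}")
print(f"Mean deviation = {mean_deviation}")
```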
Standard deviation is the most widely used & best method of
measuring dispersion.
AD takes all the observations into consideration
but ignores the mathematical signs;
SD overcomes this problem by squaring the
deviations.
SD is the square root of the sum of the squared
deviations of the given set of observations from the
AM divided by the total number of observations.
Formula : Ungrouped series
Standard deviation = √( ∑ ( x – x̄ )² / n )
n > 30
Standard deviation = √( ∑ f ( x – x̄ )² / n )
n < 30
Where, ∑ – is Summation of,
x – is Individual observation, x̄ – is Arithmetic mean,
f – is Frequency,
n – is Total number of observations
Ex: Average marks obtained in 5 internals by a student
x        x – x̄            ( x – x̄ )²
25       25 – 22 = 3      9
15       15 – 22 = –7     49
25       25 – 22 = 3      9
25       25 – 22 = 3      9
20       20 – 22 = –2     4
∑x = 110                  ∑ ( x – x̄ )² = 80
x̄ = 110/5 = 22
S.D = √( ∑ ( x – x̄ )² / n ) = √( 80/5 ) = √16 = 4
Co-efficient of variation = SD / Mean x 100
= 4 / 22 x 100
= 400 / 22
= 18.2 %
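The standard deviation & its ratio to the mean for the five marks can be sketched in Python as below. The hand calculation divides by n, which corresponds to statistics.pstdev; statistics.stdev divides by n – 1 instead, a correction commonly applied to small samples, & is shown only for comparison.

```python
from math import sqrt
from statistics import pstdev, stdev

# Marks obtained in 5 internals, from the example
marks = [25, 15, 25, 25, 20]

n = len(marks)
mean_marks = sum(marks) / n

# Squared deviations from the mean
squared_deviations = [(x - mean_marks) ** 2 for x in marks]

variance = sum(squared_deviations) / n  # divisor n, as in the example
sd = sqrt(variance)                     # SD is the square root of this

# SD expressed as a percentage of the mean
sd_percent_of_mean = sd / mean_marks * 100

sd_sample = stdev(marks)  # divisor n - 1, for comparison only

print(f"Sum of squared deviations = {sum(squared_deviations)}")
print(f"SD (divisor n)     = {sd}")
print(f"SD (divisor n - 1) = {sd_sample:.2f}")
print(f"SD as % of mean    = {sd_percent_of_mean:.1f} %")
```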
Significance of SD :
Based on all observations.
Best method of calculation without ignoring mathematical
signs.
Useful for further statistical calculations. (i.e. Test of
significance)
Useful for calculation of standard error.
Lesser the standard deviation, better the estimation of