Ch 6 DISPERSION.doc
6. Measures of dispersion
[Q: Discuss measures of dispersion. (BSMMU, MD Radiology,
January, 2010)]
Dispersion
It is the measure of the extent of deviation of individual values from the central value.
The nearer the values lie to the center, the better: 12 ± 3 is better than 12 ± 10.
Importance:
It helps us to determine how representative the central value is.
If the majority of values cluster around the center or are scattered
within a very narrow range, then the central value is
designated as highly representative.
Types:
[Q:
2. Biostatistics-49
1. What are the measures of dispersion? (BSMMU,
Radiology, January, 2012)
2. Enumerate the measures of dispersion. Calculate them
from the following data set of 3,5, 6, 10. (BSMMU, MD
Radiology, January, 2010)
3. Discuss measures of dispersion. (BSMMU, MD, January,
2010)]
a. Absolute
Range
Interquartile range
Mean deviation (MD)
Standard deviation (SD)
Standard error (SE)
b. Relative:
Co-efficient of variation
Co-efficient of mean deviation
Multiple regression
RANGE
It is the absolute difference between the highest and lowest
values in a series of observations.
R = H - L
where
R = range
H = highest value
L = lowest value
Spread is also often reported as the interval mean ± SD (e.g. 12 ± 3),
which is a different quantity from the range R = H - L.
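As a small illustration of R = H - L, the following Python sketch computes the range of the data set 3, 5, 6, 10 that appears in the exam questions of this chapter. The function name `value_range` is hypothetical and chosen only for this example.

```python
def value_range(observations):
    """Return R = H - L for a non-empty sequence of numbers."""
    highest = max(observations)  # H: highest value
    lowest = min(observations)   # L: lowest value
    return highest - lowest


data = [3, 5, 6, 10]
print(value_range(data))  # 10 - 3 = 7
```

Only the two extreme values enter the calculation, which is why the range says nothing about the values in between.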
Merits:
a. Easy to compute.
b. It gives some idea about the deviation from the central
value.
Demerits:
It only gives an idea of the extreme values in a set of data.
It does not tell us about the variability of the values within the
observations.
INTERQUARTILE RANGE
Range is a measure based on the two extreme observations, and it
fails to take account of the scatter within the range. In the
interquartile range, some extreme observations on both sides are
discarded, i.e. 1/4 (25%) of the observations at the lower end and
another 1/4 (25%) of the observations at the upper end, so that the
interquartile range includes the middle 50% of observations.
Figure: Interquartile range
In other words, the interquartile range represents the
difference between the third quartile and the first quartile.
Symbolically,
Interquartile range = Q3 – Q1
Semi-interquartile range or quartile deviation = (Q3 – Q1)/2
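The two formulas above can be sketched in Python as follows. Textbooks and software use several slightly different conventions for locating quartiles, so the choice of `statistics.quantiles` with `method="inclusive"` is an assumption made for this illustration, and other conventions may give different Q1 and Q3 for the same data. The function name `quartile_summary` is hypothetical.

```python
import statistics


def quartile_summary(observations):
    """Return (Q1, Q3, interquartile range, quartile deviation)."""
    # n=4 splits the data into quarters and returns the three cut points.
    q1, _median, q3 = statistics.quantiles(observations, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2


q1, q3, iqr, qd = quartile_summary([2, 3, 4, 5, 6])
print(q1, q3, iqr, qd)  # Q1 = 3, Q3 = 5, IQR = 2, quartile deviation = 1
```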
MEAN DEVIATION
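It is the mean of the absolute deviations of the observations from
their mean (the minus signs are ignored).
$$\mathrm{MD} = \frac{\sum |x - \bar{x}|}{N}$$
Here
$x$ = observation
$\bar{x}$ = mean
$N$ = no. of observations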
Example: Find out the mean deviation of the observations 2, 3, 4, 5
and 6.
Observation ($x$)    Mean ($\bar{x}$)    Deviation from mean ($x - \bar{x}$)
2                    4                   -2
3                    4                   -1
4                    4                    0
5                    4                    1
6                    4                    2
Sum of observations = 20                 $\sum |x - \bar{x}|$ = 6 (ignoring minus sign)
Mean = 20/5 = 4 (no. of items = 5)
Sum of deviations from mean = 6 (ignoring the minus sign)
$$\mathrm{MD} = \frac{\sum |x - \bar{x}|}{N} = \frac{6}{5} = 1.2$$
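The worked example can be reproduced with a few lines of Python. This is only a sketch of the mean deviation about the mean; the function name `mean_deviation` is hypothetical.

```python
def mean_deviation(observations):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(observations)
    mean = sum(observations) / n
    # abs() plays the role of "ignoring the minus sign".
    return sum(abs(x - mean) for x in observations) / n


print(mean_deviation([2, 3, 4, 5, 6]))  # 6 / 5 = 1.2
```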
THE VARIANCE
The variance is a measure of how spread out a distribution is.
It is the mean squared deviation of the items. If the deviation is
$x - \bar{x}$, then the squared deviation is $(x - \bar{x})^2$, and the
variance is the sum of squares divided by the number of independent
observations. This number is not the total but one less than the
total number of measurements or observations (n) in the series.
Therefore, divide by n - 1. It is also called the degrees of freedom
in statistical terms.
Hence,
$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Very often variance is written as Var., or SD², or $s^2$ for a sample
or $\sigma^2$ for the universe or population, and it is made use of in
many statistical methods.
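For example, for the numbers 1, 2, and 3, the mean is 2. Treated as a
whole population (dividing by N = 3), the variance is:
$$\sigma^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3} = 0.667$$
Treated as a sample (dividing by n - 1 = 2), it is:
$$s^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{2} = 1$$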
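The difference between dividing by n - 1 (sample) and dividing by N (population) can be checked with Python's standard `statistics` module, which provides `variance` for the former and `pvariance` for the latter. The sketch below uses the numbers 1, 2 and 3 from the example.

```python
import statistics

data = [1, 2, 3]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = statistics.variance(data)

# Population variance: sum of squared deviations divided by N.
population_variance = statistics.pvariance(data)

print(sample_variance)      # 2 / 2, i.e. 1
print(population_variance)  # 2 / 3, about 0.667
```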
STANDARD DEVIATION (SD)
[Q: Write short notes on: SD, (BSMMU, MD Radiology, January,
2010)]
It is a measure of dispersion from the mean value.
The formula for the standard deviation is very simple: it is the
square root of the variance. It is the most commonly used
measure of spread.
$$\mathrm{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example: Find out the SD of the observations 4, 5, 6, 7, 8 and 9.
Calculation:
$x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
4      6.5          -2.5             6.25
5      6.5          -1.5             2.25
6      6.5          -0.5             0.25
7      6.5           0.5             0.25
8      6.5           1.5             2.25
9      6.5           2.5             6.25
Sum of observations = (4+5+6+7+8+9) = 39
Mean of observations = 39/6 = 6.5
Sum of squared deviations, i.e. $\sum (x - \bar{x})^2$ = 17.5
n = 6
(n - 1) = 6 - 1 = 5
$$\mathrm{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{17.5}{5}} = \sqrt{3.5} = 1.871$$
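The same calculation can be sketched in Python, following the steps of the table (mean, squared deviations, division by n - 1, square root). The function name `sample_sd` is hypothetical; the standard library function `statistics.stdev` is used only as a cross-check.

```python
import math
import statistics


def sample_sd(observations):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(observations)
    mean = sum(observations) / n
    sum_of_squares = sum((x - mean) ** 2 for x in observations)
    return math.sqrt(sum_of_squares / (n - 1))


data = [4, 5, 6, 7, 8, 9]
sd = sample_sd(data)
print(round(sd, 3))  # sqrt(17.5 / 5) = sqrt(3.5), about 1.871

# Cross-check against the standard library implementation.
assert math.isclose(sd, statistics.stdev(data))
```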
Merits:
1. In the case of mean deviation we disregarded the minus
(-) sign to find the sum of the deviations. Ignoring the
sign is not correct from an algebraic point of view. The
standard deviation provides a basis for overcoming this
error: the individual deviations from the mean are first
squared, which removes the negative sign, and then they are
summed up.
2. It takes into consideration all the values of the series.
Standard Deviations of Grouped Data
Suppose we were interested in how many siblings are in
statistics students' families. We come up with a frequency
distribution table below.
Number of children    1    2    3    4    5    6    7
Frequency             5    12   8    3    0    0    1
We extend the table as follows:
Number of children ($x$)    Frequency ($f$)    $xf$    $x^2 f$
1                           5                  5       5
2                           12                 24      48
3                           8                  24      72
4                           3                  12      48
5                           0                  0       0
6                           0                  0       0
7                           1                  7       49
Totals                      $\sum f$ = 29      $\sum xf$ = 72    $\sum x^2 f$ = 222
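Next we calculate, with $n = \sum f = 29$,
$$SS_x = \sum x^2 f - \frac{(\sum xf)^2}{n} = 222 - \frac{(72)^2}{29} = 43.24$$
Now finally apply the formula
$$s^2 = \frac{SS_x}{n - 1}$$
to get
$$s^2 = \frac{43.24}{28} = 1.54, \qquad s = \sqrt{1.54} = 1.24$$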
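The grouped-data calculation can be sketched in Python from the frequency table. The sketch assumes the same n - 1 denominator used for the standard deviation elsewhere in this chapter, with n taken as the total frequency; the function name `grouped_sd` is hypothetical.

```python
import math


def grouped_sd(values, frequencies):
    """Return (SSx, variance, SD) for a frequency distribution."""
    n = sum(frequencies)                                        # total frequency
    sum_xf = sum(x * f for x, f in zip(values, frequencies))    # sum of x*f
    sum_x2f = sum(x * x * f for x, f in zip(values, frequencies))  # sum of x^2*f
    ss_x = sum_x2f - sum_xf ** 2 / n
    variance = ss_x / (n - 1)
    return ss_x, variance, math.sqrt(variance)


children = [1, 2, 3, 4, 5, 6, 7]
frequency = [5, 12, 8, 3, 0, 0, 1]
ss_x, variance, sd = grouped_sd(children, frequency)
print(round(ss_x, 2), round(variance, 2), round(sd, 2))  # about 43.24, 1.54, 1.24
```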
STANDARD ERROR
[Q: Write short notes on: Standard error of mean (SEM)
(BSMMU, MD Radiology, July, 2010)]
It is a measure of how far a sample mean is likely to deviate from
the true (population) mean.
Standard errors are important because they reflect how much
sampling fluctuation a statistic will show.
Calculation
Standard error is the standard deviation of the sampling
distribution of the mean. The formula for the standard error of
the mean is:
$$\mathrm{SE} = \frac{\mathrm{SD}}{\sqrt{n}}$$
Or
$$\mathrm{SE} = \sqrt{\frac{\sum (x - \bar{x})^2}{n(n - 1)}}$$
The formula shows that the larger the sample size, the smaller
the standard error of the mean. More specifically, the size of
the standard error of the mean is inversely proportional to the
square root of the sample size.
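The inverse square-root relationship can be seen numerically with a short Python sketch. The SD of 2.0 and the sample sizes below are hypothetical values chosen only to make the pattern visible: each fourfold increase in n halves the standard error.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


sd = 2.0  # hypothetical standard deviation, held fixed
for n in (25, 100, 400):
    # n = 25 gives 0.4, n = 100 gives 0.2, n = 400 gives 0.1
    print(n, standard_error(sd, n))
```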
Example: Find out the SE of the observations 4, 5, 6, 7, 8 and 9.
Calculation:
$x$    $\bar{x}$    $x - \bar{x}$    $(x - \bar{x})^2$
4      6.5          -2.5             6.25
5      6.5          -1.5             2.25
6      6.5          -0.5             0.25
7      6.5           0.5             0.25
8      6.5           1.5             2.25
9      6.5           2.5             6.25
Sum of x = 39
Mean ($\bar{x}$) = 6.5
Sum of $(x - \bar{x})^2$ = 17.5
n = 6
$$\text{So, } \mathrm{SE} = \sqrt{\frac{17.5}{6 \times 5}} = \sqrt{\frac{17.5}{30}} = \sqrt{0.5833} \approx 0.764$$
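As a check on the worked example, the following Python sketch computes the standard error of the mean in both forms given earlier, SD/√n and the single-root formula with n(n - 1) in the denominator. The variable names are illustrative only.

```python
import math
import statistics

data = [4, 5, 6, 7, 8, 9]
n = len(data)
mean = sum(data) / n

# Form 1: sample SD (n - 1 denominator) divided by sqrt(n).
se_from_sd = statistics.stdev(data) / math.sqrt(n)

# Form 2: square root of sum of squared deviations over n(n - 1).
sum_of_squares = sum((x - mean) ** 2 for x in data)
se_direct = math.sqrt(sum_of_squares / (n * (n - 1)))

print(round(se_from_sd, 3), round(se_direct, 3))  # both about 0.764
```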
[Q: Estimate standard deviation (SD), standard error of
means (SEM) from the following data set of hemoglobin
levels (g/dl): 10.9, 13.8, 18.0, 15.1, 13.5, 14.2 and 13.4.
(BSMMU, Radiology, January, 2012)]
CO-EFFICIENT OF VARIATION:
It is a measure used to compare relative variability. The
variation of the same character in two or more different series
often has to be compared. It may be of interest to know
whether weights vary more in spleens or in hearts, or whether
growth varies more in girls or in boys.
$$\mathrm{CV} = \frac{\mathrm{SD}}{\text{Mean}} \times 100$$
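A minimal Python sketch of the formula, applied to the observations 4, 5, 6, 7, 8 and 9 used in the SD example of this chapter (mean 6.5, SD about 1.871). The function name `coefficient_of_variation` is hypothetical, and the SD is the sample SD with n - 1 in the denominator.

```python
import statistics


def coefficient_of_variation(observations):
    """CV as a percentage: (sample SD / mean) * 100."""
    mean = statistics.mean(observations)
    sd = statistics.stdev(observations)  # n - 1 denominator
    return sd / mean * 100


cv = coefficient_of_variation([4, 5, 6, 7, 8, 9])
print(round(cv, 2))  # about 28.78 (percent)
```

Because the CV is a ratio, it has no unit, which is what allows series measured on different scales to be compared.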
[Q:
1. Determine standard deviation, standard error of mean
and coefficient of variation from the given hemoglobin
levels (g/dl) of 10 female workers: 11.3, 11.2, 12.5,
13.0, 9.3, 12.9, 11.4, 12.9, 11.3 & 10.4 (BSMMU, MD
Radiology, July, 2010)
2. What do you mean by central tendency? Calculate
mean, SD, SE and CV from the following figures: 3, 5, 6,
10. (BSMMU, MD Radiology, January, 2009)]
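For practice with the last data set (3, 5, 6, 10), the chapter's formulas for mean, SD, SE and CV can be combined in one Python sketch. This is an illustration under the chapter's conventions (n - 1 in the SD denominator), not a model exam answer; the function name `dispersion_summary` is hypothetical.

```python
import math


def dispersion_summary(observations):
    """Return mean, sample SD, standard error and CV (%) as a dict."""
    n = len(observations)
    mean = sum(observations) / n
    sum_of_squares = sum((x - mean) ** 2 for x in observations)
    sd = math.sqrt(sum_of_squares / (n - 1))  # SD with n - 1
    se = sd / math.sqrt(n)                    # SE = SD / sqrt(n)
    cv = sd / mean * 100                      # CV = SD / mean * 100
    return {"mean": mean, "sd": sd, "se": se, "cv": cv}


summary = dispersion_summary([3, 5, 6, 10])
for name, value in summary.items():
    # mean 6.0, SD about 2.944, SE about 1.472, CV about 49.07
    print(name, round(value, 3))
```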