Measures of Dispersion

Dr. Manoj Kumar Meher
Kalahandi University
meher.manoj@gmail.com
MEASURES OF DISPERSION

STANDARD DEVIATION
• Standard deviation is a measure of the dispersion of a set of data around its mean. It measures
the absolute variability of a distribution: the higher the dispersion or variability, the greater the
standard deviation, and the greater the magnitude of the deviations of the values from their
mean.
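The slide gives no formula for the standard deviation. As a sketch, assuming the same population (N) divisor and the same symbols as the variance formula given later in this deck, it can be written as the square root of the variance:

```latex
\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}
```

Here X is the value of an item, X̄ the mean, and N the number of elements. If the data are treated as a sample rather than a whole population, N − 1 is commonly used as the divisor instead.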

Year    Rainfall (cm)    Average temperature (°C)
2008    210.8            32.5
2009    205.6            29.8
2010    78.6             31.6
2011    158.4            27.9
2012    99.5             33.3
2013    167.7            30.1
2014    152.5            27.5
2015    104.6            27.5
2016    98.8             28.6
2017    125.8            33.2
2018    187.6            28.6

Year    Rainfall (cm)    Average temperature (°C)
2008    210.8            32.5
2009    205.6            29.8
2010    78.6             31.6
2011    158.4            27.9
2012    99.5             33.3
2013    167.7            30.1
2014    152.5            27.5
2015    104.6            27.5
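The rainfall series above can be used to try out the standard deviation. The following is a minimal sketch, not the author's worked solution: it assumes the population (N) divisor, and the names `rainfall` and `population_sd` are illustrative only.

```python
import math

# Rainfall (cm) for 2008-2018, copied from the table above
rainfall = [210.8, 205.6, 78.6, 158.4, 99.5, 167.7, 152.5, 104.6, 98.8, 125.8, 187.6]

def population_sd(values):
    """Standard deviation with the N divisor (an assumption; the slide states no formula)."""
    n = len(values)
    mean = sum(values) / n
    # Average the squared deviations from the mean, then take the square root
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

mean_rainfall = sum(rainfall) / len(rainfall)
sd_rainfall = population_sd(rainfall)
print(f"mean = {mean_rainfall:.2f} cm, SD = {sd_rainfall:.2f} cm")
```

The same function can be applied to the temperature column to compare how widely the two series spread around their own means.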

VARIANCE
• Variance (σ²) in statistics is a measurement of the spread between
numbers in a data set. That is, it measures how far each number in
the set is from the mean and therefore from every other number in
the set.
• Variance measures variability around the mean. Variability can be
read as volatility, and volatility as a measure of risk, so the variance
statistic can help assess risk.
• A large variance indicates that numbers in the set are far from the
mean and from each other, while a small variance indicates the
opposite.
• Variance cannot be negative, because it is an average of squared
deviations. A variance of zero indicates that all values within a set of
numbers are identical.
• All variances that are not zero are positive numbers.

Formula for Variance
$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$
Where
• σ² = variance
• X = value of the item
• X̄ = mean
• N = number of elements

X       X̄       X − X̄     (X − X̄)²
304     312     -8        64
351             39        1521
235             -77       5929
258             -54       2916
268             -44       1936
124             -188      35344
144             -168      28224
478             166       27556
123             -189      35721
456             144       20736
852             540       291600
151             -161      25921
∑X = 3744, N = 12         ∑(X − X̄)² = 477468

$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{477468}{12} = 39789$
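The worked table above can be checked with a few lines of code. This is a minimal sketch that follows the slide's formula with the N divisor; the variable names are illustrative.

```python
# The twelve X values from the worked table
values = [304, 351, 235, 258, 268, 124, 144, 478, 123, 456, 852, 151]

n = len(values)
mean = sum(values) / n                               # X-bar
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / n               # population variance (N divisor)

print(n, mean, sum(squared_deviations), variance)
# prints: 12 312.0 477468.0 39789.0
```

Taking the square root of `variance` gives the standard deviation of the same data.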

EXERCISE: CALCULATE VARIANCE, SD & MEAN DEVIATION
Sl. No.  DISTRICT  Net Area Sown  Sl. No.  DISTRICT  Net Area Sown
1 Balasore 212 16 Koraput 264
2 Bhadrak 156 17 Malkangiri 134
3 Balangir 336 18 Nabarangpur 179
4 Sonepur 118 19 Rayagada 153
5 Cuttack 140 20 Mayurbhanj 335
6 Jagatsingpur 87 21 Phulbani 105
7 Jajpur 136 22 Boudh 77
8 Kendrapara 137 23 Puri 131
9 Dhenkanal 139 24 Khordha 101
10 Angul 171 25 Nayagarh 119
11 Ganjam 357 26 Sambalpur 177
12 Gajapati 68 27 Bargarh 294
13 Kalahandi 335 28 Deogarh 59
14 Nawapara 163 29 Jharsuguda 65
15 Keonjhar 259 30 Sundargarh 285

COEFFICIENT OF VARIATION
• The coefficient of variation (CV) is a statistical measure of the
dispersion of data points in a data series around the mean. It is
the ratio of the standard deviation to the mean, and it is a useful
statistic for comparing the degree of variation from one data series
to another.
• Why do we use coefficient of variation?
• The coefficient of variation measures the dispersion of the data
points relative to the mean, so it helps us understand the data and
see the pattern it forms. It is calculated as the ratio of the
standard deviation of the data set to its mean value.

Formula
$CV = \frac{\sigma}{\mu} \times 100$
Where:
σ – the standard deviation
μ – the mean

Sl No  Region  Paddy yield per hectare
1 R-1 70
2 R-2 75
3 R-3 80
4 R-4 57
5 R-5 59
6 R-6 38
7 R-7 75
8 R-8 24
9 R-9 68
10 R-10 65
11 R-11 57
12 R-12 59
13 R-13 32
14 R-14 29
15 R-15 17
Where:
σ – the standard deviation = 20.41 (computed with the n − 1 divisor)
μ – the mean = 53.67
C.V. = (SD / Mean) × 100
= (20.41 / 53.67) × 100
= 38.03
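The coefficient of variation can be recomputed directly from the fifteen yields in the table. This is a sketch using Python's standard `statistics` module. Recomputing from the listed values gives a mean of about 53.67, and the reported SD of 20.41 matches the sample (n − 1) divisor; the population (N) version is shown alongside for comparison.

```python
import statistics

# Paddy yield per hectare for regions R-1 to R-15, copied from the table
paddy_yield = [70, 75, 80, 57, 59, 38, 75, 24, 68, 65, 57, 59, 32, 29, 17]

mean_yield = statistics.mean(paddy_yield)
sd_sample = statistics.stdev(paddy_yield)        # n - 1 divisor
sd_population = statistics.pstdev(paddy_yield)   # N divisor

# CV = (SD / mean) x 100, expressed as a percentage
cv_sample = sd_sample / mean_yield * 100
cv_population = sd_population / mean_yield * 100

print(f"mean = {mean_yield:.2f}")
print(f"sample SD = {sd_sample:.2f}, CV = {cv_sample:.2f}")
print(f"population SD = {sd_population:.2f}, CV = {cv_population:.2f}")
```

Because the CV is unit-free, it can be used to compare the variability of series measured in different units, for example yield against rainfall.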

EXERCISE
Sl. No.  DISTRICT  Net Area Sown  Sl. No.  DISTRICT  Net Area Sown
1 Balasore 212 16 Koraput 264
2 Bhadrak 156 17 Malkangiri 134
3 Balangir 336 18 Nabarangpur 179
4 Sonepur 118 19 Rayagada 153
5 Cuttack 140 20 Mayurbhanj 335
6 Jagatsingpur 87 21 Phulbani 105
7 Jajpur 136 22 Boudh 77
8 Kendrapara 137 23 Puri 131
9 Dhenkanal 139 24 Khordha 101
10 Angul 171 25 Nayagarh 119
11 Ganjam 357 26 Sambalpur 177
12 Gajapati 68 27 Bargarh 294
13 Kalahandi 335 28 Deogarh 59
14 Nawapara 163 29 Jharsuguda 65
15 Keonjhar 259 30 Sundargarh 285

CHI-SQUARE (χ²) AND FREQUENCY DATA
• Up to this point, the inference to the population has been
concerned with “scores” on one or more variables, such as CAT
scores, mathematics achievement, and hours spent on the
computer.
• We used these scores to make inferences about population
means. To be sure, not all research questions involve score data.
• Today the data that we analyze consists of frequencies; that is, the
number of individuals falling into categories. In other words, the
variables are measured on a nominal scale.
• The test statistic for frequency data is Pearson Chi-Square. The
magnitude of Pearson Chi-Square reflects the amount of
discrepancy between observed frequencies and expected
frequencies.

STEPS IN TEST OF HYPOTHESIS
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degree of freedom
6. Compare computed test statistic against a tabled/critical
value.

1. DETERMINE APPROPRIATE TEST
• Chi Square is used when both variables are measured
on a nominal scale.
• It can be applied to interval or ratio data that have
been categorized into a small number of groups.
• It assumes that the observations are randomly
sampled from the population.
• All observations are independent (an individual can
appear only once in a table and there are no
overlapping categories).
• It does not make any assumptions about the shape of
the distribution nor about the homogeneity of
variances.

2. ESTABLISH LEVEL OF SIGNIFICANCE
• The significance level, denoted alpha or α, is a measure of
the strength of the evidence that must be present in
the sample before you reject the null hypothesis and conclude
that the effect is statistically significant. The researcher determines
the significance level before conducting the experiment.
• α is a predetermined value
• The convention
• α = .05
• α = .01
• α = .001

3. DETERMINE THE HYPOTHESIS:
WHETHER THERE IS AN ASSOCIATION OR NOT
• Ho : The two variables are independent
• Ha : The two variables are associated

4. CALCULATING TEST STATISTICS
• The test statistic contrasts the observed frequencies in each cell of a
contingency table with the expected frequencies.
• The expected frequencies represent the number of cases
that would be found in each cell if the null hypothesis
were true (i.e., the nominal variables are unrelated).
• If the two variables are unrelated, the expected frequency of a cell
is the product of its row and column frequencies divided by the
number of cases.
$F_e = \frac{F_r F_c}{N}$
Hypothesis is a tentative assumption made in order to draw out and test its
logical or empirical consequences

$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
where F_o is the observed frequency and F_e the expected frequency of a cell.

5. DETERMINE DEGREES OF FREEDOM
df = (R-1)(C-1)
The degrees of freedom in a statistical calculation represent how
many values involved in a calculation have the freedom to vary. The
degrees of freedom can be calculated to help ensure the statistical
validity of chi-square tests. These tests are commonly used to
compare observed data with data that would be expected to be
obtained according to a specific hypothesis.

6. COMPARE COMPUTED TEST STATISTIC
AGAINST A TABLED/CRITICAL VALUE
• The computed value of the Pearson chi-square statistic is
compared with the critical value to determine whether the
computed value is improbable
• The critical tabled values are based on sampling
distributions of the Pearson chi-square statistic
• If the calculated χ² is greater than the χ² table value, reject Ho
• Ho : The two variables are independent
• Ha : The two variables are associated

EXAMPLE
• Suppose a researcher is interested in voting
preferences on employment issues.
• A questionnaire was developed and sent to a
random sample of 90 voters.
• The researcher also collects information
about the political party membership of the
sample of 90 respondents.

BIVARIATE FREQUENCY TABLE OR CONTINGENCY TABLE
            Favor   Neutral   Oppose   f row
NDA         10      10        30       50
UPA         15      15        10       40
f column    25      25        40       n = 90

1. DETERMINE APPROPRIATE TEST
1. Party membership (2 levels), nominal
2. Voting preference (3 levels), nominal

2. ESTABLISH LEVEL OF SIGNIFICANCE
Alpha of .05 (5%)

3. DETERMINE THE HYPOTHESIS
• Ho : There is no difference between NDA & UPA in their
opinion on the EMPLOYMENT issue.
• Ha : There is an association between responses to
the EMPLOYMENT survey and party
membership in the population.

Expected frequency of each cell: fe = (f row × f column) / n
NDA, Favor: fe = 50 × 25 / 90 = 13.9
UPA, Favor: fe = 40 × 25 / 90 = 11.1

              Favor   Neutral   Oppose   f row
NDA     fo    10      10        30       50
        fe    13.9    13.9      22.2
UPA     fo    15      15        10       40
        fe    11.1    11.1      17.8
f column      25      25        40       n = 90
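The expected frequencies can be derived from the row and column totals alone. The following is a minimal sketch of that step for the NDA/UPA example; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Observed frequencies (fo) from the contingency table
observed = {
    "NDA": {"Favor": 10, "Neutral": 10, "Oppose": 30},
    "UPA": {"Favor": 15, "Neutral": 15, "Oppose": 10},
}

# Marginal totals: f row, f column and n
row_totals = {party: sum(cells.values()) for party, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for opinion, count in cells.items():
        col_totals[opinion] = col_totals.get(opinion, 0) + count
n = sum(row_totals.values())

# fe = (f row x f column) / n for every cell
expected = {
    party: {opinion: row_totals[party] * col_totals[opinion] / n for opinion in col_totals}
    for party in observed
}

for party, cells in expected.items():
    print(party, {opinion: round(fe, 2) for opinion, fe in cells.items()})
```

Each row of expected frequencies sums to the same row total as the observed data, which is a quick check that the marginals were applied correctly.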

$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.22)^2}{22.22} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.78)^2}{17.78} = 11.03$

Cell            Observed (O)   Expected (E)   (O-E)    (O-E)²   (O-E)²/E
NDA, Favor      10.00          13.89          -3.89    15.12    1.09
NDA, Neutral    10.00          13.89          -3.89    15.12    1.09
NDA, Oppose     30.00          22.22          7.78     60.49    2.72
UPA, Favor      15.00          11.11          3.89     15.12    1.36
UPA, Neutral    15.00          11.11          3.89     15.12    1.36
UPA, Oppose     10.00          17.78          -7.78    60.49    3.40
Total           90                                              11.03
f row: NDA = 50, UPA = 40; f column: Favor = 25, Neutral = 25, Oppose = 40
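The whole calculation, from observed counts to the test statistic, can be sketched as one small function. This is an illustrative implementation of the Pearson chi-square formula for a two-way table; the function name `chi_square` is an assumption, not something defined in the slides.

```python
# Observed frequencies: rows are NDA and UPA, columns are Favor, Neutral, Oppose
observed = [[10, 10, 30], [15, 15, 10]]

def chi_square(table):
    """Pearson chi-square statistic: sum of (fo - fe)^2 / fe over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency
            stat += (fo - fe) ** 2 / fe
    return stat

chi2 = chi_square(observed)
print(f"chi-square = {chi2:.3f}")
```

With unrounded expected frequencies the statistic comes out very close to the 11.03 reported in the slides; small differences in the last digit come from rounding the intermediate values.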

5. DETERMINE DEGREES OF FREEDOM
df = (R-1)(C-1) = (2-1)(3-1) = 2

6. COMPARE COMPUTED TEST
STATISTIC AGAINST A TABLED/CRITICAL
VALUE
• α = 0.05
• df = 2
• Critical tabled value = 5.99
• Test statistic, 11.03, exceeds critical value
• Null hypothesis is rejected
• NDA & UPA differ significantly in their opinions on
EMPLOYMENT issues.
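The decision step can also be checked without a printed table. The sketch below relies on a property that holds only for 2 degrees of freedom, where the upper-tail probability of the chi-square distribution is exp(-x / 2); for other degrees of freedom a chi-square table or a statistics library would be needed. The variable names are illustrative.

```python
import math

alpha = 0.05
rows, cols = 2, 3
df = (rows - 1) * (cols - 1)
chi2 = 11.03  # test statistic reported in the slides

# Shortcut valid for df = 2 only: P(X > x) = exp(-x / 2)
assert df == 2
critical = -2 * math.log(alpha)     # value exceeded with probability alpha
p_value = math.exp(-chi2 / 2)       # probability of a statistic this large under Ho

reject_null = chi2 > critical
print(f"critical value = {critical:.2f}, p-value = {p_value:.4f}, reject Ho: {reject_null}")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent ways of reaching the same decision.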

EXERCISE
V1 V2 V3 V4
P1 24 25 9 32
P2 15 25 18 15
P3 12 20 10 30
P4 17 16 45 25

• Hypothesis = association
• Due to pollution in Delhi there is an increase in Covid-19
• Covid-19 is associated with pollution in Delhi
• Significance level (α) = 0.001, 0.01, 0.05
• Degrees of freedom: df = (C-1) × (R-1)
• Chi-square: χ² = Σ (O-E)² / E
• The χ² value should be compared with the χ² table value
• If the calculated value is higher than the table value, the null hypothesis is
rejected
• The alternative hypothesis is accepted
