Measures of Dispersion
Richard Malumba
Intro..
• Just as we use measures of location to identify the center of the data,
measures of dispersion show the extent of the spread outwards
• The commonest measures of dispersion are the range, the standard
deviation and the coefficient of variation (CV)
Range
• The range is the difference between the highest (Maximum) and
lowest (Minimum) value in a set of observations.
• In bio-statistics the range is often reported by the minimum and
maximum values such as “from (minimum) to (Maximum)” in the
example of height in meters of women: 1.72, 1.6, 1.5, 1.4, 1.6, 1.5,
1.7, 1.55, we demonstrate how to find the range;
Cont..
• Step 1: Arrange data in ascending/descending order 1.4, 1.5, 1.5,
1.55, 1.6, 1.6, 1.7, 1.72
• Step 2: Identify the minimum and maximum
Minimum =1.4 Maximum=1.72
• Step 3: Calculate the range; the difference between the minimum and
maximum figure
1.72-1.4=0.32
Percentiles, quartiles and Inter quartile range
• A value is said to be an nth percentile in a given set of observations if
n percent of the observations fall at or below it.
• In other words a percentile is any of the 99 values that divide the
values in a set of data into 100 equal parts, so that each part
represents 1/100th of the sample or population
• Common percentiles are the 25th, the 50th and the 75th. A 50th
percentile is a median. The 25th and 75th percentiles are also known
as lower and upper quartiles respectively
Cont..
• The 50th percentile (median) is a measure of location/ central
tendency while other percentiles are normally used to gauge the
spread of the data
• One of the commonest use of percentiles is in computation of the
inter quartile range
Example
• The distance in kilometers of 11 health units from the district
headquarters is 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, and 36. We can
calculate the inter quartile range from lower and upper quartiles
using the steps below
• Step 1: Arrange in ascending order
6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
• Step 2: Identify the position of the lower and upper quartile
Cont..
• Position of lower quartile Q1= =3
• Position of upper quartile Q2==9
• Step 3: Identify the values that occupy positions Q1 and Q2
• Like in a median, if a value occupies the position Q1 then the value is
the lower quartile
• Similarly, if it occupies position Q2 then it is the upper quartile
• The value for position Q1 =15 and the value for position Q2=43
Cont..
• Step 3: Identify the values that occupy positions Q1 and Q2
• Like in a median, if a value occupies the position Q1 then the value is the
lower quartile. Similarly, if it occupies position Q2 then it is the upper
quartile
• The value for position Q1 =15 and the value for position Q2=43
• Step 4: Calculate the inter quartile range
• Subtract the value on position Q1 from the value on position Q2: 43-15=28
Cont..
• If the quartile lies between observations, the value of the quartile is
the value of the lower observation plus the specified fraction of the
difference between the observations. For example if the position of a
quartile 20¼ it lies between 20th and 21st observations
• So the quartile is the value of the 20th observation plus ¼ the
difference between the 20th and 21st observations
• Suppose we had another distance of a health unit as 51km, then
Q1=3¼ and Q2=9.75
Cont..
• Then the lower quartile would be in position between the 3rd (15)
and 4th (36) values; this will be 15 + ¼ X (36-15) = 20.25
• The upper quartile will be between the 9th (43) and 10th (47) values;
this will be 43+ ¾ X (47-43) =46
• The inter quartile range = 46-20.25 = 25.75
Variance and Standard Deviation
• Variance and standard deviation measure how far observations are
from the expected value or the mean. The variance is obtained by
summing up squared differences of the observations from the mean
and dividing by n -1 where n is the number of observations
• The differences add to Zero and when they are squared they all
become positive numbers
Cont..
• The standard deviation is the square root of the variance. The
standard deviation is the most commonly used measure of statistical
dispersion
• It is non-negative and has the same units as the data. The standard
deviation of the population is symbolized by s while for the sample it
is designated as ‘ s
Cont..
1. Calculate the mean
2. Compute the difference between each observation from the mean
(-)
3. Square the differences – when we square the differences, we
eliminate the negative signs and therefore, our sum cannot be zero
4. Get the sum of the squared differences å
Cont..
• Since the data is a sample, divide the sum (from step 4) by the
number of observations minus one, i.e. (n -1) (where n is equal
to the number of observations in the data set). [The term (n -1) will
later be called degrees of freedom]. When we do so, we obtain the
sample variance, usually given as
• On the other hand the population variance, σ2 is obtained by dividing
the sum of the squared differences by the total number of
observations, or population size, N
Cont..
• Since we have all along been dealing with squared differences, we
now obtain the square root of the variance
• The standard deviation is the square root of the variance
Presentation of descriptive data
• Suppose that we have 12 students in a bio-statistics course who have
each achieved a score in a test
• We can present this information in a straight forward and simplistic
way e.g. arranged in order from the smallest to the highest as: 61, 69,
72, 76, 78, 83, 85, 85, 86, 88, 93 and 97
Cont..
• However, this poses problems as it is;
• Too detailed
• Too broad
• Difficult to interpret
• Plausible for small data sets
• There are available methods for summarizing this information in a
way that the data set can be presented more meaningfully
Cont..
• We can summarize nominal and ordinal data (categorical data) using
numbers. We use;
• Frequencies
• Ratios, rates and proportions
• These measures can also be used for numerical data that has been
grouped e.g. age categories
• A commonly used form of proportions is percentages
Tables and graphical presentations
We can use tables and graphs to display data. This depends on whether
it is numerical or categorical. We can employ;
• Frequency tables: We subdivide numerical data into classes e.g. age
groups, and indicate the counts in each group
• Histograms: They mainly show area. The continuous variable of
interest is on the x-axis, usually in grouped form based on ranges. The
size of the ranges must be uniform. The frequency of occurrence is on
the y-axis
Cont..
• Frequency polygons: A derivative of histograms in which a line is
drawn to indicate the frequencies. They are preceded by a frequency
table. They are useful when comparing two distributions on the same
graph
• Line graphs: They indicate the variation of one discrete continuous
variable with another
Cont..
• Scatter plots
• Stem and Leaf plots
• Box and whisker plots

Measures of Disperson epidemiology..pptx

  • 1.
  • 2.
    Intro.. • Just aswe use measures of location to identify the center of the data, measures of dispersion show the extent of the spread outwards • The commonest measures of dispersion are the range, the standard deviation and the coefficient of variation (CV)
  • 3.
    Range • The rangeis the difference between the highest (Maximum) and lowest (Minimum) value in a set of observations. • In bio-statistics the range is often reported by the minimum and maximum values such as “from (minimum) to (Maximum)” in the example of height in meters of women: 1.72, 1.6, 1.5, 1.4, 1.6, 1.5, 1.7, 1.55, we demonstrate how to find the range;
  • 4.
    Cont.. • Step 1:Arrange data in ascending/descending order 1.4, 1.5, 1.5, 1.55, 1.6, 1.6, 1.7, 1.72 • Step 2: Identify the minimum and maximum Minimum =1.4 Maximum=1.72 • Step 3: Calculate the range; the difference between the minimum and maximum figure 1.72-1.4=0.32
  • 5.
    Percentiles, quartiles andInter quartile range • A value is said to be an nth percentile in a given set of observations if n percent of the observations fall at or below it. • In other words a percentile is any of the 99 values that divide the values in a set of data into 100 equal parts, so that each part represents 1/100th of the sample or population • Common percentiles are the 25th, the 50th and the 75th. A 50th percentile is a median. The 25th and 75th percentiles are also known as lower and upper quartiles respectively
  • 6.
    Cont.. • The 50thpercentile (median) is a measure of location/ central tendency while other percentiles are normally used to gauge the spread of the data • One of the commonest use of percentiles is in computation of the inter quartile range
  • 7.
    Example • The distancein kilometers of 11 health units from the district headquarters is 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, and 36. We can calculate the inter quartile range from lower and upper quartiles using the steps below • Step 1: Arrange in ascending order 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49 • Step 2: Identify the position of the lower and upper quartile
  • 8.
    Cont.. • Position oflower quartile Q1= =3 • Position of upper quartile Q2==9 • Step 3: Identify the values that occupy positions Q1 and Q2 • Like in a median, if a value occupies the position Q1 then the value is the lower quartile • Similarly, if it occupies position Q2 then it is the upper quartile • The value for position Q1 =15 and the value for position Q2=43
  • 9.
    Cont.. • Step 3:Identify the values that occupy positions Q1 and Q2 • Like in a median, if a value occupies the position Q1 then the value is the lower quartile. Similarly, if it occupies position Q2 then it is the upper quartile • The value for position Q1 =15 and the value for position Q2=43 • Step 4: Calculate the inter quartile range • Subtract the value on position Q1 from the value on position Q2: 43-15=28
  • 10.
    Cont.. • If thequartile lies between observations, the value of the quartile is the value of the lower observation plus the specified fraction of the difference between the observations. For example if the position of a quartile 20¼ it lies between 20th and 21st observations • So the quartile is the value of the 20th observation plus ¼ the difference between the 20th and 21st observations • Suppose we had another distance of a health unit as 51km, then Q1=3¼ and Q2=9.75
  • 11.
    Cont.. • Then thelower quartile would be in position between the 3rd (15) and 4th (36) values; this will be 15 + ¼ X (36-15) = 20.25 • The upper quartile will be between the 9th (43) and 10th (47) values; this will be 43+ ¾ X (47-43) =46 • The inter quartile range = 46-20.25 = 25.75
  • 12.
    Variance and StandardDeviation • Variance and standard deviation measure how far observations are from the expected value or the mean. The variance is obtained by summing up squared differences of the observations from the mean and dividing by n -1 where n is the number of observations • The differences add to Zero and when they are squared they all become positive numbers
  • 13.
    Cont.. • The standarddeviation is the square root of the variance. The standard deviation is the most commonly used measure of statistical dispersion • It is non-negative and has the same units as the data. The standard deviation of the population is symbolized by s while for the sample it is designated as ‘ s
  • 14.
    Cont.. 1. Calculate themean 2. Compute the difference between each observation from the mean (-) 3. Square the differences – when we square the differences, we eliminate the negative signs and therefore, our sum cannot be zero 4. Get the sum of the squared differences å
  • 15.
    Cont.. • Since thedata is a sample, divide the sum (from step 4) by the number of observations minus one, i.e. (n -1) (where n is equal to the number of observations in the data set). [The term (n -1) will later be called degrees of freedom]. When we do so, we obtain the sample variance, usually given as • On the other hand the population variance, σ2 is obtained by dividing the sum of the squared differences by the total number of observations, or population size, N
  • 16.
    Cont.. • Since wehave all along been dealing with squared differences, we now obtain the square root of the variance • The standard deviation is the square root of the variance
  • 17.
    Presentation of descriptivedata • Suppose that we have 12 students in a bio-statistics course who have each achieved a score in a test • We can present this information in a straight forward and simplistic way e.g. arranged in order from the smallest to the highest as: 61, 69, 72, 76, 78, 83, 85, 85, 86, 88, 93 and 97
  • 18.
    Cont.. • However, thisposes problems as it is; • Too detailed • Too broad • Difficult to interpret • Plausible for small data sets • There are available methods for summarizing this information in a way that the data set can be presented more meaningfully
  • 19.
    Cont.. • We cansummarize nominal and ordinal data (categorical data) using numbers. We use; • Frequencies • Ratios, rates and proportions • These measures can also be used for numerical data that has been grouped e.g. age categories • A commonly used form of proportions is percentages
  • 20.
    Tables and graphicalpresentations We can use tables and graphs to display data. This depends on whether it is numerical or categorical. We can employ; • Frequency tables: We subdivide numerical data into classes e.g. age groups, and indicate the counts in each group • Histograms: They mainly show area. The continuous variable of interest is on the x-axis, usually in grouped form based on ranges. The size of the ranges must be uniform. The frequency of occurrence is on the y-axis
  • 21.
    Cont.. • Frequency polygons:A derivative of histograms in which a line is drawn to indicate the frequencies. They are preceded by a frequency table. They are useful when comparing two distributions on the same graph • Line graphs: They indicate the variation of one discrete continuous variable with another
  • 22.
    Cont.. • Scatter plots •Stem and Leaf plots • Box and whisker plots