SUMMARY MEASURES
MODULE 3
BIOSTATISTICS
MEASURES OF CENTRAL TENDENCY
• Convey information regarding the average value of a set of values
• Describe the clustering of values around a particular value or central point of a frequency distribution
• Thus, for any particular set of data, a single typical value can be used to describe the entire data set
• Three measures of central location are commonly used in epidemiology:
arithmetic mean, median, and mode
The choice of an appropriate measure of central tendency for representing a
distribution depends on three factors:
• The way the variables are measured (their level of measurement)
• The shape of the distribution
• The purpose of the research
Data can be either symmetric or skewed
• SYMMETRIC. If the data can be divided at the center into two halves that are approximate mirror images of each other.
• SKEWED. If one tail of a unimodal distribution is longer than the other tail, meaning that the data are not spread evenly around the center.
Data can be either right skewed (positively skewed) or left skewed (negatively skewed).
• If data is skewed to the right, it will rise quickly to a peak and have a long tail on the right.
• The opposite is true for data that is skewed to the left.
REMEMBER: Skewness refers to the tail, not the hump. So a distribution that is skewed to the left has a
long left tail.
A symmetric distribution of the type illustrated in Figure 2.2 is the classic bell-shaped curve, also known as a normal distribution.
A distribution that has a central location to the left and a tail to the right is said to be positively skewed or skewed to the right. In Figure 2.6, distribution A is skewed to the right. A distribution that has a central location to the right and a tail to the left is said to be negatively skewed or skewed to the left. In Figure 2.6, distribution C is skewed to the left.
MEAN (x̄: SAMPLE MEAN; µ: POPULATION MEAN)
The mean is simply the arithmetic average of the data. It is calculated by taking the sum of all values in the dataset and dividing that total by the number of values. The mean is the most commonly used measure of central tendency.
• Properties of the Mean
1. Uniqueness. For a given set of data there is one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily understood and easy to compute.
3. Since each and every value in a set of data enters into the computation of the mean, it is affected by each value. Extreme values, therefore, have an influence on the mean and, in some cases, can so distort it that it becomes undesirable as a measure of central tendency.
4. The mean is not the measure of choice for data that are severely skewed or have extreme values in one direction or another.
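As a minimal sketch of the calculation and of property 3, the Python below computes the mean of a small data set and then adds one extreme value. The function name `arithmetic_mean` and the numbers are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical data, chosen only for illustration.
days = [2, 3, 4, 5, 6]
mean_days = arithmetic_mean(days)                 # 20 / 5 = 4.0

# One extreme value pulls the mean away from the bulk of the data.
mean_with_extreme = arithmetic_mean(days + [40])  # 60 / 6 = 10.0

print(mean_days, mean_with_extreme)
```

With the extreme value included, the mean (10.0) is larger than five of the six observations, which is why it can become a poor summary of such data.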
MEDIAN
The median is the 50th percentile of the values in a dataset and represents the literal
middle of the data.
a. If the number of observations (n) is odd, the middle position falls on a single
observation.
b. If the number of observations is even, the middle position falls between two
observations.
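The two cases can be sketched in Python as follows. The slide does not say how to resolve the even case; this sketch assumes the usual convention of averaging the two middle observations. The function name `median_value` and the data are hypothetical.

```python
def median_value(values):
    """Middle value of the sorted data (assumes a non-empty list)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the middle position falls on a single observation.
        return ordered[mid]
    # Even n: the middle falls between two observations; average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_case = median_value([7, 1, 5])       # sorted: 1, 5, 7
even_case = median_value([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
print(odd_case, even_case)
```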
MODE
• The category or score with the largest frequency or percentage in the distribution
• It can be determined simply by tallying the number of times each value occurs.
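The tallying step might look like this in Python. The function name `modes` and the sample data are hypothetical. The sketch returns a list because a distribution can have more than one value tied for the largest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)            # tally how often each value occurs
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical categorical data.
blood_types = ["O", "A", "O", "B", "A", "O", "AB"]
single_mode = modes(blood_types)

# Hypothetical scores with two values tied for the top frequency.
tied_modes = modes([1, 1, 2, 2, 3])
print(single_mode, tied_modes)
```

Because the mode only requires counting, it works for categories as well as for numeric scores.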
MEASURES OF DISPERSION
Spread, or dispersion, is the second important feature of frequency distributions.
Measures of dispersion describe the spread of the data, or how far the data points lie from the center.
If all the values are the same, there is no dispersion; if they are not all the same, dispersion is present in the data.
The amount of dispersion may be small when the values, though different, are close together.
If the values are widely scattered, the dispersion is greater.
Measures of spread include the range, interquartile range, variance, and standard deviation.
RANGE
The range is the difference between the largest and smallest value in a set of observations.
Range = maximum – minimum
In the statistical world, the range is reported as a single number and is the result of subtracting the minimum from the maximum value. In the epidemiologic community, the range is usually reported as “from (the minimum) to (the maximum),” that is, as two numbers rather than one.
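Both reporting styles can be produced from the same two numbers. This is a small Python sketch; the function name `value_range` and the ages are hypothetical.

```python
def value_range(values):
    """Return the range as a single number and as a (minimum, maximum) pair."""
    lowest, highest = min(values), max(values)
    return highest - lowest, (lowest, highest)

# Hypothetical ages in years.
ages = [12, 7, 30, 18]
single_number, (low, high) = value_range(ages)

print(f"Range = {single_number}")   # statistical style
print(f"from {low} to {high}")      # epidemiologic style
```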
INTERQUARTILE RANGE (IQR)
The interquartile range is the difference between the 75th
percentile (3rd quartile) and the 25th percentile (1st quartile)
in a set of data: IQR = Q3 – Q1.
In other words, the interquartile range includes the second
and third quartiles of a distribution.
This measurement gives an idea of the middle 50 percent of
the observations and is, therefore, less likely to be influenced
by outliers or extreme values.
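The sketch below illustrates why the interquartile range resists extreme values. It uses `statistics.quantiles` from the Python standard library with `method="inclusive"`. That is one of several quartile conventions, and statistical packages may give slightly different cut points. The data are hypothetical.

```python
from statistics import quantiles

def interquartile_range(values):
    """Q3 minus Q1, using the 'inclusive' quartile convention."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data; the second list replaces the largest value with an outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

iqr_data = interquartile_range(data)
iqr_outlier = interquartile_range(with_outlier)
range_data = max(data) - min(data)
range_outlier = max(with_outlier) - min(with_outlier)

print(iqr_data, iqr_outlier)       # the IQR is unchanged by the outlier
print(range_data, range_outlier)   # the range is not
```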
VARIANCE (s²: SAMPLE; σ²: POPULATION)
• The variance represents the amount of spread or variability around the mean of a set of
data.
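• The variance can be described as the average squared deviation of individual values from the mean of that set.
• For a sample, the sum of squared deviations is divided by n – 1 (the degrees of freedom) rather than n: s² = Σ(x – x̄)² / (n – 1).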
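To make the role of the n – 1 divisor concrete, the Python below computes the variance both ways from the squared deviations. The function names and data are hypothetical. Dividing by n for a full population and by n – 1 for a sample is the standard convention, stated here as an assumption about what the slide's formula showed.

```python
def sum_of_squared_deviations(values):
    """Sum of squared deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def population_variance(values):
    """Divide by n: the values are treated as the whole population."""
    return sum_of_squared_deviations(values) / len(values)

def sample_variance(values):
    """Divide by n - 1 (degrees of freedom): the values are a sample."""
    return sum_of_squared_deviations(values) / (len(values) - 1)

# Hypothetical scores with mean 5 and squared deviations summing to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(scores)   # 32 / 8
samp_var = sample_variance(scores)      # 32 / 7
print(pop_var, samp_var)
```

The sample variance is always slightly larger than the population formula applied to the same numbers, and the gap shrinks as n grows.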
STANDARD DEVIATION (s; σ)
The standard deviation of a set of data is
the square root of the variance.
Standard deviation is usually calculated only
when the data are more-or-less “normally
distributed”.
The variance and the standard deviation are two
closely related measures of variation that
increase or decrease based on how closely the
scores cluster around the mean.
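A short sketch of both points, using `stdev` and `variance` from the Python standard library (both use the sample, n – 1, form). The two data sets are hypothetical and share the same mean of 5.

```python
from statistics import stdev, variance

# Hypothetical scores: same mean (5), different spread.
clustered = [4, 5, 5, 5, 6]
scattered = [1, 3, 5, 7, 9]

sd_clustered = stdev(clustered)
sd_scattered = stdev(scattered)

# The standard deviation is the square root of the variance.
check = abs(sd_scattered ** 2 - variance(scattered))

print(sd_clustered, sd_scattered, check)
```

The scores that sit close to the mean give the smaller standard deviation, while the widely scattered scores give the larger one.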
STANDARD ERROR (SE)
• The standard error is the standard deviation of the sampling distribution of the means, rather than the
observations themselves.
• The smaller the standard error, the closer any given sample mean is likely to be to the true population
mean.
• The primary practical use of the standard error of the mean is in calculating confidence intervals around
the arithmetic mean.
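As a rough illustration, the standard error of the mean is commonly estimated as the sample standard deviation divided by the square root of n. The sketch below uses that estimate to build an approximate 95% confidence interval. The multiplier 1.96 assumes a normal approximation, and for a small sample such as this one a t-distribution multiplier would normally be used instead. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

def approx_95_ci(values, z=1.96):
    """Approximate 95% CI for the mean (normal approximation, an assumption)."""
    m = mean(values)
    se = standard_error(values)
    return m - z * se, m + z * se

# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
lower, upper = approx_95_ci(sample)
print(se, lower, upper)
```

A smaller standard error gives a narrower interval around the sample mean.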
MEASURES OF POSITION
Measures of position do not measure central tendency or spread (dispersion), but instead measure location in a data set.
Quartiles: Divide the data into four parts, each of which includes 25% of the data.
First quartile is the 25th percentile.
Second quartile is the 50th percentile (median).
Third quartile is the 75th percentile.
Fourth quartile is the 100th percentile (maximum).
Percentiles: Divide the data in a distribution into 100 equal parts.
The Pth percentile (P ranging from 0 to 100) is the value that has P percent of the observations falling at or below it.
In other words, the 90th percentile has 90% of the observations at or below it.
The median, the halfway point of the distribution, is the 50th percentile.
The maximum value is the 100th percentile, because all values fall at or below the maximum.
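The "at or below" definition of a percentile can be sketched with the nearest-rank method, shown here in Python. This is only one convention: many packages interpolate between observations, so their results can differ slightly (for example, for the median of an even-sized data set). The function names and data are hypothetical.

```python
from math import ceil

def percentile_nearest_rank(values, p):
    """Smallest observation with at least p percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, ceil(p * len(ordered) / 100))   # 1-based position
    return ordered[rank - 1]

def percent_at_or_below(values, x):
    """Percentage of observations that fall at or below x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)

# Hypothetical data: the integers 1 through 10.
data = list(range(1, 11))
p90 = percentile_nearest_rank(data, 90)
share_at_p90 = percent_at_or_below(data, p90)
p100 = percentile_nearest_rank(data, 100)
print(p90, share_at_p90, p100)
```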
• A general rule to follow is that if the data is skewed either to the left or to the right, the
median represents the data better than the mean.
• If a sample is normally distributed, the mean and median will be nearly the same. With
symmetrical, unimodal data, the mode will be similar as well.
• The arithmetic mean is the best descriptive measure for data that are normally distributed.
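These rules of thumb can be checked on small examples. The sketch below uses `mean`, `median`, and `mode` from the Python standard library on two hypothetical data sets, one symmetric and one with a long right tail.

```python
from statistics import mean, median, mode

# Hypothetical data: identical except for the largest value.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]

for name, values in [("symmetric", symmetric), ("right skewed", right_skewed)]:
    print(name, mean(values), median(values), mode(values))
```

In the symmetric set all three measures coincide at 3. In the right-skewed set the median and mode stay at 3 while the mean is pulled toward the long tail, so the median is the more representative summary there.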