HI 224 Chapter 10
- 2. © 2019 AHIMA
ahima.org
Learning Objectives
• Explain how and why percentiles are used
• Compute the percentile from an ungrouped distribution
• Prepare statistics measuring central tendency such as mean,
median or mode
• Prepare statistics measuring variation such as range,
variance, standard deviation, and correlation
Descriptive Statistics (continued)
• Variable
• A variable is a characteristic that can have different
values
• Examples
• A person may be HIV negative or positive
• The variable is HIV status
• The values are negative and positive
• Third-party payers include numerous insurance companies
• The variable is “third-party payer”
• The values are the names of the organizations paying for
services
Frequency Distribution
• Shows the values that a variable can take and the
number of observations associated with each value
• A variable is a characteristic or property that may
take on different values
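As a minimal sketch of building a frequency distribution, the Python snippet below counts how many observations are associated with each value. The length-of-stay values are the seven used in the mean example in this chapter; the variable names are assumptions made for illustration.

```python
from collections import Counter

# Lengths of stay (days) for seven discharged patients
lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]

# Count the number of observations associated with each value
frequency = Counter(lengths_of_stay)

# List each value the variable takes, with its frequency
for value in sorted(frequency):
    print(f"LOS {value} day(s): {frequency[value]} patient(s)")
```

A LOS of 3 days occurs three times here, and every other value occurs once.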
Rank
• Denotes a value’s position in a group relative to
other values organized in order of magnitude
• E.g., a rank of 50 means that the value is 50th from the
beginning (or end) of a series
• The position of the observation is more important
than the number associated with it
Quartile
• Data arranged in order and divided into four equal parts
• First quartile extends to 25% of the data
• Second quartile extends to 50% of the data (the median)
• Third quartile extends to 75% of the data
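The quartile cut points can be sketched with Python's standard `statistics.quantiles` function. The data values below are hypothetical, and the function's default "exclusive" method is only one of several conventions; other software may return slightly different cut points for the same data.

```python
import statistics

# Hypothetical ordered data set of 11 observations
data = [1, 2, 2, 3, 3, 4, 5, 6, 7, 9, 12]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)

print("First quartile (25%):", q1)
print("Second quartile (50%, the median):", q2)
print("Third quartile (75%):", q3)
```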
Decile
• Data divided into ten equal parts
• First decile corresponds to the 10th percentile
• Second decile corresponds to the 20th percentile and so
on
Percentile
• Separates the scores into 100 equal parts
• A score at the 54th percentile means that the score is
greater than or equal to 54 percent of the scores in the
group
• Called a percentile rank
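Following the definition above (a score's percentile rank is the percentage of scores it is greater than or equal to), a percentile rank for an ungrouped distribution might be computed as sketched below. The function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are less than or equal to the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical set of ten test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

# 6 of the 10 scores are at or below 80
rank_of_80 = percentile_rank(scores, 80)
print(f"A score of 80 is at the {rank_of_80:.0f}th percentile")
```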
Measures of Central Tendency
• In summarizing data, it is often useful to have a
single typical number that is representative of the
entire collection of data or specific population
• Such numbers are customarily referred to as
measures of central tendency
• Three measures of central tendency are frequently
used:
• Mean
• Median
• Mode
Mean
• The arithmetic average
• It is common to use the term “average” to designate
mean
• To obtain the mean, add all the values in a frequency
distribution and then divide the total by the number of
values in the distribution
• For example, seven hospital inpatients have the following
lengths of stay: 2, 3, 4, 3, 5, 1, and 3
• Arranged in order, the values are 1, 2, 3, 3, 3, 4, and 5
• To construct a frequency distribution, all the values that the
LOS can take are listed, and the number of times a discharged
patient had that particular LOS is entered
Mean (continued)
• To determine the mean, we sum all the values in the
frequency distribution and divide by the number of values
• The total is 21
• We arrived at this by adding 2 + 3 + 4 + 3 + 5 + 1 + 3
• To arrive at the mean, divide 21 by the number of
values (N), which in our case is 7
• The mean (or average) equals three days
• Formula
• $\bar{X} = \dfrac{\text{sum of all the values}}{\text{number of values}} = \dfrac{\sum X}{N}$
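The length-of-stay example can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are assumptions.

```python
# Lengths of stay (days) for the seven inpatients in the example
lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]

total = sum(lengths_of_stay)   # sum of all the values: 21
n = len(lengths_of_stay)       # number of values: 7
mean_los = total / n           # 21 / 7 = 3.0 days

print(f"Total = {total}, N = {n}, mean LOS = {mean_los} days")
```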
Mean (continued)
• The most common measure of central tendency
• Advantage
• Easy to compute
• Disadvantage
• It is sensitive to extreme values, called outliers, that may
distort its representation of the typical value of a set of
numbers
Median
• The midpoint (center) of the distribution of values
• Point above and below which 50 percent of the values
lie
• Describes the middle of the data, literally
• The median value is obtained by arranging the
numerical observations in ascending or descending
order and then determining the value in the middle
of the array
• May be the middle observation (if there is an odd
number of values)
• May be a point halfway between the two middle values
(if there is an even number of values)
Median (continued)
• To arrive at the median in an even-numbered
distribution, add the two middle values together
and divide by two
• The advantage of using the median as a measure of
central tendency is that it is unaffected by extreme
values
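A short sketch of the median for odd and even numbers of values, using Python's standard `statistics` module. The even-numbered list and the 45-day outlier are hypothetical values added to illustrate how the median resists extreme values while the mean does not.

```python
import statistics

odd_count = [2, 3, 4, 3, 5, 1, 3]   # 7 values: the middle observation is the median
even_count = [1, 2, 3, 4, 5, 6]     # 6 values: average the two middle values

median_odd = statistics.median(odd_count)     # ordered: 1,2,3,3,3,4,5 -> 3
median_even = statistics.median(even_count)   # (3 + 4) / 2 = 3.5

# Add one hypothetical extreme value (a 45-day stay) to the first list
with_outlier = odd_count + [45]
median_with_outlier = statistics.median(with_outlier)   # (3 + 3) / 2 = 3.0
mean_with_outlier = statistics.mean(with_outlier)       # 66 / 8 = 8.25

print(median_odd, median_even, median_with_outlier, mean_with_outlier)
```

The single outlier pulls the mean from 3 to 8.25 days, while the median stays at 3.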
Mode
• The value that occurs with highest frequency
• It is the most typical
• Simplest of the measures of central tendency because
it does not require any calculations
• With a small number of values, each value may occur
only once, in which case there is no mode
• The mode is rarely used as a sole descriptive measure
of central tendency because it may not be unique
• There may be two or more modes
• These are called bimodal or multimodal distributions
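The sketch below finds the mode or modes with `statistics.multimode` from the Python standard library. The bimodal list is hypothetical. Note one difference in convention: when every value occurs exactly once, `multimode` returns all of the values, whereas the slide treats that case as having no mode.

```python
import statistics

lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]
bimodal_values = [1, 2, 2, 3, 4, 4]   # hypothetical data with two modes

# multimode returns a list of the most frequent value(s)
modes_los = statistics.multimode(lengths_of_stay)    # [3]
modes_bimodal = statistics.multimode(bimodal_values) # [2, 4]

print("Mode(s) of LOS:", modes_los)
print("Modes of bimodal data:", modes_bimodal)
```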
Choice of a Measure of
Central Tendency
• Depends on the number of values and the nature of
their distribution
• Sometimes the mean, median, and mode are identical
• The mean is preferable because it includes information
from all observations
• If the series of values contains a few that are unusually
high or low, the median may represent the series better
than the mean
• The mode is often used in samples where the most
typical value is preferred
• The mode does not have to be numerical
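Because the mode only counts occurrences, it can be found for non-numerical values as well. A minimal sketch, using hypothetical third-party payer names as the values:

```python
import statistics

# Hypothetical payer recorded for each of six discharges
payers = ["Medicare", "Medicaid", "Medicare", "Self-pay", "Medicare", "Medicaid"]

# The most frequently occurring value is the mode
most_common_payer = statistics.mode(payers)
print("Most common payer:", most_common_payer)
```

A mean or median cannot be computed for these values, but the mode still identifies the most typical one.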
Measures of Variation
• Variability
• We also want to consider the spread of the distribution,
which tells us how widely the observations are spread
out around the measure of central tendency
• The mean gives a measure of central tendency of a list
of numbers but tells nothing about the spread of the
numbers in the list
• Variability refers to the difference between each score
and every other score
Measures of Variation (continued)
• Range
• The simplest measure of spread
• The range is the difference between the largest and
smallest values in a frequency distribution
• The easiest to compute
• It is the simplest, order-based measure of spread, but it
is far from optimal as a measure of variability for two
reasons
• First, as the sample size increases, the range also tends to
increase
• Second, it is strongly affected by extreme values, which are
very different from the other values in the data
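The range, and its sensitivity to a single extreme value, can be sketched as follows. The 45-day stay is a hypothetical outlier added for illustration.

```python
lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]

# Range = largest value minus smallest value
los_range = max(lengths_of_stay) - min(lengths_of_stay)   # 5 - 1 = 4

# One hypothetical extreme value changes the range dramatically
with_outlier = lengths_of_stay + [45]
range_with_outlier = max(with_outlier) - min(with_outlier)   # 45 - 1 = 44

print(f"Range: {los_range} days; with outlier: {range_with_outlier} days")
```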
Variance
• The variance of a frequency distribution is the average of the
squared deviations from the mean
• The variance of a sample is symbolized by $s^2$
• The variance of a distribution is larger when the
observations are widely spread
Variance (continued)
• To calculate the variance, first determine the mean
• Then, calculate the squared deviations from the mean: subtract
the mean of the frequency distribution from each value in the
distribution and square the difference, $(X - \bar{X})^2$
• The squared differences are summed and divided by N – 1
• $s^2 = \dfrac{\sum (X - \bar{X})^2}{N - 1}$
• $s^2$ = variance
• $\sum$ = sum
• X = value of a measure or observation
• $\bar{X}$ (Xbar) = mean
• N = number of values or observations
• The term N – 1 is used in the denominator instead of N to adjust for the
fact that the mean of the sample is used as an estimate of the mean of
the underlying population
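The steps above can be sketched in Python using the seven length-of-stay values from the mean example. The variable names are assumptions; `statistics.variance` from the standard library also divides by N – 1 and is shown only as a cross-check.

```python
import statistics

lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]
n = len(lengths_of_stay)

# Step 1: determine the mean (21 / 7 = 3.0)
mean_los = sum(lengths_of_stay) / n

# Step 2: square each deviation from the mean
squared_deviations = [(x - mean_los) ** 2 for x in lengths_of_stay]

# Step 3: sum the squared deviations (10.0) and divide by N - 1 (6)
sample_variance = sum(squared_deviations) / (n - 1)

print(f"Sample variance: {sample_variance:.2f}")
print(f"Cross-check:     {statistics.variance(lengths_of_stay):.2f}")
```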
Standard Deviation (SD)
• Square root of the variance
• Because SD is the square root of the variance, it is
expressed in the same units as the original data and can
be more easily interpreted as a measure of variation
• When SD is small, there is less dispersion around the mean
• When SD is large, there is greater dispersion around the
mean
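As an illustrative sketch, the standard deviation of the seven length-of-stay values from the mean example is the square root of their sample variance. The second list is hypothetical and more widely spread, to show a larger SD; `statistics.stdev` is used as a cross-check.

```python
import math
import statistics

lengths_of_stay = [2, 3, 4, 3, 5, 1, 3]
widely_spread = [1, 1, 2, 3, 10, 12, 20]   # hypothetical, more dispersed data

# SD is the square root of the (N - 1) sample variance
sd_los = math.sqrt(statistics.variance(lengths_of_stay))
sd_spread = statistics.stdev(widely_spread)

print(f"SD of LOS values:           {sd_los:.2f} days")
print(f"SD of widely spread values: {sd_spread:.2f} days")
```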
Standard Deviation (continued)
• To understand this concept, it can help to learn
about what mathematicians call a normal
distribution of data:
• A normal distribution of data means that most of the
values in a set of data are close to the “average,” while
relatively few values tend to one extreme or the other
• The standard deviation is a statistic that tells you how
closely all the observations are clustered around the
mean in a set of data
• When the observations are closely gathered, and the
bell-shaped curve is steep, the standard deviation is small
• When the observations are spread apart, and the bell curve
is relatively flat, the standard deviation will be larger
Standard Deviation (continued)
• Normal distribution means that if the variable were
measured on every person in the population, the frequency
distribution would display a normal pattern, with most
of the measurements near the center of the distribution
• It also would be possible to accurately and summarily
describe the population, with respect to the variable, by
calculating the mean, variance, and SD of the values
• In a normal distribution, one SD in both directions from
the mean contains 68.3 percent of all values
• Two SDs in both directions from the mean contain about 95.4
percent of all values
• Three SDs in both directions from the mean contain 99.7
percent of all observations
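The proportions of a normal distribution within one, two, and three SDs of the mean can be computed approximately with the error function in Python's `math` module. This is a sketch of the theoretical values, not a calculation on real data; the function name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {share_within(k) * 100:.1f}%")
```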
Skew
• Not all distributions are symmetrical or have the
usual bell-shaped curve
• Some curves are skewed
• Most of their values do not fall in the middle, but rather
toward one end of the curve
• Skewness is the horizontal stretching of a frequency
distribution to one side or the other so that one tail is
longer than the other
• The direction of skewness is on the side of the long tail
• That is, if the longer tail is on the right then the curve is skewed
to the right
• If the longer tail is on the left, then the curve is skewed to the
left
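One common way to quantify the direction of skew, not covered on the slide and shown here only as an assumption-laden sketch, is the moment-based skewness coefficient: positive when the long tail is on the right, negative when it is on the left, and zero for a symmetrical distribution. The data sets are hypothetical.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]        # long tail on the right
left_skewed = [16 - x for x in right_skewed]    # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]

print("Right-skewed:", round(skewness(right_skewed), 2))
print("Left-skewed: ", round(skewness(left_skewed), 2))
print("Symmetric:   ", round(skewness(symmetric), 2))
```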
Correlation
• Measures the extent of a linear relationship
between two variables and can be described as
strong, moderate, or weak, and as positive or negative
• A positive relationship between two variables is
direct
• Negative relationships are inverse
Correlation (continued)
• Correlation does not imply causation
• Just because two variables are highly correlated does
not mean that one causes the other
Correlation (continued)
• The value for correlation is always between -1 and
+1
• A correlation of 0 means there is no linear relationship
between the variables
• -1 implies a perfect negative (inverse) relationship
• +1 implies a perfect positive (direct) relationship
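The slides do not name a specific coefficient; the sketch below assumes Pearson's correlation coefficient, the most common measure of linear relationship, and implements it by hand. The function name `pearson_r` and all data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]     # rises exactly with x: perfect positive
inverse = [10, 8, 6, 4, 2]    # falls exactly as x rises: perfect negative
scattered = [3, 1, 4, 1, 5]   # only loosely related to x

r_direct = pearson_r(x, direct)
r_inverse = pearson_r(x, inverse)
r_scattered = pearson_r(x, scattered)

print(r_direct, r_inverse, round(r_scattered, 2))
```

Whatever the value, a high correlation computed this way still says nothing about whether one variable causes the other.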