Z-Scores
Sometimes we want to do more than summarize a bunch of scores. Sometimes
we want to talk about particular scores within the bunch. We may want to tell
other people about whether or not a score is above or below average. We may
want to tell other people how far away a particular score is from average. We
might also want to compare scores from different bunches of data. We will want
to know which score is better. Z-scores can help with all of this.
They Tell Us Important Things
Z-Scores tell us whether a particular score is equal to the mean, below the
mean or above the mean of a bunch of scores. They can also tell us how far a
particular score is away from the mean. Is a particular score close to the mean
or far away?
If a Z-Score...
• Has a value of 0, it is equal to the group mean.
• Is positive, it is above the group mean.
• Is negative, it is below the group mean.
• Is equal to +1, it is 1 Standard Deviation above the mean.
• Is equal to +2, it is 2 Standard Deviations above the mean.
• Is equal to -1, it is 1 Standard Deviation below the mean.
• Is equal to -2, it is 2 Standard Deviations below the mean.
Z-Scores Can Help Us Understand…
How typical a particular score is within a bunch of scores. If data are normally
distributed, approximately 95% of the data should have Z-scores between -2 and
+2. Z-scores that do not fall within this range may be less typical of the data in a
bunch of scores.
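As a minimal sketch of the ideas above, the following Python computes a z-score for every score in a bunch and puts it into words. The scores are made-up illustration data, and `z_scores` and `describe` are hypothetical helper names, not part of any tool mentioned here.

```python
from statistics import mean, stdev


def z_scores(scores):
    """Return the z-score of every score, using the sample mean and standard deviation."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    return [(x - m) / s for x in scores]


def describe(z):
    """Put a z-score into words: position relative to the mean, and how typical it is."""
    if z == 0:
        position = "equal to the mean"
    elif z > 0:
        position = "above the mean"
    else:
        position = "below the mean"
    # Roughly 95% of normally distributed data falls between -2 and +2.
    typical = "typical" if -2 <= z <= 2 else "less typical"
    return f"{position}, {typical}"


# Made-up scores, for illustration only.
scores = [52, 55, 58, 60, 60, 61, 63, 65, 66, 95]
for x, z in zip(scores, z_scores(scores)):
    print(f"{x}: z = {z:+.2f} ({describe(z)})")
```

In this made-up bunch only the score of 95 lands outside the -2 to +2 range.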
Z-Scores Can Help Us Compare…
Individual scores from different bunches of data. We can use Z-scores to
standardize scores from different groups of data. Then we can compare raw
scores from different bunches of data.
How Do You Calculate a Z-Score/ Sigma Level?
Jeff Sauro • June 14, 2004
The benefit of using a z-score in usability metrics was explained in "What's a
Z-Score and why use it in Usability Testing?" This article discusses different
ways of calculating a z-score.
The short answer is: It depends on your data and what you're looking for. If
you've encountered the z-score in a statistics book you usually get some
formula like:
z = (x - μ) ÷ σ
The above formula is for obtaining a z-score for an entire population. Usability
testing obviously samples a very small subset of the population and thus the
following formula is used:
z = (x - x̄) ÷ s
Where x-bar and s are used as estimators for the population's true mean and
standard deviation. Both formulas essentially calculate the same thing:
Calculating a Z-Score Example
For example, let's say you took the GRE a few weeks ago and got
scores of 630 Verbal and 700 Quantitative. How good are these
scores? Which is better, the Verbal or Quantitative score? Using
a z-score can tell you how far you are from the mean and thus
how well you performed. If you know the mean and standard
deviations for a set of GRE test takers you can compare your
scores.
The means and standard deviations of a set of test takers are given on the
GRE website:
         Verbal   Quantitative
Mean     469      591
StDev    119      148
By plugging in your scores you get the following:
Verbal z = (630 - 469) ÷ 119 = 1.35σ
Quantitative z = (700 - 591) ÷ 148 = .736σ
To convert these sigma values into a percentage you can look
them up in a standard z-table, use the Excel formula
=NORMSDIST(1.35) or use the Z-Score to Percentile Calculator
(choose 1-sided) and get the percentages: 91% Verbal and 77%
Quantitative. You can see where your score falls within the
sample of other test takers and also see that the verbal score
was better than the quantitative score. Assuming the sample data
was normally distributed, here's how the scores would look
graphically:
Figure 1: Verbal Score
Figure 2: Quantitative Score
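The GRE calculation above can be sketched in a few lines of Python. This is only an illustration: `z_score` and `percentile` are hypothetical helper names, and `statistics.NormalDist().cdf` from the standard library stands in for the z-table or Excel lookup, assuming normally distributed scores.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


def percentile(z):
    """One-sided area under the standard normal curve to the left of z, as a percentage."""
    return NormalDist().cdf(z) * 100


# Means and standard deviations quoted in the GRE example above.
verbal_z = z_score(630, mean=469, sd=119)
quant_z = z_score(700, mean=591, sd=148)

print(f"Verbal:       z = {verbal_z:.2f}, percentile = {percentile(verbal_z):.0f}%")
print(f"Quantitative: z = {quant_z:.3f}, percentile = {percentile(quant_z):.0f}%")
```

Because both scores are expressed in standard deviations from their own mean, they can be compared directly even though the two test sections have different means and spreads.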
Z-Scores and Process Sigma
An interactive Graph of the Standard Normal Curve similar to Figures 1 & 2 is
available for you to visualize how the z-scores and the area under the normal
curve correspond. The graphs also allow you to see the difference between one
and two-sided (also called two-tailed) areas. In Six Sigma the process sigma
metric is derived using the same method as a z-score. However, in Six Sigma
you are measuring the distance a sample mean is above a specification limit--
there can be an upper and lower spec limit that a sample must fall between as
well. As in the z-score, you still use the same normal-deviates from the z-table
to approximate the area under the curve. The process sigma metric is
essentially a Z equivalent.
When testing software with users, task times are usually a good metric that will
reveal the individual differences in performance. For task times there typically is
only an upper spec limit. That is, it usually doesn't matter how fast a user
completes a task, but it does matter if a user takes too long. For example, say
you and your product team determined that a task should be completed in 120
seconds. 120 seconds becomes your Upper Spec Limit (USL). You sampled 10
users and got these task times:
Sample 1
100
99
101
125
100
123
96
90
98
116
USL: 120
Mean: 104
StDev: 12
To calculate the process sigma you subtract the target (120) from
the mean of the sample (104) and divide by the sample
standard deviation (12). For Sample 1 the process sigma is
-1.32σ. The visual representation of the data can be seen below:
In the case of task times, a negative process sigma is ideal--as you want more
people completing the task below the task time, not above it. You can simply
drop the negative when communicating the results in the event it causes
confusion. If you were to make radical improvements to the UI and then
sampled another set of ten users, here are more results:
Sample 2
60
75
99
88
65
72
75
72
87
65
USL: 120
Mean: 75.8
StDev: 12.14
In the redesign, the average of the new sample is well below the
spec limit and the process sigma is now very high. The
corresponding defect area is now only .01% and the quality area
is 99.98%.
Of course having users perform that much below the spec limit is
not very common due to the inherent variability in user
performance.
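The process sigma for both samples can be recomputed with a short Python sketch. It assumes the sign convention used above (mean minus upper spec limit, so negative is good) and normally distributed task times. `process_sigma` and `defect_area` are hypothetical helper names. Because the script works from the unrounded mean and standard deviation, its results can differ slightly from figures derived from the rounded values quoted in the text.

```python
from statistics import NormalDist, mean, stdev


def process_sigma(times, usl):
    """Distance of the sample mean from the upper spec limit, in sample standard deviations.

    Negative values mean the average task time is below the limit.
    """
    return (mean(times) - usl) / stdev(times)


def defect_area(times, usl):
    """Estimated share of task times above the limit, assuming a normal distribution."""
    return NormalDist().cdf(process_sigma(times, usl))


USL = 120  # upper spec limit in seconds
sample_1 = [100, 99, 101, 125, 100, 123, 96, 90, 98, 116]
sample_2 = [60, 75, 99, 88, 65, 72, 75, 72, 87, 65]

for name, times in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    sigma = process_sigma(times, USL)
    print(f"{name}: process sigma = {sigma:.2f}, defect area = {defect_area(times, USL):.2%}")
```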
If you need more help with z-scores, see the Crash course in
Z-scores, a tutorial with plenty of pictures, examples and review
questions for you to grasp this concept.
The z-score
The Standard Normal Distribution
Definition of the Standard Normal Distribution
The Standard Normal distribution follows a normal distribution and has
mean 0 and standard deviation 1.
Notice that the distribution is perfectly symmetric about 0.
If a distribution is normal but not standard, we can look up a value in the
Standard Normal distribution table by first finding how many standard
deviations away the number is from the mean.
The z-score
The number of standard deviations from the mean is called the z-score and can
be found by the formula
z = (x - μ) / σ
where μ is the mean and σ is the standard deviation of the distribution.
Example
Find the z-score corresponding to a raw score of 132 from a normal distribution
with mean 100 and standard deviation 15.
Solution
We compute
z = (132 - 100) / 15 = 2.133
Example
A z-score of 1.7 was found from an observation coming from a normal
distribution with mean 14 and standard deviation 3. Find the raw score.
Solution
We have
1.7 = (x - 14) / 3
To solve this we just multiply both sides by the denominator 3,
(1.7)(3) = x - 14
5.1 = x - 14
x = 19.1
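Both worked examples follow from one formula and its rearrangement, x = mean + z × (standard deviation). A minimal Python sketch, with `z_score` and `raw_score` as hypothetical helper names:

```python
def z_score(x, mean, sd):
    """Convert a raw score to a z-score."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Invert the z-score formula to recover the raw score."""
    return mean + z * sd


print(round(z_score(132, mean=100, sd=15), 3))  # prints 2.133
print(round(raw_score(1.7, mean=14, sd=3), 1))  # prints 19.1
```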
The z-score and Area
Often we want to find the probability that a z-score will be less than a given
value, greater than a given value, or in between two values. To accomplish this,
we use the table from the textbook and a few properties about the normal
distribution.
Example
Find
P(z < 2.37)
Solution
We use the table. Notice the picture on the table has shaded region
corresponding to the area to the left (below) a z-score. This is exactly what we
want. Below are a few lines of the table.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
The rows correspond to the ones and tenths digits of the z-score and the
columns correspond to the hundredths digit. For our problem we want the row 2.3
(from 2.37) and the column .07 (from 2.37). The
number in the table that matches this is .9911.
Hence
P(z < 2.37) = .9911
Example
Find
P(z > 1.82)
Solution
In this case, we want the area to the right of 1.82.
This is not what is given in the table. We can use
the identity
P(z > 1.82) = 1 - P(z < 1.82)
reading the table gives
P(z < 1.82) = .9656
Our answer is
P(z > 1.82) = 1 - .9656 = .0344
Example
Find
P(-1.18 < z < 2.1)
Solution
Once again, the table does not exactly handle this type of area. However, the
area between -1.18 and 2.1 is equal to the area to the left of 2.1 minus the area
to the left of -1.18. That is
P(-1.18 < z < 2.1) = P(z < 2.1) - P(z < -1.18)
To find P(z < 2.1) we rewrite it as P(z < 2.10) and use the table to get
P(z < 2.10) = .9821.
The table also tells us that
P(z < -1.18) = .1190
Now subtract to get
P(-1.18 < z < 2.1) = .9821 - .1190 = .8631
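The three table lookups above can be checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `cdf` method gives the area to the left of a z-score. The variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# Area to the left of 2.37: read straight from the cumulative distribution.
p_less = std_normal.cdf(2.37)

# Area to the right of 1.82: one minus the area to the left.
p_greater = 1 - std_normal.cdf(1.82)

# Area between -1.18 and 2.10: left area of the upper bound minus left area of the lower bound.
p_between = std_normal.cdf(2.10) - std_normal.cdf(-1.18)

print(f"P(z < 2.37)         = {p_less:.4f}")
print(f"P(z > 1.82)         = {p_greater:.4f}")
print(f"P(-1.18 < z < 2.10) = {p_between:.4f}")
```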
• scoring: Use the table below to determine your BMI rating. The table
shows the World Health Organization BMI classification system. The rating
scale is the same for males and females. You can also use the reverse
lookup BMI table for determining your ideal weight based on height.
classification   BMI (kg/m2)     sub-classification    BMI (kg/m2)
underweight      < 18.50         severe thinness       < 16.00
                                 moderate thinness     16.00 - 16.99
                                 mild thinness         17.00 - 18.49
normal range     18.50 - 24.99   normal                18.50 - 24.99
overweight       ≥ 25.00         pre-obese             25.00 - 29.99
                                 obese (≥ 30.00)
                                   obese class I       30.00 - 34.99
                                   obese class II      35.00 - 39.99
                                   obese class III     ≥ 40.00
source: World Health Organization
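The table lookup can be expressed as a small Python function. This is a sketch rather than an official WHO tool: `bmi_classification` is a hypothetical name, the top band is labelled class III as in the WHO scheme, and a value that falls between two printed bands (such as 16.995) is assumed to belong to the lower band.

```python
def bmi_classification(bmi):
    """Return (classification, sub-classification) for a BMI value in kg/m2."""
    # Each test is the lower bound of the next band, so bands cannot overlap or leave gaps.
    if bmi < 16.00:
        return "underweight", "severe thinness"
    if bmi < 17.00:
        return "underweight", "moderate thinness"
    if bmi < 18.50:
        return "underweight", "mild thinness"
    if bmi < 25.00:
        return "normal range", "normal"
    if bmi < 30.00:
        return "overweight", "pre-obese"
    if bmi < 35.00:
        return "overweight", "obese class I"
    if bmi < 40.00:
        return "overweight", "obese class II"
    return "overweight", "obese class III"


for value in (15.5, 22.0, 27.3, 41.0):
    print(value, bmi_classification(value))
```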
Waist to Hip Ratio (WHR)
• aim: the purpose of this test is to determine the ratio of waist circumference
to the hip circumference, as this has been shown to be related to the risk of
coronary heart disease.
• equipment required: tape measure
• procedure: A simple calculation of the measurements of the waist girth
divided by the hip girth. Waist to Hip Ratio (WHR) = Gw / Gh, where Gw =
waist girth, Gh = hip girth. It does not matter which units of measurement
you use, as long as it is the same for each measure.
• scoring: The table below gives general guidelines for acceptable levels
for waist to hip ratio. You can use any units for the measurements (e.g. cm
or inches), as it is only the ratio that is important.
          acceptable                  unacceptable
          excellent   good            average       high          extreme
male      < 0.85      0.85 - 0.90     0.90 - 0.95   0.95 - 1.00   > 1.00
female    < 0.75      0.75 - 0.80     0.80 - 0.85   0.85 - 0.90   > 0.90
• target population: This measure is often used to determine the coronary
artery disease risk factor associated with obesity.
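The procedure and scoring steps above can be sketched in Python. `waist_to_hip_ratio`, `whr_rating` and `UPPER_LIMITS` are hypothetical names. The printed bands share their end points (0.90 appears in both "good" and "average" for males), so the sketch assumes a ratio exactly on a shared boundary belongs to the higher band. That is an assumption, not something the table states.

```python
def waist_to_hip_ratio(waist_girth, hip_girth):
    """WHR = Gw / Gh; both girths must be in the same unit."""
    return waist_girth / hip_girth


# Upper limit of each band, taken from the scoring table above.
UPPER_LIMITS = {
    "male": [(0.85, "excellent"), (0.90, "good"), (0.95, "average"), (1.00, "high")],
    "female": [(0.75, "excellent"), (0.80, "good"), (0.85, "average"), (0.90, "high")],
}


def whr_rating(ratio, sex):
    """Look up the rating for a ratio; shared boundaries go to the higher band (assumption)."""
    limits = UPPER_LIMITS[sex]
    for upper, label in limits[:-1]:
        if ratio < upper:
            return label
    # "extreme" is strictly greater than the last limit, so the last band includes its limit.
    top, label = limits[-1]
    return label if ratio <= top else "extreme"


# Made-up measurements in centimetres, for illustration only.
ratio = waist_to_hip_ratio(86, 100)
print(f"WHR = {ratio:.2f}: male -> {whr_rating(ratio, 'male')}, female -> {whr_rating(ratio, 'female')}")
```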
Anthropometric Results
Anthropometric results should be interpreted based on the WHO classifications
as described below using the WHO standard curves.
Cut offs for acute malnutrition (wasting)
Acute malnutrition based on weight-for-Height in z-scores and percentage of the
median
Table 6: Cut off points for acute malnutrition (weight for height)
Degree of malnutrition                    Definition using z-score   Definition using % of median
Acute                None/Mild            ≥ -2.0                     ≥ 80%
                     Moderate             ≥ -3.0 but < -2.0          ≥ 70% but < 80%
                     Severe               < -3.0 or oedema           < 70% or oedema
Global Acute (GAM)   Moderate + Severe    < -2.0 and/or oedema       < 80% and/or oedema
Severe Acute (SAM)   Severe               < -3.0 and/or oedema       < 70% and/or oedema
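The z-score column of Table 6 translates into a short decision rule. The Python below is an illustrative sketch only, with `acute_malnutrition`, `is_gam` and `is_sam` as hypothetical names. It covers the z-score definitions and does not implement the percent-of-median column.

```python
def acute_malnutrition(whz, oedema=False):
    """Degree of acute malnutrition from a weight-for-height z-score (Table 6)."""
    if oedema or whz < -3.0:
        return "severe"      # < -3.0 or oedema
    if whz < -2.0:
        return "moderate"    # >= -3.0 but < -2.0
    return "none/mild"       # >= -2.0


def is_gam(whz, oedema=False):
    """Global acute malnutrition: moderate plus severe."""
    return acute_malnutrition(whz, oedema) in ("moderate", "severe")


def is_sam(whz, oedema=False):
    """Severe acute malnutrition: severe only."""
    return acute_malnutrition(whz, oedema) == "severe"


# Made-up z-scores, for illustration only.
for whz, oedema in ((-1.4, False), (-2.3, False), (-3.2, False), (-0.5, True)):
    print(whz, oedema, acute_malnutrition(whz, oedema), is_gam(whz, oedema), is_sam(whz, oedema))
```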
Cut off points for chronic malnutrition (Stunting)
Chronic malnutrition based on Height-for-Age in z-scores and percentage of the
median
Table 7: Cut off points for chronic malnutrition (height for age)
                                                              Height for age z-scores   Height for age % of median
Normal/Not stunted                                            ≥ -2 z-scores             ≥ 90%
Moderate chronic malnutrition                                 ≥ -3.0 but < -2.0         ≥ 80% and < 90%
Severe chronic malnutrition/Severely stunted                  < -3 z-scores             < 80%
Total chronic malnutrition/Total stunted (moderate + severe)  < -2 z-scores             < 90%
Cut off points for Underweight
Underweight based on Weight-for-Age in z-scores and percentage of the
median
Table 8: Cut off points for Underweight
Description of nutritional status           Weight for age index z-scores   Weight for age % of median
Severe underweight                          < -3 z-scores                   < 70%
Moderately underweight                      ≥ -3.0 but < -2.0               ≥ 70% and < 80%
Total underweight (moderate plus severe)    < -2 z-scores                   < 80%
Normal                                      ≥ -2 z-scores                   ≥ 80%
Using global classifications of malnutrition
The following classifications for malnutrition have been established by WHO as
levels for interpreting WFH, HFA and WFA z-scores (WHO 2002).
For acute malnutrition (wasting), care needs to be taken to assess the context;
a prevalence classified as “poor/medium” but which is likely to get worse will
have different programmatic implications than a prevalence classified as
“serious/high” but where the situation is likely to improve (e.g. impending good
harvest).
Table 9: Prevalence of malnutrition and interpretation levels
Index            Normal/Low   Poor/Medium   Serious/High   Critical/Very high
Wasting (GAM)    <5%          5-9.9%        10-14.9%       >15%
Stunting         <20%         20-29.9%      30-39.9%       >40%
Underweight      <10%         10-19.9%      20-29.9%       >30%
Risk of mortality using MUAC
For children taller/greater than 65 cm
Table 10: MUAC cut-offs and risk of mortality
Nutritional status                   MUAC
Severe                               < 11.0 cm
Moderate                             between 11.0 and 12.5 cm
Mild malnutrition                    between 12.5 and 13.5 cm
Satisfactory nutritional status      > 13.5 cm
Note: New WHO standards recommend MUAC < 115 mm as the criterion for severe
malnutrition among children of age 6 months and above.
6.1. NCHS/WHO Reference Standards
The reference standards most commonly used
to standardize measurements were developed
by the US National Center for Health Statistics
(NCHS) and are recommended for international
use by the World Health Organization. The
reference population chosen by NCHS was
a statistically valid random population of
healthy infants and children. Questions have
frequently been raised about the validity of
the US-based NCHS reference standards for
populations from other ethnic backgrounds.
Available evidence suggests that until the age
of approximately 10 years, children from well-nourished
and healthy families throughout
the world grow at approximately the same
rate and attain the same height and weight
as children from industrialized countries.
The NCHS/WHO reference standards are
available for children up to 18 years old
but are most accurate when limited to use
with children up to the age of 10 years. The
NCHS/WHO international reference tables
can be used for standardizing anthropometric
data from around the world and can be found
on FANTA’s website at www.fantaproject.org/
publications/anthropometry.shtml.
6.2. Comparisons to the Reference Standard
References are used to standardize a child’s
measurement by comparing the child’s
measurement with the median or average
measure for children at the same age and sex.
For example, if the length of a 3 month old
boy is 57 cm, it would be difficult to know
if that was reflective of a healthy 3 month
old boy without comparison to a reference
standard. The reference or median length for
a population of 3 month old boys is 61.1 cm
and the simple comparison of lengths would
conclude that the child was almost 4 cm
shorter than could be expected.
When describing the differences from the
reference, a numeric value can be standardized
to enable children of different ages and sexes
to be compared. Using the example above,
the boy is 4 cm shorter than the reference
child but this does not take the age or the sex
of the child into consideration. Comparing
a 4 cm difference from the reference for a
6. Comparison of Anthropometric Data to Reference
Standards
child 3 months old is not the same as a 4 cm
difference from the reference for a 9 year old
child, because of their relatively different body
sizes.
Taking age and sex into consideration,
differences in measurements can be expressed
a number of ways:
• standard deviation units, or Z-scores
• percentage of the median
• percentiles
To standardize reporting, USAID recommends
that Cooperating Sponsors calculate
percentages of children below cut-offs as well
as other statistics using Z-scores. If Z-scores
cannot be used, percentage of the median
should be used.
6.3. Standard Deviation Units or Z-Scores
Z-scores are more commonly used by the
international nutrition community because
they offer two major advantages. First, using
Z-scores allows us to identify a fixed point
in the distributions of different indices and
across different ages. For all indices for all
ages, 2.28% of the reference population lie
below a cut-off of -2 Z-scores. The percent of
the median does not have this characteristic.
For example, because weight and height
have different distributions (variances), -2
Z-scores on the weight-for-age distribution
is about 80% of the median, and -2 Z-scores
on the height-for-age distribution is about
90% of the median. Further, the proportion
of the population identified by a particular
percentage of the median varies at different
ages on the same index.
The second major advantage of using Z-scores
is that useful summary statistics can be
calculated from them. The approach allows the
mean and standard deviation to be calculated
for the Z-scores for a group of children. The
Z-score application is considered the simplest
way of describing the reference population and
making comparisons to it. It is the statistic
recommended for use when reporting results of
nutritional assessments. Examples of Z-score
calculations are presented in Appendix 1.
The Z-score or standard deviation unit (SD)
is defined as the difference between the value
for an individual and the median value of
the reference population for the same age or
height, divided by the standard deviation of the
reference population. This can be written in
equation form as:
Z-score (or SD-score) = [(observed value) - (median reference value)] / (standard deviation of reference population)
6.4. Percentage of the Median and Percentiles
The percentage of the median is defined as the
ratio of a measured or observed value in the
individual to the median value of the reference
data for the same age or height for the specific
sex, expressed as a percentage. This can be
written in equation form as:
Percent of median = (observed value / median value of reference population) x 100
The median is the middle value when the measurements
are ranked from smallest to largest. If a
child’s measurement is exactly the same as
the median of the reference population we say
that they are “100% of the median.” Examples
of calculations for percent of median can be
found in Appendix 1.
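The two definitions can be written as short Python functions. This is a sketch with hypothetical function names. The only numbers used are the ones from the 3 month old boy in section 6.2. The reference standard deviation for his age is not given in this text, so the Z-score function is defined but not evaluated for him.

```python
def z_score(observed, reference_median, reference_sd):
    """(observed value - median reference value) / standard deviation of reference population."""
    return (observed - reference_median) / reference_sd


def percent_of_median(observed, reference_median):
    """Observed value as a percentage of the reference median."""
    return observed / reference_median * 100


# The 3 month old boy from section 6.2: length 57 cm, reference median 61.1 cm.
print(f"{percent_of_median(57, 61.1):.1f}% of the median")  # prints 93.3% of the median
```

A child whose measurement equals the reference median scores exactly 100% of the median and a Z-score of 0, whatever the standard deviation.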
The percentile is the rank position of an
individual on a given reference distribution,
stated in terms of what percentage of the group
the individual equals or exceeds. Percentiles
will not be presented in this guide.
The distribution of Z-scores follows a normal
(bell-shaped or Gaussian) distribution. The
commonly used cut-offs of -3, -2, and -1 Z-scores
are, respectively, the 0.13th, 2.28th,
and 15.8th percentiles. The percentiles can be
thought of as the percentage of children in the
reference population below the equivalent cut-off.
Approximately 0.13 percent of children
would be expected to be below -3 Z-score in a
normally distributed population.
Z-score Percentile
-3 0.13
-2 2.28
-1 15.8
6.5. Cut-offs
The use of a cut-off enables the different
individual measurements to be converted into
prevalence statistics. Cut-offs are also used
for identifying those children suffering from
or at a higher risk of adverse outcomes. The
children screened under such circumstances
may be identified as eligible for special care.
The most commonly-used cut-off with
Z-scores is -2 standard deviations,
irrespective of the indicator used. This
means children with a Z-score for
underweight, stunting or wasting, below
-2 SD are considered moderately or
severely malnourished. For example, a
child with a Z-score for height-for-age of
-2.56 is considered stunted, whereas a child
with a Z-score of -1.78 is not classified as
stunted.
In the reference population, by definition,
2.28% of the children would be below -2 SD
and 0.13% would be below -3 SD (a cut-off
reflective of a severe condition). In some
cases, the cut-off for defining malnutrition
used is -1 SD (e.g. in Latin America). In the
reference or healthy population, 15.8% would
be below a cut-off of -1 SD. The use of -1 SD
is generally discouraged as a cut-off due to the
large percentage of healthy children normally
falling below this cut-off. For example, the
1995 DHS survey using a –2 SD cut-off for
stunting in Uganda found a 36% prevalence of
stunting in under-three year olds. This level
of stunting is about 16 times the level of the
reference population.
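To illustrate how a cut-off turns individual Z-scores into a prevalence statistic, here is a minimal Python sketch. The ten height-for-age Z-scores are hypothetical illustration data, the function names are made up for this example, and the expected share in the reference population is taken from the standard normal curve via `statistics.NormalDist`.

```python
from statistics import NormalDist


def prevalence_below(z_scores, cutoff=-2.0):
    """Percentage of children whose Z-score lies below the cut-off."""
    below = sum(1 for z in z_scores if z < cutoff)
    return 100 * below / len(z_scores)


def expected_in_reference(cutoff):
    """Percentage of a normally distributed reference population below the cut-off."""
    return 100 * NormalDist().cdf(cutoff)


# Hypothetical height-for-age Z-scores for ten children, for illustration only.
haz = [-2.56, -1.78, -0.40, -3.10, -2.05, 0.30, -1.20, -2.80, -0.90, -1.95]

observed = prevalence_below(haz)
expected = expected_in_reference(-2.0)
print(f"stunted: {observed:.0f}% of the sample, {expected:.2f}% expected in the reference population")
print(f"ratio to reference: about {observed / expected:.0f} times")
```

The same comparison underlies the Uganda figure above: 36% divided by 2.28% is roughly 16.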
A comparison of cutoffs for percent of median
and Z-scores illustrates the following:
90% = -1 Z-score
80% = -2 Z-score
70% = -3 Z-score (approx.)
60% = -4 Z-score (approx.)
6.5.1. Cut-off points for MUAC for the
6 - 59 month age group
MUAC cut-offs are somewhat arbitrary
due to its lack of precision as a measure of
malnutrition. A cut-off of 11.0 cm can be
used for screening severely malnourished
children. Those children with MUAC below
12.5 cm with or without edema are classified
as moderate and severe.
Global Acute Malnutrition is a term
generally used in emergency settings. The
global malnutrition rate refers to the percent
of children 6 to 59 months with weight-for-height
below -2 Z-scores or 80% median
or MUAC below 12.5 cm, with or without
edema. This refers to all moderate and severe
malnutrition combined. Children with a low
weight-for-height and any child with edema
are counted in the global acute malnutrition
statistic.
6.5.2. Malnutrition Classification Systems
The cut-off points for different malnutrition
classification systems are listed below. The
most widely used system is WHO classification
(Z-scores). The Road-to-Health (RTH) system
is typically seen in clinic-based growth-monitoring
systems. The Gomez system
was widely used in the 1960s and 1970s,
but is only used in a few countries now. An
analysis of prevalence elicits different results
from different systems. These results would
not be directly comparable. The difference
is especially broad at the severe malnutrition
cut-off between the WHO method (Z-scores)
and percent of median methods. At 60% of
the median, the closest corresponding Z-score
is –4. The WHO method is recommended for
analysis and presentation of data (see Part
6.2).
Mild, moderate and severe are different
in each of the classification systems listed
below. It is important to use the same system
to analyze and present data. The RTH and
Gomez classification systems typically use
weight-for-age.
System   Cut-off                    Malnutrition classification
WHO      < -1 to > -2 Z-score       mild
         < -2 to > -3 Z-score       moderate
         < -3 Z-score               severe
RTH      > 80% of median            normal
         60% - < 80% of median      mild-to-moderate
         < 60% of median            severe
Gomez    > 90% of median            normal
         75% - < 90% of median      mild
         60% - < 75% of median      moderate
         < 60% of median            severe
Our girl therefore has moderate protein-energy malnutrition, as defined by
weight-for-height z-score.