Standardization and Norming Explained

Test Standardization and Norming
Prepared by: Hannah Grace L. Gilo

What Are Norms?
• In the singular, a norm refers to
behavior that is usual, average, normal,
standard, expected, or typical.
• In a psychometric context, norms are the test
performance data of a particular group of
test takers that are designed for use as a
reference when evaluating or interpreting
individual test scores.

The Nature of Norms
• Norming is basically a procedure that facilitates the test user’s
interpretation of test scores.
– In the absence of additional interpretive data, a raw score on
any psychological test is meaningless.
– Scores on psychological tests are most commonly interpreted by
reference to norms that represent the test performance of the
standardization sample.
– The norms are thus empirically established by determining what
persons in a representative group actually do on a test.
– Any individual’s raw score is then referred to the distribution of
scores obtained by the standardization sample to discover where
he or she falls in that distribution.

Purpose of Norms
• They indicate the individual’s relative
standing in the normative sample, and
thus permit an evaluation of his or her
performance in reference to other persons.
• They provide comparable measures that
permit a direct comparison of that
individual’s performance on different tests.

The Nature of Norms
• Derived scores (norms) are expressed in
one of two major ways:
– 1. Developmental level attained
– 2. Relative position within a specified group

The Nature of Norms
• Normative sample
–the group of people whose performance on a
particular test is analyzed for reference in evaluating the
performance of individual test takers.
●Members of the normative sample will all be typical with
respect to some characteristic(s) of the people for whom
the particular test was designed.
●A test administration to this representative sample of test
takers yields a distribution (or distributions) of scores.
These data constitute the norms for the test

Sampling to Develop Norms
• The process of administering a test to a
representative sample of test takers for the
purpose of establishing norms is referred
to as standardization or test
standardization.

Sampling Methods
1. Stratified sampling
– considers certain characteristics that must
be proportionately represented in the
sample (helps prevent sampling bias and
ultimately aids in the interpretation of the
findings).
2. Stratified random sampling
–when members from the identified strata
are obtained randomly

Sampling Methods
3. Purposive sampling
–if we arbitrarily select some sample because we
believe it to be representative of the population
4. Incidental / Convenience sampling
– often used for practical reasons, utilizes the most
available individuals. Generalization of findings
from incidental samples must be made with
caution.
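
As an illustration of stratified random sampling (method 2), here is a minimal Python sketch. It assumes proportionate allocation, meaning each stratum receives the same share of the sample that it has in the population. The function name, the school-type stratum, and the population counts are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict

def stratified_random_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportionate stratified random sample (illustrative sketch).

    population  : list of records
    stratum_of  : function mapping a record to its stratum label
    sample_size : approximate total number of records wanted
    """
    rng = random.Random(seed)

    # Group the population into strata.
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # Proportionate allocation: the stratum's share of the sample matches
        # its share of the population. Rounding can make the total differ
        # slightly from sample_size when the shares are not whole numbers.
        k = round(sample_size * len(members) / len(population))
        # Random selection within the stratum, without replacement.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 public-school and 400 private-school students.
population = [
    {"id": i, "school": "public" if i < 600 else "private"}
    for i in range(1000)
]
sample = stratified_random_sample(population, lambda p: p["school"], sample_size=100)
counts = {s: sum(p["school"] == s for p in sample) for s in ("public", "private")}
print(counts)  # {'public': 60, 'private': 40}
```

Which individuals are drawn within each stratum is left to chance, but the 60/40 split is fixed by the population proportions. That is what separates this method from a simple random sample.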

Developing norms for a standardized test
◆ The test developer administers the test according to the standard set
of instructions that will be used with the test, including the recommended
setting for administering it.
◆ The test developer summarizes the data using descriptive statistics,
including measures of central tendency and variability.
●The test developer also provides a precise description of the
standardization sample itself.
● In order to best assist future test users, test developers are
encouraged to “describe the population(s) represented by any norms
or comparison group(s), the dates the data were gathered, and the
process used to select the sample of test takers” (Code of Fair Testing
Practices, 1988, p. 3).
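
The descriptive summary of a standardization sample can be sketched in Python with the standard library. The raw scores below are hypothetical, and the statistics reported (mean, median, mode, standard deviation, range) are only one reasonable choice of measures of central tendency and variability.

```python
import statistics

# Hypothetical raw scores from a small standardization sample.
raw_scores = [12, 15, 15, 18, 20, 21, 21, 21, 24, 27]

summary = {
    "n": len(raw_scores),
    # Measures of central tendency
    "mean": statistics.mean(raw_scores),
    "median": statistics.median(raw_scores),
    "mode": statistics.mode(raw_scores),
    # Measures of variability
    "sd": statistics.stdev(raw_scores),  # sample standard deviation
    "range": max(raw_scores) - min(raw_scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```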

Developing norms for a standardized test
• Standardized tests are tools designed to measure a student’s
performance relative to all others taking the same test. A
standardized test has clearly specified procedures for
administration and scoring, and typically includes normative data.

Types of Norms
1. Percentile
-An expression of the percentage of people whose
score on a test or measure falls below a particular raw score.
-A ranking that conveys information about the
relative position of a score within a distribution of scores.
-A converted score that refers to a percentage of test
takers.
-Not to be confused with percentage correct, which refers to the
distribution of raw scores: the number of items that were answered correctly
divided by the total number of items and multiplied by 100.

To solve for the percentile rank, use
the formula:
$$PR = \frac{CF + 0.5F}{n} \times 100$$
where,
PR– percentile rank
CF– cumulative frequency below the given
score
F– frequency of the given score
n– number of scores in the distribution
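
The formula can be checked with a short Python sketch. The function name and the list of scores are hypothetical; the variables `cf`, `f`, and `n` correspond to CF, F, and n as defined above.

```python
from collections import Counter

def percentile_rank(scores, score):
    """Percentile rank of `score` within `scores`: (CF + 0.5F) / n * 100."""
    freq = Counter(scores)
    # CF: how many scores fall below the given score.
    cf = sum(count for value, count in freq.items() if value < score)
    # F: how many scores equal the given score.
    f = freq[score]
    # n: total number of scores in the distribution.
    n = len(scores)
    return 100 * (cf + 0.5 * f) / n

# Hypothetical distribution of 10 raw scores.
scores = [10, 12, 12, 15, 15, 15, 18, 20, 20, 25]

# For a raw score of 15: CF = 3, F = 3, n = 10, so PR = (3 + 1.5) / 10 * 100.
print(percentile_rank(scores, 15))  # 45.0
print(percentile_rank(scores, 25))  # 95.0
```

Counting only half of the tied scores (0.5F) places a score at the midpoint of its own frequency band, so the highest score in a distribution does not come out at a percentile rank of 100.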

■Disadvantage:
◆ Real differences between raw scores are minimized near the ends of
the distribution, and exaggerated in the middle of the distribution.
◆ Differences between raw scores that cluster in the middle may be
quite small, yet even the smallest differences will appear as differences in
percentiles.
◆At the ends of the distribution, differences in raw scores may be
great, but these are reflected as relatively small differences in
percentiles.
◆ Percentiles show each individual’s relative position in the normative
sample, but not the amount of difference between scores.

2. Age Norm (age-equivalent
scores)
• “indicate the average performance of different
samples of testtakers who were at various ages
at the time the test was administered.”
• Example: If the measurement under
consideration is height in inches,
then we know that scores (heights) for children
will gradually increase at various rates as a
function of age up to the middle to late teens.

CONTD.
▪ In practice, a child of any chronological age
whose performance on a valid test of
intellectual ability indicated that he or she
had intellectual ability similar to that of an
average child of some other age was said to
have the “mental age” of the norm group in
which his or her test score fell

CONTD.
● The use of “mental age” can be problematic. A six-year-old who
performs intellectually like a 12-year-old may be said to have that
mental age. But the six-year-old is likely not to be similar at all to the
average 12-year-old socially, psychologically, and in many other key
respects.
●IQ standard deviations are not constant with age. At one age, an IQ of
116 might be indicative of performance at 1 standard deviation above
the mean, whereas at another age, an IQ of 121 might be indicative of
performance at 1 standard deviation above the mean.
● Intellectual development progresses more rapidly at the earlier ages
and gradually slows as the individual matures.
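
The point about standard deviations can be illustrated with a small z-score calculation in Python. The sketch assumes a mean IQ of 100 at both ages and standard deviations of 16 and 21, values implied by (but not stated in) the figures 116 and 121 above.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

# Assumed for illustration: mean of 100 at both ages, SD of 16 at one age
# and SD of 21 at the other.
print(z_score(116, 100, 16))            # 1.0
print(z_score(121, 100, 21))            # 1.0
# The same IQ of 116 is less than 1 SD above the mean at the second age.
print(round(z_score(116, 100, 21), 2))  # 0.76
```

Under these assumptions the same numerical IQ corresponds to different relative standings at different ages, which is why such scores cannot be compared directly across age groups.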

3. Grade Norms
• Designed to indicate the average test
performance of testtakers in a given school
grade, grade norms are developed by
administering the test to representative samples
of children over a range of consecutive grade
levels (such as first through sixth grades).
• Grade equivalents are based on a ten-month school
year and express grade and month (e.g., 7.3 is equivalent to seventh
grade, third month).

CONTD.
● They are found by administering the test to
representative samples of children over a range of
consecutive grade levels. Then, the mean or median score
for children at each grade is calculated.
♢ For example, if the average number of problems solved
correctly by fourth graders in the representative sample is
23, then a raw score of 23 corresponds to a grade
equivalent of 4.
♢ They have widespread application, especially to children
of elementary school age
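
The lookup from raw score to grade equivalent can be sketched in Python. Only the grade 4 mean of 23 comes from the example above; the other grade means are hypothetical. Linear interpolation between adjacent grade means and clamping at the lowest and highest normed grades are assumptions of this sketch, not a prescribed procedure.

```python
# Mean raw score per grade in the normative sample.
# Only grade 4 -> 23 comes from the example; the rest are hypothetical.
grade_means = {1: 8, 2: 13, 3: 18, 4: 23, 5: 27, 6: 30}

def grade_equivalent(raw_score, grade_means):
    """Convert a raw score to a grade equivalent (illustrative sketch).

    Assumes the mean scores increase strictly from grade to grade.
    Scores between two grade means are linearly interpolated; scores
    outside the normed range are clamped to the lowest or highest grade.
    """
    grades = sorted(grade_means)
    lowest, highest = grades[0], grades[-1]
    if raw_score <= grade_means[lowest]:
        return float(lowest)
    if raw_score >= grade_means[highest]:
        return float(highest)
    for lower, upper in zip(grades, grades[1:]):
        lo_mean, hi_mean = grade_means[lower], grade_means[upper]
        if lo_mean <= raw_score <= hi_mean:
            fraction = (raw_score - lo_mean) / (hi_mean - lo_mean)
            return round(lower + fraction * (upper - lower), 1)

print(grade_equivalent(23, grade_means))  # 4.0
print(grade_equivalent(25, grade_means))  # 4.5
```

On the ten-month scale, the second result would be read as fourth grade, fifth month.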

CONTD.
♢ Issue: Does a student in twelfth grade who scores “6” on
a grade-normed spelling test have the same spelling
abilities as the average sixth grader?
♢The answer is . . . NO.
♢ Grade norms DO NOT provide information as to the
content or type of items that a student could or could not
answer correctly.
♢ The primary use of grade norms is as a convenient,
readily understandable gauge of how one student’s
performance compares with that of fellow students in the
same grade.

Note to remember
• Both grade norms and age norms are
referred to more generally as
developmental norms, a term applied
broadly to norms developed on the basis
of any trait, ability, skill, or other
characteristic that is presumed to develop,
deteriorate, or otherwise be affected by
chronological age, school grade, or stage
of life.

Other types of Norms
1. National Norms
• derived from a normative sample that was
nationally representative of the population
at the time the norming study was
conducted.
• Variables of interest include age, gender,
ethnic background, socio-economic strata,
geographical location, etc.

CONTD.
• Factors related to the representativeness of the school from which
members of the norming sample were drawn might also be criteria
for inclusion in or exclusion from the sample.
– For example, is the school the student attends publicly
funded, privately funded, religiously oriented, military, or
something else? How representative are the pupil/teacher ratios
in the school under consideration? Does the school have a
library, and if so, how many books are in it? These are only a
sample of the types of questions that could be raised in
assembling a normative sample to be used in the establishment
of national norms. The precise nature of the questions raised
when developing national norms will depend on whom the test is
designed for and what the test is designed to do.

Problem using National Norms
• Norms from many different tests may all claim
to have nationally representative samples. Still,
close scrutiny of the description of the sample
employed may reveal that the sample differs in
many important respects from the samples of similar
tests also claiming to be based on a nationally
representative sample.
• For this reason, it is always a good idea to check
the manual of the tests under consideration to
see exactly how comparable the tests are.

2. National Anchor Norms
–an equivalency table for scores on two
nationally standardized tests designed to
measure the same thing. They provide some
stability to test scores by anchoring
(comparing) them to other test scores.

3. Subgroup Norms
–norms for any defined group within a
larger group.
–normative information about some limited
population, frequently of specific interest to
a test user.
4. Local Norms

The Normative Sample
◆ Any norm, however expressed, is restricted to
the particular normative population from which it
was derived.
◆ Psychological test norms are not absolute,
universal, or permanent. They merely represent
the test performance of the persons constituting
the standardization sample.
-In the development and application of test norms,
considerable attention should be given to the
standardization sample.

The Normative Sample
◆ It should be large enough to provide stable
values;
◆ Similarly chosen people of the same population
should NOT yield norms that diverge appreciably
from those obtained;
◆ Should be representative of the population
under consideration
● Be careful of institutional samples
(schools, prisons, mental patients)
◆ Define the specific population to which norms
can be generalized.

Norm-Referenced Versus Criterion-Referenced Evaluation

Criterion Referenced Evaluation
• Criterion-referenced testing and assessment may be defined as a
method of evaluation and a way of deriving meaning from test
scores by evaluating an individual’s score with reference to a set
standard.
• Absolute grading
Some examples:
■ To be eligible for a high-school diploma, students must
demonstrate at least a sixth-grade reading level.
■ To earn the privilege of driving an automobile, would-be
drivers must take a road test and demonstrate their driving skill to the
satisfaction of a state-appointed examiner.

CONTD.
Examples:
■ To be licensed as a psychologist, the applicant must achieve a score
that meets or exceeds the score mandated by the state on the licensing
test.
■ The criterion in criterion-referenced assessments typically derives
from the values or standards of an individual or organization. For
example, in order to earn a black belt in karate, students must
demonstrate a black-belt level of proficiency in karate and meet related
criteria such as those related to self-discipline and focus. Each student
is evaluated individually to see if all of these criteria are met.
Regardless of the level of performance of all the testtakers, only
students who meet all the criteria will leave the dojo (training room)
with a brand-new black belt.
■ The NAEP, the ACT Test, the Smarter Balanced Assessment
Test (SBAT)

CONTD.
• Criterion-referenced scores are most appropriate when
an educator wants to assess the specific concepts or
skills a student has learned through classroom
instruction. Most criterion-referenced assessments have
a cut score, which determines success or failure based
on an established percentage correct.
• For example, in my class, in order for a student to
successfully demonstrate their knowledge of the math
concepts we discuss, they must answer at least 80% of
the test questions correctly. Your child earned an 85%
on his last fractions test; therefore, he demonstrated
knowledge of the subject area and passed.
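
The cut-score decision in this example can be written as a few lines of Python. The 80% cut score and the 85% result come from the example above; the 20-item test length and the function names are hypothetical.

```python
def percentage_correct(n_correct, n_items):
    """Items answered correctly, divided by total items, multiplied by 100."""
    return 100 * n_correct / n_items

def passes(n_correct, n_items, cut_percent=80):
    """Criterion-referenced decision: the score is compared with a fixed
    standard, not with the scores of other test takers."""
    return percentage_correct(n_correct, n_items) >= cut_percent

# Hypothetical 20-item fractions test: 17 correct is 85%.
print(percentage_correct(17, 20))  # 85.0
print(passes(17, 20))              # True
print(passes(15, 20))              # False (75% is below the 80% cut score)
```

The outcome depends only on the individual's own score and the cut score, so every student in a class could pass or every student could fail.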

Norm-Referenced Evaluation
• In norm-referenced interpretations of test data, a usual
area of focus is how an individual performed relative to
other people who took the test.
• A student’s performance is communicated in percentile
ranks, grade-equivalent scores, normal-curve
equivalents, and scaled scores.
• Relative grading
Examples:
– The SAT Test
– Wechsler Intelligence Scale
– IQ Tests

Reference
• Cohen, R. J., & Swerdlik, M. E. Psychological Testing
and Assessment: An Introduction to
Tests and Measurement (7th ed.).
McGraw-Hill.
