Test Standardization and Norming
Prepared by: Hannah Grace L. Gilo
What Are Norms?
• In the singular, norm refers to behavior
that is usual, average, normal,
standard, expected, or typical.
• In a psychometric context, norms are the test
performance data of a particular group of
test takers that are designed for use as a
reference when evaluating or interpreting
individual test scores.
The Nature of Norms
• Norming is basically a procedure that facilitates the test user’s
interpretation of test scores.
– In the absence of additional interpretive data, a raw score on
any psychological test is meaningless.
– Scores on psychological tests are most commonly interpreted by
reference to norms that represent the test performance of the
standardization sample.
– The norms are thus empirically established by determining what
persons in a representative group actually do on a test.
– Any individual’s raw score is then referred to the distribution of
scores obtained by the standardization sample to discover where
he or she falls in that distribution.
Purpose of Norms
• They indicate the individual’s relative
standing in the normative sample, and
thus permit an evaluation of his or her
performance in reference to other persons.
• They provide comparable measures that
permit a direct comparison of the
individual’s performance on different tests.
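One common way to obtain such comparable measures is to convert each raw score to a z score, z = (X - mean) / SD, using the normative mean and standard deviation of the test in question. The sketch below is a minimal illustration in Python; the two tests and their normative values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norms:
    """Normative mean and standard deviation for one test (hypothetical values)."""
    name: str
    mean: float
    sd: float


def z_score(raw: float, norms: Norms) -> float:
    """Express a raw score as standard deviations above (+) or below (-) the normative mean."""
    return (raw - norms.mean) / norms.sd


# Hypothetical norms for two different tests taken by the same person.
verbal = Norms("verbal", mean=50.0, sd=10.0)
numerical = Norms("numerical", mean=25.0, sd=2.5)

print(z_score(60, verbal))     # 1.0
print(z_score(30, numerical))  # 2.0
```

Although the raw scores 60 and 30 cannot be compared directly, the hypothetical test taker stands one standard deviation above the normative mean on the first test and two above it on the second.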
The Nature of Norms
• Derived scores (norms) are expressed in
one of two major ways:
– 1. Developmental level attained
– 2. Relative position within a specified group
The Nature of Norms
• Normative sample
–the group of people whose performance on a
particular test is analyzed for reference in evaluating the
performance of individual test takers.
●Members of the normative sample will all be typical with
respect to some characteristic(s) of the people for whom
the particular test was designed.
●A test administration to this representative sample of test
takers yields a distribution (or distributions) of scores.
These data constitute the norms for the test.
Sampling to Develop Norms
• The process of administering a test to a
representative sample of test takers for the
purpose of establishing norms is referred
to as standardization or test
standardization.
Sampling Methods
1. Stratified sampling
– considers certain characteristics that must
be proportionately represented in the
sample (helps prevent sampling bias and
ultimately aids in the interpretation of the
findings).
2. Stratified random sampling
– stratified sampling in which the members of each
identified stratum are selected randomly
Sampling Methods
3. Purposive sampling
– arbitrarily selecting a sample because we
believe it to be representative of the population
4. Incidental / Convenience sampling
– often used for practical reasons; uses the most
readily available individuals. Generalization of findings
from incidental samples must be made with
caution.
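Stratified random sampling can be sketched as follows: group the population into strata, decide how many people each stratum contributes in proportion to its size, then draw that many at random within each stratum. This is a minimal Python sketch under stated assumptions: the region labels and population counts are hypothetical, and the largest-remainder rule used to make the per-stratum counts add up exactly to n is one reasonable choice among several. Purposive and convenience sampling are not shown, since they involve no random draw.

```python
import random
from collections import Counter, defaultdict


def allocate_proportionally(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    Uses the largest-remainder rule so the allocations add up to exactly n.
    """
    total = sum(stratum_sizes.values())
    exact = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: int(q) for s, q in exact.items()}  # round each share down
    shortfall = n - sum(alloc.values())
    # Hand the leftover slots to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


def stratified_random_sample(population, stratum_of, n, seed=None):
    """Draw a proportional stratified random sample of size n (n <= len(population))."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    alloc = allocate_proportionally({s: len(m) for s, m in strata.items()}, n)
    sample = []
    for s, members in strata.items():
        # Within each stratum, members are selected at random.
        sample.extend(rng.sample(members, alloc[s]))
    return sample


# Hypothetical population of 100 candidates tagged by region.
population = ([("urban", i) for i in range(60)]
              + [("rural", i) for i in range(30)]
              + [("suburban", i) for i in range(10)])
sample = stratified_random_sample(population, stratum_of=lambda p: p[0], n=10, seed=1)
print(Counter(region for region, _ in sample))  # 6 urban, 3 rural, 1 suburban
```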
Developing norms for a standardized test
◆ The test developer administers the test according to the standard set
of instructions that will be used with the test, including the recommended
setting for giving it.
◆ The test developer summarizes the data using descriptive statistics,
including measures of central tendency and variability.
●The test developer also provides a precise description of the
standardization sample itself.
● In order to best assist future test users, test developers are
encouraged to “describe the population(s) represented by any norms
or comparison group(s), the dates the data were gathered, and the
process used to select the sample of test takers” (Code of Fair Testing
Practices, 1988, p. 3).
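The descriptive summary a test developer reports can be illustrated with Python's standard `statistics` module. The raw scores below are invented for illustration and are far fewer than any real standardization sample would contain.

```python
import statistics


def describe(scores):
    """Summarize a normative sample with central tendency and variability."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "range": max(scores) - min(scores),
    }


# Hypothetical raw scores from a small standardization sample.
norm_sample = [12, 15, 15, 17, 18, 18, 18, 20, 21, 23]
for name, value in describe(norm_sample).items():
    print(f"{name}: {value:.2f}")
```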
Developing norms for a standardized test
• The process of administering a test to a
representative sample of testtakers for the
purpose of establishing norms is referred to as
standardization or test standardization.
• Standardized tests are tools designed to
measure a student’s performance relative to all
others taking the same test; they have clearly
specified procedures for administration and
scoring, typically including normative data.
Types of Norms
1. Percentile
-An expression of the percentage of people whose
score on a test or measure falls below a particular raw score.
-A ranking that conveys information about the
relative position of a score within a distribution of scores.
-A converted score that refers to a percentage of test
takers.
-Percentage correct (not the same as a percentile): refers to raw
scores, namely the number of items that were answered correctly
divided by the total number of items and multiplied by 100.
In solving for percentile rank, use the formula:
$$PR = \frac{CF + 0.5F}{n} \times 100$$
where,
PR – percentile rank
CF – cumulative frequency below the given score
F – frequency of the given score
n – number of scores in the distribution
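The percentile-rank formula translates directly into code. The Python sketch below uses a hypothetical set of ten normative scores: it counts the scores below and at the given score, applies PR = (CF + 0.5F) / n × 100, and, for contrast, also computes percentage correct, which depends only on the number of items and not on how other test takers performed. The 25-item test length is likewise an assumption.

```python
def percentile_rank(score, scores):
    """Percentile rank: PR = (CF + 0.5F) / n * 100.

    CF: number of scores below `score` (cumulative frequency below)
    F:  number of scores equal to `score` (frequency of the score)
    n:  number of scores in the distribution
    """
    cf = sum(1 for s in scores if s < score)
    f = sum(1 for s in scores if s == score)
    return 100 * (cf + 0.5 * f) / len(scores)


def percentage_correct(items_correct, total_items):
    """Percentage correct: a raw-score transformation, not a percentile."""
    return 100 * items_correct / total_items


# Hypothetical normative sample of ten raw scores on a 25-item test.
norm_scores = [12, 15, 15, 17, 18, 18, 18, 20, 21, 23]

print(percentile_rank(18, norm_scores))  # 55.0 (CF = 4, F = 3, n = 10)
print(percentage_correct(18, 25))        # 72.0
```

A raw score of 18 on this hypothetical test is 72% correct but carries a percentile rank of 55, because percentile rank depends on the normative distribution rather than on the number of items.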
■ Disadvantages:
◆ Real differences between raw scores are minimized near the ends of
the distribution and exaggerated in the middle of the distribution.
◆ Differences between raw scores that cluster in the middle may be very
small, yet even the smallest of these differences will appear as differences in
percentiles.
◆ At the ends of the distribution, differences in raw scores may be
great, but these are reflected as relatively small differences in
percentiles.
◆ Percentiles show each individual’s relative position in the normative
sample, but not the amount of difference between scores.
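The compression at the tails can be demonstrated numerically. Assuming (hypothetically) that raw scores are normally distributed with a normative mean of 100 and a standard deviation of 15, the Python sketch below converts raw scores to percentiles with `statistics.NormalDist` and compares the same five-point raw-score gap near the middle and near the upper tail.

```python
from statistics import NormalDist

# Assumption: raw scores are normally distributed with a hypothetical
# normative mean of 100 and standard deviation of 15.
NORMS = NormalDist(mu=100, sigma=15)


def percentile(raw):
    """Percentage of the normative group expected to score below `raw`."""
    return 100 * NORMS.cdf(raw)


def percentile_gap(low, high):
    """Percentile-point difference produced by a raw-score difference."""
    return percentile(high) - percentile(low)


for low, high in [(100, 105), (130, 135)]:
    print(f"raw {low} -> {high}: "
          f"{percentile(low):.1f} -> {percentile(high):.1f} "
          f"({percentile_gap(low, high):.1f} percentile points)")
```

Under these assumptions, moving from 100 to 105 shifts the percentile by roughly 13 points, while moving from 130 to 135 shifts it by only about 1.3 points.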
2. Age Norms (age-equivalent scores)
• “indicate the average performance of different
samples of testtakers who were at various ages
at the time the test was administered.”
• Example: If the measurement under
consideration is height in inches,
then we know that scores (heights) for children
will gradually increase at various rates as a
function of age up to the middle to late teens.
CONTD.
▪ In practice, a child of any chronological age
whose performance on a valid test of
intellectual ability indicated that he or she
had intellectual ability similar to that of an
average child of some other age was said to
have the “mental age” of the norm group in
which his or her test score fell.
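Assigning an age equivalent amounts to a table lookup against age norms. In this minimal Python sketch the age norms are invented, and the rule of picking the age group whose average raw score is closest to the child's score is an assumption; published tests may instead interpolate between age groups or use smoothed norms.

```python
# Hypothetical age norms: average raw score of each age group (in years)
# in an invented standardization sample for an intellectual-ability test.
AGE_NORMS = {6: 20, 7: 26, 8: 31, 9: 35, 10: 38, 11: 41, 12: 43}


def age_equivalent(raw_score, norms=AGE_NORMS):
    """Return the age group whose average raw score is closest to raw_score.

    Ties go to the younger age group (the first one encountered).
    """
    return min(norms, key=lambda age: abs(norms[age] - raw_score))


print(age_equivalent(43))  # 12
print(age_equivalent(30))  # 8
```

With these made-up norms, a child of any chronological age who scores 43 would be assigned an age equivalent ("mental age") of 12.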
CONTD.
● The use of “mental age” can be problematic. A six-year-old who
performs intellectually like a 12-year-old may be said to have that
mental age. But the six-year-old is likely to be quite unlike the
average 12-year-old socially, psychologically, and in many other key
respects.
●IQ standard deviations are not constant with age. At one age, an IQ of
116 might be indicative of performance at 1 standard deviation above
the mean, whereas at another age, an IQ of 121 might be indicative of
performance at 1 standard deviation above the mean.
● Intellectual development progresses more rapidly at earlier ages,
and its rate gradually decreases as the individual matures.
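The point about shifting standard deviations can be checked with simple arithmetic. Assuming a mean IQ of 100 at every age, the example in the text implies a standard deviation of 16 at one age and 21 at another; the Python sketch below uses those hypothetical values to express an IQ as standard deviations from the mean.

```python
def sd_units_from_mean(iq, sd_at_age, mean=100):
    """Express an IQ as standard deviations above (+) or below (-) the mean."""
    return (iq - mean) / sd_at_age


# Hypothetical SDs implied by the text's example, assuming a mean of 100.
SD_AGE_A = 16  # an IQ of 116 is +1 SD at this age
SD_AGE_B = 21  # an IQ of 121 is +1 SD at this age

print(sd_units_from_mean(116, SD_AGE_A))            # 1.0
print(sd_units_from_mean(121, SD_AGE_B))            # 1.0
print(round(sd_units_from_mean(116, SD_AGE_B), 2))  # 0.76
```

The same IQ of 116 that marks +1 SD at the first age corresponds to only about +0.76 SD at the second, so the identical number signals a different relative standing.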
3. Grade Norms
• Designed to indicate the average test
performance of testtakers in a given school
grade, grade norms are developed by
administering the test to representative samples
of children over a range of consecutive grade
levels (such as first through sixth grades).
• Grade equivalents are based on a ten-month school
year and are expressed as grade and month (e.g., 7.3
is equivalent to seventh grade, third month).
CONTD.
● They are found by administering the test to
representative samples of children over a range of
consecutive grade levels. Then, the mean or median score
for children at each grade is calculated.
♢ For example, if the average number of problems solved
correctly by fourth graders in the representative sample is
23, then a raw score of 23 corresponds to a grade
equivalent of 4.
♢ They have widespread application, especially to children
of elementary school age
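Deriving grade equivalents can be sketched as interpolation between grade-level norms on the ten-month scale. In this Python sketch the norm table is hypothetical (its grade-4 entry of 23 mirrors the fourth-grade example), and linear interpolation between adjacent grades, with clamping rather than extrapolation beyond the table, is an assumed method; test publishers may smooth or extrapolate differently.

```python
# Hypothetical grade norms: average raw score of the standardization sample
# in each grade (grade 4 -> 23 mirrors the fourth-grade example).
# Assumes the averages rise from one grade to the next.
GRADE_NORMS = {3: 18, 4: 23, 5: 27, 6: 30}


def grade_equivalent(raw, norms=GRADE_NORMS):
    """Interpolate a grade equivalent on the ten-month scale (grade.month).

    Scores outside the table are clamped to the lowest or highest grade.
    """
    grades = sorted(norms)
    if raw <= norms[grades[0]]:
        return float(grades[0])
    if raw >= norms[grades[-1]]:
        return float(grades[-1])
    for lo, hi in zip(grades, grades[1:]):
        if norms[lo] <= raw <= norms[hi]:
            fraction = (raw - norms[lo]) / (norms[hi] - norms[lo])
            return round(lo + fraction, 1)  # one decimal place = one school month


def describe_grade_equivalent(ge):
    """Turn e.g. 4.5 into 'grade 4, month 5'."""
    grade, month = divmod(round(ge * 10), 10)
    return f"grade {grade}, month {month}"


for raw in (23, 25, 29):
    ge = grade_equivalent(raw)
    print(raw, ge, describe_grade_equivalent(ge))
```

A raw score of 25, halfway between the hypothetical grade-4 and grade-5 averages, yields a grade equivalent of 4.5 (fourth grade, fifth month).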
CONTD.
♢ Issue: Does a student in twelfth grade who scores “6” on
a grade-normed spelling test have the same spelling
abilities as the average sixth grader?
♢The answer is . . . NO.
♢ Grade norms DO NOT provide information as to the
content or type of items that a student could or could not
answer correctly.
♢ The primary use of grade norms is as a convenient,
readily understandable gauge of how one student’s
performance compares with that of fellow students in the
same grade.
Note to remember
• Both grade norms and age norms are
referred to more generally as
developmental norms, a term applied
broadly to norms developed on the basis
of any trait, ability, skill, or other
characteristic that is presumed to develop,
deteriorate, or otherwise be affected by
chronological age, school grade, or stage
of life.
Other types of Norms
1. National Norms
• derived from a normative sample that was
nationally representative of the population
at the time the norming study was
conducted.
• Variables of interest include age, gender,
ethnic background, socio-economic strata,
geographical location, etc.
CONTD.
• Factors related to the representativeness of the school from which
members of the norming sample were drawn might also be criteria
for inclusion in or exclusion from the sample.
– For example, is the school the student attends publicly
funded, privately funded, religiously oriented, military, or
something else? How representative are the pupil/teacher ratios
in the school under consideration? Does the school have a
library, and if so, how many books are in it? These are only a
sample of the types of questions that could be raised in
assembling a normative sample to be used in the establishment
of national norms. The precise nature of the questions raised
when developing national norms will depend on whom the test is
designed for and what the test is designed to do.
Problems with Using National Norms
• Norms from many different tests may all claim
to have nationally representative samples. Still,
close scrutiny of the description of the sample
employed may reveal that it differs in
many important respects from the samples of similar tests
also claiming to be nationally representative.
• For this reason, it is always a good idea to check
the manuals of the tests under consideration to
see exactly how comparable the tests are.
2. National Anchor Norms
–an equivalency table for scores on two
nationally standardized tests designed to
measure the same thing. They provide some
stability to test scores by anchoring
(comparing) them to other test scores.
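An equivalency table of this kind is commonly built with the equipercentile method: a score on one test is treated as equivalent to the score on the other test that has the same (or nearest) percentile rank. The Python sketch below assumes that method and uses two small, hypothetical score distributions; real anchor norms rely on large samples and smoothing.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, scores):
    """PR = (CF + 0.5F) / n * 100, with CF and F counted in a sorted list."""
    ordered = sorted(scores)
    below = bisect_left(ordered, score)           # CF: scores below
    equal = bisect_right(ordered, score) - below  # F: scores equal
    return 100 * (below + 0.5 * equal) / len(ordered)


def equipercentile_equivalent(score_a, scores_a, scores_b):
    """Score on test B whose percentile rank is nearest to score_a's rank on test A."""
    target = percentile_rank(score_a, scores_a)
    return min(sorted(set(scores_b)),
               key=lambda b: abs(percentile_rank(b, scores_b) - target))


# Hypothetical score distributions on two tests measuring the same thing.
test_a = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
test_b = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# Print a small equivalency table: test A score -> test B equivalent.
for a in sorted(set(test_a)):
    print(a, "->", equipercentile_equivalent(a, test_a, test_b))
```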
3. Subgroup Norms
• Norms for any defined group within a
larger group.
• Normative information about some limited
population, frequently of specific interest to
a test user.
4. Local Norms
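Computing subgroup norms amounts to summarizing the normative data separately for each defined group. In the minimal Python sketch below, the subgroup labels and scores are hypothetical; the same grouping could be applied to data from a single locale to produce local norms.

```python
import statistics
from collections import defaultdict


def subgroup_norms(records):
    """Compute (mean, SD) of raw scores separately for each subgroup.

    records: iterable of (subgroup_label, raw_score) pairs.
    """
    groups = defaultdict(list)
    for label, score in records:
        groups[label].append(score)
    return {label: (statistics.mean(scores), statistics.stdev(scores))
            for label, scores in groups.items()}


# Hypothetical normative records tagged with a subgroup label.
records = [("public", 20), ("public", 24), ("public", 22),
           ("private", 30), ("private", 26), ("private", 28)]

norms = subgroup_norms(records)
raw = 26
for label, (mean, sd) in norms.items():
    print(f"{label}: mean={mean}, sd={sd}, z for {raw} = {(raw - mean) / sd:+.1f}")
```

With these made-up norms, a raw score of 26 lies two standard deviations above the mean of one subgroup but one standard deviation below the mean of the other, which is why the choice of reference group matters.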
The Normative Sample
◆ Any norm, however expressed, is restricted to
the particular normative population from which it
was derived.
◆ Psychological test norms are not absolute,
universal, or permanent. They merely represent
the test performance of the persons constituting
the standardization sample.
-In the development and application of test norms,
considerable attention should be given to the
standardization sample.
The Normative Sample
◆ It should be large enough to provide stable
values;
◆ Similarly chosen people of the same population
should NOT yield norms that diverge appreciably
from those obtained;
◆ Should be representative of the population
under consideration
● Be careful of institutional samples
(schools, prisons, mental patients)
◆ Define the specific population to which norms
can be generalized.
Norm-Referenced Versus Criterion-Referenced Evaluation
Criterion-Referenced Evaluation
• Criterion-referenced testing and assessment may be defined as a
method of evaluation and a way of deriving meaning from test
scores by evaluating an individual’s score with reference to a set
standard.
• Absolute grading
Some examples:
■ To be eligible for a high-school diploma, students must
demonstrate at least a sixth-grade reading level.
■ To earn the privilege of driving an automobile, would-be
drivers must take a road test and demonstrate their driving skill to the
satisfaction of a state-appointed examiner.
CONTD.
Examples:
■ To be licensed as a psychologist, the applicant must achieve a score
that meets or exceeds the score mandated by the state on the licensing
test.
■ The criterion in criterion-referenced assessments typically derives
from the values or standards of an individual or organization. For
example, in order to earn a black belt in karate, students must
demonstrate a black-belt level of proficiency in karate and meet related
criteria such as those related to self-discipline and focus. Each student
is evaluated individually to see if all of these criteria are met.
Regardless of the level of performance of all the testtakers, only
students who meet all the criteria will leave the dojo (training room)
with a brand-new black belt.
■ The NAEP, the ACT test, the Smarter Balanced Assessment
Test (SBAT)
CONTD.
• Criterion-referenced scores are most appropriate when
an educator wants to assess the specific concepts or
skills a student has learned through classroom
instruction. Most criterion-referenced assessments have
a cut score, which determines success or failure based
on an established percentage correct.
• For example, in my class, in order for a student to
successfully demonstrate their knowledge of the math
concepts we discuss, they must answer at least 80% of
the test questions correctly. Your child earned an 85%
on his last fractions test; therefore, he demonstrated
knowledge of the subject area and passed.
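The cut-score logic in this example can be written as a single comparison against a fixed standard. The Python sketch below assumes the 80% cut score from the example and hypothetical 20-item tests; note that the decision never looks at other students' scores.

```python
def percent_correct(num_correct, num_items):
    """Percentage of items answered correctly."""
    return 100 * num_correct / num_items


def meets_criterion(num_correct, num_items, cut_score=80.0):
    """Criterion-referenced decision: pass if percent correct >= cut score.

    The result depends only on the fixed standard, never on other students.
    """
    return percent_correct(num_correct, num_items) >= cut_score


# Hypothetical 20-item fractions test with an 80% cut score.
print(percent_correct(17, 20), meets_criterion(17, 20))  # 85.0 True
print(percent_correct(15, 20), meets_criterion(15, 20))  # 75.0 False
```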
Norm-Referenced Evaluation
• In norm-referenced interpretations of test data, a usual
area of focus is how an individual performed relative to
other people who took the test.
• A student’s performance is communicated in percentile
ranks, grade-equivalent scores, normal-curve
equivalents, and scaled scores.
• Relative grading
Example:
– The SAT Test
– Wechsler Intelligence Scale
– IQ Tests
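Several of the derived scores listed above can be computed from a raw score once the normative mean and standard deviation are known. The Python sketch below assumes a normal normative distribution and hypothetical norms (mean 50, SD 10), and uses the conventional normal-curve-equivalent scale of 50 + 21.06z. Scaled scores and grade equivalents are test-specific and are not shown.

```python
from statistics import NormalDist


def norm_referenced_scores(raw, norm_mean, norm_sd):
    """Convert a raw score into common norm-referenced derived scores.

    Assumes the normative distribution is approximately normal.
    """
    z = (raw - norm_mean) / norm_sd
    return {
        "z": z,
        # Percentage of the normative group expected to score below `raw`.
        "percentile": 100 * NormalDist().cdf(z),
        # Normal-curve equivalent: mean 50, SD about 21.06.
        "nce": 50 + 21.06 * z,
    }


# Hypothetical norms: mean 50, SD 10.
scores = norm_referenced_scores(65, norm_mean=50, norm_sd=10)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```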
Thank You for Listening!
Reference
• Cohen, R. J., & Swerdlik, M. E. Psychological Testing and
Assessment: An Introduction to Tests and Measurement (7th ed.).
McGraw-Hill.
