Many educators and members of the public fail to grasp the distinctions between
criterion-referenced and norm-referenced testing. It is common to hear the two types of testing
referred to as if they served the same purposes or shared the same characteristics. Much
confusion can be eliminated if the basic differences are understood.

The following is adapted from: Popham, J. W. (1975). Educational evaluation. Englewood
Cliffs, New Jersey: Prentice-Hall, Inc.


Dimension         Criterion-Referenced Tests                Norm-Referenced Tests

Purpose           To determine whether each student has     To rank each student with respect to the
                  achieved specific skills or concepts.     achievement of others in broad areas of
                                                            knowledge.
                  To find out how much students know
                  before instruction begins and after it    To discriminate between high and low
                  has finished.                             achievers.

Content           Measures specific skills which make up a  Measures broad skill areas sampled from
                  designated curriculum. These skills are   a variety of textbooks, syllabi, and the
                  identified by teachers and curriculum     judgments of curriculum experts.
                  experts.

                  Each skill is expressed as an
                  instructional objective.

Item              Each skill is tested by at least four     Each skill is usually tested by fewer
Characteristics   items in order to obtain an adequate      than four items.
                  sample of student performance and to
                  minimize the effect of guessing.          Items vary in difficulty.

                  The items which test any given skill are  Items are selected that discriminate
                  parallel in difficulty.                   between high and low achievers.

Score             Each individual is compared with a        Each individual is compared with other
Interpretation    preset standard for acceptable            examinees and assigned a score, usually
                  achievement. The performance of other     expressed as a percentile, a grade
                  examinees is irrelevant.                  equivalent score, or a stanine.

                  A student's score is usually expressed    Student achievement is reported for
                  as a percentage.                          broad skill areas, although some
                                                            norm-referenced tests do report student
                  Student achievement is reported for       achievement for individual skills.
                  individual skills.


The differences outlined are discussed in many texts on testing. The teacher or administrator
who wishes to acquire a more technical knowledge of criterion-referenced testing or its
norm-referenced counterpart may find the text from which this material was adapted particularly
helpful.

A comparison of norm-referencing and criterion-referencing methods
for determining student grades in higher education



The essential characteristic of norm-referencing is that students are awarded their grades on
the basis of their ranking within a particular cohort. Norm-referencing involves fitting a
ranked list of students’ ‘raw scores’ to a pre-determined distribution for awarding grades.
Usually, grades are spread to fit a ‘bell curve’ (a ‘normal distribution’ in statistical
terminology), either by qualitative, informal rough-reckoning or by statistical techniques of
varying complexity. For large student cohorts (such as in senior secondary education),
statistical moderation processes are used to adjust or standardise student scores to fit a normal
distribution. This adjustment is necessary when comparability of scores across different
subjects is required (such as when subject scores are added to create an aggregate ENTER
score for making university selection decisions).
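
As a rough illustration of fitting a ranked list of raw scores to a pre-determined
distribution, the following Python sketch assigns grades purely by rank. The grade labels,
the band proportions and the function name are hypothetical choices made for this example;
they are not drawn from any institution's policy, and real statistical moderation is
considerably more involved.

```python
from typing import Dict, List, Tuple

# Hypothetical grade bands: (grade, share of the cohort), best grade first.
# These proportions are illustrative assumptions only.
BANDS: List[Tuple[str, float]] = [
    ("A", 0.10), ("B", 0.20), ("C", 0.40), ("D", 0.20), ("E", 0.10),
]


def norm_referenced_grades(raw_scores: Dict[str, float],
                           bands: List[Tuple[str, float]] = BANDS) -> Dict[str, str]:
    """Assign grades by rank within the cohort, ignoring absolute score levels."""
    # Rank students from highest to lowest raw score.
    # Tied scores keep their input order, which is an arbitrary tie-break.
    ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
    n = len(ranked)
    grades: Dict[str, str] = {}
    cumulative = 0.0
    start = 0
    for grade, share in bands:
        cumulative += share
        end = round(cumulative * n)  # rank position where this band ends
        for student in ranked[start:end]:
            grades[student] = grade
        start = end
    # Students left over because of rounding receive the lowest grade.
    for student in ranked[start:]:
        grades[student] = bands[-1][0]
    return grades


# Example cohort of ten students with made-up raw scores.
cohort = {f"s{i:02d}": score
          for i, score in enumerate([91, 84, 80, 77, 75, 72, 70, 66, 61, 55], start=1)}
grades = norm_referenced_grades(cohort)
print(grades)
```

In this sketch the grade a student receives depends only on rank position: adding ten marks
to every raw score would leave every grade unchanged.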

Norm-referencing is based on the assumption that a roughly similar range of human
performance can be expected for any student group. There is a strong culture of
norm-referencing in higher education. It is evident in many commonplace practices, such as the
expectation that the mean of a cohort’s results should be a fixed percentage year in, year out
(often this occurs when comparability across subjects is needed for the award of prizes, for
instance), or the policy of awarding first class honours sparingly to a set number of students,
and so on.

In contrast, criterion-referencing, as the name implies, involves determining a student’s
grade by comparing his or her achievements with clearly stated criteria for learning outcomes
and clearly stated standards for particular levels of performance. Unlike norm-referencing,
there is no pre-determined grade distribution to be generated and a student’s grade is in no
way influenced by the performance of others. Theoretically, all students within a particular
cohort could receive very high (or very low) grades depending solely on the levels of
individuals’ performances against the established criteria and standards. The goal of criterion-
referencing is to report student achievement against objective reference points that are
independent of the cohort being assessed. Criterion-referencing can lead to simple pass-fail
grading schemes, such as in determining fitness-to-practice in professional fields.
Criterion-referencing can also lead to reporting student achievement or progress on a series of
key criteria rather than as a single grade or percentage.
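
A minimal Python sketch of the same idea follows. The grade labels, the percentage
thresholds, the criterion names and the required level are all hypothetical values chosen
for illustration; in practice the standards would come from the stated learning outcomes
for the subject.

```python
from typing import Dict, List, Tuple

# Hypothetical standards: minimum percentage for each grade, highest first.
STANDARDS: List[Tuple[str, float]] = [("A", 80.0), ("B", 70.0), ("C", 60.0), ("D", 50.0)]


def criterion_referenced_grade(score: float,
                               standards: List[Tuple[str, float]] = STANDARDS,
                               fail_grade: str = "Fail") -> str:
    """Grade one student against preset standards; other students are not consulted."""
    for grade, minimum in standards:
        if score >= minimum:
            return grade
    return fail_grade


def criteria_report(levels: Dict[str, int], required_level: int = 3) -> Dict[str, bool]:
    """Report whether each key criterion was met, instead of a single grade."""
    return {criterion: level >= required_level for criterion, level in levels.items()}


# A strong cohort: every student can receive the top grade.
strong_cohort = {"s01": 88.0, "s02": 92.0, "s03": 81.0}
grades = {student: criterion_referenced_grade(score)
          for student, score in strong_cohort.items()}
print(grades)

# Reporting on a series of (hypothetical) key criteria for one student.
report = criteria_report({"argument": 4, "evidence": 2, "referencing": 3})
print(report)
```

Because each score is compared only with the standards, no grade distribution is imposed:
a cohort may legitimately receive all high grades or all low grades.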

Which of these methods is preferable? Mostly, students’ grades in universities are decided on
a mix of both methods, even though there may not be an explicit policy to do so. In fact, the
two methods are somewhat interdependent, more so than the brief explanations above might
suggest. Logically, norm-referencing must rely on some initial criterion-referencing, since
students’ ‘raw’ scores must presumably be determined in the first instance by assessors who
have some objective criteria in mind. Criterion-referencing, on the other hand, appears more
educationally defensible. But criterion-referencing may be very difficult, if not impossible, to
implement in a pure form in many disciplines. It is not always possible to be entirely
objective and to comprehensively articulate criteria for learning outcomes: some subjectivity
in setting and interpreting levels of achievement is inevitable in higher education. This being
the case, sometimes the best we can hope for is to compare individuals’ achievements with those
of their peers.

Norm-referencing on its own, if strictly and narrowly implemented, is undoubtedly
unfair. With norm-referencing, a student’s grade depends – to some extent at least – not only
on his or her level of achievement, but also on the achievement of other students. This might
lead to obvious inequities if applied without thought to any other considerations. For
example, a student who fails in one year may well have passed in other years! The potential
for unfairness of this kind is most likely in smaller student cohorts, where norm-referencing
may force a spread of grades and exaggerate differences in achievement. Alternatively, norm-
referencing might artificially compress the range of difference that actually exists.
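
The year-to-year inequity can be shown with a small worked example in Python. It assumes a
deliberately strict, hypothetical rule under which the lowest-ranked 20 per cent of each
cohort fail; the rule, the function name and the scores are invented for illustration.

```python
from typing import List


def passes_under_quota(score: float, cohort_scores: List[float],
                       fail_share: float = 0.2) -> bool:
    """Return True if `score` lies above the bottom `fail_share` of the cohort.

    `cohort_scores` is assumed to include the student's own score.
    A score tied with the highest failing score also fails (an arbitrary choice).
    """
    ranked = sorted(cohort_scores)
    n_fail = round(fail_share * len(ranked))  # number of students who must fail
    if n_fail == 0:
        return True
    cutoff = ranked[n_fail - 1]  # highest score among the failing group
    return score > cutoff


# The same raw score of 62 in two different (made-up) cohorts of ten students.
year_a = [48, 55, 62, 70, 74, 78, 81, 85, 88, 90]
year_b = [62, 64, 71, 75, 79, 82, 84, 87, 90, 93]

print(passes_under_quota(62, year_a))  # weaker cohort
print(passes_under_quota(62, year_b))  # stronger cohort
```

Under this rule the student with 62 passes in the first cohort and fails in the second,
although the level of achievement is identical in both years.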

Criterion-referencing is worth aspiring towards. Criterion-referencing requires giving thought
to expected learning outcomes: it is transparent for students, and the grades derived should be
defensible in reasonably objective terms – students should be able to trace their grades to the
specifics of their performance on set tasks. Criterion-referencing lays an important
framework for student engagement with the learning process and its outcomes.

Recognising, however, that some degree of subjectivity is inevitable in higher education, it is
also worthwhile to monitor grade distributions – in other words, to use a modest process of
norm-referencing to watch the outcomes of a predominantly criterion-referenced grading
model. If too many students appear to be receiving low grades, or too many are receiving
high grades, or the distribution is in some way oddly spread, this might suggest that
something is amiss and that the assessment process needs review. There may be, for instance, a
problem with the overall degree of difficulty of the assessment tasks (for example, too many
challenging examination questions, or too few, or assignment tasks that fail to discriminate
between students with differing levels of knowledge and skills). There might also be
inconsistencies in the way different assessors are judging student work.
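
One simple way to carry out this kind of monitoring is sketched below in Python. The
thresholds (more than half the cohort on the top grade, or more than 30 per cent failing)
and the grade labels are hypothetical; what counts as an unusual distribution is a matter of
local judgement. A flag is only a prompt to review the assessment, not a reason to alter
grades.

```python
from collections import Counter
from typing import Dict, List


def grade_shares(grades: List[str]) -> Dict[str, float]:
    """Proportion of the cohort receiving each grade."""
    counts = Counter(grades)
    total = len(grades)
    return {grade: count / total for grade, count in counts.items()}


def distribution_flags(grades: List[str],
                       top_grade: str = "A",
                       fail_grade: str = "Fail",
                       max_top_share: float = 0.5,     # hypothetical threshold
                       max_fail_share: float = 0.3) -> List[str]:  # hypothetical threshold
    """Return review prompts when the criterion-referenced results look unusual."""
    if not grades:
        return []
    shares = grade_shares(grades)
    flags: List[str] = []
    if shares.get(top_grade, 0.0) > max_top_share:
        flags.append("many top grades: check whether tasks are challenging enough")
    if shares.get(fail_grade, 0.0) > max_fail_share:
        flags.append("many fails: check task difficulty and assessor interpretation")
    return flags


# Made-up results from a criterion-referenced subject.
skewed = ["A"] * 7 + ["B"] * 2 + ["Fail"]
spread = ["A", "B", "B", "B", "C", "C", "C", "D", "D", "Fail"]

print(distribution_flags(skewed))
print(distribution_flags(spread))
```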

Best practice in grading in higher education involves striking a balance between criterion-
referencing and norm-referencing. This balance should be strongly oriented towards
criterion-referencing as the primary and dominant principle.

In summary:

   1. begin with clear statements of expected learning outcomes and levels of achievement;
   2. communicate these statements to students (they should be written so they make sense
       to students);
   3. measure student achievement as objectively as possible against these statements, and
       compute results and grades transparently on this basis; and
   4. keep an eye on the spread of grades or scores that are emerging to be alert to anything
       amiss in assessment tasks and assessor interpretations.
