INTERPRETING TEST SCORES
AND NORMS
Presented To:
Dr. Muhammad Ramzan
Presented By:
Abdul Majid (RNo.02) MPhil Education
Interpreting Test Scores and
Norms
• Criterion-Referenced or Standards-Based
Test
• True Zero Point
(The point at which there is no height
at all or no weight at all)
 Equal units provide uniform meaning
 Methods of expressing test scores
METHODS OF INTERPRETING
TEST SCORES
 Raw scores
 Criterion-referenced and standards-based
interpretations
 Norm-referenced interpretation
 Grade norms
 Percentile Rank
 Standard scores
 Profiles
 Judging the adequacy of norms
 Using local norms
Raw Scores
 Numerical summary of a student’s test
performance
 Criterion-Referenced and Standards-Based
Interpretations
o Describe an individual’s test performance
o Specify levels of performance
o Percentage-correct scores are used
o Measure clearly stated learning tasks
o Criterion-referenced interpretation of
standardized tests
o Analyzing each student’s responses
o Expectancy tables
Standardized Tests – What’s the
Difference?
 Criterion-Referenced Test
Criterion-referenced tests, also called mastery
tests, compare a person's performance to a set
of objectives. Anyone who meets the criterion
can get a high score.
Everyone knows what the benchmarks /
objectives are and can attain mastery to meet
them.
It is possible for ALL the test takers to achieve
100% mastery.
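A minimal sketch of a criterion-referenced interpretation, assuming a percentage-correct score is compared with a fixed mastery cutoff. The 80% cutoff, the function names, and the student data are hypothetical and are not taken from any particular test.

```python
def percent_correct(raw_score: int, n_items: int) -> float:
    """Return the percentage of items answered correctly."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 100.0 * raw_score / n_items


def has_mastered(raw_score: int, n_items: int, cutoff_pct: float = 80.0) -> bool:
    """Compare a percentage score with a fixed mastery criterion.

    cutoff_pct is a hypothetical criterion; a real test sets its own.
    """
    return percent_correct(raw_score, n_items) >= cutoff_pct


# Hypothetical raw scores on a 20-item test.
raw_scores = {"A": 18, "B": 16, "C": 20}
results = {name: has_mastered(score, 20) for name, score in raw_scores.items()}
```

Each student is judged only against the criterion, never against classmates, so every entry in `results` can be true at the same time.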
“Adjusting” the Raw Score
 We have already noted that the most immediate result of an
assessment or test is the raw score. Sometimes, before we
discuss the meaning of the score (i.e., interpret it from either a
norm or a criterion perspective), the raw score is adjusted. Usually this
is done by researchers, not classroom teachers. Two special
considerations:
◦ Correction for Guessing
 Applies only to selected-response items
 Its use has faded
◦ Factoring in Item Difficulty
 Students earn a higher “theta” score by doing well on the
more difficult items of a test.
 In fact, you may be looking at a score report given in
percentiles or standard scores and not realize that you are looking
at a transformation of a theta score rather than of the
traditional raw score.
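The correction for guessing can be sketched as follows, assuming the classical formula (right answers minus wrong answers divided by the number of options less one). The slide does not state a formula, so treat this as one common version; the function name and example numbers are hypothetical.

```python
def corrected_for_guessing(right: int, wrong: int, n_options: int) -> float:
    """Classical correction: R - W / (k - 1).

    right     -- number of items answered correctly
    wrong     -- number of items answered incorrectly (omitted items excluded)
    n_options -- number of answer choices per selected-response item
    """
    if n_options < 2:
        raise ValueError("selected-response items need at least two options")
    return right - wrong / (n_options - 1)


# Hypothetical student: 40 right, 12 wrong, four-option items.
adjusted = corrected_for_guessing(40, 12, 4)
```

Omitted items are left out of `wrong`, so a student who skips an item is not penalized for it.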
Interpreting Student Performance
Norms or Criteria . . . .
 Intelligent interpretation of student performance is crucial for
the use of educational assessment information. We are
building toward this with previous discussions of
◦ Building / choosing good tests
◦ Determining reliability
◦ Determining validity
 So now we are set to explore some methods of interpretation.
These methods fall into two basic categories or approaches:
◦ Norm-referenced
 Compare this student with others.
◦ Criterion-referenced
 Compare this student with some judgment regarding
expected performance level irrespective of others.
Standardized Tests – What’s the
Difference?
 Norm-Referenced Test
Norm-referenced tests compare an individual's
performance with the performance of others.
They are typically designed to spread scores out, often
approximating a normal curve. By definition, about half
of the test takers score above the 50th percentile and
about half score below it; this describes relative
standing, not passing or failing.
Test makers favor items of moderate difficulty that
discriminate well among test takers.
Items that nearly everyone answers correctly (or
incorrectly) say little about relative standing, so
they tend to be dropped during test development.
Norm Referenced
Interpretation
 Shows how an individual compares with other
persons
 Rank the scores from highest to
lowest
Derived scores
o Provide a general framework for
norm-referenced interpretation
o Raw scores are converted to derived
scores
o Numerical report of test performance
Interpreting Test Scores
(some definitions)
 Raw score. This is the number of items the
student answered correctly. It is used to
calculate the other, more useful scores.
 Stanine. One of nine sections of the normal
curve; each is half a standard deviation wide,
except the open-ended stanines 1 and 9.
Stanines can be easily averaged
and compared from test to test, but are less
precise than other scores.
 Normal curve equivalent (NCE). For these
scores, the normal curve is divided into equal
units ranging from 1 to 99, with an average of
50. These can be averaged and compared from
test to test or year to year.
Norms and Their Types
o Based on students’ performance
a) Grade Equivalents or Grade Norms
b) Percentile Ranks
c) Standard Scores
• The first two indicate an individual’s relative
standing within a group
• Standard scores include T-scores,
normal curve equivalents, and standard age
scores
• These standard scores differ only in their numerical values
Six Misinterpretations (Assumptions
about Grade Equivalents)
1) Assuming norms are standards
2) Assuming grade equivalents indicate the
appropriate grade placement
3) Assuming all students are expected to
grow one grade equivalent per year
4) Assuming units are equal
5) Assuming grade equivalents from different
tests are comparable
6) Assuming extreme scores, which are often
based on extrapolation, are dependable
Six Steps for Avoiding
Misinterpretations
1) Don’t confuse norms with standards
of what should be.
2) Don’t interpret a grade equivalent as
an estimate of the grade where a
student should be placed.
3) Don’t expect all students to gain 1.0
grade equivalent each year.
4) Don’t assume that the units are equal
at different parts of the scale.
5) Don’t assume that scores on different
tests are comparable.
6) Don’t interpret extreme scores as
dependable estimates of students’
performance levels.
Percentile Rank
at or below . . .
 Percentiles and Percentile Rank
◦ Definition: % of cases in the norm group scoring “at or below” a given score
◦ As we noted earlier, these two terms differ
conceptually; in practice, however, they are often
used interchangeably.
◦ Strengths:
 Easy to describe
 Easy to compute
◦ Weaknesses:
 Confusion with a “percentage-right score”
 Inequality of units [see next slide]
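The “at or below” definition can be sketched in a few lines, assuming the norm group is available as a plain list of raw scores. The function name and the norm-group data are hypothetical.

```python
def percentile_rank(score: float, group_scores: list[float]) -> float:
    """Percentage of cases in the group scoring at or below `score`."""
    if not group_scores:
        raise ValueError("group_scores must not be empty")
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100.0 * at_or_below / len(group_scores)


# Hypothetical norm group of ten raw scores on a 40-item test.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]
pr = percentile_rank(22, norm_group)   # 6 of 10 cases are at or below 22
pct_right = 100.0 * 22 / 40            # percentage of items answered correctly
```

Here `pr` is 60.0 while `pct_right` is 55.0: the same raw score of 22 gives two different numbers, which is the confusion with a “percentage-right score” listed among the weaknesses.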
Percentile Rank
 Widely used and easily understood.
 Indicates a student’s relative position.
 Interpreted as the percentage of
individuals scoring at or below a given score.
Standard scores
 Show how far the raw score is
above or below the average.
 Expressed in standard deviation units
from the mean.
 SD is a measure of the spread of
scores.
Normal Curve & Standard
Deviation Unit
 Symmetrical and bell-shaped.
 The curve contains fixed percentages of cases
within each standard deviation interval.
Types of standard scores
 Z-score.
Z-score = (X – M) / SD
X = any raw score.
M = arithmetic mean of the raw scores.
SD = standard deviation of the raw scores.
The Z-score is always negative when the raw score
is smaller than the mean.
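A minimal sketch of the z-score computation, assuming M and SD are taken from the same set of raw scores and that the population standard deviation is wanted (a sample standard deviation would give slightly different values). The function name and data are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw_scores: list[float]) -> list[float]:
    """Convert raw scores to z-scores: (X - M) / SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all scores are identical, so SD is zero")
    return [(x - m) / sd for x in raw_scores]


# Hypothetical raw scores with mean 5 and SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
zs = z_scores(scores)
```

Every raw score below the mean of 5 yields a negative z-score, and the raw score of 9, two standard deviations above the mean, yields a z-score of 2.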
Pros & Cons of Standard Scores
 Strengths
◦ Wide applicability
◦ Nice statistical properties
◦ Teachers often build their narrative reports on
these standard scores using the “accepted
descriptive words” rather than the numbers.
 Weaknesses
◦ May be hard to explain to laypersons
◦ Need to know M and SD of original test
More Standard Scores of Interest . . . .
◦ T-scores, SATs, GREs
◦ NCEs (Normal Curve Equivalents)
 Recall that the percentile rank scale is not an equal-interval
scale; NCEs solve this problem by converting percentile ranks
to an equal-interval scale. NCEs range from 1 to 99 with a
mean of 50. The major advantage of NCEs over percentile
ranks is that NCEs can be averaged.
 Used almost exclusively to meet federal reporting requirements for
achievement testing.
◦ Stanines
 Widely used in schools, so we will look at them in more detail on
a later slide.
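The conversion from percentile ranks to NCEs can be sketched as below, assuming the usual definition of the NCE scale (mean 50, standard deviation 21.06, so that NCEs of 1, 50, and 99 line up with percentile ranks of 1, 50, and 99). The function name is hypothetical.

```python
from statistics import NormalDist


def nce_from_percentile_rank(pr: float) -> float:
    """Convert a percentile rank (strictly between 0 and 100) to an NCE."""
    if not 0 < pr < 100:
        raise ValueError("percentile rank must be strictly between 0 and 100")
    z = NormalDist().inv_cdf(pr / 100)  # normal deviate for this percentile rank
    return 50 + 21.06 * z


# The same 10-point percentile gain covers very different NCE distances.
middle_gain = nce_from_percentile_rank(60) - nce_from_percentile_rank(50)
upper_gain = nce_from_percentile_rank(99) - nce_from_percentile_rank(89)
```

`upper_gain` comes out several times larger than `middle_gain`, which illustrates why percentile ranks are not an equal-interval scale and why NCEs, not percentile ranks, are the ones to average.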
T-Score
 A type of standard score with a mean of 50
and a standard deviation of 10.
T-score = 50 + 10(z)
 Normalized standard scores
can be calculated by
1. Converting the distribution of raw scores
into percentile ranks.
2. Looking up the Z-score corresponding
to each percentile rank in a normal curve table.
3. Converting the Z-scores to T-scores.
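The three steps above can be sketched as follows. This is one possible reading, not the slide author's procedure: it assumes a midpoint percentile rank (cases below plus half of the cases tied at the score), because a pure “at or below” rank of 100% has no finite z-score, and it replaces the table lookup with the inverse normal function from Python's standard library. Function names and data are hypothetical.

```python
from statistics import NormalDist


def t_score(z: float) -> float:
    """Linear T-score: 50 + 10(z)."""
    return 50 + 10 * z


def normalized_t_scores(raw_scores: list[float]) -> list[float]:
    """Normalized T-scores via percentile rank -> z-score -> T-score."""
    n = len(raw_scores)
    result = []
    for x in raw_scores:
        below = sum(1 for s in raw_scores if s < x)
        tied = sum(1 for s in raw_scores if s == x)
        # Step 1: midpoint percentile rank as a proportion (assumption).
        p = (below + 0.5 * tied) / n
        # Step 2: z-score with that proportion of the normal curve below it.
        z = NormalDist().inv_cdf(p)
        # Step 3: convert the z-score to a T-score.
        result.append(t_score(z))
    return result


# Hypothetical raw scores; the middle score maps to a T-score of 50.
t_values = normalized_t_scores([10, 20, 30])
```

A z-score of 1.5 gives `t_score(1.5) == 65.0`, one and a half standard deviations above the T-score mean of 50.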
Stanines
 Pronounced “stay-nines”
 Single-digit scores.
Strengths of stanine scores
1. Nine-point scale: 9 is high, 1 is low, and 5 is
average.
2. Possible to compare a student’s performance
on different tests.
3. Easy to combine diverse types of data (test
scores, ratings, ranked data).
4. Uses a single-digit score that is easily recorded and
takes less space.
Limitation
1. Growth can’t be shown from one year to the next.
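A minimal sketch of assigning stanines from percentile ranks, assuming the standard stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4 from lowest to highest). How a score exactly on a boundary is treated varies by publisher; placing it in the higher stanine is an assumption of this sketch, and the names are hypothetical.

```python
from bisect import bisect_right

# Cumulative percentage of cases at the upper edge of stanines 1 through 8.
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile_rank(pr: float) -> int:
    """Map a percentile rank (0 to 100) to a stanine (1 to 9)."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Count how many boundaries the rank has reached, then shift to 1..9.
    return bisect_right(STANINE_UPPER_BOUNDS, pr) + 1


# Hypothetical student whose percentile rank moves from 45 to 55 in a year.
last_year = stanine_from_percentile_rank(45)
this_year = stanine_from_percentile_rank(55)
```

Both ranks fall in stanine 5, so the change is invisible on the stanine scale. Stanines are also relative scores, so a student who keeps the same position in the norm group keeps the same stanine even while learning a great deal.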
Judging the adequacy of
norms
 The main purpose of norms is to be able to interpret
students’ test performance.
 Qualities to look for in test norms:
1. Test norms should be relevant.
2. Test norms should be
representative.
3. Test norms should be up to date.
4. Test norms should be comparable.
5. Test norms should be adequately
described.
Using local norms
 Compare students with local norms.
 Useful when local students differ from the
published norm group in aptitude,
educational experience, or cultural
background.
 Prepared using percentile ranks or
stanines.
Cautions in interpreting test
scores
 A test score should be interpreted in terms of
the specific test from which it was
derived.
 A test score should be interpreted in light of all
of the student’s relevant characteristics.
 A test score should be interpreted according to
the type of decision to be made.
 A test score should be interpreted as a
band of scores rather than as a specific
value.
 A test score should be verified by
supplementary evidence.
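One common way to express a score as a band is to add and subtract the standard error of measurement (SEM). The slides do not name a method, so this sketch assumes SEM = SD × sqrt(1 − reliability); the function name, the reliability value, and the example score are hypothetical.

```python
import math


def score_band(score: float, sd: float, reliability: float,
               n_sem: float = 1.0) -> tuple[float, float]:
    """Return (low, high) as score plus or minus n_sem standard errors."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    return (score - n_sem * sem, score + n_sem * sem)


# Hypothetical T-score of 65 (SD 10) on a test with reliability 0.91.
low, high = score_band(65, sd=10, reliability=0.91)
```

With these numbers the SEM is about 3, so the band runs from roughly 62 to 68 rather than resting on the single value 65.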
