TEST SCORES
COLLECTED AND PRESENTED BY:
EMAN AWAD EL-SAWY
Ch. 4
What is the meaning of Instrumentation?
It is the process of selecting or
developing measuring devices and
methods appropriate to a given
evaluation problem. (p. 101)
WHAT’S THE DEAL WITH TESTING?
As a society, we like numbers. If something can be
quantified, it is viewed as valid or more scientific.
Machine scoring of a test is fast, efficient, and cheap.
Hand scoring of a test is slow, time consuming, and
very expensive.
INTERPRETING TEST SCORES:
To interpret a test score, two things must be known:
1. The nature of the score itself (what kind of scoring or scaling system
was used in the calculations?)
2. The basis of comparison underlying the score (what reference
population or norm group does it represent?)
Types of test scores
1. Raw scores
2. Percentile ranks (centile ranks) and percentiles (centiles)
3. Stanine scores (standard nine)
4. Standard scores
   • Z-scores
   • T-scores
   • Other standard scores
5. Grade level scores
1. RAW SCORES
The actual score made on a test: simply the total number of points an
individual gets on a test before it is converted to any formal or
standardized scoring system.
Limitations: A raw score by itself is uninterpretable, since there is no way
of knowing how it compares with anything else.
2. PERCENTILE RANKS (CENTILE RANKS) AND PERCENTILES
(CENTILES)
Raw scores begin to have meaning when they are ranked
from high to low.
A convenient solution is to convert the scores into
percentage values.
Two statistics are used for this purpose:
A. The percentile rank (centile rank): a number between 0 and 100
indicating the percent of cases in a norm group falling at or below
a given score.
B. The percentile (centile): a point on a scale of scores at or below
which a given percent of the cases falls.
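The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the norm-group scores are made up for the example, and the "at or below" rule is applied literally, without the interpolation that some testing programs use.

```python
def percentile_rank(score, norm_scores):
    """Percent of norm-group scores falling at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


def percentile(pct, norm_scores):
    """Lowest score in the norm group at or below which at least
    `pct` percent of the cases fall."""
    ordered = sorted(norm_scores)
    for s in ordered:
        if percentile_rank(s, ordered) >= pct:
            return s
    return ordered[-1]


# Hypothetical raw scores for a norm group of ten examinees.
norm_group = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

print(percentile_rank(22, norm_group))  # 60.0
print(percentile(60, norm_group))       # 22
```

The two functions run in opposite directions: the percentile rank goes from a score to a percent, and the percentile goes from a percent back to a score.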
Strengths:
• They are easily understood by lay people.
• They allow exact interpretation.
• They are more appropriate for markedly skewed data than scores based
on the normal probability curve.
Weaknesses:
• They are easily confused with a “percentage-right score”.
• It is misleading to report results in percentage terms when the
sample size is under 100.
• They only permit statements about rank (greater than, equal to,
less than).
• Inequality of units: the intervals between units are not equal (the
distance between the 60th and 70th percentiles ≠ the distance between
the 80th and 90th).
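The inequality of units can be made concrete with a short sketch. It assumes the scores follow a normal distribution (an assumption made for this illustration only) and uses `statistics.NormalDist` from the Python standard library to express each percentile gap in standard deviation units.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, SD 1


def z_at_percentile(pct):
    """z-score below which `pct` percent of a normal distribution falls."""
    return std_normal.inv_cdf(pct / 100)


# Width of two 10-point percentile intervals, in SD units.
gap_60_70 = z_at_percentile(70) - z_at_percentile(60)
gap_80_90 = z_at_percentile(90) - z_at_percentile(80)

print(round(gap_60_70, 2), round(gap_80_90, 2))  # 0.27 0.44
```

Under this assumption, the same ten percentile points cover a larger span of scores near the tail than near the middle of the distribution.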
3. STANINES
CONTRACTION OF “STANDARD NINE”
• Stanines divide the normal distribution into 9 units, each of which
covers the same length along the base of the normal curve (except
the two units that cover the tails). Stanines have M = 5 and SD = 2
and range from 1 (lowest) to 9 (highest).
• They combine the understandability of percentages with the features
of the normal curve of probability.
• Stanine scores are useful in comparing a student's performance
across different content areas. For example, a 6 in Mathematics
and an 8 in Reading generally indicate a meaningful difference
in a student's learning for the two respective content areas.
Advantages: Stanine scores are coming into increasing use
because of their simplicity and utility.
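A minimal sketch of how a stanine could be derived from a z-score, given M = 5 and SD = 2 as stated above. It relies on the usual convention that each stanine is half a standard deviation wide and that the two tails are folded into stanines 1 and 9. The function names and sample values are hypothetical.

```python
import math


def stanine_from_z(z):
    """Stanine for a z-score: units are half an SD wide and centered on 5;
    the open-ended tails are folded into stanines 1 and 9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


def stanine_from_raw(raw, group_mean, group_sd):
    """Convert a raw score to a stanine via its z-score."""
    return stanine_from_z((raw - group_mean) / group_sd)


print(stanine_from_z(0.0))                # 5
print(stanine_from_z(0.6))                # 6
print(stanine_from_z(-3.0))               # 1
print(stanine_from_raw(65, 50, 10))       # 8
```

Because every stanine covers a band of scores, two students with slightly different raw scores can receive the same stanine.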
4. STANDARD SCORES
Standard scores indicate a student’s relative
position in a group. They express test
performance in terms of standard deviation
units from the mean.
They are derived from the properties of the
normal probability curve and preserve the
absolute differences between scores.
Disadvantages:
1. They are inappropriate if data are markedly
skewed.
2. They are difficult to explain to a lay audience.
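The two standard scores named earlier, z-scores and T-scores, can be sketched as follows. The slides do not give the formulas, so this uses the common conventions: z is the distance from the mean in standard deviation units, and T rescales z to a mean of 50 and an SD of 10. The raw scores are made up for the example, and the group is treated as the whole reference population (hence `pstdev`).

```python
from statistics import mean, pstdev


def z_score(raw, group_mean, group_sd):
    """Distance of a raw score from the group mean, in SD units."""
    return (raw - group_mean) / group_sd


def t_score(z):
    """Rescale a z-score to the T scale (mean 50, SD 10)."""
    return 50 + 10 * z


# Hypothetical raw scores for a reference group.
raw_scores = [20, 40, 40, 40, 50, 50, 70, 90]
m = mean(raw_scores)      # 50
sd = pstdev(raw_scores)   # 20.0

z = z_score(70, m, sd)
print(z, t_score(z))      # 1.0 60.0
```

A raw score of 70 lies one standard deviation above the mean of this group, which gives z = 1.0 and T = 60.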
5. GRADE LEVEL SCORES:
They are based on the relationship between
scores on a test and the average performance
of children at each of a series of grade levels.
However, developmental characteristics of certain
age levels may be due to maturity rather than
instruction.
They are most relevant in elementary schools
where subject matter tends to be more
continuous.
Beyond the sixth grade they lose meaning.
STANDARDIZED TESTS:
They report scores based on a norm group representing a defined
population.
Until this comparison group is clearly known, a satisfactory interpretation
of the score is not possible.
1. Norm-referenced tests.
2. Criterion-referenced tests.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED TESTING
Many educators and members of the public fail to
grasp the distinctions between criterion-referenced
and norm-referenced testing. It is
common to hear the two types of testing referred
to as if they served the same purposes or shared
the same characteristics. Much confusion can be
eliminated if the basic differences are
understood.
The following is adapted from: Popham, W. J.
(1975). Educational evaluation. Englewood Cliffs,
New Jersey: Prentice-Hall, Inc.
STANDARDIZED TESTS:
Criterion-Referenced Test
Criterion-referenced tests, also called mastery tests,
compare a person's performance to a set of objectives.
Anyone who meets the criterion can get a high score.
Everyone knows what the benchmarks / objectives are and
can attain mastery to meet them.
It is possible for ALL the test takers to achieve 100%
mastery.
MEASUREMENT AND EVALUATION:
CRITERION- VERSUS NORM-REFERENCED TESTING

Dimension: Purpose
Criterion-Referenced Tests:
• To determine whether each student has achieved specific skills or concepts.
• To find out how much students know before instruction begins and after it has finished.
Norm-Referenced Tests:
• To rank each student with respect to the achievement of others in broad areas of knowledge.
• To discriminate between high and low achievers.

Dimension: Content
Criterion-Referenced Tests:
• Measures specific skills which make up a designated curriculum.
• These skills are identified by teachers and curriculum experts.
• Each skill is expressed as an instructional objective.
Norm-Referenced Tests:
• Measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgments of curriculum experts.

Dimension: Item Characteristics
Criterion-Referenced Tests:
• Each skill is tested by at least four items in order to obtain an adequate sample of student performance and to minimize the effect of guessing.
• The items which test any given skill are parallel in difficulty.
Norm-Referenced Tests:
• Each skill is usually tested by less than four items.
• Items vary in difficulty.
• Items are selected that discriminate between high and low achievers.

Dimension: Score Interpretation
Criterion-Referenced Tests:
• Each individual is compared with a preset standard for acceptable achievement. The performance of other examinees is irrelevant.
• A student's score is usually expressed as a percentage.
• Student achievement is reported for individual skills.
Norm-Referenced Tests:
• Each individual is compared with other examinees and assigned a score, usually expressed as a percentile, a grade equivalent score, or a stanine.
• Student achievement is reported for broad skill areas, although some norm-referenced tests do report student achievement for individual skills.
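The difference in score interpretation can be illustrated by scoring the same raw scores both ways. This is a sketch under stated assumptions, not a procedure from the source: the 80% cut score, the 20-point test, the class scores, and the function names are all hypothetical.

```python
def criterion_referenced(raw, max_points, cut_percent):
    """Compare one examinee with a preset standard.
    The performance of other examinees plays no part."""
    return 100.0 * raw / max_points >= cut_percent


def norm_referenced(raw, all_raw_scores):
    """Compare one examinee with the others: percent of the group
    scoring at or below this raw score (a percentile rank)."""
    at_or_below = sum(1 for s in all_raw_scores if s <= raw)
    return 100.0 * at_or_below / len(all_raw_scores)


# Hypothetical raw scores out of 20 for a class of five.
class_scores = [17, 18, 18, 19, 20]

mastery = [criterion_referenced(s, 20, cut_percent=80) for s in class_scores]
ranks = [norm_referenced(s, class_scores) for s in class_scores]

print(mastery)  # [True, True, True, True, True]
print(ranks)    # [20.0, 60.0, 60.0, 80.0, 100.0]
```

Every student in this made-up class meets the criterion, yet the norm-referenced view still spreads them from the 20th to the 100th percentile rank, because it only describes standing relative to the group.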

Measurement and instrumentation
