2. WHAT IS ASSESSMENT OF LEARNING?
• It focuses on the development and utilization of assessment
tools to improve the teaching-learning process.
• It emphasizes the use of testing for measuring knowledge,
comprehension and other thinking skills.
• It allows the students to go through the standard steps in
test construction for quality assessment.
• Students will experience how to develop rubrics for
performance-based and portfolio assessment.
3. MEASUREMENT
• Refers to the quantitative aspect of
evaluation. It involves outcomes that can be
quantified statistically. It can also be defined
as the process of determining and
differentiating the information about the
attributes or characteristics of things.
4. EVALUATION
• Is the qualitative aspect of determining
outcomes of learning. It involves value
judgment. Evaluation is more
comprehensive than measurement. In fact,
measurement is one aspect of evaluation.
9. ACCORDING TO THE NATURE OF TEST:
• Personality test
• Intelligence test
• Aptitude test
• Achievement or summative test
• Sociometric test
• Diagnostic or formative test
• Trade or vocational test
12. DIAGNOSTIC TESTS
•Are used to measure a student’s
strengths and weaknesses, usually to
identify deficiencies in skills or
performance.
13. FORMATIVE AND SUMMATIVE TESTS
• Are terms often used with evaluation, but they may
also be used with testing. Formative testing is done to
monitor students’ attainment of the instructional
objectives. Formative testing occurs over a period of
time and monitors students’ progress. Summative
testing is done at the conclusion of instruction and
measures the extent to which students have attained
the desired outcomes.
14. STANDARDIZED TESTS
• Are already valid, reliable and objective.
Standardized tests are tests for which contents
have been selected and for which norms or
standards have been established. Psychological
tests and government national examinations are
examples of standardized tests.
15. STANDARDS OR NORMS
•Are goals to be achieved expressed in
terms of the average performance of
the population tested.
16. CRITERION-REFERENCED MEASURE
• Is a measuring device with a predetermined
level of success or standard on the part of
the test-takers. For example, a level of 75
percent score in all the test items could be
considered a satisfactory performance.
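The 75 percent example above can be sketched in a few lines of Python. This is only an illustration: the function name `meets_criterion`, the 40-item test, and the student results are hypothetical, and the cutoff is a parameter because the criterion is set by the test user.

```python
def meets_criterion(num_correct, num_items, cutoff_percent=75.0):
    """Return True when the percentage of correct answers reaches the cutoff."""
    percent_correct = 100.0 * num_correct / num_items
    return percent_correct >= cutoff_percent


# Hypothetical results on a 40-item test: name -> number of correct answers.
results = {"Ana": 34, "Ben": 30, "Carla": 27}

for name, correct in results.items():
    status = "satisfactory" if meets_criterion(correct, 40) else "not yet satisfactory"
    print(name, status)
```

Each student is judged against the fixed standard only; how the rest of the class performed does not enter the decision.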
17. NORM-REFERENCED MEASURE
• Is a test that is scored on the basis of the
norm or standard level of accomplishment
by the whole group taking the test. The
grades of the students are based on the
normal curve of distribution.
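One common way to place a score relative to the whole group is the standard score (z-score), which expresses a raw score as a number of standard deviations above or below the group mean. The sketch below assumes hypothetical scores and uses the group itself as the reference; it is one possible illustration, not a prescribed grading method.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Express each raw score in standard deviations from the group mean."""
    group_mean = mean(scores)
    group_sd = pstdev(scores)  # standard deviation of the group that took the test
    # Note: this fails with a division by zero if every score is identical.
    return [(s - group_mean) / group_sd for s in scores]


scores = [10, 20, 30, 40, 50]  # hypothetical raw scores
for raw, z in zip(scores, z_scores(scores)):
    print(raw, round(z, 2))
```

A z-score of 0 marks the group average; positive values lie above it and negative values below it.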
22. NOMINAL MEASUREMENT
• Merely classifies objects or events by
assigning numbers to them.
For example, one could nominally designate
baseball positions by assigning the pitcher
the number 1; the catcher, 2; the first
baseman, 3; and so on.
23. ORDINAL MEASUREMENT
• Ordinal scales classify, but they also
assign rank order. Ranking
individuals according to their test
scores is an example of ordinal
measurement.
24. INTERVAL MEASUREMENT
• In order to be able to add and subtract
scores, we use interval scales, sometimes
called equal-interval or equal-unit
measurement. This contains the nominal
and ordinal properties and is also
characterized by equal units between score
points.
25. RATIO MEASUREMENT
•It includes all the preceding
properties, but in ratio scale, the zero
point is not arbitrary; a score of zero
indicates the absence of what is being
measured.
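A short sketch may help separate the scales. The student scores are hypothetical, and the temperature comparison is an added illustration (the Celsius scale has an arbitrary zero, while the kelvin scale does not); neither comes from the slides.

```python
# Hypothetical test scores for four students.
scores = {"Ana": 48, "Ben": 45, "Carla": 30, "Dan": 12}

# Ordinal information: rank order only (1 = highest score).
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Ranks say who is ahead, but not by how much:
# both neighbouring pairs are one rank apart, yet the score gaps differ.
print(ranks["Ben"] - ranks["Ana"], scores["Ana"] - scores["Ben"])      # 1 3
print(ranks["Carla"] - ranks["Ben"], scores["Ben"] - scores["Carla"])  # 1 15


def celsius_to_kelvin(celsius):
    """Convert from a scale with an arbitrary zero to one with a true zero."""
    return celsius + 273.15


# Interval scale: differences are meaningful, ratios are not.
print(20 / 10)                                        # suggests "twice as hot"
print(celsius_to_kelvin(20) / celsius_to_kelvin(10))  # ratio on the true-zero scale
```

The second ratio is only slightly above 1, which is why statements such as "twice as much" are reserved for ratio scales.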
26. NORM-REFERENCED AND CRITERION-
REFERENCED MEASUREMENT
• When we contrast norm-referenced measurement (or
testing) with criterion-referenced measurement, we are
basically referring to two different ways of interpreting
information. However, Popham (1988, page 135) points
out that certain characteristics tend to go with each
type of measurement, and it is likely that results of
norm-referenced tests are interpreted in criterion-
referenced ways and vice versa.
27. NORM-REFERENCED INTERPRETATION
• An individual score is interpreted by
comparing it to the scores of a defined
group, often called the normative group.
Norms represent the scores earned by one
or more groups of students who have taken
the test.
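A percentile rank is one familiar norm-referenced interpretation. The sketch below assumes a small hypothetical normative group and defines the percentile rank as the percentage of that group scoring below the given score; other definitions (for example, counting half of the tied scores) are also in use.

```python
def percentile_rank(score, norm_group_scores):
    """Percentage of the normative group scoring below the given score."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


# Hypothetical scores earned by a normative group of ten students.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]

print(percentile_rank(27, norm_group))  # 60.0
```

The same raw score of 27 would receive a different percentile rank against a different normative group, which is the defining feature of this kind of interpretation.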
28. ACHIEVEMENT TEST AS AN EXAMPLE
• Most standardized achievement tests, especially those covering several
skills and academic areas, are primarily designed for norm-referenced
interpretations. However, the form of results and the interpretations of
these tests are somewhat complex and require concepts not yet introduced
in this text. Scores on teacher-constructed tests are often given norm-
referenced interpretations. Grading on the curve, for example, is a norm-
referenced interpretation of test scores on some type of performance
measure. Specified percentages of scores are assigned the different grades,
and an individual's score is positioned in the distribution of scores. (We
mention this only as an example; we do not endorse this procedure.)
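To make the mechanics of grading on the curve concrete, here is a minimal sketch. The function name `grade_on_curve`, the 20/60/20 split, and the scores are all hypothetical; as in the text, this is shown only as an example of a norm-referenced interpretation, not as a recommended procedure.

```python
def grade_on_curve(scores, shares):
    """Assign grades by rank position in the group.

    scores: dict of name -> raw score
    shares: list of (grade, percent of the class), best grade first
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades = {}
    start = 0
    cumulative = 0
    for grade, percent in shares:
        cumulative += percent
        # Position in the ranking where this grade band ends.
        end = round(cumulative * len(ranked) / 100)
        for name in ranked[start:end]:
            grades[name] = grade
        start = end
    return grades


# Hypothetical class of ten students.
scores = {"S01": 95, "S02": 88, "S03": 84, "S04": 79, "S05": 75,
          "S06": 71, "S07": 66, "S08": 60, "S09": 52, "S10": 41}

grades = grade_on_curve(scores, [("A", 20), ("B", 60), ("C", 20)])
for name in scores:
    print(name, scores[name], grades[name])
```

Note that a student's grade depends entirely on position in the group: the same raw score could earn a different grade in a stronger or weaker class. Tied scores are separated arbitrarily in this sketch.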
29. CRITERION-REFERENCED
INTERPRETATION
• The concepts of criterion-referenced testing have developed
with a dual meaning for criterion-referenced. On one hand,
it means referencing an individual’s performance to some
criterion that is a defined performance level. The
individual's score is interpreted in absolute rather than
relative terms. The criterion, in this situation, means some
level of specified performance that has been determined
independently of how others might perform.
30. DISTINCTIONS BETWEEN NORM-
REFERENCED AND CRITERION-REFERENCED
TESTS
• Although interpretations, not characteristics, provide the
distinction between norm-referenced and criterion-referenced
tests, the two types do tend to differ in some ways. Norm-
referenced tests are usually more general and comprehensive
and cover a large domain of content and learning tasks. They
are used for survey testing, although this is not their exclusive
use.
31. • Criterion-referenced tests focus on a specific group of learner behaviors. To
show the contrast, consider an example. Arithmetic skills represent a
general and broad category of student outcomes and would likely be
measured by a norm-referenced test. On the other hand, behaviors such as
solving addition problems with two five-digit numbers or determining the
multiplication products of three- and four-digit numbers are much more
specific and may be measured by criterion-referenced tests.
32. • A criterion-referenced test tends to focus more on subskills than
on broad skills. Thus, criterion-referenced tests tend to be shorter.
If mastery learning is involved, criterion-referenced measurement
would be used.
• Norm-referenced test scores are transformed to positions within
the normative group. Criterion-referenced test scores are usually
given in the percentage of correct answers or another indicator of
mastery or the lack thereof. Criterion-referenced tests tend to
lend
33. STAGES IN TEST CONSTRUCTION
• I. Planning the Test
• A. Determining the Objectives
• B. Preparing the Table of Specifications
• C. Selecting the Appropriate Item Format
• D. Writing the Test Items
• E. Editing the Test Items
34. • II. Trying Out the Test
A. Administering the First Tryout - then Item Analysis
B. Administering the Second Tryout - then Item Analysis
C. Preparing the Final Form of the Test
III. Establishing Test Validity
IV. Establishing the Test Reliability
V. Interpreting the Test Score
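The outline names item analysis without describing it. One common form computes, for each item, a difficulty index (the proportion of examinees answering correctly) and a discrimination index (the difference in that proportion between high and low scorers on the whole test). The sketch below assumes this form; the function names and response data are hypothetical.

```python
def item_difficulty(item_responses):
    """Proportion of examinees who answered the item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(upper_group, lower_group):
    """Difference in proportion correct between high and low scorers on the whole test."""
    return item_difficulty(upper_group) - item_difficulty(lower_group)


# Hypothetical responses to one item from the five highest and five lowest scorers.
upper = [1, 1, 1, 1, 0]
lower = [1, 0, 0, 1, 0]

print(item_difficulty(upper + lower))
print(item_discrimination(upper, lower))
```

An item answered correctly by nearly everyone or nearly no one, or one that the low scorers answer correctly more often than the high scorers, is a candidate for revision before the next tryout.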
35. MAJOR CONSIDERATIONS IN TEST
CONSTRUCTION
Type of Test
Our usual idea of testing is an in-class test that is administered by the
teacher.
However, there are many variations on this theme: group tests, individual
tests, written tests, oral tests, speed tests, power tests, pretests and
posttests. Each of these has different characteristics that must be considered
when the tests are planned.
36. • Test Length
A major decision in the test planning is how many items should be
included on the test. There should be enough to cover the content
adequately, but the length of the class period or the attention span
or fatigue limits of the students usually restrict the test length.
Decisions about test length are usually based on practical
constraints more than on theoretical considerations.
37. • Item Formats
Determining what kind of items to include on the test is a major decision. Should they be
objectively scored formats such as multiple choice or matching type? Should they cause
the students to organize their own thoughts through short answer or essay formats?
These are important questions that can be answered only by the teacher in terms of the
local context, his or her students, his or her classroom, and the specific purpose of the
test. Once the planning decisions are made, the item writing begins. This task is often
the most feared by beginning test constructors. However, the procedures are more
common sense than formal rules.
38. POINTS TO BE CONSIDERED IN PREPARING A TEST
1. Are the instructional objectives clearly defined?
2. What knowledge, skills and attitudes do you want to measure?
3. Did you prepare a table of specifications?
4. Did you formulate well defined and clear test items?
5. Did you employ correct English in writing the items?
6. Did you avoid giving clues to the correct answer?
7. Did you test the important ideas rather than the trivial?
8. Did you adapt the test's difficulty to your students' ability?
9. Did you avoid using textbook jargon?
10. Did you cast the items in positive form?
11. Did you prepare a scoring key?
12. Does each item have a single correct answer?
13. Did you review your items?
39. GENERAL PRINCIPLES IN CONSTRUCTING
DIFFERENT TYPES OF TESTS
1. The test items should be selected very carefully. Only important facts should be
included.
2. The test should have extensive sampling of items.
3. The test items should be carefully expressed in simple, clear, definite, and meaningful
sentences.
4. There should be only one possible correct response for each test item.
5. Each item should be independent.
6. Sentences should not be lifted from books, so as to encourage thinking and
understanding.
7. The first-person pronouns I and we should not be used.
8. Various types of test items should be made to avoid monotony.
9. The majority of the test items should be of moderate difficulty.
40. 10. The test items should be arranged in an ascending order of difficulty.
11. Clear, concise and complete directions should precede all types of test.
12. Items which can be answered by previous experience alone without knowledge of the
subject matter should not be included.
13. Catchy words should not be used in the test items.
14. Test items must be based upon the objectives of the course and upon the course
content.
15. The test should measure the degree of achievement or determine the difficulties of the
learners.
16. The test should emphasize ability to apply and use facts as well as knowledge of facts.
41. 17. The test should be of such length that it can be completed within the time allotted by
all or nearly all of the pupils.
18. Rules governing good language expression, grammar, spelling, punctuation, and
capitalization should be observed in all items.
19. Information on how scoring will be done should be provided.
20. Scoring keys in correcting and scoring tests should be provided.
42. POINTERS TO BE OBSERVED IN CONSTRUCTING AND
SCORING THE DIFFERENT TYPES OF TESTS
A. RECALL TYPES
1. Simple recall type
a. This type consists of questions calling for a single word or expression as an
answer.
b. Items usually begin with who, where, when, and what.
c. Score is the number of correct answers.
43. 2. Completion type
a. Only important words or phrases should be omitted to avoid confusion.
b. Blanks should be of equal lengths.
c. The blank, as much as possible, is placed near or at the end of the sentence.
d. Articles a, an, and the should not be provided before the omitted word or phrase to
avoid clues for answers.
e. Score is the number of correct answers.
44. 3. Enumeration type
a. The exact number of expected answers should be stated.
b. Blanks should be of equal lengths.
c. Score is the number of correct answers.
4. Identification type
a. The items should make an examinee think of a word, number, or group of words
that would complete the statement or answer the problem.
b. Score is the number of correct answers.
45. B. RECOGNITION TYPES
1. True-false or alternate-response type
a. Declarative sentences should be used.
b. The number of “true” and “false” items should be more or less equal.
c. The truth or falsity of the sentence should not be too evident.
d. Negative statements should be avoided.
e. The “modified true-false” is preferable to the “plain true-false”.
f. In arranging the items, avoid the regular recurrence of “true” and “false”
statements.
g. Avoid using specific determiners like all, always, never, none, nothing, most,
often, some, etc., and avoid weak statements such as may, sometimes, as a rule, in
general, etc.
h. Minimize the use of qualitative terms like few, great, many, more, etc.
46. i. Avoid leading clues to answers to all stems.
j. Score is the number of correct answers in “modified true-false” and right answers
minus wrong answers in “plain true-false”.
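The two scoring rules in item j can be sketched as follows. The function name, the `correct_for_guessing` flag, and the use of `None` for an omitted item (counted as neither right nor wrong) are assumptions made for this illustration; the sketch also only counts answers and does not check the corrections that a modified true-false item asks for.

```python
def score_true_false(answers, key, correct_for_guessing=False):
    """Score a true-false test.

    answers, key: equal-length lists of True/False; None in answers marks an omitted item.
    correct_for_guessing: if True, return right minus wrong (plain true-false);
                          if False, return the number right (modified true-false).
    """
    right = sum(1 for a, k in zip(answers, key) if a is not None and a == k)
    wrong = sum(1 for a, k in zip(answers, key) if a is not None and a != k)
    return right - wrong if correct_for_guessing else right


# Hypothetical ten-item test: 7 right, 2 wrong, 1 omitted.
key = [True, False, True, True, False, False, True, False, True, False]
answers = [True, False, False, True, None, False, True, True, True, False]

print(score_true_false(answers, key))                             # 7
print(score_true_false(answers, key, correct_for_guessing=True))  # 5
```

Subtracting wrong answers offsets the even chance of guessing a plain true-false item correctly, so a student who guesses blindly on every item would expect a score near zero.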
47. 2. Yes-No type
a. The items should be in interrogative sentences.
b. The same rules as in “true-false” are applied.
48. 3. Multiple-response type
a. There should be three to five choices. The number of choices used in the first item
should be the same number of choices in all the items of this type of test.
b. The choices should be numbered or lettered so that only the number or letter can be
written on the blank provided.
c. If the choices are figures, they should be arranged in ascending order.
d. Avoid the use of “a” or “an” as the last word prior to the listing of the responses.
49. e. Random occurrence of responses should be employed.
f. The choices, as much as possible, should be at the end of the statements.
g. The choices should be related in some way or should belong to the same class.
h. Avoid the use of “none of these” as one of the choices.
i. Score is the number of correct answers.
50. 4. Best answer type
a. There should be three to five choices all of which are right but vary in their degree
of merit, importance or desirability.
b. The other rules for multiple-response items are applied here.
c. Score is the number of correct answers.
51. 5. Matching type
a. There should be two columns. Under “A” are the stimuli, which should be longer and
more descriptive than the responses under column “B”. The response may be a word, a
phrase, a number, or a formula.
b. The stimuli under column “A” should be numbered and the responses under column “B”
should be lettered. Answers will be indicated by letters only on lines provided in column
“A”.
c. The number of pairs usually should not exceed twenty items. Fewer than ten
introduces chance elements. Twenty pairs may be used, but more than twenty is decidedly
wasteful of time.
52. d. The number of responses in column “B” should exceed the number of
items in column “A” by two or more to avoid guessing.
e. Only one correct matching for each item should be possible.
f. Matching sets should neither be too long nor too short.
g. All items should be on the same page to avoid turning of pages in the process of
matching pairs.
h. Score is the number of correct answers.
53. C. ESSAY TYPE EXAMINATIONS
Common types of essay questions. (The types are related to the purposes for which the essay
examinations are to be used.)
1. Comparison of two things
2. Explanation of the use or meaning of a statement or passage.
3. Analysis
4. Decisions for or against
5. Discussion
54. How to construct essay examinations
1. Determine the objectives or essentials for each question to be evaluated.
2. Phrase questions in simple, clear and concise language.
3. Suit the length of the questions to the time available for answering the essay
examination. The teacher should try to answer the test herself.
55. 4. Scoring
a. Have a model answer in advance.
b. Indicate the number of points for each question.
c. Score a point for each essential.
56. ADVANTAGES AND DISADVANTAGES OF THE
OBJECTIVE TYPE OF TESTS
Advantages
a. The objective test is free from personal bias in scoring.
b. It is easy to score. With a scoring key, the test can be corrected by different individuals
without affecting the accuracy of the grades given.
c. It has high validity because it is comprehensive with wide sampling of essentials.
d. It is less time-consuming since many items can be answered in a given time.
e. It is fair to students since the slow writers can accomplish the test as fast as the fast
writers.
57. Disadvantages
a. It is difficult to construct and requires more time to prepare.
b. It does not afford the students the opportunity for training in self-expression and thought
organization.
c. It cannot be used to test ability in theme writing or journalistic writing.
58. ADVANTAGES AND DISADVANTAGES OF THE
ESSAY TYPE OF TESTS
Advantages
a. The essay examination can be used in practically all subjects of the school
curriculum.
b. It trains students for thought organization and self-expression.
c. It affords students opportunities to express their originality and independence of
thinking.
d. Only the essay test can be used in some subjects like composition writing and
journalistic writing which cannot be tested by the objective type test.
59. e. Essay examination measures higher mental abilities like comparison,
interpretation, criticism, defense of opinion and decision.
f. The essay test is easily prepared.
g. It is inexpensive.
60. Disadvantages
a. The limited sampling of items makes the test an unreliable measure of achievements or
abilities.
b. Questions usually are not well prepared.
c. Scoring is highly subjective due to the influence of the corrector’s personal judgment.
d. Grading of the essay test is an inaccurate measure of pupils’ achievements due to
subjectivity of scoring.
61. STATISTICAL MEASURES OR TOOLS USED IN
INTERPRETING NUMERICAL DATA
Frequency Distributions
A simple, common sense technique for describing a set of
test scores is through the use of a frequency distribution. A
frequency distribution is merely a listing of the possible
score values and the number of persons who achieved each
score. Such an arrangement presents the scores in a more
simple and understandable manner than merely listing all
of the separate scores. Consider a specific set of scores to
clarify these ideas.
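As an illustration, the sketch below builds a frequency distribution from a hypothetical set of twelve quiz scores (the scores are made up for this example). It lists only the score values that actually occurred, from highest to lowest.

```python
from collections import Counter

# Hypothetical scores of twelve students on a 10-point quiz.
scores = [7, 9, 8, 10, 7, 6, 8, 8, 9, 7, 5, 8]

# Count how many students achieved each score value.
frequency = Counter(scores)

print("Score  Frequency")
for value in sorted(frequency, reverse=True):
    print(f"{value:>5}  {frequency[value]:>9}")
```

Twelve separate numbers collapse into six rows, and it is immediately visible that 8 was the most frequent score.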
62. MEASURES OF CENTRAL TENDENCY
Frequency distributions are helpful for indicating
the shape of a distribution of scores, but
we need more information than the shape to
describe a distribution adequately. We need to
know where on the scale of measurement the
distribution is located, which is what a measure
of central tendency tells us.
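The three most widely used measures of central tendency are the mean, the median, and the mode. A minimal sketch with Python's standard `statistics` module, using hypothetical quiz scores:

```python
from statistics import mean, median, mode

# Hypothetical scores of twelve students on a 10-point quiz.
scores = [7, 9, 8, 10, 7, 6, 8, 8, 9, 7, 5, 8]

average = mean(scores)       # arithmetic average of all scores
middle = median(scores)      # middle value when the scores are sorted
most_common = mode(scores)   # score value that occurs most often

print(round(average, 2), middle, most_common)
```

Here the three measures lie close together; in a strongly skewed distribution they can differ noticeably, which is one reason to report more than one of them.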
63. MEASURES OF DISPERSION
Measures of central tendency are useful for summarizing average performance, but they
tell us nothing about how the scores are distributed or “spread out” around the averages.
Two sets of test scores may have equal measures of central tendency, but they might differ
in other ways. One of the distributions may have the scores tightly clustered around the
average, and the other distribution may have scores that are widely separated. As you may
have anticipated, there are descriptive statistics that measure dispersion, which also are
called measures of variability. These measures indicate how spread out the scores tend to
be.
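The point can be illustrated with two hypothetical sets of scores that share the same mean but differ in spread. The range and the standard deviation are used here as two common measures of dispersion.

```python
from statistics import mean, pstdev

# Two hypothetical sets of scores with the same average.
clustered = [48, 49, 50, 51, 52]
spread_out = [30, 40, 50, 60, 70]

for label, scores in (("clustered", clustered), ("spread out", spread_out)):
    score_range = max(scores) - min(scores)   # highest minus lowest score
    std_dev = pstdev(scores)                  # typical distance from the mean
    print(label, mean(scores), score_range, round(std_dev, 2))
```

Both sets average 50, yet the second has a range and standard deviation ten times as large, a difference that no measure of central tendency would reveal.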
64. Graphing Distributions
A graph of a distribution of test scores is often better understood than is the frequency
distribution or a mere table of numbers. The general pattern of scores, as well as any
unique characteristics of the distribution, can be seen easily in simple graphs. There are
several kinds of graphs that can be used, but a simple bar graph, or histogram, is as useful
as any.
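A histogram does not need special software. The sketch below prints a simple text histogram, one row per possible score between the lowest and highest observed values; the function name `text_histogram` and the scores are hypothetical.

```python
from collections import Counter


def text_histogram(scores):
    """Return a text histogram with one '#' per student at each score value."""
    frequency = Counter(scores)
    lines = []
    for value in range(min(scores), max(scores) + 1):
        # Counter returns 0 for score values nobody achieved.
        lines.append(f"{value:>3} | {'#' * frequency[value]}")
    return "\n".join(lines)


# Hypothetical scores of twelve students on a 10-point quiz.
print(text_histogram([7, 9, 8, 10, 7, 6, 8, 8, 9, 7, 5, 8]))
```

The length of each bar shows at a glance where the scores pile up and whether any score values are unusually rare.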