Assessment and Evaluation in Learning
(PGDT 423)
Course Instructor: Mulugeta S.
1.1 Meanings of Basic Terms in Assessment & Evaluation of
Learning
- How do you define measurement, evaluation and assessment?
- Do you think that these terms can be used interchangeably? If not, describe the differences among measurement, evaluation and assessment.
A) Educational measurement
- A systematic description of a pupil's performance in terms of numbers.
- The process of assigning numbers to represent (quantify) an individual's performance is called measurement.
Therefore, evaluation is a much more comprehensive and
inclusive term than measurement, which is limited to quantitative
description of pupil's performance.
E.g.:- Haritu correctly solved 15 of the 20 quadratic equations.
Measurement includes neither qualitative descriptions of pupil's
performance nor value judgments concerning the worth or value
of the obtained results.
E.g.:- 1. Tonja's work was neat.
2. Madebo is making good progress in mathematics.
These two examples are not measurement; they belong only to evaluation.
B) Evaluation
- Evaluation refers to a systematic process of determining the
extent to which instructional objectives are achieved by pupils.
- Evaluation may include quantitative descriptions of a pupil's performance, qualitative descriptions, or both. Moreover, it always involves value judgments concerning the desirability of the results. The following diagram shows that evaluation may or may not be based on measurement and that, when it is, it goes beyond simple quantitative description.
Evaluation = Quantitative descriptions (measurement)
and/or
Qualitative descriptions (non-measurement)
Plus
Value judgments
- Educational measurement involves using measuring devices.
- What measuring devices are commonly used in the teaching-learning process?
Test
- Involves a series of questions of different item types. It is given formally while a course is in progress.
- Its purpose is to assess learning progress and identify any learning difficulties.
Quiz
- A short and informal test.
- It is given during a class period, either at the beginning or at the end.
Examination
- Covers a large area of content.
- It is given at the end of a course or semester.
- Its main purpose is to assign grades.
- It includes a large number of items.
C) Assessment
- The process of collecting, interpreting and synthesizing
information to aid in decision making. For many people,
classroom assessment means using paper and pencil tests to grade
pupils. However, it is more than testing.
-Includes information gathering on pupils, instruction
and classroom climate by teachers.
- Interpreting and synthesizing this information to help
teachers understand their pupils, plan and monitor
instruction and establish a conducive classroom
atmosphere.
Some examples of assessment are:-
- portfolios, which are planned collections of student work;
- performance assessments, in which student performance on a complex task is observed;
- a series of related but informal observations used to determine complex attributes of one student or a group of students.
Measurement and evaluation are required in education whenever there is a need to make:
- Instructional decisions: to evaluate and change or modify our teaching methods,
- Curricular decisions,
- Selection decisions, and
- Placement or classification decisions based on the present or anticipated educational status of the child.
The ultimate goal of measurement and evaluation is to facilitate learning.
Other functions
A) Diagnostic function
Diagnosis deals with identifying learning difficulties. It concerns itself with gaps in students' knowledge, understanding or skills, and it puts emphasis on the causes of deficiencies. The individual's educational weaknesses are pinpointed in order to:-
- plan remedial work for the pupil,
- revise teaching strategies, and
- revamp or rearrange some elements of the curriculum with a view to remedying the manifested deficiencies.
A diagnostic test is in most cases administered when pupils do not respond to the feedback and corrective prescriptions of formative testing.
B) Placement function
Placement refers to the classification of pupils in classes or
sections of a class according to their demonstrated knowledge or
ability in some subject area(s). Placement concerns itself with the
present status of a pupil or an individual. In other words,
placement is concerned with pupils’ entry performance.
Placement tests are pre-tests designed to determine:-
- whether pupils possess the knowledge and skills needed to begin a course;
- to what extent pupils have already achieved the objectives of the planned instruction;
- possession of prerequisite skills, degree of mastery of course objectives and/or the best mode of learning.
C) Predictive function
A test is used for predictive purposes when its results are used to select some pupils from the population of pupils who took it, on the assumption that those selected will benefit from further instruction or perform well on similar educational tasks in the future. For such a test to be worth more than the piece of paper on which it is written, it must have a high predictive validity index.
D) Guidance and Counseling function
Results of evaluation are especially useful in assisting pupils with
educational and vocational decisions. They help pupils to solve
personal and social adjustment problems which require objective
knowledge of pupils' abilities, interests, attitudes and other
personal characteristics.
E) Feedback provision
Evaluation results provide feedback (knowledge of results) to both the teacher and pupils concerning learning success and failure.
F) For reporting of pupil progress to parents
Evaluation results enable the teacher to report an objective and comprehensive picture of each pupil's progress to parents. This kind of overall objective information about pupils provides the foundation for the most effective cooperation between parents and teachers.
One of the distinctive features of the evaluation process is the use of a wide variety of approaches. These may be classified and described in many different ways, depending on the frame of reference used. Some of the approaches that can be used in the process of evaluation are:
- Summative versus Formative Evaluation
- Criterion-referenced versus Norm-referenced Evaluation
- Achievement Tests versus Aptitude Tests
A) Summative Evaluation:- is conducted at the end of an instructional
segment to determine if learning is sufficiently completed to warrant
moving the learner to the next segment of instruction. It is designed to
determine the extent to which instructional objectives have been achieved.
Summative evaluation follows instruction and typically involves unit tests, mid-term and final exams, projects at the end of a unit or other assignments.
Summative evaluation is used primarily for:-
- assigning course grades or marks,
- licensing and certifying pupil mastery of the intended learning outcomes,
- reporting a student's overall achievement,
- appraising a starting point for the following grade or course,
- predicting success in related courses,
- reporting the overall achievement of a class.
In addition, it provides information for judging the appropriateness of
the course objectives and the effectiveness of instruction.
B) Formative Evaluation:- is conducted to monitor the instructional process and to determine whether learning is taking place as planned. It occurs during instruction. The major function of formative evaluation in the classroom is to provide continuous feedback to both students and teacher concerning learning success and failure, that is, how things are going in the instructional process. Such feedback provides an opportunity for:-
- the teacher to modify instructional methods or materials and to prescribe group and individual remedial work that facilitates effective learning;
- students to obtain reinforcement of successful learning and to identify the specific learning errors that need correction.
Formative evaluation requires gathering fairly detailed information on frequent occasions. Information is obtained through:
- teacher observation,
- classroom oral questioning,
- homework assignments,
- classroom assignments, and
- quizzes or informal inventories.
The terms criterion-referenced and norm-referenced
evaluation pertain to interpreting the performance of students
on tests and other evaluation instruments.
A) Criterion-referenced Evaluation:- is a way of interpreting a test score that compares an individual's performance to an established standard of performance. The success or failure of an individual in criterion-referenced evaluation depends on what he/she scores on a test in relation to the set standard.
The criterion-referenced interpretations enable us to describe
what an individual can do without reference to the
performance of others.
B) Norm-referenced Evaluation:- is a way of interpreting a test score that compares a student's performance to the performance of the class or to some external average performance, such as local, state or national averages. The success or failure of an individual in norm-referenced evaluation is determined by how he/she performs in relation to his/her peers' performance on the test.
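The contrast between the two interpretations can be sketched in Python. This is a minimal illustration, not part of the course material: the class scores, the cutoff of 12 out of 20, and the function names `criterion_referenced` and `percentile_rank` are all assumptions made for the sketch.

```python
def criterion_referenced(score, cutoff):
    """Criterion-referenced: compare the score with a preset standard."""
    return score >= cutoff


def percentile_rank(score, group_scores):
    """Norm-referenced: percentage of the group scoring strictly below `score`."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical class scores on a 20-item test (illustrative data only).
class_scores = [8, 10, 11, 12, 13, 14, 15, 16, 17, 19]
student_score = 15
cutoff = 12  # assumed mastery standard: 12 of 20 items correct

mastered = criterion_referenced(student_score, cutoff)
rank = percentile_rank(student_score, class_scores)
print(f"Meets the standard: {mastered}")
print(f"Scored better than {rank:.0f}% of the class")
```

The same score of 15 is read in two ways: against the fixed standard, without reference to anyone else, and against the performance of the rest of the class.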
- Which of the following are examples of criterion-referenced evaluation (CRE), and which of norm-referenced evaluation (NRE)?
1. Hundito computes simple linear equations.
2. Abebe computes simple linear equations better than 75% of the students in the class.
3. Shallamo can spell words better than half of his classmates in the
language class of elementary school.
4. Huluagerish can convert temperature from the Celsius to the
Fahrenheit scale.
A) Achievement Tests:- are tests that contain a representative sample of the course content and are designed to measure the extent of students' present knowledge. They are constructed either by a classroom teacher or by professional test makers. On this basis, there are two types of achievement tests: classroom (teacher-made) achievement tests and standardized achievement tests.
i) Classroom achievement tests:- are designed by teachers to measure the degree to which students have acquired information or skills taught in a school or formal setting. They indicate the degree of students' success in past learning. Classroom (teacher-made) achievement tests:-
- usually cover a single unit of work or the work of a term;
- are constructed by the respective teachers individually;
- do not contain norms of any type;
- assess specific objectives more satisfactorily than standardized achievement tests.
ii) Standardized achievement tests:- are designed by test specialists to assess students' learning over one year or more. In standardized achievement tests, a systematic sample of performance is obtained under prescribed directions for administration. Standardized achievement tests:-
- usually cover a wide range of material (content);
- are constructed by test specialists in cooperation with subject matter experts, curriculum specialists and statisticians;
- contain norms of one type or another;
- measure generally accepted goals rather than specific or particular instructional objectives.
B) Aptitude Tests
- Aptitude tests are used to assess learning potential and to measure general academic principles that would be found in a typical curriculum.
They are not designed to measure the content of any specific
curriculum.
- Aptitude tests are primarily designed to predict individuals’
success in some future jobs or learning activities. The scores
derived from aptitude tests are usually used in selecting people for
jobs, learning and the like with the hope that those so selected will
perform well on their future jobs or benefit from future learning.
In planning a classroom achievement test, a series of basic steps should be followed. These include:
- determining the purpose of testing;
- developing the test specification;
- selecting the appropriate item types; and
- preparing relevant test items.
Mental activity, such as how well a student “understands why”, is not directly observable in the way a student's physical height is directly observable. Clever techniques are needed for getting students to display what “goes on in their heads”. To this end, educationalists and psychologists developed techniques for stating instructional objectives at different levels and classifying these objectives into different categories.
Instructional Objectives- Expected behavioral outcomes after
the completion of a particular educational outcome.
- Are the terms goals, aims and objectives the same?
- How have you used them so far?
- Do you use them interchangeably or differently?
Goals, Aims and Objectives
The terms goals, aims and objectives are often used interchangeably by educators. Some educators, however, make a distinction between them. For example, Rowntree (1974) admits the existing terminological confusion about these intentions and deliberately restricts his explanation to the distinction between aims and objectives:
…aims represent the most abstract level among statements of educational intent ... they provide the ethical standard against which objectives are justified . . . if we are to know what to strive for in the behavior or observable activity of students, aims must be translated into objectives.
Similarly, Hough and Duncan (1970) confine themselves to goals and objectives:
Goals are the broadest, most general statements of educational ends . . . teachers are expected to implement these decisions in their teaching as they translate goals into objectives.
These definitions give a general indication of the similarity of goals to aims and of their difference from objectives. To avoid terminological confusion in this area, the basic difference is that the words “goals” and “aims” are used for purposes at a broad and abstract level, while the term “objectives” is used in more specific ways at the operational level.
Bloom (1956), who has written extensively about the Taxonomy of Educational Objectives, defines objectives as follows:
Objectives are explicit formulations of the ways in which students are expected to be changed by the educative process. That is, the ways in which they will change in their thinking, their feelings and their actions.
According to Bloom (1956), sources of objectives are:-
- information about students' interests, attitudes and needs,
- activities they are expected to perform and occupations they are likely to have, and
- the nature of the subject matter and the type of learning.
- Setting instructional objectives that tell what types of performance students are expected to demonstrate helps the teacher to choose appropriate and effective teaching methods and materials.
- Instructional objectives also communicate the intent of instruction to students, parents, school leaders and the public.
- Thus, they:-
  - have a motivating effect on the lesson and on students' study habits;
  - describe the intended performance to be measured at the end, setting a basis for testing and evaluation.
- More generally, instructional objectives provide educators with insights for coping with the educational needs of contemporary society.
Instructional objectives can be classified as general objectives and specific objectives. According to Gronlund (1981), general objectives are intended outcomes of instruction that have been stated in terms general enough to encompass a set of specific learning outcomes. General objectives are:-
- found in the syllabus, annual plan and resources of specific learning outcomes;
- not as measurable and observable as specific objectives.
Properly defined instructional objectives serve three major functions. These are:-
- providing direction for the instructional process;
- conveying the instructional intent to others; and
- providing a basis for evaluating students' learning.
According to Gronlund (1981), specific objectives are intended (expected) outcomes of instruction that have been stated in terms of specific, measurable and observable student performance (SMART). A set of specific learning outcomes describes a sample of the types of performance learners will be able to exhibit when they have achieved a general instructional objective. Other names for specific instructional objectives are:-
- performance objectives
- behavioral objectives
- measurable objectives
An instructional objective should consist of two important elements: the content (subject matter) and the behavioral construct (the manner in which students are expected to deal with or behave toward the content).
In writing specific instructional objectives, action verbs are key elements. The selection of these verbs plays an important role in obtaining a clearly defined set of specific instructional objectives. The verbs are important to:-
- clearly convey our instructional intent;
- precisely specify the student performance we are willing to accept as evidence that a general objective has been attained; and
- select appropriate evaluation techniques.
To sum up, when one writes specific instructional objectives for the purpose of daily teaching, the objectives should be “SMART”:
S: Specific
M: Measurable
A: Attainable
R: Realistic
T: Time-bound
Guidelines for Stating General Instructional Objectives and Specific Learning Outcomes
1. General instructional objectives: State each general instructional objective as an intended learning outcome or terminal performance of a student.
   Specific learning outcomes: State each specific learning outcome as a sample of the outcomes describing the terminal behavior performed by the student.
2. General instructional objectives: Begin each general instructional objective with a verb like knows, understands, develops, appreciates, interprets, etc.
   Specific learning outcomes: Begin each specific learning outcome with an action verb specifying the observable performance (e.g. writes, identifies, computes).
3. General instructional objectives: Include only one general learning outcome in a statement.
   Specific learning outcomes: Include only one specific learning outcome in a statement and make sure it is relevant to the general instructional objective it describes.
4. General instructional objectives: State each general instructional objective at a proper level of generality.
   Specific learning outcomes: Include enough specific learning outcomes to describe the terminal performance included in the general instructional objective.
5. General instructional objectives: Keep each general instructional objective sufficiently free of course content so that it can be used with various topics of study.
   Specific learning outcomes: Keep all specific learning outcomes sufficiently free of course content for the same reason.
6. General instructional objectives: Avoid or minimize overlap with other general instructional objectives.
   Specific learning outcomes: Avoid or minimize overlap with other specific learning outcomes in the set.
- The taxonomy of educational objectives provides a three-domain scheme for classifying all possible instructional objectives.
- According to Bloom (1956), Krathwohl (1964) and Simpson (1972), instructional objectives are classified into three categories: cognitive, affective and psychomotor domain objectives.
- Cognitive domain objectives are concerned with knowledge outcomes, intellectual abilities and skills.
- Affective domain objectives are about attitudes, interests, appreciations and modes of adjustment.
- Psychomotor domain objectives deal with perceptual and motor skills (Gronlund, 1985; Gronlund and Linn, 1990).
- Each domain is subdivided into a series of categories that are arranged in hierarchical order from simple to complex.
1. Cognitive Domain
The cognitive domain ranges from simple remembering of learned material to highly creative ways of combining and synthesizing new ideas and materials. It is classified into six subcategories: knowledge, comprehension, application, analysis, synthesis and evaluation.
2. Affective Domain
The affective domain involves feelings, emotions and degrees of acceptance or rejection of the material to be learned. It ranges from simple attention to complex qualities of character and conscience. Affective outcomes are expressed as interests, attitudes, appreciations, values and emotional sets or biases. The affective domain has five subcategories: receiving, responding, valuing, organization and characterization by a value or value complex. Affective objectives are concerned with attitudes. To affect preferences, opinions and desires is to teach values.
3. Psychomotor Domain
The psychomotor domain refers to muscular or motor skills, manipulation of materials and objects, or acts that require neuromuscular coordination. This domain has been classified into five subcategories: imitation, manipulation, precision, articulation and naturalization.
Note
1. The taxonomy of educational objectives introduced above is useful in identifying the types of learning outcomes that should be considered as we develop a list of objectives for courses, lessons or units.
2. Unlike cognitive and psychomotor objectives, affective objectives are not concerned with students' abilities to do anything.
____1. The student is able to evaluate accurately the best of
two solutions to a geometry problem using standards given
by the teacher.
____2. The student responds with tolerance for others by
displaying good manners toward those of minority groups.
____3. The student identifies the names and contributions of
the five key curriculum workers as described in class.
____4. The student properly knits a baby blanket, with a
frequency of ten strands per minute.
___5. The student is willing to respond to the
questions on the course evaluation inventory.
____6. The student applies instructional principles properly in
planning daily lessons.
____7. The student plays table tennis for one-hour duration,
beating three inexperienced girls 100 percent of the
time.
____8. The student comprehends the Ginbot 20 overthrow of the Derg regime.
____9. The student distinguishes 30 percent of the words on a
spelling quiz.
____10. The student displays a value for higher
mathematics by attending lectures on his subject.
A table of specification:
- is used to outline the content coverage of a test;
- is a two-way grid;
- is also called a test blueprint;
- relates instructional objectives to course content;
- specifies the relative emphasis to be given to each type of learning outcome;
- consists of a two-dimensional chart:
  Vertical dimension: lists the content areas
  Horizontal dimension: lists the categories of performance
Constructing a “Table of Specification” requires:-
- obtaining a list of instructional objectives;
- outlining the course/subject content; and
- preparing a two-way grid (chart).
Table of Specification
                    Objectives
Content             Knowledge   Comprehension   Application   ... Evaluation   Total
Surface area        3           3               0                              6 (12%)
Rectangle           2           2               6                              10 (20%)
Right-triangle      2           2               6                              10 (20%)
Rectangular prism   2           2               6                              10 (20%)
Cylinder            2           2               6                              10 (20%)
Real life problem   0           0               4                              4 (8%)
Total               11 (22%)    11 (22%)        28 (56%)                       50
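The totals and percentages in the table can be checked with a short Python sketch. The item counts are taken from the table; the variable names and the `percent` helper are illustrative assumptions, and only the three objective columns that carry counts are included.

```python
# Item counts per content area for Knowledge, Comprehension, Application.
levels = ["Knowledge", "Comprehension", "Application"]
blueprint = {
    "Surface area": [3, 3, 0],
    "Rectangle": [2, 2, 6],
    "Right-triangle": [2, 2, 6],
    "Rectangular prism": [2, 2, 6],
    "Cylinder": [2, 2, 6],
    "Real life problem": [0, 0, 4],
}

# Row totals give the emphasis on each content area,
# column totals give the emphasis on each type of learning outcome.
row_totals = {area: sum(counts) for area, counts in blueprint.items()}
col_totals = [sum(column) for column in zip(*blueprint.values())]
grand_total = sum(row_totals.values())


def percent(part, whole):
    """Share of the whole test, rounded to the nearest percent."""
    return round(100 * part / whole)


for area, total in row_totals.items():
    print(f"{area}: {total} items ({percent(total, grand_total)}%)")
for level, total in zip(levels, col_totals):
    print(f"{level}: {total} items ({percent(total, grand_total)}%)")
print(f"Total items: {grand_total}")
```

Working from the grid in this way makes it easy to see whether the planned emphasis (for example, 56% of items at the application level) matches the instructional objectives before any items are written.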
The three most important attributes of a good measuring instrument are:
Validity
Reliability
Usability
Validity is:
- the fidelity or accuracy with which a test measures what it purports to measure;
- the most central and essential quality in the development, interpretation and use of educational measures.
A test constructed to measure some characteristic is valid to the degree that it accurately and consistently measures that characteristic.
Some of the measurement errors and other conditions that affect
the validity of test results are:-
poorly constructed test items
poor sampling of content
inappropriate level of difficulty
ambiguity
reading difficulty of vocabulary and sentence construction
unclear direction
cheating
guessing, etc
1. Content Validity:- is concerned with how adequately the content of the test samples the domain of subject matter.
- It is determined by a thorough inspection of the items.
- It requires a balanced sample of the total number of possible items, based on the content of the subject matter and the behavioral outcomes to be measured.
- To assure high content validity, teachers should take both of these factors (subject matter content and behavioral outcomes) into account when selecting items.
2. Criterion-related Validity:- pertains to the technique of
studying the relationship between the test scores and independent
external measures.
2.1. Predictive Validity:- compares test scores with another
measure of performance obtained at a later time for prediction.
2.2. Concurrent Validity:- compares test scores with
another measure of performance obtained
concurrently for estimating present status.
3. Construct Validity:- is the degree to which the test scores can be accounted for by certain explanatory constructs in a psychological theory (e.g. intelligence, achievement motivation, anxiety, attitude, interest).
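For criterion-related validity, the relationship between test scores and the external measure is commonly summarized by a correlation coefficient. The sketch below is a minimal illustration under stated assumptions: the scores are invented, and the names `pearson_r`, `entrance_scores` and `later_grades` are hypothetical. Because the criterion is collected at a later time, the example corresponds to predictive validity; with a criterion collected at the same time, the same calculation would illustrate concurrent validity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: entrance test scores and the same five students'
# grades obtained at a later time (the criterion measure).
entrance_scores = [55, 60, 65, 70, 80]
later_grades = [58, 62, 61, 75, 84]

validity_coefficient = pearson_r(entrance_scores, later_grades)
print(f"Validity coefficient: {validity_coefficient:.2f}")
```

A coefficient close to 1 would suggest that the test ranks students much as the criterion does; a coefficient near 0 would suggest that the test says little about the criterion.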
Reliability:
- is the degree to which a test or other observation measures something consistently;
- is a prerequisite for validity;
- is a necessary but not sufficient condition for quality in an educational achievement test.
Factors that diminish the reliability of educational measures include inconsistencies between earlier and later measures, between test items, between alternative skills in the same content domain, and between raters, as well as lack of internal consistency within one test, ambiguous test items, guessing and emotional factors.
1. Scorer Reliability
- Inter-scorer reliability concerns the degree of agreement between two markers or scorers of the same test answer.
- Intra-scorer reliability deals with the degree of consistency with which the same marker grades the same test answer on two different occasions.
The scorer reliability of an objective test is perfect if there are no mistakes in the scoring.
2. Content Reliability:- deals with the ability of all the items of a test to measure competencies in the same general content area.
- It refers to the internal consistency of the items of a test.
3. Temporal Reliability:- concerns the stability of the results of a test over time.
1. Test-retest method:- involves a single test that is administered twice to the same sample of subjects, with an interval of about two weeks.
- It measures the stability of the results of a test.
2. Alternate forms method:- involves administering two forms/versions of a test to the same students.
- It measures the equivalence of the results of a test.
3. Split-half method:- is used for measuring the internal consistency of the items of a test.
4. Kuder-Richardson method:- is a measure of internal consistency.
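The two internal-consistency methods can be sketched in Python. This is a hedged illustration rather than a prescribed procedure: the 0/1 item scores are invented, the function names are hypothetical, the split is an odd-even split stepped up with the Spearman-Brown formula, and the Kuder-Richardson estimate uses formula 20 with the population variance of total scores (an assumption; some texts use the sample variance). For the test-retest and alternate forms methods, the same `pearson_r` function would simply be applied to the two sets of total scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half(item_scores):
    """Correlate odd- and even-item half scores, then apply Spearman-Brown."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 0 (wrong) or 1 (right)."""
    n = len(item_scores)        # number of students
    k = len(item_scores[0])     # number of items
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    # Population variance of total scores (assumed convention).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical results: 5 students (rows) on a 4-item test (columns).
item_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

split_half_estimate = split_half(item_scores)
kr20_estimate = kr20(item_scores)
print(round(split_half_estimate, 2), round(kr20_estimate, 2))
```

With these invented scores the split-half estimate is 0.88 and the KR-20 estimate is 0.80. Real classroom data would need many more students and items for the coefficients to be meaningful.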
1. Test length:- The longer a test is, the more reliable it is.
2. Sample heterogeneity:- The more heterogeneous a test sample is, the higher the reliability coefficient of the test.
3. Test timing:- The shorter the interval between two testing sessions, the higher the reliability coefficient between the two sets of scores.
4. Irregularities:-
- failure by examinees to follow test instructions;
- physical conditions: lighting, seating arrangement and poor ventilation of the examination room;
- personal conditions and behavior of the examinees: illness, lack of motivation and cheating during an examination.
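The effect of test length (factor 1 above) is often quantified with the Spearman-Brown prophecy formula. The formula is not named in these notes and is introduced here only as an illustration; it assumes that any added items are comparable in content and quality to the existing ones. The function name and the starting reliability of 0.60 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by `length_factor`.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


current_reliability = 0.60  # hypothetical reliability of the existing test

doubled = spearman_brown(current_reliability, 2)    # test made twice as long
halved = spearman_brown(current_reliability, 0.5)   # test cut to half its length
print(f"Doubled length: {doubled:.2f}")
print(f"Halved length:  {halved:.2f}")
```

Under these assumptions, doubling the test raises the predicted reliability from 0.60 to 0.75, while halving it lowers the prediction to about 0.43.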
Usability deals with all the practical considerations that go into the decision to use one particular test rather than another:
- availability of the test,
- relevance of the test,
- ease of administering the test,
- ease of scoring, and
- interpretability of the scores.
The construction of good test items requires:
- a thorough grasp of the subject matter,
- a clear conception of the desired learning outcomes,
- a psychological understanding of pupils,
- sound judgment,
- persistence,
- a touch of creativity, and
- the skillful application of an array of simple but important rules and suggestions.
- Written tests:- include both objective and subjective tests.
- Performance tests
- Oral tests
Teacher-made written tests are divided into two:
1. The objective test: is highly structured and leaves no room for either intra-scorer or inter-scorer variability in scoring.
2. The subjective/essay test: permits pupils to select, organize and present the answer in essay form and leaves room for both intra-scorer and inter-scorer variability in scoring.
Objective test items can be classified into those that require the student to:-
- supply the answer, and
- select the answer from a given number of alternatives.
Supply Types:- 1. Short-answer
2. Completion
Selection Types:- 1. True-false
2. Matching
3. Multiple-choice
Suggestions for constructing short-answer and completion items:
1. Word the item so that the required answer is both brief and definite.
2. Don't take statements directly from the textbook to use as a base for a completion or short-answer item.
3. A direct question is more desirable and natural to the student than an incomplete statement.
4. When the answer is to be expressed in numerical units, indicate the type of answer wanted.
5. Blanks for answers should be equal in length and placed in a column to the right of the question.
6. When completion items are used, do not use too many blanks.
Advantages of supply test items:-
- Easy to construct.
- Minimize the opportunity for guessing.
- Ensure content validity.
Limitations of supply test items:-
- Are restricted to measuring recall of simple learning outcomes.
- Are more likely to be scored erroneously than objectively scored formats.
- Require more time to score.
- Fewer computerized procedures have been developed for processing responses to short-answer items.
1. Alternative response/True-false Items
In writing true-false items, the test constructor should bear the following “Don'ts” in mind.
1. Don't use statements that cannot be unequivocally classified as true or false.
2. Don't use all-inclusive words such as “all, always, never, none, no” etc. in a true-false test item.
3. Don't use indefinite terms such as “greatly, usually, frequently, sometimes” etc.
4. Don't use double negatives in framing a true-false test item.
5. Don't include more than one idea per item.
6. Don't allow a distinctive difference in length between a true statement and a false one.
The number of true statements and false statements should be approximately equal (each 40-60% of the items).
Advantages of true-false items
Easy to construct.
Cover a large area of content.
Scoring is objective.
Marking is quick and easy.
Limitations of true-false items
Measure only simple learning outcomes.
Highly susceptible to guessing.
Can be used only when dichotomous answers represent sufficient response options.
Guidelines for constructing matching type of items:
1. Provide clear informative directions.
2. Within a single matching exercise, make the premises and responses homogeneous.
3. Avoid one-to-one correspondence (perfect matching) between premises and responses.
4. Place the longer phrases in the premise list and the shorter phrases in the response list.
5. Use numbers to identify the premises and letters to identify the responses.
6. Keep all the premises and responses belonging to a single matching exercise on the same page.
Direction: Match the measures of angles in Column “A” with the
names under column “B” and write the letter of the
correct answer on the space provided. A response may be
used more than once or not at all.
Advantages of Matching Items
Measure a large quantity of associated factual
materials
Scoring is objective and easy.
Easy to construct
Limitations of Matching Items
Measure only simple learning outcomes.
Time consuming for the student.
Difficult to find homogeneous material.
Highly susceptible to the presence of irrelevant clues.
Guidelines for constructing multiple-choice test items:
1. The stem of the item should be meaningful by itself and should present a definite problem.
2. The item stem should be clear and free from irrelevant material.
3. Use a negatively stated item stem only when significant learning outcomes require it.
4. All of the alternatives should be grammatically consistent with the stem of the item.
5. An item should contain only one correct or clearly best answer.
57
6. Items used to measure understanding should contain
some novelty, but not too much.
7. All distracters should be plausible.
8. Verbal association between the stem and the correct
answer should be avoided.
9. The alternatives should be approximately equal in
length. A longer alternative tends to be the correct
answer, so length may provide a clue.
10. The correct answer should appear in each of the
alternative positions approximately an equal number of
times.
11. Use special alternatives such as “none of the above”
or “all of the above” sparingly.
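Guideline 10 can be verified by counting how often each position holds the keyed answer. This is a small sketch under assumptions of our own: the function name `key_position_counts`, four alternatives labelled A to D, and the twelve-item key are hypothetical.

```python
from collections import Counter


def key_position_counts(answer_key, positions="ABCD"):
    """Count how many times each alternative position is the keyed answer."""
    counts = Counter(answer_key)
    return {position: counts.get(position, 0) for position in positions}


# Hypothetical key for a 12-item multiple-choice test.
key = ["A", "C", "B", "D", "B", "A", "D", "C", "A", "B", "C", "D"]
print(key_position_counts(key))  # {'A': 3, 'B': 3, 'C': 3, 'D': 3}
```

Counts that differ widely across positions suggest reordering the alternatives in some items.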
58
Advantages of multiple-choice items:
Measure a variety of simple and complex learning outcomes.
Reduce the opportunity for guessing the correct answer.
Cover a wide range of contents.
Scoring is simple and objective.
Limitations of multiple-choice items
Difficult to construct.
Inappropriate for measuring learning outcomes requiring the
ability to organize or present ideas.
59
A) Restricted Response Type: places limitation on the
nature, length or organization of the response
 minimizes some of the weaknesses of the
extended response type
B) Extended Response Type: permits the pupil
to:-
decide which facts they think are most pertinent,
select their own method of organization, and
write as much as seems necessary to provide a
comprehensive answer.
60
Guidelines for constructing subjective/essay test items:-
1. Restrict the use of essay items to those learning
outcomes which cannot be measured by objective items.
2. Formulate questions which will call forth the behavior
specified in the learning outcomes.
3. Phrase each question so that the student's task is clearly
indicated.
4. Indicate an appropriate time limit for each question (use
questions and generous time).
5. Avoid the use of optional items.
61
Advantages of Essay Items
Measure complex learning outcomes which
cannot be measured by other items.
Have a desirable influence on the student’s
study habits.
Easy to construct.
Limitations of Essay Items
Scoring is unreliable.
Scoring is time consuming.
Sampling of content is limited.
62
Arrangement of items can be determined based on:
The type of items used,
The learning outcomes measured,
The difficulty of the items, and
The subject matter measured.
Suggested Arrangement
1. True – false/alternative response items
2. Matching items
3. Short – answer items
4. Multiple – choice items
5. Interpretive exercises
6. Essay items
63
3.1. Administering the Classroom Achievement
Test
Guiding principle:- All students must be given a fair
chance to demonstrate their achievement of the
learning outcomes being measured.
Desirable physical conditions:- adequate work
space, proper lighting, ventilation, and comfortable
temperature must be assured by the teacher
before administering the test.
64
Some of the things that create excessive test anxiety
are:-
Threatening pupils with tests, if they do not behave,
Warning pupils to do their best “because this test is
important,”
Telling pupils they must work fast to complete the test on
time, and
Threatening dire consequences if they fail the test.
Teachers are advised to keep the following basic guidelines in
mind while administering a classroom test.
Do not talk unnecessarily before the test.
Keep interruptions during the test to a minimum.
Avoid giving hints to pupils who ask about individual items.
Discourage cheating as much as possible.
65
Suggestions for scoring subjective test items:-
1. Prepare an outline of the expected answer in advance. The
outline should contain the major points to be included, the
characteristics of the answer to be evaluated, and the amount
of credit to be allotted to each.
2. Use the scoring method which is most appropriate:
either the point (analytic) method or the rating (relative) method.
3. Decide on provisions for handling factors which are
irrelevant to the learning outcomes being measured.
4. Evaluate all answers to one question before going on
to the next question.
5. Evaluate the answers without looking at the pupil’s name.
6. If especially important decisions are to be based on the
results, obtain two or more independent ratings.
66
 Item analysis is the process of studying the
characteristics of test items based on data
obtained from examinees.
 Item analysis indicates which items are difficult or
easy, which items effectively discriminate
between high and low achievers, and whether each
item functions as intended.
 Item analysis data provides a basis for:-
efficient class discussion of the test result,
remedial work,
the general improvement of classroom instruction,
and
increased skill in test construction.
67
 The term item difficulty indicates the extent to
which an item is difficult for respondents.
 The difficulty index ranges from 0% to 100%.
 Because the index is the percentage of respondents who answer
the item correctly, items with higher indices are easy items
and those with lower indices are difficult items.
 For most classroom tests, it is desirable that all of the
items cluster around 50% difficulty with none of them
extremely difficult.
 The distribution of indices of difficulty with a range of 25
– 75% is also supported by many test specialists.
 Easy items are needed to discriminate among the
poorer students while difficult items are needed to
discriminate among the better ones.
68
An item difficulty level is determined by:
$$P = \frac{CR_U + CR_L}{N_U + N_L} \times 100$$
where, $P$ = difficulty index
$CR_U$ = number of correct responses
from the upper group, which is 27% of
the total respondents
$CR_L$ = number of correct responses
from the lower group (27% of the total
respondents)
$N_U$ = number of respondents in the upper
group
$N_L$ = number of respondents in the lower
group
69
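The definitions above rely on an upper and a lower group, each 27% of the respondents. One way to form the two groups is sketched below. It rests on assumptions the slides do not state: examinees are ranked by total test score, 27% of the class is rounded to the nearest whole number, ties keep their input order, and the name `split_upper_lower` and the ten scores are hypothetical.

```python
def split_upper_lower(scores, fraction=0.27):
    """Return (upper, lower) lists of examinee ids ranked by total score.

    scores maps an examinee id to that examinee's total test score.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest first
    size = max(1, round(len(ranked) * fraction))           # group size
    return ranked[:size], ranked[-size:]


# Hypothetical total scores for ten examinees.
scores = {"S01": 18, "S02": 12, "S03": 15, "S04": 9, "S05": 20,
          "S06": 7, "S07": 14, "S08": 11, "S09": 16, "S10": 5}
upper, lower = split_upper_lower(scores)
print(upper)  # ['S05', 'S01', 'S09']
print(lower)  # ['S04', 'S06', 'S10']
```

With 100 examinees the same call yields two groups of 27, which matches the group sizes used in the worked examples.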
Table: Difficulty level of the item
Difficulty level     Evaluation
0% - 45%             Very Difficult
45% - 55%            Medium Difficult
55% - 100%           Easy
70
Options Upper 27% Lower 27% Total
A 1 4 5
B 23 17 40
C 1 3 4
D 2 3 5
Omit 0 0 0
Total 27 27 54
From the above data,
$CR_U = 23$, $N_U = 27$
$CR_L = 17$, $N_L = 27$
Then,
$$P = \frac{23 + 17}{27 + 27} \times 100 = \frac{40}{54} \times 100 \approx 0.74 \times 100 = 74\%$$
Comment: The item is somewhat easy and appropriate for a
classroom achievement test.
71
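The calculation above can be reproduced with a few lines of code. This is an illustrative sketch only: the function names are hypothetical, and because the bands in the difficulty-level table share their endpoints, the sketch assumes that exactly 45% and 55% count as "Medium Difficult".

```python
def difficulty_index(cr_u, cr_l, n_u, n_l):
    """P = (CR_U + CR_L) / (N_U + N_L) * 100, as a percentage."""
    if n_u + n_l <= 0:
        raise ValueError("group sizes must be positive")
    return 100 * (cr_u + cr_l) / (n_u + n_l)


def difficulty_level(p):
    """Label P with the bands of the difficulty-level table.

    Assumption: the shared endpoints 45 and 55 fall in the middle band.
    """
    if p < 45:
        return "Very Difficult"
    if p <= 55:
        return "Medium Difficult"
    return "Easy"


p = difficulty_index(cr_u=23, cr_l=17, n_u=27, n_l=27)
print(f"{p:.0f}%")          # 74%
print(difficulty_level(p))  # Easy
```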
The term discriminating index is used to indicate the extent
to which a response to an item could distinguish between
high achievers and low achievers. An item discriminates in a
positive direction if more students in the upper group than
the lower group get the item right. Positive discrimination
index indicates that the item functions as it is intended. An
item discriminating power is determined by:-
$$D = \frac{CR_U - CR_L}{N_U} \quad \text{or} \quad D = \frac{CR_U - CR_L}{N_L}$$
From the preceding data,
$CR_U = 23$, $CR_L = 17$, $N_U = 27$ and $N_L = 27$.
Therefore,
$$D = \frac{23 - 17}{27} = \frac{6}{27} \approx 0.22$$
Comment: - It is a marginal item.
72
An item will have a maximum positive discriminating
power if all students from the upper group get the item
right and all from the lower group miss it.
That is,
$$D = \frac{27 - 0}{27} = 1$$
An item will have no discriminating power if all students
both from the upper and lower groups get the item right
or miss it.
That is,
$$D = \frac{27 - 27}{27} = 0 \quad \text{or} \quad D = \frac{0 - 0}{27} = 0$$
An item will have negative discriminating power if more
students from the lower group than the upper group get
the item right. Such items should be revised so that
they discriminate positively or discarded. Moreover,
items answered correctly or incorrectly by all
examinees can’t discriminate at all and should be
revised so that they discriminate or discarded.
73
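The discriminating power formula and its limiting cases can be checked numerically. The sketch below is not from the slides; the function name `discrimination_index` is hypothetical, and it assumes both groups have the same size, so a single group size is passed. The last call uses made-up counts to show a negative index.

```python
def discrimination_index(cr_u, cr_l, n_group):
    """D = (CR_U - CR_L) / N, with N the size of either 27% group."""
    if n_group <= 0:
        raise ValueError("n_group must be positive")
    return (cr_u - cr_l) / n_group


print(round(discrimination_index(23, 17, 27), 2))  # 0.22, the worked example
print(discrimination_index(27, 0, 27))             # 1.0, maximum positive power
print(discrimination_index(27, 27, 27))            # 0.0, everyone is right
print(discrimination_index(0, 0, 27))              # 0.0, everyone is wrong
print(discrimination_index(10, 15, 27) < 0)        # True, negative power
```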
Generally, indices of item discrimination can be
evaluated on the following terms as suggested by Ebel
(1972)
Table: - Discrimination Indices for Item
Evaluation
Index of Discrimination     Item Evaluation
0.40 and up                 Very good
0.30 – 0.39                 Good
0.20 – 0.29                 Marginal that needs improvement
Below 0.19                  Poor items (Should be discarded)
Therefore, very good classroom test items should have indices of
discrimination of 0.40 or above. Poor items to be rejected or
improved by revision are those which have indices of
discrimination below 0.19.
74
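The table of discrimination indices translates directly into a lookup. This is a hedged sketch: the function name `rate_discrimination` is hypothetical, and since the table leaves values between 0.19 and 0.20 unassigned, the sketch assumes that anything below 0.20 is rated as a poor item.

```python
def rate_discrimination(d):
    """Rate a discrimination index using the bands attributed to Ebel (1972).

    Assumption: every value below 0.20 is treated as a poor item.
    """
    if d >= 0.40:
        return "Very good"
    if d >= 0.30:
        return "Good"
    if d >= 0.20:
        return "Marginal that needs improvement"
    return "Poor item"


print(rate_discrimination(6 / 27))  # Marginal that needs improvement
print(rate_discrimination(0.45))    # Very good
print(rate_discrimination(5 / 27))  # Poor item
```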
 Distracters should be as attractive as the
correct answer.
 In a properly constructed multiple-choice item, each
distracter will be selected by some students.
Specifically, it should attract more students from the
lower group than from the upper group.
 If a distracter is not selected by anyone, it makes no
contribution to the functioning of the item and should
be eliminated or revised.
75
Options Upper 27% Lower 27% Total
A 12 7 19
B 12 10 22
C 0 0 0
D 3 10 13
Omit 0 0 0
Difficulty Index (P):
$$P = \frac{CR_U + CR_L}{N_U + N_L} \times 100 = \frac{12 + 7}{27 + 27} \times 100 = \frac{19}{54} \times 100 \approx 0.352 \times 100 = 35.2\%$$
Discriminating Power (D):
$$D = \frac{CR_U - CR_L}{N_U} = \frac{12 - 7}{27} = \frac{5}{27} \approx 0.185 \approx 0.19$$
76
Options Upper 27% Lower 27% Total
A 12 7 19
B 12 10 22
C 0 0 0
D 3 10 13
Omit 0 0 0
Comment:
1. The item discriminates in a positive direction, since
12 in the upper group and 7 in the lower group got
the item right, but the index of discriminating power (D) is
low.
2. The item is difficult for a classroom achievement test.
3. Alternative “B” is a poor distracter because it attracts
more students from the upper group than from the
lower group.
4. Alternative “C” is inefficient since it attracted no one.
5. Alternative “D” is functioning as intended, for it
attracted more students from the lower group than
from the upper group.
6. The discriminating power of the item may be
improved by removing ambiguity in the statement of
the item and replacing alternatives “B” and “C”.
77
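The distracter comments above follow a simple rule that can be applied to any response table. The following sketch uses the counts from the table above and takes "A" as the keyed answer, as comment 1 implies. The function name `analyse_distracters` and the wording of the remarks are hypothetical, and a distracter chosen equally often by both groups is assumed to count as poor.

```python
def analyse_distracters(counts, key):
    """Judge each distracter from its (upper, lower) selection counts."""
    remarks = {}
    for option, (upper, lower) in counts.items():
        if option == key:
            continue  # the keyed answer is not a distracter
        if upper + lower == 0:
            remarks[option] = "selected by no one: eliminate or revise"
        elif lower > upper:
            remarks[option] = "functioning as intended"
        else:
            remarks[option] = "poor: does not attract more lower-group students"
    return remarks


# Option -> (upper 27% count, lower 27% count), taken from the table above.
counts = {"A": (12, 7), "B": (12, 10), "C": (0, 0), "D": (3, 10)}
for option, remark in analyse_distracters(counts, key="A").items():
    print(option, "->", remark)
# B -> poor: does not attract more lower-group students
# C -> selected by no one: eliminate or revise
# D -> functioning as intended
```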
78

Assessment and Evaluation 2011.pptx

  • 1.
    Assessment and Evaluationin Learning (PGDT 423) Course Instructor: Mulugeta S. 1
  • 2.
    1.1 Meanings ofBasic Terms in Assessment & Evaluation of Learning  How do you define measurement, evaluation and assessment?  Do you think that these terms could be used interchangeably? If not, describe the differences among measurement, evaluation & assessment. 2
  • 3.
    A) Educational measurement -A systematic description of pupil's performance in terms of numbers. - The process of assigning number or quantifying to represent an individual's performance is called measurement. Therefore, evaluation is a much more comprehensive and inclusive term than measurement, which is limited to quantitative description of pupil's performance. E.g.:- Haritu solved correctly 15 of the 20 quadratic equations. Measurement includes neither qualitative descriptions of pupil's performance nor value judgments concerning the worth or value of the obtained results. E.g.:- 1. Tonja's work was neat. 2. Madebo is making a good progress in mathematics. These two examples do not belong to measurement, but they only belong to evaluation. 3
  • 4.
    B) Evaluation - Evaluationrefers to a systematic process of determining the extent to which instructional objectives are achieved by pupils. - May include either quantitative or qualitative descriptions of pupil’s performance or both. Moreover, it always involves value judgments concerning the desirability of the results. The following diagram shows that evaluation may or may not based on measurement, and when it is, it goes beyond the simple quantitative descriptions. Evaluation = Quantitative descriptions (measurement) and/or Qualitative descriptions (non-measurement) Plus Value judgments -Educational measurement involves using measuring devices.  What are the measurement devices commonly used in teaching- learning Process? 4
  • 5.
    Test  Involves aseries of questions with different item types. It is given formally while a course is on progress.  Its purpose is to assess learning progress and identify if there is learning difficulties. Quiz  A short and informal test.  It is given at class hour just at the beginning or at the end. Examination  Covers a large area of content.  It is given at the end of a course or semester.  Its main purpose is to assign grade.  The number of items included is large. 5
  • 6.
    C) Assessment - Theprocess of collecting, interpreting and synthesizing information to aid in decision making. For many people, classroom assessment means using paper and pencil tests to grade pupils. However, it is more than testing. -Includes information gathering on pupils, instruction and classroom climate by teachers. - Interpreting and synthesizing this information to help teachers understand their pupils, plan and monitor instruction and establish a conducive classroom atmosphere. Some examples of assessment are:- portfolios, which include planned collections of student work performance assessments, in which student performance on a complex task is observed use of a series of related but informal observations in order to determine complex attributes of one student or a group of students. 6
  • 7.
    Measurement and evaluationis highly required in educational enterprise when there is a need to make: Instructional decisions,- to evaluate and change /modify our teaching methods . Curricular decisions, Selection decisions, and Placement or classification decisions based on the present or anticipated educational status of the child.  The ultimate goal of measurement and evaluation is to facilitate learning. 7
  • 8.
    Other functions A) Diagnosticfunction Diagnosis deals with identifying learning difficulties. It concerns itself with gaps in students' knowledge, understanding or skills. Diagnosis puts emphasis on causes of deficiencies. The individual's educational weaknesses are pinpointed in order to:- plan remedial work for him, revise teaching strategies, and revamp or rearrange some elements of the curriculum with a view to solving the manifested deficiencies. Diagnosis test in most cases implemented when pupils do not respond to the feedback corrective prescriptions of formative testing. 8
  • 9.
    B) Placement function Placementrefers to the classification of pupils in classes or sections of a class according to their demonstrated knowledge or ability in some subject area(s). Placement concerns itself with the present status of a pupil or an individual. In other words, placement is concerned with pupils’ entry performance. Placement tests refer to pre-tests designed to measure:- whether pupils posses that knowledge and skills needed to begin a course to what extent pupils have already achieved the objectives of planned instruction determine possessions of prerequisite skills, degree of mastery of course objectives and/or best made of learning. 9
  • 10.
    C) Predictive function Whenthe results of evaluation are used in selecting a sample of pupils from a given population of pupils exposed to the test on the assumption that those so selected can benefit from further instruction or perform well on similar educational tasks in the future, such a test is said to have been used for predictive purposes. For such a test to be worth more than the piece of paper on which it is written, it must possess high predictive validity index. D) Guidance and Counseling function Results of evaluation are especially useful in assisting pupils with educational and vocational decisions. They help pupils to solve personal and social adjustment problems which require objective knowledge of pupils' abilities, interests, attitudes and other personal characteristics. 10
  • 11.
    E) Feedback provision Evaluationresults provide feedback (knowledge of result) both for the teacher and pupils concerning learning success and failure F) For reporting of pupil progress to parents Evaluation results enable the teacher to report an objective and comprehensive picture of each pupil’s progress to parents. This kind of over all objective information about pupils provides the foundation for the most effective cooperation between parents and teachers. 11
  • 12.
    One of thedistinctive features of evaluation process is the use of wide variety of approaches. These may be classified and described in many different ways, depending on the frame of reference used. Some of the approaches which can be used in the process of evaluation are: Summative versus Formative Evaluation Criterion-referenced versus Norm–referenced Evaluation Achievement Test versus Aptitude Tests 12
  • 13.
    A) Summative Evaluation:-is conducted at the end of an instructional segment to determine if learning is sufficiently completed to warrant moving the learner to the next segment of instruction. It is designed to determine the extent to which instructional objectives have been achieved. Summative evaluation follows instruction and typically involves unit tests, mid-term and final exams, projects at end of a unit or other assignments. Summative evaluation is used primarily for:- assigning course grades or marks, licensing and certifying pupil mastery of the intended learning outcomes, reporting a student’s overall achievement, appraising for a starting point in a following grade or course, predicting success in related course, reporting the overall achievement of a class. In addition, it provides information for judging the appropriateness of the course objectives and the effectiveness of instruction. 13
  • 14.
    B) Formative Evaluation:-is conducted to monitor the instructional process, to determine whether learning is taking place as planned. It occurs during instruction. The major function of formative evaluation in the classroom is to provide continuous feedback to both students and teacher concerning learning success and failure or how things are going in instructional process. Such feedback provides an opportunity for the:- teacher for modifying instructional methods or materials and for prescribing group and individual remedial work to facilitate effective learning. students for obtaining reinforcement of successful learning and for identifying the specific learning errors which are in need of correction. Formative evaluation requires the gathering of fairly detailed information on frequent occasions. Information is obtained through: teacher observation, classroom oral questioning, homework assignments, classroom assignments, and quizzes or informal inventories 14
  • 15.
    The terms criterion-referencedand norm-referenced evaluation pertain to interpreting the performance of students on tests and other evaluation instruments. A) Criterion-referenced Evaluation:- is concerned with a way of interpreting a test score which compares an individual’s performance to the established standard of performance. The success or failure of an individual on criterion – referenced evaluation depends on what he/she scores on a test in relation to the set standard. The criterion-referenced interpretations enable us to describe what an individual can do without reference to the performance of others. 15
  • 16.
    B) Norm-referenced Evaluation:-refers to a form of interpreting a test score that employees the practice of comparing a student’s performance to the class performance or some external average performance such as local, state or national averages. The success or failure of an individual on norm-referenced evaluation is determined on the basis of how he/she performs in relation to his/her colleagues’ performance on the test.  Which one of the following is an example of CRE or NRE? 1. Hundito computes simple linear equations. 2. Abebe computes simple linear equation better than 75% of the students in the class. 3. Shallamo can spell words better than half of his classmates in the language class of elementary school. 4. Huluagerish can convert temperature from the Celsius to the Fahrenheit scale. 16
  • 17.
    A) Achievement Tests:-are tests those which have representative sampling of the course contents and are designed to measure the extent of students’ present knowledge. They are constructed either by a classroom teacher or by professional test makers. Based on this, there are two types of achievement tests, which are classroom (teacher-made) achievement tests and standardized achievement tests. i) Classroom achievement tests:- are designed by teachers to measure the degree to which students have acquired information or skills taught in a school or formal setting. They indicate the degree of students’ success in past learning. Classroom (Teacher-made) achievement tests:- usually cover a single unit of work or that of a term are constructed by respective teachers individually do not contain norms of one type or another assess specific objectives more satisfactorily than standardized achievement tests 17
  • 18.
    ii) Standardized achievementtests:- are designed by test specialists to assess students either one year’s learning or more than one year’s learning. In standardized achievement tests, the systematic sampling of performance has been obtained under prescribed direction of administration. Standardized achievement tests:- usually cover a wide range of material (contents) are constructed by test specialists in cooperation with experts in the subject matter area, curriculum specialists and statisticians contain norms of one type or another measure generally accepted goals rather than specific or particular instructional objectives 18
  • 19.
    B) Aptitude Tests -used to assess learning potentials and to measure general academic principles that would be found in a typical curriculum. They are not designed to measure the content of any specific curriculum. - Aptitude tests are primarily designed to predict individuals’ success in some future jobs or learning activities. The scores derived from aptitude tests are usually used in selecting people for jobs, learning and the like with the hope that those so selected will perform well on their future jobs or benefit from future learning. 19
  • 20.
    In planning classroomachievement test, a series of basic steps should be followed. This includes consideration in each of the following areas: Determining the purpose of testing; Developing the test specification; Selecting the appropriate item types; and Preparing relevant test items. 20
  • 21.
    Mental activity, suchas how well a student ‘’Understands why,’’ is not directly observable in the way a student’s physical height is directly observable. There is a need of developing clever techniques for getting students to display what ‘’goes on in their heads.’’ In order to achieve this purpose educationalists and psychologists developed the techniques of stating instructional objectives in different levels and classifying these objectives into different categories. Instructional Objectives- Expected behavioral outcomes after the completion of a particular educational outcome.  Are the terms goals, aims & objectives the same?  How do you use them so far?  Do you use them interchangeably or differently? 21
  • 22.
    Goals, Aims andObjectives The terms goals, aims and objectives are most often used interchangeably by most educators. There are some educators, however, that make distinction between them. For example, Rowntres (1974) admits the existing terminological confusion in these intentions and deliberately restricts his explanation to the distinction of aims and objectives: …aims represent the most abstract level among statements of educational intent ... they provide the ethical standard against which objectives are justified . . . if we are to know what is strive for in the behavior or observable activity of students, aims must be translated into objectives. 22
  • 23.
    Similarly Hough andDuncan (1970) are confined to goals and objectives. Goals are the broadest, most general statements of educational ends . . . teacher are expected to implement these decisions in their teaching as they translate goals into objectives. These definitions give a general indication of the similarities of goals with aims and their differences from objectives. The basic difference, to save the trouble of confusion of terminology in this area, in that the words “goals and aims” are used in broad and abstract level of purposes while the term ''objectives'' is used in more specific ways at operational levels. 23
  • 24.
    Bloom (1956) whohas written extensively about the Taxonomy of Educational Objectives defines objectives as under: Objectives are explicit formulation of the ways in which students are expected to be changed by educative process. That is the ways in which they will change in their thinking, their feelings and their actions. According to Bloom (1956), sources of objectives are:- information about students' interests, attitudes and needs, activities they are expected to perform and occupations they are likely to have, and the nature of the subject matter and the type of learning. 24
  • 25.
     Setting instructionalobjectives that tell what types of performance students are expected to demonstrate will help the teacher to choose appropriate and effective teaching methods and materials.  Instructional objectives also communicate the intent of instruction to students, parents, school leaders and the public.  Thus, they:- have a motivating effect on the lesson and students’ study habits; describe the intended performance to be measured at the end setting some basis for testing and evaluation.  In more general situation, instructional objectives provide educators with insights to cope with the educational needs of contemporary society. 25
  • 26.
    Instructional objectives canbe classified as general objectives and specific objectives. According to Gronlund (1981), general objectives are intended outcomes of instruction that have been stated in general enough terms to encompass a set of specific learning outcomes. General objectives are:- found in the syllabus, annual plan and resources of specific learning outcomes; not measurable and observable as that of specific objectives. Properly defined instructional objectives serve three major functions. These are :- providing direction for the instructional process; conveying the instructional intent to others; and providing basis for evaluating students’ learning. 26
  • 27.
    According to Gronlund(1981), specific objectives refer to intended (expected) outcomes of instruction that have been stated in specific, measurable and observable student's performance (SMART). A set of specific learning outcomes describe a sample of the types of performance learners will be able to exhibit when they have achieved a general instructional objectives. Other names for specific instructional objectives are:- performance objectives behavioral objectives measurable objectives The instructional objectives should consist of two important elements in it: the content (subject matter) and the behavioral construct (the manner in which students are expected to deal with or behave toward the objectives of the content). 27
  • 28.
    In writing specificinstructional objectives, action verbs are key elements. The selections of these verbs play an important role in obtaining a clearly defined set of specific instructional objectives. The verbs are important to:- clearly convey our instructional intent; precisely specify the students’ performance we are willing to accept as evidence that general objectives has been attained; and select appropriate evaluation techniques. To sum up, when one writes specific instructional objectives for the purpose of daily teaching, the objectives should be ‘’SMART’’. S ________ Specific M ________ Measurable A ________ Attainable R ________ Realistic T ________ Time bounded 28
  • 29.
    No. Guidelines for Stating GeneralInstructional Objectives Specific Learning Outcomes 1 State general instructional objectives as an intended learning outcome or terminal performance of a student State specific learning outcome as a sample of outcome describing the terminal behavior performed by the student 2 Begin general instructional objective with a verb like knows, understands, develops, appreciates, interprets, etc. Begin specific learning outcome with an action verb specifying the observable performance (e. g. writes, identifies, computes) 3 Include only one general learning outcome in a statement. Include only one specific learning outcome in a statement and make sure it is relevant to the general instructional objective it describes. 4 State each general instructional objective at a proper level of generality. Include enough specific learning outcomes to describe the terminal performance included in general instructional objective. 5 Keep each general instructional objective sufficiently free of course content so that it can be used with various topics of study. Keep all specific learning outcomes sufficiently free of course content for similar reason. 6 Avoid or minimize the overlap with other general instructional objectives. Avoid or minimize the overlap with other specific learning outcomes in the set. 29
  • 30.
     The taxonomyof educational objectives provides a three-domain scheme for classifying all possible instructional objectives.  According to Bloom (1956), Nathwon (1964) and Simpson (1972), instructional objectives have been classified into three categories as cognitive, affective and psychomotor domain objectives.  Cognitive domain objectives are concerned with knowledge outcomes, intellectual abilities and skills.  Affective domain objectives are about attitudes, interests, appreciations, and modes of adjustment  Psychomotor domain objectives deal with perceptual and motor skills (Gronlund, 1985: Gronlund and Linn, 1990).  Each domain is subdivided into a series of categories that are arranged in hierarchical order from simple to complex. 30
  • 31.
    1. Cognitive Domain Cognitivedomain varies from simple remembering of learned material to highly creative ways of combining and synthesizing new ideas and materials. Cognitive domain is classified into six sub categories: knowledge, comprehension, application, analysis, synthesis and evaluation. 2. Affective Domain The affective domain involves feelings, emotions and degrees of acceptance or rejection of the material to be learned. It varies from simple intention to complex qualities of character and conscience. They are expressed as interests, attitudes, appreciations, values, and emotional set of biases. Affective domain has five subcategories: receiving, responding, valuing, organization and characterization by value or value complex. Affective objectives are concerned with the attitudes. To affect preferences, opinions, and desires is to teach values. 31
  • 32.
    3. Psychomotor Domain Psychomotordomain refers to some muscular or motor skills, some manipulation of material and objects or some act which require a neuromuscular coordination. This domain has been classified into five sub categories: imitation, manipulation, precision, articulation and naturalization. Note 1. The taxonomy of educational objectives introduced above are useful in identifying the types of learning outcomes that should be considered as we develop list of objectives for courses, lessons or units. 2. Unlike cognitive and psychomotor objectives, affective objectives are not concerned with students’ abilities to do any thing. 32
  • 33.
    ____1. The studentis able to evaluate accurately the best of two solutions to a geometry problem using standards given by the teacher. ____2. The student responds with tolerance for others by displaying good manners toward those of minority groups. ____3. The student identifies the names and contributions of the five key curriculum workers as described in class. ____4. The student properly knits a baby blanket, with a frequency of ten strands per minute. 33
  • 34.
    ___5. The studentis willing to respond to the questions on the course evaluation inventory. ____6. The student applies instructional principles properly in planning daily lessons. ____7. The student plays table tennis for one-hour duration, beating three inexperienced girls 100 percent of the time. ____8. The student comprehends the Ginbot 20 overthrown of the Derg regime. ____9. The student distinguishes 30 percent of the words on a spelling quiz. ____10. The student displays a value for higher mathematics by attending lectures on his subject. 34
  • 35.
    A table of specification: is used to outline the content coverage of a test; is a two-way grid; is also called a test blueprint; relates instructional objectives to course content; specifies the relative emphasis to be given to each type of learning outcome; and consists of a two-dimensional chart. Vertical dimension: lists the content areas. Horizontal dimension: lists the categories of performance.
  • 36.
    Constructing a “Table of Specification” requires: obtaining a list of instructional objectives; outlining the course/subject content; and preparing a two-way grid (chart).
    Table of Specification (columns are objectives, rows are content areas)
    Content              Knowledge   Comprehension   Application   ... Evaluation   Total
    Surface area             3             3              0                          6 (12%)
    Rectangle                2             2              6                         10 (20%)
    Right-triangle           2             2              6                         10 (20%)
    Rectangular prism        2             2              6                         10 (20%)
    Cylinder                 2             2              6                         10 (20%)
    Real life problem        0             0              4                          4 (8%)
    Total                11 (22%)      11 (22%)       28 (56%)                      50
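    The percentages in the table can be checked mechanically. The following is a minimal sketch, assuming the cell counts shown on the slide; the names `BLUEPRINT`, `LEVELS` and `summarize` are illustrative and do not come from the course material.

```python
# Item counts per content area, copied from the table of specification.
# Column order: Knowledge, Comprehension, Application.
LEVELS = ("Knowledge", "Comprehension", "Application")
BLUEPRINT = {
    "Surface area": (3, 3, 0),
    "Rectangle": (2, 2, 6),
    "Right-triangle": (2, 2, 6),
    "Rectangular prism": (2, 2, 6),
    "Cylinder": (2, 2, 6),
    "Real life problem": (0, 0, 4),
}


def summarize(blueprint):
    """Return the grand total plus row and column totals with their percentage weights."""
    grand_total = sum(sum(cells) for cells in blueprint.values())

    def weight(count):
        # Percentage of the whole test, rounded to a whole number.
        return round(100 * count / grand_total)

    rows = {
        content: (sum(cells), weight(sum(cells)))
        for content, cells in blueprint.items()
    }
    columns = {}
    for index, level in enumerate(LEVELS):
        count = sum(cells[index] for cells in blueprint.values())
        columns[level] = (count, weight(count))
    return grand_total, rows, columns


grand_total, rows, columns = summarize(BLUEPRINT)
print("Total items:", grand_total)
for name, (count, pct) in {**rows, **columns}.items():
    print(f"{name}: {count} items ({pct}%)")
```

    Keeping the grid as data makes it easy to see whether the emphasis given to each content area and objective matches what was planned before any items are written.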
  • 37.
    The three most important attributes of a good measuring instrument are: validity, reliability and usability.
  • 38.
    Validity is the fidelity or accuracy with which a test measures what it purports to measure. It is the most central and essential quality in the development, interpretation and use of educational measures. A test constructed to measure some characteristic is valid to the degree that it accurately and consistently measures that characteristic.
  • 39.
    Some of the measurement errors and other conditions that affect the validity of test results are: poorly constructed test items; poor sampling of content; inappropriate level of difficulty; ambiguity; reading difficulty of vocabulary and sentence construction; unclear directions; cheating; guessing, etc.
  • 40.
    1. Content Validity: is concerned with how adequately the content of the test samples the domain of subject matter; is determined by a thorough inspection of the items; and requires a balanced sample of the total number of possible items, based on the content of the subject matter and the behavioral outcomes to be measured. To assure high content validity of a test, teachers should incorporate these two factors (subject-matter content and behavioral outcomes) into the selection of items.
  • 41.
    2. Criterion-related Validity: pertains to the technique of studying the relationship between the test scores and independent external measures. 2.1. Predictive Validity: compares test scores with another measure of performance obtained at a later time, for prediction. 2.2. Concurrent Validity: compares test scores with another measure of performance obtained concurrently, for estimating present status. 3. Construct Validity: is the degree to which the test scores can be accounted for by certain explanatory constructs in a psychological theory (for example intelligence, achievement motivation, anxiety, attitude, interest).
  • 42.
    Reliability: is the degree to which a test or other observation measures something consistently; is a prerequisite to having validity; and is a necessary but not sufficient condition for quality in an educational achievement test. Factors that diminish the reliability of educational measures are inconsistencies between: earlier and later measures; test items; alternative skills in the same content domain; and raters; as well as lack of internal consistency within one test, ambiguous test items, guessing and emotional factors.
  • 43.
    1. Scorer Reliability Inter-scorers reliability concerns itself with the degree of agreement between two markers or scorers of the same test answer.  Intra-scorer’s reliability deals with the degree of consistency in grading the same test answer by the same marker on two different occasions The scorer reliability for an objective test is perfect if there are no mistakes in the scoring. 2. Content Reliability:- deals with the ability of all the items of a test to measure competencies in the same general content area.  Refers to the internal consistency measures of the items of a test. 3. Temporal Reliability:- concerns itself with the stability of the results of a test over time. 43
  • 44.
    1. Test-retest method: involves a single test that is administered twice to the same sample of subjects, with an interval of about two weeks; it measures the stability of the results of a test. 2. Alternate forms method: involves administering two forms/versions of a test to the same students; it measures the equivalence of the results of a test. 3. Split-half method: is used for measuring the internal consistency of the items of a test. 4. Kuder-Richardson method: is a measure of internal consistency.
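    The slide names the Kuder-Richardson method without giving a formula. As a hedged illustration only, the sketch below uses the commonly cited KR-20 form, KR-20 = (k / (k - 1)) × (1 - Σ p·q / σ²), where k is the number of items, p is the proportion answering an item correctly, q = 1 - p, and σ² is the variance of total scores. This formula, the function name `kr20` and the sample data are assumptions added for illustration, not part of the original slides.

```python
from statistics import pvariance


def kr20(responses):
    """Estimate internal consistency with KR-20.

    responses: one list per student, each holding 0/1 scores per item.
    Uses the population variance of total scores (an assumption; some
    texts use the sample variance instead).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Proportion of students answering each item correctly.
    p = [sum(student[i] for student in responses) / n_students
         for i in range(n_items)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_variance = pvariance([sum(student) for student in responses])
    if total_variance == 0:
        raise ValueError("All total scores are equal; KR-20 is undefined.")
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


# Hypothetical scores for four students on four items (not from the slides).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(scores), 2))
```

    Values closer to 1 suggest that the items behave consistently with one another; the guard against zero variance avoids a division by zero when every student earns the same total.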
  • 45.
    Factors that affect reliability: 1. Test length: the longer a test is, the more reliable it is. 2. Sample heterogeneity: the more heterogeneous a test sample is, the higher the reliability coefficient of the test. 3. Test timing: the shorter the time between the two testing sessions, the higher the reliability coefficient between the two sets of scores. 4. Irregularities: failure by examinees to follow test instructions; physical conditions (lighting, seating arrangement and poor ventilation of the examination room); and personal conditions and behavior of the examinees (illness, lack of motivation and cheating during an examination).
  • 46.
    Usability deals with all the practical considerations that go into our decision to use one particular test rather than another: availability of the test, relevance of the test, ease of administering the test, ease of scoring, and interpretability of the scores.
  • 47.
    The construction of good test items requires: a thorough grasp of subject matter; a clear conception of the desired learning outcomes; psychological understanding of pupils; sound judgment; persistence; a touch of creativity; and the skillful application of an array of simple but important rules and suggestions.
  • 48.
    Types of tests: written tests (which include both objective and subjective tests), performance tests and oral tests. Teacher-made written tests are divided into two: 1. The objective test is highly structured and leaves no room for either intra-scorer or inter-scorer variability in scoring. 2. The subjective/essay test permits pupils to select, organize and present the answer in essay form, and leaves room for both intra-scorer and inter-scorer variability in scoring.
  • 49.
    Objective test items can be classified into those that require the student to supply the answer, and those that require the student to select the answer from a given number of alternatives. Supply types: 1. Short-answer 2. Completion. Selection types: 1. True-false 2. Matching 3. Multiple-choice.
  • 50.
    Suggestions for constructing short-answer and completion items: 1. Word the item so that the required answer is both brief and definite. 2. Don't take statements directly from the textbook to use as the basis for a completion or short-answer item. 3. A direct question is more desirable and natural to the student than an incomplete statement. 4. When the answer is to be expressed in numerical units, indicate the type of answer wanted. 5. Blanks for answers should be equal in length and placed in a column to the right of the question. 6. When completion items are used, do not use too many blanks.
  • 51.
    Advantages of supply test items: easy to construct; minimize the opportunity for guessing; ensure content validity. Limitations of supply test items: are restricted to measuring recall of simple learning outcomes; are more likely to be scored erroneously than objectively scored formats; require more time to score; fewer computerized procedures have been developed for processing responses to short-answer items.
  • 52.
    1. Alternative response/True-false Items. In writing true-false items, the test constructor should bear the following “Don'ts” in mind. 1. Don't use statements that cannot be unequivocally classified as true or false. 2. Don't use all-inclusive words such as “all, always, never, none, no” within a true-false item. 3. Don't use indefinite terms such as “greatly, usually, frequently, sometimes”. 4. Don't use double negatives in framing a true-false item. 5. Don't include more than one idea per item. 6. Don't allow a distinctive difference in length between a true statement and a false one. The numbers of true statements and false statements should be approximately equal (40 - 60%).
  • 53.
    Advantages of true-false items: easy to construct; cover a large area of content; scoring is objective; marking is quick and easy. Limitations of true-false items: measure simple learning outcomes; highly susceptible to guessing; can be used only when dichotomous answers represent sufficient response options.
  • 54.
    Guidelines for constructing matching items: 1. Provide clear, informative directions. 2. Within a single matching exercise, make the premises and responses homogeneous. 3. Avoid one-to-one correspondence (perfect matching) between premises and responses. 4. Place the longer phrases in the premise list and the shorter phrases in the response list. 5. Use numbers to identify the premises and letters to identify the responses. 6. Keep all the premises and responses belonging to a single matching exercise on the same page.
  • 55.
    Direction: Match the measures of angles in Column “A” with the names under Column “B” and write the letter of the correct answer in the space provided. A response may be used more than once or not at all.
  • 56.
    Advantages of matching items: measure a large quantity of associated factual material; scoring is objective and easy; easy to construct. Limitations of matching items: measure only simple learning outcomes; time consuming for the student; homogeneous material is difficult to find; highly susceptible to the presence of irrelevant clues.
  • 57.
    Guidelines for constructing multiple-choice test items: 1. The stem of the item should be meaningful by itself and should present a definite problem. 2. The item stem should be clear and free from irrelevant material. 3. Use a negatively stated item stem only when significant learning outcomes require it. 4. All of the alternatives should be grammatically consistent with the stem of the item. 5. An item should contain only one correct or clearly best answer.
  • 58.
    6. Items used to measure understanding should contain some novelty, but not too much. 7. All distracters should be plausible. 8. Verbal association between the stem and the correct answer should be avoided. 9. The alternatives should be approximately equal in length; a longer alternative tends to be the correct answer, so its length may provide a clue. 10. The correct answer should appear in each of the alternative positions approximately an equal number of times. 11. Use special alternatives such as “none of the above” or “all of the above” sparingly.
  • 59.
    Advantages of multiple-choice items: measure a variety of simple and complex learning outcomes; reduce the opportunity for guessing the correct answer; cover a wide range of content; scoring is simple and objective. Limitations of multiple-choice items: difficult to construct; inappropriate for measuring learning outcomes that require the ability to organize or present ideas.
  • 60.
    A) Restricted Response Type: places limitations on the nature, length or organization of the response; minimizes some of the weaknesses of the extended response type. B) Extended Response Type: permits pupils to decide which facts they think are most pertinent, select their own method of organization, and write as much as seems necessary to provide a comprehensive answer.
  • 61.
    Guidelines for constructing subjective/essay test items: 1. Restrict the use of essay items to those learning outcomes which cannot be measured by objective items. 2. Formulate questions which will call forth the behavior specified in the learning outcomes. 3. Phrase each question so that the student's task is clearly indicated. 4. Indicate an appropriate time limit for each question (use questions and generous time). 5. Avoid the use of optional items.
  • 62.
    Advantages of essay items: measure complex learning outcomes which cannot be measured by other items; have a desirable influence on the student's study habits; easy to construct. Limitations of essay items: scoring is unreliable; scoring is time consuming; sampling of content is limited.
  • 63.
    The arrangement of items can be determined based on: the type of items used, the learning outcomes measured, the difficulty of the items, and the subject matter measured. Suggested arrangement: 1. True-false/alternative response items 2. Matching items 3. Short-answer items 4. Multiple-choice items 5. Interpretive exercises 6. Essay items.
  • 64.
    3.1. Administering Classroom Achievement Tests. Guiding principle: all students must be given a fair chance to demonstrate their achievement of the learning outcomes being measured. Desirable physical conditions: adequate work space, proper light, ventilation and comfortable temperature must be assured by the teacher before administering the test.
  • 65.
    Some of the things that create excessive test anxiety are: threatening pupils with tests if they do not behave; warning pupils to do their best “because this test is important”; telling pupils they must work fast to complete the test on time; and threatening dire consequences if they fail the test. Teachers are advised to keep the following basic guidelines in mind while administering a classroom test: do not talk unnecessarily before the test; keep interruptions during the test to a minimum; avoid giving hints to pupils who ask about individual items; discourage cheating as much as possible.
  • 66.
    Suggestions for scoring subjective test items: 1. Prepare an outline of the expected answer in advance; it should contain the major points and characteristics of the answer to be included and evaluated, and the amount of credit to be allotted to each. 2. Use the scoring method which is most appropriate: either the point/analytic method or the rating/relative method. 3. Decide on provisions for handling factors which are irrelevant to the learning outcomes being measured. 4. Evaluate all answers to one question before going on to the next question. 5. Evaluate the answers without looking at the pupil's name. 6. If especially important decisions are to be based on the results, obtain two or more independent ratings.
  • 67.
    Item analysis is the process of studying the characteristics of test items based on data obtained from examinees. It indicates which items are difficult or easy, which items effectively discriminate between high and low achievers, and whether each item functions as intended. Item analysis data provide a basis for: efficient class discussion of the test results, remedial work, the general improvement of classroom instruction, and increased skill in test construction.
  • 68.
    The term item difficulty indicates the extent to which an item is difficult for respondents. The difficulty index ranges from 0% to 100%. Because the index is the percentage of respondents who answer correctly, items with higher difficulty indices are easier and items with lower indices are more difficult. For most classroom tests, it is desirable that all of the items cluster around 50% difficulty, with none of them extremely difficult. A distribution of difficulty indices ranging from 25% to 75% is also supported by many test specialists. Easy items are needed to discriminate among the poorer students, while difficult items are needed to discriminate among the better ones.
  • 69.
    An item's difficulty level is determined by:
    $$P = \frac{CR_U + CR_L}{N_U + N_L} \times 100$$
    where
    $P$ = difficulty index
    $CR_U$ = number of correct responses from the upper group (27% of the total respondents)
    $CR_L$ = number of correct responses from the lower group (27% of the total respondents)
    $N_U$ = number of respondents in the upper group
    $N_L$ = number of respondents in the lower group
  • 70.
    Table: Difficulty level of the item
    Difficulty level     Evaluation
    0% - 45%             Very difficult
    45% - 55%            Medium difficult
    55% - 100%           Easy
  • 71.
    Options    Upper 27%    Lower 27%    Total
    A               1            4          5
    B              23           17         40
    C               1            3          4
    D               2            3          5
    Omit            0            0          0
    Total          27           27         54
    From the above data (option B is the correct answer), $CR_U = 23$, $N_U = 27$, $CR_L = 17$ and $N_L = 27$. Then,
    $$P = \frac{23 + 17}{27 + 27} \times 100 = \frac{40}{54} \times 100 = 0.74 \times 100 = 74\%$$
    Comment: The item is somewhat easy and appropriate for a classroom achievement test.
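    The calculation above can be sketched in a few lines of Python. The function names are illustrative, and the handling of the boundary values 45% and 55% (which appear in two rows of the difficulty-level table) is an assumption made for this sketch, not something the slides specify.

```python
def difficulty_index(cr_upper, cr_lower, n_upper, n_lower):
    """Percentage of upper- and lower-group respondents who answered correctly."""
    return (cr_upper + cr_lower) / (n_upper + n_lower) * 100


def difficulty_label(p):
    """Label a difficulty index using the difficulty-level table.

    Assumption: exactly 45% and exactly 55% are both treated as
    'Medium difficult', since the table lists each in two rows.
    """
    if p < 45:
        return "Very difficult"
    if p <= 55:
        return "Medium difficult"
    return "Easy"


# Worked example from the slide: 23 upper-group and 17 lower-group
# respondents answered correctly, with 27 respondents in each group.
p = difficulty_index(23, 17, 27, 27)
print(f"P = {p:.0f}% ({difficulty_label(p)})")
```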
  • 72.
    The term discriminating index is used to indicate the extent to which a response to an item distinguishes between high achievers and low achievers. An item discriminates in a positive direction if more students in the upper group than in the lower group get the item right. A positive discrimination index indicates that the item functions as intended. An item's discriminating power is determined by:
    $$D = \frac{CR_U - CR_L}{N_U} \quad \text{or} \quad D = \frac{CR_U - CR_L}{N_L}$$
    From the preceding data, $CR_U = 23$, $CR_L = 17$, $N_U = 27$ and $N_L = 27$. Therefore,
    $$D = \frac{23 - 17}{27} = \frac{6}{27} = 0.22$$
    Comment: It is a marginal item.
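    A minimal Python sketch of the same calculation follows. It assumes, as in the example, that the upper and lower groups are the same size; the function name is illustrative.

```python
def discrimination_index(cr_upper, cr_lower, group_size):
    """Difference in correct responses between groups, per group member.

    Assumes the upper and lower groups have the same size, so dividing
    by either group's size gives the same result.
    """
    return (cr_upper - cr_lower) / group_size


# Worked example from the slide: 23 correct in the upper group,
# 17 correct in the lower group, 27 respondents per group.
d = discrimination_index(23, 17, 27)
print(f"D = {d:.2f}")

# A negative value means the lower group outperformed the upper group.
print(discrimination_index(10, 20, 27) < 0)
```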
  • 73.
    An item will have maximum positive discriminating power if all students from the upper group get the item right and all from the lower group miss it. That is,
    $$D = \frac{27 - 0}{27} = 1$$
    An item will have no discriminating power if all students from both the upper and lower groups get the item right or all miss it. That is,
    $$D = \frac{27 - 27}{27} = 0 \quad \text{or} \quad D = \frac{0 - 0}{27} = 0$$
    An item will have negative discriminating power if more students from the lower group than from the upper group get the item right. Such items should be revised so that they discriminate positively, or be discarded. Moreover, items answered correctly or incorrectly by all examinees cannot discriminate at all and should be revised so that they discriminate, or be discarded.
  • 74.
    Generally, indices of item discrimination can be evaluated on the following terms, as suggested by Ebel (1972).
    Table: Discrimination Indices for Item Evaluation
    Index of Discrimination    Item Evaluation
    0.40 and up                Very good
    0.30 - 0.39                Good
    0.20 - 0.29                Marginal (needs improvement)
    Below 0.19                 Poor (should be discarded)
    Therefore, very good classroom test items should have indices of discrimination of 0.40 or above. Poor items, to be rejected or improved by revision, are those which have indices of discrimination below 0.19.
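    The evaluation table can be expressed as a small lookup function. This is a sketch under one stated assumption: the table leaves values between 0.19 and 0.20 unclassified, so here every value below 0.20 is treated as poor. The function name is illustrative.

```python
def evaluate_discrimination(d):
    """Map a discrimination index to the evaluation labels in the table.

    Assumption: any value below 0.20 is labeled 'Poor', which closes the
    gap the table leaves between 'Below 0.19' and '0.20 - 0.29'.
    """
    if d >= 0.40:
        return "Very good"
    if d >= 0.30:
        return "Good"
    if d >= 0.20:
        return "Marginal"
    return "Poor"


# Two indices computed elsewhere in these slides: 6/27 and 5/27.
for d in (6 / 27, 5 / 27):
    print(f"D = {d:.2f}: {evaluate_discrimination(d)}")
```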
  • 75.
    Distracters should be as attractive as the correct answer. In a properly constructed multiple-choice item, each distracter will be selected by some students. Specifically, it should attract more students from the lower group than from the upper group. If a distracter is not selected by anyone, it makes no contribution to the functioning of the item and should be eliminated or revised.
  • 76.
    Options    Upper 27%    Lower 27%    Total
    A (key)        12            7         19
    B              12           10         22
    C               0            0          0
    D               3           10         13
    Omit            0            0          0
    Difficulty Index (P):
    $$P = \frac{CR_U + CR_L}{N_U + N_L} \times 100 = \frac{12 + 7}{27 + 27} \times 100 = \frac{19}{54} \times 100 = 0.352 \times 100 = 35.2\%$$
    Discriminating Power (D):
    $$D = \frac{CR_U - CR_L}{N_U} = \frac{12 - 7}{27} = \frac{5}{27} = 0.185 \approx 0.19$$
  • 77.
    Comment:
    1. The item discriminates in a positive direction, since 12 in the upper group and 7 in the lower group got the item right, but its index of discriminating power (D) is low.
    2. The item is difficult for a classroom achievement test.
    3. Alternative “B” is a poor distracter because it attracts more students from the upper group than from the lower group.
    4. Alternative “C” is inefficient since it attracted no one.
    5. Alternative “D” is functioning as intended, for it attracted more students from the lower group than from the upper group.
    6. The discriminating power of the item may be improved by removing ambiguity in the statement of the item and replacing alternatives “B” and “C”.
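    The distracter checks in comments 3 to 5 follow simple rules that can be sketched in Python. The function name and the wording of the labels are illustrative, and the treatment of a distracter chosen equally often by both groups is an assumption, since the slides do not cover that case.

```python
def analyze_distracters(counts, key):
    """Judge each wrong option from its upper- and lower-group counts.

    counts: maps option letter -> (upper_count, lower_count).
    key: the letter of the correct answer, which is skipped.
    """
    report = {}
    for option, (upper, lower) in counts.items():
        if option == key:
            continue
        if upper + lower == 0:
            report[option] = "inefficient: selected by no one"
        elif upper > lower:
            report[option] = "poor: attracts more upper-group students"
        elif lower > upper:
            report[option] = "functioning as intended"
        else:
            # Assumption: equal counts are flagged for review.
            report[option] = "inconclusive: equal in both groups"
    return report


# Response counts from the item above; "A" is the keyed answer.
item = {"A": (12, 7), "B": (12, 10), "C": (0, 0), "D": (3, 10)}
for option, verdict in analyze_distracters(item, key="A").items():
    print(option, "-", verdict)
```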
  • 78.