PGDT 2016
HARAMAYA UNIVERSITY
COLLEGE OF EDUCATION AND
BEHAVIORAL SCIENCES
Department of Psychology
Assessment and Evaluation of
Learning
Unit 1: Assessment: Concept,
Purpose, and Principles
 You might have come across the
concepts of
1. test,
2. measurement,
3. assessment, and
4. evaluation. How do you
understand them?
Test
 A test, in the educational context, is the presentation of a
standard set of questions to be answered by students.
 It is one instrument that is used for collecting
information about students’ behaviors or performance.
Cont.
o A test is a task or a series of tasks or questions
that students must answer or perform.
o It is used to get information regarding the extent to
which the students have mastered the subject
matter taught and the attainment of instructional
objectives.
o Test is a systematic procedure for observing (i.e.,
getting information) and describing one or more
characteristics of a person with the aid of either a
numerical scale (measurement such as test
scores) or a category (qualitative means).
Cont.
Measurement
 Measurement is the process by which the
attributes of a person are measured and described
in numbers based on certain rules.
 It is a quantitative description of the behavior or
performance of students.
 Measurement permits more objective description
concerning traits and facilitates comparisons.
Assessment
 Assessment is the planned process of gathering and synthesizing
information relevant to the purposes of discovering and
documenting students' strengths and weaknesses, planning and
enhancing instruction, and/or checking progress or status in
preparation for decision making.
 Information may be collected using various instruments including
tests, observations of students, checklists, questionnaires and
interviews.
Cont.
 It is a process of collecting, synthesizing, and
interpreting information to aid in decision-
making (Nitko, 1996; and Airasian, 1996).
 It can be qualitative and quantitative.
Evaluation
 It is the process of judging the quality of student learning on the basis
of established performance standards and assigning a value to
represent the worthiness or quality of that learning or performance.
 It is concerned with determining how well students have learned.
 When we evaluate, we are saying that something is good, appropriate,
valid, positive, and so forth.
 Evaluation includes both quantitative and qualitative descriptions of
student behavior and value judgment concerning the desirability of that
behavior
Cont.
 Evaluation = Quantitative description of
students’ behavior (measurement) + qualitative
description of students’ behavior (non-
measurement) + value judgment
 Evaluation involves judgment. The quantitative
values that we obtain through measurement
will not have any meaning until they are
evaluated against some standards.
Importance and Purposes of Assessment
 It can be summarized that assessment in education
focuses on:
helping LEARNING, and;
improving TEACHING.
 With regard to the learner, assessment is aimed at
providing information that will help us make decisions
concerning remediation, enrichment, selection,
exceptionality, progress and certification.
 With regard to teaching, assessment provides
information about the attainment of objectives, the
effectiveness of teaching methods and learning
materials.
Overall, assessment serves the following main purposes
in education.
1. Assessment is used to inform and guide teaching and
learning
2. Assessment is used to help students set learning
goals
3. Assessment is used to assign report card grades
In addition, assessment is used for:
 Grading
 Awards
 Diagnosis
 Placement
 Information on entry behavior
 Evidence of effectiveness and/or possible
shortcomings
 Opportunity to communicate to stakeholders
 Guidance
 Plan and adapt instruction
 Provide feedback and incentives(to motivate students)
Role of objectives in assessment
 The first step in planning any good teaching is to
clearly define the learning objectives or
outcomes.
 A learning objective is an outcome statement
that captures specifically what knowledge, skills, and
attitudes learners should be able to exhibit
following instruction.
 Effective assessment practice requires relating
the assessment procedures as directly as
possible to the learning objectives.
 Instructional objectives which are commonly
known as learning outcomes play a key role in
both the instructional and assessment process.
Cont.
 They serve as guides for both teaching and
learning, communicate the intent of instruction
to others, and provide guidelines for assessing
students’ learning.
 A learning outcome stated in this way clearly
indicates the kind of performance students are
expected to exhibit as a result of the
instruction.
 Well-stated learning outcomes make clear the
types of student performance we are willing to
accept as evidence that the instruction has
been successful.
Principles of Assessment
Different educators and school systems
have developed different sets of assessment
principles.
Miller, Linn and Gronlund (2009) have
identified the following general principles of
assessment.
 Clearly specifying what is to be assessed has
priority in the assessment process.
 An assessment procedure should be selected
because of its relevance to the characteristics or
performance to be measured.
 Comprehensive assessment requires a variety of
procedures.
 Proper use of assessment procedures requires an
awareness of their limitations.
Cont.
 The New South Wales Department of
Education and Training (2008) in Australia is
more inclusive and lists the following
principles:
 Assessment should be relevant
 Assessment should be appropriate
 Assessment should be fair.
 Assessment should be accurate
 Assessment should provide useful information
 Assessment should be integrated into the teaching
and learning cycle.
 Assessment should draw on a wide range of
evidence
 Assessment should be manageable
Assessment and Some Basic
Assumptions
 The quality of student learning is directly, although
not exclusively, related to the quality of teaching.
 To improve their effectiveness, teachers need first
to make their goals and objectives explicit and
then to get specific, comprehensible feedback on
the extent to which they are achieving those goals
and objectives.
 To improve their learning, students need to receive
appropriate and focused feedback early and often;
they also need to learn how to assess their own
learning.
Concept of Continuous
Assessment
 Continuous assessment is a learning process between teachers,
students and stakeholders (parents).
 It is a process that fosters dialogue between these stakeholders to bring
out the child’s best learning.
 It is also a holistic process that not only brings together multiple
stakeholders, but also integrates assessment and teaching as
interconnected activities that are integral to the child’s learning.
 It is an assessment approach which involves the use of a variety of
assessment instruments, assessing various components of learning.
Characteristics of Continuous
Assessment
Characteristics of continuous assessment.
a) Systematic
b) Comprehensive
c) Cumulative
d) Guidance-Oriented
A/ Systematic nature of CA
 It requires an operational plan which indicates what
measurements are to be made of the pupils’
performance, at what intervals or times during the
school year the measurements are to be made and the
results recorded, and the nature of the tools or
instruments to be used in the measurements.
 It is planned, graded to suit the age and experience of
the children and given at suitable intervals during the
school year.
Cont.
B/ Comprehensive nature of CA
 It is comprehensive in the sense that many types of instruments are
used in determining the performance.
 It means that continuous assessment is not focused on academic skills
alone.
 It embraces the cognitive, psychomotor and affective domains. A child
is assessed as a total entity using all the psychometric devices, such as
test and non-test techniques.
Cont.
(c) Cumulative Nature of CA: It is cumulative since any decision to be
made at any point on the pupil takes into account all previous
decisions.
d) Guidance-Oriented Nature of CA:
 It means that the information collected is to be used for educational,
vocational and personal- social decision-making for the child.
Guidance and counseling activities thrive better on valid, sequential,
systematic, continuous, cumulative and comprehensive information.
Assessment
1. It measures what a student knows, understands and
can do.
2. It is fair
3. It is carried out periodically over a term and over a
year
4. There are a number of different types of assessment
activities
5. Assessment and instruction are similar/going side by
side
6. There is a lack of pupil fear
7. Focus is on pupil progress
Assessment, Learning, and the Involvement of
Students
Classroom assessment promotes learning when
teachers use it in the following ways:
 When they use it to become aware of the knowledge, skills, and
beliefs that their students bring to a learning task, and;
 When they use this knowledge as a starting point for new
instruction, and monitor students’ changing perceptions as
instruction proceeds.
 When teachers and students collaborate and use ongoing
assessment and pertinent feedback to move learning forward.
 When classroom assessment is frequent and varied, teachers can
learn a great deal about their students.
Cont.
Assessment provides the feedback loop for this process.
 By increasing students’ motivation. Motivation is essential for
students’ engagement in their learning. The higher the motivation, the
more time and energy a student is willing to devote to any given task.
When a student finds the content interesting and the activity
enjoyable, they show sustained concentration and effort.
 Assessment can be a motivator, not through reward and punishment,
but by stimulating students’ intrinsic interest. Assessment can enhance
student motivation by:
Cont.
• emphasizing progress and achievement rather than failure
• providing feedback to move learning forward
• reinforcing the idea that students have control over, and responsibility
for, their own learning
• building confidence in students so they can and need to take risks
• being relevant, and appealing to students’ imaginations
• providing the scaffolding that students need to genuinely succeed
Assessment and Teacher Professional
Competence
o A teacher should have some basic competencies in classroom
assessment so as to be able to effectively assess his/her students’
learning.
o Assessment activities occur prior to, during, and after instruction.
o In the American education system a list of seven standards for teacher
competence in educational assessment of students has been developed.
Cont.
The seven standards are stated below.
Teachers should be skilled in:-
1. Choosing assessment options appropriate for instructional decisions.
2. Developing assessment methods appropriate for instructional
decisions.
3. Administering, scoring, and interpreting the results of assessment
methods.
4. Using assessment results when making decisions about individual
students, planning teaching, developing curriculum, and school
improvement
Cont.
5. Developing valid student grading procedures that use
student assessments
6. Communicating assessment results to students, parents,
other audiences, and educators.
7. Recognizing unethical, illegal, and otherwise inappropriate
assessment methods and uses of assessment
information.
UNIT TWO
ASSESSMENT STRATEGIES, METHODS, AND
TOOLS
There are three pairs of assessment typologies:
o formal vs. informal,
o formative vs. summative, and
o criterion-referenced vs. norm-referenced.
Formal and Informal Assessment
 Formal Assessment: implies a written document,
such as a test, quiz, or paper. A formal assessment
gives a numerical score or grade based on student
performance.
 Informal Assessment: "informal" indicates
techniques that can easily be incorporated into
classroom routines and learning activities.
 It can be used at any time without interfering with
instructional time.
 It usually occurs in a more casual manner and may
include observation, inventories, checklists, rating
scales, rubrics, performance and portfolio
assessments, participation, and peer and self-assessment.
Cont.
 Methods for informal assessment can be
unstructured (e.g., student work samples, journals)
and structured (e.g., checklists, observations).
 Unstructured methods frequently are somewhat
more difficult to score and evaluate, but they can
provide a great deal of valuable information about
the skills of the students.
 Structured methods can be reliable and valid
techniques when time is spent creating the
"scoring" procedures.
 Informal assessments actively involve the
students in the evaluation process - they are not
just paper-and-pencil tests.
Formative and Summative Assessments
 Based on their functional role during classroom instruction,
assessments are classified as formative and summative.
 Formative Assessment: used to shape and guide classroom
instruction.
 Include both informal and formal assessments
 It can be given before, during, and even after instruction, the
goal is to improve instruction.
 It consists of ongoing assessments, appraisals, and observations in a
classroom.
 It serves a diagnostic function for both students and
teachers.
Cont.
 It helps students to adjust, improve their performance or engagement
in the unit.
 Teachers receive feedback on the quality of learners’ understandings
and consequently, can modify their teaching approaches to provide
enrichment or remedial activities to more effectively guide learners.
 It is also known by the name ‘assessment for learning’ or ‘continuous
assessment’.
Cont.
 Summative Assessment: comes at the end of a course (or unit) of
instruction.
 It evaluates the quality of students’ learning and assigns a mark to
students’ work based on how effectively learners have addressed the
performance standards and criteria.
 Assessment tasks conducted during the progress of a semester may be
regarded as summative in nature if they only contribute to the final
grades of the students.
 A particular assessment task can be both formative and summative
Criterion-referenced and Norm-referenced Assessments
Based on interpreting student performance:
 Criterion-referenced Assessment: is carried out against
previously specified criteria and performance standards of
the subject matter.
 Grade is assigned on the basis of the standard the student
has achieved on each of the criteria.
 Norm-referenced Assessment: This type of assessment
aims at determining student performance
based on a position within a cohort of students – the norm
group.
Assessment Strategies
 Assessment strategy refers to those assessment tasks
(methods/approaches/activities) in which students are
engaged to ensure that all the learning objectives of a
subject, a unit or a lesson have been adequately
addressed.
Criteria for selecting assessment strategies include:
 Its appropriateness for the particular behavior being
assessed.
 It should also be related to the course material and
relevant to students’ lives.
Cont.
 There are many different ways to categorize learning goals for
students.
o Knowledge and understanding: What facts do students know outright?
What information can they retrieve? What do they understand?
o Reasoning proficiency: Can students analyze, categorize, and sort into
component parts? Can they generalize and synthesize what they have
learned?
o Skills: We have certain skills that we want students to master such as
reading fluently, working productively in a group, making an oral
presentation, speaking a foreign language, or designing an experiment.
Cont.
 Ability to create products: Another kind of learning
target is student-created products - tangible evidence
that the student has mastered knowledge, reasoning,
and specific production skills. Examples include a
research paper, a piece of furniture, or artwork.
 Dispositions: We also frequently care about student
attitudes and habits of mind, including attitudes
toward school, persistence, responsibility, flexibility,
and desire to learn.
Cont.
 Of the various assessment strategies that can be used by
classroom teachers, some are described below.
Classroom presentations:
 require students to verbalize their knowledge, select and present
samples of finished work, and organize their thoughts about a topic in
order to present a summary of their learning.
 Conferences: a conference is a formal or informal meeting between the teacher and
a student for the purpose of exchanging information or sharing ideas.
Cont.
 Exhibitions/Demonstrations: an exhibition/demonstration is a performance in a public setting,
during which a student explains and applies a process, procedure, etc.,
in concrete ways to show individual achievement of specific skills and
knowledge.
 Interviews: an interview is a face-to-face conversation in which teacher
and student use inquiry to share their knowledge and understanding of
a topic or problem.
This form of assessment can be used by the teacher to:
 explore the student’s thinking;
 assess the student’s level of understanding of a concept or procedure;
and
 gather information, obtain clarification, determine positions, and probe
for motivations
Cont.
o Observation: a process of systematically
viewing and recording students while they work,
for the purpose of making instructional decisions.
o It can take place at any time and in any setting. It
provides information on students' strengths and
weaknesses, learning styles, interests, and
attitudes.
 Observations may be informal or highly structured,
and incidental or scheduled over different periods of
time in different learning contexts.
There are various observational techniques. They
include anecdotal records, checklists, rating scales,
socio-metric techniques.
Cont.
Performance tasks: students create, produce, perform,
or present works on "real world" issues.
It may be used to assess a skill or proficiency, and
provides useful information on the process as well as
the product.
Portfolios: a portfolio is a collection of samples of a student’s work
over time.
o It offers a visual demonstration of a student’s
achievement, capabilities, strengths, weaknesses,
knowledge, and specific skills, over time and in a
variety of contexts.
o For a portfolio to serve as an effective assessment
instrument, it has to be focused, selective, reflective,
and collaborative.
Cont.
 Questions and answers: Perhaps this is a widely used
strategy by teachers with the intention of involving
their students in the learning and teaching process. In
this strategy, the teacher poses a question and the
student answers verbally, rather than in writing.
Cont.
Students’ self-assessments:
 It is the student’s own assessment of personal progress in
terms of knowledge, skills, processes, or attitudes. Self-
assessment leads students to a greater awareness and
understanding of themselves as learners
Checklists, Rating Scales and Rubrics
 These are tools that state specific criteria and allow teachers and
students to gather information and to make judgments about what
students know and can do in relation to the outcomes.
Cont.
 Checklists usually offer a yes/no format in relation to student
demonstration of specific criteria. They may be used to record
observations of an individual, a group or a whole class.
 Rating Scales allow teachers to indicate the degree or frequency of the
behaviors, skills and strategies displayed by the learner. Rating scales
state the criteria and provide three or four response selections to
describe the quality or frequency of student work.
 Rubrics use a set of criteria to evaluate a student's performance. They
consist of a fixed measurement scale and detailed description of the
characteristics for each level of performance. These descriptions focus
on the quality of the product or performance and not the quantity.
Cont.
The purpose of checklists, rating scales and rubrics is
to:
 provide tools for systematic recording of observations
 provide tools for self-assessment
 provide samples of criteria for students prior to
collecting and evaluating data on their work
 record the development of specific skills, strategies,
attitudes and behaviors necessary for demonstrating
learning
 clarify students' instructional needs by presenting a
record of current accomplishments.
Cont.
 One- Minute paper: During the last few minutes of the
class period, you may ask students to answer on a
half-sheet of paper: "What is the most important point
you learned today?" and, "What point remains least
clear to you?"
 Muddiest Point: This is similar to ‘One-Minute Paper’
but only asks students to describe what they didn't
understand and what they think might help.
 It is to determine which key points of the lesson were
missed by the students.
 Here also you have to review the responses before the next class
meeting and use them to clarify, correct, or elaborate.
Cont.
 Student- generated test questions: You may allow students to write test
questions and model answers for specified topics, in a format
consistent with course exams. This will give students the opportunity
to evaluate the course topics, reflect on what they understand, and
what good test items are. You may evaluate the questions and use the
good ones as prompts for discussion.
 Tests: this is the type of assessment that you are most familiar with. A
test requires students to respond to prompts in order to demonstrate
their knowledge (orally or in writing) or their skills (e.g., through
performance).
Assessment in large classes
 Due to time and resource constraints, teachers
often use less time-demanding assessment
methods.
 Assessment issues associated with large classes
include:
1. Surface Learning Approach: teachers rely on time-efficient and
exam-based assessment methods for assessing large classes, such
as multiple-choice and short-answer examinations.
 Assess learning at the lower levels of intellectual complexity.
 Students tend to adopt a surface rote learning approach when
preparing for these kinds of assessment methods.
Cont.
 Feedback is often inadequate
 Inconsistency in marking: Large class usually consists of a
diverse and complex group of students. The issues of different
perception towards assessments, cultural and educational
background, prior knowledge and level of interest to the subject
all pose challenges to the fairness of marking and grading.
 Difficulty in monitoring cheating and plagiarism
 Lack of interaction and engagement
 When teachers raise questions in large classes, many students
are not willing to respond.
Cont.
Assessment approaches in large classes for effective student
learning include:
Front ending: by putting in increased effort at the
beginning to set students up for the
work they are going to do, the work submitted
can be improved.
 Therefore, the time needed to mark it is reduced.
Making use of in-class assignments
In-class assignments are usually quick and relatively easy
to mark and provide feedback on, and help you to identify gaps
in understanding.
Cont.
• Self-assessment reduces the marking load
because it ensures a higher quality of work that is
submitted, thereby minimizing the amount of time
expended on marking and feedback.
Peer-assessment
• provide useful learning experiences for students
at the same time as reducing the marking load of
staff.
Cont.
 Group Assessments: significantly reduce the marking
load if the group submits only one piece of work.
 The major problem is that group members may not
contribute equally, so how are they to be rewarded
fairly?
 Changing the assessment method, or at least
shortening it
Being faced with large numbers of students will
present challenges but may also provide opportunities
to either modify existing assessments or to explore
new methods of assessment.
Selecting and developing assessment methods
and tools
 The process of assessing student performance
must begin with educational outcomes.
 A wide variety of tools are available for assessing
student performance.
Cont.
Constructing Tests
 Classroom tests can consist of objective test
items and performance assessments.
 Objective tests are highly structured and require
the test taker to select the correct answer from
several alternatives or to supply a word or short
phrase to answer a question or complete a
statement.
 They are called objective because they have a
single right or best answer that can be determined
in advance.
 Performance assessment tasks permit the student
to organize and construct the answer in essay
form, by using equipment, generating hypotheses,
making observations, constructing something, or performing similar tasks.
Cont.
Constructing Objective Test Items
 There are various types of objective test items.
 These can be classified into supply type items and selection type items.
 Supply type items include completion items and short answer
questions.
 Selection type test items include True/False, multiple choice and
matching.
True/False Test Items
Advantages of true/false items are that they:
 do not require much of the student's time for
answering.
Cont.
 allow a teacher to cover a wide range of content.
 can be scored quickly, reliably, and objectively
 can measure higher mental processes of understanding, application,
and interpretation.
Disadvantages
 promote memorization of factual information
 encourage students to guess and cheat.
 do not discriminate between students of varying ability as well as other test
items
 include more irrelevant clues than other item types
 lead a teacher to favor testing of trivial knowledge.
 items are not always unequivocally (clearly) true or false (the difficulty of
writing statements which are clearly true or false)
Suggestions
The following will help you construct good quality true/false test items.
 Avoid negative statements, and never use double negatives.
 If opinion is used, attribute it to some source, unless the
ability to identify opinion is being specifically measured.
 Restrict single-item statements to single concepts
 Avoid ambiguous words and broad general
statements
 Use an approximately equal number of items, reflecting the
two categories tested
Cont.
 Make statements representing both categories equal in
length.
 Avoid trivial statements-which have little importance
for knowledge and understanding.
 Avoid specific determiners like most, all, always,
sometimes, in most cases, etc.
 Avoid long complex sentences.
Matching Items
 A matching item consists of two lists of words or
phrases.
 The test-taker must match components in one list (the
premises, on the left) with components in the other list
(the responses, presented on the right), according to a
particular kind of association.
 It can cover a good deal of content in an efficient
fashion.
Cont.
 Useful for measuring memorized factual information that
is related.
 Its compact form makes it possible to measure a large
amount of related factual material in a relatively
short time.
 It is easy to score and easy to construct.
Limitations
 Restricted to the measurement of factual information
based on rote learning.
 Difficulty of finding homogeneous material.
 May tempt the teacher to include material which is less
significant.
 Susceptible to irrelevant clues.
In teacher made matching type tests, some of the
more common faults are found to be that:
 the set directions are vague
 the items to be matched are excessively long
 the list of responses lacks homogeneity
 the premises are vaguely stated.
Suggestions for the construction of good matching items:
1. Use fairly brief lists, placing the shorter entries on
the right. The words and phrases that make up the
premises should be short, and those that make
up the responses should be shorter still. If too
long, students tend to lose track of what they
originally set out to look for.
2. Employ homogeneous lists.
3. List responses in a logical order
4. Describe the basis for matching and the number of
times a response can be used:
Cont.
5. Try to place all premises and responses for any
matching item on a single page
6. Make sure that there are never multiple correct
responses for one stem
7. Avoid giving inadvertent grammatical clues to the
correct response
8. Use no more than 10 items in one set.
9. Provide more responses than stems to make
process-of-elimination guessing less effective.
10. Use capital letters for the response signs rather than
lower-case letters.
Short Answer/Completion Test Items
 The short answer type uses a direct question, whereas
the completion test item consists of an incomplete
statement that the student must complete.
 The short-answer test items are one of the easiest to
construct, partly because of the relatively simple
learning outcomes it usually measures.
 Except for the problem-solving outcomes measured in
Mathematics and Science, it is used almost exclusively
to measure the recall of memorized information.
Cont.
 Partial knowledge, which might enable students to choose the correct
answer on a selection item, is insufficient for answering a short
answer test item correctly.
 There are two limitations:
 unsuitable for assessing complex learning outcomes.
 difficulty of scoring.
 Scoring is especially difficult where the item is not clearly phrased to
require a definitely correct answer, and where the student’s spelling
ability affects the response.
Cont.
The following suggestions will help to make short-
answer type test items function as intended.
1. Word the item so that the required answer is both brief
and specific.
2. Do not take statements directly from textbooks to use
as a basis for short-answer items.
3. A direct question is generally more desirable than an
incomplete statement.
4. If the answer is to be expressed in numerical units,
indicate the type of answer wanted.
Cont.
5. Avoid using a long quote with multiple blanks to
complete.
6. Require only one word or phrase in each blank.
7. Facilitate scoring by having the students write their
responses on lines arranged in a column to the left of
the items.
8. Ask students only for important terms or expressions in
completion items.
9. Avoid providing grammatical clues to the correct
answer by using a/an, etc., instead of specific
modifiers.
Multiple-Choice Items
 This is the most popular and versatile type of selected-
response item.
 It can effectively measure learning outcomes
measured by the short-answer item, the true-false
item, and the matching item types.
 It can measure a variety of complex cognitive learning
outcomes.
 A multiple-choice item consists of a problem and a list
of suggested solutions.
 A student is first given either a question or a partially
complete statement. This part of the item is referred to
as the item’s stem.
 Then three or more potential answer-options are
presented. These are usually called alternatives, choices, or options.
Cont.
Variants in a multiple-choice item:
(1) The stem consists of a direct question or an incomplete
statement, and
(2) The student chooses the alternative that is the correct
answer or the best answer
Advantages
 widespread applicability to the assessment of cognitive skills and
knowledge,
 It’s possible to make them quite varied in the levels of difficulty
they possess.
 Items are fairly easy to score.
 The results are amenable to diagnosis.
 They provide greater structure to the question (e.g., a stem such as
“South America . . .” followed by alternatives a) to d)). The alternatives make the intended task clear.
Cont.
 They can test students’ ability to think quickly under
pressure.
 They can be easier to modify in order to test students
at the appropriate level.
Limitations/weaknesses of multiple-choice items
 when students review a set of alternatives for an item,
they may be able to recognize a correct answer. So it
can present an exaggerated picture of a student’s
understanding or competence, which might lead
teachers to invalid inferences.
 can never measure a student’s ability to creatively
synthesize content of any sort.
 They are difficult to construct, especially getting
plausible distracters is difficult.
 It is relatively labour intensive and time consuming to
prepare the test.
Cont.
 In an effort to come up with the necessary number of plausible
alternatives, novice item-writers sometimes toss in some alternatives
that are obviously incorrect.
 They are not well adapted to measure some learning outcomes in
mathematics, chemistry and physics etc.
 There is a possibility that students may guess the correct answer if it is
subjected to many irrelevant clues.
 As a result of recycling questions, students may get access to the
questions and achieve good marks without achieving the instructional
objectives.
Here are some useful rules for you to follow:
1. The question or problem in the stem must be self-
contained. The stem should contain as much of the
item’s content as possible, thereby rendering the
alternatives much shorter than would otherwise be the
case.
2. Avoid negatively stated stems. Just as with the
True/False items, negatively stated stems can create
genuine confusion in students.
3. Each alternative must be grammatically consistent
with the item’s stem.
4. Make all alternatives plausible, but be sure that one of
them is indisputably the correct or best answer.
Cont.
5. Randomly use all answer positions in approximately
equal numbers.
6. List alternatives on separate lines rather than
including them as part of the stem so that they can be
clearly distinguished.
7. Keep all alternatives in a similar format (e.g., all
phrases, all sentences, etc.).
8. Try to make alternatives for an item approximately the
same length. (Making the correct response
consistently longer is a common error.)
9. Use misconceptions which students have indicated in
class or errors commonly made by students in the
class as the basis for incorrect alternatives.
Cont.
10. If possible, do not use “all of the above” and “none
of the above” or use them sparingly since these
alternatives are often chosen on the basis of
incomplete knowledge.
11. Never use words such as “all,” “always,” and
“never”; they are likely to signal incorrect options.
12. Use capital letters (A, B, C, D, E) on tests as
responses rather than lower-case letters (“a” gets
confused with “d” and “c” with “e” if the type or
duplication is poor).
13. Try to write items with equal numbers of alternatives
in order to avoid asking students to continually adjust
to a new pattern caused by different numbers.
14. Put the incomplete part of the sentence at the end
rather than the beginning of the stem.
Suggestions for Constructing good
distracters
 Base distracters on the most frequent errors made by students in
homework, assignments or class discussions related to that
concept.
 Use words in the distracters that are associated with words in the
stem (for example, explorer-exploration).
 Use concepts from the instructional material that have similar
vocabulary or were used in the same context as the correct
answer.
 Use distracters that are similar in content or form to the correct
answer (for example, if the correct answer is the name of a
place, have all distracters be places instead of using names of
people and other facts).
 Make the distracters similar to the correct answer in terms of
complexity, sentence structure, and length.
Constructing Performance Assessments
 Performance assessments measure such outcomes as the ability to
recall, organize, and integrate ideas; the ability to express oneself in
writing; and the ability to create.
 The most familiar form of performance-based
assessment is the essay question.
 Essay questions suit learning outcomes concerned with the ability to
conceptualize, construct, organize, relate, and evaluate
ideas.
Cont.
Essay questions can be classified into two types –
restricted-response essay questions and extended
response essay questions.
o Restricted-response: these usually limit both the content
and the response. The content is usually restricted by
the scope of the topic to be discussed.
o Extended response
These types of questions allow students:
 To select any factual information that they think is
relevant,
 To organize the answer in accordance with their best
judgment;
 To integrate and evaluate ideas as they deem
appropriate.
Cont.
In addition to measuring higher-order thinking skills, the
advantages also include the following:
 Extended-response essays focus on the integration
and application of thinking and problem solving skills.
 Essay assessments enable the direct evaluation of
writing skills.
 Essay questions, as compared to objective tests, are
easy to construct.
 Essay questions have a positive effect on students
learning.
Cont.
Limitations
 The most common limitation is the unreliability of scoring.
Thus, the same paper may be scored differently by
different teachers, and even the same teacher may
give different scores for the same paper at different
times.
 The amount of time required for scoring.
 The limited sampling of content they provide.
The improvement of the essay question requires
attention to two problems:
 How to construct essay questions that call forth the
desired student response, and
 How to score the answers so that achievement is
reliably measured
Suggestions for the construction of good
essay questions
 Restrict the use of essay questions to those learning
outcomes that cannot be measured satisfactorily by
objective items
 Structure items so that the student’s task is explicitly
bounded
 For each question, specify the point value, an
acceptable response-length, and a recommended time
allocation
 Employ more questions requiring shorter answers
rather than fewer questions requiring longer answers
 Don’t employ optional questions
 Test a question’s quality by creating a trial response to
the item.
Guidelines in the scoring of essay items
The following helps to make scoring easier and more
reliable.
 Ensure that you are emotionally and mentally composed
before scoring
 All responses to one item should be scored before
moving to the next item
 Write out in advance a model answer to guide yourself
in grading the students’ answers
 Shuffle exam papers after scoring every question
before moving to the next
 The names of test takers should not be known while
scoring to avoid bias
Table of Specification and Arrangement of
Items
Table of Specification
 The development of valid, reliable and usable
questions involves proper planning.
 The validity, reliability and usability of such tests
depend on the care taken in their planning and
preparation.
 Planning helps to ensure that the test covers the pre-
specified instructional objectives and the subject
matter (content).
 Planning a classroom test involves identifying the
instructional objectives earlier stated and the subject
matter (content) covered during the teaching/learning
process.
Planning a classroom test.
1. Determine the purpose of the test;
2. Describe the instructional objectives and content
to be measured.
3. Determine the relative emphasis to be given to
each learning outcome;
4. Select the most appropriate item formats (essay
or objective);
5. Develop the test blue print to guide the test
construction;
6. Prepare test items that are relevant to the
learning outcomes specified in the test plan;
Cont.
7. Decide on the pattern of scoring and the
interpretation of result;
8. Decide on the length and duration of the test, and
9. Assemble the items into a test, prepare direction
and administer the test.
 The instructional objectives of the course are critically
considered while developing the test items.
Cont.
 A table of specification is a two-way table that matches
the objectives and content taught with the level at
which you expect your students to perform.
 It contains an estimate of the percentage of the test to
be associated with each topic at each level at which it is
to be measured.
 In effect we establish how much emphasis to give to
each objective or content.
Cont.
Developing a table of specification involves:
1. Preparing a list of learning outcomes, i.e. the type
of performance students are expected to
demonstrate
2. Outlining the contents of instruction, i.e. the area
in which each type of performance is to be
shown, and
3. Preparing the two way chart that relates the
learning outcomes to the instructional content.
Cont.
Contents       Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total  Percentage
Air pressure       2            2             1           1         -          -          6      24%
Wind               1            1             1           1         -          -          4      16%
Temperature        2            2             1           1         -          1          7      28%
Rainfall           1            2             1           -         1          -          5      20%
Clouds             1            1             -           1         -          -          3      12%
Total              7            8             4           4         1          1         25     100%
Cont.
 The rows show the content areas from which the test
is to be sampled; and the columns indicate the level
of thinking students are required to demonstrate in
each of the content areas.
 Thus, the test items are distributed among each of the
five content areas with their corresponding
representation among the six levels of the cognitive
domain.
 The percentage row and column also show the
degree of representation of both the contents and
levels of the cognitive domain in this particular test.
 Objectives that are more important should get more
representation in the test items.
 Similarly, content areas on which you have spent
more instructional time should be allotted more test items.
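To make the blueprint arithmetic concrete, here is a minimal sketch in Python (the topic names and item counts are taken from the sample table above; the data structure and printed layout are illustrative, not part of the original procedure):

```python
# A two-way table of specification stored as topic -> item counts,
# one count per cognitive level; percentages are derived from totals.
LEVELS = ["Knowledge", "Comprehension", "Application",
          "Analysis", "Synthesis", "Evaluation"]

blueprint = {
    "Air pressure": [2, 2, 1, 1, 0, 0],
    "Wind":         [1, 1, 1, 1, 0, 0],
    "Temperature":  [2, 2, 1, 1, 0, 1],
    "Rainfall":     [1, 2, 1, 0, 1, 0],
    "Clouds":       [1, 1, 0, 1, 0, 0],
}

total_items = sum(sum(counts) for counts in blueprint.values())  # 25

# weight of each topic in the whole test
for topic, counts in blueprint.items():
    share = 100 * sum(counts) / total_items
    print(f"{topic:13s} {sum(counts):2d} items  {share:.0f}%")

# number of items planned at each cognitive level
for i, level in enumerate(LEVELS):
    per_level = sum(counts[i] for counts in blueprint.values())
    print(f"{level:13s} {per_level:2d} items")
```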
Cont.
 There are also other ways of developing a test blue
print.
 One of these is a way of showing the distribution of test
items among the content areas and the type of test
items to be developed from each content area.
Cont.
Contents       True/False  Matching  Short answer  Multiple choice  Total  Percentage
Air pressure        1          1          1               3            6      24%
Wind                1          1          1               1            4      16%
Temperature         1          2          1               3            7      28%
Rainfall            1          1          1               2            5      20%
Clouds              1          -          1               1            3      12%
Total               5          5          5              10           25     100%
Percent            20%        20%        20%             40%                 100%
Arrangement of test items
For most purposes the items can be arranged by a
systematic consideration of:
 The type of items used
 The learning outcomes measured
 The difficulty of the items, and
 Subject matter measured
First, the items should be arranged in sections by item
type. That is, all true-false items should be grouped
together, then matching items, then all short answer or
completion items, and then all multiple choice items.
Cont.
 Extended-response essay questions and performance
tasks usually take so much time that they would be
administered alone.
 If combined with some of the other types of items and
tasks, the extended response tasks should come last.
Cont.
This has the following advantages:
- we will have a single set of directions for each type
- students can maintain the same mental set throughout
each section
- scoring will be easier
Linn and Gronlund (2000: 350-351) suggest the following
arrangement order of items by format
- True False
- Matching
- Short Answer/completion
- Multiple choice
- Essay
Cont.
 For this purpose, items that measure similar outcomes
should be placed together and then arranged in order
of ascending difficulty.
For example, the items under the multiple choice section
might be arranged in the following order:
 knowledge of terms,
 knowledge of specific facts,
 knowledge of principles, and
 application of principles.
 Keeping together items that measure similar learning
outcomes is especially helpful in determining the type
of learning outcomes causing students the greatest
difficulty.
Cont.
 If it is not feasible to group the items by the learning
outcomes measured, then it is still desirable to
arrange them in order of increasing difficulty.
 Beginning with the easiest items and proceeding
gradually to the most difficult has a motivating effect
on students.
 Encountering difficult items early in the test often
causes students to spend a disproportionate amount
of time on such items.
Cont.
o Items within each test section can be arranged in
order of increasing difficulty.
o To summarize, the most effective method for
organizing items in the typical classroom test is to:
 Form sections by item type
 Group the items within each section by the
learning outcomes measured, and
 Arrange both the sections and the items
within sections in an ascending order of
difficulty, and by subject matter.
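As a rough sketch of this arrangement in Python (the item records and p-values below are hypothetical; a higher p-value means an easier item):

```python
# Group items by format in the suggested order, then sort each
# section from easiest to hardest (descending p-value).
FORMAT_ORDER = ["true_false", "matching", "short_answer",
                "multiple_choice", "essay"]

items = [
    {"id": 1, "format": "multiple_choice", "p_value": 0.45},
    {"id": 2, "format": "true_false",      "p_value": 0.90},
    {"id": 3, "format": "multiple_choice", "p_value": 0.80},
    {"id": 4, "format": "true_false",      "p_value": 0.60},
]

arranged = sorted(
    items,
    key=lambda it: (FORMAT_ORDER.index(it["format"]), -it["p_value"]),
)
for it in arranged:
    print(it["id"], it["format"], it["p_value"])
# prints items 2, 4 (true/false), then 3, 1 (multiple choice)
```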
Administration of Tests
 It is the procedure of actually presenting the learning
task that the examinees are required to perform in
order to ascertain the degree of learning that has taken
place during the teaching-learning process.
 It is as important as the process of preparing the test.
 This is because the validity and reliability of test scores
can be greatly reduced when a test is poorly
administered.
Cont.
 This requires the provision of a physical and
psychological environment which is conducive to
students making their best efforts.
Conditions that may create test anxiety in students
include:
 Threatening students with tests if they do not behave
 Warning students to do their best “because the test
is important”
 Telling students they must work fast in order to
finish on time.
 Threatening dire consequences if they fail.
i. Ensuring Quality in Test Administration
 Guidelines and steps in ensuring quality in test
administration are:
 Collect the question papers on time from the course
teacher.
 Ensure compliance with the stipulated sitting
arrangements
 Ensure orderly and proper distribution of questions
papers to the test takers.
 Do not talk unnecessarily before the test.
 Avoid unnecessary remarks, instructions or threat that
may develop test anxiety.
 Remind the test takers of the need to avoid
unprofessional conduct
Cont.
 Avoid giving hints to test takers who ask about particular
items.
 Make corrections or clarifications to the test takers whenever
necessary.
 Keep interruptions during the test to a minimum
ii) Credibility and Civility
Credibility is the value the eventual recipients and users of
results of assessment place on the result with respect to the
grades obtained, certificates issued or the issuing institution.
Cont.
Civility is whether the persons being assessed
are in such conditions as to give their best
without hindrances and burdens in the attributes
being assessed and whether the exercise is seen
as integral to or as external to the learning
process.
Cont.
 Instructions: A test should contain a set of instructions,
which are usually of two types.
 One is the instruction to the test administrator, while
the other one is to the test taker.
 The instruction to the test administrator should explain
how the test is to be administered, the arrangements to
be made for proper administration of the test, and the
handling of the scripts and other materials.
 The instructions to the administrator should be clear
for effective compliance. For the test takers, the
instruction should direct them on the amount of work
to be done or the tasks to be accomplished.
Cont.
 The instruction should explain how the test should be
performed. The language used for the instruction
should be appropriate to the level of the test takers.
The administrators should explain the test takers’
instructions for proper understanding, especially when
the ability to understand and follow instructions is not
part of the test.
 Duration of the Test: The time for accomplishing the
test is technically important in test administration and
should be clearly stated for both the test
administrators and test takers. Ample time should be
provided for candidates to demonstrate what they
know and what they can do. The duration of test
should reflect the age and attention span of the test
takers and the purpose of the test.
Cont.
 Venue and Sitting Arrangement: The test environment
should be learner friendly.
 Adequate physical conditions, such as work space,
good and comfortable writing desks, proper lighting,
good ventilation, moderate temperature, conveniences
within reasonable distance, and the serenity necessary for
maximum concentration, should be provided.
 Adequate lighting, good ventilation and moderate
temperature reduce test anxiety and loss of
concentration.
 Other necessary conditions: the questions and
question papers should be reader-friendly, with bold
characters, neat, decent, clear and appealing, and not
such as to intimidate test takers into mistakes.
UNIT THREE
ITEM ANALYSIS
It is the process involved in examining or analyzing
testees’ responses to each item on a test with the basic
intent of judging the quality of the items.
It is the process of examining students’ responses to each
item to determine the quality of test items.
Item analysis involves determining the difficulty level and
discrimination power of test items, and judging how
effectively distracters are functioning in the case of multiple-
choice items.
It helps to determine the adequacy of the items
within a test as well as the adequacy of the test
itself.
Cont.
Some of the reasons for Item Analysis are :
1) Identify content that has not been adequately covered and
should be re-taught,
2) Provide feedback to students,
3) Determine if any items need to be revised, be used again or
become part of an item file or bank.
4) Identify items that may not have functioned as they were
intended,
5) Direct the teacher's attention to individual student
weaknesses.
Item difficulty level index
 It is a measure of the proportion of examinees who answered the
item correctly; for this reason it is frequently called the p-value.
 If scores from all students in a group are included, the difficulty
index is simply the total percent correct.
 When there is a sufficient number of scores available (i.e., 100 or
more), difficulty indexes are calculated using scores from the top
and bottom 27 percent of the group.
Item analysis procedures
 Rank the papers in order from the highest to the lowest score.
 For each test item, tabulate the number of students in the upper &
lower groups who selected each option
Cont.
 Compute the difficulty of each item (the percentage of
students who got the item right).
 The item difficulty index can be calculated using the
following formula:

P = (Success in HSG + Success in LSG) / N

 Where, HSG = High Scoring Group
 LSG = Low Scoring Group
 N = the total number of students in the HSG and LSG
The difficulty indexes can range between 0.0 and
1.0 and are usually expressed as a percentage.
A higher value indicates that a greater proportion
of examinees responded to the item correctly, and
it was thus an easier item.
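A minimal sketch of the p-value computation in Python (the function name and example numbers are illustrative; the upper and lower scoring groups are assumed to have been formed already):

```python
def difficulty_index(correct_high, correct_low, n_high, n_low):
    """p-value: the proportion of the combined upper and lower
    scoring groups who answered the item correctly."""
    return (correct_high + correct_low) / (n_high + n_low)

# e.g., 18 of 27 upper-group and 9 of 27 lower-group students correct
p = difficulty_index(18, 9, 27, 27)
print(p)  # 0.5 -> an item of average difficulty
```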
Cont.
P-Value             Percent Range   Interpretation
>= 0.75             75-100          Easy
<= 0.25             0-25            Difficult
between .25 & .75   26-74           Average
o The average difficulty of a test is the average of the
individual item difficulties.
o For maximum discrimination among students, an average
difficulty of .60 is ideal.
Cont.
 For criterion-referenced tests, with their emphasis on
mastery-testing, many items on an exam form will have
p-values of .9 or above.
 Norm-referenced tests are designed to be harder
overall and to spread out the examinees’ scores. Thus,
many of the items on an NRT will have difficulty
indexes between .4 and .6.
Item discrimination index
 The index of discrimination is a numerical
indicator that enables us to determine whether the
question discriminates appropriately between
lower scoring and higher scoring students.
 When students who earn high scores are
compared with those who earn low scores, we
would expect to find more students in the high
scoring group answering a question correctly than
students from the low scoring group.
 In the case of very difficult items which no one in
either group answered correctly or fairly easy
questions which even the students in the low group
answered correctly, the numbers of correct answers
might be equal for the two groups.
 What we would not expect to find is a case in which
the low scoring students answered correctly more
frequently than students in the high group.
 Item discrimination index can be calculated using the
following formula:

D = (Success in HSG − Success in LSG) / (0.5 × (HSG + LSG))

 Where, HSG = High Scoring Group
 LSG = Low Scoring Group
Cont.
 The item discrimination index can vary from -1.00 to
+1.00.
 A negative discrimination index (between -1.00 and
zero) results when more students in the low group
answered correctly than students in the high group.
 A discrimination index of zero means equal numbers
of high and low students answered correctly, so the
item did not discriminate between groups.
Cont.
 A positive index occurs when more students in the
high group answer correctly than the low group.
 If the students in the class are fairly homogeneous
in ability and achievement, their test performance
is also likely to be similar, resulting in little
discrimination between high and low groups.
Cont.
 Questions that have an item difficulty index of 1.00 or
0.00 need not be included when calculating item
discrimination indices.
 An item difficulty of 1.00 indicates that everyone
answered correctly, while 0.00 means no one answered
correctly.
 Neither type of item discriminates between students.
 When computing the discrimination index, the scores
are divided into three groups with the top 27% of the
scores in the upper group and the bottom 27% in the
lower group.
Cont.
 The number of correct responses for an item by the
lower group is subtracted from the number of correct
responses for the item in the upper group.
 The difference is divided by the number of students in
either group. The process is repeated for each item.
The value is interpreted in terms of both:
 direction (positive or negative) and
 strength (non-discriminating to strongly-discriminating).
These values can range from -1.00 to +1.00.
Cont.
D-Value         Direction   Strength
> +.40          Positive    Strong
+.20 to +.40    Positive    Moderate
-.20 to +.20    None        Non-discriminating
< -.20          Negative    Moderate to strong
Cont.
 For a small group of students, an index of
discrimination for an item that exceeds .20 is
considered satisfactory.
 For larger groups, the index should be higher because
more difference between groups would be expected.
 The guidelines for an acceptable level of
discrimination depend upon item difficulty.
 For very easy or very difficult items, low discrimination
levels would be expected; most students, regardless
of ability, would get the item correct or incorrect as the
case may be.
 For items with a difficulty level of about 70 percent, higher
discrimination levels would be expected.
Cont.
 When an item is discriminating negatively, overall the
most knowledgeable examinees are getting the item
wrong and the least knowledgeable examinees are
getting the item right.
 A negative discrimination index may indicate that the
item is measuring something other than what the rest
of the test is measuring. More often, it is a sign that the
item has been mis-keyed.
Distracter Analysis
 One important element in the quality of a multiple choice
item is the quality of the item’s distracters. However,
neither the item difficulty nor the item discrimination index
considers the performance of the incorrect response
options, or distracters.
 A distracter analysis evaluates the effectiveness of the
distracters in each item by comparing the number of
students in the upper and lower groups who selected each
incorrect alternative (a good distracter will attract more
students from the lower group than the upper group).
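A minimal sketch of such a comparison in Python (the option counts and the key are hypothetical):

```python
# Counts of upper- and lower-group students choosing each option
# of one multiple-choice item; the key (correct answer) is "B".
KEY = "B"
upper = {"A": 2, "B": 20, "C": 3, "D": 2}
lower = {"A": 8, "B": 9,  "C": 7, "D": 3}

for option in upper:
    pull = lower[option] - upper[option]   # lower-group surplus
    if option == KEY:
        note = "key"
    elif pull > 0:
        note = "working distracter"   # attracts more low scorers
    else:
        note = "review this distracter"
    print(option, upper[option], lower[option], note)
```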
Cont.
 Just as the key, or correct response option, must be
definitively correct, the distracters must be clearly
incorrect (or clearly not the "best" option). In addition
to being clearly incorrect, the distracters must also be
plausible. That is, the distracters should seem likely
or reasonable to an examinee who is not sufficiently
knowledgeable in the content area.
 If a distracter appears so unlikely that almost no
examinee will select it, it is not contributing to the
performance of the item. In fact, the presence of one or
more implausible distracters in a multiple choice item
can make the item artificially far easier than it ought to
be.
Cont.
 It is not desirable to have one of the distracters chosen
more often than the correct answer. This result
indicates a potential problem with the question.
 If students do not know the correct answer and are
purely guessing, their answers would be expected to
be distributed among the distracters as well as the
correct answer.
 If one or more distracters are not chosen, the
unselected distracters probably are not plausible. If
the teacher wants to make the test more difficult, those
distracters should be replaced in future tests.
Cont.
 Whenever the proportion of examinees who selected a
distracter is greater than the proportion of examinees
who selected the key, the item should be examined to
determine if it has been mis-keyed or double-keyed.
 If examinees consistently fail to select a given
distracter, this may be evidence that the distracter is
implausible or simply too easy.
Item Banking
 Building a file of effective test items and assessment
tasks involves recording the items or tasks, adding
information from analyses of students' responses, and
filing the records by both the content area and the
objective that the item or task measures.
 Such a file is especially valuable in areas of complex
achievement, when the construction of test items and
assessment tasks is difficult and time consuming.
When enough high-quality items and tasks have been
assembled, the burden of preparing tests and
assessments is considerably lightened. Computerized item
banking makes these tasks even easier.
UNIT FOUR
RELIABILITY AND VALIDITY OF ASSESSMENT TOOLS
VALIDITY
 Validity is the most important idea to consider when preparing or
selecting a test or other measuring instrument for use. Drawing
correct conclusions and making sound decisions based on the data
obtained from an assessment is the essence of validity.
 If the test lacks validity, the information it provides is useless. The
validity of a test can be viewed as the "correctness" of the decisions or
conclusions made from performance of students gathered through the
tests.
 Validity has been defined as referring to the appropriateness,
meaningfulness, and usefulness of the decisions teachers make based
on the information they collect from their students using tests and
other instruments.
Cont.
 Validity evidence should be related to (1) performance on a universe of
items (content validity), (2) performance on some criterion
(criterion-related validity), or (3) the degree to which certain
psychological traits or constructs are actually represented by
test performance (construct validity).
 The term validity refers to the accuracy of test results of
students; that is, it addresses the question of how confident
can we be that a test actually indicates a person's true score
on a trait.
Types of Validity
 Validity is divided into three categories or types.
A. Content Validity
 The most relevant type of validity in the measurement of a
behavior is content validity.
 In assessing the content validity of a test, the teacher
asks, "To what extent do the test tasks require the
students who have taken the test to demonstrate all
aspects of the knowledge or other behavior being
measured?" This type of validity refers
to the adequacy of the assessment. According to
Whitely (1996), adequate assessment has two
components:
1. Relevance of the content of the test to the objectives
or behavior being measured, and
2. Representativeness of the content in the test.
 A relevant test assesses only those objectives that are
stated in the instruction.
 For a test to have high content validity, it should be a
representative sample of both the objectives and
contents being measured.
Cont.
1. For critical examination of the items of a measure in
relation to the behavior or the purpose of the tests, one
must make the following professional judgments:
A. Does the test content parallel the instructional
objectives to be assessed?
B. Do the items of the measure cover different aspects
of the course and objectives?
Criterion-Related Validity
 Criterion-related validity indicates whether the scores
on a test predict scores on a well-specified,
predetermined criterion.
 There are two types of criterion-related validity. They
are concurrent validity and predictive validity.
Cont.
Concurrent validity uses correlation coefficients to
describe the degree of relationship between the scores
of two tests of students given at about the same time.
A high relationship suggests that the tests are
assessing something similar. This type of evidence is
used in the development of new standardized tests or
other measuring instruments that measure – in a
different, perhaps more efficient, way – the same thing
as an old instrument. The purpose of doing this is
usually to substitute one test for another.
In predictive validity, the data on the criterion variable
are collected some time after the data on the predictor
variable are collected. Take the case of the ESLCE and
college performance: students take the national
examination in May, and their college performance data are
collected some months later, in February of the following
year. In both concurrent and predictive procedures a test
is related to a criterion measure.
Construct Validity
 The term construct refers to a psychological construct,
a theoretical conceptualization about an aspect of
human behavior that cannot be measured or observed
directly (Ebel and Frisbie, 1991).
 Construct validity is the interpretation or meaning that
is given to a set of scores from tests that assess a
behavior or trait that cannot be measured directly,
such as an unobservable trait like
intelligence, creativity, or anxiety. For example, we use
tests to measure intelligence, a variable
that we cannot directly observe.
 We infer it from the students’ test scores. Students
who scored high on the test are said to be intelligent.
Cont.
Face Validity
 Strictly speaking, face validity is not a major type of
validity. It refers to the degree to which the content of a
test looks valid, that is, the extent to which the test
appears to measure what it is intended to measure, to
those, for example, teachers who prepare the test items,
who administer it, or who evaluate it (Worthen et al., 1999).
Face validity may not be as important as content
validity, criterion-related validity or construct-related
validity from a measurement perspective.
 Validity is influenced by a number of factors. The
following are some major ones.
Factors in the test itself
 The following factors can prevent the test items from
functioning as intended and thereby lower the validity
of the interpretations from the assessment results. The
first five factors are equally applicable to
assessments requiring extended student performance
and to traditional tests. The last five factors apply most
directly to tests with fixed choice or short answer
items that are scored right or wrong.
Cont.
 Unclear directions
 Vocabulary and sentence structure that are too difficult
 Ambiguity in sentence structure
 Inadequate time limits
 Overemphasis of easy-to-assess aspects of the domain at the
expense of important but hard-to-assess aspects
 Test items inappropriate for the outcomes being
measured
 Poorly constructed test items
 Test too short
 Improper arrangement of items
 Identifiable pattern of answers
Cont.
Factors in administration and scoring
 In case of teacher made tests, such factors as insufficient
time, unfair aid to individual students who ask for help,
cheating, and unreliable scoring of student performances
tend to lower validity. In the case of published tests, failure
to follow the standard directions and time limits, giving
students unauthorized assistance, and errors in scoring
similarly contribute to lower validity.
Factors in student responses
 Some students may be bothered by emotional disturbances
that interfere with their performance. Others may be
frightened by the assessment situation and so are unable
to respond normally, and still others may not be motivated
to put forth their best effort.
RELIABILITY:
 The degree of consistency between two measures of
the same thing.
 The measure of how stable, dependable, trustworthy,
and consistent a test is in measuring the same thing
each time.
 Reliability can be defined as a measure of how
consistent our measurements are. Scores that are
highly reliable are accurate and can be reproduced.
Reliability refers to the consistency of assessment
results over time and with different samples of
students. It is the adequacy of assessment devices to
assess what they are supposed to assess again and
again.
Cont.
 Consistency can be (a) consistency over a period of
time (stability), (b) consistency over different forms
(equivalence), and (c) internal consistency.
 Consistent measurement is a necessary condition for
high quality educational and psychological testing.
 Although reliability is a necessary condition for valid
test scores, it is not a sufficient condition.
 Reliability affects validity and quality of decision.
Methods of estimating reliability
 There are several methods of estimating reliability of a
measuring instrument or a test. The common ones are
stability, equivalence, stability and equivalence, internal
consistency, and rater agreement.
Stability (Test-retest)
 A coefficient of stability is obtained by correlating scores
from the same test of a group of individuals on two
different occasions. If the scores of the individuals are
consistent (that is, if those scoring high the first time also
score high the second time, and so on) then the correlation
coefficient, and the reliability, are high. This test-retest
procedure assumes that the characteristic measured
remains constant.
Cont.
Equivalence (Parallel forms)
 It is obtained by giving two forms (with equal content,
means, and variances) of a test to the same group on
the same day and correlating these results. Here we
are determining how confidently we can generalize a
person’s score to what he would receive if he took a
test composed of similar but different questions. When
two equivalent or parallel forms of the same
instrument are administered to the same group of
students at about the same time, and the scores are
related, the reliability that results is a coefficient of
equivalence.
Cont.
Equivalence and Stability
 When a teacher needs to give a pretest and posttest to
assess a change in behavior, a reliability coefficient of
equivalence and stability should be established. In this
procedure, reliability data are obtained by
administering to the same group of individuals one
form of a test at one time and a second form at a later
date. To minimize the effects of memory, chance and
maturation factors, reliability coefficients estimated
from parallel-forms of a test are preferred to test-retest
reliability coefficients.
Cont.
Internal Consistency
 Internal consistency is another type of estimating
reliability. It is the most common type of reliability since it
can be estimated from giving one form of a test once.
There are three common types of internal consistency:
Split-half method, Kuder-Richardson, and the Cronbach
Alpha methods.
a. The Split-Half Method
In split-half reliability, the items of a test that have been
administered to a group are divided into two comparable
halves, and a correlation coefficient is calculated
between the halves. If each student has about the same
scores on each half, then the correlation is high and the
test has high reliability.
Cont.
 Each half should be of similar difficulty. This method
provides a lower reliability estimate than other methods,
since the correlation is based on only half the items (and
we know that, other things being equal, longer tests are
more reliable than short tests).
 This technique should not be used with speeded tests.
This is because not all students answer all items, a
factor that tends to inflate the correlations between
the halves.
 In splitting the test into two halves, one might place, for
example, the odd-numbered items in one half and the
even-numbered items in the other.
Cont.
 Fortunately, one can estimate the reliability (rxx) of the full
test from roe via the Spearman-Brown formula:
rxx = 2roe / (1 + roe)
 Where roe = the Pearson correlation between the half-test
scores on the odd and the even items. The Spearman-
Brown split-half method assumes the two halves have
equal standard deviations.
Kuder-Richardson Method
 Kuder and Richardson developed a number of formulas in order to
correlate all items on a single test with each other when
each item is scored right or wrong, correct or incorrect, yes
or no, and so on. K-R reliability is thus determined from a
single administration of a test for the same group of
students, but without having to split the test into equivalent
halves.
Cont.
 This procedure assumes that all items in the test are
equivalent to each other, and it is appropriate when the
purpose of the test is to measure a single behavior, for
example, reading ability of students.
 If a test has items of varying difficulty or if it measures
more than one behavior, the KR estimates would
usually be lower than the split-half reliabilities.
 The Cronbach Alpha, sometimes called the Alpha
Coefficient, developed by Cronbach (1951), also
assumes that all items have similar difficulty levels. It
is a much more general form of internal consistency
than the KR, and it is used for items that are not
scored right or wrong, yes or no, true or false. The
Cronbach Alpha is generally the most appropriate type
of reliability for tests or questionnaires in which there
is a range of possible answers for each item.
Raters Agreement
The fifth type of reliability is expressed as a coefficient
of agreement. This is established by determining the
extent to which two or more persons agree about what
they have seen, heard, scored, or rated. For example,
when two or more teachers score the answers of
students for essay items, will they give the same or
similar scores for the students, i.e., do they agree on
what they score? If they do, then there is some
consistency in measurement.
FACTORS INFLUENCING RELIABILITY
A number of factors influence reliability of test scores.
a. Test related factors
 These factors include test length, difficulty of test items and
score variability.
 Test length. Other things being equal, a test with a larger number of
items yields more reliable scores.
 Difficulty of test items. Score variability depends on the difficulty
level of items. If items are too difficult, only a few students will
answer them correctly. As a result, scores will all be low. On the other
hand, if items are too easy, many students will answer them correctly. As a
result, scores will mostly be high. In both instances, scores do
not vary very much, which contributes to a low reliability index. In
contrast, with moderately difficult items students' scores are highly
likely to vary, which will result in a high reliability index.
Cont.
 According to Ebel & Frisbie (1991), items with a 40% to 80%
difficulty level contribute much to reliability. On the other
hand, items that more than 90 percent or fewer than
30 percent of the examinees answer correctly
contribute little to reliability.
Score Variability
 As scores vary, reliability tends to be higher. Compared to
true-false items, multiple-choice items yield higher
reliability indices. This is so because in true-false items
students have a 50% probability of getting a correct answer
by chance thereby contributing to low score variability
among students. In multiple choice items with four options,
on the other hand, the probability of getting an item right by
chance is 25% which results in better score variability
among students.
b. Student related factors
 These factors include the nature of the group tested,
student testwiseness, and student motivation.
 Nature of the group tested. Assume that all students in your
class are brilliant. When you administer a test, they score
very high and the difference between the highest and the
lowest score is very narrow. If all the students are weak,
their scores would be low. Accordingly, the range of the
scores between the maximum and minimum would be low.
But if there are high achieving, average, and low achieving
students in the same classroom, their scores vary from
high to low. Thus, reliability is higher in heterogeneous
students than in homogenous students.
Cont.
 Student testwiseness. Some students are wise when
they are taking tests. Though they do not know the
content, they are very clever at guessing the correct
answer for an item using clues that lead them to the
correct answer. Students therefore vary in their skill
at taking tests. This skill creates a difference in the
students’ scores. When students vary considerably in
their level of testwiseness, error scores will be higher,
which in turn results in lower reliability of scores.
Cont.
 Student Motivation. Who performs well: A motivated
student or an unmotivated student? Does motivation
have any effect on academic performance of students?
Obviously, academic motivation has a strong effect on
students’ performance. Students who are motivated
academically tend to achieve higher than those who
are not motivated. When students are unmotivated,
their test scores may not reflect their
actual abilities. In a classroom where students
vary in terms of motivation, there will be variability in
scores that reflects motivation rather than achievement,
leading to lowered reliability.
c. Test administration related factors
 These factors include time limits and cheating
opportunities.
 Time Limits. Whether a test is a speed test or a power test
matters when it comes to the reliability of the test. When
internal consistency reliability indices are determined from a
single administration of a speeded test, they will be
spuriously high. This is because in speeded tests students'
scores largely reflect the number of items attempted. Thus,
it is suggested that when reliability is to be determined for
speeded tests, two test administrations be used and the
correlation calculated.
 Cheating. Any form of cheating reduces score reliability.
Cheating includes copying answers from others, using
cheat sheets, passing answers between examination halls,
getting a test prior to its administration, etc.
Ethical Standards of assessment
Ethical and Professional Standards of
Assessment and its Use
 Ethical standards guide teachers in fulfilling their
obligation to provide and use tests that are fair to all
test takers regardless of age, gender, disability,
ethnicity, religion, linguistic background, or other
personal characteristics.
Cont.
 Fairness is a primary consideration in all aspects of
testing. It:
helps to ensure that all test takers are
given a comparable opportunity to
demonstrate what they know and how they
can perform in the area being tested.
implies that every test taker has the
opportunity to prepare for the test and is
informed about the general nature and
content of the test.
also extends to the accurate reporting of
individual and group test results.
The following are standards that teachers may consider
in their assessment practices.
1. Teachers should be skilled in choosing assessment
methods appropriate for instructional decisions.
2. Teachers should develop tests that meet the intended
purpose and that are appropriate for the intended test
takers.
3. The teacher should be skilled in administering,
scoring and interpreting the results from diverse
assessment methods.
Cont
4. Teachers should be skilled in using assessment results
when making decisions about individual students, planning
teaching, developing curriculum, and school improvement
5. Teachers should be skilled in developing valid pupil
grading procedures which use pupil assessments.
6. Teachers should be skilled in communicating assessment
results to students, parents, other stakeholders and other
educators.
7. Teachers should be skilled in recognizing unethical, illegal,
and otherwise inappropriate assessment methods and uses
of assessment information.
Cont.
In addition, the following are principles of grading that can
guide the development of a grading system.
 The system of grading should be clear and understandable
(to parents, other stakeholders, and most especially
students).
 The system of grading should be communicated to all
stakeholders (e.g., students, parents, administrators).
 Grading should be fair for all students regardless of gender,
socioeconomic status or any other personal
characteristics.
 Grading should support, enhance, and inform the
instructional process.
Cultural and linguistic diversity in assessments
 Fairness is fundamentally a socio-cultural, rather than
a technical, issue.
 Students represent a variety of cultural and linguistic
backgrounds. If the cultural and linguistic
backgrounds are ignored, students may become
alienated or disengaged from the learning and
assessment process. Teachers need to be aware of
how such backgrounds may influence student
performance and the potential impact on learning.
Teachers should be ready to provide accommodations
where needed.
Cont
Classroom assessment practices should be sensitive to
the cultural and linguistic diversity of students in order
to obtain accurate information about their learning.
Assessment practices that attend to issues of cultural
diversity include those that
 acknowledge students’ cultural backgrounds.
 are sensitive to those aspects of an assessment that
may hamper students’ ability to demonstrate their
knowledge and understanding.
 use that knowledge to adjust or scaffold assessment
practices if necessary.
Cont
Assessment practices that attend to issues of linguistic
diversity include those that
 acknowledge students’ differing linguistic abilities.
 use that knowledge to adjust or scaffold assessment
practices if necessary.
 use assessment practices in which the language demands
do not unfairly prevent the students from understanding
what is expected of them.
 use assessment practices that allow students to accurately
demonstrate their understanding by responding in ways
that accommodate their linguistic abilities, if the response
method is not relevant to the concept being assessed (e.g.,
allow a student to respond orally rather than in writing).
Cont
 Teachers must make every effort to address and minimize the
effect of bias in classroom assessment practices. Bias occurs
when irrelevant or arbitrary factors systematically influence
the interpretations made of results and affect the performance
of an individual student or a subgroup of students.
 Assessment should be culturally and linguistically appropriate,
fair and bias-free.
For an assessment task to be fair, its content, context, and
performance expectations should:
 reflect knowledge, values, and experiences that are equally
familiar and appropriate to all students;
 tap knowledge and skills that all students have had adequate
time to acquire;
 be as free as possible of cultural and ethnic stereotypes
Disability and Assessment
Practices
 It is quite obvious that many countries' education
systems have been exclusionary in fully accommodating
the educational needs of disabled students.
 This has been true not only in our country but in the
rest of the world as well, although the magnitude might
differ from country to country.
 It was in response to this situation that UNESCO has
been promoting the principle of inclusive education to
guide the educational policies and practice of all
governments.
Cont
 Different world conventions were held and documents
signed towards the implementation of inclusive
education. Our country, Ethiopia, has been a signatory
of these documents and therefore has accepted
inclusive education as a basic principle to guide its
policy and practice in relation to the education of
disabled students
Cont
 Inclusive education is based on the idea that all
students, including those with disabilities, should be
provided with the best possible education to develop
themselves. This calls for the provision of all
possible accommodations to address the educational
needs of disabled students. Accommodations should
not refer only to the teaching and learning process; they
should also extend to assessment mechanisms and
procedures.
cont
 There are different strategies that can be considered to
make assessment practices accessible to students with
disabilities depending on the type of disability. The
following strategies could be considered in summative
assessments:
 Modifying assessments: - This should enable disabled
students to have full access to the assessment without
giving them any unfair advantage.
 Others’ support: - Disabled students may need the support
of others in certain assessment activities which they
cannot do independently. For instance, they may require
readers and scribes in written exams; they may also need
others’ assistance in practical activities, such as using
equipment, locating materials, drawing and measuring.
Cont
 Time allowances: - Disabled students should be given
additional time to complete their assessments; the amount
is for the individual instructor to decide based on the
purpose and nature of the assessment.
 Rest breaks: Some students may need rest breaks during
the examination. This may be to relieve pain or to attend to
personal needs.
 Flexible schedules: In some cases disabled students may
require flexibility in the scheduling of examinations. For
example, some students may find it difficult to manage a
number of examinations in quick succession and need to
have examinations scheduled over a period of days.
cont
 Alternative methods of assessment: - In certain
situations where formal methods of assessment may
not be appropriate for disabled students, the instructor
should assess them using non-formal methods such
as class work, portfolios, oral presentations, etc.
 Assistive Technology: Specific equipment may need to
be available to the student in an examination. Such
arrangements often include the use of personal
computers, voice activated software and screen
readers.
Gender issues in assessment
 Teachers’ assessment practices can also be affected
by gender stereotypes. The issues of gender bias and
fairness in assessment are concerned with differences
in opportunities for boys and girls. A test is biased if
boys and girls with the same ability levels tend to
obtain different scores.
cont
Test questions should be checked for:
 material or references that may be offensive to
members of one gender,
 references to objects and ideas that are likely to be
more familiar to men or to women,
 unequal representation of men and women as actors in
test items or representation of members of each
gender only in stereotyped roles.
Cont.
 If the questions involve objects and ideas that are
more familiar or less offensive to members of one
gender, then the test may be easier for individuals of
that gender. Standards for achievement on such a test
may be unfair to individuals of the gender that is less
familiar with or more offended by the objects and ideas
discussed, because it may be more difficult for such
individuals to demonstrate their abilities or their
knowledge of the material.

More Related Content

What's hot

Classroom management
Classroom managementClassroom management
Classroom managementNadia Khurram
 
Teaching profession powerpoint
Teaching profession powerpointTeaching profession powerpoint
Teaching profession powerpointMaryjane Tura
 
Measurement and evaluation in education
Measurement and evaluation in educationMeasurement and evaluation in education
Measurement and evaluation in educationCheryl Asia
 
Types of grading system
Types of grading systemTypes of grading system
Types of grading systemRedPaspas
 
Formative and summative evaluation in Education
Formative and summative evaluation in EducationFormative and summative evaluation in Education
Formative and summative evaluation in EducationSuresh Babu
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learningAtul Thakur
 
marks and mark system.pptx
marks and mark system.pptxmarks and mark system.pptx
marks and mark system.pptxLailaIlajasTagbo
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous AssessmentManuel Reyes
 
BEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESBEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESjonasmole
 
Educational Measurement
Educational MeasurementEducational Measurement
Educational MeasurementAJ Briones
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrumentJohn Paul Hablado
 
Lesson plan new
Lesson plan newLesson plan new
Lesson plan newNehaNupur8
 
Glaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfGlaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfBeulahJayarani
 

What's hot (20)

Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Classroom management
Classroom managementClassroom management
Classroom management
 
Teaching profession powerpoint
Teaching profession powerpointTeaching profession powerpoint
Teaching profession powerpoint
 
Measurement and evaluation in education
Measurement and evaluation in educationMeasurement and evaluation in education
Measurement and evaluation in education
 
Types of grading system
Types of grading systemTypes of grading system
Types of grading system
 
Formative and summative evaluation in Education
Formative and summative evaluation in EducationFormative and summative evaluation in Education
Formative and summative evaluation in Education
 
Marks and marking system final
Marks and marking system final Marks and marking system final
Marks and marking system final
 
Assessment for learning
Assessment for learningAssessment for learning
Assessment for learning
 
marks and mark system.pptx
marks and mark system.pptxmarks and mark system.pptx
marks and mark system.pptx
 
Teaching competence
Teaching competenceTeaching competence
Teaching competence
 
Continuous Assessment
Continuous AssessmentContinuous Assessment
Continuous Assessment
 
Basics of Assessment
Basics of AssessmentBasics of Assessment
Basics of Assessment
 
BEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVESBEHAVIORAL OBJECTIVES
BEHAVIORAL OBJECTIVES
 
Modes of education
Modes of educationModes of education
Modes of education
 
IMPORTANCE OF KNOWLEDGE, SUBJECT AND DISCIPLINE
IMPORTANCE OF KNOWLEDGE, SUBJECT  AND  DISCIPLINEIMPORTANCE OF KNOWLEDGE, SUBJECT  AND  DISCIPLINE
IMPORTANCE OF KNOWLEDGE, SUBJECT AND DISCIPLINE
 
Educational Measurement
Educational MeasurementEducational Measurement
Educational Measurement
 
4. qualities of good measuring instrument
4. qualities of good measuring instrument4. qualities of good measuring instrument
4. qualities of good measuring instrument
 
Lesson plan new
Lesson plan newLesson plan new
Lesson plan new
 
Glaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdfGlaser’s Basic Teaching (Classroom Meeting).pdf
Glaser’s Basic Teaching (Classroom Meeting).pdf
 
Programmes for professional growth
Programmes for professional growth  Programmes for professional growth
Programmes for professional growth
 

Similar to 2015 PGDT 423 (1).pptx

Unit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxUnit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxSamruddhi Chepe
 
Basic Concept in Assessment
Basic Concept in AssessmentBasic Concept in Assessment
Basic Concept in AssessmentJarry Fuentes
 
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Dereck Downing
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and EvaluationSuresh Babu
 
Assessment of learning Chapter 1
Assessment of learning Chapter 1Assessment of learning Chapter 1
Assessment of learning Chapter 1Jarry Fuentes
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxANIOAYRochelleDaoaya
 
Chapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentChapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentKaenah Faye Padongao
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptagoggigupta
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation HennaAnsari
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxMarjorie Malveda
 
distinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementdistinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementhidayatulhaq
 
Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Roqui Gonzaga
 
EVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxEVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxKavitha Krishnan
 
Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)dheerajvyas5
 
ASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKLisa Brewer
 

Similar to 2015 PGDT 423 (1).pptx (20)

Unit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptxUnit 1.Evaluation, Assessment and Measurement pptx
Unit 1.Evaluation, Assessment and Measurement pptx
 
Basic Concept in Assessment
Basic Concept in AssessmentBasic Concept in Assessment
Basic Concept in Assessment
 
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
Assessment For Learning -Second Year -Study Material TAMIL NADU TEACHERS EDUC...
 
Assessment (1)
Assessment (1)Assessment (1)
Assessment (1)
 
Assessment and Evaluation
Assessment and EvaluationAssessment and Evaluation
Assessment and Evaluation
 
Assessment of learning Chapter 1
Assessment of learning Chapter 1Assessment of learning Chapter 1
Assessment of learning Chapter 1
 
What-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptxWhat-is-Educational-Assessment.pptx
What-is-Educational-Assessment.pptx
 
UNIT 1.pptx
UNIT 1.pptxUNIT 1.pptx
UNIT 1.pptx
 
Chapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in AssessmentChapter 1 Basic Concept in Assessment
Chapter 1 Basic Concept in Assessment
 
Assessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi guptaAssessment for learning by Dr. Goggi gupta
Assessment for learning by Dr. Goggi gupta
 
Test Development and Evaluation
Test Development and Evaluation Test Development and Evaluation
Test Development and Evaluation
 
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptxassessmentforlearningchapter-1-copy-converted-200124131944.pptx
assessmentforlearningchapter-1-copy-converted-200124131944.pptx
 
distinction between assessment evaluation and measurement
distinction between assessment evaluation and measurementdistinction between assessment evaluation and measurement
distinction between assessment evaluation and measurement
 
Assessment
AssessmentAssessment
Assessment
 
Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)Basic concepts in Assessments (Educ 9)
Basic concepts in Assessments (Educ 9)
 
EVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptxEVALUATION AND ASSESSMENT IN NURSING.pptx
EVALUATION AND ASSESSMENT IN NURSING.pptx
 
Dup(01)portfolio (1)
Dup(01)portfolio (1)Dup(01)portfolio (1)
Dup(01)portfolio (1)
 
Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)Concept and nature of measurment and evaluation (1)
Concept and nature of measurment and evaluation (1)
 
ASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOKASSESSMENT FOR LEARNING - BOOK
ASSESSMENT FOR LEARNING - BOOK
 
Report 5
Report 5Report 5
Report 5
 

Recently uploaded

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 

Recently uploaded (20)

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 

2015 PGDT 423 (1).pptx

  • 1. PGDT 2016 HARAMAYA UNIVERSITY COLLEGE OF EDUCATION AND BEHAVIORAL SCIENCES Department of Psychology Assessment and Evaluation of Learning
  • 2. Unit 1: Assessment: Concept, Purpose, and Principles  You might have come across with the concepts 1. test, 2. measurement, 3. assessment, & 4. evaluation. How do you understand them? Test  Test in educational context is presentation of a standard set of questions to be answered by students.  It is one instrument that is used for collecting information about students’ behaviors or
  • 3. Cont. o A test is a task or a serious of tasks or questions that students must answer or perform. o It is used to get information regarding the extent to which the students have mastered the subject matter taught and the attainment of instructional objectives. o Test is a systematic procedure for observing (i.e., getting information) and describing one or more characteristics of a person with the aid of either a numerical scale (measurement such as test scores) or a category (qualitative means).
  • 4. Cont. Measurement  Measurement is the process by which the attributes of a person are measured and described in numbers based on certain rules.  It is a quantitative description of the behavior or performance of students.  Measurement permits more objective description concerning traits and facilitates comparisons.
  • 5. Assessment  Assessment is the planned process of gathering and synthesizing information relevant to the purposes of discovering and documenting students' strengths and weaknesses, planning and enhancing instruction, and/or to check progress or status to make it ready for decision making.  Information may be collected using various instruments including tests, observations of students, checklists, questionnaires and interviews.
  • 6. Cont.  It is a process of collecting, synthesizing, and interpreting information to aid in decision- making (Nitko, 1996; and Airasian, 1996).  It can be qualitative and quantitative.
  • 7. Evaluation  Is the processes of judging the quality of student learning on the basis of established performance standards and assigning a value to represent the worthiness or quality of that learning or performance.  It is concerned with determining how well they have learned.  When we evaluate, we are saying that something is good, appropriate, valid, positive, and so forth.  Evaluation includes both quantitative and qualitative descriptions of student behavior and value judgment concerning the desirability of that behavior
  • 8. Cont.  Evaluation = Quantitative description of students’ behavior (measurement) + qualitative description of students’ behavior (non- measurement) + value judgment  Evaluation involves judgment. The quantitative values that we obtain through measurement will not have any meaning until they are evaluated against some standards. Importance and Purposes of Assessment  It can be summarized that assessment in education focuses on: helping LEARNING, and; improving TEACHING.
  • 9.  With regards to the learner, assessment is aimed at providing information that will help us make decisions concerning remediation, enrichment, selection, exceptionality, progress and certification.  With regard to teaching, assessment provides information about the attainment of objectives, the effectiveness of teaching methods and learning materials. Overall, assessment serves the following main purposes in education. 1. Assessment is used to inform and guide teaching and learning 2. Assessment is used to help students set learning goals 3. Assessment is used to assign report card grades
  • 10. for  Grading  Awards  Diagnosis  Placement  Information on entry behavior  Evidence of effectiveness and/or possible shortcomings  Opportunity to communicate to stakeholders  Guidance  Plan and adapt instruction  Provide feedback and incentives(to motivate students)
  • 11. Role of objectives in assessment  The first step in planning any good teaching is to clearly define the learning objectives or outcomes.  A learning objective is an outcome statement that captures specifically what knowledge, skills, attitudes learners should be able to exhibit following instruction.  Effective assessment practice requires relating the assessment procedures as directly as possible to the learning objectives.  Instructional objectives which are commonly known as learning outcomes play a key role in both the instructional and assessment process.
  • 12. Cont.  They serve as guides for both teaching and learning, communicate the intent of instruction to others, and provide guidelines for assessing students learning.  A learning outcome stated in this way clearly indicates the kind of performance students are expected to exhibit as a result of the instruction.  Well stated learning outcomes make clear the types of students performance we are willing to accept as evidence that the instruction has been successful.
  • 13. Principles of Assessment Different educators and school systems have developed different sets of assessment principles. Miller, Linn and Grunland (2009) have identified the following general principles of assessment.  Clearly specifying what is to be assessed has priority in the assessment process.  An assessment procedure should be selected because of its relevance to the characteristics or performance to be measured.  Comprehensive assessment requires a variety of procedures.  Proper use of assessment procedures require an awareness of their limitations.
  • 14. Cont.  The New South West Wales Department of Education and Training (2008) in Australia are more inclusive and listed the following principles:  Assessment should be relevant  Assessment should be appropriate  Assessment should be fair.  Assessment should be accurate  Assessment should provide useful information  Assessment should be integrated into the teaching and learning cycle.  Assessment should draw on a wide range of evidence  Assessment should be manageable
  • 15. Assessment and Some Basic Assumptions  The quality of student learning is directly, although not exclusively related to the quality of teaching.  To improve their effectiveness, teachers need first to make their goals and objectives explicit and then to get specific, comprehendible feedback on the extent to which they are achieving those goals and objectives.  To improve their learning, students need to receive appropriate and focused feedback early and often; they also need to learn how to assess their own learning.
  • 16. Concept of Continuous Assessment  Continuous assessment is a learning process between teachers, students and stakeholders (parents).  It is a process that fosters dialogue between these stakeholders to bring out the child’s best learning.  It is also a holistic process that not only brings together multiple stakeholders, but also integrates assessment and teaching as interconnected activities that are integral to the child’s learning.  It is an assessment approach which involves the use of a variety of assessment instruments, assessing various components of learning.
  • 17. Characteristics of Continuous Assessment Characteristics of continuous assessment. a) Systematic b) Comprehensive c) Cumulative d) Guidance –Oriented A/ Systematic nature of CA  It requires an operational plan which indicates what measurements are to be made about the pupils’ performance, at what time intervals or times during the school year, the measurements to be made and the results recorded, and the nature of the tools or instruments to be used in the measurements.  It is planned, graded to suit the age and experience of the children and given at suitable intervals during the school year.
  • 18. Cont. B/ Comprehensive nature of CA  It is comprehensive in the sense that many types of instruments are used in determining the performance.  It means that Continuous assessment is not focused on academic skills alone.  It embraces the cognitive, psychomotor and affective domains. A child is assessed as a total entity using all the psychometric devises such as test and non test techniques.
  • 19. Cont. (c) Cumulative Nature of CA: It is cumulative since any decision to be made at any point on the pupil takes into account all previous decisions. d) Guidance-Oriented Nature of CA:  It means that the information collected is to be used for educational, vocational and personal- social decision-making for the child. Guidance and counseling activities thrive better on valid, sequel, systematic, continuous, cumulative and comprehensive information.
  • 20. Assessment 1. It measures what a student knows, understands and can do. 2. It is fair 3. It is carried out periodically over a term and over a year 4. There are a number of different types of assessment activities 5. Assessment and instruction are similar/going side by side 6. There is a lack of pupil fear 7. Focus is on pupil progress
  • 21. Assessment, Learning, and the Involvement of Students Classroom assessment promotes learning when teachers use it in the following ways:  When they use it to become aware of the knowledge, skills, and beliefs that their students bring to a learning task, and;  When they use this knowledge as a starting point for new instruction, and monitor students’ changing perceptions as instruction proceeds.  When teachers and students collaborate and use ongoing assessment and pertinent feedback to move learning forward.  When classroom assessment is frequent and varied, teachers can learn a great deal about their students.
  • 22. Cont. Assessment provides the feedback loop for this process.  By increasing students’ motivation. Motivation is essential for students’ engagement in their learning. The higher the motivation, the more time and energy a student is willing to devote to any given task. Even when a student finds the content interesting and the activity enjoyable, they show sustained concentration and effort.  Assessment can be a motivator, not through reward and punishment, but by stimulating students’ intrinsic interest. Assessment can enhance student motivation by:
  • 23. Cont. • emphasizing progress and achievement rather than failure • providing feedback to move learning forward • reinforcing the idea that students have control over, and responsibility for their own learning • building confidence in students so they can and need to take risks being relevant, and appealing to students’ imaginations • providing the scaffolding that students need to genuinely succeed
  • 24. Assessment and Teacher Professional Competence o A teacher should have some basic competencies on classroom assessment so as to be able to effectively assess his/her students learning. o Assessment activities occur prior, during, and after instruction. o In the American education system a list of seven standards for teacher competence in educational assessment of students has been developed.
  • 25. Cont. The seven standards are stated below. Teachers should be skilled in:- 1. Choosing assessment options appropriate for instructional decisions. 2. Developing assessment methods appropriate for instructional decisions. 3. Administering, scoring, and interpreting the results of assessment methods. 4. Using assessment results when making decisions about individual students, planning teaching, developing curriculum, and school improvement
  • 26. Cont. 5. Developing valid student grading procedures that use student assessments 6. Communicating assessment results to students, parents, other audiences, and educators. 7. Recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
  • 27. UNIT TWO ASSESSMENT STRATEGIES, METHODS, AND TOOLS There are three pairs of assessment typologies: o formal vs. informal assessment o formative vs. summative assessment o criterion-referenced vs. norm-referenced assessment
  • 28. Formal and Informal Assessment  Formal Assessment: implies a written document, such as a test, quiz, or paper. A formal assessment gives a numerical score or grade based on student performance.  Informal Assessment: “Informal” indicates techniques that can easily be incorporated into classroom routines and learning activities.  It can be used at any time without interfering with instructional time.  It usually occurs in a more casual manner and may include observation, inventories, checklists, rating scales, rubrics, performance and portfolio assessments, participation, and peer and self-assessment.
  • 29. Cont.  Methods for informal assessment can be unstructured (e.g., student work samples, journals) and structured (e.g., checklists, observations).  Unstructured methods frequently are somewhat more difficult to score and evaluate, but they can provide a great deal of valuable information about the skills of the students.  Structured methods can be reliable and valid techniques when time is spent creating the "scoring" procedures.  Informal assessments actively involve the students in the evaluation process - they are not just paper-and-pencil tests.
  • 30. Formative and Summative Assessments  Based on their functional role during classroom instruction, assessments are either formative or summative.  Formative Assessment: used to shape and guide classroom instruction.  Includes both informal and formal assessments.  It can be given before, during, and even after instruction; the goal is to improve instruction.  It consists of ongoing assessments, appraisals, and observations in a classroom.  It serves a diagnostic function for both students and teachers.
  • 31. Cont.  It helps students to adjust, improve their performance or engagement in the unit.  Teachers receive feedback on the quality of learners’ understandings and consequently, can modify their teaching approaches to provide enrichment or remedial activities to more effectively guide learners.  It is also known by the name ‘assessment for learning’ or ‘continuous assessment’.
  • 32. Cont.  Summative Assessment: comes at the end of a course (or unit) of instruction.  It evaluates the quality of students’ learning and assigns a mark to students’ work based on how effectively learners have addressed the performance standards and criteria.  Assessment tasks conducted during the progress of a semester may be regarded as summative in nature if they only contribute to the final grades of the students.  A particular assessment task can be both formative and summative
  • 33. Criterion-referenced and Norm-referenced Assessments Based on interpreting student performance:  Criterion-referenced Assessment: is carried out against previously specified criteria and performance standards for the subject matter.  A grade is assigned on the basis of the standard the student has achieved on each of the criteria.  Norm-referenced Assessment: this type of assessment has as its end point the determination of student performance based on a position within a cohort of students – the norm group.
  • 34. Assessment Strategies  Assessment strategy refers to those assessment tasks (methods/approaches/activities) in which students are engaged to ensure that all the learning objectives of a subject, a unit or a lesson have been adequately addressed. Criteria for selecting assessment strategies include:  appropriateness for the particular behavior being assessed, and  relatedness to the course material and relevance to students’ lives.
  • 35. Cont.  There are many different ways to categorize learning goals for students. o Knowledge and understanding: What facts do students know outright? What information can they retrieve? What do they understand? o Reasoning proficiency: Can students analyze, categorize, and sort into component parts? Can they generalize and synthesize what they have learned? o Skills: We have certain skills that we want students to master such as reading fluently, working productively in a group, making an oral presentation, speaking a foreign language, or designing an experiment.
  • 36. Cont.  Ability to create products: Another kind of learning target is student-created products - tangible evidence that the student has mastered knowledge, reasoning, and specific production skills. Examples include a research paper, a piece of furniture, or artwork.  Dispositions: We also frequently care about student attitudes and habits of mind, including attitudes toward school, persistence, responsibility, flexibility, and desire to learn.
  • 37. Cont.  There are various assessment strategies that can be used by classroom teachers; some are described below. Classroom presentations:  require students to verbalize their knowledge, select and present samples of finished work, and organize their thoughts about a topic in order to present a summary of their learning.  Conferences: a conference is a formal or informal meeting between the teacher and a student for the purpose of exchanging information or sharing ideas.
  • 38. Cont.  Exhibitions/Demonstrations: an exhibition or demonstration is a performance in a public setting, during which a student explains and applies a process, procedure, etc., in concrete ways to show individual achievement of specific skills and knowledge.  Interviews: an interview is a face-to-face conversation in which teacher and student use inquiry to share their knowledge and understanding of a topic or problem. This form of assessment can be used by the teacher to:  explore the student’s thinking;  assess the student’s level of understanding of a concept or procedure; and  gather information, obtain clarification, determine positions, and probe for motivations.
  • 39. Cont. o Observation: is a process of systematically viewing and recording students while they work, for the purpose of making instructional decisions. o It can take place at any time and in any setting. It provides information on students' strengths and weaknesses, learning styles, interests, and attitudes.  Observations may be informal or highly structured, and incidental or scheduled over different periods of time in different learning contexts. There are various observational techniques. They include anecdotal records, checklists, rating scales, socio-metric techniques.
  • 40. Cont. Performance tasks: students create, produce, perform, or present works on “real world” issues. A performance task may be used to assess a skill or proficiency, and provides useful information on the process as well as the product. Portfolios: a portfolio is a collection of samples of a student’s work over time. o It offers a visual demonstration of a student’s achievement, capabilities, strengths, weaknesses, knowledge, and specific skills, over time and in a variety of contexts. o For a portfolio to serve as an effective assessment instrument, it has to be focused, selective, reflective, and collaborative.
  • 41. Cont.  Questions and answers: This is perhaps the strategy most widely used by teachers to involve their students in the learning and teaching process. In this strategy, the teacher poses a question and the student answers verbally, rather than in writing.
  • 42. Cont. Students’ self-assessments:  It is the student’s own assessment of personal progress in terms of knowledge, skills, processes, or attitudes. Self- assessment leads students to a greater awareness and understanding of themselves as learners Checklists, Rating Scales and Rubrics  These are tools that state specific criteria and allow teachers and students to gather information and to make judgments about what students know and can do in relation to the outcomes.
  • 43. Cont.  Checklists usually offer a yes/no format in relation to student demonstration of specific criteria. They may be used to record observations of an individual, a group or a whole class.  Rating Scales allow teachers to indicate the degree or frequency of the behaviors, skills and strategies displayed by the learner. Rating scales state the criteria and provide three or four response selections to describe the quality or frequency of student work.  Rubrics use a set of criteria to evaluate a student's performance. They consist of a fixed measurement scale and detailed description of the characteristics for each level of performance. These descriptions focus on the quality of the product or performance and not the quantity.
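To make the description above concrete, here is a minimal sketch, in Python, of a rubric represented as data: each criterion carries a fixed four-level scale with a description of the quality expected at each level, plus a simple analytic scoring function. The criteria, level descriptions, and function names are hypothetical illustrations, not taken from this module.

```python
# Hypothetical two-criterion analytic rubric: a fixed scale (4 = best)
# with a quality description for every level of each criterion.
rubric = {
    "organization": {
        4: "Ideas ordered logically; transitions guide the reader throughout.",
        3: "Mostly logical order; occasional abrupt transitions.",
        2: "Some ordering evident, but the reader must work to follow it.",
        1: "No discernible organization.",
    },
    "use_of_evidence": {
        4: "Every claim supported by relevant, accurate evidence.",
        3: "Most claims supported; minor gaps.",
        2: "Evidence present but often irrelevant or inaccurate.",
        1: "Claims unsupported.",
    },
}

def analytic_score(ratings):
    """Sum the level awarded on each criterion (analytic scoring)."""
    return sum(ratings[criterion] for criterion in rubric)

# A student rated 3 on organization and 4 on use of evidence scores 7 of 8.
print(analytic_score({"organization": 3, "use_of_evidence": 4}))
```

Note that each level description addresses the quality of the work rather than its quantity, which is what distinguishes a rubric from a simple yes/no checklist.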
  • 44. Cont. The purpose of checklists, rating scales and rubrics is to:  provide tools for systematic recording of observations  provide tools for self-assessment  provide samples of criteria for students prior to collecting and evaluating data on their work  record the development of specific skills, strategies, attitudes and behaviors necessary for demonstrating learning  clarify students' instructional needs by presenting a record of current accomplishments.
  • 45. Cont.  One- Minute paper: During the last few minutes of the class period, you may ask students to answer on a half-sheet of paper: "What is the most important point you learned today?" and, "What point remains least clear to you?"  Muddiest Point: This is similar to ‘One-Minute Paper’ but only asks students to describe what they didn't understand and what they think might help.  It is to determine which key points of the lesson were missed by the students.  Here also you have to review before next class meeting and use to clarify, correct, or elaborate.
  • 46. Cont.  Student-generated test questions: You may allow students to write test questions and model answers for specified topics, in a format consistent with course exams. This will give students the opportunity to evaluate the course topics, reflect on what they understand, and learn what good test items are. You may evaluate the questions and use the good ones as prompts for discussion.  Tests: this is the type of assessment that you are most familiar with. A test requires students to respond to prompts in order to demonstrate their knowledge (orally or in writing) or their skills (e.g., through performance).
  • 47. Assessment in large classes  Due to time and resource constraints, teachers often use less time-demanding assessment methods.  Assessment issues associated with large classes include: 1. Surface learning approach: teachers rely on time-efficient and exam-based assessment methods for assessing large classes, such as multiple-choice and short-answer examinations.  These assess learning at the lower levels of intellectual complexity.  Students tend to adopt a surface, rote learning approach when preparing for these kinds of assessment methods.
  • 48. Cont.  Feedback is often inadequate.  Inconsistency in marking: a large class usually consists of a diverse and complex group of students. The issues of different perceptions of assessment, cultural and educational background, prior knowledge, and level of interest in the subject all pose challenges to the fairness of marking and grading.  Difficulty in monitoring cheating and plagiarism.  Lack of interaction and engagement.  When teachers raise questions in large classes, many students are not willing to respond.
  • 49. Cont. Approaches to assessment in large classes for effective student learning include: Front ending: by putting increased effort at the beginning into setting up the students for the work they are going to do, the quality of the work submitted can be improved.  Therefore the time needed to mark it is reduced. Making use of in-class assignments: in-class assignments are usually quick and relatively easy to mark and provide feedback on, and help you to identify gaps in understanding.
  • 50. Cont. • Self-assessment reduces the marking load because it ensures a higher quality of work that is submitted, thereby minimizing the amount of time expended on marking and feedback. Peer-assessment • provide useful learning experiences for students at the same time as reducing the marking load of staff.
  • 51. Cont.  Group assessments: significantly reduce the marking load if the group submits only one piece of work.  The major problem is that group members may not contribute equally, so how are they to be rewarded fairly?  Changing the assessment method, or at least shortening it: being faced with large numbers of students will present challenges, but may also provide opportunities to either modify existing assessments or to explore new methods of assessment.
  • 52. Selecting and developing assessment methods and tools  The process of assessing student performance must begin with educational outcomes.  A wide variety of tools are available for assessing student performance.
  • 53. Cont. Constructing Tests  Classroom tests can consist of objective test items and performance assessments.  Objective tests are highly structured and require the test taker to select the correct answer from several alternatives or to supply a word or short phrase to answer a question or complete a statement.  They are called objective because they have a single right or best answer that can be determined in advance.  Performance assessment tasks permit the student to organize and construct the answer in essay form, by using equipment, generating hypotheses, making observations, constructing something, or performing a task.
  • 54. Cont. Constructing Objective Test Items  There are various types of objective test items.  These can be classified into supply type items and selection type items.  Supply type items include completion items and short answer questions.  Selection type test items include true/false, multiple choice and matching. True/False Test Items Advantages of true/false items are that they:  do not require much time to answer.
  • 55. Cont.  allow a teacher to cover a wide range of content.  can be scored quickly, reliably, and objectively.  can measure higher mental processes of understanding, application and interpretation. Disadvantages  promote memorization of factual information.  encourage students to guess and cheat.  do not discriminate between students of varying ability as well as other item types.  include more irrelevant clues than other item types.  lead a teacher to favor testing of trivial knowledge.  Items are often not unequivocally (clearly) true or false (it is difficult to write statements which are clearly true or false).
  • 56. Suggestions for constructing good quality true/false test items  Avoid negative statements, and never use double negatives.  If opinion is used, attribute it to some source, unless the ability to identify opinion is being specifically measured.  Restrict single-item statements to single concepts.  Avoid ambiguous words and broad general statements.  Use an approximately equal number of items reflecting the two categories tested.
  • 57. Cont.  Make statements representing both categories equal in length.  Avoid trivial statements – those which have little importance for knowledge and understanding.  Avoid specific determiners like most, all, always, sometimes, in most cases, etc.  Avoid long complex sentences.
  • 58. Matching Items  A matching item consists of two lists of words or phrases.  The test-taker must match components in one list (the premises, on the left) with components in the other list (the responses, presented on the right), according to a particular kind of association.  It can cover a good deal of content in an efficient fashion.
  • 59. Conti…  It is useful for measuring memorized factual information whose elements are related.  Its compact form makes it possible to measure a large amount of related factual material in a relatively short time.  It is easy to score and easy to construct.
  • 60. Limitations  Restricted to the measurement of factual information based on rote learning.  It is difficult to find homogeneous material.  Teachers may include in their matching items material which is less significant.  Susceptible to irrelevant clues. In teacher-made matching type tests, some of the more common faults are that:  the directions are vague  the items to be matched are excessively long  the list of responses lacks homogeneity  the premises are vaguely stated.
  • 61. Construction of good matching items. 1. Use fairly brief lists, placing the shorter entries on the right. The words and phrases that make up the premises should be short, and those that make up the responses should be shorter still. If too long, students tend to lose track of what they originally set out to look for. 2. Employ homogeneous lists. 3. List responses in a logical order 4. Describe the basis for matching and the number of times a response can be used:
  • 62. Cont. 5. Try to place all premises and responses for any matching item on a single page 6. Make sure that there are never multiple correct responses for one stem 7. Avoid giving inadvertent grammatical clues to the correct response 8. Use no more than 10 items in one set. 9. Provide more responses than stems to make process-of-elimination guessing less effective. 10. Use capital letters for the response signs rather than lower-case letters.
  • 63. Short Answer/Completion Test Items  The short answer type uses a direct question, whereas the completion test item consists of an incomplete statement that the student must complete.  The short-answer test item is one of the easiest to construct, partly because of the relatively simple learning outcomes it usually measures.  Except for the problem-solving outcomes measured in Mathematics and Science, it is used almost exclusively to measure the recall of memorized information.
  • 64. Cont.  Partial knowledge, which might enable them to choose the correct answer on a selection item, is insufficient for answering a short answer test item correctly.  There are two limitations:  unsuitability for assessing complex learning outcomes.  difficulty of scoring.  This is especially true where the item is not clearly phrased to require a definitely correct answer, and where the student’s spelling ability affects the response.
  • 65. Cont. The following suggestions will help short-answer type test items to function as intended. 1. Word the item so that the required answer is both brief and specific. 2. Do not take statements directly from textbooks to use as a basis for short-answer items. 3. A direct question is generally more desirable than an incomplete statement. 4. If the answer is to be expressed in numerical units, indicate the type of answer wanted.
  • 66. Cont. 5. Avoid using a long quote with multiple blanks to complete. 6. Require only one word or phrase in each blank. 7. Facilitate scoring by having the students write their responses on lines arranged in a column to the left of the items. 8. Only ask students important terms or expressions in completion items. 9. Avoid providing grammatical clues to the correct answer by using a/an, etc., instead of specific modifiers.
  • 67. Multiple-Choice Items  This is the most popular and versatile type of selected-response item.  It can effectively measure the learning outcomes measured by the short-answer item, the true-false item, and the matching item types.  It can measure a variety of complex cognitive learning outcomes.  A multiple-choice item consists of a problem and a list of suggested solutions.  A student is first given either a question or a partially complete statement. This part of the item is referred to as the item’s stem.  Then three or more potential answer-options are presented. These are usually called alternatives or options.
  • 68. Cont. Variants in a multiple-choice item: (1) the stem consists of a direct question or an incomplete statement, and (2) the student’s choice of alternatives may be a correct answer or a best answer. Advantages  widespread applicability to the assessment of cognitive skills and knowledge.  It is possible to make them quite varied in the levels of difficulty they possess.  Items are fairly easy to score.  The results are amenable to diagnosis.  They provide greater structure to the question (e.g., “South America is . . .” followed by alternatives a), b), c), d)). The alternatives make the task clear.
  • 69. Cont.  They can test students’ ability to think quickly under pressure.  They can be easier to modify in order to test students at the appropriate level. Limitation/weakness of multiple-choice  when students review a set of alternatives for an item, they may be able to recognize a correct answer. So it can present an exaggerated picture of a student’s understanding or competence, which might lead teachers to invalid inferences.  can never measure a student’s ability to creatively synthesize content of any sort.  They are difficult to construct, especially getting plausible distracters is difficult.  It is relatively labour intensive and time consuming to prepare the test.
  • 70. Cont.  In an effort to come up with the necessary number of plausible alternatives, novice item-writers sometimes toss in some alternatives that are obviously incorrect.  They are not well adapted to measuring some learning outcomes in mathematics, chemistry, physics, etc.  There is a possibility that students may guess the correct answer if the item contains many irrelevant clues.  As a result of recycling questions, students may get access to the questions and achieve good marks without achieving the instructional objectives.
  • 71. Here are some useful rules for you to follow: 1. The question or problem in the stem must be self-contained. The stem should contain as much of the item’s content as possible, thereby rendering the alternatives much shorter than would otherwise be the case. 2. Avoid negatively stated stems. Just as with true/false items, negatively stated stems can create genuine confusion in students. 3. Each alternative must be grammatically consistent with the item’s stem. 4. Make all alternatives plausible, but be sure that one of them is indisputably the correct or best answer.
  • 72. Cont. 5. Randomly use all answer positions in approximately equal numbers. 6. List alternatives on separate lines rather than including them as part of the stem so that they can be clearly distinguished. 7. Keep all alternatives in a similar format (e.g., all phrases, all sentences, etc.). 8. Try to make alternatives for an item approximately the same length. (Making the correct response consistently longer is a common error.) 9. Use misconceptions which students have indicated in class or errors commonly made by students in the class as the basis for incorrect alternatives.
  • 73. Cont. 10. If possible, do not use “all of the above” and “none of the above,” or use them sparingly, since these alternatives are often chosen on the basis of incomplete knowledge. 11. Never use words such as “all,” “always,” and “never,” as they are likely to signal incorrect options. 12. Use capital letters (A, B, C, D, E) on tests as responses rather than lower-case letters (“a” gets confused with “d” and “c” with “e” if the type or duplication is poor). 13. Try to write items with equal numbers of alternatives in order to avoid asking students to continually adjust to a new pattern caused by different numbers. 14. Put the incomplete part of the sentence at the end rather than the beginning of the stem.
  • 74. Suggestions for Constructing good distracters  Base distracters on the most frequent errors made by students in homework, assignments or class discussions related to that concept.  Use words in the distracters that are associated with words in the stem (for example, explorer-exploration).  Use concepts from the instructional material that have similar vocabulary or were used in the same context as the correct answer.  Use distracters that are similar in content or form to the correct answer (for example, if the correct answer is the name of a place, have all distracters be places instead of using names of people and other facts).  Make the distracters similar to the correct answer in terms of complexity, sentence structure, and length.
  • 75. Constructing Performance Assessments  Performance assessments measure such outcomes as the ability to recall, organize, and integrate ideas; the ability to express oneself in writing; and the ability to create.  The most familiar form of performance-based assessment is the essay question.  It measures learning outcomes concerned with the ability to conceptualize, construct, organize, relate, and evaluate ideas.
  • 76. Cont. Essay questions can be classified into two types – restricted-response essay questions and extended response essay questions. o Restricted-response: Are usually limit both the content and the response. The content is usually restricted by the scope of the topic to be discussed. o Extended response These types of questions allow students:  To select any factual information that they think is relevant,  To organize the answer in accordance with their best judgment;  To integrate and evaluate ideas as they deem appropriate.
  • 77. Cont. In addition to measuring higher-order thinking skills, the advantages also include the following:  Extended-response essays focus on the integration and application of thinking and problem-solving skills.  Essay assessments enable the direct evaluation of writing skills.  Essay questions, as compared to objective tests, are easy to construct.  Essay questions have a positive effect on students’ learning.
  • 78. Cont. Limitations  The most common limitation is unreliability of scoring. The same paper may be scored differently by different teachers, and even the same teacher may give different scores for the same paper at different times.  The amount of time required for scoring.  The limited sampling of content they provide. The improvement of the essay question requires attention to two problems:  How to construct essay questions that call forth the desired student response, and  How to score the answers so that achievement is reliably measured.
  • 79. Suggestions for the construction of good essay questions  Restrict the use of essay questions to those learning outcomes that cannot be measured satisfactorily by objective items.  Structure items so that the student’s task is explicitly bounded.  For each question, specify the point value, an acceptable response length, and a recommended time allocation.  Employ more questions requiring shorter answers rather than fewer questions requiring longer answers.  Don’t employ optional questions.  Test a question’s quality by creating a trial response to the item.
  • 80. Guidelines in the scoring of essay items The following help to make scoring easier and more reliable.  Ensure that you are emotionally and mentally composed before scoring.  All responses to one item should be scored before moving to the next item.  Write out in advance a model answer to guide yourself in grading the students’ answers.  Shuffle exam papers after scoring every question before moving to the next.  The names of test takers should not be known while scoring, to avoid bias.
  • 81. Table of Specification and Arrangement of Items Table of Specification  The development of valid, reliable and usable questions involves proper planning.  The validity, reliability and usability of such tests depend on the care with which they are planned and prepared.  Planning helps to ensure that the test covers the pre-specified instructional objectives and the subject matter (content).  Planning a classroom test involves identifying the instructional objectives stated earlier and the subject matter (content) covered during the teaching/learning process.
  • 82. Planning a classroom test. 1. Determine the purpose of the test; 2. Describe the instructional objectives and content to be measured. 3. Determine the relative emphasis to be given to each learning outcome; 4. Select the most appropriate item formats (essay or objective); 5. Develop the test blue print to guide the test construction; 6. Prepare test items that are relevant to the learning outcomes specified in the test plan;
  • 83. Cont. 7. Decide on the pattern of scoring and the interpretation of result; 8. Decide on the length and duration of the test, and 9. Assemble the items into a test, prepare direction and administer the test.  The instructional objectives of the course are critically considered while developing the test items.
  • 84. Cont.  A table of specification is a two-way table that matches the objectives and content taught with the level at which you expect your students to perform.  It contains an estimate of the percentage of the test to be associated to each topic at each level at which it is to be measured.  In effect we establish how much emphasis to give to each objective or content.
  • 85. Cont. Developing a table of specification involves: 1. Preparing a list of learning outcomes, i.e. the type of performance students are expected to demonstrate 2. Outlining the contents of instruction, i.e. the area in which each type of performance is to be shown, and 3. Preparing the two way chart that relates the learning outcomes to the instructional content.
  • 86. Cont. (Cell entries are the numbers of test items planned for each content area at each level of the instructional objectives.)

| Contents | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total | Percentage |
|---|---|---|---|---|---|---|---|---|
| Air pressure | 2 | 2 | 1 | 1 | - | - | 6 | 24% |
| Wind | 1 | 1 | 1 | 1 | - | - | 4 | 16% |
| Temperature | 2 | 2 | 1 | 1 | - | 1 | 7 | 28% |
| Rainfall | 1 | 2 | 1 | - | 1 | - | 5 | 20% |
| Clouds | 1 | 1 | - | 1 | - | - | 3 | 12% |
| Total | 7 | 8 | 4 | 4 | 1 | 1 | 25 | 100% |
  • 87. Cont.  The rows show the content areas from which the test is to be sampled; the columns indicate the level of thinking students are required to demonstrate in each of the content areas.  Thus, the test items are distributed among each of the five content areas with their corresponding representation among the six levels of the cognitive domain.  The percentage row and column also show the degree of representation of both the contents and the levels of the cognitive domain in this particular test.  Objectives that are more important should get more representation in the test items.  Similarly, content areas on which you have spent more instructional time should be allotted more test items.
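The arithmetic behind such a blueprint is simply proportional allocation. The following minimal Python sketch uses the weights from the hypothetical sample table above to show how the number of items per content area follows from the emphasis given to each area:

```python
# Relative emphasis (instructional time / importance) per content area,
# taken from the sample table above; a 25-item test is assumed.
weights = {"Air pressure": 0.24, "Wind": 0.16, "Temperature": 0.28,
           "Rainfall": 0.20, "Clouds": 0.12}
total_items = 25

# Allocate items in proportion to emphasis. In general, rounding can leave
# the total off by an item or two, so the final counts should be reconciled
# by hand against the intended test length.
allocation = {topic: round(w * total_items) for topic, w in weights.items()}
print(allocation)
# {'Air pressure': 6, 'Wind': 4, 'Temperature': 7, 'Rainfall': 5, 'Clouds': 3}
```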
  • 88. Cont.  There are also other ways of developing a test blueprint.  One of these is to show the distribution of test items among the content areas by the type of test items to be developed from each content area.
  • 89. Cont.

| Contents | True/False | Matching | Short answer | Multiple choice | Total | Percentage |
|---|---|---|---|---|---|---|
| Air pressure | 1 | 1 | 1 | 3 | 6 | 24% |
| Wind | 1 | 1 | 1 | 1 | 4 | 16% |
| Temperature | 1 | 2 | 1 | 3 | 7 | 28% |
| Rainfall | 1 | 1 | 1 | 2 | 5 | 20% |
| Clouds | 1 | - | 1 | 1 | 3 | 12% |
| Total | 5 | 5 | 5 | 10 | 25 | 100% |
| Percent | 20% | 20% | 20% | 40% | 100% | |
  • 90. Arrangement of test items For most purposes the items can be arranged by a systematic consideration of:  the type of items used  the learning outcomes measured  the difficulty of the items, and  the subject matter measured First, the items should be arranged in sections by item type. That is, all true-false items should be grouped together, then all matching items, then all short answer or completion items, and then all multiple choice items.
  • 91. Cont.  Extended-response essay questions and performance tasks usually take so much time that they would normally be administered alone.  If combined with some of the other types of items and tasks, the extended-response tasks should come last.
  • 92. Cont. This has the following advantages: - we will have a single set of directions for each item type - students can maintain the same mental set throughout each section - scoring will be easier Linn and Gronlund (2000: 350-351) suggest the following arrangement order of items by format: - True/False - Matching - Short Answer/Completion - Multiple Choice - Essay
  • 93. Cont.  For this purpose, items that measure similar outcomes should be placed together and then arranged in order of ascending difficulty. For example, the items under the multiple choice section might be arranged in the following order:  knowledge of terms,  knowledge of specific facts,  knowledge of principles, and  application of principles.  Keeping together items that measure similar learning outcomes is especially helpful in determining the type of learning outcomes causing students the greatest difficulty.
  • 94. Cont.  If it is not feasible to group the items by the learning outcomes measured, then it is still desirable to arrange them in order of increasing difficulty.  Beginning with the easiest items and proceeding gradually to the most difficult has a motivating effect on students.  Encountering difficult items early in the test often causes students to spend a disproportionate amount of time on such items.
  • 95. Cont. o Items within each test section can be arranged in order of increasing difficulty. o To summarize, the most effective method for organizing items in the typical classroom test is to:  form sections by item type,  group the items within each section by the learning outcomes measured and by subject matter, and  arrange both the sections and the items within sections in ascending order of difficulty.
  • 96. Administration of Tests  It is the procedure of actually presenting the learning task that the examinees are required to perform in order to ascertain the degree of learning that has taken place during the teaching-learning process.  It is as important as the process of preparing the test.  This is because the validity and reliability of test scores can be greatly reduced when a test is poorly administered.
  • 97. Cont.  This requires the provision of a physical and psychological environment which is conducive for students to make their best efforts. Conditions that may create test anxiety in students include:  Threatening students with tests if they do not behave.  Warning students to do their best “because the test is important”.  Telling students they must work fast in order to finish on time.  Threatening dire consequences if they fail.
  • 98. i. Ensuring Quality in Test Administration  Guidelines and steps in ensuring quality in test administration are:  Collection of the question papers in time from course teacher.  Ensure compliance with the stipulated sitting arrangements  Ensure orderly and proper distribution of questions papers to the test takers.  Do not talk unnecessarily before the test.  Avoid unnecessary remarks, instructions or threat that may develop test anxiety.  remind the test takers of the need to avoid unprofessional conduct
  • 99. Cont.  Avoid giving hints to test takers who ask about particular items.  Make corrections or clarifications to the test takers whenever necessary.  Keep interruptions during the test to a minimum ii) Credibility and Civility Credibility is the value the eventual recipients and users of results of assessment place on the result with respect to the grades obtained, certificates issued or the issuing institution.
  • 100. Conti… Civility is whether the persons being assessed are in such conditions as to give their best without hindrances and burdens in the attributes being assessed and whether the exercise is seen as integral to or as external to the learning process.
  • 101. Cont.  Instructions: A test should contain a set of instructions, which are usually of two types.  One is the instruction to the test administrator, while the other is to the test taker.  The instruction to the test administrator should explain how the test is to be administered, the arrangements to be made for proper administration of the test, and the handling of the scripts and other materials.  The instructions to the administrator should be clear for effective compliance. For the test takers, the instruction should direct them on the amount of work to be done or the tasks to be accomplished.
  • 102. Cont.  The instruction should explain how the test should be performed. The language used for the instruction should be appropriate to the level of the test takers. The administrators should explain the test takers instruction for proper understanding especially when the ability to understand and follow instructions is not part of the test.  Duration of the Test: The time for accomplishing the test is technically important in test administration and should be clearly stated for both the test administrators and test takers. Ample time should be provided for candidates to demonstrate what they know and what they can do. The duration of test should reflect the age and attention span of the test takers and the purpose of the test.
  • 103. Cont.  Venue and Sitting Arrangement: The test environment should be learner friendly.  Adequate physical conditions should be provided, such as work space, good and comfortable writing desks, proper lighting, good ventilation, moderate temperature, conveniences within reasonable distance, and the serenity necessary for maximum concentration.  Adequate lighting, good ventilation and moderate temperature reduce test anxiety and loss of concentration.  Other necessary conditions: These include the requirement that the questions and question papers should be friendly, with bold characters, neat, decent, clear and appealing, and not such that they intimidate the test taker into mistakes.
  • 105. Item Analysis It is the process of examining or analyzing testees’ responses to each item on a test with the basic intent of judging the quality of the items. Item analysis involves determining the difficulty level and discrimination power of test items, and judging how effectively distracters are functioning in the case of multiple-choice items. It helps to determine the adequacy of the items within a test as well as the adequacy of the test itself.
  • 106. Cont. Some of the reasons for item analysis are: 1) Identify content that has not been adequately covered and should be re-taught, 2) Provide feedback to students, 3) Determine if any items need to be revised, be used again or become part of an item file or bank, 4) Identify items that may not have functioned as they were intended, 5) Direct the teacher's attention to individual student weaknesses.
  • 107. Item Difficulty Level Index  It is a measure of the proportion of examinees who answered the item correctly; for this reason it is frequently called the p-value.  If scores from all students in a group are included, the difficulty index is simply the total percent correct.  When there is a sufficient number of scores available (i.e., 100 or more), difficulty indexes are calculated using scores from the top and bottom 27 percent of the group. Item analysis procedures:  Rank the papers in order from the highest to the lowest score.  For each test item, tabulate the number of students in the upper and lower groups who selected each option.
  • 108. Cont.  Compute the difficulty of each item (the percentage of students who answered it correctly).  The item difficulty index can be calculated using the following formula:

P = (Success in HSG + Success in LSG) / N

where HSG = the high scoring group, LSG = the low scoring group, and N = the total number of students in the HSG and LSG combined. Difficulty indexes can range between 0.0 and 1.0 and are usually expressed as percentages. A higher value indicates that a greater proportion of examinees responded to the item correctly, and that the item was thus easier.
  • 109. Cont.

| P-Value | Percent Range | Interpretation |
|---|---|---|
| ≥ 0.75 | 75–100 | Easy |
| ≤ 0.25 | 0–25 | Difficult |
| 0.26–0.74 | 26–74 | Average |

o The average difficulty of a test is the average of the individual item difficulties. o For maximum discrimination among students, an average difficulty of .60 is ideal.
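As a worked illustration with hypothetical counts: suppose 30 students fall in each scoring group, and 24 of the high scorers and 12 of the low scorers answer an item correctly. Then

P = (24 + 12) / (30 + 30) = 36 / 60 = 0.60

which places the item in the “average” band of the table above, exactly at the ideal average difficulty of .60.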
  • 110. Cont.  For criterion-referenced tests, with their emphasis on mastery testing, many items on an exam form will have p-values of .9 or above.  Norm-referenced tests are designed to be harder overall and to spread out the examinees’ scores. Thus, many of the items on an NRT will have difficulty indexes between .4 and .6.
  • 111. Item discrimination index  The index of discrimination is a numerical indicator that enables us to determine whether the question discriminates appropriately between lower scoring and higher scoring students.  When students who earn high scores are compared with those who earn low scores, we would expect to find more students in the high scoring group answering a question correctly than students from the low scoring group.
  • 112.  In the case of very difficult items which no one in either group answered correctly, or fairly easy questions which even the students in the low group answered correctly, the numbers of correct answers might be equal for the two groups.  What we would not expect to find is a case in which the low scoring students answered correctly more frequently than students in the high group.  The item discrimination index can be calculated using the following formula:

D = (Success in HSG − Success in LSG) / (0.5 × (HSG + LSG))

where HSG = the high scoring group and LSG = the low scoring group; the denominator, 0.5 × (HSG + LSG), is the number of students in one group.
  • 113. Cont.  The item discrimination index can vary from -1.00 to +1.00.  A negative discrimination index (between -1.00 and zero) results when more students in the low group answered correctly than students in the high group.  A discrimination index of zero means equal numbers of high and low students answered correctly, so the item did not discriminate between groups.
  • 114. Conti…  A positive index occurs when more students in the high group answer correctly than the low group.  If the students in the class are fairly homogeneous in ability and achievement, their test performance is also likely to be similar, resulting in little discrimination between high and low groups.
  • 115. Cont.  Questions that have an item difficulty index of 1.00 or 0.00 need not be included when calculating item discrimination indices.  An item difficulty of 1.00 indicates that everyone answered correctly, while 0.00 means no one answered correctly.  Neither type of item discriminates between students.  When computing the discrimination index, the scores are divided into three groups, with the top 27% of the scores in the upper group and the bottom 27% in the lower group.
  • 116. Cont.  The number of correct responses for an item by the lower group is subtracted from the number of correct responses for the item in the upper group.  The difference is divided by the number of students in either group. The process is repeated for each item. The value is interpreted in terms of both:  direction (positive or negative), and  strength (non-discriminating to strongly discriminating). The possible range of the discrimination index is -1.00 to +1.00.
  • 117. Cont.

| D-Value | Direction | Strength |
|---|---|---|
| > +.40 | Positive | Strong |
| +.20 to +.40 | Positive | Moderate |
| -.20 to +.20 | None | Non-discriminating |
| < -.20 | Negative | Moderate to strong |
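The two indices and their interpretation bands can be tied together in a short computation. The following is a minimal Python sketch; the function names, group sizes, and counts are hypothetical illustrations, not part of the module:

```python
def item_difficulty(correct_upper, correct_lower, n_upper, n_lower):
    """P = (successes in HSG + successes in LSG) / N."""
    return (correct_upper + correct_lower) / (n_upper + n_lower)

def item_discrimination(correct_upper, correct_lower, n_upper, n_lower):
    """D = (successes in HSG - successes in LSG) / (0.5 * N)."""
    return (correct_upper - correct_lower) / (0.5 * (n_upper + n_lower))

def interpret(p, d):
    """Map P and D onto the interpretation bands in the tables above."""
    difficulty = "easy" if p >= 0.75 else "difficult" if p <= 0.25 else "average"
    if d > 0.40:
        strength = "strong positive"
    elif d >= 0.20:
        strength = "moderate positive"
    elif d > -0.20:
        strength = "non-discriminating"
    else:
        strength = "negative - check for mis-keying"
    return difficulty, strength

# 30 students in each 27% group; 24 upper and 12 lower answered correctly.
p = item_difficulty(24, 12, 30, 30)      # 0.60
d = item_discrimination(24, 12, 30, 30)  # 0.40
print(p, d, interpret(p, d))
```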
  • 118. Cont.  For a small group of students, an index of discrimination for an item that exceeds .20 is considered satisfactory.  For larger groups, the index should be higher because more difference between groups would be expected.  The guidelines for an acceptable level of discrimination depend upon item difficulty.  For very easy or very difficult items, low discrimination levels would be expected; most students, regardless of ability, would get the item correct or incorrect as the case may be.  For items with a difficulty level of about 70 percent, higher discrimination levels would be expected.
  • 119. Cont.  When an item is discriminating negatively, overall the most knowledgeable examinees are getting the item wrong and the least knowledgeable examinees are getting the item right.  A negative discrimination index may indicate that the item is measuring something other than what the rest of the test is measuring. More often, it is a sign that the item has been mis-keyed.
  • 120. Distracter Analysis  One important element in the quality of a multiple choice item is the quality of the item’s distracters. However, neither the item difficulty nor the item discrimination index considers the performance of the incorrect response options, or distracters.  A distracter analysis evaluates the effectiveness of the distracters in each item by comparing the number of students in the upper and lower groups who selected each incorrect alternative (a good distracter will attract more students from the lower group than the upper group).
  • 121. Cont.  Just as the key, or correct response option, must be definitively correct, the distracters must be clearly incorrect (or clearly not the “best” option). In addition to being clearly incorrect, the distracters must also be plausible. That is, the distracters should seem likely or reasonable to an examinee who is not sufficiently knowledgeable in the content area.  If a distracter appears so unlikely that almost no examinee will select it, it is not contributing to the performance of the item. In fact, the presence of one or more implausible distracters in a multiple choice item can make the item artificially far easier than it ought to be.
  • 122. Cont.  It is not desirable to have one of the distracters chosen more often than the correct answer. This result indicates a potential problem with the question.  If students do not know the correct answer and are purely guessing, their answers would be expected to be distributed among the distracters as well as the correct answer.  If one or more distracters are not chosen, the unselected distracters probably are not plausible. If the teacher wants to make the test more difficult, those distracters should be replaced in future tests.
  • 123. Cont.  Whenever the proportion of examinees who selected a distracter is greater than the proportion of examinees who selected the key, the item should be examined to determine if it has been mis-keyed or double-keyed.  If examinees consistently fail to select a given distracter, this may be evidence that the distracter is implausible or simply too easy.
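A distracter analysis itself needs no more than a tally and a comparison. Here is a minimal Python sketch, with hypothetical option counts for one multiple-choice item, of the checks described above:

```python
# Choices of the top-27% and bottom-27% groups (30 students each) for a
# hypothetical four-option item whose key is B.
upper = {"A": 3, "B": 25, "C": 2, "D": 0}
lower = {"A": 10, "B": 12, "C": 8, "D": 0}
key = "B"

for option in upper:
    if option == key:
        continue  # the key is judged via the difficulty/discrimination indices
    if upper[option] + lower[option] == 0:
        print(f"{option}: never chosen - probably implausible; consider replacing it")
    elif upper[option] >= lower[option]:
        print(f"{option}: attracts the upper group - review the item for mis-keying or ambiguity")
    else:
        print(f"{option}: functioning ({lower[option]} low vs {upper[option]} high scorers)")
```

In this hypothetical tally, A and C are functioning distracters, while D is never chosen and therefore adds nothing to the item.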
  • 124. Item Banking  Building a file of effective test items and assessment tasks involves recording the items or tasks, adding information from analyses of students’ responses, and filing the records by both the content area and the objective that the item or task measures.  Such a file is especially valuable in areas of complex achievement, where the construction of test items and assessment tasks is difficult and time consuming. When enough high-quality items and tasks have been assembled, the burden of preparing tests and assessments is considerably lightened. Computer item banking makes these tasks even easier.
  • 125. UNIT FOUR RELIABILITY AND VALIDITY OF ASSESSMENT TOOLS VALIDITY  Validity is the most important idea to consider when preparing or selecting a test or other measuring instrument for use. The drawing of correct conclusions or making decisions based on the data obtained from an assessment is validity.  If the test lacks validity, the information it provides is useless. The validity of a test can be viewed as the "correctness" of the decisions or conclusions made from performance of students gathered through the tests.  Validity has been defined as referring to the appropriateness, meaningfulness, and usefulness of the decisions teachers make based on the information they collect from their students using tests and other instruments.
  • 126. Cont.  It should be related to (1) performance on a universe of items (content validity), (2) performance on some criterion (criterion-related validity), or (3) the degree to which certain psychological traits or constructs are actually represented by test performance (construct validity).  The term validity refers to the accuracy of test results of students; that is, it addresses the question of how confident can we be that a test actually indicates a person's true score on a trait. Types of Validity  Validity is divided into three categories or types. A. Content Validity  The relevant type of validity in the measurement of a behavior is content validity
  • 127.  In assessing the content validity of a test, the teacher asks, “To what extent does the test require students to demonstrate all aspects of the knowledge or other behavior being measured?” This type of validity refers to the adequacy of the assessment. According to Whitely (1996), adequate assessment has two components: 1. Relevance of the content of the test to the objectives or behavior being measured, and 2. Representativeness of the content in the test.  A relevant test assesses only those objectives that are stated in the instruction.  For a test to have high content validity, it should be a representative sample of both the objectives and the contents being measured.
  • 128. Cont. For critical examination of the items of a measure in relation to the behavior or the purpose of the test, one must make the following professional judgments: 1. Does the test content parallel the instructional objectives to be assessed? 2. Do the items of the measure cover different aspects of the course and objectives? B. Criterion-Related Validity  Criterion-related validity indicates whether the scores on a test predict scores on a well-specified, predetermined criterion.  There are two types of criterion-related validity: concurrent validity and predictive validity.
  • 129. Cont. Concurrent validity uses correlation coefficients to describe the degree of relationship between the scores of two tests given to students at about the same time. A high relationship suggests that the tests are assessing something similar. This type of evidence is used in the development of new standardized tests or other measuring instruments that measure – in a different, perhaps more efficient, way – the same thing as an old instrument. The purpose of doing this is usually to substitute one test for another. In predictive validity, the data on the criterion variable are collected some time after the data on the predictor variable are collected. Take the case of the ESLCE and college performance. Students take the national examination in May. Their college performances are collected months later, in February. In both concurrent and predictive procedures, a test is related to a criterion measure.
  • 130. C. Construct Validity  The term construct refers to a psychological construct, a theoretical conceptualization about an aspect of human behavior that cannot be measured or observed directly (Ebel and Frisbie, 1991).  Construct validity is an interpretation or meaning that is given to a set of scores from tests that assess a behavior or theory that cannot be measured directly, such as an unobservable trait like intelligence, creativity, or anxiety. For example, we use tests to measure intelligence. Intelligence is a variable that we cannot directly observe.  We infer it from the students’ test scores. Students who score high on the test are said to be intelligent.
  • 131. Cont. Face Validity  Strictly speaking, face validity is not a major type of validity. It refers to the degree to which the content of the test looks valid, or the extent to which a test appears to measure what it is intended to measure, to those who, for example, prepared the test items, administer the test, and/or evaluate it (Worthen et al., 1999). Face validity may not be as important as content validity, criterion-related validity or construct-related validity from a measurement perspective.
  • 132.  Validity is influenced by a number of factors. The following are some major ones. Factors in the test itself  The following factors can prevent the test items from functioning as intended and thereby lower the validity of the interpretations from the assessment results. The first five factors are equally applicable for assessments requiring extended student performance and traditional tests. The last five factors apply most directly to tests with fixed choice or short answer items that are scored right or wrong.
  • 133. Cont.  Unclear direction  Too difficult vocabulary and sentence structure  Ambiguity in sentence structure  Inadequate time limits  Overemphasis of easy to-assess aspects of domain at the expense of important –but hard to assess aspects  Test items inappropriate for the outcomes being measured  Poorly constructed test items  Test too short  Improper arrangement of items  Identifiable pattern of answers
  • 134. Cont. Factors in administration and scoring  In the case of teacher-made tests, such factors as insufficient time, unfair aid to individual students who ask for help, cheating, and unreliable scoring of student performances tend to lower validity. In the case of published tests, failure to follow the standard directions and time limits, giving students unauthorized assistance, and errors in scoring similarly contribute to lower validity. Factors in student responses  Some students may be bothered by emotional disturbances that interfere with their performance. Others may be frightened by the assessment situation and so are unable to respond normally, and still others may not be motivated to put forth their best effort.
  • 135. RELIABILITY:  The degree of consistency between two measures of the same thing.  The measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time.  Reliability can be defined as a measure of how consistent our measurements are. Scores that are highly reliable are accurate and can be reproduced. Reliability refers to the consistency of assessment results over time and with different samples of students. It is the adequacy of assessment devices to assess what they are supposed to assess again and again.
  • 136. Cont.  Consistency can be: a) consistency over a period of time (stability), b) consistency over different forms (equivalence), and c) internal consistency.  Consistent measurement is a necessary condition for high quality educational and psychological testing.  Although reliability is a necessary condition for valid test scores, it is not a sufficient condition.  Reliability affects validity and the quality of decisions.
  • 137. Methods of estimating reliability  There are several methods of estimating reliability of a measuring instrument or a test. The common ones are stability, equivalence, stability and equivalence, internal consistency, and rater agreement. Stability (Test-retest)  A coefficient of stability is obtained by correlating scores from the same test of a group of individuals on two different occasions. If the scores of the individuals are consistent (that is, if those scoring high the first time also score high the second time, and so on) then the correlation coefficient, and the reliability, are high. This test-retest procedure assumes that the characteristic measured remains constant.
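Computationally, the coefficient of stability is just the Pearson correlation between the two sets of scores. A minimal Python sketch, with hypothetical scores for six students:

```python
from statistics import correlation  # Pearson r; available in Python 3.10+

first = [55, 62, 70, 48, 80, 66]   # scores on the first administration
second = [58, 60, 73, 50, 78, 64]  # same students, same test, later occasion

r = correlation(first, second)     # coefficient of stability
print(round(r, 2))                 # close to 1.0 -> stable measurement
```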
  • 138. Cont. Equivalence (Parallel forms)  It is obtained by giving two forms (with equal content, means, and variances) of a test to the same group on the same day and correlating these results. Here we are determining how confidently we can generalize a person’s score to what he would receive if he took a test composed of similar but different questions. When two equivalent or parallel forms of the same instrument are administered to the same group of students at about the same time, and the scores are related, the reliability that results is a coefficient of equivalence.
  • 139. Cont. Equivalence and Stability  When a teacher needs to give a pretest and posttest to assess a change in behavior, a reliability coefficient of equivalence and stability should be established. In this procedure, reliability data are obtained by administering to the same group of individuals one form of a test at one time and a second form at a later date. To minimize the effects of memory, chance and maturation factors, reliability coefficients estimated from parallel-forms of a test are preferred to test-retest reliability coefficients.
  • 140. Cont. Internal Consistency  Internal consistency is another type of estimating reliability. It is the most common type of reliability since it can be estimated from giving one form of a test once. There are three common types of internal consistency: Split-half method, Kuder-Richardson, and the Cronbach Alpha methods. a. The Split-Half Method In split-half reliability, the items of a test that have been administered to a group are divided into two comparable halves, and a correlation coefficient is calculated between the halves. If each student has about the same scores on each half, then the correlation is high and the test has high reliability.
  • 141. Cont.  Each half should be of similar difficulty. This method provides a lower reliability estimate than other methods, since the total number in the correlation equation contains only half the items (and we know that, other things being equal, longer tests are more reliable than shorter tests).  This technique should not be used with speeded tests. This is because not all students answer all items, a factor that tends to increase the correlations between the items.  In splitting the test into two halves, one might, for example, put the odd-numbered items in one half and the even-numbered items in the other.
• 142. Cont.  Fortunately, one can estimate the reliability (rxx) of the full test from roe via the Spearman-Brown formula: rxx = 2roe / (1 + roe), where roe = the Pearson correlation between the half-test scores on the odd and the even items. The Spearman-Brown split-half method assumes the two halves have equal standard deviations. Kuder-Richardson Method  Kuder and Richardson developed a number of formulas that, in effect, correlate all items on a single test with each other when each item is scored right or wrong, correct or incorrect, yes or no, and so on. K-R reliability is thus determined from a single administration of a test to the same group of students, but without having to split the test into equivalent halves.
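Putting the split-half procedure and the Spearman-Brown correction together, a minimal Python/NumPy sketch (with a hypothetical 0/1 item-score matrix, not data from the slides) looks like this:

    import numpy as np

    # Hypothetical data: rows = 6 students, columns = 8 items scored 1/0.
    items = np.array([
        [1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 0, 0, 1],
        [1, 1, 0, 1, 1, 0, 1, 0],
        [1, 0, 1, 0, 0, 1, 0, 1],
        [0, 1, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0, 0],
    ])

    # Split into odd- and even-numbered items and total each half.
    odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

    # roe: Pearson correlation between the two half-test scores.
    r_oe = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown correction to estimate full-test reliability.
    r_xx = 2 * r_oe / (1 + r_oe)
    print(f"Half-test r: {r_oe:.2f}, corrected rxx: {r_xx:.2f}")

Note how the corrected coefficient is higher than the raw half-test correlation, compensating for the halved test length.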
  • 143. Cont.  This procedure assumes that all items in the test are equivalent to each other, and it is appropriate when the purpose of the test is to measure a single behavior, for example, reading ability of students.  If a test has items of varying difficulty or if it measures more than one behavior, the KR estimates would usually be lower than the split-half reliabilities.
• 144.  The Cronbach Alpha, sometimes called the Alpha Coefficient, developed by Cronbach (1951), also assumes that all items have similar difficulty levels. It is a much more general form of internal consistency than the KR formulas, and it is used for items that are not scored simply right or wrong, yes or no, or true or false. The Cronbach Alpha is generally the most appropriate type of reliability estimate for tests or questionnaires in which there is a range of possible answers for each item.
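The alpha computation itself is short enough to sketch directly. The Python/NumPy example below uses hypothetical ratings on a 1-5 scale; with 0/1 item scores, the identical formula reduces to the Kuder-Richardson KR-20 coefficient:

    import numpy as np

    # Hypothetical data: rows = 5 students, columns = 4 items on a 1-5 scale.
    ratings = np.array([
        [4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [3, 2, 3, 2],
        [4, 4, 5, 4],
    ])

    k = ratings.shape[1]                          # number of items
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of total scores

    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance).
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"Cronbach's alpha: {alpha:.2f}")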
• 145. Rater Agreement The fifth type of reliability is expressed as a coefficient of agreement. This is established by determining the extent to which two or more persons agree about what they have seen, heard, scored, or rated. For example, when two or more teachers score students' answers to essay items, will they give the same or similar scores, i.e., do they agree on what they score? If they do, then there is some consistency in measurement.
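As an illustrative sketch (the essay grades below are hypothetical), rater agreement can be quantified with raw percent agreement, or with Cohen's kappa, which corrects for the agreement two raters would reach by chance alone:

    from collections import Counter

    # Hypothetical letter grades given by two teachers to 8 essays.
    rater_a = ["A", "B", "B", "C", "A", "B", "C", "A"]
    rater_b = ["A", "B", "C", "C", "A", "B", "B", "A"]
    n = len(rater_a)

    # Observed proportion of exact agreements.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal grade frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[g] * count_b[g] for g in count_a) / n**2

    # Cohen's kappa: agreement beyond chance, scaled to a maximum of 1.
    kappa = (p_o - p_e) / (1 - p_e)
    print(f"Percent agreement: {p_o:.2f}, Cohen's kappa: {kappa:.2f}")

Here the raters agree on 6 of 8 essays (0.75), but kappa (about 0.62) is lower because some of that agreement would occur by chance.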
• 146. FACTORS INFLUENCING RELIABILITY A number of factors influence the reliability of test scores. a. Test related factors  These factors include test length, difficulty of test items and score variability.  Test length. Other things being equal, the larger the number of items, the higher the reliability of the test (see the worked illustration after the next slide).  Difficulty of test items. Score variability depends on the difficulty level of items. If items are too difficult, only a few students will answer them correctly and scores will all be low. On the other hand, if items are too easy, most students will answer them correctly and scores will mostly be high. In both instances, scores do not vary very much, which contributes to a low reliability index. In contrast, on moderately difficult items students' scores are likely to vary, which results in a higher reliability index.
• 147. Cont.  According to Ebel & Frisbie (1991), items with a difficulty level of 40% to 80% contribute much to reliability. On the other hand, items that more than 90 percent or fewer than 30 percent of the examinees answer correctly contribute little or nothing to reliability. Score Variability  The more scores vary, the higher reliability tends to be. Compared to true-false items, multiple-choice items yield higher reliability indices. This is because on true-false items students have a 50% probability of getting the correct answer by chance, which contributes to low score variability among students. On multiple-choice items with four options, on the other hand, the probability of getting an item right by chance is 25%, which results in better score variability among students.
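As a worked illustration of the test-length point from the previous slide (the numbers are chosen purely for illustration): the general Spearman-Brown prophecy formula predicts the reliability of a test lengthened by a factor n,

    r_new = (n × r) / (1 + (n − 1) × r)

Doubling (n = 2) a test whose reliability is r = 0.60 is predicted to raise reliability to (2 × 0.60) / (1 + 0.60) = 0.75, while halving it (n = 0.5) would lower it to (0.5 × 0.60) / (1 + (0.5 − 1) × 0.60) = 0.30 / 0.70, or about 0.43, assuming the added or removed items are of comparable quality.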
• 148. b. Student related factors  These factors include the nature of the group tested, student testwiseness, and student motivation.  Nature of the group tested. Assume that all students in your class are brilliant. When you administer a test, they all score very high, and the difference between the highest and the lowest score is very narrow. If all the students are weak, their scores will all be low, and the range between the maximum and minimum will again be narrow. But if there are high achieving, average, and low achieving students in the same classroom, their scores vary from high to low. Thus, reliability is higher for heterogeneous groups of students than for homogeneous ones.
• 149. Cont.  Student testwiseness. Some students are wise when taking tests. Even when they do not know the content, they are very clever at guessing the correct answer to an item using clues in the test. Students therefore vary in their test-taking skills, and this variation creates differences in students' scores that have nothing to do with achievement. When students vary considerably in their level of testwiseness, error scores will be higher, which in turn lowers the reliability of scores.
• 150. Cont.  Student Motivation. Who performs well: a motivated student or an unmotivated student? Does motivation have any effect on the academic performance of students? Obviously, academic motivation has a strong effect on students' performance. Students who are academically motivated tend to achieve higher than those who are not. When students are unmotivated, their scores may not reflect their actual ability. In a classroom where students vary in motivation, scores will vary for reasons unrelated to achievement, leading to lowered reliability.
• 151. c. Test administration factors  These factors include time limits and cheating opportunities.  Time Limits. Whether a test is speeded or a power test matters for the reliability of the test. When internal consistency reliability indices are computed from a single administration of a speeded test, they are misleading (typically inflated), because on speeded tests students' scores largely reflect the number of items attempted. Thus, it is suggested that when reliability is to be estimated for speeded tests, two test administrations be used and the correlation between them calculated.  Cheating. Any form of cheating reduces score reliability. Cheating includes copying answers from others, using cheat sheets, passing answers to other exam halls, obtaining a test prior to its administration, etc.
  • 152. Ethical Standards of assessment Ethical and Professional Standards of Assessment and its Use  Ethical standards guide teachers in fulfilling their obligation to provide and use tests that are fair to all test takers regardless of age, gender, disability, ethnicity, religion, linguistic background, or other personal characteristics.
• 153. Cont.  Fairness is a primary consideration in all aspects of testing. It:  helps to ensure that all test takers are given a comparable opportunity to demonstrate what they know and how they can perform in the area being tested;  implies that every test taker has the opportunity to prepare for the test and is informed about the general nature and content of the test;  extends to the accurate reporting of individual and group test results.
• 154. The following are ethical and professional standards that teachers may consider in their assessment practices. 1. Teachers should be skilled in choosing assessment methods appropriate for instructional decisions. 2. Teachers should develop tests that meet the intended purpose and that are appropriate for the intended test takers. 3. Teachers should be skilled in administering, scoring and interpreting the results from diverse assessment methods.
• 155. Cont 4. Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum, and improving schools. 5. Teachers should be skilled in developing valid pupil grading procedures which use pupil assessments. 6. Teachers should be skilled in communicating assessment results to students, parents, other stakeholders, and other educators. 7. Teachers should be skilled in recognizing unethical, illegal, and otherwise inappropriate assessment methods and uses of assessment information.
  • 156. Cont. In addition, the following are principles of grading that can guide the development of a grading system.  The system of grading should be clear and understandable (to parents, other stakeholders, and most especially students).  The system of grading should be communicated to all stakeholders (e.g., students, parents, administrators).  Grading should be fair for all students regardless of gender, socioeconomic status or any other personal characteristics.  Grading should support, enhance, and inform the instructional process.
• 157. Cultural and linguistic diversity in assessments  Fairness is fundamentally a socio-cultural, rather than a technical, issue.  Students represent a variety of cultural and linguistic backgrounds. If these backgrounds are ignored, students may become alienated or disengaged from the learning and assessment process. Teachers need to be aware of how such backgrounds may influence student performance and the potential impact on learning. Teachers should be ready to provide accommodations where needed.
  • 158. Cont Classroom assessment practices should be sensitive to the cultural and linguistic diversity of students in order to obtain accurate information about their learning. Assessment practices that attend to issues of cultural diversity include those that  acknowledge students’ cultural backgrounds.  are sensitive to those aspects of an assessment that may hamper students’ ability to demonstrate their knowledge and understanding.  use that knowledge to adjust or scaffold assessment practices if necessary.
  • 159. Cont Assessment practices that attend to issues of linguistic diversity include those that  acknowledge students’ differing linguistic abilities.  use that knowledge to adjust or scaffold assessment practices if necessary.  use assessment practices in which the language demands do not unfairly prevent the students from understanding what is expected of them.  use assessment practices that allow students to accurately demonstrate their understanding by responding in ways that accommodate their linguistic abilities, if the response method is not relevant to the concept being assessed (e.g., allow a student to respond orally rather than in writing).
• 160. Cont  Teachers must make every effort to address and minimize the effect of bias in classroom assessment practices. Bias occurs when irrelevant or arbitrary factors systematically influence the interpretation of results or the performance of an individual student or a subgroup of students.  Assessment should be culturally and linguistically appropriate, fair and bias-free. For an assessment task to be fair, its content, context, and performance expectations should:  reflect knowledge, values, and experiences that are equally familiar and appropriate to all students;  tap knowledge and skills that all students have had adequate time to acquire;  be as free as possible of cultural and ethnic stereotypes.
• 161. Disability and Assessment Practices  It is quite obvious that the education systems of many countries have been exclusionary, failing to fully accommodate the educational needs of students with disabilities.  This has been true not only in our country but in the rest of the world as well, although the magnitude differs from country to country.  It was in response to this situation that UNESCO has been promoting the principle of inclusive education to guide the educational policies and practices of all governments.
  • 162. Cont  Different world conventions were held and documents signed towards the implementation of inclusive education. Our country, Ethiopia, has been a signatory of these documents and therefore has accepted inclusive education as a basic principle to guide its policy and practice in relation to the education of disabled students
• 163. Cont  Inclusive education is based on the idea that all students, including those with disabilities, should be provided with the best possible education to develop themselves. This implies the provision of all possible accommodations to address the educational needs of students with disabilities. Accommodations should not be limited to the teaching and learning process; they should also extend to assessment mechanisms and procedures.
• 164. cont  There are different strategies that can be considered to make assessment practices accessible to students with disabilities, depending on the type of disability. The following strategies could be considered in summative assessments:  Modifying assessments: - This should enable disabled students to have full access to the assessment without giving them any unfair advantage.  Others' support: - Disabled students may need the support of others in certain assessment activities that they cannot do independently. For instance, they may require readers and scribes in written exams; they may also need others' assistance in practical activities, such as using equipment, locating materials, drawing and measuring.
• 165. Cont  Time allowances: - Disabled students should be given additional time to complete their assessments; how much extra time is appropriate is for the individual instructor to decide based on the purpose and nature of the assessment.  Rest breaks: Some students may need rest breaks during the examination. This may be to relieve pain or to attend to personal needs.  Flexible schedules: In some cases disabled students may require flexibility in the scheduling of examinations. For example, some students may find it difficult to manage a number of examinations in quick succession and need to have examinations scheduled over a period of days.
• 166. cont  Alternative methods of assessment: - In certain situations where formal methods of assessment may not be appropriate for disabled students, the instructor should assess them using non-formal methods such as class work, portfolios, oral presentations, etc.  Assistive Technology: Specific equipment may need to be available to the student in an examination. Such arrangements often include the use of personal computers, voice-activated software and screen readers.
  • 167. Gender issues in assessment  Teachers’ assessment practices can also be affected by gender stereotypes. The issues of gender bias and fairness in assessment are concerned with differences in opportunities for boys and girls. A test is biased if boys and girls with the same ability levels tend to obtain different scores.
  • 168. cont Test questions should be checked for:  material or references that may be offensive to members of one gender,  references to objects and ideas that are likely to be more familiar to men or to women,  unequal representation of men and women as actors in test items or representation of members of each gender only in stereotyped roles.
  • 169. Cont.  If the questions involve objects and ideas that are more familiar or less offensive to members of one gender, then the test may be easier for individuals of that gender. Standards for achievement on such a test may be unfair to individuals of the gender that is less familiar with or more offended by the objects and ideas discussed, because it may be more difficult for such individuals to demonstrate their abilities or their knowledge of the material.